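There may be another way to do this, but I would say yes, you probably need to transform the data slightly in order to query it with Drill.
This looks like a case where you'd want to use KVGEN. KVGEN would give you the kind of columns Chris Matta describes, but KVGEN operates on a column, and in this case there isn't really a column to use: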
0: jdbc:drill:zk=local> select t.* from dfs.`/Users/vince/data/stackoverflow/users.json` t;
+---+---+
| 3 | 4 |
+---+---+
| {"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""} | {"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":""} |
+---+---+
1 row selected (0.133 seconds)
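Since these columns are dynamic and sit at the "top level" of the JSON object, you can't use KVGEN here. But if you transform the data just slightly, you can. I used this invocation of the most excellent tool jq to transform the data into a format KVGEN can work with: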
$ jq '.| { "user": . }' < users.json > users_kv.json
This takes the input and wraps the JSON object in another map, which gives us the "static" column we need to do the following:
0: jdbc:drill:zk=local> select kvgen(t.`user`) from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t;
+--------+
| EXPR$0 |
+--------+
| [{"key":"3","value":{"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"},"-JxAQpWGzY-gOzej7Xis":{}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""}},{"key":"4","value":{"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-Jx6fpaJHvKPHc8CylPd":{},"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{},"-JxWuZ-ePcx0XqYRmzc6":{}}}}] |
+--------+
1 row selected (1.774 seconds)
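For readers without jq at hand, the wrapping step and the general shape of the KVGEN result can be approximated in plain Python. This is only a sketch: `wrap_as_user` and `kvgen` are hypothetical helper names (not part of Drill or jq), and the sample record is trimmed to two fields per user from the data above. Unlike Drill, the sketch does not pad missing nested keys with empty maps.

```python
import json

def wrap_as_user(record):
    """Mimic jq '. | {"user": .}': nest the whole object under one fixed key."""
    return {"user": record}

def kvgen(mapping):
    """Approximate KVGEN: turn a map into a list of key/value pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

# Trimmed sample based on users.json (only two fields kept per user).
raw = json.loads("""
{"3": {"fullname": "Mario Test", "email": "mario.giambanco@domain.com"},
 "4": {"fullname": "mario test", "email": "mariogiambanco@domain.com"}}
""")

wrapped = wrap_as_user(raw)      # now there is a fixed "user" column
pairs = kvgen(wrapped["user"])   # dynamic keys become "key" values
print(json.dumps(pairs, indent=2))
```

The point of the wrapper is the same as in the jq call: the dynamic user ids stop being column names and become data under a single, known column.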
Since I have a list in the column, this still can't be queried the way you want. So use FLATTEN:
0: jdbc:drill:zk=local> select flatten(kvgen(t.`user`)) as `user` from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t;
+------+
| user |
+------+
| {"key":"3","value":{"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"},"-JxAQpWGzY-gOzej7Xis":{}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""}} |
| {"key":"4","value":{"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-Jx6fpaJHvKPHc8CylPd":{},"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{},"-JxWuZ-ePcx0XqYRmzc6":{}}}} |
+------+
2 rows selected (0.257 seconds)
Two rows, much better. Now you're ready to do what you wanted (note the subquery, and the backticks around the reserved words user and value):
0: jdbc:drill:zk=local> select u.`user`.`key` as userid, u.`user`.`value`.fullname as fullname, u.`user`.`value`.email as email from (select flatten(kvgen(t.`user`)) as `user` from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t) u where u.`user`.`value`.fullname = 'Mario Test';
+---------+-------------+-----------------------------+
| userid | fullname | email |
+---------+-------------+-----------------------------+
| 3 | Mario Test | mario.giambanco@domain.com |
+---------+-------------+-----------------------------+
1 row selected (0.22 seconds)
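To make the logic of that final query easier to follow, here is a rough Python analogue of FLATTEN(KVGEN(...)) followed by the outer filter. It is a sketch under assumptions, not how Drill executes the query: `kvgen`, `flatten_kvgen`, and `find_users` are hypothetical names, and the input is a trimmed version of users_kv.json with two fields per user.

```python
def kvgen(mapping):
    """Approximate KVGEN: a map becomes a list of key/value pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

def flatten_kvgen(rows, column):
    """Approximate FLATTEN(KVGEN(column)): one output row per pair."""
    for row in rows:
        for pair in kvgen(row[column]):
            yield {column: pair}

def find_users(rows, fullname):
    """Rough equivalent of the outer SELECT ... WHERE fullname = '...'."""
    return [
        {
            "userid": r["user"]["key"],
            "fullname": r["user"]["value"].get("fullname"),
            "email": r["user"]["value"].get("email"),
        }
        for r in flatten_kvgen(rows, "user")
        if r["user"]["value"].get("fullname") == fullname
    ]

# Trimmed stand-in for users_kv.json: a single row with a "user" map.
rows = [{"user": {
    "3": {"fullname": "Mario Test", "email": "mario.giambanco@domain.com"},
    "4": {"fullname": "mario test", "email": "mariogiambanco@domain.com"},
}}]

result = find_users(rows, "Mario Test")
print(result)
```

As in the Drill query, the comparison is case-sensitive, so only user 3 matches.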