
I have a text file containing JSON records that I want to load into Hive. My JSON looks like this:

{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}

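As you can see, this is nested JSON that contains both arrays of primitives and arrays of objects.

Is it possible to load it into Hive as is, using any built-in functions?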

Yosi


3 Answers


You should be able to load it into Hive as is. You may need to escape the double quotes; I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is needed.

Once the data is in Hive, you can access the JSON elements with the built-in function get_json_object, which is described in detail at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
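
A minimal sketch of this approach, assuming each record sits on its own line of the text file and is kept as a single STRING column. The table name json_raw, the column name line, and the input path are all hypothetical placeholders, not something from the question:

```sql
-- Hypothetical staging table: one JSON record per row, stored as plain text.
CREATE TABLE json_raw (line STRING);

-- Placeholder path; point it at the actual text file.
LOAD DATA LOCAL INPATH '/tmp/records.json' INTO TABLE json_raw;

-- get_json_object takes the JSON string and a JSONPath-like expression
-- and returns the matched value as a string (NULL if the path is absent).
SELECT
  get_json_object(line, '$.vr')          AS vr,              -- top-level scalar
  get_json_object(line, '$.usr')         AS usr,             -- top-level string
  get_json_object(line, '$.cts[0]')      AS first_cts,       -- element of a primitive array
  get_json_object(line, '$.sel.ty')      AS sel_ty,          -- field of a nested object
  get_json_object(line, '$.atts[1].val') AS second_att_val   -- field inside an array of objects
FROM json_raw;
```

Each call re-parses the JSON string, so this fits best when only a handful of fields are needed per query.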

answered 2011-08-08T19:12:51.937

You can use a custom SerDe to read the JSON file into a Hive table. See the following SerDe on GitHub - https://github.com/rcongiu/Hive-JSON-Serde
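
A sketch of what a table definition for the record in the question might look like with that SerDe. The jar path, the table name json_events, and the column name as_id are assumptions made for illustration; the column types are guessed from the single sample record, and the SerDe class name and the mapping property follow the project's README as I understand it:

```sql
-- Placeholder path; build or download the SerDe jar first.
ADD JAR /path/to/json-serde-jar-with-dependencies.jar;

-- Hypothetical table whose columns mirror the keys of the sample record.
CREATE TABLE json_events (
  vr    INT,
  tm    BIGINT,
  tms   STRING,
  as_id INT,           -- "as" is a Hive keyword, so it is mapped below
  pb    INT,
  cts   ARRAY<INT>,
  ctgs  ARRAY<INT>,
  op    INT,
  ev    INT,
  dv    INT,
  dvgs  ARRAY<INT>,
  cnt   STRING,
  usr   STRING,
  atts  ARRAY<STRUCT<id:INT, val:STRING>>,
  sel   STRUCT<cm:INT, ty:STRING, ag:INT, ad:INT, fl:INT,
               fla:INT, hg:INT, mc:STRING, pr:DOUBLE>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.as_id" = "as")  -- Hive column as_id reads JSON key "as"
STORED AS TEXTFILE;

-- Nested values are then addressed with ordinary Hive syntax.
SELECT usr, cts[0], sel.ty, atts[1].val
FROM json_events;
```

With this layout the nesting is declared once in the schema, so queries do not need to repeat JSON paths.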

answered 2012-06-26T22:34:39.617

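Also check out Brickhouse - https://github.com/klout/brickhouse. It has some quite nice UDFs for JSON (such as json_split and json_map). By combining Brickhouse with get_json_object / json_tuple (which Nija also mentioned here), you can even avoid a custom SerDe such as Hive-JSON-Serde.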
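
A rough sketch of how these pieces might fit together, assuming the raw records live in a hypothetical table json_raw with a single STRING column named line. The jar path is a placeholder, and the UDF class name is taken from my reading of the Brickhouse sources, so verify it against the version you install:

```sql
-- Assumes a hypothetical table json_raw(line STRING), one JSON record per row.

-- Placeholder path to the Brickhouse jar.
ADD JAR /path/to/brickhouse.jar;

-- Class name is an assumption; check the brickhouse.hql script shipped with the jar.
CREATE TEMPORARY FUNCTION json_split AS 'brickhouse.udf.json.JsonSplitUDF';

-- json_tuple pulls several top-level keys in one pass; nested values
-- such as "atts" come back as JSON text.
-- json_split turns that JSON array text into a Hive array of JSON strings,
-- which explode() then expands into one row per array element.
SELECT
  t.usr,
  get_json_object(att_json, '$.id')  AS att_id,
  get_json_object(att_json, '$.val') AS att_val
FROM json_raw r
LATERAL VIEW json_tuple(r.line, 'usr', 'atts') t AS usr, atts
LATERAL VIEW explode(json_split(t.atts)) a AS att_json;
```

For the sample record this should yield one row per entry in atts, each paired with the record's usr value.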

answered 2014-02-14T00:21:32.100