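I'm new to Hive and need to parse logs in the following format:
[Time Stamp] {Complex JSON data}
From what I've found so far, there are JSON SerDes available.
Can I extend one of those JSON SerDes to fit my needs? If so, which JSON SerDe would be the better choice?
If this approach isn't a good one, are there any other pointers?
Thanks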
Instead of using an existing open-source SerDe, I found that writing one myself was much simpler. Apart from the boilerplate code, I only had to put my business logic in the deserialize method, and it worked like a charm.
This link was very helpful: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
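For anyone wondering what that business logic might look like, here is a minimal sketch of the line-splitting step that a deserialize method could call for the `[Time Stamp] {Complex JSON data}` format. It is not my actual SerDe: the class name `LogLineSplitter`, the method `split`, and the sample timestamp format are all made up for illustration, and the Hive SerDe boilerplate (object inspectors, column mapping, JSON parsing) is left out so the snippet runs on plain Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: splits "[timestamp] {json}" lines.
final class LogLineSplitter {
    // Group 1: everything between the leading square brackets.
    // Group 2: the JSON object, from the first '{' to the last '}'.
    // DOTALL lets the JSON payload span embedded newlines, if any.
    private static final Pattern LINE =
        Pattern.compile("^\\[([^\\]]*)\\]\\s*(\\{.*\\})\\s*$", Pattern.DOTALL);

    // Returns {timestamp, json}, or null when the line does not match,
    // so the caller can decide how to treat malformed rows.
    static String[] split(String line) {
        if (line == null) {
            return null;
        }
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Assumed sample line; the real timestamp format may differ.
        String sample =
            "[2013-01-15 10:22:01] {\"user\":{\"id\":7},\"tags\":[\"a\",\"b\"]}";
        String[] parts = split(sample);
        System.out.println("timestamp = " + parts[0]);
        System.out.println("json      = " + parts[1]);
    }
}
```

Inside a real SerDe, the timestamp would go into one column and the JSON string would be handed to whatever JSON parser you prefer to fill the remaining columns.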
I also tried a UDTF, which worked smoothly too, but I found the SerDe was much faster.
Hope this helps someone.