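I'm using a SerDe that processes Thrift object entries in various ways on top of a base dataset. It is essentially a glorified Hive struct that processes the base dataset at runtime instead of storing the results in a table. I recently upgraded our cluster from Hive 0.7.1 to Hive 0.10.0 (CDH3 -> CDH4.3.0), and the SerDe no longer processes the data lazily; instead, it appears to process every field that is defined.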
I've dug through Hive's code and traced how our data is deserialized, trying to understand how Hive decides which fields to process. As far as I can tell, it processes every column simply because our ObjectInspector returns all the fields of our custom object, and I can't figure out how to control which fields get processed.
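To make the behavior concrete, here is a stripped-down sketch of what I believe is happening. It uses no Hive classes at all; `LazyStructSketch`, `readReferenced`, and `convertAll` are hypothetical stand-ins I made up for illustration, and the "convert every field" loop is only my reading of what the struct conversion step seems to do, not Hive's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for our custom struct: each field is computed on demand.
class LazyStructSketch {
    // Records which fields actually had their function invoked.
    final List<String> computed = new ArrayList<>();
    private final Map<String, Supplier<Object>> fields = new LinkedHashMap<>();

    LazyStructSketch() {
        addField("id", () -> 42);
        addField("expensive_a", () -> "result of a costly function");
        addField("expensive_b", () -> "result of another costly function");
    }

    private void addField(String name, Supplier<Object> fn) {
        // Wrap the function so every evaluation is logged.
        fields.put(name, () -> { computed.add(name); return fn.get(); });
    }

    List<String> fieldNames() {
        return new ArrayList<>(fields.keySet());
    }

    // Plays the role of our inspector's getStructFieldData: evaluates one field.
    Object getFieldData(String name) {
        return fields.get(name).get();
    }

    // Behavior we relied on before the upgrade: only referenced fields are touched.
    static List<Object> readReferenced(LazyStructSketch row, List<String> referenced) {
        List<Object> out = new ArrayList<>();
        for (String name : referenced) {
            out.add(row.getFieldData(name));
        }
        return out;
    }

    // Behavior we seem to get now (assumption): every declared field is copied,
    // so every field's function runs regardless of the query.
    static List<Object> convertAll(LazyStructSketch row) {
        List<Object> out = new ArrayList<>();
        for (String name : row.fieldNames()) {
            out.add(row.getFieldData(name));
        }
        return out;
    }

    public static void main(String[] args) {
        LazyStructSketch lazyRow = new LazyStructSketch();
        readReferenced(lazyRow, List.of("id"));
        System.out.println("referenced only: " + lazyRow.computed);

        LazyStructSketch eagerRow = new LazyStructSketch();
        convertAll(eagerRow);
        System.out.println("convert all:     " + eagerRow.computed);
    }
}
```

In the first case only the `id` function runs; in the second, both expensive functions run even though the query never asked for them, which matches what we are seeing after the upgrade.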
Which parts of Hive can I manipulate to change which fields are processed? Is there a way to detect which fields a query actually uses, so that I can disable the unneeded functions in my object's internal state?
Edit: I realized it would be useful to include a stack trace showing where one of the data-processing functions gets called because its field is being inspected.
I've replaced our custom class names with names that describe their roles.
2013-10-08 17:02:45,198 INFO CustomStructFunction: Stack trace: java.lang.Throwable
at CustomStructFunction.init(CustomStructFunction.java:490)
at CustomStructFunctionBase.process(CustomStructFunctionBase.java:27)
at CustomStructObject.callImplementor(CustomStructObject.java:332)
at CustomStructField.callImplementor(CustomStructField.java:161)
at CustomStructField.getValue(CustomStructField.java:131)
at CustomStructObjectInspector.getStructFieldData(CustomStructObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:298)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:630)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)