我正在尝试将一些非常大的 Hive 表从文本格式转换为 ORC 格式,希望它会占用更少的存储空间并且查询会更快。由于我们使用 LZO 作为整个资产的压缩,因此我也尝试将其用于 ORC 格式。
对于你们可以提供的任何帮助,我将不胜感激。
我正在使用 Hadoop 2.4.0 和 Hive 0.13.1
根据下面的链接,似乎应该可以使用 ORC 格式的 LZO 压缩:
http://2013.berlinbuzzwords.de/sessions/orc-file-improving-hive-data-storage https://hive.apache.org/javadocs/r1.1.0/api/ql/org/apache/hadoop/hive /ql/io/orc/package-summary.html
但是当我做类似的事情时
create table sa_orc_lzo
stored as orc tblproperties ("orc.compress"="LZO")
as select * from sa;
我收到以下错误:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"data".......}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
**Caused by: java.lang.IllegalArgumentException: LZO is not available.
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createCodec**(WriterImpl.java:200)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:175)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:83)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:649)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
**Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.LzoCodec**