1

我正在RC尝试ORC使用LZ4. 我已经安装了 Hadoop-2.7.1 和 Hive-1.2.1。在这种情况下LZ4,我可以RC毫无问题地压缩文件。但是,当我尝试使用 加载ORC文件中的数据时LZ4,它不起作用。我创建了ORC如下表:

CREATE TABLE FINANCE_orc(
    PERMNO STRING,
    DATE STRING,
    CUSIP STRING,
    NCUSIP STRING,
    COMNAM STRING,
    TICKET STRING,
    PERMCO STRING,
    SHRCD STRING,
    EXCHCD STRING,
    HEXCD STRING,
    SICCD STRING,
    HSLCCD STRING,
    PRC STRING,
    VOL STRING,
    RET STRING,
    SHROUT STRING,
    DLRET STRING,
    VWRETD STRING,
    EWRETD STRING,
    SPRTRN STRING)
STORED AS ORC tblproperties ("orc.compress"="Lz4");

set mapred.output.compress=true; 
set hive.exec.compress.output=true; 
set mapred.output.compression.type = BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec; 
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec; 

INSERT OVERWRITE table finance_orc select * from finance; 

但是在加载数据时,它给出了以下错误:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
    ... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:622)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:566)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
    at java.lang.Enum.valueOf(Enum.java:236)
    at org.apache.hadoop.hive.ql.io.orc.CompressionKind.valueOf(CompressionKind.java:25)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:143)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:203)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:52)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
    ... 18 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

我已经使用SnappyZlib使用相同的命令,它工作正常。但问题仅在于LZ4. 不知是什么原因?

4

2 回答 2

2
  1. 除了 ORC 柱状压缩之外,我们可以使用的压缩是 NONE、ZLIB、SNAPPY 之一。
  2. 默认压缩编解码器是 ZLIB。
  3. 不允许使用上述以外的压缩编解码器。
  4. 一般来说,要了解错误,请完整阅读错误日志以在一定程度上找到问题。错误日志说 -

        "org.apache.hadoop.hive.ql.metadata.HiveException:   java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4"
    
于 2016-02-09T14:07:37.317 回答
1

现在您可以在官方 spark repo 上手动替换 lz4 的 jar 文件。

于 2021-08-20T03:19:26.083 回答