I have to connect Pig to a Hadoop cluster that is slightly modified from Hadoop 0.20.0. I chose Pig 0.7.0 and set PIG_CLASSPATH:

export PIG_CLASSPATH=$HADOOP_HOME/conf

When I run Pig, it fails with:

ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage

So I copied hadoop-core.jar from $HADOOP_HOME over hadoop20.jar in $PIG_HOME/lib and ran "ant" (roughly the commands sketched after the trace). Now I can run Pig, but when I use dump or store I get another error:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V

java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Lorg/apache/hadoop/mapreduce/Job;Lorg/apache/hadoop/fs/Path;)V
    at org.apache.pig.builtin.BinStorage.setStoreLocation(BinStorage.java:369)
    ...
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
================================================================================
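
For reference, the jar swap described above was roughly the following (a sketch; the exact jar file name varies by Hadoop distribution):

cp $HADOOP_HOME/hadoop-*-core.jar $PIG_HOME/lib/hadoop20.jar   # overwrite the bundled jar
cd $PIG_HOME
ant                                                            # rebuild Pig against it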

Has anyone run into this error, or did I build it the wrong way? Thanks.

3 Answers

There is a section about this issue in the Pig FAQ which should give you a good idea of what's wrong. Here is the outline taken from that page:

This usually happens when you are connecting to a hadoop cluster other than the standard Apache hadoop 20.2 release. Pig bundles the standard hadoop 20.2 jars in its release. If you want to connect to another version of hadoop cluster, you need to replace the bundled hadoop 20.2 jars with compatible jars. You can try the following steps (a shell sketch appears after the list):

  1. do "ant"
  2. copy hadoop jars from your hadoop installation to overwrite ivy/lib/Pig/hadoop-core-0.20.2.jar and ivy/lib/Pig/hadoop-test-0.20.2.jar
  3. do "ant" again
  4. cp pig.jar to overwrite pig-*-core.jar
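
In shell terms, those four steps look roughly like this (a sketch, assuming a Pig source tree in $PIG_HOME and a Hadoop installation in $HADOOP_HOME; the exact jar names are illustrative):

cd $PIG_HOME
ant                                                                     # step 1: initial build populates ivy/lib/Pig
cp $HADOOP_HOME/hadoop-*-core.jar ivy/lib/Pig/hadoop-core-0.20.2.jar    # step 2: swap in your cluster's jars
cp $HADOOP_HOME/hadoop-*-test.jar ivy/lib/Pig/hadoop-test-0.20.2.jar
ant                                                                     # step 3: rebuild against them
cp pig.jar pig-*-core.jar                                               # step 4: overwrite the release core jar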

Some other tricks are also possible. You can use "bin/pig -secretDebugCmd" to inspect the command line Pig builds. Make sure you are using the right version of hadoop.
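
For example, to scan the classpath Pig assembles for the hadoop jar it actually picked up (a sketch; the tr/grep filtering is just one convenient way to read the output):

cd $PIG_HOME
bin/pig -secretDebugCmd | tr ':' '\n' | grep hadoop   # show the hadoop entries on Pig's classpath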

As pointed out in that FAQ section, if nothing works I would advise just upgrading to a recent version of Pig, after 0.9.1; Pig 0.7 is a bit old.

Answered 2013-05-29T03:11:05.633

The Pig (core) jar has bundled Hadoop dependencies, and these may differ from the version you want to use. If you have an old Pig version (<0.9), you have the option to build the jar without Hadoop:

cd $PIG_HOME
ant jar-withouthadoop
cp $PIG_HOME/build/pig-x.x.x-dev-withouthadoop.jar $PIG_HOME

Then start Pig:
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf:$PIG_HOME/pig-x.x.x-dev-withouthadoop.jar; ./pig


Newer Pig releases ship a pre-built withouthadoop version (see the ticket), so you can skip the build process. Moreover, when you run Pig it will pick up the withouthadoop jar from PIG_HOME rather than the bundled version, so you don't need to add withouthadoop.jar to PIG_CLASSPATH either (provided that you run Pig from $PIG_HOME/bin).

..Back to your question:
Hadoop 0.20 and its modified variants (0.20-append?) can work even with the most recent Pig release (0.11.1).
You just need to do the following:

unpack Pig 0.11.1
cd $PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/hadoop-core-x.x.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf; ./pig

If you still get "Failed to create DataStorage", then it's worth starting Pig with -secretDebugCmd as Charles Menguy suggested, so that you can check whether Pig picks up the correct Hadoop version, etc.

Answered 2013-05-29T08:18:42.697

Did you remember to run start-all.sh from /usr/local/bin? I ran into the same problem, and I basically went back over the steps for configuring Hadoop itself. I can use Pig now.
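
A quick way to check that the cluster is actually up before launching Pig (a sketch; the script location depends on your installation):

/usr/local/bin/start-all.sh   # start the HDFS and MapReduce daemons
jps                           # NameNode, DataNode, JobTracker and TaskTracker should all be listed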

Answered 2013-08-22T18:59:49.520