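My cluster runs Cloudera Hadoop version 4, which ships with version 2.4 of the Google protobuf jar. My application code uses protobuf classes that were compiled with protobuf 2.5.
At runtime this fails with unresolved compilation problems. Is there a way to run MapReduce jobs with an external jar, or am I stuck until Cloudera upgrades their distribution?
Thanks.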
Yes, you can run MR jobs with external jars.
Be sure to add any dependencies to both HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples.
You can use the following to add all the jar dependencies from the current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar, you also need to pass it the jars of any dependencies with -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The two sed commands use different delimiter characters: HADOOP_CLASSPATH is colon-separated (:), while the -libjars list must be comma-separated (,).
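If you would rather not rely on echo and sed, the same two lists can be built with a small shell function. This is only a sketch: join_jars and LIBJARS are names I made up for illustration, and it assumes your dependency jars live in ./lib.

```shell
# join_jars DIR SEP: print every *.jar in DIR joined by SEP.
# (hypothetical helper, not part of Hadoop)
join_jars() {
  dir=$1
  sep=$2
  out=""
  for jar in "$dir"/*.jar; do
    # Skip the unexpanded pattern when DIR contains no jars
    [ -e "$jar" ] || continue
    if [ -z "$out" ]; then
      out=$jar
    else
      out=$out$sep$jar
    fi
  done
  printf '%s\n' "$out"
}

# Colon-separated list for the client-side classpath;
# keeps any existing value and avoids a leading ':' when it is empty
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$(join_jars lib :)"

# Comma-separated list for -libjars
LIBJARS=$(join_jars lib ,)
```

You would then submit with something like hadoop jar <jar> <class> -libjars "$LIBJARS" [args...]. Unlike the echo/sed one-liners, this also copes with an empty lib directory and with jar names that contain spaces.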
EDIT: If you need your classpath to be interpreted first to ensure your jar (and not the pre-packaged jar) is the one that gets used, you can set the following:
export HADOOP_USER_CLASSPATH_FIRST=true
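To sanity-check which protobuf jar comes first, you can split a classpath on : and look at the first matching entry. This is a rough sketch: first_match is a made-up helper, and I am assuming that hadoop classpath on your distribution reflects HADOOP_CLASSPATH and HADOOP_USER_CLASSPATH_FIRST. Entries given as wildcards (for example a lib/* directory) will not show individual jar names, so a miss here is not conclusive.

```shell
# first_match CLASSPATH PATTERN: print the first ':'-separated entry
# of CLASSPATH that contains PATTERN (hypothetical helper)
first_match() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -- "$2" | head -n 1
}

# Only query Hadoop if the hadoop command is actually on the PATH
if command -v hadoop >/dev/null 2>&1; then
  first_match "$(hadoop classpath)" protobuf
fi
```

If the entry printed is your 2.5 jar rather than the bundled 2.4 one, the client-side ordering is what you want.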