hadoop - Connection error in Apache Pig

Question

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.

Most of the simple jobs I run in Pig work fine.

However, whenever I use GROUP BY on a large dataset, or the LIMIT operator, I get the following connection errors:

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

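The strange thing is that after these errors have kept appearing for about 2 minutes, they stop, and the correct output shows up at the bottom.

So Hadoop is running fine and computes the correct output. The only problem is these connection errors that keep popping up.

The LIMIT operator always triggers this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.

One thing I noticed is that whenever this error appears, the job has created and run multiple JAR files during its execution. Still, a few minutes after these messages appear, the correct output finally shows up.

Any suggestions on how to get rid of these messages?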

score 33 · Accepted Answer

Yes, the problem is that the job history server is not running.

To fix it, all we have to do is enter the following command at the command prompt:

mr-jobhistory-daemon.sh start historyserver

This command starts the job history server. Now if we run "jps", we can see that JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.
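The check-then-start step described above can be sketched as a small shell helper. This is only a rough sketch: the function names `history_server_running` and `ensure_history_server` are made up for illustration, and it assumes `jps` and `mr-jobhistory-daemon.sh` are on the PATH of the user running it.

```shell
#!/usr/bin/env bash

# Succeeds (exit 0) if the jps listing read from stdin contains a
# JobHistoryServer process. The function name is hypothetical.
history_server_running() {
  grep -qw 'JobHistoryServer'
}

# Start the history server only when jps does not already list it.
# Assumes jps and mr-jobhistory-daemon.sh are both on the PATH.
ensure_history_server() {
  if jps | history_server_running; then
    echo "JobHistoryServer is already running"
  else
    mr-jobhistory-daemon.sh start historyserver
  fi
}

# Usage (not executed here):
#   ensure_history_server
```

Calling `ensure_history_server` before launching Pig should avoid starting a second daemon when one is already up.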

score 4 · Answer

I think this problem is related to the Hadoop mapred-site configuration. By default the history server runs on localhost, so you need to add the configured host.

<property>
 <name>mapreduce.jobhistory.address</name>
 <value>host:port</value>
</property>

Then run this command:

mr-jobhistory-daemon.sh start historyserver
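To double-check which address is actually configured before restarting, something like the following could be used. It is a minimal sketch, not an official Hadoop tool: `get_hadoop_property` is a hypothetical helper, and it assumes the `<name>` and `<value>` elements sit on adjacent lines, as in the snippet above. The location of mapred-site.xml (for example under `$HADOOP_CONF_DIR`) is also an assumption about your install.

```shell
#!/usr/bin/env bash

# Print the value of one property from a Hadoop *-site.xml file.
# Hypothetical helper: it only handles the layout where <value> is on
# the line directly after the matching <name> line.
get_hadoop_property() {
  local file="$1" name="$2"
  # Locate the <name> line, move to the next line, strip the <value> tags.
  sed -n "/<name>${name}<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$file"
}

# Usage (not executed here; the path is an assumption):
#   get_hadoop_property "$HADOOP_CONF_DIR/mapred-site.xml" mapreduce.jobhistory.address
```

If this prints nothing, the property is not set in that file and the client falls back to the default, which matches the 0.0.0.0:10020 address seen in the retry messages in the question.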

score 1 · Answer

I am using Hadoop 2.6.0, so I had to do this:

$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver

where /usr/local/hadoop/etc is my HADOOP_CONF_DIR.

score 0 · Answer

I am using Hadoop 2.2.0. The problem was caused by the history server not running, so I had to start it. I used the following command:

[root@localhost ~]$ /usr/lib/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver
