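I'm running a Hadoop cluster (version 0.20.205), and I regularly have to deploy new code to it, which involves shutting the cluster down and restarting it with the new code. My problem is that, for reasons too complicated to go into here, I can't guarantee that the JobTracker comes up before the TaskTracker nodes. I see the TaskTracker nodes try to connect to a JobTracker that isn't up yet, then shut down after printing this to the log: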

- Can not start task tracker because java.io.IOException: Call to <jobtracker node> failed on local exception: java.io.IOException: Connection reset by peer
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at org.apache.hadoop.mapred.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
at org.apache.hadoop.mapred.TaskTracker$3.run(TaskTracker.java:794)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:790)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1428)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3674)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:342)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:800)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)

- SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at <tasktracker node>
************************************************************/

My question: is there any way to configure the TaskTracker nodes to keep retrying the connection in a loop until they successfully connect to the JobTracker?

Thanks for your help!
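
For illustration only, here is one workaround that sits outside Hadoop's own configuration: a small wrapper on each TaskTracker node that polls the JobTracker's RPC port and launches the TaskTracker only once a TCP connection succeeds. This is a minimal sketch, not a Hadoop feature. The function name `wait_for_port` and its parameters are hypothetical, and the host and port would have to match whatever `mapred.job.tracker` is set to on the cluster. A successful TCP connect also does not prove the JobTracker's RPC server is fully initialized, so a short extra delay or an outer restart loop may still be needed.

```python
import socket
import time


def wait_for_port(host, port, interval=5.0, timeout=None, connect_timeout=3.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds pass first. With timeout=None it retries indefinitely.
    All names and defaults here are illustrative assumptions.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # timed out) while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper script could call `wait_for_port(jobtracker_host, jobtracker_port)` with the values from `mapred.job.tracker` and, once it returns True, start the daemon the usual way, for example with `bin/hadoop-daemon.sh start tasktracker`.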
