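I am running a Spark Streaming job in yarn-client mode that accesses HDFS frequently. Our Hadoop cluster runs hadoop-2.6.0-cdh5.7.3, which already includes the patch from JIRA HDFS-9276. Even so, after a few days (usually 7), I still get the following error: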
> 18-09-2017 10:05:48 CST crm_user_select ERROR - 17/09/18 10:05:48 WARN security.UserGroupInformation: PriviledgedActionException as:bd_recom@FHC (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) is expired
18-09-2017 10:05:48 CST crm_user_select ERROR - 17/09/18 10:05:48 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) is expired
18-09-2017 10:05:48 CST crm_user_select ERROR - 17/09/18 10:05:48 WARN security.UserGroupInformation: PriviledgedActionException as:bd_recom@FHC (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) is expired
18-09-2017 10:05:48 CST crm_user_select ERROR - 17/09/18 10:05:48 WARN hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-2053099090_1] for 30 seconds. Will retry shortly ...
18-09-2017 10:05:48 CST crm_user_select ERROR - org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) is expired
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.ipc.Client.call(Client.java:1471)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.ipc.Client.call(Client.java:1408)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
18-09-2017 10:05:48 CST crm_user_select ERROR - at com.sun.proxy.$Proxy14.renewLease(Unknown Source)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:576)
18-09-2017 10:05:48 CST crm_user_select ERROR - at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
18-09-2017 10:05:48 CST crm_user_select ERROR - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18-09-2017 10:05:48 CST crm_user_select ERROR - at java.lang.reflect.Method.invoke(Method.java:606)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
18-09-2017 10:05:48 CST crm_user_select ERROR - at com.sun.proxy.$Proxy15.renewLease(Unknown Source)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:941)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
18-09-2017 10:05:48 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
18-09-2017 10:05:48 CST crm_user_select ERROR - at java.lang.Thread.run(Thread.java:745)
A few minutes later it is followed by this error:
> 18-09-2017 10:19:35 CST crm_user_select ERROR - 17/09/18 10:19:35 WARN security.UserGroupInformation: PriviledgedActionException as:bd_recom@FHC (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) can't be found in cache
18-09-2017 10:19:35 CST crm_user_select ERROR - 17/09/18 10:19:35 WARN hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-2053099090_1] for 857 seconds. Will retry shortly ...
18-09-2017 10:19:35 CST crm_user_select ERROR - org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for bd_recom: HDFS_DELEGATION_TOKEN owner=bd_recom@FHC, renewer=yarn, realUser=, issueDate=1505095524480, maxDate=1505700324480, sequenceNumber=2244503, masterKeyId=504) can't be found in cache
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.ipc.Client.call(Client.java:1471)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.ipc.Client.call(Client.java:1408)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
18-09-2017 10:19:35 CST crm_user_select ERROR - at com.sun.proxy.$Proxy14.renewLease(Unknown Source)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:576)
18-09-2017 10:19:35 CST crm_user_select ERROR - at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
18-09-2017 10:19:35 CST crm_user_select ERROR - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
18-09-2017 10:19:35 CST crm_user_select ERROR - at java.lang.reflect.Method.invoke(Method.java:606)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
18-09-2017 10:19:35 CST crm_user_select ERROR - at com.sun.proxy.$Proxy15.renewLease(Unknown Source)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:941)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
18-09-2017 10:19:35 CST crm_user_select ERROR - at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
18-09-2017 10:19:35 CST crm_user_select ERROR - at java.lang.Thread.run(Thread.java:745)
By the way: 1. NameNode HA is enabled. 2. Kerberos is enabled. 3. An HDFS delegation token (not a keytab or TGT) is used to communicate with the NameNode.
I tried setting "--conf spark.hadoop.fs.hdfs.impl.disable.cache=true", but it did not help. Any help would be greatly appreciated!
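
For reference, the `issueDate` and `maxDate` fields printed in the token can be decoded to confirm the timing. The sketch below is only a sanity check on the logged values, not a fix. It assumes that both fields are milliseconds since the Unix epoch and that "CST" in the log timestamps means UTC+8; the helper name `ms_to_cst` is made up for this illustration. Under those assumptions, the two fields are exactly 604800000 ms (7 days) apart, and the first "is expired" message appears shortly after `maxDate` passes.

```python
from datetime import datetime, timedelta, timezone

# Values copied verbatim from the token in the log output.
ISSUE_DATE_MS = 1505095524480
MAX_DATE_MS = 1505700324480

# Assumption: "CST" in the log lines is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_cst(ms):
    """Convert epoch milliseconds to an aware datetime in UTC+8 (hypothetical helper)."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(CST)


# Maximum lifetime of the token as recorded in the token itself.
lifetime = timedelta(milliseconds=MAX_DATE_MS - ISSUE_DATE_MS)

issued_at = ms_to_cst(ISSUE_DATE_MS)
expires_at = ms_to_cst(MAX_DATE_MS)

# Timestamp of the first "is expired" log line (18-09-2017 10:05:48 CST).
first_error = datetime(2017, 9, 18, 10, 5, 48, tzinfo=CST)

print("token lifetime:      ", lifetime)
print("issued at (UTC+8):   ", issued_at.isoformat())
print("max date (UTC+8):    ", expires_at.isoformat())
print("error after max date:", first_error - expires_at)
```

So the failure lines up with the token reaching its maximum lifetime rather than with a missed periodic renewal, which matches the "mostly 7 days" pattern described above.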