I'm using TextLine in Cascading to load files that contain very large lines. The lines are long, around 30 MB on average, and some are even longer. When I run the job locally to test it, it works fine, but when I run it on the cluster it fails after a period of heavy processing. It gives an error like this:
cascading.tuple.TupleException: unable to read from input identifier: maprfs:/xxx/xxx/xxx/part-00001
at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:127)
at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:443)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
It also sometimes complains about stale file handles. The file it is trying to read is definitely there. Can anyone help me out?
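
For reference, here is a minimal sketch of roughly how the job is wired up. The paths, class name, and the pass-through pipe are illustrative (the real flow does the heavy processing mentioned above), but the source is just a plain Hfs tap with a TextLine scheme:

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextLine;
    import cascading.tap.SinkMode;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;

    public class BigLineJob {
      public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, BigLineJob.class);

        // Source: TextLine emits one tuple per input line, so every tuple
        // carries one of these very large (30 MB+) line values.
        Tap source = new Hfs(new TextLine(), "maprfs:/path/to/input");

        // Sink: illustrative output location; the real job writes elsewhere.
        Tap sink = new Hfs(new TextLine(), "maprfs:/path/to/output", SinkMode.REPLACE);

        // Pass-through pipe standing in for the actual processing assembly.
        Pipe pipe = new Pipe("big-lines");

        Flow flow = new HadoopFlowConnector(properties).connect(source, sink, pipe);
        flow.complete();
      }
    }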