I have XML files that can be split so that different mappers each get a chunk of the XML data (using a custom record reader).
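For context, tag-based XML record readers (Mahout's `XmlInputFormat` is a well-known example) typically split input this way: each split claims every record whose start tag begins inside its byte range `[start, end)`, and reads past `end` if needed to finish the last record, so no record is lost or read twice. The in-memory Java sketch below illustrates only that rule on a plain byte array. The class and method names are hypothetical and this is not the poster's `com.rhl.xmlinputformat` code. It also assumes the bytes are uncompressed XML, which is exactly what stops being true for a `.bz2` file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory sketch of tag-based XML split reading:
 * a split owns every record whose start tag begins inside [start, end)
 * and may read past end to complete its last record.
 */
class XmlSplitSketch {

    static List<String> readRecords(byte[] data, int start, int end,
                                    String startTag, String endTag) {
        byte[] open = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] close = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int pos = start;
        while (true) {
            int s = indexOf(data, open, pos);
            // No further start tag, or it begins at/after end: the next split owns it.
            if (s < 0 || s >= end) {
                break;
            }
            // The end tag may lie beyond `end`; we keep reading to finish the record.
            int e = indexOf(data, close, s + open.length);
            if (e < 0) {
                break; // truncated record at end of data: skip it
            }
            int recEnd = e + close.length;
            records.add(new String(data, s, recEnd - s, StandardCharsets.UTF_8));
            pos = recEnd;
        }
        return records;
    }

    /** Naive byte-pattern search starting at `from`; returns -1 if not found. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

Reading `[0, 10)` and then `[10, length)` of `<r>a</r><r>b</r><r>c</r>` yields `<r>a</r>` and `<r>b</r>` from the first split and `<r>c</r>` from the second.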
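Now I want to do the same with bzip2-compressed XML files.
Documentation everywhere says that bzip2 is splittable.
If it is splittable, my existing code should work without any changes. But it doesn't.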
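One point that may explain the difference: in Hadoop, "bzip2 is splittable" is a property of the codec, not something every `InputFormat` gets for free. `FileInputFormat.isSplitable` returns `true` by default regardless of compression. `TextInputFormat`, by contrast, looks up a codec from the file suffix (via `CompressionCodecFactory.getCodec`) and, in versions that have `SplittableCompressionCodec`, splits a compressed file only when its codec implements that interface. Just as importantly, a custom `RecordReader` that opens the file and seeks to the split start still sees raw compressed bytes unless it decompresses through the codec itself. The stdlib-only Java model below reproduces just the splitting decision. The codec table is an assumed subset, all names are hypothetical, and whether `BZip2Codec` in hadoop-1.2.1 implements `SplittableCompressionCodec` is worth confirming against that release.

```java
/**
 * Hypothetical, simplified model (not Hadoop source) of how a
 * FileInputFormat subclass may decide whether an input file can be split.
 */
class SplitDecisionSketch {

    /** Assumed subset of codecs, keyed by file-name suffix. */
    enum Codec {
        GZIP(".gz", false),
        BZIP2(".bz2", true),   // block-based format, so it can be split
        DEFLATE(".deflate", false);

        final String suffix;
        final boolean splittable;

        Codec(String suffix, boolean splittable) {
            this.suffix = suffix;
            this.splittable = splittable;
        }
    }

    /** Suffix-based lookup, in the spirit of CompressionCodecFactory.getCodec. */
    static Codec codecFor(String fileName) {
        for (Codec c : Codec.values()) {
            if (fileName.endsWith(c.suffix)) {
                return c;
            }
        }
        return null; // no known suffix: treat the file as uncompressed
    }

    /** FileInputFormat's default: every file is splittable, compressed or not. */
    static boolean defaultIsSplitable(String fileName) {
        return true;
    }

    /** Codec-aware rule in the style of TextInputFormat.isSplitable. */
    static boolean codecAwareIsSplitable(String fileName) {
        Codec c = codecFor(fileName);
        return c == null || c.splittable;
    }
}
```

Even when the answer is "splittable", the reader still has to open the stream through the codec (for split-capable codecs, `SplittableCompressionCodec.createInputStream(...)` takes the split start and end) instead of reading the `.bz2` bytes directly.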
The Hadoop version is hadoop-1.2.1.
I don't want to use https://github.com/whym/wikihadoop. I want to understand what is going on rather than just copy code.
Error message:
```
13/10/10 06:52:49 ERROR security.UserGroupInformation: PriviledgedActionException as:admin cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost/xmlinputformat/_xmlinputformat_sample.xml.bz2
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost/xmlinputformat/_xmlinputformat_sample.xml.bz2
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at com.rhl.xmlinputformat.XmlInputFormat.listStatus(XmlInputFormat.java:42)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at com.rhl.xmlinputformat.XmlInputFormat.getSplits(XmlInputFormat.java:56)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
	at com.rhl.xmlinputformat.XmlInputFormatDriver.run(XmlInputFormatDriver.java:252)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at com.rhl.xmlinputformat.XmlInputFormatDriver.main(XmlInputFormatDriver.java:260)
```
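One detail in the trace may be worth checking: the file name starts with an underscore (`_xmlinputformat_sample.xml.bz2`). Hadoop's `FileInputFormat.listStatus` applies a hidden-file filter that skips names beginning with `_` or `.` (this is how output markers such as `_SUCCESS` are ignored). For a literal, non-glob path that the filter rejects, the lookup can come back empty and surface as "Input path does not exist" even though the file is on HDFS. The sketch below copies only that name check into a stand-alone class with hypothetical names. It does not call Hadoop. If this is the cause, renaming the file without the leading underscore should be a quick way to confirm it.

```java
/**
 * Stand-alone copy of the name rule used by Hadoop's FileInputFormat
 * hidden-file filter: names starting with "_" or "." are skipped.
 * Class and method names are hypothetical.
 */
class HiddenFileCheck {

    /** Last path component, i.e. what Hadoop's Path.getName() would return. */
    static String lastComponent(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    /** True if FileInputFormat would treat this file as hidden. */
    static boolean isHidden(String path) {
        String name = lastComponent(path);
        return name.startsWith("_") || name.startsWith(".");
    }
}
```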