configuration - Hadoop: Map/Reduce from HDFS

Question

I may be wrong, but all(?) the examples I have seen in Apache Hadoop take files stored on the local filesystem as input (e.g. org.apache.hadoop.examples.Grep).

Is there a way to load and save data on the Hadoop filesystem (HDFS)? For example, I put a tab-delimited file named "stored.xls" on HDFS using hadoop-0.19.1/bin/hadoop dfs -put ~/local.xls stored.xls. How should I configure the JobConf to read it?

Thanks.

score 1 · Accepted Answer

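JobConf conf = new JobConf(getConf(), ...);
...
// A relative path is resolved against the default filesystem (fs.default.name)
// and the job's working directory, which on HDFS is /user/<username>
FileInputFormat.setInputPaths(conf, new Path("stored.xls"));
...
JobClient.runJob(conf);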
...

setInputPaths will do that.
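Because stored.xls is a relative path, it is not tied to the local disk: Hadoop qualifies it against the default filesystem (fs.default.name) and the working directory, which on HDFS defaults to /user/<username>. The stdlib-only sketch below mimics that resolution with java.net.URI. It is an illustration, not Hadoop's own code, and the namenode address hdfs://namenode:9000, the user name pierre and the helper qualify are all hypothetical.

```java
import java.net.URI;

public class QualifyPath {
    // Hypothetical helper that mimics how a relative input path ends up
    // fully qualified; this is NOT Hadoop's implementation.
    static URI qualify(String defaultFs, String user, String path) {
        URI fs = URI.create(defaultFs);
        // On HDFS the working directory defaults to /user/<username>/
        URI workingDir = fs.resolve("/user/" + user + "/");
        // The relative path is appended to the working directory
        return workingDir.resolve(path);
    }

    public static void main(String[] args) {
        // Assumed namenode address and user name, for illustration only
        System.out.println(qualify("hdfs://namenode:9000", "pierre", "stored.xls"));
    }
}
```

With the assumed default of hdfs://namenode:9000 this prints hdfs://namenode:9000/user/pierre/stored.xls, which is also where hadoop dfs -put ~/local.xls stored.xls places the file when HDFS is the default filesystem.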

score 1 · Accepted Answer

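Pierre, Hadoop's default configuration runs in local mode, not distributed mode. You probably just need to change a few settings in hadoop-site.xml. It looks like your default filesystem is still the local one, when it should be hdfs://youraddress:yourport. Look at the fs.default.name setting, and see the setup guides on Michael Noll's blog for more details.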
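In hadoop-site.xml the setting is a property element with name and value children inside configuration. As a quick diagnostic, the stdlib-only sketch below parses such a file's contents with javax.xml and reports which filesystem fs.default.name points to. The helper name defaultFs and the address hdfs://namenode:9000 are hypothetical; when the property is absent the sketch assumes Hadoop's built-in default of file:/// (local mode).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DefaultFsCheck {
    // Hypothetical helper: returns fs.default.name from a hadoop-site.xml
    // document, or "file:///" (local mode) if the property is not set.
    static String defaultFs(String siteXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid hadoop-site.xml document", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            if (name.equals("fs.default.name")) {
                return p.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return "file:///";
    }

    public static void main(String[] args) {
        // Assumed namenode address, for illustration only
        String site = "<configuration>"
                + "<property><name>fs.default.name</name>"
                + "<value>hdfs://namenode:9000</value></property>"
                + "</configuration>";
        String fs = defaultFs(site);
        // Relative job paths stay local unless the default uses the hdfs scheme
        System.out.println(fs + (fs.startsWith("hdfs://") ? " (HDFS)" : " (local)"));
    }
}
```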

score 1 · Accepted Answer


FileInputFormat.setInputPaths(conf, new Path("hdfs://hostname:port/user/me/stored.xls"));

This will do it.
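A fully qualified URI names the filesystem explicitly (the hdfs scheme, the namenode's host and port, then an absolute path), so it works even if fs.default.name still points at the local filesystem. The stdlib-only sketch below splits such a path into its parts with java.net.URI and shows that resolving it against a local default leaves it unchanged. The values namenode and 9000 stand in for the hostname:port placeholders above, and the helper resolveInput is hypothetical.

```java
import java.net.URI;

public class FullyQualifiedPath {
    // Hypothetical helper: resolves an input path against a default
    // filesystem URI; an absolute URI (one with a scheme) comes back as-is.
    static URI resolveInput(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        // Assumed namenode host and port, standing in for hostname:port
        URI input = URI.create("hdfs://namenode:9000/user/me/stored.xls");
        System.out.println("scheme = " + input.getScheme()); // hdfs
        System.out.println("host   = " + input.getHost());   // namenode
        System.out.println("port   = " + input.getPort());   // 9000
        System.out.println("path   = " + input.getPath());   // /user/me/stored.xls

        // Even with a local default filesystem, the hdfs:// URI is kept
        System.out.println(resolveInput("file:///home/me/", input.toString()));
    }
}
```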

answered 2009-05-14T17:02:31.047
