
I am trying to learn MapReduce programming using Python mrjob. I am getting the following error:

Traceback:

dumping stdin to local file /tmp/pyes_mrjob.testuser.20131004.103251.998597/STDIN
Making directory hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.user.20131004.103251.998597/files/ on HDFS
> /usr/lib/hadoop-mapreduce/bin/hadoop fs -mkdir hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.testuser.20131004.103251.998597/files/
Traceback (most recent call last):
  File "pyes_mrjob.py", line 34, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 236, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/local/lib/python2.7/dist-packages/mrjob/fs/hadoop.py", line 81, in invoke_hadoop
    proc = Popen(args, stdout=PIPE, stderr=PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

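I ran the command manually and it works fine there, but it fails when I run my program. Since I am just starting to learn, could someone suggest which library I should choose? According to some blogs, some libraries have good documentation and others have better performance, and so on. I came across the post below, which looks dated: http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/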

However, many of these libraries have been updated recently, so could someone suggest a library I can start with?


2 Answers


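I guess the problem is caused by the way mrjob invokes "hadoop fs -mkdir": -mkdir fails if the parent directory of the directory you want to create does not exist. This means you have to use "hadoop fs -mkdir -p [path]". In the end, you need to manually modify the mrjob library, at line 271 of [mrjob installation path]/hadoop.py (mine is /usr/lib/python2.6/site-packages/mrjob), changing:

    self.invoke_hadoop(['fs', '-mkdir', path])

to:

    self.invoke_hadoop(['fs', '-mkdir', '-p', path])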
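To make the difference between the two argument lists concrete, here is a minimal sketch. The helper name `mkdir_args` and its `create_parents` parameter are made up for illustration and are not part of mrjob. It only builds the list that would be handed to `invoke_hadoop` and does not call Hadoop itself. It also assumes your Hadoop version accepts `-p` for `fs -mkdir`, which is worth checking against `hadoop fs -help mkdir` first.

```python
def mkdir_args(path, create_parents=True):
    """Build the argument list for `hadoop fs -mkdir`.

    Hypothetical helper for illustration only; mrjob does not ship it.
    """
    args = ['fs', '-mkdir']
    if create_parents:
        # -p asks Hadoop to create missing parent directories as well
        args.append('-p')
    args.append(path)
    return args


if __name__ == '__main__':
    # The path below is shortened from the one in the question's log.
    print(mkdir_args('hdfs:///user/testuser/tmp/mrjob/files/'))
```

Keeping the flag behind a parameter makes it easy to switch off if your Hadoop version rejects it.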

Good luck!

answered 2013-11-14T10:03:14.353

It looks like you have HADOOP_HOME set to "/usr/lib/hadoop-mapreduce". That is wrong; it should be set to "/usr/lib/hadoop".
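One quick way to check this is the sketch below. It assumes that mrjob runs `$HADOOP_HOME/bin/hadoop`, which matches the command printed in the question's log. The function names `hadoop_binary` and `is_runnable` are invented for this example. `Popen` raises `OSError: [Errno 2]` when the executable it is asked to start does not exist, so a missing binary at that path could produce the traceback shown.

```python
import os


def hadoop_binary(hadoop_home):
    """Return the path assumed to be used for the hadoop launcher."""
    return os.path.join(hadoop_home, 'bin', 'hadoop')


def is_runnable(path):
    """True if path is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == '__main__':
    home = os.environ.get('HADOOP_HOME', '')
    binary = hadoop_binary(home)
    status = 'ok' if is_runnable(binary) else 'missing or not executable'
    print('%s -> %s' % (binary, status))
```

Run it in the same shell you use to launch the job, since that is the environment the job will see.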

Also, if you get an error saying that hadoop-streaming.jar cannot be found, create a symlink to this jar in "/usr/lib/hadoop", like so:

    sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop
answered 2013-11-06T14:15:51.687