I am following this guide to get mrjob working on EMR. I followed every step, but when I run the example script I get this error:

matthew@WinterMute:~/work/projects/mrjob_examples$ python word_count.py -r emr moby.txt
using configs in /etc/mrjob.conf
using existing scratch bucket mrjob-4db6342a70e021ad
using s3://mrjob-4db6342a70e021ad/tmp/ as our scratch dir on S3
creating tmp directory /tmp/word_count.matthew.20140603.181541.006786
writing master bootstrap script to /tmp/word_count.matthew.20140603.181541.006786/b.py
Copying non-input files into s3://mrjob-4db6342a70e021ad/tmp/word_count.matthew.20140603.181541.006786/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-3DCN7LULSRILW
Created new job flow j-3DCN7LULSRILW
Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid
Logs are in s3://mrjob-4db6342a70e021ad/tmp/logs/j-3DCN7LULSRILW/
Scanning S3 logs for probable cause of failure
Waiting 5.0s for S3 eventual consistency
Terminating job flow: j-3DCN7LULSRILW
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 809, in _run
    self._wait_for_job_to_complete()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 1599, in _wait_for_job_to_complete
    raise Exception(msg)
Exception: Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid

2 Answers

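Your job seems to start fine, but then mrjob cannot SSH into the master node to monitor its status. Without seeing your config file, mainly the ec2_key_pair and ec2_key_pair_file options, it is hard to say exactly which setting is wrong. Make sure you followed the guide on configuring AWS credentials. You must specify a valid key pair name (check it in the EC2 management dashboard under the "Key Pairs" section) and the path to the corresponding .pem file.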
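As a rough sketch of what those two options might look like in mrjob.conf: the key pair name `EMR` and the path `/path/to/EMR.pem` below are placeholders, not values taken from the question, so substitute the name shown in your own EC2 dashboard and the location of your own .pem file.

```yaml
runners:
    emr:
        # Placeholder: must match a key pair name listed under "Key Pairs" in the EC2 dashboard
        ec2_key_pair: EMR
        # Placeholder: local path to the private key file downloaded for that key pair
        ec2_key_pair_file: /path/to/EMR.pem
```

If the name under ec2_key_pair does not exist in the account, an error like the one in the question would be a plausible result.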

answered 2014-06-04T06:58:42.553

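I found this question while searching for the same error myself.

I managed to fix it: SSH keys are region-specific, so you need to set the region in your mrjob.conf file to the region the SSH key belongs to: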

runners:
    emr:
        aws_access_key_id: HADOOPHADOOPBOBADOOP
        aws_region: us-west-1
        aws_secret_access_key: MEMIMOMADOOPBANANAFANAFOFADOOPHADOOP

See here: https://pythonhosted.org/mrjob/guides/configs-basics.html

answered 2015-05-08T07:29:17.350