
My MapReduce program processes 20 videos, so I uploaded 20 videos to HDFS, but when I start running the MapReduce code from the terminal, it does not make any progress. When I run the command pydoop submit --upload-file-to-cache stage1.py stage1 path_directory stage1_output, it just stalls. The terminal output is as follows.

hduser@Barca-FC:/home/uday/Project/final project/algo2$ pydoop submit --upload-file-to-cache twodct.py twodct  path_directory twodct_output
16/05/30 18:19:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/30 18:19:21 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/30 18:19:22 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/05/30 18:19:22 INFO input.FileInputFormat: Total input paths to process : 1
16/05/30 18:19:22 INFO mapreduce.JobSubmitter: number of splits:1
16/05/30 18:19:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464609268645_0002
16/05/30 18:19:23 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/05/30 18:19:23 INFO impl.YarnClientImpl: Submitted application application_1464609268645_0002
16/05/30 18:19:23 INFO mapreduce.Job: The url to track the job: http://Barca-FC:8088/proxy/application_1464609268645_0002/
16/05/30 18:19:23 INFO mapreduce.Job: Running job: job_1464609268645_0002
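The job sits at this last line and never moves on. For what it's worth, these standard YARN CLI commands are what I would use to check what state the application is actually in (application_1464609268645_0002 is the id from the log above; yarn logs only returns output once log aggregation is enabled and the application has finished):

# Show the application's current state (ACCEPTED vs RUNNING) and tracking URL
yarn application -status application_1464609268645_0002

# List all applications the ResourceManager knows about
yarn application -list

# Fetch the aggregated container logs after the application ends
yarn logs -applicationId application_1464609268645_0002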

My Hadoop configuration files look like this:

mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at.  If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
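As far as I understand, mapred.job.tracker is an old MRv1 property that should be ignored once mapreduce.framework.name is set to yarn, so I left it in place. Since a job that stalls right after "Running job:" is often still waiting for containers, it seems worth checking that at least one NodeManager is registered with the ResourceManager (a quick sketch using the standard YARN CLI):

# If this prints no RUNNING nodes, the job will wait for containers forever
yarn node -list

# The same information is shown on the ResourceManager web UI:
# http://Barca-FC:8088/cluster/nodes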

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
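One thing I noticed: the log above says "Total input paths to process : 1" even though I uploaded 20 videos. To double-check what is actually in HDFS and whether the DataNode is healthy (path_directory stands for my real input path, as in the command above):

# List the files the job would actually read
hdfs dfs -ls path_directory

# Report DataNode status, capacity, and missing blocks
hdfs dfsadmin -report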

core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.  A URI whose
    scheme and authority determine the FileSystem implementation.  The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class.  The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
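I am not sure whether it matters, but this file sets both the deprecated fs.default.name (pointing at port 54310) and fs.defaultFS (pointing at port 9000), so the client and the daemons may not agree on which NameNode address to use. The effective value can be read back with hdfs getconf:

# Print the value of fs.defaultFS as the Hadoop tools actually resolve it
hdfs getconf -confKey fs.defaultFS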

Can anyone tell me why my MapReduce job is not making progress? Thanks in advance!
