
I set up a 2-node (virtual machine) Hadoop cluster. After successfully starting the dfs and mapred daemons, I ran one of the Hadoop example programs, and the job slowed down after the terminal showed:

Number of Maps = 4 Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
13/06/10 21:36:43 INFO mapred.FileInputFormat: Total input paths to process : 4
13/06/10 21:36:43 INFO mapred.FileInputFormat: Total input paths to process : 4
13/06/10 21:36:43 INFO mapred.JobClient: Running job: job_201306101254_0005
13/06/10 21:36:44 INFO mapred.JobClient:  map 0% reduce 0%
13/06/10 21:36:49 INFO mapred.JobClient:  map 75% reduce 0%
13/06/10 21:36:50 INFO mapred.JobClient:  map 100% reduce 0%

So the map tasks themselves are completing correctly. I then checked the attempt logs, in particular those of the reduce task, and confirmed that the reduce task could not read the map output produced on the other slave. The error looks like this:

13/06/11 01:55:45 WARN mapred.JobClient: Error reading task outputhttp://hadoop-desk.localdomain:50060/tasklog?plaintext=true&taskid=attempt_201306110154_0001_m_000000_0&filter=stdout
13/06/11 01:55:45 WARN mapred.JobClient: Error reading task outputhttp://hadoop-desk.localdomain:50060/tasklog?plaintext=true&taskid=attempt_201306110154_0001_m_000000_0&filter=stderr
13/06/11 01:55:49 INFO mapred.JobClient:  map 75% reduce 16%

As a result, the map task that produced this output is considered failed and is rescheduled on a different slave (the one where the reduce is running), which slows down the whole job. I suspect the cause is Ubuntu's /etc/hosts file, which is:

127.0.0.1   localhost
127.0.1.1   hadoop-desk.localdomain hadoop-desk
192.168.196.128 master
192.168.196.129 slave


# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

I got the same error even after removing this line:

127.0.0.1   localhost

Then, when I removed this line instead:

127.0.1.1   hadoop-desk.localdomain hadoop-desk

I got this error:

Number of Maps = 4 Samples per Map = 10000
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot delete /user/hadoop-user/test-mini-mr. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
    at org.apache.hadoop.dfs.FSNamesystem.deleteInternal(FSNamesystem.java:1494)
    at org.apache.hadoop.dfs.FSNamesystem.delete(FSNamesystem.java:1466)
    at org.apache.hadoop.dfs.NameNode.delete(NameNode.java:425)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

    at org.apache.hadoop.ipc.Client.call(Client.java:715)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:529)
    at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:192)
    at org.apache.hadoop.examples.PiEstimator.launch(PiEstimator.java:188)
    at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:245)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:252)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

1 Answer


Ubuntu 8 (I'm not sure about later versions) has this entry in the /etc/hosts file:

127.0.1.1   yourhostname.localdomain yourhostname

This entry is causing the problem, so comment out that line. Here is my /etc/hosts file:

127.0.0.1   localhost
#127.0.1.1  hadoop-desk.localdomain hadoop-desk
192.168.196.128 master
192.168.196.129 slave


# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

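Why the 127.0.1.1 entry breaks the shuffle can be sketched with a minimal first-match hosts-file lookup (an illustration only, not Hadoop's actual resolver code): the node's own hostname resolves to 127.0.1.1, a loopback address, so the task-output URL it advertises (the http://hadoop-desk.localdomain:50060/... URL in the warnings above) is unreachable from the other node. Commenting the line out removes the bogus mapping.

```python
# Minimal sketch of first-match hosts-file resolution (an illustration,
# not Hadoop's actual code): each line maps an IP to one or more names.
def resolve(name, hosts_text):
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if name in names:
            return ip
    return None

BROKEN = """\
127.0.0.1   localhost
127.0.1.1   hadoop-desk.localdomain hadoop-desk
192.168.196.128 master
192.168.196.129 slave
"""

# With the 127.0.1.1 entry, the node's own hostname resolves to a
# loopback address that other machines cannot reach:
print(resolve("hadoop-desk", BROKEN))  # 127.0.1.1

# Once the line is commented out, the name no longer resolves to
# loopback, so only the routable addresses in the file remain usable:
FIXED = BROKEN.replace("127.0.1.1", "#127.0.1.1")
print(resolve("hadoop-desk", FIXED))   # None
print(resolve("master", FIXED))        # 192.168.196.128
```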
After that, this error appears:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: <Cannot delete /user/hadoop-user/test-mini-mr>. Name node is in safe mode.

To avoid this error, change the dfs.safemode.threshold.pct parameter in hdfs-default.xml from 0.999f to 0.0f:

<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.00f</value>
  <description>
    Specifies the percentage of blocks that should satisfy 
    the minimal replication requirement defined by dfs.replication.min.
    Values less than or equal to 0 mean not to wait for any particular
    percentage of blocks before exiting safemode.
    Values greater than 1 will make safe mode permanent.
  </description>
</property>
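The effect of this setting can be sketched as the ratio check the property description above implies (a simplified illustration, not the real FSNamesystem code): the NameNode stays in safe mode until reported_blocks / total_blocks reaches the threshold, so a threshold of 0 means it never waits.

```python
# Simplified sketch of the NameNode safe-mode check, based on the
# dfs.safemode.threshold.pct description above (not the actual code).
def in_safe_mode(reported_blocks, total_blocks, threshold_pct):
    if threshold_pct <= 0:
        return False                      # don't wait for any blocks
    ratio = reported_blocks / total_blocks if total_blocks else 1.0
    return ratio < threshold_pct          # below threshold: stay in safe mode

# The error above: "ratio of reported blocks 0.0000 has not reached
# the threshold 0.9990" -> writes and deletes are refused.
print(in_safe_mode(0, 4, 0.999))  # True
# With the threshold set to 0.0f, safe mode is skipped entirely:
print(in_safe_mode(0, 4, 0.0))    # False
```

Alternatively, while testing you can force the NameNode out of safe mode manually with `hadoop dfsadmin -safemode leave`, without changing the configuration.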

After that I was still getting the "Error reading task output" warnings (for map output on the other slave). My master's and slave's hostnames were identical, so I changed them to "master" and "slave" respectively by editing each machine's /etc/hostname file. Since "master" and "slave" were already listed in the /etc/hosts file, the errors went away.

Answered 2013-06-14T04:38:42.900