
So far I have tried the solutions from here (1) and here (2) for this problem. However, while those solutions do run the MapReduce job, they appear to run only on the name node, because the output I get is similar to what is shown here (3).

Basically, I am running a 2-node cluster with a MapReduce algorithm that I designed myself. The MapReduce jar executes perfectly on a single-node cluster, which makes me think that something is wrong with my Hadoop multi-node configuration. To set up the multi-node cluster, I followed the tutorial here.
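For reference, with that style of setup the start scripts on the master are driven by two plain-text files, conf/masters and conf/slaves. A minimal sketch of what they would contain here (an assumption on my part, based on the Hadoop 1.x layout such tutorials use, with the hostnames from the hosts files below):

# conf/masters -- node(s) that run the SecondaryNameNode
master

# conf/slaves -- nodes that run a DataNode and a TaskTracker
master
slave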

To describe what goes wrong: when I execute my program (after checking that the namenode, jobtracker, tasktrackers, and datanodes are running on their respective nodes), it stalls in the terminal at the following line:

INFO mapred.JobClient: map 100% reduce 0%
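For completeness, the daemon check can be done with the JDK's jps tool on each node; a minimal sketch of the expected output on a Hadoop 1.x cluster (PIDs omitted; DataNode and TaskTracker appear on the master only if it doubles as a slave):

# on the master
$ jps
NameNode
SecondaryNameNode
JobTracker
DataNode
TaskTracker

# on the slave
$ jps
DataNode
TaskTracker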

If I look at the logs for the task, I see copy failed: attempt... from slave-node followed by a SocketTimeoutException.

Looking at the logs on my slave node (the DataNode machine) shows that execution stops at the following line:

TaskTracker: attempt... 0.0% reduce > copy >
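In Hadoop 1.x the reduce-side "copy" phase fetches map output over HTTP from each TaskTracker (port 50060 by default, set by mapred.task.tracker.http.address), so a SocketTimeoutException at this point usually means one node cannot reach the other's TaskTracker. A minimal connectivity check, assuming the default port:

# run from the master, then repeat in the other direction from the slave
$ ping -c 3 slave
$ telnet slave 50060    # should connect immediately; a hang here reproduces the timeout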

As the solutions in links 1 and 2 suggest, removing various IP addresses from the etc/hosts file does lead to a successful run, but then I end up with entries like those in link 4 in my slave node (DataNode) logs, for example:

INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381

WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.

As a new Hadoop user this looks suspicious to me, though it may be perfectly normal. To me it looks as though something was pointing at the wrong IP address in the hosts file, and by removing that address I merely stopped execution on the slave node, so processing carried on on the name node instead (which is not really advantageous at all).
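One way to test that hypothesis is to check what each hostname actually resolves to on both machines; a minimal sketch with standard tools (the addresses shown are what a correctly configured slave should report):

$ hostname
joseph-Home
$ getent hosts joseph-Home master slave
127.0.1.1       joseph-Home
192.168.1.87    master
192.168.1.74    slave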

To summarize:

  1. Is this the expected output?
  2. Is there a way to see, after a run, what was executed on which node? (see the note after this list)
  3. Can anyone spot what I might be doing wrong?
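Regarding question 2: in Hadoop 1.x the JobTracker web UI (port 50030 by default) breaks each job down into task attempts and shows which TaskTracker ran each one; assuming default ports, it can be reached from any machine that resolves the master's hostname:

http://master:50030/jobtracker.jsp    # drill into a job, then its map/reduce tasks, and check the Machine column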

Edit: added the hosts and configuration files for each node

Master: etc/hosts

127.0.0.1       localhost
127.0.1.1       joseph-Dell-System-XPS-L702X

#The following lines are for hadoop master/slave setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Slave: etc/hosts

127.0.0.1       localhost
127.0.1.1       joseph-Home # this line was incorrect, it was set as 7.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Master: core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri’s scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri’s authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>
</configuration>

Slave: core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hduser/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:54310</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation. The
        uri’s scheme determines the config property (fs.SCHEME.impl) naming
        the FileSystem implementation class. The uri’s authority is used to
        determine the host, port, etc. for a filesystem.</description>
    </property>

</configuration>

Master: hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>

Slave: hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Default block replication.
        The actual number of replications can be specified when the file is created.
        The default is used if replication is not specified in create time.
        </description>
    </property>
</configuration>

Master: mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If “local”, then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
</configuration>

Slave: mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

    <property>
        <name>mapred.job.tracker</name>
        <value>master:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If “local”, then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>

</configuration>

3 Answers


The error was in etc/hosts:

During the faulty runs, the slave's etc/hosts file looked like this:

127.0.0.1       localhost
7.0.1.1       joseph-Home # THIS LINE IS INCORRECT, IT SHOULD BE 127.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

As you may have spotted, this machine's ("joseph-Home") IP address was configured incorrectly: it was set to 7.0.1.1 when it should have been 127.0.1.1. Changing the second line of the slave's etc/hosts file to 127.0.1.1 joseph-Home fixed the problem, and my logs now appear normally on the slave node.

The new etc/hosts file:

127.0.0.1       localhost
127.0.1.1       joseph-Home # FIXED: this was previously 7.0.1.1

#the following lines are for hadoop mutli-node cluster setup
192.168.1.87    master
192.168.1.74    slave

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
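After the change, the slave's own hostname resolves to a loopback address again, which can be verified before restarting the daemons; a minimal check:

$ getent hosts joseph-Home
127.0.1.1       joseph-Home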
Answered 2013-09-06T12:12:43.230

A tested solution is to add the following property to hadoop-env.sh and restart all of the Hadoop cluster services:

hadoop-env.sh

export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
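A minimal way to restart everything on a Hadoop 1.x cluster, run from the master (this assumes the stock scripts in $HADOOP_HOME/bin):

$ $HADOOP_HOME/bin/stop-all.sh
$ $HADOOP_HOME/bin/start-all.sh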

Answered 2014-07-29T08:15:55.233

I ran into this problem today as well. In my case the disk on one node in the cluster was full, so Hadoop could not write its log files to the local disk. A possible way to resolve it is to delete some unused files from that local disk. Hope this helps.
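A quick way to spot this condition, with /home/hduser/tmp standing in for whatever hadoop.tmp.dir points at on the affected node:

$ df -h                                  # look for a filesystem at 100% use
$ du -sh /home/hduser/tmp/* | sort -h    # find the biggest directories under hadoop.tmp.dir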

Answered 2016-05-09T07:04:29.157