0

我已经在 Virtualbox 上的 Ubuntu 20.04 中安装了 Hadoop 3.2.1,用于我的大学学习和大学的最后期限,所以我是 Hadoop 的新手。而且我在互联网上搜索了几个资源如何在 Hadoop 上进行 mapreduce。

但是,当我在终端上键入时:

hadoop jar '/home/tamminen/WordCountTutorial/firstTutorial.jar' WordCount /WordCountTutorial/Input /WordCountTutorial/Output

格式:

hadoop jar <JAR_FILE> <CLASS_NAME> <HDFS_INPUT_DIRECTORY> <HDFS_OUTPUT_DIRECTORY>

该命令如下所示:

2020-10-11 18:59:04,584 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:05,595 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:06,598 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:07,618 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:08,619 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:09,621 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:10,624 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:11,625 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:12,627 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,629 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,632 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From 18k10018-data-mining/10.0.2.15 to localhost:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 3 failover attempts. Trying to failover after sleeping for 34444ms.

它让我无法做到 hadoop dfs -cat <HDFS_OUTPUT_DIRECTORY>*

这是我的 hadoop 配置文件,我已更改如下:

核心站点.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.groups</name>
    <value>*</value>
    </property>
</configuration> 

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

纱线站点.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

然后是 hadoop-env.sh

...
# Extra Java runtime options for all Hadoop commands. We don't support
# IPv6 yet/still, so by default the preference is set to IPv4.
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="-Xmx5096m"  --> Only this I added from searching hadoop tutorial solution beside of JAVA_HOME
# For Kerberos debugging, an extended option set logs more information
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
...

谁能解释为什么这是错误并给我解决方案我应该怎么做才能做hadoop jar?

4

1 回答 1

0

这可能是因为 Hadoop 有时会在服务器的内部 IP 地址而不是 localhost 或 127.0.0.1 上启动一些服务。您可以尝试在所有 Hadoop 配置文件中将 127.0.0.1 更改为服务器的实际 IP 地址,看看它是否有效。其他方法是以 root 身份编辑 /etc/hosts 文件并将 localhost 映射到服务器的实际 IP。

有关更精确的说明,请按照以下文章 https://hadooptutorials.info/2020/10/05/part-1-apache-hadoop-installation-on-single-node-cluster-with-google-cloud-virtual-machine/

于 2020-10-19T03:59:30.043 回答