I have the following setup:
- Windows host
- Linux guest with Docker (running in VirtualBox)

I have installed HDFS in Docker (Ubuntu, inside VirtualBox), using the bde2020 Hadoop images from Docker Hub. This is my docker-compose.yml:
version: "3"

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
    container_name: namenode
    restart: always
    ports:
      - 9870:9870
      - 9000:9000
    volumes:
      - hadoop_namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.20

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
    container_name: datanode
    restart: always
    ports:
      - 9864:9864
    volumes:
      - hadoop_datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.21

  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
    container_name: resourcemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.22

  nodemanager1:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
    container_name: nodemanager
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.23

  historyserver:
    image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
    container_name: historyserver
    restart: always
    environment:
      SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
    volumes:
      - hadoop_historyserver:/hadoop/yarn/timeline
    env_file:
      - ./hadoop.env
    networks:
      control_net:
        ipv4_address: 10.0.1.24

volumes:
  hadoop_namenode:
  hadoop_datanode:
  hadoop_historyserver:

networks:
  control_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.1.0/24
          gateway: 10.0.1.1
My hdfs-site.xml is:
<configuration>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
<property><name>dfs.permissions.enabled</name><value>false</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:///hadoop/dfs/name</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
</configuration>
If I open the namenode web UI in a browser from Linux (inside VirtualBox), I can access it.
If I browse from Windows (the host system, outside VirtualBox) to http://192.168.56.1:9870, I can also access it (I mapped this IP so that I can connect from outside VirtualBox).
The problem appears when I navigate around the web UI and try to download a file. The browser then says it cannot connect to the server dcfb0bf3b42c (the datanode container's auto-generated hostname, which only resolves inside the Docker network) and shows a URL like this in the address bar:
http://dcfb0bf3b42c:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json?op=OPEN&namenoderpcaddress=namenode:9000&offset=0
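For what it's worth, the redirect is visible outside the browser as well. Here is a short Python sketch (assuming the requests library is installed; the address and file path are the ones from my setup) that asks the namenode for the file without following the redirect, so the Location header it answers with can be inspected:

import requests

URL = ("http://192.168.56.1:9870/webhdfs/v1/tmp/datalakes/myJsonTest1/"
       "part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json")

# Ask the namenode to open the file, but do not follow its redirect,
# so we can see where it points the client.
resp = requests.get(URL, params={"op": "OPEN"}, allow_redirects=False)
print(resp.status_code)          # 307 Temporary Redirect
print(resp.headers["Location"])  # http://dcfb0bf3b42c:9864/webhdfs/v1/...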
If I replace the "dcfb0bf3b42c" part of that URL with an IP, 10.0.1.21 from Linux or 192.168.56.1 from Windows, it works fine and the file downloads.
I need to automate this step so I don't have to rewrite the IP by hand every time, because I access the HDFS data from a program (Power BI), and it fails when it tries to fetch the data because of the problem above.
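To make concrete what that automation would look like on the client side, here is a rough Python sketch (again assuming requests; DATANODE_IP is my own mapped address, and the path is the example file from above) that follows the namenode redirect manually and swaps the unresolvable hostname for a reachable IP. I cannot do anything like this inside Power BI, which is why I am asking about a configuration-level fix:

import requests
from urllib.parse import urlsplit, urlunsplit

# My address: 10.0.1.21 from Linux, 192.168.56.1 from Windows.
DATANODE_IP = "192.168.56.1"

def open_via_webhdfs(namenode_url: str) -> bytes:
    # Step 1: ask the namenode, but don't follow the redirect automatically.
    first = requests.get(namenode_url, params={"op": "OPEN"}, allow_redirects=False)
    first.raise_for_status()
    # Step 2: rebuild the redirect URL with a reachable address instead of
    # the container hostname, keeping the path and query untouched.
    parts = urlsplit(first.headers["Location"])
    fixed = urlunsplit((parts.scheme, f"{DATANODE_IP}:{parts.port or 9864}",
                        parts.path, parts.query, parts.fragment))
    # Step 3: fetch the actual file content from the datanode.
    return requests.get(fixed).content

data = open_via_webhdfs(
    "http://192.168.56.1:9870/webhdfs/v1/tmp/datalakes/myJsonTest1/"
    "part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json")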
I am new to Hadoop. Can I solve this problem by editing some configuration file?