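I am working on a project that involves Horovod, and I added extra logging code that runs during training. AFAIK, training does not run in the Spark driver or in the executors: Horovod launches its own training processes.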

Executing the code fails with the exception `Py4JNetworkError: An error occurred while trying to connect to the Java server`.

The full error stack is below:

Tue Aug 31 09:09:47 2021[0]:Traceback (most recent call last):
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
Tue Aug 31 09:09:47 2021[0]:    connection = self.deque.pop()
Tue Aug 31 09:09:47 2021[0]:IndexError: pop from an empty deque
Tue Aug 31 09:09:47 2021[0]:
Tue Aug 31 09:09:47 2021[0]:During handling of the above exception, another exception occurred:
Tue Aug 31 09:09:47 2021[0]:
Tue Aug 31 09:09:47 2021[0]:Traceback (most recent call last):
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
Tue Aug 31 09:09:47 2021[0]:    self.socket.connect((self.address, self.port))
Tue Aug 31 09:09:47 2021[0]:ConnectionRefusedError: [Errno 111] Connection refused
Tue Aug 31 09:09:47 2021[0]:
Tue Aug 31 09:09:47 2021[0]:During handling of the above exception, another exception occurred:

Traceback (most recent call last):
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/wrapper/core.py", line 23, in __init__
Tue Aug 31 09:09:47 2021[0]:    logger.logInfo("Training started")
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/wrapper/utils/logger.py", line 5, in logInfo
Tue Aug 31 09:09:47 2021[0]:    spark = SparkSession.builder.appName("TestApp").config("spark.executor.allowSparkContext", "true").getOrCreate()
Tue Aug 31 09:09:47 2021[0]:  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
Tue Aug 31 09:09:47 2021[0]:  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 384, in getOrCreate
Tue Aug 31 09:09:47 2021[0]:  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 144, in __init__
Tue Aug 31 09:09:47 2021[0]:  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 331, in _ensure_initialized
Tue Aug 31 09:09:47 2021[0]:  File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 153, in launch_gateway
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 180, in java_import
Tue Aug 31 09:09:47 2021[0]:    answer = gateway_client.send_command(command)
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1031, in send_command
Tue Aug 31 09:09:47 2021[0]:    connection = self._get_connection()
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 979, in _get_connection
Tue Aug 31 09:09:47 2021[0]:    connection = self._create_connection()
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 985, in _create_connection
Tue Aug 31 09:09:47 2021[0]:    connection.start()
Tue Aug 31 09:09:47 2021[0]:  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1127, in start
Tue Aug 31 09:09:47 2021[0]:    raise Py4JNetworkError(msg, e)
Tue Aug 31 09:09:47 2021[0]:py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:33173)

I have verified that the port is correct.
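
For comparison, here is a minimal sketch of a logging helper that does not go through Py4J at all, assuming the only purpose of `logInfo` is to emit messages from the Horovod-launched training process. In the traceback, `logInfo` calls `SparkSession.builder...getOrCreate()`, which needs a reachable JVM gateway; the version below uses only Python's standard `logging` module. The function `get_training_logger` and the logger name `wrapper.training` are hypothetical and are not part of the original code.

```python
import logging
import sys


def get_training_logger(name="wrapper.training"):
    """Return a logger that writes to stderr and needs no Spark/JVM connection.

    The default name "wrapper.training" is a hypothetical placeholder.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # attach the handler only once per process
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep messages from being emitted twice via the root logger
    return logger


def logInfo(message):
    """Hypothetical stand-in for the logInfo helper seen in the traceback."""
    get_training_logger().info(message)


if __name__ == "__main__":
    logInfo("Training started")
```

Whether this fits depends on where the log messages need to end up; it only shows that the call at `wrapper/core.py` line 23 can log without creating a `SparkSession` inside the training process.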
