Environment: JDK 1.7; CDH 5.8.0

Code:

from pyspark.ml.feature import PCA
from pyspark.mllib.linalg import Vectors
data = [(Vectors.sparse(5, [(1, 1.0), (3, 7.0)]),),
    (Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0]),),
    (Vectors.dense([4.0, 0.0, 0.0, 6.0, 7.0]),)]
df = sqlContext.createDataFrame(data,["features"])
pca = PCA(k=2, inputCol="features", outputCol="pca_features")
model = pca.fit(df)


The error stack is:

[Stage 2:>                                                          (0 + 1) / 2]/usr/java/jdk1.7.0_67-cloudera/bin/java: symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 47504)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/accumulators.py", line 235, in handle
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/pipeline.py", line 69, in fit
    num_updates = read_int(self.rfile)
      File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/serializers.py", line 545, in read_int
return self._fit(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 133, in _fit
    java_model = self._fit_java(dataset)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/ml/wrapper.py", line 130, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
    raise EOFError
EOFError
----------------------------------------
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 631, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
>>> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 224, in signal_handler
    self.cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/pyspark/context.py", line 909, in cancelAllJobs
    self._jsc.sc().cancelAllJobs()
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
  File "/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server

My understanding of the problem: the Python Spark context can't connect to the Py4J Spark context because the Py4J Java server has gone down, and that in turn is caused by

symbol lookup error: /tmp/jniloader73074               80764352992550netlib-native_system-linux-x86_64.so: undefined symbol: cblas_daxpy

As a result, the Python Spark context cannot connect to the Py4J Spark context, which shows up as: Py4J Spark context ('127.0.0.1', 47504) Connection refused

Further evidence is in the executor log, which shows:

 CoarseGrainedExecutorBackend: An unknown (executor_IP:executor_port) driver disconnected
CoarseGrainedExecutorBackend: Driver (executor_IP:executor_port) disassociated! Shutting down

This means the executors cannot connect to the Py4J Spark context either.

yarn logs -applicationId application_xxxxxxxxx_xxxxxx

Container: container_e37_1484199111776_8460_01_000001 on node_xxxxx
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:94
Log Contents:
17/02/20 11:18:05 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Container: container_e37_1484199111776_8460_01_000002 on node_xxxxx_2
LogType:stderr
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:250
Log Contents:
17/02/20 11:18:06 WARN executor.CoarseGrainedExecutorBackend: An unknown (driver IP:PORT) driver disconnected

LogType:stdout
Log Upload Time:Mon Feb 20 11:18:07 +1300 2017
LogLength:0
Log Contents:

Any idea why?

1 Answer

It looks like the root cause is incorrect packaging of the native library. The issue is documented in the netlib issue tracker: https://github.com/fommil/netlib-java/issues/66

The recommended solution is:

Try OpenBLAS or Intel's Math Kernel Library.
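Before swapping libraries, it may help to confirm which BLAS installed on a node actually exports the symbol named in the error, cblas_daxpy. The following is a minimal diagnostic sketch using only the Python standard library; the helper name exports_symbol and the candidate library names ("blas", "openblas", "mkl_rt") are assumptions for illustration, not something taken from the question or from the netlib issue.

```python
import ctypes
import ctypes.util


def exports_symbol(library, symbol):
    """Return True if the shared library can be loaded and exports `symbol`."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library is missing or cannot be loaded on this machine.
        return False
    # ctypes resolves symbols lazily on attribute access; a missing symbol
    # raises AttributeError, which hasattr() reports as False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Candidate names are assumptions; adjust to what is installed on the nodes.
    for name in ("blas", "openblas", "mkl_rt"):
        path = ctypes.util.find_library(name)
        if path is None:
            print("%s: not found" % name)
        else:
            found = exports_symbol(path, "cblas_daxpy")
            print("%s (%s): cblas_daxpy exported = %s" % (name, path, found))
```

If the library that the system resolves as "blas" reports False while OpenBLAS or MKL reports True, that would be consistent with the symbol lookup error above. The check has to be run on every node that executes Spark tasks, since each node loads its own copy of the native library.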

answered 2017-03-25T00:45:35.147