The Spark Master and Worker are both running on localhost. I started the Master and Worker nodes with this command:
sbin/start-all.sh
Master node launch log:
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/bin/java -cp /Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/conf/:/Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host 192.168.0.38 --port 7077 --webui-port 8080
Worker node launch log:
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home/jre/bin/java -cp /Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/conf/:/Users/gaurishi/spark/spark-2.3.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://192.168.0.38:7077
I have the following configuration in conf/spark-env.sh:
SPARK_MASTER_HOST=192.168.0.38
Contents of /etc/hosts:
127.0.0.1 localhost
::1 localhost
255.255.255.255 broadcasthost
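Scala code I call to establish the remote Spark connection:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Point the driver at the standalone master started above.
val sparkConf = new SparkConf()
  .setAppName(AppConstants.AppName)
  .setMaster("spark://192.168.0.38:7077")
val sparkSession = SparkSession.builder()
  .appName(AppConstants.AppName)
  .config(sparkConf)
  .enableHiveSupport()
  .getOrCreate()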
When I run the code from the IDE, I get the following exception in the console:
2018-10-04 14:43:33,426 ERROR [main] spark.SparkContext (Logging.scala:logError(91)) - Error initializing SparkContext.
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
........
Caused by: org.apache.spark.SparkException: Could not find BlockManagerMaster.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:132)
.......
2018-10-04 14:43:33,432 INFO [stop-spark-context] spark.SparkContext (Logging.scala:logInfo(54)) - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
........
Caused by: org.apache.spark.SparkException: Could not find BlockManagerMaster.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:132)
........
The log in /logs/master shows the following error:
18/10/04 14:43:13 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local class incompatible: stream classdesc serialVersionUID = 1835832137613908542, local class serialVersionUID = -1329125091869941550
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
.......
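The InvalidClassException above says that the serialVersionUID of org.apache.spark.rpc.RpcEndpointRef in the incoming stream differs from the one the master has locally, which usually means the two sides loaded that class from different builds. A minimal sketch for comparing them, assuming a hypothetical helper named ClasspathProbe (not part of Spark) that is run once on the application's classpath and once against the jars directory of the Spark installation:

```scala
import java.io.ObjectStreamClass
import scala.util.Try

// Hypothetical helper (not part of Spark): reports the serialVersionUID that
// Java serialization uses for a class and the jar the class was loaded from.
object ClasspathProbe {

  // None if the class is not on the classpath or is not Serializable.
  def serialUid(className: String): Option[Long] =
    Try(Class.forName(className)).toOption
      .flatMap(cls => Option(ObjectStreamClass.lookup(cls)))
      .map(_.getSerialVersionUID)

  // None if the class is missing or has no code source (e.g. JDK core classes).
  def loadedFrom(className: String): Option[String] =
    Try(Class.forName(className)).toOption
      .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
      .flatMap(src => Option(src.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // Defaults to the class named in the master log above.
    val name = args.headOption.getOrElse("org.apache.spark.rpc.RpcEndpointRef")
    println(s"class:            $name")
    println(s"serialVersionUID: ${serialUid(name).getOrElse("not found or not serializable")}")
    println(s"loaded from:      ${loadedFrom(name).getOrElse("unknown")}")
  }
}
```

If the two runs print different serialVersionUID values, the "loaded from" line shows which jar on each side supplies the class.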
What changes are needed to connect to Spark remotely?
Spark version:
Spark: spark-2.3.1-bin-hadoop2.7
Build dependencies:
Scala: 2.11
Spark-hive: 2.2.2
Maven-org-spark-project-hive hive-metastore = 1.x;
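The versions listed above pair a 2.3.1 installation with a 2.2.2 spark-hive artifact. One thing that may be worth trying is pinning every Spark module in the build to the same version as the cluster. The fragment below is only a sketch under assumptions: it uses sbt (the project above appears to use Maven, so the coordinates would need translating to pom.xml), and both the choice of modules and the Scala patch version are illustrative rather than taken from the original build.

```scala
// build.sbt (hypothetical; the original project appears to be Maven-based)

// Assumption: any 2.11.x release, matching the "Scala: 2.11" entry above.
scalaVersion := "2.11.12"

// Same version as the spark-2.3.1-bin-hadoop2.7 installation running the master.
val sparkVersion = "2.3.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
```

Using a single sparkVersion value keeps the driver-side jars from drifting apart from each other or from the cluster.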
Logs: