2

我正在使用 scala 和 spark cassandra 连接器在 Intellij 中开发一个测试应用程序。这是我的 build.sbt 代码:

scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark".%%("spark-sql") % "1.6.1"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M2"

我使用具有 4 个节点的 ccm 创建了 cassandra 集群。我创建的键空间具有复制因子 3。这是我在 scala 应用程序中的代码

val conf = new SparkConf()
.setMaster("local[*]")
.setAppName("SparkCassandra")
//set Cassandra host address as your local address
.set("spark.cassandra.connection.host", "127.0.0.1")

val sc = new SparkContext(conf)
val rdd = sc.cassandraTable("excelsior", "emp")
val total = rdd.count()
println(total)
println("exiting now:")
sc.stop()

但火花工作挂在下一行

CassandraConnector:与 Cassandra 集群断开连接:cluster4nodes

4 个任务中只有 3 个任务完成。这是完整的日志:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/08 14:23:26 INFO SparkContext: Running Spark version 1.6.0
16/07/08 14:23:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/08 14:23:27 WARN Utils: Your hostname, renuka-Inspiron-3542 resolves to a loopback address: 127.0.1.1; using 192.168.1.189 instead (on interface wlan0)
16/07/08 14:23:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/08 14:23:27 INFO SecurityManager: Changing view acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: Changing modify acls to: renuka
16/07/08 14:23:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(renuka); users with modify permissions: Set(renuka)
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriver' on port 41027.
16/07/08 14:23:28 INFO Slf4jLogger: Slf4jLogger started
16/07/08 14:23:28 INFO Remoting: Starting remoting
16/07/08 14:23:28 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.189:46329]
16/07/08 14:23:28 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 46329.
16/07/08 14:23:29 INFO SparkEnv: Registering MapOutputTracker
16/07/08 14:23:29 INFO SparkEnv: Registering BlockManagerMaster
16/07/08 14:23:29 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d1db2ea9-3fba-4b64-ad2b-80deddd3f05a
16/07/08 14:23:29 INFO MemoryStore: MemoryStore started with capacity 1091.3 MB
16/07/08 14:23:29 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/08 14:23:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/08 14:23:29 INFO SparkUI: Started SparkUI at http://192.168.1.189:4040
16/07/08 14:23:30 INFO Executor: Starting executor ID driver on host localhost
16/07/08 14:23:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40956.
16/07/08 14:23:30 INFO NettyBlockTransferService: Server created on 40956
16/07/08 14:23:30 INFO BlockManagerMaster: Trying to register BlockManager
16/07/08 14:23:30 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40956 with 1091.3 MB RAM, BlockManagerId(driver, localhost, 40956)
16/07/08 14:23:30 INFO BlockManagerMaster: Registered BlockManager
16/07/08 14:23:30 INFO NettyUtil: Found Netty's native epoll transport in the classpath, using it
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.2:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.2 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.3:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.3 (datacenter1)
16/07/08 14:23:31 INFO Cluster: New Cassandra host /127.0.0.4:9042 added
16/07/08 14:23:31 INFO LocalNodeFirstLoadBalancingPolicy: Added host 127.0.0.4 (datacenter1)
16/07/08 14:23:31 INFO CassandraConnector: Connected to Cassandra cluster: cluster4nodes
16/07/08 14:23:31 INFO SparkContext: Starting job: count at hello.scala:36
16/07/08 14:23:32 INFO DAGScheduler: Got job 0 (count at hello.scala:36) with 4 output partitions
16/07/08 14:23:32 INFO DAGScheduler: Final stage: ResultStage 0 (count at hello.scala:36)
16/07/08 14:23:32 INFO DAGScheduler: Parents of final stage: List()
16/07/08 14:23:32 INFO DAGScheduler: Missing parents: List()
16/07/08 14:23:32 INFO DAGScheduler: Submitting ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18), which has no missing parents
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.2 KB, free 7.2 KB)
16/07/08 14:23:32 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.7 KB, free 10.9 KB)
16/07/08 14:23:32 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40956 (size: 3.7 KB, free: 1091.2 MB)
16/07/08 14:23:32 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/07/08 14:23:32 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:18)
16/07/08 14:23:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
16/07/08 14:23:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,NODE_LOCAL, 3530 bytes)
16/07/08 14:23:32 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
16/07/08 14:23:32 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/07/08 14:23:32 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/07/08 14:23:34 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2082 bytes result sent to driver
16/07/08 14:23:34 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 2089 ms on localhost (1/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2179 ms on localhost (2/4)
16/07/08 14:23:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 2117 ms on localhost (3/4)
16/07/08 14:23:41 INFO CassandraConnector: Disconnected from Cassandra cluster: cluster4nodes

如果我在 4 个节点集群上创建了一个复制因子为 4 的键空间,则应用程序可以正常工作并且永远不会挂起。我是否缺少配置中的任何内容。提前致谢。

4

0 回答 0