
This is my first time setting up Sparkling Water on a standalone cluster running Spark 2.2. I have previously run Sparkling Water on this cluster from R (using rsparkling + sparklyr + h2o), but I am running into problems setting it up as a Spark application (in Scala).

The application is built with Maven, so I added the latest Sparkling Water dependency:

    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-core_2.11</artifactId>
        <version>2.2.2</version>
    </dependency>

The app code is as follows:

    package com.me.app

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.h2o._
    import water.Key
    import water.fvec.Frame

    object sparklingWaterH2o {

      def sparklingWaterH2o(): Unit = {

        val sparkSession = SparkSession
          .builder()
          .master("spark://cluster.address:7077")
          .appName("sparklingWaterH2o")
          .config("spark.executor.memory", "32G")
          .config("spark.executor.cores", "5")
          .config("spark.cores.max", "40")
          .config("spark.ext.h2o.nthreads", "40")
          .config("spark.jars", "/path/to/fat/jar/app-1.0-SNAPSHOT-jar-with-dependencies.jar")
          .getOrCreate()

        val h2oContext = H2OContext.getOrCreate(sparkSession)

        import h2oContext._
        import sparkSession.implicits._ // needed for toDF

        val df = Seq(
          (1, "2014/07/31 23:00:01"),
          (1, "2016/12/09 10:12:43")).toDF("id", "date")

        val h2oTrainFrame = h2oContext.asH2OFrame(df)

        println(s"h2oContext = ${h2oContext.toString()}")
      }
    }

I then compile the fat jar and submit it to the cluster, but the h2oContext is never created and the SparkContext is shut down with exit code 255. The application exits before the H2O context is created, without an error code; the only potentially useful message is `IP address not found on this machine`.

I have already tried Sparkling Water version 2.2.0 and hit the same problem. I also tried adding dependencies for sparkling-water-ml and sparkling-water-repl, as well as all of the h2o core dependencies (although I assume these are not needed, since they are already bundled into Sparkling Water?). See the log output below.
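For reference, the extra modules I tried adding would look like this in the pom (artifact ids assumed to follow the standard `ai.h2o` Sparkling Water coordinates):

```xml
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>sparkling-water-ml_2.11</artifactId>
    <version>2.2.2</version>
</dependency>
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>sparkling-water-repl_2.11</artifactId>
    <version>2.2.2</version>
</dependency>
```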

objc[39611]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/bin/java (0x10ab4b4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10bb724e0). One of the two will be used. Which one is undefined.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/username/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/username/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.6.2/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/17 10:16:01 INFO SparkContext: Running Spark version 2.2.0
17/11/17 10:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/17 10:16:02 INFO SparkContext: Submitted application: sparklingWaterH2o
17/11/17 10:16:02 INFO SecurityManager: Changing view acls to: username
17/11/17 10:16:02 INFO SecurityManager: Changing modify acls to: username
17/11/17 10:16:02 INFO SecurityManager: Changing view acls groups to: 
17/11/17 10:16:02 INFO SecurityManager: Changing modify acls groups to: 
17/11/17 10:16:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(username); groups with view permissions: Set(); users  with modify permissions: Set(username); groups with modify permissions: Set()
17/11/17 10:16:03 INFO Utils: Successfully started service 'sparkDriver' on port 53775.
17/11/17 10:16:03 INFO SparkEnv: Registering MapOutputTracker
17/11/17 10:16:03 INFO SparkEnv: Registering BlockManagerMaster
17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/17 10:16:03 INFO DiskBlockManager: Created local directory at /private/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/blockmgr-d29de5c5-9116-4abf-812c-04ca680781fe
17/11/17 10:16:03 INFO MemoryStore: MemoryStore started with capacity 1002.3 MB
17/11/17 10:16:03 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/17 10:16:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/17 10:16:03 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.103.46:4040
17/11/17 10:16:03 INFO SparkContext: Added JAR /path/to/app/target/app-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://192.168.103.46:53775/jars/app-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1510913763424
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://rnd-centos7-ben-31.nominet.org.uk:7077...
17/11/17 10:16:03 INFO TransportClientFactory: Successfully created connection to rnd-centos7-ben-31.nominet.org.uk/XXX.XXX.211.31:7077 after 26 ms (0 ms spent in bootstraps)
17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20171117101603-0031
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/0 on worker-20171013100055-XXX.XXX.211.30-33565 (XXX.XXX.211.30:33565) with 5 cores
17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/0 on hostPort XXX.XXX.211.30:33565 with 5 cores, 32.0 GB RAM
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/1 on worker-20171013100055-XXX.XXX.211.33-34424 (XXX.XXX.211.33:34424) with 5 cores
17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/1 on hostPort XXX.XXX.211.33:34424 with 5 cores, 32.0 GB RAM
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/2 on worker-20171013100055-XXX.XXX.211.31-37513 (XXX.XXX.211.31:37513) with 5 cores
17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/2 on hostPort XXX.XXX.211.31:37513 with 5 cores, 32.0 GB RAM
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20171117101603-0031/3 on worker-20171013100054-XXX.XXX.211.32-36797 (XXX.XXX.211.32:36797) with 5 cores
17/11/17 10:16:03 INFO StandaloneSchedulerBackend: Granted executor ID app-20171117101603-0031/3 on hostPort XXX.XXX.211.32:36797 with 5 cores, 32.0 GB RAM
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/2 is now RUNNING
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/1 is now RUNNING
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/3 is now RUNNING
17/11/17 10:16:03 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20171117101603-0031/0 is now RUNNING
17/11/17 10:16:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53777.
17/11/17 10:16:03 INFO NettyBlockTransferService: Server created on 192.168.103.46:53777
17/11/17 10:16:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/17 10:16:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.103.46, 53777, None)
17/11/17 10:16:03 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.103.46:53777 with 1002.3 MB RAM, BlockManagerId(driver, 192.168.103.46, 53777, None)
17/11/17 10:16:03 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.103.46, 53777, None)
17/11/17 10:16:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.103.46, 53777, None)
17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.31:46906) with ID 2
17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.30:54738) with ID 0
17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.31:45376 with 8.4 GB RAM, BlockManagerId(2, XXX.XXX.211.31, 45376, None)
17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.30:34172 with 8.4 GB RAM, BlockManagerId(0, XXX.XXX.211.30, 34172, None)
17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.32:53076) with ID 3
17/11/17 10:16:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (XXX.XXX.211.33:47478) with ID 1
17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.32:34360 with 8.4 GB RAM, BlockManagerId(3, XXX.XXX.211.32, 34360, None)
17/11/17 10:16:05 INFO BlockManagerMasterEndpoint: Registering block manager XXX.XXX.211.33:34342 with 8.4 GB RAM, BlockManagerId(1, XXX.XXX.211.33, 34342, None)
17/11/17 10:16:33 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
17/11/17 10:16:33 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/path/to/app/spark-warehouse/').
17/11/17 10:16:33 INFO SharedState: Warehouse path is 'file:/path/to/app/spark-warehouse/'.
17/11/17 10:16:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/11/17 10:16:34 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
17/11/17 10:16:34 WARN InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
We recommend to disable them by
configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/11/17 10:16:34 INFO InternalH2OBackend: Starting H2O services: Sparkling Water configuration:
  backend cluster mode : internal
  workers              : None
  cloudName            : sparkling-water-username_app-20171117101603-0031
  flatfile             : true
  clientBasePort       : 54321
  nodeBasePort         : 54321
  cloudTimeout         : 60000
  h2oNodeLog           : INFO
  h2oClientLog         : WARN
  nthreads             : 40
  drddMulFactor        : 10
17/11/17 10:16:34 INFO SparkContext: Starting job: collect at SpreadRDDBuilder.scala:105
17/11/17 10:16:34 INFO DAGScheduler: Got job 0 (collect at SpreadRDDBuilder.scala:105) with 41 output partitions
17/11/17 10:16:34 INFO DAGScheduler: Final stage: ResultStage 0 (collect at SpreadRDDBuilder.scala:105)
17/11/17 10:16:34 INFO DAGScheduler: Parents of final stage: List()
17/11/17 10:16:34 INFO DAGScheduler: Missing parents: List()
17/11/17 10:16:34 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at mapPartitionsWithIndex at SpreadRDDBuilder.scala:102), which has no missing parents
17/11/17 10:16:34 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.1 KB, free 1002.3 MB)
17/11/17 10:16:34 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1379.0 B, free 1002.3 MB)
17/11/17 10:16:34 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.103.46:53777 (size: 1379.0 B, free: 1002.3 MB)
17/11/17 10:16:34 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/11/17 10:16:34 INFO DAGScheduler: Submitting 41 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at mapPartitionsWithIndex at SpreadRDDBuilder.scala:102) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
17/11/17 10:16:34 INFO TaskSchedulerImpl: Adding task set 0.0 with 41 tasks
17/11/17 10:16:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, XXX.XXX.211.31, executor 2, partition 0, PROCESS_LOCAL, 4829 bytes)
17/11/17 10:16:34 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, XXX.XXX.211.30, executor 0, partition 1, PROCESS_LOCAL, 4829 bytes)
...
17/11/17 10:16:34 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, XXX.XXX.211.33, executor 1, partition 19, PROCESS_LOCAL, 4829 bytes)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.30:34172 (size: 1379.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.32:34360 (size: 1379.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.33:34342 (size: 1379.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on XXX.XXX.211.31:45376 (size: 1379.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added rdd_0_13 in memory on XXX.XXX.211.30:34172 (size: 32.0 B, free: 8.4 GB)
...
17/11/17 10:16:43 INFO TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 29 ms on XXX.XXX.211.33 (executor 1) (41/41)
17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/11/17 10:16:43 INFO DAGScheduler: ResultStage 0 (collect at SpreadRDDBuilder.scala:105) finished in 8.913 s
17/11/17 10:16:43 INFO DAGScheduler: Job 0 finished: collect at SpreadRDDBuilder.scala:105, took 9.072610 s
17/11/17 10:16:43 INFO ParallelCollectionRDD: Removing RDD 0 from persistence list
17/11/17 10:16:43 INFO BlockManager: Removing RDD 0
17/11/17 10:16:43 INFO SpreadRDDBuilder: Detected 4 spark executors for 4 H2O workers!
17/11/17 10:16:43 INFO InternalH2OBackend: Launching H2O on following 4 nodes: (0,XXX.XXX.211.30,-1),(1,XXX.XXX.211.33,-1),(2,XXX.XXX.211.31,-1),(3,XXX.XXX.211.32,-1)
17/11/17 10:16:43 INFO SparkContext: Starting job: collect at InternalBackendUtils.scala:163
17/11/17 10:16:43 INFO DAGScheduler: Got job 1 (collect at InternalBackendUtils.scala:163) with 4 output partitions
17/11/17 10:16:43 INFO DAGScheduler: Final stage: ResultStage 1 (collect at InternalBackendUtils.scala:163)
17/11/17 10:16:43 INFO DAGScheduler: Parents of final stage: List()
17/11/17 10:16:43 INFO DAGScheduler: Missing parents: List()
17/11/17 10:16:43 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at InternalBackendUtils.scala:100), which has no missing parents
17/11/17 10:16:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 1002.3 MB)
17/11/17 10:16:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2029.0 B, free 1002.3 MB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.103.46:53777 (size: 2029.0 B, free: 1002.3 MB)
17/11/17 10:16:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/11/17 10:16:43 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at InternalBackendUtils.scala:100) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
17/11/17 10:16:43 INFO TaskSchedulerImpl: Adding task set 1.0 with 4 tasks
17/11/17 10:16:43 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 41, XXX.XXX.211.31, executor 2, partition 2, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 42, XXX.XXX.211.30, executor 0, partition 0, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 43, XXX.XXX.211.32, executor 3, partition 3, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 44, XXX.XXX.211.33, executor 1, partition 1, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.30:34172 (size: 2029.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.31:45376 (size: 2029.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.33:34342 (size: 2029.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on XXX.XXX.211.32:34360 (size: 2029.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 42) in 349 ms on XXX.XXX.211.30 (executor 0) (1/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 41) in 358 ms on XXX.XXX.211.31 (executor 2) (2/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 43) in 394 ms on XXX.XXX.211.32 (executor 3) (3/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 44) in 408 ms on XXX.XXX.211.33 (executor 1) (4/4)
17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/11/17 10:16:43 INFO DAGScheduler: ResultStage 1 (collect at InternalBackendUtils.scala:163) finished in 0.411 s
17/11/17 10:16:43 INFO DAGScheduler: Job 1 finished: collect at InternalBackendUtils.scala:163, took 0.428038 s
17/11/17 10:16:43 INFO SparkContext: Starting job: foreach at InternalBackendUtils.scala:175
17/11/17 10:16:43 INFO DAGScheduler: Got job 2 (foreach at InternalBackendUtils.scala:175) with 4 output partitions
17/11/17 10:16:43 INFO DAGScheduler: Final stage: ResultStage 2 (foreach at InternalBackendUtils.scala:175)
17/11/17 10:16:43 INFO DAGScheduler: Parents of final stage: List()
17/11/17 10:16:43 INFO DAGScheduler: Missing parents: List()
17/11/17 10:16:43 INFO DAGScheduler: Submitting ResultStage 2 (InvokeOnNodesRDD[2] at RDD at InvokeOnNodesRDD.scala:27), which has no missing parents
17/11/17 10:16:43 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 1832.0 B, free 1002.3 MB)
17/11/17 10:16:43 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1209.0 B, free 1002.3 MB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.103.46:53777 (size: 1209.0 B, free: 1002.3 MB)
17/11/17 10:16:43 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
17/11/17 10:16:43 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 2 (InvokeOnNodesRDD[2] at RDD at InvokeOnNodesRDD.scala:27) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
17/11/17 10:16:43 INFO TaskSchedulerImpl: Adding task set 2.0 with 4 tasks
17/11/17 10:16:43 INFO TaskSetManager: Starting task 2.0 in stage 2.0 (TID 45, XXX.XXX.211.31, executor 2, partition 2, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 46, XXX.XXX.211.30, executor 0, partition 0, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 3.0 in stage 2.0 (TID 47, XXX.XXX.211.32, executor 3, partition 3, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID 48, XXX.XXX.211.33, executor 1, partition 1, NODE_LOCAL, 4821 bytes)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.31:45376 (size: 1209.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.33:34342 (size: 1209.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.32:34360 (size: 1209.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on XXX.XXX.211.30:34172 (size: 1209.0 B, free: 8.4 GB)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 46) in 28 ms on XXX.XXX.211.30 (executor 0) (1/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID 48) in 28 ms on XXX.XXX.211.33 (executor 1) (2/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 2.0 in stage 2.0 (TID 45) in 30 ms on XXX.XXX.211.31 (executor 2) (3/4)
17/11/17 10:16:43 INFO TaskSetManager: Finished task 3.0 in stage 2.0 (TID 47) in 32 ms on XXX.XXX.211.32 (executor 3) (4/4)
17/11/17 10:16:43 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
17/11/17 10:16:43 INFO DAGScheduler: ResultStage 2 (foreach at InternalBackendUtils.scala:175) finished in 0.034 s
17/11/17 10:16:43 INFO DAGScheduler: Job 2 finished: foreach at InternalBackendUtils.scala:175, took 0.043737 s
17/11/17 10:16:43 INFO InternalH2OBackend: Starting H2O client on the Spark Driver (192.168.103.46): -name sparkling-water-username_app-20171117101603-0031 -nthreads 40 -ga_opt_out -quiet -log_level WARN -log_dir /path/to/app/h2ologs/app-20171117101603-0031 -baseport 54321 -client -ip 192.168.103.46 -flatfile /var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/1510913803950-0/flatfile.txt
17/11/17 10:16:44 INFO NativeLibrary: Loaded XGBoost library from lib/osx_64/libxgboost4j.dylib (/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/libxgboost4j2584224510491657515.dylib)
Found XGBoost backend with library: xgboost4j
Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
IP address not found on this machine
17/11/17 10:16:45 INFO SparkContext: Invoking stop() from shutdown hook
17/11/17 10:16:45 INFO SparkUI: Stopped Spark web UI at http://192.168.103.46:4040
17/11/17 10:16:45 INFO StandaloneSchedulerBackend: Shutting down all executors
17/11/17 10:16:45 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/11/17 10:16:45 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/17 10:16:45 INFO MemoryStore: MemoryStore cleared
17/11/17 10:16:45 INFO BlockManager: BlockManager stopped
17/11/17 10:16:45 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/17 10:16:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/17 10:16:45 INFO SparkContext: Successfully stopped SparkContext
17/11/17 10:16:45 INFO ShutdownHookManager: Shutdown hook called
17/11/17 10:16:45 INFO ShutdownHookManager: Deleting directory /private/var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/spark-51594e29-1ea0-4a4d-9aa0-dd65ef5146dd

1 Answer


Your exception is thrown from this line of code: https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/water/init/HostnameGuesser.java#L227

because of this condition:

    if (!allowedIps.contains(addr)) {
      throw new HostnameGuessingException("IP address not found on this machine");
    }

where `addr` is the driver IP:

17/11/17 10:16:43 INFO InternalH2OBackend: Starting H2O client on the Spark Driver (192.168.103.46): -name sparkling-water-username_app-20171117101603-0031 -nthreads 40 -ga_opt_out -quiet -log_level WARN -log_dir /path/to/app/h2ologs/app-20171117101603-0031 -baseport 54321 -client -ip 192.168.103.46 -flatfile /var/folders/gl/vgw262w9227cwqvzk595rbvjygdzh8/T/1510913803950-0/flatfile.txt

`allowedIps` is computed by the `calcPrioritizedInetAddressList` function: https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/water/init/HostnameGuesser.java#L161

For some reason, `addr` is not in `allowedIps`.
It is hard to tell why from here, so I suggest you run the `calcPrioritizedInetAddressList` function yourself and try to understand the cause (it is private, but you can copy the code).
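A quick way to see which addresses H2O could possibly accept is to enumerate the machine's network interfaces yourself. This is a minimal sketch (it is not the actual `calcPrioritizedInetAddressList` logic, which also prioritizes and filters the list); run it on the driver and check whether 192.168.103.46 appears in the output:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ListLocalIps {
    // Collect every IP address bound to a local network interface.
    static List<String> localAddresses() throws SocketException {
        List<String> out = new ArrayList<>();
        Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
        while (ifaces.hasMoreElements()) {
            Enumeration<InetAddress> addrs = ifaces.nextElement().getInetAddresses();
            while (addrs.hasMoreElements()) {
                out.add(addrs.nextElement().getHostAddress());
            }
        }
        return out;
    }

    public static void main(String[] args) throws SocketException {
        // If the driver IP from the Spark log is missing here, H2O will reject it too.
        for (String ip : localAddresses()) {
            System.out.println(ip);
        }
    }
}
```

If the driver IP is indeed missing from this list (for example because the driver sits behind NAT or a VPN interface), that would explain the exception.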

Answered 2017-11-17T14:21:22.713