
See the cluster overview diagram at https://spark.apache.org/docs/latest/cluster-overview.html.


The Spark cluster runs outside Kubernetes, but I want to run the driver program inside Kubernetes. The question is how to make the Spark cluster aware of the driver's location.

My Kubernetes YAML file:

kind: List
apiVersion: v1
items:
- kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    name: counter-uat
  spec:
    replicas: 1
    selector:
      matchLabels:
        name: spark-driver
    template:
      metadata:
        labels:
          name: spark-driver
      spec:
        containers:
          - name: counter-uat
            image: counter:0.1.0
            command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]
- kind: Service
  apiVersion: v1
  metadata:
    name: spark-driver
    labels:
      name: spark-driver
  spec:
    type: NodePort
    ports:
    - name: port
      port: 4040
      targetPort: 4040
    selector:
      name: spark-driver

The error is:

Caused by: java.io.IOException: Failed to connect to /172.17.0.8:44117
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: /172.17.0.8:44117

The Spark cluster is trying to reach the driver at 172.17.0.8, which is probably an internal pod IP inside Kubernetes.

How can I solve this? How should I fix my YAML file? Thanks.

Update

I added the following two parameters: "--conf", "spark.driver.bindAddress=192.168.42.8", "--conf", "spark.driver.host=0.0.0.0".

But according to the logs, it still tries to reach 172.17.0.8, the internal pod IP inside Kubernetes.

Update

kind: List
apiVersion: v1
items:
- kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    name: counter-uat
  spec:
    replicas: 1
    selector:
      matchLabels:
        name: counter-driver
    template:
      metadata:
        labels:
          name: counter-driver
      spec:
        containers:
          - name: counter-uat
            image: counter:0.1.0
            command: ["/opt/spark/bin/spark-submit", "--class", "Counter", "--master", "spark://spark.uat:7077", "--conf", "spark.driver.bindAddress=192.168.42.8","/usr/src/counter.jar"]

- kind: Service
  apiVersion: v1
  metadata:
    name: counter-driver
    labels:
      name: counter-driver
  spec:
    type: NodePort
    ports:
    - name: driverport
      port: 42761
      targetPort: 42761
      nodePort: 30002
    selector:
      name: counter-driver

Another error:

2017-06-23T20:00:07.487656154Z Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 31319)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.

1 Answer


Try setting spark.driver.host or spark.driver.bindAddress in Spark itself to "spark.uat", "spark-driver.uat", or the actual driver host. This is a common problem in distributed projects like this, where the master tells the clients where to connect. If you don't specify spark.driver.host, Spark tries to figure out the right host by itself and just uses the IP it sees. But in this case, the IP it sees is an internal Kubernetes IP, which may not work for the clients.
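As a minimal sketch of that advice (assuming 192.168.42.8 is the Kubernetes node's address and is routable from the Spark workers; executors connect back to spark.driver.host:spark.driver.port, so the advertised pair must be reachable as-is), the spark-submit command in the Deployment could look like this:

        containers:
          - name: counter-uat
            image: counter:0.1.0
            command: ["/opt/spark/bin/spark-submit",
                      "--class", "Counter",
                      "--master", "spark://spark.uat:7077",
                      # Bind on all interfaces inside the pod; trying to bind a node
                      # IP the pod does not own is what produces the BindException.
                      "--conf", "spark.driver.bindAddress=0.0.0.0",
                      # Advertise an address routable from the external cluster
                      # (assumed here to be the node IP 192.168.42.8).
                      "--conf", "spark.driver.host=192.168.42.8",
                      # Pin the driver port to the Service's targetPort instead of
                      # letting Spark pick a random ephemeral port.
                      "--conf", "spark.driver.port=42761",
                      "/usr/src/counter.jar"]

Note that with a plain NodePort service the node-level port (30002) differs from the advertised driver port (42761), so this only lines up if the pod shares the node's network (hostNetwork: true) or the mapping preserves the port number.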

You could also try setting the SPARK_PUBLIC_DNS environment variable. It actually has a more promising description:

Hostname your Spark program will advertise to other machines.
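In the Deployment that would be an env entry on the driver container; a sketch, with "spark-driver.uat" as an assumed example of a name that machines outside Kubernetes can resolve:

        containers:
          - name: counter-uat
            image: counter:0.1.0
            env:
              # Assumed example value: replace with a DNS name or IP that the
              # external Spark workers can actually resolve and reach.
              - name: SPARK_PUBLIC_DNS
                value: "spark-driver.uat"
            command: ["/opt/spark/bin/spark-submit", "--class", "Counter",
                      "--master", "spark://spark.uat:7077", "/usr/src/counter.jar"]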

Answered 2017-06-23T16:09:24.643