1

注意:将密钥烘焙到映像中是您能做的最糟糕的事情,我在这里这样做是为了在调试时在 Docker 和 Kubernetes 之间建立一个二进制相等的文件系统。

我正在尝试启动一个在 GCS 中保持其状态的 flink-jobmanager,所以我在我的中添加了high-availability.storageDir: gs://BUCKET/ha一行,flink-conf.yaml并且我正在构建我的 Dockerfile,如此处所述

这是我的 Dockerfile:

FROM flink:1.5-hadoop28

ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar /opt/flink/lib/gcs-connector-latest-hadoop2.jar

RUN mkdir /opt/flink/etc-hadoop
COPY flink-conf.yaml /opt/flink/conf/flink-conf.yaml
COPY key.json /opt/flink/etc-hadoop/key.json
COPY core-site.xml /opt/flink/etc-hadoop/core-site.xml

现在,如果我通过构建这个容器docker build -t flink:dev .并在其中启动一个交互式 shell docker run -ti flink:dev /bin/bash,我可以通过以下方式启动 flink jobmanager:

flink-console.sh jobmanager --configDir=/opt/flink/conf/ --executionMode=cluster

Flink 正在拿起罐子并正常启动。但是,当我使用以下 yaml 在 Kubernetes 上启动它时,基于此处

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:dev
        imagePullPolicy: Always
        resources:
          requests:
            memory: "1024Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob
        - containerPort: 6125
          name: query
        - containerPort: 8081
          name: ui
        - containerPort: 46110
          name: ha
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /opt/flink/etc-hadoop/key.json
        - name: JOB_MANAGER_RPC_ADDRESS
          value: flink-jobmanager

Flink 似乎无法注册文件系统:

2018-10-04 09:20:51,357 DEBUG org.apache.flink.runtime.util.HadoopUtils                     - Cannot find hdfs-default configuration-file path in Flink config.
2018-10-04 09:20:51,358 DEBUG org.apache.flink.runtime.util.HadoopUtils                     - Cannot find hdfs-site configuration-file path in Flink config.
2018-10-04 09:20:51,359 DEBUG org.apache.flink.runtime.util.HadoopUtils                     - Adding /opt/flink/etc-hadoop//core-site.xml to hadoop configuration
2018-10-04 09:20:51,767 DEBUG org.apache.hadoop.security.UserGroupInformation               - PrivilegedActionException as:flink (auth:SIMPLE) cause:java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
2018-10-04 09:20:51,767 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Cluster initialization failed.
java.io.IOException: Could not create FileSystem for highly available storage (high-availability.storageDir)
        at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:122)
        at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:95)
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:115)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:402)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:270)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:225)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:189)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:188)
        at org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.main(StandaloneSessionClusterEntrypoint.java:91)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
        at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:119)
        ... 12 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop File System abstraction does not support scheme 'gs'. Either no file system implementation exists for that scheme, or the relevant classes are missing from the classpath.
        at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:102)
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
        ... 15 more
Caused by: java.io.IOException: No FileSystem for scheme: gs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
        at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
        ... 16 more

由于 Kubernetes 应该使用相同的图像,我很困惑这怎么可能。我在这里监督什么吗?

4

1 回答 1

1

问题在于使用dev标签。使用特定的版本标签解决了这个问题。

于 2018-10-04T12:06:08.167 回答