
I'm in the early stages of exploring Argo with the Spark operator, to run the Spark examples on a minikube setup hosted on my EC2 instance.

Below are the resource details; I'm not sure why the Spark application logs are not visible.

workflow.yaml

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: spark-argo-groupby
spec:
  entrypoint: sparkling-operator
  templates:
  - name: spark-groupby
    resource:
      action: create
      manifest: |
        apiVersion: "sparkoperator.k8s.io/v1beta2"
        kind: SparkApplication
        metadata:
          generateName: spark-argo-groupby
        spec:
          type: Scala
          mode: cluster
          image: gcr.io/spark-operator/spark:v3.0.3
          imagePullPolicy: Always
          mainClass: org.apache.spark.examples.GroupByTest
          mainApplicationFile:  local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar
          sparkVersion: "3.0.3"
          driver:
            cores: 1
            coreLimit: "1200m"
            memory: "512m"
            labels:
              version: 3.0.0
          executor:
            cores: 1
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
  - name: sparkling-operator
    dag:
      tasks:
      - name: SparkGroupBY
        template: spark-groupby

Roles

# Role for spark-on-k8s-operator to create resources on cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
    verbs:
      - '*'
---
# Allow airflow-worker service account access for spark-on-k8s
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argo-spark-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: spark-cluster-cr
subjects:
  - kind: ServiceAccount
    name: default
    namespace: argo

Argo UI

Workflow Status

Workflow Logs

To dig deeper, I tried all the steps listed in https://dev.to/crenshaw_dev/how-to-debug-an-argo-workflow-31ng, but I still couldn't get the application logs.

Basically, when I run these examples I expect the Spark application logs to be printed; in this case, the output of the following Scala example:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala

Interestingly, when I list the pods I expect to see a driver pod and an executor pod, but I always see only a single pod, and it has the logs shown above (see the attached screenshots). Please help me understand why the logs are not being produced and how I can get them.

RAW LOGS
$ kubectl logs spark-pi-dag-739246604 -n argo

time="2021-12-10T13:28:09.560Z" level=info msg="Starting Workflow Executor" version="{v3.0.3 2021-05-11T21:14:20Z 02071057c082cf295ab8da68f1b2027ff8762b5a v3.0.3 clean go1.15.7 gc linux/amd64}"
time="2021-12-10T13:28:09.581Z" level=info msg="Creating a docker executor"
time="2021-12-10T13:28:09.581Z" level=info msg="Executor (version: v3.0.3, build_date: 2021-05-11T21:14:20Z) initialized (pod: argo/spark-pi-dag-739246604) with template:\n{\"name\":\"sparkpi\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: \\\"sparkoperator.k8s.io/v1beta2\\\"\\nkind: SparkApplication\\nmetadata:\\n  generateName: spark-pi-dag\\nspec:\\n  type: Scala\\n  mode: cluster\\n  image: gjeevanm/spark:v3.1.1\\n  imagePullPolicy: Always\\n  mainClass: org.apache.spark.examples.SparkPi\\n  mainApplicationFile: local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar\\n  sparkVersion: 3.1.1\\n  driver:\\n    cores: 1\\n    coreLimit: \\\"1200m\\\"\\n    memory: \\\"512m\\\"\\n    labels:\\n      version: 3.0.0\\n  executor:\\n    cores: 1\\n    instances: 1\\n    memory: \\\"512m\\\"\\n    labels:\\n      version: 3.0.0\\n\"},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"my-bucket\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"secretkey\"},\"key\":\"spark-pi-dag/spark-pi-dag-739246604\"}}}"
time="2021-12-10T13:28:09.581Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2021-12-10T13:28:09.581Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2021-12-10T13:28:10.348Z" level=info msg=argo/SparkApplication.sparkoperator.k8s.io/spark-pi-daghhl6s
time="2021-12-10T13:28:10.348Z" level=info msg="Starting SIGUSR2 signal monitor"
time="2021-12-10T13:28:10.348Z" level=info msg="No output parameters"


2 Answers


As Michael mentioned in his answer, Argo Workflows has no idea how other CRDs (such as the SparkApplication you are using) work, so it cannot pull the logs from the pods created by that particular CRD.

However, you can add the label workflows.argoproj.io/workflow: {{workflow.name}} to the pods generated by the SparkApplication to let Argo Workflows know about them, and then use argo logs -c <container-name> to pull the logs from those pods.

You can find an example here (it uses a Kubeflow CRD, but in your case you need to add the labels to the driver and executor sections of the SparkApplication CRD in your resource template): https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-resource-log-selector.yaml
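
A minimal sketch of how the driver and executor labels could look inside the manifest of your resource template (the {{workflow.name}} substitution follows the linked example, and the spark-kubernetes-driver container name is an assumption based on Spark-on-Kubernetes defaults, so verify both against your operator version):

          driver:
            cores: 1
            coreLimit: "1200m"
            memory: "512m"
            labels:
              version: 3.0.0
              # lets Argo Workflows discover the driver pod; Argo substitutes the workflow name
              workflows.argoproj.io/workflow: "{{workflow.name}}"
          executor:
            cores: 1
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
              # same label on the executor pods
              workflows.argoproj.io/workflow: "{{workflow.name}}"

With the labels in place, the driver logs should be reachable from the CLI, for example:

$ argo logs spark-argo-groupby -n argo -c spark-kubernetes-driver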

Answered 2021-12-10T15:33:03.763

Argo Workflows' resource templates (like your spark-groupby template) are fairly simple. The workflow controller runs kubectl create, and that is the extent of its involvement with the SparkApplication.

The logs you are seeing from the Argo Workflow pod describe that kubectl create process: your resource is written to a temporary YAML file and then applied to the cluster.

time="2021-12-10T13:28:09.581Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2021-12-10T13:28:09.581Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2021-12-10T13:28:10.348Z" level=info msg=argo/SparkApplication.sparkoperator.k8s.io/spark-pi-daghhl6s

Old answer:

To view the logs produced by the SparkApplication, you will need to follow the Spark documentation. I'm not familiar with it, but I'd guess the application runs in some pod; if you can find that pod, you should be able to read its output with kubectl logs.
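
A rough sketch of that approach, assuming the Spark operator applies the usual spark-role labels to the pods it creates (worth verifying on your cluster):

# List the SparkApplication objects the workflow created
$ kubectl get sparkapplications -n argo

# Find the driver pod; Spark on Kubernetes labels it with spark-role=driver
$ kubectl get pods -n argo -l spark-role=driver

# Read the application output (the GroupByTest results) from the driver pod
$ kubectl logs <driver-pod-name> -n argo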

It would be really cool if Argo Workflows could pull the Spark logs into its UI, but building a generic solution for that would probably be quite difficult.

Update:

Check out Yuan's answer. There is a way to pull the Spark logs into the Workflows CLI!

Answered 2021-12-10T15:04:05.440