google-cloud-platform - GCP Workflows exceeds memory limit

Question

I am trying to use Google Cloud Workflows to run SQL queries against a BigQuery dataset. My pipeline runs several consecutive queries, mostly of the form:

create or replace table project_x.dataset_y.table_z as 
     select * from project_x.dataset_y.view_z

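where the view for query n reads from the table produced by query n-1. To handle this dependency, I used the code at the end of this question. The problem is that my workflow returns an error even with a single query: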

{"message":"ResourceLimitError: Memory usage limit exceeded","tags":["ResourceLimitError"]}

In the console, the query completes in less than 1 minute. Any idea why my workflow uses more memory than it should? What is the best way to fix it?

Workflow code:

main:
  steps:
    - initialize:
        assign:
          - project: "project_x"
          - dataset: "dataset_y"
    - query_n:
        call: BQJobsQuery
        args:
          project: ${project}
          sqlQuery: ${"Create or Replace table `dataset_y.table_z`
                          as select * from `dataset_y.view_z`;"}
        result: bq_response
    - get_job_status:
        call: getJobFinalStatus
        args:
          project: ${project}
          job_id: ${bq_response.jobReference.jobId}
        result: job_status_response

    - returned_result:
        return: ${job_status_response}

BQJobsQuery:
  params: [project, sqlQuery]
  steps:
    - runQuery:
        try:
          call: http.post
          args:
            url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/queries"}
            auth:
              type: OAuth2
            body:
              useLegacySql: false
              query: ${sqlQuery}
          result: queryResult
        except:
          as: e
          steps:
            - UnhandledException:
                raise: ${e}
    - queryCompleted:
        return: ${queryResult.body}

getJobFinalStatus:
    params: [project, job_id]
    steps:
      - sleep:
          call: sys.sleep
          args:
            seconds: 5
      - getJobCurrentStatus:
          call: http.get
          args:
            url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs/"+job_id}
            auth:
              type: OAuth2
          result: jobStatusRes
      - isJobFinished:
          switch:
            - condition: ${jobStatusRes.body.status.state != "DONE"}
              next: sleep
      - jobFinished:
          return: ${jobStatusRes.body}

score 1 · Accepted Answer

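The total size of all variables in a workflow is limited to 64 KB. I don't know how large your BigQuery query response is, but I suspect you are exceeding that limit.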
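One way to stay under the variable size limit is to ask the API for less data in the first place. The following is a minimal sketch, not a confirmed fix: it assumes the standard Google API `fields` query parameter (partial response) is applied by the BigQuery `jobs.get` endpoint, and that only the `status` object is needed afterwards. The step names mirror the question's `getJobFinalStatus` subworkflow; the `query` block and the field mask are additions for illustration.

```yaml
getJobFinalStatus:
  params: [project, job_id]
  steps:
    - sleep:
        call: sys.sleep
        args:
          seconds: 5
    - getJobCurrentStatus:
        call: http.get
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs/"+job_id}
          # Assumption: request only the status object so the stored
          # response stays well below the variable size limit.
          query:
            fields: status
          auth:
            type: OAuth2
        result: jobStatusRes
    - isJobFinished:
        switch:
          # Keep polling until BigQuery reports the job as DONE.
          - condition: ${jobStatusRes.body.status.state != "DONE"}
            next: sleep
    - jobFinished:
        # Return only the small status object rather than the whole job resource.
        return: ${jobStatusRes.body.status}
```

With this shape, the caller receives just the job status (state and any error details) instead of the full job configuration and statistics.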

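There is already an open issue to raise the memory limit for variables in GCP Workflows. You can follow its progress on this issue tracker. It should be released soon, so stay tuned!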

score 1 · Accepted Answer

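Note that every time a value is assigned to `result`, it counts toward the total size of the variables, so if you don't need the result later, you can omit it. You can also assign null to variables you no longer need, which frees up space for other variables.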
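The advice above could be sketched as follows, under the assumption that later steps only need the job ID from the query response. The `keepOnlyJobId` step and the `job_id` variable are hypothetical names introduced here; the rest follows the question's `BQJobsQuery` subworkflow.

```yaml
BQJobsQuery:
  params: [project, sqlQuery]
  steps:
    - runQuery:
        call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/queries"}
          auth:
            type: OAuth2
          body:
            useLegacySql: false
            query: ${sqlQuery}
        result: queryResult
    - keepOnlyJobId:
        assign:
          # Copy the one value that later steps need...
          - job_id: ${queryResult.body.jobReference.jobId}
          # ...then release the full response so it no longer counts
          # toward the total variable size.
          - queryResult: null
    - queryCompleted:
        return: ${job_id}
```

Because this version returns the job ID directly, the calling step would pass `${bq_response}` as `job_id` rather than `${bq_response.jobReference.jobId}`. Clearing a variable only helps after the assignment has succeeded; if a single response is itself larger than the limit, the response has to be made smaller at the source.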
