I'm trying to use Google Cloud Workflows to run SQL queries against a BigQuery dataset. My pipeline runs several consecutive queries, mostly of the form:

    create or replace table project_x.dataset_y.table_z as
    select * from project_x.dataset_y.view_z

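Here the view used by query n reads the result table written by query n-1. To handle this dependency, I use the code at the end of this question. The problem is that my workflow returns an error even when it runs only a single query: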

    {"message":"ResourceLimitError: Memory usage limit exceeded","tags":["ResourceLimitError"]}

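In the BigQuery console, the same query finishes in under a minute. Any idea why my workflow uses more memory than it should? What is the best way to fix it?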
Workflow code:

    main:
      steps:
        - initialize:
            assign:
              - project: "project_x"
              - dataset: "dataset_y"
        - query_n:
            call: BQJobsQuery
            args:
              project: ${project}
              sqlQuery: ${"Create or Replace table `dataset_y.table_z` as select * from `dataset_y.view_z`;"}
            result: bq_response
        - get_job_status:
            call: getJobFinalStatus
            args:
              project: ${project}
              job_id: ${bq_response.jobReference.jobId}
            result: job_status_response
        - returned_result:
            return: ${job_status_response}

    # Starts the query via jobs.query and returns the response body (contains jobReference).
    BQJobsQuery:
      params: [project, sqlQuery]
      steps:
        - runQuery:
            try:
              call: http.post
              args:
                url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/queries"}
                auth:
                  type: OAuth2
                body:
                  useLegacySql: false
                  query: ${sqlQuery}
              result: queryResult
            except:
              as: e
              steps:
                - UnhandledException:
                    raise: ${e}
        - queryCompleted:
            return: ${queryResult.body}

    # Polls jobs.get every 5 seconds until the job has finished.
    getJobFinalStatus:
      params: [project, job_id]
      steps:
        - sleep:
            call: sys.sleep
            args:
              seconds: 5
        - getJobCurrentStatus:
            call: http.get
            args:
              url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/"+project+"/jobs/"+job_id}
              auth:
                type: OAuth2
            result: jobStatusRes
        - isJobFinished:
            switch:
              # status.state can be PENDING, RUNNING or DONE; keep polling until DONE.
              - condition: ${jobStatusRes.body.status.state != "DONE"}
                next: sleep
        - jobFinished:
            return: ${jobStatusRes.body}
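For the multi-query case described at the top (query n reads the table written by query n-1), the same two subworkflows could be driven by a Workflows `for` loop, so that each statement is started and then waited on before the next one begins. This is a minimal sketch, assuming the `BQJobsQuery` and `getJobFinalStatus` subworkflows above are defined in the same file; the `queries` list and the `table_1`/`view_1`/`table_2`/`view_2` names are hypothetical placeholders.

```yaml
main:
  steps:
    - initialize:
        assign:
          - project: "project_x"
          # Hypothetical statements: view_2 is assumed to read table_1.
          - queries:
              - "create or replace table `dataset_y.table_1` as select * from `dataset_y.view_1`"
              - "create or replace table `dataset_y.table_2` as select * from `dataset_y.view_2`"
    - run_queries:
        for:
          value: sql
          in: ${queries}
          steps:
            # Start the query (subworkflow from the question).
            - start_query:
                call: BQJobsQuery
                args:
                  project: ${project}
                  sqlQuery: ${sql}
                result: bq_response
            # Block until the job is DONE before moving to the next statement.
            # Note: a failed job also ends in state DONE, with status.errorResult set.
            - wait_for_job:
                call: getJobFinalStatus
                args:
                  project: ${project}
                  job_id: ${bq_response.jobReference.jobId}
                result: job_status_response
    - finished:
        return: "all queries finished"
```

Because the loop body runs sequentially, the ordering dependency between the views is preserved without writing one step per query.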