I am trying to load a BigQuery external table through the bq command line. The bq load command I ran:

bq load --source_format=NEWLINE_DELIMITED_JSON {provided dataset_name}.{provided bq external_table_name} gs://{provided bucket_name}/{provided folder_name}/{provided folder_name}/{provided folder_name}/20220107/*

The error I get is:

Error processing job '*:bqjob_r6bde3e8976b407bd_0000017e4295db78_1': bq_project_name:bq_dataset_name.bq_external_table_name is not allowed for this operation because it is currently external.

Has anyone run into this error? I couldn't find any flag in Google's bq load documentation that tells bq the target is an external table. Any insight would really help. I also tried loading the external table with GoogleCloudStorageToBigQueryOperator with external_table=True, but that also fails with "'BigQuery job failed. Error was: {}'.format(err.content)":
Exception: BigQuery job failed. Error was: b'{\n "error": {\n "code": 409,\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "errors": [\n {\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "domain": "global",\n "reason": "duplicate"\n }\n ],\n "status": "ALREADY_EXISTS"\n }\n}\n
[2022-01-09 17:10:20,995] {base_task_runner.py:113} INFO - Job 230862: Subtask {subtask_name} [2022-01-09 17:10:20,993] {taskinstance.py:1147} ERROR - BigQuery job failed. Error was: b'{\n "error": {\n "code": 409,\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "errors": [\n {\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "domain": "global",\n "reason": "duplicate"\n }\n ],\n "status": "ALREADY_EXISTS"\n }\n}\n'
"
This error also threw me off, because I had already created the external table through Terraform using the code block below:
resource "google_bigquery_table" "external_table_name" {
  project             = local.project
  dataset_id          = google_bigquery_dataset.{provided_dataset_name}.dataset_id
  table_id            = local.{variable defined for Bq external table}
  schema              = file("${path.module}/../../../schema/{folder which holds schema json}/schemajsonforexternaltable.json")
  depends_on          = [google_bigquery_dataset.{provided_dataset_name}]
  deletion_protection = false

  external_data_configuration {
    autodetect    = false
    source_format = "NEWLINE_DELIMITED_JSON"
    source_uris = [
      "gs://{bucket_name}-${var.environment}/{folder_name}/{folder_name}/{folder_name}/*"
    ]
  }
}
So why am I doing all this, and what is my end goal? I want to retrieve the source file name, as in the query below; Google exposes it on external tables as a pseudo column (_FILE_NAME):
SELECT
  p_num,
  _FILE_NAME AS file_loc /* use this column to know the file used to build the row in the BQ external table */
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;
If there is any alternative to a BigQuery external table for getting the file name used to build each row, that's also fine; I'm happy to switch to that approach.
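On that note, one alternative I could imagine (my own sketch, not something from the BigQuery docs): if I load into a native table instead of an external one, I could inject the source file name into each row myself by rewriting the newline-delimited JSON before loading. A minimal local sketch, with the GCS download and BigQuery load calls omitted and all names hypothetical:

```python
import json

def tag_rows_with_filename(ndjson_text: str, file_loc: str) -> str:
    """Add a file_loc field to every NDJSON row, mimicking _FILE_NAME."""
    tagged = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the export
        row = json.loads(line)
        row["file_loc"] = file_loc  # the GCS object this row came from
        tagged.append(json.dumps(row))
    return "\n".join(tagged)

# Hypothetical example: two rows from one object in the bucket.
tagged = tag_rows_with_filename(
    '{"p_num": 1}\n{"p_num": 2}',
    "gs://bucket/folder/20220107/part-000.json",  # hypothetical object path
)
```

Each output row then carries file_loc alongside the original columns, so a plain bq load into a native table would preserve the same information the _FILE_NAME pseudo column gives.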
@MikeKarp - There are two questions in my post above. The first: loading a BigQuery external table with the bq load command failed; from that attempt, my question is whether it is even possible to load a BigQuery external table with bq load. The second: loading the external table created through Terraform (which supplies the source URI paths the external table needs) using GoogleCloudStorageToBigQueryOperator with external_table=True failed with "code": 409, "message": "Already Exists: Table". On the second one, I'm not sure why GoogleCloudStorageToBigQueryOperator tries to create the table again when the external table has already been created in my GCP project through Terraform.