
I am using code similar to the one below to extract zipped files from BigQuery to GCS. Sometimes I need to extract around 90 files, and I would like to extract them as a single compressed folder rather than sending the files one by one.
Note: I am working in Jupyter.
Thanks for your help.

from google.cloud import bigquery

client = bigquery.Client()

# Source table to export.
project_id = 'fh-bigquery'
dataset_id = 'public_dump'
table_id = 'afinn_en_165'

# Destination bucket and object name in GCS.
bucket_name = 'your_bucket'
destination_uri = 'gs://{}/{}'.format(bucket_name, 'file.csv.gz')

dataset_ref = client.dataset(dataset_id, project=project_id)
table_ref = dataset_ref.table(table_id)

# Configure the extract job to write a gzip-compressed CSV.
job_config = bigquery.job.ExtractJobConfig()
job_config.compression = 'GZIP'

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # Wait for the export to complete.

1 Answer


I believe it is not possible to extract an entire dataset with a single API request. To export each of the corresponding tables into a Google Cloud Storage bucket, I would iterate over the dataset once, taking the tableID of each table, with the following code:

from google.cloud import bigquery
from google.oauth2 import service_account

# Authenticate with a service account key file.
key_path = "SERVICE_ACCOUNT_PATH"
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

project_id = 'PROJECT_ID'
dataset_id = 'DATASET_ID'
bucket_name = 'BUCKET_NAME'

client = bigquery.Client(credentials=credentials, project=project_id)

dataset_ref = client.dataset(dataset_id, project=project_id)

# Export every table in the dataset to its own gzip-compressed CSV in GCS.
for t in client.list_tables(dataset_ref):

    print("Extracting table {}".format(t.table_id))

    gzip_file = '{}.csv.gz'.format(t.table_id)
    destination_uri = 'gs://{}/{}'.format(bucket_name, gzip_file)

    table_ref = dataset_ref.table(t.table_id)

    job_config = bigquery.job.ExtractJobConfig()
    job_config.compression = 'GZIP'
    extract_job = client.extract_table(
        table_ref,
        destination_uri,
        job_config=job_config,
    )
    extract_job.result()  # Wait for each export to finish.
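
If you really need a single compressed folder rather than one object per table, the extract API itself will not produce that, but as a follow-up step you could download the exported objects and bundle them into one archive. Below is a minimal sketch, assuming the exported files fit on local disk and that the google-cloud-storage client library is installed; it reuses the credentials, project_id and bucket_name variables from the code above, and the local paths and the dataset_export.zip name are placeholders of my own, not part of the export API.

import os
import zipfile
from google.cloud import storage

# Hypothetical local working paths -- adjust as needed.
local_dir = '/tmp/bq_exports'
archive_path = os.path.join(local_dir, 'dataset_export.zip')

os.makedirs(local_dir, exist_ok=True)

storage_client = storage.Client(credentials=credentials, project=project_id)
bucket = storage_client.bucket(bucket_name)

# Download each exported .csv.gz object and add it to a single local zip.
# ZIP_STORED is used because the files are already gzip-compressed,
# so there is no point compressing them a second time.
with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_STORED) as archive:
    for blob in storage_client.list_blobs(bucket_name):
        if not blob.name.endswith('.csv.gz'):
            continue
        local_path = os.path.join(local_dir, blob.name)
        blob.download_to_filename(local_path)
        archive.write(local_path, arcname=blob.name)

# Optionally upload the single archive back to the bucket.
bucket.blob('dataset_export.zip').upload_from_filename(archive_path)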
answered 2019-11-13 at 16:00