
Can someone explain how to create a date-partitioned table when running a load job with a JobConfig in Google BigQuery?

https://cloud.google.com/bigquery/docs/creating-column-partitions#creating_a_partitioned_table_when_loading_data

I can't make sense of the documentation, so an example would be very helpful.

Edit: Thanks to @irvifa I came up with this object, but I still can't create a time-partitioned table. This is the code I'm trying to use.

import pandas
from google.cloud import bigquery


def load_df(self, df):
  project_id="ProjectID"
  dataset_id="Dataset"
  table_id="TableName"
  table_ref=project_id+"."+dataset_id+"."+table_id
  time_partitioning = bigquery.table.TimePartitioning(field="PartitionColumn")
  job_config = bigquery.LoadJobConfig(
                         schema="Schema",
                         destinationTable=table_ref
                         write_disposition="WRITE_TRUNCATE",
                         timePartitioning=time_partitioning
                         )
  Job = Client.load_table_from_dataframe(df, table_ref, 
                                         job_config=job_config)
  Job.result()

2 Answers


I'm not sure whether this helps, but you can use the following example to run a job with partitioning:

from google.cloud import bigquery

def run_query(self, query_job_config):
  # Partition the destination table on the "partition_date" column.
  time_partitioning = bigquery.table.TimePartitioning(field="partition_date")
  job_config = bigquery.QueryJobConfig()
  # Destination table the query results are written to.
  job_config.destination = query_job_config['destination_dataset_table']
  job_config.time_partitioning = time_partitioning
  job_config.use_legacy_sql = False
  job_config.allow_large_results = True
  # Append to the table instead of overwriting it.
  job_config.write_disposition = 'WRITE_APPEND'
  sql = query_job_config['sql']
  # self.client is expected to be a bigquery.Client instance.
  query_job = self.client.query(sql, job_config=job_config)
  # Block until the query job finishes.
  query_job.result()
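The function above reads two keys from `query_job_config`. A minimal sketch of how that argument might be built, assuming the destination is given as a `project.dataset.table` string (the helper name `make_query_job_config` and all sample values are hypothetical and not part of the original answer):

```python
def make_query_job_config(project_id, dataset_id, table_id, sql):
    """Build the dict that run_query() reads (hypothetical helper)."""
    return {
        # Fully qualified table ID in "project.dataset.table" form.
        "destination_dataset_table": f"{project_id}.{dataset_id}.{table_id}",
        # Standard SQL text; its result should include the partition column.
        "sql": sql,
    }


# Sample values are placeholders, not real resources.
config = make_query_job_config(
    "my-project",
    "my_dataset",
    "my_table",
    "SELECT CURRENT_DATE() AS partition_date, 1 AS value",
)
print(config["destination_dataset_table"])
```

The query result has to contain the partitioning column (`partition_date` here) with a date or timestamp type, otherwise the partitioned destination cannot be written.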
Answered 2020-04-10T02:30:04.423

Thanks, irvifa.

I was trying to load a DataFrame and was looking for LoadJobConfig, but it turns out to be very similar.

I'll post my answer in case anyone needs a LoadJob example.

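import pandas
from google.cloud import bigquery


def load_df(self, df):
  # Placeholder identifiers: replace with your own project, dataset and table.
  project_id="ProjectID"
  dataset_id="Dataset"
  table_id="TableName"
  table_ref=project_id+"."+dataset_id+"."+table_id
  # Partition the destination table on a column of the DataFrame.
  time_partitioning = bigquery.table.TimePartitioning(field="PartitionColumn")
  job_config = bigquery.LoadJobConfig(
                         # Placeholder: pass a list of bigquery.SchemaField objects.
                         schema="Schema",
                         write_disposition="WRITE_TRUNCATE",
                         # Note the snake_case keyword name.
                         time_partitioning=time_partitioning
                         )
  # Client is expected to be an existing bigquery.Client instance.
  Job = Client.load_table_from_dataframe(df, table_ref,
                                         job_config=job_config)
  # Block until the load job finishes.
  Job.result()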
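The camelCase names in the question's attempt (`destinationTable`, `timePartitioning`) look like the field names of the REST API job resource, whereas the Python client uses snake_case keywords such as `time_partitioning` and takes the destination as an argument of `load_table_from_dataframe`. As a rough sketch of the REST-style load configuration that the code above corresponds to (the helper `build_load_configuration` is hypothetical, and the field layout is an assumption based on the REST job resource, not something the client requires you to write by hand):

```python
import json


def build_load_configuration(project_id, dataset_id, table_id,
                             partition_field,
                             write_disposition="WRITE_TRUNCATE"):
    """Return a REST-style job configuration for a partitioned load job.

    Hypothetical helper for illustration only; the Python client builds
    an equivalent structure from LoadJobConfig.
    """
    return {
        "load": {
            # In the Python client this is the `destination` argument.
            "destinationTable": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_id,
            },
            # Python keyword: write_disposition
            "writeDisposition": write_disposition,
            # Python keyword: time_partitioning (daily partitions on a column)
            "timePartitioning": {"type": "DAY", "field": partition_field},
        }
    }


configuration = build_load_configuration(
    "ProjectID", "Dataset", "TableName", "PartitionColumn"
)
print(json.dumps(configuration, indent=2))
```

Seen this way, the only changes needed to the question's attempt are renaming `timePartitioning` to `time_partitioning` and dropping `destinationTable`, since the table is already passed to `load_table_from_dataframe`.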
Answered 2020-04-10T23:32:48.397