23

I need a query to find the column names of a table (table metadata) in BigQuery, like the following query in SQL:

SELECT column_name,data_type,data_length,data_precision,nullable FROM all_tab_cols where table_name ='EMP';
4

6 Answers

35

BigQuery now supports INFORMATION_SCHEMA.

Say you have a dataset named MY_PROJECT.MY_DATASET and a table named MY_TABLE. You can then run the following query:

SELECT column_name
FROM MY_PROJECT.MY_DATASET.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'MY_TABLE'
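
The same view also exposes data_type and is_nullable, which map onto two more of the columns in the question's query. Below is a minimal Python sketch that only builds the query text. The helper name build_columns_query is hypothetical, and the identifier checks are a deliberately conservative assumption, stricter than BigQuery's real naming rules. data_length and data_precision are left out, since I am not sure the view has direct equivalents.

```python
import re

# Conservative patterns (an assumption, stricter than BigQuery's naming rules).
_PROJECT = re.compile(r"[A-Za-z0-9_-]+")
_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_columns_query(project: str, dataset: str, table: str) -> str:
    """Return a standard SQL query listing a table's columns and types."""
    if not _PROJECT.fullmatch(project):
        raise ValueError(f"unexpected project id: {project!r}")
    for part in (dataset, table):
        if not _NAME.fullmatch(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    # Backticks around the project allow hyphens, as in `bigquery-public-data`.
    return (
        "SELECT column_name, data_type, is_nullable\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE table_name = '{table}'\n"
        "ORDER BY ordinal_position"
    )


print(build_columns_query("MY_PROJECT", "MY_DATASET", "MY_TABLE"))
```

The printed text can be pasted into the query editor. Validating the names before formatting them into the string keeps quotes and other stray characters out of the query.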
answered 2019-05-07T16:28:22.430
9

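Yes, you can get table metadata using INFORMATION_SCHEMA.

One of the examples in the linked documentation retrieves metadata from the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view for the commits table in the github_repos dataset. You just have to:

  1. Open the BigQuery web UI in the GCP Console.

  2. Enter the following standard SQL query in the Query editor box. INFORMATION_SCHEMA requires standard SQL syntax. Standard SQL is the default syntax in the GCP Console.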

     SELECT
      *
     FROM
      `bigquery-public-data`.github_repos.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
     WHERE
      table_name = "commits"
      AND (column_name = "author"
        OR column_name = "difference")
    

Note: INFORMATION_SCHEMA view names are case-sensitive.

  3. Click Run.

The results should look like the following:

  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | table_name | column_name |     field_path      |                                                                      data_type                                                                      | description |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
  | commits    | author      | author              | STRUCT<name STRING, email STRING, time_sec INT64, tz_offset INT64, date TIMESTAMP>                                                                  | NULL        |
  | commits    | author      | author.name         | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.email        | STRING                                                                                                                                              | NULL        |
  | commits    | author      | author.time_sec     | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.tz_offset    | INT64                                                                                                                                               | NULL        |
  | commits    | author      | author.date         | TIMESTAMP                                                                                                                                           | NULL        |
  | commits    | difference  | difference          | ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, new_path STRING, old_sha1 STRING, new_sha1 STRING, old_repo STRING, new_repo STRING>> | NULL        |
  | commits    | difference  | difference.old_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.new_mode | INT64                                                                                                                                               | NULL        |
  | commits    | difference  | difference.old_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_path | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_sha1 | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.old_repo | STRING                                                                                                                                              | NULL        |
  | commits    | difference  | difference.new_repo | STRING                                                                                                                                              | NULL        |
  +------------+-------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
answered 2019-09-25T10:13:10.073
4

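For newbies like me, the syntax of the above is as follows (the filter values are string literals, so they need quotes):

select * from project_name.dataset_name.INFORMATION_SCHEMA.COLUMNS where table_catalog = 'project_name' and table_schema = 'dataset_name' and table_name = 'table_name'
answered 2019-05-22T01:56:32.763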
3

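Update: this is now possible! See the INFORMATION_SCHEMA documentation and the other answers.

Answer, circa 2012:

There is currently no way to retrieve table metadata (i.e. column names and types) via a query, though this isn't the first time it has been requested.

Is there a reason you need this as a query? Table metadata is available via the tables API.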
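
When the metadata comes from the tables API, the schema arrives as nested JSON rather than as rows. Here is a minimal Python sketch that turns such a schema into dotted column paths. It assumes the schema has the shape {"fields": [{"name", "type", "mode", "fields"}]}. The sample schema and the helper name flatten_fields are made up for illustration, and fetching the table resource itself is left out.

```python
from typing import Iterator, Tuple

# Hypothetical sample, shaped like the "schema" part of a table resource.
SAMPLE_SCHEMA = {
    "fields": [
        {"name": "commit", "type": "STRING", "mode": "NULLABLE"},
        {
            "name": "author",
            "type": "RECORD",
            "mode": "NULLABLE",
            "fields": [
                {"name": "name", "type": "STRING", "mode": "NULLABLE"},
                {"name": "email", "type": "STRING", "mode": "NULLABLE"},
            ],
        },
    ]
}


def flatten_fields(fields, prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (field_path, type, mode) for every field, parents first."""
    for field in fields:
        path = prefix + field["name"]
        # "mode" may be absent; NULLABLE is assumed as the default here.
        yield path, field["type"], field.get("mode", "NULLABLE")
        # Nested RECORD fields carry their own "fields" list.
        yield from flatten_fields(field.get("fields", []), prefix=path + ".")


for path, type_, mode in flatten_fields(SAMPLE_SCHEMA["fields"]):
    print(f"{path}\t{type_}\t{mode}")
```

The recursion emits each parent record before its children, so nested fields come out as paths such as author.name.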

answered 2012-07-05T07:11:27.087
2

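Actually, it is possible to do this using SQL. To do so, you need to query the logging table for the most recent log entry of this particular table being created.

For example, assuming the table is loaded/created daily: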

    CREATE TEMP FUNCTION jsonSchemaStringToArray(jsonSchema String)
          RETURNS ARRAY<STRING> AS ((
            SELECT
              SPLIT(
                REGEXP_REPLACE(REPLACE(LTRIM(jsonSchema,'{ '),'"fields": [',''), r'{[^{]+"name": "([^\"]+)"[^}]+}[, ]*', '\\1,')
              ,',')
          ));
    WITH valid_schema_columns AS (
      WITH array_output AS (SELECT
        jsonSchemaStringToArray(jsonSchema) AS column_names
      FROM (
        SELECT
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.schemaJson AS jsonSchema
          , ROW_NUMBER() OVER (ORDER BY metadata.timestamp DESC) AS record_count
        FROM `realself-main.bigquery_logging.cloudaudit_googleapis_com_data_access_20170101`
        WHERE
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.tableId = '<table_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.destinationTable.datasetId = '<schema_name>'
          AND
          protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.load.createDisposition = 'CREATE_IF_NEEDED'
      ) AS t
      WHERE
        t.record_count = 1 -- grab the latest entry
      )
      -- this is actually what UNNESTS the array into standard rows
      SELECT
        valid_column_name
      FROM array_output
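      LEFT JOIN UNNEST(column_names) AS valid_column_name
    )
    -- the CTE alone is not a complete statement, so select from it
    SELECT valid_column_name FROM valid_schema_columns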
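
The regular expression in the function above only extracts the name values. If the log rows are exported and processed outside BigQuery, a JSON parser does the same job with less fragility. This is a Python sketch, assuming schemaJson is a JSON object with a top-level "fields" array, as the REPLACE call and the pattern above imply. The sample string and the helper name column_names_from_schema_json are hypothetical.

```python
import json
from typing import List


def column_names_from_schema_json(schema_json: str) -> List[str]:
    """Return the top-level column names from a load job's schemaJson string."""
    schema = json.loads(schema_json)
    # A schema without a "fields" key yields no columns.
    return [field["name"] for field in schema.get("fields", [])]


# Hypothetical sample in the shape the SQL function expects.
sample = (
    '{"fields": [{"name": "id", "type": "INTEGER"}, '
    '{"name": "email", "type": "STRING"}]}'
)
print(column_names_from_schema_json(sample))
```

Like the SQL version, this only looks at top-level fields and ignores nested records.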
answered 2017-04-22T19:55:45.770
0

To check the columns, you can quickly look them up by querying your table from the CLI:

bq query --use_legacy_sql=false 'select Hour, sum(column_1) as total from `project_id.dataset.table_name` where Date(Hour) = "2020-06-10" group by Hour'
answered 2020-06-13T20:09:12.687