I have a JSON data file that looks like this:
{
  "key_a": "value_a",
  "key_b": "value_b",
  "key_c": {
    "c_nested/invalid.key.according.to.bigquery": "valid_value_though"
  }
}
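As we know, BigQuery treats c_nested/invalid.key.according.to.bigquery as an invalid column name. I have a large amount of log data exported by StackDriver to Google Cloud Storage, and it contains many fields that are invalid according to BigQuery ("Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long").
As a workaround, I am trying to store the value of key_c (the whole {"c_nested/invalid.key.according.to.bigquery": "valid_value_though"}) as a string in the BigQuery table.
I think my table definition would look like this: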
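To make the naming rule concrete, here is a minimal Python sketch of the check as I understand it from the message quoted above. The function name is_valid_bq_field is my own and is not part of any BigQuery library; the pattern is derived only from the quoted rule.

```python
import re

# Pattern derived from the quoted rule (an assumption, not an official API):
# letters, numbers, and underscores only, starting with a letter or
# underscore, at most 128 characters in total.
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_valid_bq_field(name: str) -> bool:
    """Return True if `name` satisfies the quoted field-name rule."""
    return _FIELD_NAME.fullmatch(name) is not None
```

With this check, key_a passes and c_nested/invalid.key.according.to.bigquery fails because of the slash and the dots.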
[
  {
    "mode": "NULLABLE",
    "name": "key_a",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "key_b",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "key_c",
    "type": "STRING"
  }
]
When I try to create a table with this schema, I get the following errors:
Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0
Error while reading data, error message: JSON parsing error in row starting at position 0: Expected key
Assuming BigQuery supports it now, I then thought of simply skipping the key_c column with the following schema:
[
  {
    "mode": "NULLABLE",
    "name": "key_a",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "key_b",
    "type": "STRING"
  }
]
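The schema above at least lets me create a permanent table (for querying the external data), but when I try to query the data, I get the following error: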
Error while reading table: projectname.dataset_name.table_name, error message: JSON parsing error in row starting at position 0: No such field: key_c.
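I know there is an approach described here that loads each JSON line raw into BigQuery, as if it were a CSV, and then parses it inside BigQuery, but that makes the queries too complicated.
Is cleaning the data the only way? How can I solve this?
I am looking for a way to skip creating columns for the invalid fields and either store them directly as STRING or ignore them entirely. Is this possible?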
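For reference, the cleaning step I am hoping to avoid would look roughly like the Python sketch below. It assumes newline-delimited JSON with one object per line, and the helper name stringify_nested is hypothetical. It turns every nested object or array into a JSON string, so that key_c would fit the three-column STRING schema above. I have not run it against real StackDriver exports.

```python
import json


def stringify_nested(line: str) -> str:
    """Re-serialize one JSON row so nested objects/arrays become strings.

    Top-level scalar values are left untouched; any dict or list value is
    replaced by its JSON text, which a STRING column can hold.
    """
    row = json.loads(line)
    flat = {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in row.items()
    }
    return json.dumps(flat)
```

Applied to the sample row, key_a and key_b stay as they are, and key_c becomes a single string containing the original nested object.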