I am trying to insert a non-repeated record into BigQuery, but I keep getting the error message Array specified for non-repeated field: record.

My question is: how do I insert a non-repeated record into BigQuery using the bigrquery library?

Suppose I have the following schema:

bqSchema <- bq_fields(list(
  bq_field(name = "record", type = "RECORD", fields = list(
    bq_field(name = "a", type = "INTEGER"),
    bq_field(name = "b", type = "STRING")
  ))
))

And this data frame:

df <- tibble(
  record = list(
    a = 1,
    b = "B"
  )
)

Inserting the data as follows produces an error in BigQuery:

bq_perform_upload(bqTableObj, df, fields = bqSchema)
# Array specified for non-repeated field: record

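I think this is partly because bigrquery converts the data frame to JSON with jsonlite::stream_out() but does not pass auto_unbox = TRUE, which produces arrays rather than objects. As a result, the following newline-delimited JSON is sent to BigQuery: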

{"record": [1]}
{"record": ["B"]}

I believe the correct NDJSON to send to BigQuery would be:

{"record": {"a": 1, "b": "B"}}
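
For comparison, here is a minimal sketch of how jsonlite serializes the two shapes. It assumes only the jsonlite and tibble packages and makes no BigQuery call. The names df_list and df_nested are illustrative. A data-frame column (a tibble nested inside the tibble) is written by jsonlite as a JSON object, unlike the list column above. Whether bq_perform_upload() passes such a column through unchanged is an assumption, not something confirmed by the bigrquery documentation.

```r
library(jsonlite)
library(tibble)

# List column, as in the question: the tibble has two rows, and each
# list element is serialized as a JSON array.
df_list <- tibble(record = list(a = 1, b = "B"))
stream_out(df_list, verbose = FALSE)

# Data-frame column: the tibble has one row, and the value of "record"
# is serialized as a JSON object with keys "a" and "b".
df_nested <- tibble(record = tibble(a = 1L, b = "B"))
stream_out(df_nested, verbose = FALSE)
```

If the upload accepts the nested form, it could be tried with bq_perform_upload(bqTableObj, df_nested, fields = bqSchema), using the objects defined in the question.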

Has anyone run into this before, or have an idea of how I can solve it?

1 Answer

Try the following, which sets mode = "REPEATED" on the field:

bqSchema <- bq_fields(list(
  bq_field(name = "record", type = "RECORD", mode = "REPEATED", fields = list(
    bq_field(name = "a", type = "INTEGER"),
    bq_field(name = "b", type = "STRING")
  ))
))
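
With a REPEATED record, BigQuery expects an array of objects for the field. As a rough sketch of a data frame whose JSON has that shape, assuming only the jsonlite and tibble packages (df_repeated is an illustrative name, and no upload is performed here), each element of the list column can itself be a small data frame:

```r
library(jsonlite)
library(tibble)

# One row; the "record" cell holds a one-row data frame, which jsonlite
# serializes as an array containing a single object.
df_repeated <- tibble(
  record = list(
    tibble(a = 1L, b = "B")
  )
)

# Print the newline-delimited JSON for inspection.
stream_out(df_repeated, verbose = FALSE)
```
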
Answered 2020-05-26T05:19:15.267