I'm using scio with Scala to write data to BigQuery, and I ran into a strange error after upgrading to version 0.10.0.
Here is my simple example:
package com.databius.demo

import com.google.api.services.bigquery.model.{TableFieldSchema, TableSchema}
import com.spotify.scio.bigquery._
import com.spotify.scio.{ContextAndArgs, bigquery}
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO

import scala.jdk.CollectionConverters._

object ScioDemo {
  def main(args: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(args)

    val schema = new TableSchema().setFields(
      List(
        new TableFieldSchema()
          .setName("blob")
          .setType("BYTES")
          .setMode("NULLABLE")
      ).asJava
    )

    val blob = "test".getBytes
    val tr = bigquery.TableRow("blob" -> blob)

    sc.parallelize(Seq(tr))
      .saveAsCustomOutput(
        "custom bigquery IO",
        BigQueryIO
          .writeTableRows()
          .to("demo:demo_ds.demo_tb")
          .withSchema(schema)
          .withCreateDisposition(CREATE_IF_NEEDED)
          .withWriteDisposition(WRITE_TRUNCATE)
      )

    sc.run()
  }
}
This example works with scio version 0.9.2 (build.gradle):
plugins {
    id "java"
    id "scala"
}

def scioVersion = "0.9.2"

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.scala-lang:scala-library:2.13.8")
    implementation("com.spotify:scio-core_2.13:$scioVersion")
    implementation("com.spotify:scio-bigquery_2.13:$scioVersion")
    implementation('com.google.cloud:google-cloud-bigquery:2.6.2')
}
Then I upgraded to version 0.10.0, following the scio team's migration guide:
plugins {
    id "java"
    id "scala"
}

def scioVersion = "0.10.0"

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.scala-lang:scala-library:2.13.8")
    implementation("com.spotify:scio-core_2.13:$scioVersion")
    implementation("com.spotify:scio-google-cloud-platform_2.13:$scioVersion")
    implementation('com.google.cloud:google-cloud-bigquery:2.6.2')
}
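Now I get this error:
"message" : "Error while reading data, error message: JSON parsing error in row starting at position 0: Array specified for non-repeated field: blob.",
I also tried the latest version (0.11.3), but I still get the same error. Do you know how to fix this?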
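
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the error message suggests the raw Array[Byte] reaches BigQuery as a JSON array of numbers, while BigQuery's JSON format expects BYTES values as base64-encoded strings. A minimal sketch of encoding the value before it goes into the row follows. The object BytesFieldSketch and the helper encodeBytesField are hypothetical names for illustration, not part of scio or Beam, and the sketch uses only java.util.Base64 so it runs on its own.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object BytesFieldSketch {
  // Hypothetical helper (not a scio or Beam API): turns raw bytes into the
  // base64 string form that BigQuery accepts for BYTES fields in JSON input.
  def encodeBytesField(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(bytes)

  def main(args: Array[String]): Unit = {
    // Same payload as in the question, with an explicit charset.
    val blob = "test".getBytes(StandardCharsets.UTF_8)
    val encoded = encodeBytesField(blob)
    println(encoded) // prints "dGVzdA=="

    // Decoding gives back the original bytes, so no data is lost.
    val roundTrip = Base64.getDecoder.decode(encoded)
    println(new String(roundTrip, StandardCharsets.UTF_8)) // prints "test"
  }
}
```

In the pipeline above, this would mean building the row as bigquery.TableRow("blob" -> BytesFieldSketch.encodeBytesField(blob)) instead of passing the byte array directly. Whether this explains the behavior change between 0.9.2 and 0.10.0 is not established here.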