
While trying to write a DataFrame to BigQuery using the Simba JDBC driver, I get the exception below. Here is the DataFrame schema; a table with the same schema has already been created in BigQuery.

df.printSchema
root
 |-- empid: integer (nullable = true)
 |-- firstname: string (nullable = true)
 |-- middle: string (nullable = true)
 |-- last: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- age: double (nullable = true)
 |-- weight: integer (nullable = true)
 |-- salary: integer (nullable = true)
 |-- city: string (nullable = true)

The Simba driver throws the following error:

 Caused by: com.simba.googlebigquery.support.exceptions.GeneralException: [Simba][BigQueryJDBCDriver](100032) Error executing query job. Message: 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "location" : "q",
        "locationType" : "parameter",
        "message" : "Syntax error: Unexpected string literal \"empid\" at [1:38]",
        "reason" : "invalidQuery"
      } ],
      "message" : "Syntax error: Unexpected string literal \"empid\" at [1:38]",
      "status" : "INVALID_ARGUMENT"
    }
      ... 24 more

Here is the code I am using:

val url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=my_project_id;OAuthType=0;OAuthPvtKeyPath=service_account_jsonfile;OAuthServiceAcctEmail=googleaccount"
df.write.mode(SaveMode.Append).jdbc(url,"orders_dataset.employee",new java.util.Properties)

Please let me know if I am missing any other configuration or where I went wrong. Thanks in advance!


1 Answer


It seems this behavior is caused by Spark, which wraps column names in extra double quotes when it builds the SQL statement. BigQuery standard SQL treats double-quoted tokens as string literals (identifiers are quoted with backticks), which is why the error reports "Unexpected string literal \"empid\"".

To fix this behavior in Spark, register a custom JDBC dialect after creating the Spark context and before writing the DataFrame:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

JdbcDialects.registerDialect(new JdbcDialect() {
  // Apply this dialect to BigQuery JDBC connections
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:bigquery:")

  // Return column names as-is instead of wrapping them in double quotes
  override def quoteIdentifier(column: String): String = column
})
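
For context, here is a minimal end-to-end sketch of how the dialect registration and the write fit together. The URL and table name are the ones from the question; the DataFrame source is a hypothetical placeholder, so build your DataFrame however you already do:

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

val spark = SparkSession.builder().appName("bigquery-write").getOrCreate()

// Register the dialect first, so the jdbc write below emits unquoted column names
JdbcDialects.registerDialect(new JdbcDialect() {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:bigquery:")
  override def quoteIdentifier(column: String): String = column
})

val url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2;ProjectId=my_project_id;OAuthType=0;OAuthPvtKeyPath=service_account_jsonfile;OAuthServiceAcctEmail=googleaccount"

// Placeholder source (hypothetical file); any DataFrame matching the BigQuery table schema works
val df = spark.read.parquet("employees.parquet")
df.write.mode(SaveMode.Append).jdbc(url, "orders_dataset.employee", new java.util.Properties)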
answered 2019-10-08T02:18:28.887