ruby - Uploading a local CSV via the API fails

Question

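I'm using the official Google Ruby gem. Everything else I've tried so far works fine (listing projects, datasets, and tables, and creating tables), but starting a load job fails with the following error in the JSON response: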

"Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "

The body string I'm building looks like this:

--xxx
Content-Type: application/json; charset=UTF-8
{"configuration":{"load":{"destinationTable":{"projectId":"mycompany.com:projectId","datasetId":"all_events","tableId":"install"},"createDisposition":"CREATE_NEVER","writeDisposition":"WRITE_APPEND"}}}
--xxx
Content-Type: application/octet-stream
test,second,1234,6789,83838
--xxx

I had already created the install table with an appropriate schema for this data, so that shouldn't be the problem.

Finally, for completeness, here is the actual code I use to fire the request (these are two methods in a larger class):

def create_insert_job
  config = {
    'configuration' => {
      'load' => {
        'destinationTable' => {
          'projectId' => 'mycompany.com:projectId',
          'datasetId' => 'all_events',
          'tableId' => 'install'
        },
        'createDisposition' => 'CREATE_NEVER',
        'writeDisposition' => 'WRITE_APPEND'
      }
    }
  }

  body = "#{multipart_boundary}\n"
  body += "Content-Type: application/json; charset=UTF-8\n"
  body += "#{config.to_json}\n"
  body += "#{multipart_boundary}\n"
  body += "Content-Type: application/octet-stream\n"
  body += "test,second,1234,6789,83838\n"
  body += "#{multipart_boundary}\n"

  prepare_big_query # This simply gets tokens and instantiates google_client and big_query_api
  param_hash = { api_method: big_query_api.jobs.insert }
  param_hash[:parameters] = {'projectId' =>'mycompany.com:projectId'}
  param_hash[:body] = body
  param_hash[:headers] = {'Content-Type' => "multipart/related; boundary=#{multipart_boundary}"}

  result = google_client.execute(param_hash)
  JSON.parse(result.response.body)
end

def multipart_boundary
  '--xxx'
end

Any ideas?

Addendum to the answers below, to make this code work

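Note that the #multipart_boundary method above returns a value that already has "--" prepended. That is a problem, because the boundary set in the Content-Type header (in the param hash) then becomes "--xxx" when what we really want there is "xxx".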
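
To make the distinction concrete, here is a minimal sketch that keeps the bare boundary separate from the delimiter lines built from it. The helper names are hypothetical (they are not part of the gem or of the original class); only the 'xxx' value comes from the question.

```ruby
# Hypothetical helpers; only the 'xxx' boundary value comes from the question.
BOUNDARY = 'xxx'

# Value for the Content-Type request header: the bare boundary, no dashes.
def multipart_content_type(boundary)
  "multipart/related; boundary=#{boundary}"
end

# Line that opens each part inside the body: "--" plus the boundary.
def part_delimiter(boundary)
  "--#{boundary}"
end

# Line that ends the whole body: "--" plus the boundary plus "--".
def closing_delimiter(boundary)
  "--#{boundary}--"
end

puts multipart_content_type(BOUNDARY)
puts part_delimiter(BOUNDARY)
puts closing_delimiter(BOUNDARY)
```

With this split, the header carries "xxx" while the body lines carry "--xxx" and "--xxx--".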

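Also, the documentation for this gem is pretty rough: after fixing my newline problem (per @jcondit's answer), I got a new error about uploading to the wrong URL. That's because you need to add: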

'uploadType' => 'multipart'

to the parameters, so that the request is sent to the correct URL.

So the final param_hash (again, after fixing the newline and boundary issues) looks like:

param_hash = { api_method: big_query_api.jobs.insert }
param_hash[:parameters] = {'projectId' => project_id, 'uploadType' => 'multipart'}
param_hash[:body] = body
param_hash[:headers] = {'Content-Type' => "multipart/related; boundary=#{multipart_boundary}"}

score 0 · Accepted Answer

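Your HTTP request is malformed; BigQuery doesn't recognize it as a load job at all. I'm on my way out to dinner, so I can't dig any deeper right now, but hopefully this gives you a pointer for moving forward.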

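I took a closer look, and I can't see anything wrong with your request. One suggestion is to run the same load in the BigQuery UI and use Chrome's Tools -> Developer Tools / Network tab to inspect the RPC that gets sent.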

If I do this with a dummy CSV file, I get:

--yql9f05215ct
Content-Type: application/json; charset=utf-8

{"jobReference":{"projectId":"helixdata2"},"configuration":{"load":{"destinationTable":{"projectId":"helixdata2","datasetId":"lotsOdata","tableId":"import"}}}}
--yql9f05215ct
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

YSxiLGMKYyxkLGUKZixnLGgK
--yql9f05215ct--
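
The second part of that captured request is base64-encoded, as its Content-Transfer-Encoding header says. A small sketch for checking what it contains, using Ruby's core pack/unpack base64 directive ('m'). Nothing in this thread establishes that base64 is required for the upload; the plain-text part in the question is the form the other answer corrects.

```ruby
# Payload copied from the captured request above.
encoded = 'YSxiLGMKYyxkLGUKZixnLGgK'

# 'm' is the base64 directive for String#unpack1 / Array#pack.
csv = encoded.unpack1('m')
puts csv

# Going the other way: 'm0' encodes without inserting line breaks.
puts [csv].pack('m0')
```

The payload decodes to three CSV rows (a,b,c / c,d,e / f,g,h), and re-encoding them gives back the same string.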

score 0 · Accepted Answer

You need to insert an extra newline between the headers and the body of each MIME part. Your request body should look like this:

--xxx
Content-Type: application/json; charset=UTF-8

{"configuration":{"load":{"destinationTable":{"projectId":"mycompany.com:projectId","datasetId":"all_events","tableId":"install"},"createDisposition":"CREATE_NEVER","writeDisposition":"WRITE_APPEND"}}}
--xxx
Content-Type: application/octet-stream

test,second,1234,6789,83838
--xxx--

Note the extra newline after the Content-Type header in each part.

Also, don't forget that the final boundary delimiter has a trailing -- appended to it.
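
Both points can be sketched in Ruby as a small body builder. This is an illustration, not the gem's API: build_multipart_body is a hypothetical helper, the boundary argument is assumed to be the bare value (no leading dashes), and the "\n" line endings follow the question's code, although the MIME specification calls for CRLF.

```ruby
require 'json'

# Hypothetical helper: assembles a multipart/related body from a job
# configuration hash and raw CSV text. `boundary` is the bare boundary.
def build_multipart_body(config, csv_data, boundary)
  [
    "--#{boundary}",
    'Content-Type: application/json; charset=UTF-8',
    '',                  # blank line separating headers from the part body
    config.to_json,
    "--#{boundary}",
    'Content-Type: application/octet-stream',
    '',                  # blank line again for the second part
    csv_data.chomp,      # chomp so the layout matches the body shown above
    "--#{boundary}--"    # closing delimiter carries a trailing "--"
  ].join("\n") + "\n"
end

config = {
  'configuration' => {
    'load' => {
      'destinationTable' => {
        'projectId' => 'mycompany.com:projectId',
        'datasetId' => 'all_events',
        'tableId' => 'install'
      },
      'createDisposition' => 'CREATE_NEVER',
      'writeDisposition' => 'WRITE_APPEND'
    }
  }
}

body = build_multipart_body(config, "test,second,1234,6789,83838\n", 'xxx')
puts body
```

The resulting string would replace the hand-concatenated body in create_insert_job, with the Content-Type header using the same bare boundary.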
