python-2.7 - How do I store a DataFrame into a BigTable in Google DataLab?

Question

I have a DataFrame df. I created a BigQuery table.

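# Imports assumed from the usual Datalab conventions; the original snippet omits them
import gcp
import gcp.bigquery as bq

# Create the schema, using the convenience of basing it on example DataFrame
schema = bq.Schema.from_dataframe(df)

# Create the dataset
bq.DataSet('ids').create()

# Create the table
suri_table = bq.Table('ids.suri').create(schema=schema, overwrite=True)

# Look up the default project ID to pass to to_gbq()
project = gcp.Context.default().project_id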

I want to use the Pandas function [to_gbq()][1] to store the DataFrame.

df.to_gbq(df, 'ids.suri', project)

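Although the table exists, this raises a "not found" exception. I just created the table in the code above. Can someone help me find the real cause of the problem?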

NotFoundException: Invalid Table Name. Should be of the form 'datasetId.tableId'

If I do:

from pandas.io import gbq

df.to_gbq('ids.suri', project_id=projectid)

I get:

/usr/lib/python2.7/dist-packages/pkg_resources.pyc in resolve(self, requirements, env, installer, replace_conflicting)
    637                         # unfortunately, zc.buildout uses a str(err)
    638                         # to get the name of the distribution here..
--> 639                         raise DistributionNotFound(req)
    640                 to_activate.append(dist)
    641             if dist not in req:

DistributionNotFound: google-api-python-client

  [1]: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.gbq.to_gbq.html

score 1 · Accepted Answer

You are mixing the Cloud Datalab way with the gbq way. You should use one or the other. To do this from Cloud Datalab, once the table has been created, you just use:

suri_table.insert_data(df)

There are a few options if you want to include the index and so on; see http://googlecloudplatform.github.io/datalab/gcp.bigquery.html#gcp.bigquery.Table.insert_data
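
As a side note on the first error in the question: a likely cause is the extra df argument. When to_gbq() is called as a DataFrame method, the frame is already passed implicitly, so df.to_gbq(df, 'ids.suri', project) puts the DataFrame where the table name is expected. The sketch below reproduces that effect. It assumes pandas validates the name with a check along the lines of '.' not in destination_table, which is an assumption about the library internals, not something stated in this thread. check_table_name is a hypothetical stand-in written for illustration, and the sample columns are made up.

```python
import pandas as pd


def check_table_name(destination_table):
    """Hypothetical stand-in for the table-name check assumed to run inside to_gbq()."""
    # A valid name looks like 'datasetId.tableId', so it must contain a dot.
    if '.' not in destination_table:
        raise ValueError(
            "Invalid Table Name. Should be of the form 'datasetId.tableId'")
    return destination_table


# Made-up sample data; the real df from the question is not shown in the thread.
df = pd.DataFrame({'src_ip': ['10.0.0.1'], 'alerts': [3]})

# Correct call shape: the table name is a string containing a dot.
accepted = check_table_name('ids.suri')

# Shifted call shape: the DataFrame lands in the table-name slot.
# `'.' in df` tests the column labels, so the check fails even though
# the table 'ids.suri' exists.
try:
    check_table_name(df)
    rejected = False
except ValueError:
    rejected = True
```

If that reading is right, the exception says nothing about whether the table exists in BigQuery; it only reflects the shifted arguments. The second call in the question, df.to_gbq('ids.suri', project_id=projectid), has the right shape and instead fails because the google-api-python-client package cannot be found.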
