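We are writing a wrapper for bq.py and are running into problems with result sets larger than 100k rows. This seemed to work fine in the past (we previously hit a related issue, described in "Google BigQuery Incomplete Query Replies on Odd Attempts"). Perhaps I am misunderstanding the limits explained on the documentation page?
For example: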
#!/bin/bash
for i in $(seq 99999 100002);
do
    bq query -q --nouse_cache --max_rows 99999999 "SELECT id, FROM [publicdata:samples.wikipedia] LIMIT $i" > "$i.txt"
    j=$(wc -l < "$i.txt")
    echo "Limit $i Returned $j Rows"
done
This yields the following (note that each count includes 4 lines of formatting):
Limit 99999 Returned 100003 Rows
Limit 100000 Returned 100004 Rows
Limit 100001 Returned 100004 Rows
Limit 100002 Returned 100004 Rows
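Subtracting the 4 formatting lines from each count makes the cap easier to see. The following is only a small sketch of that arithmetic, using the numbers printed above; `FORMATTING_LINES`, `observed_lines` and `data_rows` are names made up for this illustration, and the sketch assumes the formatting overhead is the same 4 lines in every file.

```python
# Line counts reported by the script above, keyed by the LIMIT that was requested.
FORMATTING_LINES = 4  # assumption: every output file carries the same 4 formatting lines
observed_lines = {99999: 100003, 100000: 100004, 100001: 100004, 100002: 100004}


def data_rows(line_count, formatting_lines=FORMATTING_LINES):
    """Return the number of actual data rows in one bq output file."""
    return line_count - formatting_lines


for limit in sorted(observed_lines):
    rows = data_rows(observed_lines[limit])
    missing = limit - rows  # rows requested by LIMIT but absent from the file
    print("LIMIT %d: %d data rows, %d missing" % (limit, rows, missing))
```

By this count the output stops growing at 100000 data rows, even though `--max_rows` is set far higher.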
In our wrapper, we access the API directly:
while row_count < total_rows:
    data = client.apiclient.tabledata().list(maxResults=total_rows - row_count,
                                             pageToken=page_token,
                                             **table_dict).execute()
    # If there are more results than will fit on a page,
    # you will receive a token for the next page
    page_token = data.get('pageToken', None)
    # How many rows are there across all pages?
    total_rows = min(total_rows, int(data['totalRows']))  # Changed to use get(data[rows],0)
    raw_page = data.get('rows', [])
In this case we would expect to receive a token, but none is returned.
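To make the expectation concrete, here is a minimal sketch of the paging loop we have in mind, assuming the service returns a `pageToken` whenever more rows remain. `fetch_all_rows`, `fetch_page` and `make_fake_service` are hypothetical names that are not part of bq.py or the API client; `fetch_page` stands in for the `tabledata().list(...).execute()` call, and the fake service exists only so the sketch runs without credentials.

```python
def fetch_all_rows(fetch_page, total_rows):
    """Collect rows page by page until total_rows are read or paging stops.

    fetch_page(max_results, page_token) is a hypothetical callable returning
    a dict shaped like the response used above: 'rows', 'totalRows' and,
    when more pages remain, 'pageToken'.
    """
    rows = []
    page_token = None
    while len(rows) < total_rows:
        data = fetch_page(total_rows - len(rows), page_token)
        # Never ask for more rows than the table actually holds.
        total_rows = min(total_rows, int(data['totalRows']))
        page = data.get('rows', [])
        rows.extend(page)
        page_token = data.get('pageToken')
        if not page or page_token is None:
            break  # nothing more can be requested without a token
    # complete is False when paging stopped before every row arrived.
    complete = len(rows) >= total_rows
    return rows, complete


def make_fake_service(table, page_cap, drop_token=False):
    """Build a stand-in for the API (assumption: at most page_cap rows per page)."""
    def fetch_page(max_results, page_token):
        start = int(page_token or 0)
        end = min(start + min(max_results, page_cap), len(table))
        data = {'totalRows': str(len(table)), 'rows': table[start:end]}
        if end < len(table) and not drop_token:
            data['pageToken'] = str(end)
        return data
    return fetch_page


table = list(range(25))
rows, complete = fetch_all_rows(make_fake_service(table, page_cap=10), 25)
print(len(rows), complete)

# Simulate the behaviour described above: a short page arrives without a token.
rows, complete = fetch_all_rows(make_fake_service(table, 10, drop_token=True), 25)
print(len(rows), complete)
```

Returning a `complete` flag rather than looping until `total_rows` is reached is a deliberate choice in this sketch: when a short page arrives without a token, the caller can at least detect the truncated result instead of spinning or silently accepting it.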