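I'm still quite new to Google Cloud Datalab and am having trouble running a parameterized query.
I followed the example on passing query parameters from the Datalab tutorial and tried to apply it to the following query: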
%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
WHERE
user_id IS NOT NULL AND
timezoneOffset IS NOT NULL AND
event IS NOT NULL)
WHERE
user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
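I want to iterate over all user_events tables, indexed 0, 1, 2, 3, ... To do this, I'd like to parameterize the TABLE_QUERY argument and query one table per loop iteration rather than all tables at once. (I need to sort the user records within each table, and running the query over all user_events tables at once exceeds the available resources.)
1.) I defined a new query (%%sql --module topUserEvents etc.) and replaced the following part of the query above: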
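To make the intended loop concrete, here is a minimal sketch in plain Python of the filter string each iteration would need. The helper name table_filter and the table count of 4 are hypothetical and chosen only for illustration; nothing here calls Datalab or BigQuery.

```python
# Hypothetical helper: builds the TABLE_QUERY filter for one table index.
def table_filter(table_nr, start="20160401", end="20160408"):
    return ('table_id CONTAINS "user_events_{nr}" '
            'AND RIGHT(table_id, 8) BETWEEN "{start}" AND "{end}"'
            ).format(nr=table_nr, start=start, end=end)

# One filter per table index; the count of 4 is an assumption.
filters = [table_filter(nr) for nr in range(4)]

for f in filters:
    print(f)
```

Each entry in filters corresponds to one iteration, so only one user_events table would be sorted at a time.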
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_'+$tableNr+
'" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
Executing the query and passing the table number as a string does not work:
invalidQuery: Expected a string literal for TABLE_QUERY clause
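The error suggests that TABLE_QUERY wants its second argument to already be a single string literal when the query is parsed, which the '+' concatenation does not provide. One possible way around this, sketched below in plain Python, is to build the FROM clause text with the table number filled in beforehand. FROM_TEMPLATE and from_clause are hypothetical names, and how the resulting text would be handed to Datalab is left open; this only shows the string construction.

```python
# Hypothetical template: the FROM clause from the query above with a placeholder.
FROM_TEMPLATE = "FROM (TABLE_QUERY([my_project:my_dataset:user_events], '{filter}'))"

def from_clause(table_nr):
    # The filter only uses double quotes internally, so it can sit
    # inside a single-quoted SQL string literal without escaping.
    table_filter = ('table_id CONTAINS "user_events_{0}" '
                    'AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'
                    ).format(table_nr)
    return FROM_TEMPLATE.format(filter=table_filter)

print(from_clause(0))
```

For table number 0 this reproduces the FROM clause of the original, non-parameterized query, just written on one line.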
2.) I also tried passing the whole string, replacing that part of the original query with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))
Executing the query with the whole string passed in returns a BigQuery exception:
invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...
Does anyone know how to pass (part of) a string as the TABLE_QUERY argument, as in the case above?
Any help would be greatly appreciated :)