
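I'm still new to Google Cloud Datalab and am having some trouble running parameterized queries.

I followed the query-parameter example from the Datalab tutorial and tried to apply it to the following query: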

%%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
  WHERE 
  user_id IS NOT NULL AND
  timezoneOffset IS NOT NULL AND
  event IS NOT NULL)
WHERE 
  user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
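
To make the TABLE_QUERY filter above easier to reason about, here is a minimal Python sketch that mirrors its two conditions on a list of table names. The table names and the helper `matches_filter` are hypothetical (the real naming scheme is not shown here); the sketch only assumes that table IDs end in an 8-digit date.

```python
def matches_filter(table_id, prefix="user_events_0",
                   start="20160401", end="20160408"):
    """Mirror the filter: table_id CONTAINS prefix AND RIGHT(table_id, 8) BETWEEN start AND end."""
    # CONTAINS is a substring match; BETWEEN is inclusive on both ends.
    return prefix in table_id and start <= table_id[-8:] <= end


# Hypothetical table names, for illustration only.
tables = [
    "user_events_0_20160331",
    "user_events_0_20160401",
    "user_events_0_20160408",
    "user_events_1_20160405",
    "user_events_10_20160405",
]

selected_0 = [t for t in tables if matches_filter(t)]
selected_1 = [t for t in tables if matches_filter(t, prefix="user_events_1")]
```

Because the match is a substring match, a prefix such as `user_events_1` would also select `user_events_10_...` under this assumed naming scheme, which matters once the index goes past 9.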

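I want to iterate over all the user_events tables, indexed 0, 1, 2, 3, and so on. To do that, I want to pass a parameter to TABLE_QUERY and query one table per loop iteration rather than all tables at once. (I need to sort the user records within each table, and running the query over all user_events tables at once exceeds the resource limits.)

1.) I defined a new query (%%sql --module topUserEvents, etc.) and replaced the following part of the query above: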

 FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
      'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_'+$tableNr+ 
       '" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

Executing the query with the table number passed as a string does not work:

invalidQuery: Expected a string literal for TABLE_QUERY clause

2.) I also tried passing the whole string, replacing that part of the original query with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))

Executing the query with the whole string passed in returns a BigQuery exception:

invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...

Does anyone know how to pass (part of) a string as the TABLE_QUERY argument, as in the case above?

Any help would be much appreciated :)


1 Answer


Could you try the following?

Define module 'test1':

%%sql --module test1
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples, 
  'MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)')

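Run the query:

import gcp.bigquery as bq  # assumed import for the 2016 Datalab release; adjust if your setup differs
period = 'DAY'
bq.Query(test1, period=period).sample()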

Define module 'test2':

%%sql --module test2
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS $events_table_num 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
  WHERE 
  user_id IS NOT NULL AND
  timezoneOffset IS NOT NULL AND
  event IS NOT NULL)
WHERE 
  user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime

Run the query:

events_table_num = 'user_events_0'
bq.Query(test2, events_table_num=events_table_num).sample()
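
To query one table per iteration, as the question asks, the run step above could be wrapped in a loop. The following is only a sketch: the helpers `table_prefixes` and `run_per_table` are hypothetical names, the number of tables is assumed to be known, and a stand-in callable replaces the real query so the snippet runs outside Datalab.

```python
def table_prefixes(count, base="user_events_"):
    """Return the values to pass as $events_table_num, e.g. 'user_events_0'."""
    return [base + str(i) for i in range(count)]


def run_per_table(run_query, count):
    """Call run_query(prefix) once per table index and collect the results by prefix."""
    return {prefix: run_query(prefix) for prefix in table_prefixes(count)}


# In a Datalab notebook the callable could wrap the 'test2' module, for example:
#   run_query = lambda p: bq.Query(test2, events_table_num=p).sample()
# A stand-in is used here so the sketch does not depend on Datalab or BigQuery.
results = run_per_table(lambda prefix: "queried " + prefix, 3)
```

Each iteration issues a separate query, so the sort only has to handle one table's rows at a time.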
answered 2016-04-11T19:02:52.547