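I'm trying to use Google BigQuery on the GitHub Archive (http://www.githubarchive.org/) data to get the stats of each repository as of its most recent event, and I'd like to get this for the repositories with the most watchers. I realize this is a lot to ask, but I feel like I'm really close to getting it in a single query.
Here is the query I have now: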
SELECT repository_name, repository_owner, repository_organization, repository_size, repository_watchers as watchers, repository_forks as forks, repository_language, MAX(PARSE_UTC_USEC(created_at)) as time
FROM [githubarchive:github.timeline]
GROUP EACH BY repository_name, repository_owner, repository_organization, repository_size, watchers, forks, repository_language
ORDER BY watchers DESC, time DESC
LIMIT 1000
The only problem is that I get every event from the most-watched repository (Twitter Bootstrap):
Results:
Row repository_name repository_owner repository_organization repository_size watchers forks repository_language time
1 bootstrap twbs twbs 83875 61191 21602 JavaScript 1384991582000000
2 bootstrap twbs twbs 83875 61190 21602 JavaScript 1384991337000000
3 bootstrap twbs twbs 83875 61190 21603 JavaScript 1384989683000000
...
How can I make it return a single result per repository_name (the most recent one, i.e. MAX(time))?
I tried:
SELECT repository_name, repository_owner, repository_organization, repository_size, repository_watchers as watchers, repository_forks as forks, repository_language, MAX(PARSE_UTC_USEC(created_at)) as time
FROM [githubarchive:github.timeline]
WHERE PARSE_UTC_USEC(created_at) IN (SELECT MAX(PARSE_UTC_USEC(created_at)) FROM [githubarchive:github.timeline])
GROUP EACH BY repository_name, repository_owner, repository_organization, repository_size, watchers, forks, repository_language
ORDER BY watchers DESC, time DESC
LIMIT 1000
I'm not sure this would work anyway, but it doesn't matter, because I get this error message:
Error: Join attribute is not defined: PARSE_UTC_USEC
Any help would be great, thanks.
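
For reference, the shape being asked for is usually called "latest row per group". The first query returns many rows per repository because repository_size, watchers and forks are part of the grouping key, so every distinct combination of those values becomes its own group. The second attempt compares against the single global maximum timestamp, so even without the error it would keep only events that happened at that one instant. A minimal sketch of the usual fix, written in standard SQL and run through Python's sqlite3 on a toy table, is below. The table name `timeline`, the column `event_time`, and the `example-repo` rows are made up for illustration; the bootstrap rows are copied from the results above. This is not BigQuery legacy SQL. There, the same shape would presumably need JOIN EACH and a timestamp computed in a subquery beforehand, since the error suggests that join conditions cannot contain function calls. That part is an assumption and is not tested here.

```python
import sqlite3

# Toy stand-in for the timeline table. The bootstrap rows come from the
# results shown in the question; the example-repo rows are hypothetical.
rows = [
    ("bootstrap", "twbs", 61191, 21602, 1384991582000000),
    ("bootstrap", "twbs", 61190, 21602, 1384991337000000),
    ("bootstrap", "twbs", 61190, 21603, 1384989683000000),
    ("example-repo", "example-owner", 120, 7, 1384990000000000),
    ("example-repo", "example-owner", 119, 7, 1384980000000000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "repository_name TEXT, repository_owner TEXT, "
    "watchers INTEGER, forks INTEGER, event_time INTEGER)"
)
conn.executemany("INSERT INTO timeline VALUES (?, ?, ?, ?, ?)", rows)

# Step 1 (inner query): find the latest event time per repository,
# grouping only by the columns that identify a repository.
# Step 2 (join): pull the full row that matches that latest time.
query = """
SELECT t.repository_name, t.repository_owner,
       t.watchers, t.forks, t.event_time
FROM timeline AS t
JOIN (
    SELECT repository_name, repository_owner,
           MAX(event_time) AS event_time
    FROM timeline
    GROUP BY repository_name, repository_owner
) AS latest
  ON t.repository_name = latest.repository_name
 AND t.repository_owner = latest.repository_owner
 AND t.event_time = latest.event_time
ORDER BY t.watchers DESC
LIMIT 1000
"""

latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

Note that if a repository has two events with exactly the same latest timestamp, this join returns both rows, so an extra tie-breaker would be needed to guarantee one row per repository.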