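I'm trying to run a simple query that joins large datasets, but I'm getting various errors. Reproduced here is a similar query that uses a public dataset: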
SELECT
  gn1.actor_attributes.blog,
  gn1.actor_attributes.company,
  gn1.actor_attributes.email,
  gn1.actor_attributes.gravatar_id,
  gn1.actor_attributes.location,
  gn1.actor_attributes.login,
  gn1.actor_attributes.name,
  gn2.actor_attributes.blog,
  gn2.actor_attributes.company,
  gn2.actor_attributes.email,
  gn2.actor_attributes.gravatar_id,
  gn2.actor_attributes.location,
  gn2.actor_attributes.login,
  gn2.actor_attributes.name
FROM [publicdata:samples.github_nested] AS gn1
INNER JOIN (
  SELECT
    actor_attributes.blog,
    actor_attributes.company,
    actor_attributes.email,
    actor_attributes.gravatar_id,
    actor_attributes.location,
    actor_attributes.login,
    actor_attributes.name
  FROM [publicdata:samples.github_nested]
  GROUP BY
    actor_attributes.blog,
    actor_attributes.company,
    actor_attributes.email,
    actor_attributes.gravatar_id,
    actor_attributes.location,
    actor_attributes.login,
    actor_attributes.name
) AS gn2
ON gn1.payload.target.login = gn2.actor_attributes.login
WHERE gn1.type = 'FollowEvent'
Without "inner join each", BigQuery says the database is too large. When I run the query with "inner join each", BigQuery returns an error stating:
Cannot perform a partitioned join because gn2 is not parallelizable: (SELECT [actor_attributes.blog], [actor_attributes.company], [actor_attributes.email], [actor_attributes.gravatar_id], [actor_attributes.location], [actor_attributes.login], [actor_attributes.name] FROM [publicdata:samples.github_nested] GROUP BY [actor_attributes.blog], [actor_attributes.company], [actor_attributes.email], [actor_attributes.gravatar_id], [actor_attributes.location], [actor_attributes.login], [actor_attributes.name])
Any help would be greatly appreciated.
Thanks