I'm fairly new to the concept of nested data and am trying to understand the right way to flatten some GA data in BigQuery ( https://support.google.com/analytics/answer/3437719?hl=en ).
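To give some context: for each visitor session, I'm trying to capture the list of product SKUs viewed (detail views) and, if there was a transaction, the transaction ID. After some research, my understanding is that the simplest way to do this is something like the following, using LEFT JOINs to unnest everything: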
SELECT fullVisitorId AS uId, visitId AS vId, h.transaction.transactionId AS trId,
  STRING_AGG(p.productSKU, "|") AS skus
FROM `test-bigquery.12345678.ga_sessions_*` t
LEFT JOIN UNNEST(hits) h
LEFT JOIN UNNEST(h.product) p
WHERE _TABLE_SUFFIX = '20170709'
  AND h.eCommerceAction.action_type = '2'
GROUP BY uId, vId, trId
However, this seems to return zero rows where trId is not null...
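A likely explanation, sketched under assumptions. The WHERE clause is evaluated on every flattened (session, hit, product) row before GROUP BY runs. In GA Enhanced Ecommerce data, the transaction ID is typically carried on the purchase hit (action_type '6'), not on the detail-view hits (action_type '2'). If both hold, the `action_type = '2'` filter removes every row that has a transactionId. The Python sketch models this with hypothetical toy sessions. It is not BigQuery: the dict keys are simplified stand-ins for `hits.eCommerceAction.action_type`, `hits.transaction.transactionId` and `hits.product.productSKU`.

```python
# Toy model (hypothetical data, simplified field names) of:
#   FROM t LEFT JOIN UNNEST(hits) h LEFT JOIN UNNEST(h.product) p
#   WHERE h.eCommerceAction.action_type = '2'
#   GROUP BY uId, vId, trId
# Assumed codes: '2' = product detail view, '6' = completed purchase.
SESSIONS = [
    {"uId": "u1", "vId": 1, "hits": [
        {"action_type": "2", "transactionId": None, "product": ["SKU-A"]},
        {"action_type": "2", "transactionId": None, "product": ["SKU-B"]},
        {"action_type": "6", "transactionId": "T1", "product": ["SKU-A"]},
    ]},
    {"uId": "u2", "vId": 7, "hits": [
        {"action_type": "2", "transactionId": None, "product": ["SKU-C"]},
    ]},
]


def flatten(sessions):
    """Yield one (session, hit, product) row per array element.

    Like LEFT JOIN UNNEST, an empty array still yields one row with None.
    """
    for s in sessions:
        for h in s["hits"] or [None]:
            products = (h["product"] if h is not None else []) or [None]
            for p in products:
                yield s, h, p


def first_query(sessions):
    groups = {}
    for s, h, p in flatten(sessions):
        # WHERE is applied to each flattened row, so the purchase hit
        # (action_type '6'), the only one with a transactionId, is
        # discarded before any grouping happens.
        if h is None or h["action_type"] != "2":
            continue
        key = (s["uId"], s["vId"], h["transactionId"])
        skus = groups.setdefault(key, [])
        if p is not None:  # STRING_AGG skips NULL inputs
            skus.append(p)
    return {key: "|".join(skus) or None for key, skus in groups.items()}


print(first_query(SESSIONS))
# {('u1', 1, None): 'SKU-A|SKU-B', ('u2', 7, None): 'SKU-C'}
```

Grouping by trId does not rescue the transaction either. The detail-view rows that survive the filter all have a NULL transaction ID, so each session collapses into a single group whose trId is NULL.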
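I then tried splitting the above into two queries and joining them. This seems to work and returns a plausible number of rows (~1000) where trId is not null.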
WITH skus AS (
  SELECT fullVisitorId AS uId, visitId AS vId, STRING_AGG(p.productSKU, "|") AS skus
  FROM `test-bigquery.12345678.ga_sessions_*` t
  LEFT JOIN UNNEST(hits) h
  LEFT JOIN UNNEST(h.product) p
  WHERE _TABLE_SUFFIX = '20170709'
    AND h.eCommerceAction.action_type = '2'
  GROUP BY uId, vId
),
transactions AS (
  SELECT fullVisitorId AS uId_trans, visitId AS vId_trans, h.transaction.transactionId AS trId
  FROM `test-bigquery.12345678.ga_sessions_*` t
  LEFT JOIN UNNEST(hits) h
  WHERE _TABLE_SUFFIX = '20170709'
    AND h.transaction.transactionId IS NOT NULL
  GROUP BY uId_trans, vId_trans, trId
)
SELECT skus.uId, skus.vId, transactions.trId, skus.skus
FROM skus
LEFT JOIN transactions
  ON transactions.vId_trans = skus.vId AND transactions.uId_trans = skus.uId
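The two-query version most likely works because each CTE applies its own filter, so the purchase hits survive long enough for their transaction IDs to be read. A single-pass alternative is conditional aggregation: remove the action_type filter from WHERE and let each aggregate pick the hits it needs. The sketch compares both shapes on hypothetical, pre-flattened toy rows (one row per hit and product). It uses Python's built-in `sqlite3` module as a stand-in for BigQuery, with SQLite's `group_concat` playing the role of `STRING_AGG`. The table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical pre-flattened rows, standing in for the output of
# LEFT JOIN UNNEST(hits) h LEFT JOIN UNNEST(h.product) p.
# Assumed codes: '2' = product detail view, '6' = completed purchase.
ROWS = [
    ("u1", 1, "2", None, "SKU-A"),
    ("u1", 1, "2", None, "SKU-B"),
    ("u1", 1, "6", "T1", "SKU-A"),  # only the purchase hit has a transaction id
    ("u2", 7, "2", None, "SKU-C"),  # detail views, no purchase
    ("u3", 9, "6", "T2", "SKU-D"),  # purchase without a detail view
]

# Same shape as the two-CTE query: filter each CTE separately, then join.
CTE_JOIN = """
WITH skus AS (
  SELECT uid, vid, group_concat(product_sku, '|') AS skus
  FROM flat WHERE action_type = '2'
  GROUP BY uid, vid),
transactions AS (
  SELECT uid, vid, transaction_id AS trid
  FROM flat WHERE transaction_id IS NOT NULL
  GROUP BY uid, vid, transaction_id)
SELECT s.uid, s.vid, t.trid, s.skus
FROM skus s LEFT JOIN transactions t ON t.uid = s.uid AND t.vid = s.vid
ORDER BY s.uid, s.vid
"""

# One pass: no row-level WHERE; each aggregate selects the hits it needs.
CONDITIONAL_AGG = """
SELECT uid, vid,
       MAX(transaction_id) AS trid,
       group_concat(CASE WHEN action_type = '2' THEN product_sku END, '|') AS skus
FROM flat
GROUP BY uid, vid
HAVING COUNT(CASE WHEN action_type = '2' THEN 1 END) > 0  -- sessions with a detail view
ORDER BY uid, vid
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flat (uid TEXT, vid INTEGER, action_type TEXT, "
                "transaction_id TEXT, product_sku TEXT)")
    con.executemany("INSERT INTO flat VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


def normalize(rows):
    # group_concat order is not guaranteed, so compare SKU lists as sets.
    return [(u, v, t, frozenset(s.split("|"))) for u, v, t, s in rows]


print(normalize(run(CTE_JOIN)) == normalize(run(CONDITIONAL_AGG)))  # True
```

In BigQuery, the one-pass version would presumably use `MAX(h.transaction.transactionId) AS trId` and `STRING_AGG(IF(h.eCommerceAction.action_type = '2', p.productSKU, NULL), "|") AS skus`, grouped by uId and vId only. That translation is untested here. One behavioral difference: `MAX` keeps a single transaction ID per session, whereas the join returns one row per transaction when a session has several.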
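If someone could explain why the two queries don't give the same answer, and ideally set me up to handle all sorts of nested fun in the future, that would be great. Thanks!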