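This is my second post about looking at distributions in Firebase Analytics data (a follow-up to my first post). This time I want to build a user distribution table in BigQuery from Firebase session data. The output should be a single row with the number of users in each session-count bucket (0, 1, 2-5, 6-10, 11-30, 31+).
I managed to put together the following script, which counts distinct app_instance_id values: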
#standardSQL
SELECT
  COUNT(DISTINCT CASE WHEN sess_id = 0 THEN app_instance_id END) AS sess_count_0,
  COUNT(DISTINCT CASE WHEN sess_id = 1 THEN app_instance_id END) AS sess_count_1,
  COUNT(DISTINCT CASE WHEN sess_id > 1 AND sess_id <= 5 THEN app_instance_id END) AS sess_count_2BETWEEN5,
  COUNT(DISTINCT CASE WHEN sess_id > 5 AND sess_id <= 10 THEN app_instance_id END) AS sess_count_6BETWEEN10,
  COUNT(DISTINCT CASE WHEN sess_id > 10 AND sess_id <= 30 THEN app_instance_id END) AS sess_count_11BETWEEN30,
  COUNT(DISTINCT CASE WHEN sess_id > 30 THEN app_instance_id END) AS sess_count_PLUS31
FROM (
  -- Running total of session starts per app instance, used as the session number
  SELECT *, SUM(session_start) OVER (PARTITION BY app_instance_id ORDER BY min_time) AS sess_id
  FROM (
    -- A gap of more than 20 minutes (in microseconds) since the previous row marks a new session
    SELECT *, IF(previous IS NULL OR (min_time - previous) > (20*60*1000*1000), 1, 0) AS session_start
    FROM (
      SELECT *, LAG(max_time, 1) OVER (PARTITION BY app_instance_id ORDER BY max_time) AS previous
      FROM (
        SELECT
          user_dim.app_info.app_instance_id,
          user_dim.device_info.mobile_model_name,
          user_dim.device_info.platform_version,
          (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) AS min_time,
          (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) AS max_time
        FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`
        WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
      )
    )
  )
)
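To make the intermediate steps of the query above easier to follow, here is a minimal sketch that runs the same LAG / 20-minute-gap / running-sum logic on a few made-up rows instead of the Firebase tables. The `bundles` CTE and every value in it are hypothetical toy data, not taken from the demo project. The inner expressions mirror the ones in the query above.

```sql
#standardSQL
-- Hypothetical toy rows: one row per event bundle, times in microseconds.
WITH bundles AS (
  SELECT * FROM UNNEST([
    STRUCT('user_a' AS app_instance_id, 0 AS min_time, 60000000 AS max_time),
    ('user_a', 120000000, 180000000),    -- starts 1 minute after the previous bundle ended
    ('user_a', 3600000000, 3660000000),  -- starts 57 minutes after the previous bundle ended
    ('user_b', 0, 60000000)
  ])
)
SELECT
  app_instance_id,
  min_time,
  session_start,
  -- Running total of session starts, i.e. the session number reached at this row
  SUM(session_start) OVER (PARTITION BY app_instance_id ORDER BY min_time) AS sess_id
FROM (
  -- Flag a row as a session start when there is no previous row or the gap exceeds 20 minutes
  SELECT *, IF(previous IS NULL OR (min_time - previous) > (20*60*1000*1000), 1, 0) AS session_start
  FROM (
    -- End time of the previous bundle for the same app instance
    SELECT *, LAG(max_time, 1) OVER (PARTITION BY app_instance_id ORDER BY max_time) AS previous
    FROM bundles
  )
)
ORDER BY app_instance_id, min_time
```

With these toy rows, the three user_a bundles get sess_id 1, 1 and 2, and the single user_b bundle gets sess_id 1, so each row carries the running session number reached at that point.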
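Questions:
Since I am counting users (not sessions), I want to be 100% sure: should I still be relying on the app instance (rather than the session ID)?
Any ideas on optimizing this query? Is there a more efficient way to aggregate all the distribution ranges in a single query?
Lastly, I wanted to compare the overall total I get from the query above with the number of distinct users who triggered a session_start event in the same period. I expected the two to roughly align, but they don't. Why is there such a big difference: 7688 vs 16310 (488+7343+4967+1956+1165+391)? Where is my logic going wrong? This is the query I used for the comparison:
#standardSQL
SELECT COUNT(DISTINCT user_dim.app_info.app_instance_id) AS users
FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`,
  UNNEST(event_dim) AS event
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
  AND event.name = "session_start"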