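I am trying to recreate a GA funnel in BigQuery. This open funnel should exclude sessions that viewed certain pages. I tried AND NOT REGEXP_MATCH and NOT IN, but it still does not work as I expect: I am still getting sessions that viewed the pages I want to exclude.

If possible, I would also like to make it a closed funnel; this code returns an open funnel.

Also, is there a better way to write this query in standard SQL?

Any help with these points would be appreciated. Thanks.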

SELECT COUNT(s0.firstHit) AS _test_your_details,
SUM(s0.exit) AS _test_your_details_exits,
COUNT(s1.firstHit) AS _test_additional_new_details,
SUM(s1.exit) AS _test_additional_new_details_exits,
COUNT(s2.firstHit) AS _test_new_dress,
SUM(s2.exit) AS _test_new_dress_exits,
COUNT(s3.firstHit) AS _test_test_details,
SUM(s3.exit) AS _test_test_details_exits,
COUNT(s4.firstHit) AS _test_cover_for_the_test,
SUM(s4.exit) AS _test_cover_for_the_test_exits,
COUNT(s5.firstHit) AS _test_your_order,
SUM(s5.exit) AS _test_your_order_exits
FROM
  (SELECT s0.fullVisitorId,
          s0.visitId,
          s0.firstHit,
          s0.exit,
          s1.firstHit,
          s1.exit,
          s2.firstHit,
          s2.exit,
          s3.firstHit,
          s3.exit,
          s4.firstHit,
          s4.exit,
          s5.firstHit,
          s5.exit
   FROM
     (SELECT s0.fullVisitorId,
             s0.visitId,
             s0.firstHit,
             s0.exit,
             s1.firstHit,
             s1.exit,
             s2.firstHit,
             s2.exit,
             s3.firstHit,
             s3.exit,
             s4.firstHit,
             s4.exit
      FROM
        (SELECT s0.fullVisitorId,
                s0.visitId,
                s0.firstHit,
                s0.exit,
                s1.firstHit,
                s1.exit,
                s2.firstHit,
                s2.exit,
                s3.firstHit,
                s3.exit
         FROM
           (SELECT s0.fullVisitorId,
                   s0.visitId,
                   s0.firstHit,
                   s0.exit,
                   s1.firstHit,
                   s1.exit,
                   s2.firstHit,
                   s2.exit
            FROM
              (SELECT s0.fullVisitorId,
                      s0.visitId,
                      s0.firstHit,
                      s0.exit,
                      s1.firstHit,
                      s1.exit
               FROM
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                    AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress', '/test - cover dress')
                    AND NOT REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
                  GROUP BY fullVisitorId,
                           visitId) s0
               FULL OUTER JOIN
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - additional new details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                  GROUP BY fullVisitorId,
                           visitId) s1 ON s0.fullVisitorId = s1.fullVisitorId
               AND s0.visitId = s1.visitId) s01
            FULL OUTER JOIN
              (SELECT fullVisitorId,
                      visitId,
                      MIN(hits.hitNumber) AS firstHit,
                      MAX(IF(hits.isExit, 1, 0)) AS exit
               FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
               WHERE REGEXP_MATCH(hits.page.pagePath, '/test - new dress')
                 AND totals.visits = 1
                 AND channelGrouping NOT LIKE '%organic%'
               GROUP BY fullVisitorId,
                        visitId) s2 ON s0.fullVisitorId = s2.fullVisitorId
            AND s0.visitId = s2.visitId) s012
         FULL OUTER JOIN
           (SELECT fullVisitorId,
                   visitId,
                   MIN(hits.hitNumber) AS firstHit,
                   MAX(IF(hits.isExit, 1, 0)) AS exit
            FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
            WHERE REGEXP_MATCH(hits.page.pagePath, '/test - test details')
              AND totals.visits = 1
              AND channelGrouping NOT LIKE '%organic%'
            GROUP BY fullVisitorId,
                     visitId) s3 ON s0.fullVisitorId = s3.fullVisitorId
         AND s0.visitId = s3.visitId) s0123
      FULL OUTER JOIN
        (SELECT fullVisitorId,
                visitId,
                MIN(hits.hitNumber) AS firstHit,
                MAX(IF(hits.isExit, 1, 0)) AS exit
         FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
         WHERE REGEXP_MATCH(hits.page.pagePath, '/test - cover for the test')
           AND totals.visits = 1
           AND channelGrouping NOT LIKE '%organic%'
           AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress')
         GROUP BY fullVisitorId,
                  visitId) s4 ON s0.fullVisitorId = s4.fullVisitorId
      AND s0.visitId = s4.visitId) s01234
   FULL OUTER JOIN
     (SELECT fullVisitorId,
             visitId,
             MIN(hits.hitNumber) AS firstHit,
             MAX(IF(hits.isExit, 1, 0)) AS exit
      FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
      WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your order')
        AND totals.visits = 1
        AND channelGrouping NOT LIKE '%organic%'
        AND hits.page.pagePath NOT IN ('/test - additional test details', '/test - test dress')
        AND NOT REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
      GROUP BY fullVisitorId,
               visitId) s5 ON s0.fullVisitorId = s5.fullVisitorId
   AND s0.visitId = s5.visitId) s012345
1 Answer

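Your NOT IN and NOT REGEXP_MATCH conditions are evaluated per hit, so they only drop the matching hits, not the sessions that contain them. In standard SQL, you can write a simple subquery over the session's hits array to check for the page. For example: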

SELECT 
  fullvisitorid, visitstarttime,
  ARRAY(
    SELECT AS STRUCT hitNumber, type, page FROM t.hits ORDER BY hitNumber
  ) hits
FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` t
WHERE 
  -- exclude sessions with pages containing '/asearch.html'
  -- subquery checks for occurrences across all hits of the session and returns boolean TRUE if found
  -- NOT turns it into FALSE which filters it out
  NOT (SELECT COUNT(1)>0 FROM t.hits WHERE page.pagePath = '/asearch.html')
ORDER BY array_length(hits) DESC
LIMIT 1000

I also wrote a subquery (the ARRAY in the SELECT list) to show each session's hits as an array. In legacy SQL, you would use OMIT RECORD IF:

SELECT 
  fullvisitorid, visitstarttime, hits.page.pagePath
FROM
    [bigquery-public-data:google_analytics_sample.ga_sessions_20161104] t
-- OMIT RECORD IF excludes on record level 
-- if dimension is below record level, you need to aggregate (like with WITHIN)
-- in this case I used MAX() to surface any possible TRUE resulting from the comparison
OMIT RECORD IF MAX(hits.page.pagePath = '/asearch.html')
LIMIT 1000
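
To tie this back to the funnel in the question, here is a minimal standard SQL sketch of how session-level exclusion and a closed funnel could be combined in one pass, without the chain of FULL OUTER JOINs. The step pages ('/basket.html', '/yourinfo.html', '/payment.html') and the excluded pages are placeholders I picked for illustration, not the pages from the question. I am also assuming that "closed" means a step only counts when its first view comes after the first view of the previous step.

```sql
-- Sketch only. The three step paths and the excluded paths are placeholders
-- (assumptions), not the pages from the question.
WITH sessions AS (
  SELECT
    t.fullVisitorId,
    t.visitId,
    -- Hit number of the first view of each step page; NULL if never viewed.
    (SELECT MIN(h.hitNumber) FROM UNNEST(t.hits) AS h
     WHERE h.page.pagePath = '/basket.html') AS step1,
    (SELECT MIN(h.hitNumber) FROM UNNEST(t.hits) AS h
     WHERE h.page.pagePath = '/yourinfo.html') AS step2,
    (SELECT MIN(h.hitNumber) FROM UNNEST(t.hits) AS h
     WHERE h.page.pagePath = '/payment.html') AS step3
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` AS t
  WHERE t.totals.visits = 1
    -- Session-level exclusion: drop the session if ANY of its hits matches.
    AND NOT EXISTS (
      SELECT 1 FROM UNNEST(t.hits) AS h
      WHERE REGEXP_CONTAINS(h.page.pagePath, r'^/(asearch\.html|signin\.html)')
    )
)
SELECT
  COUNT(step1) AS step1_sessions,
  -- Closed funnel: a step counts only if every earlier step was seen first.
  COUNTIF(step2 > step1) AS step2_sessions,
  COUNTIF(step2 > step1 AND step3 > step2) AS step3_sessions
FROM sessions
```

Replacing the COUNTIF conditions with plain COUNT(step2) and COUNT(step3) should give the open-funnel numbers instead. For a date range like the TABLE_DATE_RANGE in the question, the standard SQL counterpart is a wildcard table (ga_sessions_*) filtered on _TABLE_SUFFIX.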

Hope that helps!

answered 2019-03-20T13:57:54.830