Here is my query:

    WITH desc_table(counter, hourly, current_weather_description, current_icons, time_stamp) AS (
Select count(*) AS counter, CASE WHEN  strftime('%M',  'now') < '30' 
                THEN strftime('%H', 'now')  
                ELSE strftime('%H', time_stamp, '+1 hours') END as hourly, 
                current_weather_description,
                current_icons,
                time_stamp
                From weather_events
                GROUP BY strftime('%H',  time_stamp, '+30 minutes'), current_weather_description
                UNION ALL
                Select count(*) as counter, hourly - 1, current_weather_description, current_icons, time_stamp
                From weather_events
                GROUP BY strftime('%H',  time_stamp, '+30 minutes'), current_weather_description
                Order By counter desc limit 1
                ),
        avg_temp_table(avg_temp, hour_seg, time_stamp) AS (
        select avg(current_temperatures) as avg_temp, CASE WHEN  strftime('%M',  time_stamp) < '30' 
                THEN strftime('%H', time_stamp)  
                ELSE strftime('%H', time_stamp, '+1 hours') END as hour_seg, 
                time_stamp
                from weather_events
                group by strftime('%H',  time_stamp, '+30 minutes')
                order by hour_seg desc
                )

                Select  hourly, current_weather_description
                from desc_table
                join avg_temp_table
                on desc_table.hourly=avg_temp_table.hour_seg

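Basically, I have some weather data that I group into hourly intervals (offset by 30 minutes). I want to count how many times each weather description (and its matching icon) occurs within each interval, and pick the description with the highest count for that interval (desc_table). Then I want the average temperature for that interval (avg_temp_table) (maybe I need a subquery for this instead of what I have?) and to join the two queries on their hour columns.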

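I want my anchor to be based on the time of the query (now) and to count the occurrences; each recursive member would then subtract one hour, move to the next interval, count again, and so on.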

Sample data; the regular dataset {current_temperatures, current_weather_description, current_icons, time_stamp} will have many more rows per interval:

"87"    "Rain"  "rainicon"  "2016-01-20 02:15:08"
"65"    "Snow"  "snowicon"  "2016-01-20 02:39:08"
"49"    "Rain"  "rainicon"  "2016-01-20 03:15:08"
"49"    "Rain"  "rainicon"  "2016-01-20 03:39:08"
"46"    "Clear" "clearicon" "2016-01-20 04:15:29"
"46"    "Clear" "clearicon" "2016-01-20 04:38:53"
"46"    "Cloudy" "cloudyicon" "2016-01-20 05:15:08"
"46"    "Clear" "clearicon" "2016-01-20 05:39:08"
"45"    "Clear" "clearicon" "2016-01-20 06:14:17"
"45"    "Clear" "clearicon" "2016-01-20 06:34:23"
"45"    "Clear" "clearicon" "2016-01-20 07:24:54"
"45"    "Rain"  "rainicon"  "2016-01-20 07:44:41"
"43"    "Rain"  "rainicon"  "2016-01-20 08:19:08"
"36"    "Clear" "clearicon" "2016-01-20 08:39:08"
"35"    "Meatballs" "meatballsicon" "2016-01-20 09:18:08"
"18"    "Cloudy" "cloudyicon" "2016-01-20 09:39:08"

The output is a join between the average temperature for each interval (avg_temp_table) and the output of the first aggregating CTE (desc_table), {avg_temp, weather_description, current_icon}:

"87"    "Rain"  "rainicon"
"57"    "Rain"  "rainicon"
"47"    "Clear" "clearicon"
"46"    "Clear" "clearicon"
"46"    "Cloudy" "cloudyicon"
"45"    "Clear" "clearicon"
"44"    "Rain"  "rainicon"
"36"    "Clear" "clearicon"
"18"    "Cloudy" "cloudyicon"

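Right now I get a "no such column" error, because my anchor selects from my weather_events table and so does my recursive member. When I change the recursive member to select from desc_table, I get a "recursive aggregate queries not supported" error. But I don't want my recursive member to select from desc_table; I want to segment by hour, then step through each hourly interval and get the counts. I guess I also got the anchor wrong to begin with.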

1 Answer

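I am still not sure how your desc_table recursive CTE is supposed to pick the most frequent weather description and its icon for each hour, but that is fine, because going by your verbal description I think I have found a way to do it without recursion.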

First, group the rows by hour and description, and count the rows in each group:

SELECT
  strftime('%H', time_stamp, '+30 minutes') AS hour,
  current_weather_description,
  current_icons,
  COUNT(*) AS event_count
FROM
  weather_events
GROUP BY
  strftime('%H', time_stamp, '+30 minutes'),
  current_weather_description

Next, group the results of the above query by hour and get the maximum event count per hour:

SELECT
  hour,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour

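That is still not quite what you want, because you actually want the description and icon that match the maximum count, not the count itself. Well, that is easy to fix: just add those columns to the SELECT without adding them to the GROUP BY: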

SELECT
  hour,
  current_weather_description,
  current_icons,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour

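You still need to keep MAX(event_count) in the query for the trick to work. It works because in SQLite, when a SELECT statement contains a single MAX or a single MIN call, the values of any selected columns that are neither in the GROUP BY nor aggregated are taken from the row that matches that MAX or MIN value. This non-standard extension to SQL is documented in the release notes for SQLite 3.7.11.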
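The bare-column behaviour described above can be checked with a small script. This is only a minimal sketch using Python's built-in sqlite3 module; the counts table and its rows are made up for illustration and stand in for the output of the nested SELECT. It assumes the SQLite library bundled with Python is 3.7.11 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-aggregated rows: (hour, description, event_count).
conn.execute(
    "CREATE TABLE counts (hour TEXT, description TEXT, event_count INTEGER)"
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [("03", "Rain", 2), ("03", "Snow", 1), ("05", "Cloudy", 1), ("05", "Clear", 3)],
)

# description is neither grouped nor aggregated, so SQLite takes it
# from the row that holds MAX(event_count) within each hour.
top_per_hour = conn.execute(
    """
    SELECT hour, description, MAX(event_count)
    FROM counts
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

print(top_per_hour)
```

If two descriptions tie for the maximum count within an hour, SQLite may take the bare column from either of the tied rows, so the pick is only well defined when there is a single winner.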

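So much for desc_table. As for the avg_temp_table CTE, there seems to be nothing wrong with your current approach, except that, for consistency, I would use the GROUP BY expression as the hour definition rather than the CASE expression you are using, and time_stamp seems redundant in the results. So the slightly modified CTE would look like this: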

SELECT
  strftime('%H', time_stamp, '+30 minutes') AS hour,
  AVG(current_temperatures) AS avg_temp
FROM
  weather_events
GROUP BY
  strftime('%H', time_stamp, '+30 minutes')

And now you just need to join the two sets on the hour column and select the relevant columns for the final output:

SELECT
  t.avg_temp,
  d.current_weather_description,
  d.current_icons
FROM
  avg_temp_table AS t
  INNER JOIN desc_table AS d on t.hour = d.hour
ORDER BY
  t.hour

So there you are. Now I would just like to address one question regarding the resulting query, namely:

Can the join be avoided?

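While your approach (getting the descriptions and the average temperatures separately, then joining the two sets) is simple and makes perfect sense, it would be nice to avoid the join and do all the calculations in one go. That would likely make the query faster, because the source would be scanned only once. Can this be done?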

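As it happens, yes, it can. The main difficulty in combining the two parts is that the description is obtained in two steps, while the average temperature is a single-step calculation. Simply putting AVG(current_temperatures) into the nested SELECT of the first CTE (grouped by hour and description) and then applying AVG to the results in the outer SELECT (grouped by hour) is not mathematically equivalent to a single AVG over the entire hour group.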

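Instead, you need to remember that AVG = SUM / COUNT. If you get the SUM and the COUNT in the first step, and then the SUM of the SUMs and the SUM of the COUNTs in the second step, you just divide the first outer SUM by the second outer SUM to get the average.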
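A quick numeric check of that argument, as a sketch only: the readings table and its three temperatures are invented for the example, and the multiplication by 1.0 is added here solely to force real division so that integer truncation does not blur the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical readings that all fall within one hour: two Rain, one Snow.
conn.execute("CREATE TABLE readings (description TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Rain", 40), ("Rain", 50), ("Snow", 60)],
)

# Two-step aggregation: per-description groups first, then the whole hour.
avg_of_avgs, sum_over_count = conn.execute(
    """
    SELECT
      AVG(group_avg),
      1.0 * SUM(total_temp) / SUM(event_count)
    FROM (
      SELECT
        AVG(temp)  AS group_avg,
        SUM(temp)  AS total_temp,
        COUNT(*)   AS event_count
      FROM readings
      GROUP BY description
    )
    """
).fetchone()

# Single-step average over all rows, for reference.
(direct_avg,) = conn.execute("SELECT AVG(temp) FROM readings").fetchone()

print(avg_of_avgs, sum_over_count, direct_avg)
```

The average of the group averages weights the single Snow row as heavily as the two Rain rows together, which is why it drifts away from the true average, while the sum of sums divided by the sum of counts does not.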

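Here is the new desc_table CTE, modified to combine both parts of the query (so it is no longer supposed to be a CTE, but the complete query). The necessary changes are the SUM(current_temperatures) AS total_temp column in the nested SELECT and the avg_temp expression in the outer SELECT: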

SELECT
  SUM(total_temp) / SUM(event_count) AS avg_temp,
  current_weather_description,
  current_icons,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count,
      SUM(current_temperatures) AS total_temp
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour
ORDER BY
  hour
;
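To try the combined query, here is a minimal sketch that runs it through Python's sqlite3 module against the first eight sample rows from the question. The column types in the CREATE TABLE statement are an assumption (the question does not show the schema), and hour is added to the select list only so the result is easier to inspect.

```python
import sqlite3

# First eight sample rows from the question:
# (current_temperatures, current_weather_description, current_icons, time_stamp)
SAMPLE = [
    (87, "Rain", "rainicon", "2016-01-20 02:15:08"),
    (65, "Snow", "snowicon", "2016-01-20 02:39:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:15:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:39:08"),
    (46, "Clear", "clearicon", "2016-01-20 04:15:29"),
    (46, "Clear", "clearicon", "2016-01-20 04:38:53"),
    (46, "Cloudy", "cloudyicon", "2016-01-20 05:15:08"),
    (46, "Clear", "clearicon", "2016-01-20 05:39:08"),
]

conn = sqlite3.connect(":memory:")
# Assumed schema: integer temperatures, text for everything else.
conn.execute(
    """
    CREATE TABLE weather_events (
      current_temperatures INTEGER,
      current_weather_description TEXT,
      current_icons TEXT,
      time_stamp TEXT
    )
    """
)
conn.executemany("INSERT INTO weather_events VALUES (?, ?, ?, ?)", SAMPLE)

rows = conn.execute(
    """
    SELECT
      hour,
      SUM(total_temp) / SUM(event_count) AS avg_temp,
      current_weather_description,
      current_icons,
      MAX(event_count) AS max_event_count
    FROM (
      SELECT
        strftime('%H', time_stamp, '+30 minutes') AS hour,
        current_weather_description,
        current_icons,
        COUNT(*) AS event_count,
        SUM(current_temperatures) AS total_temp
      FROM weather_events
      GROUP BY strftime('%H', time_stamp, '+30 minutes'),
               current_weather_description
    ) AS s
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

# Keep only the columns that do not depend on how ties are broken.
hour_avgs = [(hour, avg_temp) for hour, avg_temp, *_ in rows]
print(hour_avgs)
```

With so few rows, most hours have two descriptions tied at one occurrence each, so the description and icon returned for those hours may come from either tied row. The hours and average temperatures are unaffected by ties.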

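Obviously, the max_event_count column is redundant in the output, yet it is still essential to the "greatest N per group" method the query relies on. Personally, I would not worry about one redundant column in this situation, but if you have a good reason to exclude it from the result set, you can use the above query as a derived table (yes, again) and have the outermost SELECT pull all the columns except max_event_count, for instance like this: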

SELECT
  avg_temp,
  current_weather_description,
  current_icons
FROM
  (
    SELECT
      hour,
      SUM(total_temp) / SUM(event_count) AS avg_temp,
      current_weather_description,
      current_icons,
      MAX(event_count) AS max_event_count
    FROM
      (
        SELECT
          strftime('%H', time_stamp, '+30 minutes') AS hour,
          current_weather_description,
          current_icons,
          COUNT(*) AS event_count,
          SUM(current_temperatures) AS total_temp
        FROM
          weather_events
        GROUP BY
          strftime('%H', time_stamp, '+30 minutes'),
          current_weather_description
      ) AS s
    GROUP BY
      hour
  ) AS s
ORDER BY
  hour desc
;

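As you can see, the middle-level SELECT now also includes hour, which is needed for the outermost ORDER BY. (I am assuming here that the order matters to the calling application.)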

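I just need to mention one difference between the results of the two methods. In the first one, AVG(current_temperatures) gives you a floating-point result. In the second one, SUM(total_temp) / SUM(event_count) gives you an integer. Since your expected results show integer averages, I guess that should not be a problem. However, if you later decide that you want the averages calculated more precisely, keep in mind that you can replace the SUM function with the TOTAL function in SUM(total_temp) or SUM(current_temperatures). The TOTAL function returns the same value as SUM, but the result is always a real. Dividing a real by an integer in SQLite gives a real, so with TOTAL you will get the same results as with AVG in the first method.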
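The difference between the two divisions can be seen in a short sketch. The temps table and its two values are invented for the example, and the column is declared INTEGER on the assumption that the temperatures are stored as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical integer temperatures whose sum is odd, so truncation shows.
conn.execute("CREATE TABLE temps (t INTEGER)")
conn.executemany("INSERT INTO temps VALUES (?)", [(87,), (66,)])

int_avg, real_avg, builtin_avg = conn.execute(
    """
    SELECT
      SUM(t) / COUNT(*),    -- integer / integer: truncated
      TOTAL(t) / COUNT(*),  -- real / integer: keeps the fraction
      AVG(t)                -- always a real for non-NULL input
    FROM temps
    """
).fetchone()

print(int_avg, real_avg, builtin_avg)
```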

answered 2016-01-22T13:30:43.397