mysql - SQL Help - Select the rows with the most related rows

Question

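I have three database tables containing transit information: routes, trips, and stoptimes. They are related through foreign keys as follows:

         routes -> ROUTE_ID -> trips -> TRIP_ID -> stoptimes

That is, there are a number of routes, each route has many trips, and each trip has even more stop times.

For each route in the table, I want to select the trip with the most stop times.

In addition, each route also has an enumerated (INT) direction_id, and for each route I want to select the trip with the most stop times in each direction.

This is all for some data preprocessing. The idea is that the selected trips will have a flag set on them so they can easily be retrieved later.

Is it possible to do this in SQL?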

Edit:

More information, as requested. Here is a sample SELECT query and its result table:

select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id;

Result:

[Image: sample SQL result table]

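In the example above, I want an SQL statement that selects trips 2 and 10, because each has the (equal) largest NumStops for its direction. Better still, instead of SELECTing, the SQL statement could UPDATE the isPrototypical column to TRUE for those specific rows.

Keep in mind: in the production database there will be multiple route_ids and any number of direction_ids across the trips. The statement needs to work for every direction of every route.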
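To make the data model concrete, here is a minimal sketch that runs the sample query above against a tiny in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and all rows are invented (they do not reproduce the result table in the question): five trips over two routes, directions 1 and 2, with trips 3 and 4 tied on stop count. The stop_sequence column is a hypothetical filler, and an order by clause was added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (route_id INTEGER PRIMARY KEY);
CREATE TABLE trips (
    trip_id INTEGER PRIMARY KEY,
    route_id INTEGER REFERENCES routes(route_id),
    direction_id INTEGER,
    isPrototypical INTEGER DEFAULT 0
);
CREATE TABLE stoptimes (
    trip_id INTEGER REFERENCES trips(trip_id),
    stop_sequence INTEGER  -- hypothetical filler column
);
INSERT INTO routes VALUES (1), (2);
INSERT INTO trips (trip_id, route_id, direction_id) VALUES
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 2, 1);
""")

# Stop-time rows per trip (invented); trips 3 and 4 tie in route 1, direction 2.
for trip_id, n_stops in {1: 3, 2: 5, 3: 4, 4: 4, 5: 2}.items():
    conn.executemany(
        "INSERT INTO stoptimes VALUES (?, ?)",
        [(trip_id, seq) for seq in range(n_stops)],
    )

# The sample query from the question, plus an ORDER BY for stable output.
rows = conn.execute("""
    select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
    from trips t join
         (select st.trip_id, count(*) as NumStops
          from stoptimes st
          group by st.trip_id
         ) st
         on st.trip_id = t.trip_id
    order by t.trip_id
""").fetchall()

for row in rows:
    print(row)  # (route_id, direction_id, trip_id, NumStops, isPrototypical)
```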

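Final answer

Gordon Linoff below provided a correct, well-performing solution, and I thought I would also post a modified version of his code, which I used to solve the problem.

Here is the SQL that selects and updates the trip with the most stops for each route and direction, picking only one trip when there is a tie: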

update trips t join
     (select substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1) as prototripid
      from trips t join
           (select st.trip_id, count(*) as NumStops
            from stoptimes st
            group by st.trip_id
           ) st
           on st.trip_id = t.trip_id
      group by t.route_id, t.direction_id
     ) t2 on t2.prototripid = t.trip_id
set isPrototypical = 1;

I believe this may be MySQL-specific.
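For comparison, the same "one trip per route and direction" selection can be written with the ROW_NUMBER() window function instead of the group_concat() trick. This is a hedged sketch, not the poster's code: it runs on SQLite (3.25 or later, for window functions) from Python with invented data, and it breaks ties explicitly by the lowest trip_id, whereas the group_concat() version does not guarantee which tied trip comes first. MySQL 8.0 also supports ROW_NUMBER(), but MySQL may reject an UPDATE whose subquery reads the table being updated (error 1093), so there the UPDATE ... JOIN shape above is the usual form.

```python
import sqlite3

# Invented toy data in an in-memory SQLite database (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, route_id INTEGER,
                    direction_id INTEGER, isPrototypical INTEGER DEFAULT 0);
CREATE TABLE stoptimes (trip_id INTEGER, stop_sequence INTEGER);
INSERT INTO trips (trip_id, route_id, direction_id) VALUES
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 2, 1);
""")
# Trips 3 and 4 tie (4 stops each) in route 1, direction 2.
for trip_id, n_stops in {1: 3, 2: 5, 3: 4, 4: 4, 5: 2}.items():
    conn.executemany("INSERT INTO stoptimes VALUES (?, ?)",
                     [(trip_id, seq) for seq in range(n_stops)])

# Rank trips within each (route_id, direction_id) by stop count, then flag rank 1.
conn.execute("""
UPDATE trips SET isPrototypical = 1
WHERE trip_id IN (
    SELECT trip_id FROM (
        SELECT t.trip_id,
               ROW_NUMBER() OVER (
                   PARTITION BY t.route_id, t.direction_id
                   ORDER BY st.NumStops DESC, t.trip_id  -- lowest trip_id wins a tie
               ) AS rn
        FROM trips t
        JOIN (SELECT trip_id, COUNT(*) AS NumStops
              FROM stoptimes GROUP BY trip_id) st
          ON st.trip_id = t.trip_id
    ) ranked
    WHERE rn = 1
)
""")

flagged = [r[0] for r in conn.execute(
    "SELECT trip_id FROM trips WHERE isPrototypical = 1 ORDER BY trip_id")]
print(flagged)
```

Exactly one trip is flagged per route and direction, including the tied pair, where trip 3 is chosen over trip 4.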

score 1 · Accepted Answer

You can do this with a MySQL trick that uses group concatenation.

Here is the query:

select t.route_id,
       substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1),
       max(NumStops) as Length
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id
group by t.route_id;

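(You don't need the routes table unless you need the route names.)

The subquery counts the stops on each trip. The results are then aggregated by route_id.

Normally, group_concat() is used to put all the trips into a comma-separated string. It does that here too, with the twist that the trips are ordered by number of stops, most stops first. substring_index() then takes the first value.

This converts trip_id to a string. You may want to convert it back to whatever data type it started as.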
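The group_concat()/substring_index() combination can be mimicked in plain Python to show what happens inside each group. This only illustrates the mechanism with made-up (route_id, trip_id, NumStops) rows; it is not how MySQL implements either function. Each route's trips are sorted by stop count, their ids are joined with commas, and the text before the first comma is kept.

```python
from itertools import groupby

# Made-up (route_id, trip_id, NumStops) rows, like the joined subquery output.
rows = [(1, 1, 3), (1, 2, 5), (1, 3, 4), (2, 5, 2), (2, 6, 7)]

def first_by_stops(rows):
    result = {}
    for route_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        # group_concat(trip_id ORDER BY NumStops DESC): most stops first
        ordered = sorted(group, key=lambda r: r[2], reverse=True)
        concatenated = ",".join(str(trip_id) for _, trip_id, _ in ordered)
        # substring_index(concatenated, ',', 1): text before the first comma
        result[route_id] = concatenated.split(",", 1)[0]
    return result

print(first_by_stops(rows))
```

For route 1 the concatenated string is "2,3,1", so "2" is kept. The picked ids come back as strings ("2", "6"), which is the type-conversion effect described for trip_id.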

Here is the best trip for each direction:

select t.route_id, t.direction_id,
       substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1),
       max(NumStops) as Length
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id
group by t.route_id, t.direction_id;

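Because direction is stored at the trip level, it does not interfere with counting the stops on a trip (that is, it does not appear to be needed in the subquery).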

score 0 · Answer

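If you join all the tables together correctly, you will get one row per stop time, so a COUNT(*) will give you the total number of stops.

As for counting by direction, I'm assuming the direction values are 1, 2, 3, .... I don't know which table direction_id is in, so I've left it without a table alias in the query: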

SELECT routes.Route_ID,
   COUNT(*) AS TotalStops,
   COUNT(CASE WHEN direction_id = 1 THEN 1 END) AS Direction1Stops,
   COUNT(CASE WHEN direction_id = 2 THEN 1 END) AS Direction2Stops,
   COUNT(CASE WHEN direction_id = 3 THEN 1 END) AS Direction3Stops
   -- ... add one COUNT(CASE ...) column per remaining direction_id value
FROM routes
INNER JOIN trips ON routes.Route_ID = trips.Route_ID
INNER JOIN stoptimes ON trips.Trip_ID = stoptimes.Trip_ID
GROUP BY routes.Route_ID;
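A minimal runnable sketch of this conditional-count query, using SQLite from Python as a stand-in for MySQL. The data is invented and only uses direction_id values 1 and 2, so only two conditional columns are needed. Note that the query reports stop totals per route and direction; on its own it does not identify which trip has the most stops.

```python
import sqlite3

# Invented toy data in an in-memory SQLite database (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (Route_ID INTEGER PRIMARY KEY);
CREATE TABLE trips (Trip_ID INTEGER PRIMARY KEY, Route_ID INTEGER,
                    direction_id INTEGER);
CREATE TABLE stoptimes (Trip_ID INTEGER, stop_sequence INTEGER);
INSERT INTO routes VALUES (1), (2);
INSERT INTO trips VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 2, 1);
""")
for trip_id, n_stops in {1: 3, 2: 5, 3: 4, 4: 4, 5: 2}.items():
    conn.executemany("INSERT INTO stoptimes VALUES (?, ?)",
                     [(trip_id, seq) for seq in range(n_stops)])

# One joined row per stop time; CASE yields NULL for other directions,
# and COUNT ignores NULLs, so each column counts only its own direction.
rows = conn.execute("""
SELECT routes.Route_ID,
       COUNT(*) AS TotalStops,
       COUNT(CASE WHEN direction_id = 1 THEN 1 END) AS Direction1Stops,
       COUNT(CASE WHEN direction_id = 2 THEN 1 END) AS Direction2Stops
FROM routes
INNER JOIN trips ON routes.Route_ID = trips.Route_ID
INNER JOIN stoptimes ON trips.Trip_ID = stoptimes.Trip_ID
GROUP BY routes.Route_ID
ORDER BY routes.Route_ID
""").fetchall()
print(rows)
```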

score 0 · Answer

While I'm sure there is a more elegant way to do this, the concept is to use MAX and GROUP BY. If MySQL supported Common Table Expressions, this wouldn't look so bad:

update trips t
  join (
    select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
    from trips t join
         (select st.trip_id, count(*) as NumStops
          from stoptimes st
          group by st.trip_id
         ) st
         on st.trip_id = t.trip_id
    ) t2 on t.trip_id = t2.trip_id
  join (
    select max(numstops) maxnumstops, route_id, direction_id
    from (
      select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
      from trips t join
         (select st.trip_id, count(*) as NumStops
          from stoptimes st
          group by st.trip_id
         ) st
         on st.trip_id = t.trip_id
      ) t
    group by route_id, direction_id
    ) t3 on t2.numstops = t3.maxnumstops and t2.route_id = t3.route_id and t2.direction_id = t3.direction_id
set t.isPrototypical = 1;

SQL Fiddle Demo
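MySQL 8.0 and later do support CTEs, so here is a hedged sketch of how the same MAX/GROUP BY idea reads when written with them. It runs on SQLite from Python with invented data, and it uses UPDATE ... WHERE trip_id IN (...) in place of MySQL's multi-table UPDATE ... JOIN. Unlike the group_concat() approach, this flags every trip that ties for the maximum, so trips 3 and 4 are both marked here.

```python
import sqlite3

# Invented toy data in an in-memory SQLite database (stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, route_id INTEGER,
                    direction_id INTEGER, isPrototypical INTEGER DEFAULT 0);
CREATE TABLE stoptimes (trip_id INTEGER, stop_sequence INTEGER);
INSERT INTO trips (trip_id, route_id, direction_id) VALUES
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 2, 1);
""")
# Trips 3 and 4 tie (4 stops each) in route 1, direction 2.
for trip_id, n_stops in {1: 3, 2: 5, 3: 4, 4: 4, 5: 2}.items():
    conn.executemany("INSERT INTO stoptimes VALUES (?, ?)",
                     [(trip_id, seq) for seq in range(n_stops)])

conn.execute("""
WITH trip_stops AS (            -- stop count per trip
    SELECT t.route_id, t.direction_id, t.trip_id, COUNT(*) AS NumStops
    FROM trips t
    JOIN stoptimes st ON st.trip_id = t.trip_id
    GROUP BY t.route_id, t.direction_id, t.trip_id
),
max_stops AS (                  -- largest stop count per route and direction
    SELECT route_id, direction_id, MAX(NumStops) AS maxnumstops
    FROM trip_stops
    GROUP BY route_id, direction_id
)
UPDATE trips SET isPrototypical = 1
WHERE trip_id IN (
    SELECT ts.trip_id
    FROM trip_stops ts
    JOIN max_stops m
      ON ts.route_id = m.route_id
     AND ts.direction_id = m.direction_id
     AND ts.NumStops = m.maxnumstops
)
""")

flagged = [r[0] for r in conn.execute(
    "SELECT trip_id FROM trips WHERE isPrototypical = 1 ORDER BY trip_id")]
print(flagged)
```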
