I have a table with items that are booked together and in a certain order. This results in a data set like this:

id item_id group_id 
 1     1        1
 2     2        1
 3     3        1
 4     1        2
 5     2        2
 6     3        2
 7     2        3
 8     1        3
 9     3        3
10     3        4
11     2        4
12     1        4
13     1        5
14     2        5
15     3        5
16     4        5
 .
 .
 .

Now, I am looking for a query (or multiple) that finds the different sort orders within the groups and that can indicate the dominant one. In this case the answer should be something like:

group_id order_used_nr_times
     1        3
     2        3
     3        1
     4        1
     5        3
     .             
     .             
     .             

Note, as group 5 indicates, it is quite possible that a group contains more items and that the searched items are only a subset (e.g., looking for the order of items 4,5,6 and finding it in 1,2,3,4,5,6,7,8,9 is an option).

I've been thinking about a query with GROUP BY and HAVING, or something with a MySQL transpose, but I can't get my head around it.

Additional info:

I need the query to give me the dominant sort order (this case 1,2,3) so it can be used to insert a new group that consists of the items 1,2,3 ordered 1,2,3 and not 2,1,3 or 3,2,1, in this example.

From a business perspective: there are two "groups of people" using the system, group A and group B. Group A knows how to order the items, so it sets the order manually and the system just inserts the data in the given order. Group B, however, doesn't know the order. Therefore the system (query) needs to check whether group A has already booked these items and, if so, in which order they occur most often (the order can differ, as the example shows). The order from group A will then be used to insert the data from group B, on the assumption that this is the most logical order.

I hope this explanation helps.

1 Answer

The counts of identical groups can be found. You could start by grouping the rows by group_id and building a GROUP_CONCAT of the item_id values:

SELECT
  group_id,
  GROUP_CONCAT(item_id ORDER BY id) AS item_list
FROM atable
GROUP BY
  group_id
;

This will give you a result set like this:

group_id  item_list
--------  ---------
1         1,2,3
2         1,2,3
3         2,1,3
4         3,2,1
5         1,2,3,4

Now it is easy to get the number of entries for each distinct item list:

SELECT
  item_list,
  COUNT(*) AS nr_times
FROM (
  SELECT
    group_id,
    GROUP_CONCAT(item_id ORDER BY id) AS item_list
  FROM atable
  GROUP BY
    group_id
) AS s
GROUP BY
  item_list
;

The query returns:

item_list  nr_times
---------  --------
1,2,3      2
1,2,3,4    1
2,1,3      1
3,2,1      1
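
To check the two grouping steps outside the database, here is a minimal Python sketch of the same logic. It assumes the sample rows from the question are available as (id, item_id, group_id) tuples. The names rows, item_lists and list_counts are illustrative and do not appear in the original queries.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (id, item_id, group_id)
rows = [
    (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (4, 1, 2), (5, 2, 2), (6, 3, 2),
    (7, 2, 3), (8, 1, 3), (9, 3, 3),
    (10, 3, 4), (11, 2, 4), (12, 1, 4),
    (13, 1, 5), (14, 2, 5), (15, 3, 5), (16, 4, 5),
]

# Step 1: mimic GROUP_CONCAT(item_id ORDER BY id) ... GROUP BY group_id
items_by_group = defaultdict(list)
for _id, item_id, group_id in sorted(rows):  # tuples sort by id first
    items_by_group[group_id].append(str(item_id))
item_lists = {g: ",".join(items) for g, items in items_by_group.items()}

# Step 2: mimic COUNT(*) ... GROUP BY item_list
list_counts = Counter(item_lists.values())

for item_list, nr_times in sorted(list_counts.items()):
    print(item_list, nr_times)
```

The printed pairs should correspond to the item_list / nr_times rows shown above.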

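This is not the output you want, because you need the counts next to the group IDs. So the last row set needs to be joined back to the previous one: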

SELECT
  groups.group_id,
  counts.nr_times
FROM (
  SELECT
    group_id,
    GROUP_CONCAT(item_id ORDER BY id) AS item_list
  FROM atable
  GROUP BY group_id
) AS groups
INNER JOIN (
  SELECT
    item_list,
    COUNT(*) AS nr_times
  FROM (
    SELECT GROUP_CONCAT(item_id ORDER BY id) AS item_list
    FROM atable
    GROUP BY group_id
  ) AS s
  GROUP BY item_list
) AS counts
ON groups.item_list = counts.item_list
;

Output:

group_id  nr_times
--------  --------
1         2
2         2
3         1
4         1
5         1

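At this point it becomes clear that grouping the same row set twice may not be a good idea. It might be better to store the results of the first grouping in a temporary table and then use that to get the final results: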

CREATE TEMPORARY TABLE temp_results
AS
SELECT
  group_id,
  GROUP_CONCAT(item_id ORDER BY id) AS item_list
FROM atable
GROUP BY
  group_id
;

SELECT
  groups.group_id,
  counts.nr_times
FROM temp_results AS groups
INNER JOIN (
  SELECT
    item_list,
    COUNT(*) AS nr_times
  FROM temp_results
  GROUP BY item_list
) AS counts
ON groups.item_list = counts.item_list
;

Now, to get the numbers in your desired output, you could try matching the two sets with LIKE, like this:

SELECT
  groups.group_id,
  counts.nr_times
FROM temp_results AS groups
INNER JOIN (
  SELECT
    item_list,
    COUNT(*) AS nr_times
  FROM temp_results
  GROUP BY item_list
) AS counts
ON CONCAT(',', groups.item_list, ',') LIKE CONCAT('%,', counts.item_list, ',%')
OR CONCAT(',', counts.item_list, ',') LIKE CONCAT('%,', groups.item_list, ',%')
;

The above will give you the following:

group_id  nr_times
--------  --------
1         2
1         1
2         2
2         1
3         1
4         1
5         2
5         1

Obviously, you now only need to add

GROUP BY groups.group_id

at the end of the last query and replace counts.nr_times in its SELECT clause with

SUM(counts.nr_times) AS order_used_nr_times

to get the same output as in your question:

group_id  order_used_nr_times
--------  -------------------
1         3
2         3
3         1
4         1
5         3

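Note, however, that if you have groups with the item lists '1,2,3', '3,4,5' and '1,2,3,4,5,6', the LIKE join condition used in the last query would match each of the first two groups only with the third one, not with each other, while the third group would match both of the first two.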
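
The matching behaviour of that join condition can be tried out with a small Python sketch. It is only an approximation of the SQL: the helper names like_contains and lists_match are made up for illustration, and a plain substring test stands in for LIKE, which is equivalent here only because the lists consist of digits and commas.

```python
def like_contains(haystack: str, needle: str) -> bool:
    # Mimics CONCAT(',', haystack, ',') LIKE CONCAT('%,', needle, ',%').
    # The surrounding commas keep '1,2,3' from matching inside '1,2,33'.
    return f",{needle}," in f",{haystack},"


def lists_match(a: str, b: str) -> bool:
    # The join condition of the last query: either list contains the other.
    return like_contains(a, b) or like_contains(b, a)


# Item lists and counts from the sample data (the contents of temp_results).
sample_lists = {1: "1,2,3", 2: "1,2,3", 3: "2,1,3", 4: "3,2,1", 5: "1,2,3,4"}
sample_counts = {"1,2,3": 2, "1,2,3,4": 1, "2,1,3": 1, "3,2,1": 1}

# SUM(counts.nr_times) ... GROUP BY groups.group_id
order_used_nr_times = {
    group_id: sum(n for lst, n in sample_counts.items()
                  if lists_match(item_list, lst))
    for group_id, item_list in sample_lists.items()
}
print(order_used_nr_times)

# The caveat: three hypothetical groups whose lists only partly overlap.
caveat_lists = {"A": "1,2,3", "B": "3,4,5", "C": "1,2,3,4,5,6"}
matched_with = {
    name: [other for other, lst in caveat_lists.items()
           if other != name and lists_match(item_list, lst)]
    for name, item_list in caveat_lists.items()
}
print(matched_with)
```

Because the comparison works on the concatenated string, it only finds a list that appears as one contiguous run inside another; '1,3' would not be found in '1,2,3'.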

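I am not sure whether this meets your requirements, because I still cannot quite make out your explanation on that particular point (sorry). I do hope this post at least gives you some ideas on how to eventually arrive at the correct result.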
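
One possible reading of the requirement in the question is: for a searched set of items, look at every group that contains all of them, note the relative order in which they were booked, and pick the order that occurs most often. The sketch below follows that reading in Python rather than SQL. The function name dominant_order and the tie handling (the order seen first wins) are assumptions made for illustration, not something stated in the question.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (id, item_id, group_id)
rows = [
    (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (4, 1, 2), (5, 2, 2), (6, 3, 2),
    (7, 2, 3), (8, 1, 3), (9, 3, 3),
    (10, 3, 4), (11, 2, 4), (12, 1, 4),
    (13, 1, 5), (14, 2, 5), (15, 3, 5), (16, 4, 5),
]


def dominant_order(rows, wanted):
    """Return (order, nr_times) for the most frequent booking order of
    the wanted items, or None if no group contains all of them."""
    wanted = set(wanted)
    items_by_group = defaultdict(list)
    for _id, item_id, group_id in sorted(rows):  # booking order = id order
        if item_id in wanted:  # ignore other items in the same group
            items_by_group[group_id].append(item_id)
    # Keep only groups where every wanted item occurs exactly once.
    orders = Counter(
        tuple(items) for items in items_by_group.values()
        if len(items) == len(wanted) and set(items) == wanted
    )
    if not orders:
        return None
    return orders.most_common(1)[0]


result = dominant_order(rows, [1, 2, 3])
print(result)
```

Unlike the LIKE-based join, this sketch ignores unrelated items booked in between, so it also counts groups where the searched items are not adjacent. Whether that is desirable depends on the point the question leaves open.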

answered 2013-10-12T19:12:21.327