
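I have a database with roughly 300,000 rows of product information.
I need to retrieve the rows that share a duplicate UPC (COUNT(upc) > 1) where at least one row in the group has a description matching some string (for example, "Reed").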

For example, all of the following rows would be selected (desc, upc pairs):

Deer D7394    62226173
Reed R2536    62226173
Deer D7217    62226173

But none of these:

Deer D0173    62278389
Deer D7289    62278389
Deer D9272    62278389

This is the query I'm using:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
JOIN 
    (SELECT upc, desc 
    FROM inventory 
    GROUP BY upc 
    HAVING COUNT(upc) > 1) b 
ON a.upc = b.upc 
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != '' 
AND a.upc != 0 
ORDER BY upc;

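I'm fairly new to MySQL, but this seems like it should work. However, for some UPCs the non-matching rows are missing from the results (i.e., Reed R2536 is returned, but Deer D7394 is not).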

Any insight would be greatly appreciated!


3 Answers

3

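Brian's group_concat approach works when the number of duplicates is small, but when it isn't, it fails silently: MySQL truncates the GROUP_CONCAT result at group_concat_max_len (1024 bytes by default), so a match past that point is never seen. You'll never know; you'll just be missing rows that should be there.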

What you want to do is select every UPC that has duplicates and at least one matching description, then select all rows whose UPC is in that list.

If you group all items by UPC, you can annotate each group with a count and a flag indicating whether any description matches:

SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc

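(This takes advantage of the fact that boolean operators such as LIKE actually return 0 for false and 1 for true. Taking the maximum of that column tells you whether any row in the group matched.)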
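To make the per-UPC summary concrete, here is a minimal sketch that runs the grouping query on the six rows from the question. It is an assumption-laden stand-in: it uses Python's built-in sqlite3 rather than MySQL (SQLite also evaluates LIKE to 0 or 1 and accepts backtick-quoted identifiers), and the table is trimmed to the two columns the query touches.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL here,
# and the table holds only the six (desc, upc) rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (`desc` TEXT, upc TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [
        ("Deer D7394", "62226173"),
        ("Reed R2536", "62226173"),
        ("Deer D7217", "62226173"),
        ("Deer D0173", "62278389"),
        ("Deer D7289", "62278389"),
        ("Deer D9272", "62278389"),
    ],
)

# One row per UPC: the group size and a 0/1 flag that is 1
# when at least one description in the group matches.
summary = conn.execute(
    "SELECT upc, COUNT(*) AS c, MAX(`desc` LIKE '%Reed%') AS desc_matches "
    "FROM inventory GROUP BY upc ORDER BY upc"
).fetchall()

for row in summary:
    print(row)
# ('62226173', 3, 1)
# ('62278389', 3, 0)
```

Both groups have three rows, but only the first carries desc_matches = 1, which is the flag the next step filters on.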

You can then filter that list by your criteria to get the UPCs you're interested in:

SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1

Once you have that list, you want to see all products matching any of those UPCs. You can do that with a simple (inner, not outer) join:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
JOIN 
    ( SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
      FROM inventory
      GROUP BY upc
      HAVING desc_matches = 1 AND c > 1
    ) b USING (upc)
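As a rough check of the join above, the sketch below runs it against the question's sample rows. The assumptions: Python's built-in sqlite3 stands in for MySQL, the table is reduced to desc and upc, and one extra row ("Reed R0001" with UPC 70000001) is hypothetical, added only to show that a matching but non-duplicated UPC is excluded by c > 1.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; columns are trimmed to desc and upc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (`desc` TEXT, upc TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [
        ("Deer D7394", "62226173"),
        ("Reed R2536", "62226173"),
        ("Deer D7217", "62226173"),
        ("Deer D0173", "62278389"),
        ("Deer D7289", "62278389"),
        ("Deer D9272", "62278389"),
        ("Reed R0001", "70000001"),  # hypothetical row: matches, but its UPC is unique
    ],
)

# The derived table keeps only duplicated UPCs with at least one match;
# the inner join then pulls every row sharing one of those UPCs.
rows = conn.execute(
    """
    SELECT a.`desc`, a.upc
    FROM inventory a
    JOIN (
        SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
        FROM inventory
        GROUP BY upc
        HAVING desc_matches = 1 AND c > 1
    ) b USING (upc)
    ORDER BY a.`desc`
    """
).fetchall()

for row in rows:
    print(row)
```

All three rows for UPC 62226173 come back, including the two Deer rows, while the all-Deer group and the lone hypothetical Reed row are left out.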
answered 2012-09-19T23:04:02.470
1

Assuming you don't have too many duplicate records, another possible approach is:

SELECT *
FROM inventory i
JOIN (
    SELECT upc
    FROM inventory
    GROUP BY upc
    HAVING COUNT(upc) > 1
       AND GROUP_CONCAT(`desc`) LIKE '%reed%'
) AS available_upc
  ON available_upc.upc = i.upc

This assumes your table looks like:

CREATE TABLE inventory(
  sku CHAR(32) NOT NULL,
  `desc` CHAR(32) NOT NULL,
  upc CHAR(32) NOT NULL,
  short_description CHAR(32) NOT NULL,
  PRIMARY KEY (sku)
);

insert into inventory values ('D7394','Deer','62226173','Small Deer');
insert into inventory values ('R2536','Reed','62226173','Small Reed');
insert into inventory values ('D7217','Deer','62226173','Large Deer');


insert into inventory values ('D0173','Deer','62278389','Small Deer');
insert into inventory values ('D7289','Deer','62278389','Small Reed');
insert into inventory values ('D9272','Deer','62278389','Large Deer');
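For anyone who wants to try this without a MySQL server, the following sketch loads the schema and rows above into an in-memory database and runs the group_concat query. Treat it as an approximation: it assumes Python's built-in sqlite3 as a stand-in for MySQL. SQLite's LIKE is case-insensitive for ASCII by default, which is why '%reed%' matches 'Reed' here, and SQLite does not apply MySQL's length limit to group_concat, so the sketch says nothing about how the query behaves on large groups.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and rows are copied from above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory(
      sku CHAR(32) NOT NULL,
      `desc` CHAR(32) NOT NULL,
      upc CHAR(32) NOT NULL,
      short_description CHAR(32) NOT NULL,
      PRIMARY KEY (sku)
    );
    insert into inventory values ('D7394','Deer','62226173','Small Deer');
    insert into inventory values ('R2536','Reed','62226173','Small Reed');
    insert into inventory values ('D7217','Deer','62226173','Large Deer');
    insert into inventory values ('D0173','Deer','62278389','Small Deer');
    insert into inventory values ('D7289','Deer','62278389','Small Reed');
    insert into inventory values ('D9272','Deer','62278389','Large Deer');
    """
)

# Keep UPCs that are duplicated and whose concatenated descriptions
# contain the search string, then join back to fetch every row for them.
skus = [
    row[0]
    for row in conn.execute(
        """
        SELECT i.sku
        FROM inventory i
        JOIN (
            SELECT upc
            FROM inventory
            GROUP BY upc
            HAVING COUNT(upc) > 1
               AND group_concat(`desc`) LIKE '%reed%'
        ) AS available_upc
          ON available_upc.upc = i.upc
        ORDER BY i.sku
        """
    )
]

print(skus)  # ['D7217', 'D7394', 'R2536']
```

Only the three SKUs under UPC 62226173 are returned. The second group is skipped because the filter looks at desc, not short_description, so the 'Small Reed' value on D7289 does not count as a match.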
answered 2012-09-19T22:22:25.617
0

It's hard to say without testing, but try:

SELECT a.desc, a.upc, a.sku, a.short_description 
FROM inventory a 
RIGHT OUTER JOIN 
    (SELECT upc
    FROM inventory 
    GROUP BY upc 
    HAVING COUNT(upc) > 1) b 
ON a.upc = b.upc 
WHERE a.desc LIKE '%Reed%'  -- b exposes only upc, so b.desc cannot be referenced
AND a.upc != '' 
AND a.upc != 0 
ORDER BY upc;

The key is the RIGHT OUTER JOIN. See the article: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Also, you only need to return upc from the inner SELECT query.

answered 2012-09-19T22:17:35.720