mysql - Select all rows that contain a duplicate value in either of two columns, within distinct groups of related records

Question

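I'm trying to create a MySQL query that returns all individual rows (not grouped) that contain duplicate values within a group of related records. By "group of related records" I mean rows that have the same account number (see the sample below).

Basically, within each group of related records that share the same account number, select only those rows whose date or amount value is the same as that of another row in the same account's group. Values should be considered duplicates only within that account's group. The sample table and ideal output below should make this clear.

Also, I'm not interested in returning any records with a status of X, even if they have duplicate values.

Small sample table with the relevant data: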

id   account   invoice   date         amount   status
1    1         1         2012-04-01   0        X
2    1         2         2012-04-01   120      P
3    1         2         2012-05-01   120      U
4    1         3         2012-05-01   117      U
5    2         4         2012-04-01   82       X
6    2         4         2012-05-01   82       U
7    2         5         2012-03-01   81       P
8    2         6         2012-05-01   80       U
9    3         7         2012-03-01   80       P
10   3         8         2012-04-01   79       U
11   3         9         2012-04-01   78       U

Ideal output returned by the desired SQL query:

id   account   invoice   date         amount   status
2    1         2         2012-04-01   120      P
3    1         2         2012-05-01   120      U
4    1         3         2012-05-01   117      U
6    2         4         2012-05-01   82       U
8    2         6         2012-05-01   80       U
10   3         8         2012-04-01   79       U
11   3         9         2012-04-01   78       U

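Thus, rows 7 and 9 (same date) and rows 8 and 9 (same amount) should not be returned on the strength of those matches, because the rows belong to different accounts and the values are not duplicates within their own accounts. Row 8 should still be returned, however, because it shares a date with row 6 in the same account.

Later, I may want to refine the selection further by grabbing only duplicate rows whose statuses also match, so row 2 would be excluded because its status does not match the other two rows in that account's group. Would that make the query much harder? Is it just a matter of adding a WHERE or HAVING clause, or is it more complicated than that?

I hope my explanation of what I'm trying to accomplish makes sense. I've tried using INNER JOIN, but that returns each desired row more than once. I don't want duplicates of the duplicates.

Table structure and sample values: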

CREATE TABLE payment (
  id int(11) NOT NULL auto_increment,
  account int(10) NOT NULL default '0',
  invoice int(10) NOT NULL default '0',
  date date NOT NULL default '0000-00-00',
  amount int(10) NOT NULL default '0',
  status char(1) NOT NULL default '',
  PRIMARY KEY  (id)
);

INSERT INTO payment VALUES (1, 1, 1, '2012-04-01', 0, 'X'); 
INSERT INTO payment VALUES (2, 1, 2, '2012-04-01', 120, 'P'); 
INSERT INTO payment VALUES (3, 1, 2, '2012-05-01', 120, 'U'); 
INSERT INTO payment VALUES (4, 1, 3, '2012-05-01', 117, 'U'); 
INSERT INTO payment VALUES (5, 2, 4, '2012-04-01', 82, 'X'); 
INSERT INTO payment VALUES (6, 2, 4, '2012-05-01', 82, 'U'); 
INSERT INTO payment VALUES (7, 2, 5, '2012-03-01', 81, 'P');
INSERT INTO payment VALUES (8, 2, 6, '2012-05-01', 80, 'U'); 
INSERT INTO payment VALUES (9, 3, 7, '2012-03-01', 80, 'U'); 
INSERT INTO payment VALUES (10, 3, 8, '2012-04-01', 79, 'U'); 
INSERT INTO payment VALUES (11, 3, 9, '2012-04-01', 78, 'U');

score 10 · Accepted Answer

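This type of query can be implemented as a semi-join.

A semi-join returns rows from only one of the tables in the join, keeping a row when it has at least one match in the other table.

For example: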

select distinct l.*
from payment l
inner join payment r
on 
  l.id != r.id and l.account = r.account and
  (l.date = r.date or l.amount = r.amount)
where l.status != 'X' and r.status != 'X'
order by l.id asc;

Note the use of distinct, and that I select columns only from the left table. Together, these ensure that no row is returned more than once.
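To see why distinct matters, here is a minimal sketch (an illustration only, not part of the original answer) that counts how many partner rows each left row joins to. The column alias matches is an assumed name. With the sample data from the question, row 3 joins to two partners (row 2 on amount and row 4 on date), so without distinct it would come back twice.

```sql
-- Illustration: count the join partners of each non-X row.
-- Any row with matches > 1 would be repeated if distinct were dropped.
select l.id, count(*) as matches
from payment l
inner join payment r
on
  l.id != r.id and l.account = r.account and
  (l.date = r.date or l.amount = r.amount)
where l.status != 'X' and r.status != 'X'
group by l.id
order by l.id asc;
```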

The join condition checks that:

a row is not joined to itself (l.id != r.id)
the rows are in the same account (l.account = r.account)
and the date or the amount is the same (l.date = r.date or l.amount = r.amount)
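The same three conditions can also be expressed with EXISTS, which is the more direct way to write a semi-join. This is a sketch of an alternative, not the answer's own query: because EXISTS only tests whether a matching row is present, each left row appears at most once and distinct is not needed.

```sql
-- Alternative sketch: semi-join written with EXISTS instead of JOIN + DISTINCT.
select l.*
from payment l
where l.status != 'X'
  and exists (
    select 1
    from payment r
    where r.id != l.id                 -- a different row
      and r.account = l.account        -- in the same account
      and r.status != 'X'              -- ignore X rows as partners too
      and (r.date = l.date or r.amount = l.amount)
  )
order by l.id asc;
```

On the sample data this should return the same rows as the join version (ids 2, 3, 4, 6, 8, 10 and 11).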

For the second part of your question, you would need to update the on clause of the query.
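One way that change to the on clause might look, assuming "matching status" means the two rows must have equal status values, is to add l.status = r.status to the join condition. This is a sketch under that assumption rather than a tested solution from the answer.

```sql
-- Sketch: additionally require both rows to have the same status.
select distinct l.*
from payment l
inner join payment r
on
  l.id != r.id and l.account = r.account and
  l.status = r.status and
  (l.date = r.date or l.amount = r.amount)
where l.status != 'X'   -- r.status equals l.status, so it is non-X as well
order by l.id asc;
```

With the sample data, row 2 drops out: its only duplicate partner is row 3, whose status is U rather than P. The remaining rows are ids 3, 4, 6, 8, 10 and 11.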

score 3 · Answer

This seems to work:

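-- select only p1's columns: group by p1.id does not determine p2's columns,
-- and select * is rejected when ONLY_FULL_GROUP_BY is enabled
select p1.* from payment p1
join payment p2 on
(p1.id != p2.id
 and p1.status != 'X'
 and p1.account = p2.account
 and (p1.amount = p2.amount or p1.date = p2.date))
group by p1.id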

http://sqlfiddle.com/#!2/a50e9/3
