mysql - How to find gaps in sequential numbering in mysql?

Question

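We have a database with a table whose values were imported from another system. There is an auto-increment column with no duplicate values, but there are missing values. For example, running this query: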

select count(id) from arrc_vouchers where id between 1 and 100

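should return 100, but it returns 87 instead. Is there any query I can run that will return the values of the missing numbers? For example, records may exist for ids 1-70 and 83-100, but there are no records with ids 71-82. I want to return 71, 72, 73, etc.

Is this possible?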

score 185 · Accepted Answer

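Update

ConfexianMJS provided a much better answer in terms of performance.

The (not as fast as possible) answer

Here is a version that works on a table of any size (not just 100 rows):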

SELECT (t1.id + 1) as gap_starts_at, 
       (SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL

gap_starts_at - first id in the current gap
gap_ends_at - last id in the current gap
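
To try the gap query without a MySQL server, here is a minimal sketch that runs the same logic against an in-memory SQLite database from Python. SQLite is only a stand-in here, and the helper name find_gap_ranges is made up for illustration. The HAVING filter is rewritten as an outer WHERE, because using HAVING without GROUP BY on a column alias is a MySQL extension.

```python
import sqlite3

def find_gap_ranges(ids):
    """Return (gap_starts_at, gap_ends_at) pairs for ids missing between existing ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO arrc_vouchers (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        SELECT gap_starts_at, gap_ends_at
        FROM (
            -- every id whose successor is missing starts a gap
            SELECT t1.id + 1 AS gap_starts_at,
                   (SELECT MIN(t3.id) - 1
                    FROM arrc_vouchers t3
                    WHERE t3.id > t1.id) AS gap_ends_at
            FROM arrc_vouchers t1
            WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
        ) AS gaps
        -- the highest id has no successor at all, so its "gap" has no end
        WHERE gap_ends_at IS NOT NULL
        ORDER BY gap_starts_at
        """
    ).fetchall()
    conn.close()
    return rows
```

With the ids from the question (1-70 and 83-100 present), find_gap_ranges returns a single pair, (71, 82).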

score 135 · Accepted Answer

This just worked for me to find the gaps in a table with more than 80k rows:

SELECT
 CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
 SELECT
  @rownum:=@rownum+1 AS expected,
  IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
 FROM
  (SELECT @rownum:=0) AS a
  JOIN YourTable
  ORDER BY YourCol
 ) AS z
WHERE z.got!=0;

Result:

+------------------+
| missing          |
+------------------+
| 1 thru 99        |
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)

Note that the order of the columns expected and got is critical.

If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace

(SELECT @rownum:=0) AS a

with

(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a

New result:

+------------------+
| missing          |
+------------------+
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
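
For readers who find the user-variable trick hard to follow, the same single-pass idea can be sketched in application code. This is only an illustration of the logic, not the query itself: the function name find_gaps and its start parameter are made up here, and the ids are assumed to arrive in ascending order without duplicates (as ORDER BY YourCol would deliver them).

```python
def find_gaps(sorted_ids, start=1):
    """Single pass over ascending ids; returns (first_missing, last_missing) pairs.

    `expected` plays the role of @rownum+1 in the query: whenever the current
    id is ahead of it, everything in between is missing, and the counter is
    then resynchronised to the current id.
    """
    gaps = []
    expected = start
    for current in sorted_ids:
        if current > expected:
            gaps.append((expected, current - 1))
        expected = current + 1
    return gaps
```

Passing start=min(sorted_ids) corresponds to the MIN(YourCol)-1 variant, which suppresses the leading gap before the first id.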

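If you need to run some kind of shell script task on the missing IDs, you can also use this variant to directly produce an expression you can iterate over in bash.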

SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM (
 SELECT
  @rownum:=@rownum+1 AS expected,
  IF(@rownum=height, 0, @rownum:=height) AS got
 FROM
  (SELECT @rownum:=0) AS a
  JOIN block
  ORDER BY height
 ) AS z
WHERE z.got!=0;

This produces output like this

$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)

You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID

for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
  echo $ID
  # fill the gaps
done

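It's the same thing as above, except that it is both readable and executable. By changing the CONCAT call above, syntax can be generated for other programming languages. Or maybe even SQL.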
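
If the gap ranges are already available in application code, the same bash-friendly string can be built there instead of in SQL. A small sketch, assuming the gaps are given as (first, last) pairs; the function name to_bash_words is hypothetical:

```python
def to_bash_words(gaps):
    """Format (first, last) gap pairs as words for a bash for loop.

    A single missing id is emitted as-is; a range becomes a $(seq first last)
    command substitution, matching the output format shown above.
    """
    parts = []
    for first, last in gaps:
        if first == last:
            parts.append(str(first))
        else:
            parts.append(f"$(seq {first} {last})")
    return " ".join(parts)
```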

score 12 · Accepted Answer

A quick and dirty query that should do the trick:

SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM 
 (
SELECT a1.id AS a , MIN(a2.id) AS b 
FROM arrc_vouchers  AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab

WHERE 
b > a + 1

This will give you a table showing each id that has ids missing above it, the next_id that exists, and how many are missing in between, e.g.

 
id   next_id   missing_inbetween
 1         4                   2
68        70                   1
75        87                  11

score 6 · Accepted Answer

If you are using MariaDB, there is a faster (800%) option that uses the Sequence storage engine:

SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
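
Where the Sequence engine is not available, a recursive common table expression can generate the same number range. The sketch below is an assumption-laden illustration rather than a benchmarked alternative: it runs in SQLite through Python so it can be tried anywhere, the table name vouchers and the function missing_ids are made up, and on MySQL 8 the same WITH RECURSIVE shape is subject to the cte_max_recursion_depth setting (1000 by default), which would need raising for a range like 1 to 50000.

```python
import sqlite3

def missing_ids(ids, low, high):
    """Return every number in [low, high] that is not present in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vouchers (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO vouchers (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        WITH RECURSIVE seq(n) AS (
            SELECT ?                               -- first number of the range
            UNION ALL
            SELECT n + 1 FROM seq WHERE n < ?      -- stop once the upper bound is reached
        )
        SELECT n FROM seq
        WHERE n NOT IN (SELECT id FROM vouchers)
        ORDER BY n
        """,
        (low, high),
    ).fetchall()
    conn.close()
    return [n for (n,) in rows]
```

Unlike the gap-range queries, this lists every missing id individually, including ids above the current maximum if the range extends that far.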

score 3 · Accepted Answer

An alternative solution that requires a query plus some code to do some processing would be:

select l.id lValue, c.id cValue, r.id rValue 
  from 
  arrc_vouchers l 
  right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
  left  join arrc_vouchers r on r.id=c.id+1
where 1=1
  and c.id > 0 
  and (l.id is null or r.id is null)
order by c.id asc;

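Note that the query does not contain any subselect, which we know MySQL's planner does not handle efficiently.

It returns one entry per centralValue (cValue) that lacks a smaller neighbour (lValue) or a greater neighbour (rValue), i.e.: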

lValue |cValue|rValue
-------+------+-------
{null} | 2    | 3      
8      | 9    | {null} 
{null} | 22   | 23     
23     | 24   | {null} 
{null} | 29   | {null} 
{null} | 33   | {null}

Without going into further detail (we'll see it in the next paragraphs), this output means that:

No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE

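So the basic idea is to do a RIGHT and a LEFT join against the same table and check whether each value has adjacent values (i.e. if the central value is '3', we check for 3-1=2 on the left and 3+1=4 on the right). When a row has a NULL on the RIGHT or LEFT side, we know there is no adjacent value there.

The complete raw output of my table is: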

select * from arrc_vouchers order by id asc;

0  
2  
3  
4  
5  
6  
7  
8  
9  
22 
23 
24 
29 
33

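Some notes:

The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, because an unsigned value cannot be decreased below zero. It is not strictly necessary if you keep the c.id > 0 filter described in the next note, but I'm including it as documentation.
I'm filtering out the zero central value because we are not interested in any value before it, and the value after it can be derived from the next row.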
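
The "some code" that this approach needs for post-processing is not shown in the answer. One possible sketch, assuming the rows arrive as (lValue, cValue, rValue) tuples ordered by cValue with SQL NULL mapped to None; the function name gaps_from_neighbors and the optional floor parameter (the filtered-out lower bound, 0 in the example) are made up for illustration:

```python
def gaps_from_neighbors(rows, floor=None):
    """Turn (lValue, cValue, rValue) rows into (first_missing, last_missing) pairs.

    A row with no right neighbour opens a gap just above its cValue; the next
    row with no left neighbour closes that gap just below its cValue.
    """
    gaps = []
    pending = None  # cValue of the most recent row whose right neighbour is missing
    for l_value, c_value, r_value in rows:
        if l_value is None:
            if pending is not None:
                gaps.append((pending + 1, c_value - 1))
            elif floor is not None and c_value - 1 > floor:
                # leading gap between the filtered-out lower bound and the first row
                gaps.append((floor + 1, c_value - 1))
        pending = c_value if r_value is None else None
    # a trailing `pending` is the maximum id: the open-ended gap above it is ignored
    return gaps
```

For the sample output above with floor=0, this yields (1, 1), (10, 21), (25, 28) and (30, 32).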

score 2 · Accepted Answer

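Create a temporary table with 100 rows and a single column containing the values 1-100.

Outer join this table to your arrc_vouchers table and select the single-column values where the arrc_vouchers id is null.

Coding this blind, but it should work.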

select tempid from temptable 
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id 
where arrc_vouchers.id is null

score 2 · Accepted Answer

If the sequence has gaps of at most one number between two existing numbers (like 1, 3, 5, 6), then the query that can be used is:

select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);

Table name - source1
Column name - id

score 1 · Accepted Answer

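Although these all seem to work, the result set takes a very long time to return when there are 50,000 records.

I used this instead. It finds the first gap, or the next available id (last used + 1), and the query returns much faster.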

SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;

score 1 · Accepted Answer

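Based on the answer given above by Lucek, this stored procedure lets you specify the table and column names to test for non-contiguous records, thus answering the original question and also demonstrating how @var can be used to represent tables and/or columns in a stored procedure.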

create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);

set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);

set @strsql=concat("select 
    ( t1.",@col," + 1 ) as starts_at, 
  ( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
    from ",@tbl," t1
        where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
        having ends_at is not null");

prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end

score 1 · Accepted Answer

I tried it in a different manner, and the best performance I found was with this simple query:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;

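... one left join checks whether the next id exists; only when the next id is not found does the subquery look up the next existing id to find the end of the gap. I did it this way because a query using equals (=) performs better than one using the greater-than (>) operator.

On sqlfiddle it does not show a performance difference from the other queries, but on a real database this query ran 3 times faster than the others.

The schema: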

CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;

Below are all the queries I wrote to compare performance:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;
select *, (gapEnd-gapIni) qt
    from (
        select id+1 gapIni
        ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
        from arrc_vouchers a
        order by id
    ) a where gapEnd <> gapIni
;
select id+1 gapIni
    ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
    #,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
    from arrc_vouchers a
    order by id
;

Maybe it helps someone and is useful.

You can see and test my queries using this sqlfiddle:

http://sqlfiddle.com/#!9/6bdca7/1

score 1 · Accepted Answer

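Probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post, which has multiple different solutions depending on exactly what you are looking for. I was looking for the first available gap in the sequence (i.e. the next available number), and this seems to work fine.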

SELECT MIN(l.number_sequence + 1) as nextavabile 
from patients as l 
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL

Several other scenarios and solutions are discussed there, dating from 2005!

How to Find Missing Values in a Sequence With SQL

score 0 · Accepted Answer

A simple yet effective solution to find the missing auto-increment values:

SELECT `id`+1 
FROM `table_name` 
WHERE `id`+1 NOT IN (SELECT id FROM table_name)

score 0 · Accepted Answer

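Another simple answer that identifies the gaps. We run a query selecting just the odd numbers and join it to a query with all the even numbers. As long as you're not missing id 1, this should give you a comprehensive list of where the gaps start.

You'll still have to look at that place in the database to figure out how many numbers the gap spans. I found this way easier than the proposed solutions and much easier to customize to unique situations.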

SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE A.MYFIELD IS NULL;

score 0 · Accepted Answer

Starting from the comment posted by user933161,

select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;

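is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.) Also,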

insert into sequence (id) values (#);

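where # is the start value of a gap, will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.) You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.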

Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
