sql - Natural sort in MySQL

Question

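Is there an elegant way to do a performant, natural sort in a MySQL database?

For example, if I have this data set:

Final Fantasy
Final Fantasy 4
Final Fantasy 10
Final Fantasy 12
Final Fantasy 12: Chains of Promathia
Final Fantasy Adventure
Final Fantasy Origins
Final Fantasy Tactics

Is there any other elegant solution than splitting the games' names into their components

Title: "Final Fantasy"
Number: "12"
Subtitle: "Chains of Promathia"

to make sure that they come out in the right order? (10 after 4, not before 2.)

Doing so is a pain in the a** because every now and then there's another game that breaks that mechanism of parsing the game title (e.g. "Warhammer 40,000", "James Bond 007").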

score 97 · Accepted Answer

Here is a quick solution:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
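To see what this ordering does and where it stops working, here is a small Python sketch that mimics `ORDER BY LENGTH(col), col` with a `(length, value)` key. The sample values are made up for illustration. Python's `len` counts characters while MySQL's `LENGTH()` counts bytes, which is the same thing for ASCII data.

```python
def length_then_value(s):
    # Mirrors ORDER BY LENGTH(col), col:
    # shorter strings first, ties broken alphabetically.
    return (len(s), s)

# All values share one prefix, so "shorter" really means "smaller number".
same_prefix = ["item10", "item2", "item1", "item12"]
print(sorted(same_prefix, key=length_then_value))

# Different prefixes: length now dominates the alphabetical order.
mixed_prefix = ["b1", "a10", "a2"]
print(sorted(mixed_prefix, key=length_then_value))
```

The first list comes out as item1, item2, item10, item12. The second comes out as a2, b1, a10, so the trick is only safe when every value has the same non-numeric part.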

score 63 · Accepted Answer

Just found this:

SELECT names FROM your_table ORDER BY games + 0 ASC

It does a natural sort when the numbers are at the front of the string; it might work for numbers in the middle as well.
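A rough way to reason about `col + 0` is that MySQL converts the numeric prefix of the string and ignores the rest. The Python sketch below approximates that rule. The helper `leading_number` is hypothetical and deliberately simplified (no exponent handling), and the titles are invented for the example.

```python
import re

_LEADING_NUMBER = re.compile(r"\s*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def leading_number(s):
    # Approximation of the coercion behind `col + 0`:
    # use the numeric prefix, or 0 when the string does not start with a number.
    m = _LEADING_NUMBER.match(s)
    return float(m.group()) if m else 0.0

titles = ["10 Things", "9 Lives", "Final Fantasy 10", "Final Fantasy 4"]
ordered = sorted(titles, key=leading_number)
print(ordered)
```

Under this approximation the two "Final Fantasy" rows both evaluate to 0 and are not ordered among themselves by their number, so a number in the middle of the string does not help. Add the column itself as a second sort term if ties matter.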

score 57 · Accepted Answer

The same function as posted by @plalx, but rewritten for MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
DELIMITER ;

Usage:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
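For readers who want to check what key the function builds, the same idea can be sketched in a few lines of Python: trim, map the "same order" characters to a space, then zero-pad every run of digits to a fixed width. This is an illustration of the technique, not a port of the UDF; the function name and defaults are assumptions.

```python
import re

def natural_sort_format(s, number_length=10, same_order_chars=""):
    s = s.strip()
    # Characters that should sort alike are all replaced by a space.
    for ch in same_order_chars:
        s = s.replace(ch, " ")
    # Left-pad each digit run with zeros so string comparison matches numeric order.
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(number_length), s)

names = ["R11", "A1-1.", "R2", "A1.", "R1"]
ordered = sorted(names, key=lambda n: natural_sort_format(n, 10, ".-"))
print(ordered)
```

Numbers longer than `number_length` are left unpadded and will sort incorrectly against shorter ones, which is the main limitation of fixed-width padding.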

score 22 · Accepted Answer

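I think this is why a lot of things are sorted by release date.

A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title that conforms to a pattern you create for easy sorting, or a counter.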

score 17 · Accepted Answer

I wrote this function for MSSQL 2000 a while ago:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO

score 15 · Accepted Answer

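MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data up as you've described above (separate id field, etc.), or failing that, to sort on a non-title, indexed element in your db (date, inserted id in the db, etc.).

Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting them there, so if you have any control at all over the db schema, look at adding easily sorted fields as described above. It will save you a lot of hassle and maintenance in the long run.

Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.

SELECT * FROM `table` ORDER BY CAST(MID(name, 6, LENGTH(name) - 5) AS UNSIGNED)

This sort of solution could just about be made to work on your Final Fantasy example above, but it isn't particularly flexible and is unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007", I'm afraid.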

score 9 · Accepted Answer

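So, while I know that you have found a satisfactory answer, I struggled with this problem for a while. We had previously determined that it could not be done reasonably well in SQL and that we would have to use JavaScript on a JSON array.

Here's how I solved it using just SQL. Hopefully this is helpful for others:

I had data such as:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

I didn't actually "cast" anything, though I suppose that may have worked too.

I first replaced the parts of the data that never change, in this case "Scene ", and then did an LPAD to line things up. This seems to allow the alpha strings to sort properly as well as the numbered ones.

My ORDER BY clause looks like: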

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

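Obviously this doesn't help with the original problem, which was not so uniform, but I imagine it would work for many other related problems, so I'm putting it out there.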
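Before relying on the LPAD expression it is worth checking the order it gives on mixed values. The Python sketch below mimics `LPAD(REPLACE(col, 'Scene ', ''), 10, '0')` with plain character comparison. It is only an approximation: MySQL's `LPAD` also truncates values longer than the target width, and the real order depends on the column's collation.

```python
def lpad_key(value, prefix="Scene ", width=10):
    # REPLACE(col, 'Scene ', '') followed by LPAD(..., 10, '0').
    # rjust pads on the left but, unlike LPAD, never truncates.
    return value.replace(prefix, "").rjust(width, "0")

scenes = ["Scene 101", "Scene 2A", "Scene 1", "Scene XXA1", "Scene 3", "Scene 1A"]
ordered = sorted(scenes, key=lpad_key)
print(ordered)
```

In this sketch the result is Scene 1, Scene 3, Scene 1A, Scene 2A, Scene 101, Scene XXA1: padding makes shorter values sort first, so "Scene 3" lands before "Scene 1A". That may or may not be the order you want for suffixed scene numbers.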

score 6 · Accepted Answer

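Add a sort key (rank) to your table. ORDER BY rank
Utilise the "Release Date" column. ORDER BY release_date
When extracting the data from SQL, make your object do the sorting. For example, if extracting into a Set, make it a TreeSet, and make your data model implement Comparable and put the natural sort algorithm there (insertion sort will suffice if you are using a language without collections), since you will be reading the rows from SQL one by one as you create your model and insert it into the collection.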
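As an illustration of the third option, here is a minimal natural-sort key in Python, applied to the titles from the question after they have been fetched. It is one common way to write such a key, not the only one; the function name is an assumption, and case is folded so that the comparison is case-insensitive.

```python
import re

def natural_key(s):
    # re.split with a capturing group alternates text and digit chunks and
    # always starts with a (possibly empty) text chunk, so odd positions are
    # digit runs. Ints are only ever compared with ints, strings with strings.
    parts = re.split(r"([0-9]+)", s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

titles = [
    "Final Fantasy Tactics",
    "Final Fantasy 12: Chains of Promathia",
    "Final Fantasy 10",
    "Final Fantasy",
    "Final Fantasy Origins",
    "Final Fantasy 4",
    "Final Fantasy Adventure",
    "Final Fantasy 12",
]
ordered = sorted(titles, key=natural_key)
for title in ordered:
    print(title)
```

This handles plain integers only. Titles such as "Warhammer 40,000" would need extra rules for thousands separators.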

score 5 · Accepted Answer

Regarding the best response, from Richard Toth: https://stackoverflow.com/a/12257917/4052357

Watch out for UTF8-encoded strings that contain 2-byte (or longer) characters together with numbers, e.g.

12 南新宿

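Using MySQL's LENGTH() in the udf_NaturalSortFormat function returns the byte length of the string, which is incorrect here. Use CHAR_LENGTH() instead, which returns the correct character length.

In my case, using LENGTH() caused queries to never complete and resulted in 100% CPU usage for MySQL.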
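The gap between the two lengths is easy to reproduce outside the database. This short Python check uses the sample string above and assumes UTF-8 encoding; the character count corresponds to what `CHAR_LENGTH()` reports and the byte count to what `LENGTH()` reports.

```python
sample = "12 南新宿"

char_length = len(sample)                  # characters, like CHAR_LENGTH()
byte_length = len(sample.encode("utf-8"))  # bytes, like LENGTH() on utf8 data

print(char_length, byte_length)
```

The string has 6 characters but 12 bytes, so any position arithmetic based on the byte length points past the end of the text.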

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
DELIMITER ;

P.S. I would have added this as a comment on the original answer, but I don't have enough reputation (yet).

score 4 · Accepted Answer

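Another option is to do the sorting in memory after pulling the data from MySQL. While it won't be your best option from a performance standpoint, you should be fine as long as you are not sorting huge lists.

If you take a look at Jeff's post "Sorting for Humans: Natural Sort Order", you can find plenty of algorithms for whatever language you might be working with.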

score 4 · Accepted Answer

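Add a field for a "sort key" in which all strings of digits are zero-padded to a fixed length, and then sort on that field instead.

If you might have long strings of digits, another method is to prepend the number of digits (fixed-width, zero-padded) to each string of digits. For example, if you won't have more than 99 digits in a row, then for "Super Blast 10 Ultra" the sort key would be "Super Blast 0210 Ultra".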
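The length-prefix variant can be sketched like this in Python. The function name and the `width` parameter are assumptions for the example, and leading zeros are not treated specially.

```python
import re

def length_prefixed_key(s, width=2):
    def encode(match):
        digits = match.group()
        # Prepend the digit count, zero-padded to a fixed width.
        return f"{len(digits):0{width}d}{digits}"
    return re.sub(r"[0-9]+", encode, s)

print(length_prefixed_key("Super Blast 10 Ultra"))

games = ["Super Blast 100", "Super Blast 9", "Super Blast 10"]
ordered = sorted(games, key=length_prefixed_key)
print(ordered)
```

Because a longer number always gets a larger prefix, comparison is decided by digit count first and by the digits themselves second, without padding every number to the maximum width.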

score 4 · Accepted Answer

To order:
0
1
2
10
23
101
205
1000
a
aac
b
casdsadsa
css

Use this query:

SELECT
    column_name
FROM
    table_name
ORDER BY
    column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#\$\%\^\*\(\)\;\:\\,\?\/ \~\`\|\_\-]' DESC,
    column_name + 0,
    column_name;

score 4 · Accepted Answer

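If you don't want to reinvent the wheel or get a headache from lots of code that doesn't work, just use Drupal Natural Sort ... Just run the SQL that comes zipped (MySQL or Postgres), and that's it. When making a query, simply order using: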

... ORDER BY natsort_canon(column_name, 'natural')

score 2 · Accepted Answer

You can also create the "sort column" dynamically:

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM table 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name

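That way, you can create groups to sort by.

In my query, I wanted '-' in front of everything, then the numbers, then the text. That could result in something like: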

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows

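That way you don't have to maintain a sort column in the correct order as you add data. You can also change the sort order depending on what you need.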
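The grouping logic of that ORDER BY can be mirrored with a tuple key in Python, which may make the precedence of the terms easier to see. The helper `leading_number` is a hypothetical, simplified stand-in for MySQL's `name + 0` coercion, and the rows are sample data.

```python
import re

_LEADING_NUMBER = re.compile(r"\s*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def leading_number(s):
    # Numeric prefix of the string, or 0 when there is none (like name + 0).
    m = _LEADING_NUMBER.match(s)
    return float(m.group()) if m else 0.0

def group_key(name):
    num = leading_number(name)
    return (
        name != "-",   # boolDash DESC: the dash row first
        name != "0",   # boolZero DESC: then the literal zero
        not num > 0,   # boolNum DESC: then values that parse as a positive number
        num,           # (name + 0)
        name,          # name
    )

rows = ["Dog", "10", "-", "Chair", "2", "0", "102"]
ordered = sorted(rows, key=group_key)
print(ordered)
```

Each boolean term puts one group ahead of the rest, and the last two terms order the rows inside a group.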

score 1 · Accepted Answer

If you're using PHP, you can do the natural sort in PHP.

$keys = array();
$values = array();
foreach ($results as $index => $row) {
   $key = $row['name'].'__'.$index; // Add the index to create an unique key.
   $keys[] = $key;
   $values[$key] = $row; 
}
natsort($keys);
$sortedValues = array(); 
foreach($keys as $index) {
  $sortedValues[] = $values[$index]; 
}

I hope MySQL will implement natural sorting in a future version, but the feature request (#1588) has been open since 2003, so I wouldn't hold my breath.

score 1 · Accepted Answer

A simplified non-UDF version of the best response from @plalx / Richard Toth / Luke Hoggett, which works only for the first integer in the field, is

SELECT name,
LEAST(
    IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
    SUBSTR(name, 1, first_int - 1),
    LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
    SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC

score 1 · Accepted Answer

I have tried several solutions, but it is actually very simple:

SELECT test_column FROM test_table ORDER BY LENGTH(test_column), test_column

/* 
Result 
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/

score 1 · Accepted Answer

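Many of the other answers I see here (and in the duplicate questions) basically only work for very specifically formatted data, e.g. a string that's entirely a number, or one that has a fixed-length alphabetic prefix. That isn't going to work in the general case.

It's true that there isn't really any way to implement a 100% general nat-sort in MySQL, because what you really need for that is a modified comparison function that switches between lexicographic sorting of the strings and numeric sorting if/when it encounters a number. Such code could implement any algorithm you like for recognising and comparing the numeric portions within two strings. Unfortunately, the comparison function in MySQL is internal to its code and cannot be changed by the user.

This leaves a hack of some kind, where you try to create a sort key for your string in which the numeric parts are re-formatted so that the standard lexicographic sort actually sorts them the way you want.

For plain integers up to some maximum number of digits, the obvious solution is to simply left-pad them with zeros so that they are all fixed width. This is the approach taken by the Drupal plugin and by the solutions of @plalx / @RichardToth. (@Christian has a different and much more complex solution, but it offers no advantages that I can see.)

As @tye points out, you can improve on this by prepending a fixed-digit length to each number, rather than simply left-padding it. There is much, much more you can improve on, though, even given the limitations of what is essentially an awkward hack. Yet there don't seem to be any pre-built solutions out there!

For example, what about:

Plus and minus signs? +10 vs 10 vs -10
Decimals? 8.2, 8.5, 1.006, .75
Leading zeros? 020, 030, 00000922
Thousand separators? "1,001 Dalmations" vs "1001 Dalmations"
Version numbers? MariaDB v10.3.18 vs MariaDB v10.3.3
Very long numbers? 103,768,276,592,092,364,859,236,487,687,870,234,598.55

Extending @tye's method, I've created a fairly compact NatSortKey() stored function that converts an arbitrary string into a nat-sort key. It handles all of the above cases, is reasonably efficient, and preserves a total sort order (no two different strings have sort keys that compare equal). A second parameter can be used to limit the number of numbers processed in each string (e.g. to the first 10 numbers), which can be used to ensure the output fits within a given length.

NOTE: A sort-key string generated with a given value of this second parameter should only be sorted against other strings generated with the same value of the parameter, or else they might not sort correctly!

You can use it directly in ordering, e.g.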

SELECT myString FROM myTable ORDER BY NatSortKey(myString,0);  ### 0 means process all numbers - resulting sort key might be quite long for certain inputs

But for efficient sorting of large tables, it's better to pre-store the sort key in another column (possibly with an index on it):

INSERT INTO myTable (myString,myStringNSK) VALUES (@theStringValue,NatSortKey(@theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;

[Ideally, you would make this happen automatically by creating the key column as a computed stored column, using something like:

CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);

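But for now neither MySQL nor MariaDB allows stored functions in computed columns, so unfortunately you can't do this yet.]

My function affects the sorting of numbers only. If you want to do other sort-normalization things, such as removing all punctuation, trimming whitespace off each end, or replacing multi-whitespace sequences with single spaces, you could either extend the function or do it before or after NatSortKey() is applied to your data. (I'd recommend using REGEXP_REPLACE() for this purpose.)

It's also somewhat Anglo-centric in that I assume '.' for the decimal point and ',' for the thousands separator, but it should be easy enough to modify if you want the reverse, or if you want that to be switchable via a parameter.

It might be amenable to further improvement in other ways. For example, it currently sorts negative numbers by absolute value, so -1 comes before -2 rather than the other way around. There is also no way to specify a DESC sort order for numbers while retaining an ASC lexicographical sort for text. Both of these issues can be fixed with a little more work; I will update the code if/when I get the time.

There are lots of other details to be aware of, including some critical dependencies on the charset and collation that you're using, but I've put them all into a comment block within the SQL code. Please read it carefully before using the function yourself!

So, here's the code. If you find a bug, or have an improvement I haven't mentioned, please let me know in the comments!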

delimiter $$
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
  Converts numbers in the input string s into a format such that sorting results in a nat-sort.
  Numbers of up to 359 digits (before the decimal point, if one is present) are supported.  Sort results are undefined if the input string contains numbers longer than this.
  For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
  Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
  Numbers may contain ',' as a thousands separator, and '.' as a decimal point.  To reverse these (as appropriate for some European locales), the code would require modification.
  Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
  Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
  Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
  Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
  Numbers with thousand separators sort after the same number without them.
  Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
  (When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
  Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
  The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
  Note that The first number component in an entity like this is also permitted to contain thousand separators.

  To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
  - The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros.  If there is a decimal point, this length only includes the integer part of the number.
  - A 3-character suffix is appended after the number (after the decimals if present).
    - The first character is a space, or a '+' sign if the number was preceded by '+'.  Any preceding '+' sign is also removed from the front of the number.
    - This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
    - The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
  - For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
    Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.

  e.g. here is how some simple sample strings get converted:
  'Foo055' --> 'Foo0255 02'
  'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
  'The $1,000,000 prize' --> 'The $071000000 01 prize'
  '+99.74 degrees' --> '0299.74+00 degrees'
  'I have 0 apples' --> 'I have 00 02 apples'
  '.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
  'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'

  The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
  The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
  A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
  However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.

  The function definition does not specify the charset or collation to be used for string-type parameters or variables:  The default database charset & collation at the time the function is defined will be used.
  This is to make the function code more portable.  However, there are some important restrictions:

  - Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
    This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
    To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.

  - For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
    This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
    If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()

  Length of the output:

  Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
  This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
  If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
  Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
  So for the current input length of 100, the maximum output length is 350.
  If changing the input length, the output length must be modified according to the above formula.  The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
  DECLARE x,y varchar(100);            # need to be same length as input s
  DECLARE r varchar(350) DEFAULT '';   # return value:  needs to be same length as return type
  DECLARE suf varchar(101);   # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
  DECLARE i,j,k int UNSIGNED;
  IF n<=0 THEN SET n := -1; END IF;   # n<=0 means "process all numbers"
  LOOP
    SET i := REGEXP_INSTR(s,'\\d');   # find position of next digit
    IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF;   # no more numbers to process -> we're done
    SET n := n-1, suf := ' ';
    IF i>1 THEN
      IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\\p{L}\\p{N}\\p{M}\\x{608}\\x{200C}\\x{200D}\\x{2100}-\\x{214F}\\x{24B6}-\\x{24E9}\\x{1F130}-\\x{1F149}\\x{1F150}-\\x{1F169}\\x{1F170}-\\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\\d++\\.\\d') THEN SET i:=i-1; END IF;   # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
      IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF;   # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
      SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i);   # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
    END IF;
    SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\\d*+(?:\\.\\d++)*','^(?:[1-9]\\d{0,2}(?:,\\d{3}(?!\\d))++|\\d++)(?:\\.\\d++)*+'));   # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
    SET s := SUBSTRING(s,LENGTH(x)+1);   # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
    SET i := INSTR(x,'.');
    IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF;   # move any following decimals into y
    SET i := LENGTH(x);
    SET x := REPLACE(x,',','');
    SET j := LENGTH(x);
    SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
    SET k := LENGTH(x);
    SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0'));   # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise)  Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
    SET i := LOCATE('.',y,2);
    IF i=0 THEN
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf);   # k = count of digits in number, bounded to be <= '9Z' base-36
    ELSE   # encode a version number (like 3.12.707, etc)
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
      WHILE LENGTH(y)>0 AND n!=0 DO
        IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
        SET j := LENGTH(x);
        SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
        SET k := LENGTH(x);
        SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
        SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0'));   # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
        SET n := n-1;
      END WHILE;
      SET r := CONCAT(r,y,suf);
    END IF;
  END LOOP;
END
$$
delimiter ;
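The prefix/suffix encoding described in the comment block above can be tried out in isolation. The Python sketch below covers plain integers only (no signs, decimals, thousands separators or version strings), so it is a reduced illustration of the idea rather than a port of NatSortKey(); the function names are assumptions.

```python
import re

BASE36_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base36(n, width=2):
    # Fixed-width base-36 rendering, like LPAD(CONV(n, 10, 36), 2, '0').
    out = ""
    while n > 0:
        n, r = divmod(n, 36)
        out = BASE36_DIGITS[r] + out
    return out.rjust(width, "0")

def nat_sort_key_sketch(s):
    def encode(match):
        raw = match.group()
        stripped = raw.lstrip("0")
        leading_zeros = len(raw) - len(stripped)
        prefix = base36(min(len(stripped), 359))             # digit count, capped at '9Z'
        suffix = " " + base36(min(leading_zeros * 2, 1294))  # leading zeros * 2, capped at 'ZY'
        return prefix + stripped + suffix
    return re.sub(r"[0-9]+", encode, s)

print(nat_sort_key_sketch("Foo055"))
ordered = sorted(["v010", "v10", "v9"], key=nat_sort_key_sketch)
print(ordered)
```

For 'Foo055' this produces 'Foo0255 02', matching the first sample conversion in the comment block, and 'v010' sorts after 'v10' because the leading-zero count lives in the suffix.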

score 1 · Accepted Answer

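The other answers are correct, but you may want to know that MariaDB 10.7 will have a natural_sort_key() function. At the time of writing, it is only available as a preview version. The function is explained here.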

score 0 · Accepted Answer

There is also natsort. It is intended to be part of a Drupal plugin, but it works fine stand-alone.

Answered 2011-06-20T07:58:55.437

score 0 · Accepted Answer

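If the titles only have the version as a number, here is a simple one:

ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT);

Otherwise, if you use a pattern (this one puts a # before the version), you can use simple SQL: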

create table titles(title);

insert into titles (title) values 
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');

select REGEXP_REPLACE(title, "#([0-9]+)", "\\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\\1") AS INT);     
+-------------------+
| title             |
+-------------------+
| Bond 007          |
| Bond 007 #2       |
| Final Fantasy     |
| Final Fantasy 01  |
| Final Fantasy 2   |
| Final Fantasy 03  |
| Final Fantasy 10  |
| Final Fantasy 11  |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)

You can use other patterns if needed. For example, if you have the movies "I'm #1" and "I'm #1 part 2", you could wrap the version instead, e.g. "Final Fantasy {11}".

score -4 · Accepted Answer

I know this topic is ancient, but I think I've found a way to do this:

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC

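Scrap that, it sorted the following set incorrectly (it's useless, lol):

Final Fantasy 1
Final Fantasy 2
Final Fantasy 5
Final Fantasy 7
Final Fantasy 7: Advent Children
Final Fantasy 12
Final Fantasy 112
FF1
FF2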
