mysql - How to extract two consecutive digits from a text field in MySQL?

Question

I have a MySQL database and I have a query:

SELECT `id`, `originaltext` FROM `source` WHERE `originaltext` regexp '[0-9][0-9]'

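This detects every originaltext value that contains two consecutive digits.

I need MySQL to return those digits as a field, so I can manipulate them further.

Ideally, it would be great if I could add an additional criterion that the number should be greater than 20, but I can also do that separately.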

score 12 · Accepted Answer

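If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:

PREG_CAPTURE extracts a regex match from a string.
PREG_POSITION returns the position at which a regular expression matches a string.
PREG_REPLACE performs a search-and-replace on a string.
PREG_RLIKE tests whether a regex matches a string.

All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. For example, to test whether a regex matches the subject case-insensitively, you would use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.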
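As a rough sketch of how the question's query might look with this library: it assumes LIB_MYSQLUDF_PREG is compiled and installed, reuses the `source` table and `originaltext` column from the question, and uses only the two-argument forms PREG_CAPTURE(pattern, subject) and PREG_RLIKE(pattern, subject). The alias `two_digits` is made up for illustration.

```sql
-- Assumes LIB_MYSQLUDF_PREG is installed; `two_digits` is an illustrative alias.
SELECT id,
       PREG_CAPTURE('/[0-9][0-9]/', originaltext) AS two_digits
FROM source
WHERE PREG_RLIKE('/[0-9][0-9]/', originaltext)
  -- Optional extra criterion from the question: keep only values above 20.
  AND CAST(PREG_CAPTURE('/[0-9][0-9]/', originaltext) AS UNSIGNED) > 20;
```

Note the Perl-style / delimiters around the pattern, which the built-in REGEXP operator does not use.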

If you want something simpler, you could alter the following function to better suit your needs.

CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT)
-- Extract the first longest string that matches the regular expression
-- If the string is 'ABCD', check all strings and see what matches: 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D'
-- It's not smart enough to handle things like (A)|(BCD) correctly in that it will return the whole string, not just the matching token.

RETURNS TEXT
DETERMINISTIC
BEGIN
  DECLARE s INT DEFAULT 1;
  DECLARE e INT;
  DECLARE adjustStart TINYINT DEFAULT 1;
  DECLARE adjustEnd TINYINT DEFAULT 1;

  -- Because REGEXP matches anywhere in the string, and we only want the part that matches, adjust the expression to add '^' and '$'
  -- Of course, if those are already there, don't add them, but change the method of extraction accordingly.

  IF LEFT(exp, 1) = '^' THEN 
    SET adjustStart = 0;
  ELSE
    SET exp = CONCAT('^', exp);
  END IF;

  IF RIGHT(exp, 1) = '$' THEN
    SET adjustEnd = 0;
  ELSE
    SET exp = CONCAT(exp, '$');
  END IF;

  -- Loop through the string, moving the end pointer back towards the start pointer, then advance the start pointer and repeat
  -- Bail out of the loops early if the original expression started with '^' or ended with '$', since that means the pointers can't move
  WHILE (s <= LENGTH(string)) DO
    SET e = LENGTH(string);
    WHILE (e >= s) DO
      -- s and e are positions, but SUBSTRING expects a length, so pass e - s + 1
      IF SUBSTRING(string, s, e - s + 1) REGEXP exp THEN
        RETURN SUBSTRING(string, s, e - s + 1);
      END IF;
      IF adjustEnd THEN
        SET e = e - 1;
      ELSE
        SET e = s - 1; -- ugh, such a hack to end it early
      END IF;
    END WHILE;
    IF adjustStart THEN
      SET s = s + 1;
    ELSE
      SET s = LENGTH(string) + 1; -- ugh, such a hack to end it early
    END IF;
  END WHILE;

  RETURN NULL;

END
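Once the function exists, a call against the question's table might look like the sketch below. The alias `two_digits` is an assumption for illustration, and the HAVING clause relies on MySQL allowing a select alias there without GROUP BY. Because the function tries many substrings per row, the WHERE clause first narrows the rows with the built-in REGEXP.

```sql
-- Sketch only: assumes REGEXP_EXTRACT above has been created in the current schema.
SELECT id,
       REGEXP_EXTRACT(originaltext, '[0-9][0-9]') AS two_digits
FROM source
WHERE originaltext REGEXP '[0-9][0-9]'
-- Optional extra criterion from the question: keep only values above 20.
HAVING CAST(two_digits AS UNSIGNED) > 20;
```

If you create the function from the mysql command-line client, you will likely need to wrap the CREATE FUNCTION statement in DELIMITER commands so that the semicolons inside the body do not end the statement early.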

score 9 · Accepted Answer

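There isn't any syntax in MySQL for extracting text using regular expressions. You can use REGEXP to identify the rows containing two consecutive digits, but to extract them you have to use the ordinary string manipulation functions, which is very difficult in this case.

Alternatives:

Select the entire value from the database, then use a regular expression on the client.
Use a different database that has better support for the SQL standard (may not be an option, I know). Then you can use this: SUBSTRING(originaltext from '%#"[0-9]{2}#"%' for '#').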

score 2 · Accepted Answer

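I had the same problem, and this is the solution I found (it won't work in all cases, though):

Use LOCATE() to find the beginning and the end of the string you want to match.
Use MID() to extract the substring in between...
Keep the regexp so that it matches only the rows where you are sure to find a match.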
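The steps above can be sketched as follows. This is a minimal illustration that assumes the digits always sit between two known marker characters; the square brackets and the sample text are hypothetical and not taken from the question.

```sql
-- Hypothetical sample text; the '[' and ']' markers are an assumption.
SET @txt = 'order [42] shipped';

SELECT MID(@txt,
           LOCATE('[', @txt) + 1,                       -- first character after '['
           LOCATE(']', @txt) - LOCATE('[', @txt) - 1)   -- number of characters between the markers
       AS between_markers;
-- between_markers: 42

-- Same idea on the question's table, with the regexp limiting the rows to
-- those that contain a bracketed two-digit number somewhere.
SELECT id,
       MID(originaltext, LOCATE('[', originaltext) + 1, 2) AS two_digits
FROM source
WHERE originaltext REGEXP '\\[[0-9][0-9]\\]';
```

LOCATE() only finds the first marker, so a row with an earlier unrelated '[' would give the wrong substring, which is one of the cases where this approach does not work.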

score 2 · Accepted Answer

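I use my code as a stored procedure (function); it should work for extracting any number made up of digits in a single block. This is part of my wider library.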

DELIMITER $$

--  2013.04 michal@glebowski.pl
--  FindNumberInText("ab 234 95 cd", TRUE) => 234  
--  FindNumberInText("ab 234 95 cd", FALSE) => 95

DROP FUNCTION IF EXISTS FindNumberInText$$
CREATE FUNCTION FindNumberInText(_input VARCHAR(64), _fromLeft BOOLEAN) RETURNS VARCHAR(32)
BEGIN
  DECLARE _r              VARCHAR(32) DEFAULT '';
  DECLARE _i              INTEGER DEFAULT 1;
  DECLARE _start          INTEGER DEFAULT 0;
  DECLARE _IsCharNumeric  BOOLEAN;

  IF NOT _fromLeft THEN SET _input = REVERSE(_input); END IF;
  _loop: REPEAT
    SET _IsCharNumeric = LOCATE(MID(_input, _i, 1), "0123456789") > 0;
    IF _IsCharNumeric THEN
      IF _start = 0 THEN SET _start  = _i;  END IF;
    ELSE
      IF _start > 0 THEN LEAVE _loop;       END IF;
    END IF;
    SET _i = _i + 1;
  UNTIL _i > length(_input) END REPEAT;

  IF _start > 0 THEN
    SET _r = MID(_input, _start, _i - _start);
    IF NOT _fromLeft THEN SET _r = REVERSE(_r);  END IF;
  END IF;
  RETURN _r;
END$$

DELIMITER ;
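A possible way to apply the function to the question's table is sketched below. The alias `first_number` is made up for illustration, and LEFT(..., 64) is an assumption added because the parameter is declared as VARCHAR(64), so longer text may be truncated or rejected depending on the SQL mode.

```sql
-- Sketch only: assumes FindNumberInText above has been created.
SELECT id,
       FindNumberInText(LEFT(originaltext, 64), TRUE) AS first_number
FROM source
WHERE originaltext REGEXP '[0-9][0-9]'
-- Keep only rows whose first run of digits is exactly two digits and above 20.
HAVING CHAR_LENGTH(first_number) = 2
   AND CAST(first_number AS UNSIGNED) > 20;
```

The function only looks at the first run of digits (or the last one when the second argument is FALSE), so a row such as 'a 5 b 42' would be filtered out here even though it contains a two-digit number.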

score 2 · Accepted Answer

I think the cleaner way is to use REGEXP_SUBSTR():

This extracts exactly two digits, whatever they are:

SELECT REGEXP_SUBSTR(`originalText`,'[0-9]{2}') AS `twoDigits` FROM `source`;

This extracts exactly two digits, but only in the range 20-99 (for example, 1112 returns NULL and 1521 returns 52):

SELECT REGEXP_SUBSTR(`originalText`,'[2-9][0-9]') AS `twoDigits` FROM `source`;

I tested both in v8.0 and they work. That's it, good luck!
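Both patterns also match the first qualifying pair inside a longer run of digits, as the 1521 example shows. If only standalone two-digit numbers are wanted, one option is a pattern with lookarounds. The sketch below assumes MySQL 8.0, whose regex engine accepts `(?<!...)` and `(?!...)`; the alias `two_digits` and the derived-table name `extracted` are made up for illustration.

```sql
-- Sketch only: assumes MySQL 8.0 or later.
SELECT id, two_digits
FROM (
  SELECT id,
         -- Two digits that are neither preceded nor followed by another digit.
         REGEXP_SUBSTR(originaltext, '(?<![0-9])[0-9]{2}(?![0-9])') AS two_digits
  FROM source
) AS extracted
WHERE two_digits IS NOT NULL
  -- The extra criterion from the question: strictly greater than 20.
  AND CAST(two_digits AS UNSIGNED) > 20;
```

Filtering on the numeric value, rather than encoding the range in the pattern, also excludes 20 itself, which '[2-9][0-9]' would let through.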

score 0 · Accepted Answer

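I know it's been quite a while since this question was asked, but I came across it and thought it would be a good challenge for my custom regex replacer - see this blog post.

...And the good news is it can, although it needs to be called quite a few times. See this online rextester demo, which shows the workings that led to the SQL below.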

SELECT reg_replace(
         reg_replace(
           reg_replace(
             reg_replace(
               reg_replace(
                 reg_replace(
                   reg_replace(txt,
                               '[^0-9]+',
                               ',',
                               TRUE,
                               1, -- Min match length
                               0 -- No max match length
                               ),
                             '([0-9]{3,}|,[0-9],)',
                             '',
                             TRUE,
                             1, -- Min match length
                             0 -- No max match length
                             ),
                           '^[0-9],',
                           '',
                           TRUE,
                           1, -- Min match length
                           0 -- No max match length
                           ),
                         ',[0-9]$',
                         '',
                         TRUE,
                         1, -- Min match length
                         0 -- No max match length
                         ),
                       ',{2,}',
                       ',',
                       TRUE,
                       1, -- Min match length
                       0 -- No max match length
                       ),
                     '^,',
                     '',
                     TRUE,
                     1, -- Min match length
                     0 -- No max match length
                     ),
                   ',$',
                   '',
                   TRUE,
                   1, -- Min match length
                   0 -- No max match length
                   ) AS `csv`
FROM tbl;

score 0 · Accepted Answer

If you want to return part of a string:

SELECT id , substring(columnName,(locate('partOfString',columnName)),10) from tableName;

Locate() returns the starting position of the matching string, which becomes the starting position for Substring().
