mysql - How to count words in MySQL / regular expression replacer?

Question

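In a MySQL query, how can I get the same behavior as the Regex.Replace function (for example in .NET/C#)?

I need this because, like many people, I want to count the number of words in a field. However, I am not satisfied with the following answer (given several times on this site):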

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1 FROM table

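It does not give a correct result when there is more than one space between two words.

By the way, I think a Regex.Replace function could be interesting in its own right, so all good ideas are welcome!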
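To make the failure concrete, here is a minimal check that uses literal strings instead of a real table (the sample values are made up for illustration):

```sql
-- Space-counting approach applied to two sample strings.
SELECT LENGTH('one two')  - LENGTH(REPLACE('one two',  ' ', '')) + 1 AS single_space,  -- 2, correct
       LENGTH('one  two') - LENGTH(REPLACE('one  two', ' ', '')) + 1 AS double_space;  -- 3, but there are only 2 words
```

Every extra space is counted as another word boundary, so the result is inflated by one for each surplus space.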

score 17 · Accepted Answer

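REGEXP_REPLACE is available as a MySQL user-defined function.

Word count: if you control the data going into the database, you can remove double spaces before inserting. Also, if you need the word count often, you can compute it once in your code and store the count in the database.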
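A rough sketch of that idea in plain SQL, assuming a hypothetical `articles` table with a cached `word_count` column. The nested `REPLACE` trick for collapsing runs of spaces is a swapped-in technique, not something this answer prescribes. It only handles the space character (not tabs or newlines) and assumes the text never contains `<` or `>`:

```sql
-- Hypothetical table: the count is stored alongside the text.
CREATE TABLE articles (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    body       TEXT NOT NULL,
    word_count INT  NOT NULL
);

SET @raw = '  the  quick   brown fox ';

-- Collapse runs of spaces: turn each space into '<>', delete the '><' seams
-- between adjacent markers, then turn the remaining '<>' back into one space.
SET @clean = TRIM(REPLACE(REPLACE(REPLACE(@raw, ' ', '<>'), '><', ''), '<>', ' '));

-- With single spaces guaranteed, the simple space-counting formula is safe.
INSERT INTO articles (body, word_count)
VALUES (@clean,
        IF(@clean = '', 0,
           CHAR_LENGTH(@clean) - CHAR_LENGTH(REPLACE(@clean, ' ', '')) + 1));

SELECT body, word_count FROM articles;  -- 'the quick brown fox', 4
```

In practice the cleanup and counting would more likely live in application code, as suggested above. The point is only that the count is computed once at write time rather than on every read.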

score 1 · Accepted Answer

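Update: a separate answer has now been added for MySQL 8.0+, and it should be preferred. (This answer is kept for anyone restricted to an earlier version.)

This is almost a duplicate of this question, but this answer addresses the word-counting use case, using the advanced version of the custom regular expression replacer from this blog post.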

Demo

Rextester online demo

For the sample text, this gives a count of 61, the same as every online word counter I tried (e.g. https://wordcounter.net/).

SQL (function code omitted for brevity):

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;

score 0 · Accepted Answer

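The answer is no: you cannot get the same behavior in MySQL.

But I suggest you look at this earlier question on the topic, which links to a UDF that supposedly enables some of this functionality.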

score 0 · Accepted Answer

MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:

SQL

SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
           CHAR_LENGTH(REGEXP_REPLACE(
               txt,
               '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
               '$1')) -- Discard the first whitespace character and retain the rest
           + 1 -- The word count is 1 more than the number of gaps between words
           - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
           - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
           AS `Word count`
FROM tbl;

Demo

DB-Fiddle online demo
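For a quick sanity check without creating `tbl`, the same expression can be run against a few inline sample strings. This is only a sketch: the derived table `samples` and its values are made up for illustration, and MySQL 8.0+ is assumed.

```sql
SELECT txt,
       CHAR_LENGTH(txt)
           - CHAR_LENGTH(REGEXP_REPLACE(txt, '[[:space:]]([[:space:]]*)', '$1'))
           + 1
           - IF (txt REGEXP '^[[:space:]]', 1, 0)
           - IF (txt REGEXP '[[:space:]]$', 1, 0) AS word_count
FROM (
    SELECT 'one two' AS txt               -- 1 gap                      -> 2
    UNION ALL
    SELECT '  one   two  three '          -- 4 chunks, minus both ends  -> 3
    UNION ALL
    SELECT ''                             -- edge case: no gaps         -> 1
) AS samples;
```

Note the last row: an empty string is reported as one word, because the formula always adds 1 to the number of gaps. If empty values can occur, wrapping the expression in something like `IF(txt = '', 0, ...)` is one way to handle them.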
