regex - Regular expression: capture anything after uppercase letters and before a number?

Question

Test strings:

TEST Hello, world, 75793250
TEST TESTER Hello, world. Another word here. 75793250

Expected matches:

Hello, world, 
Hello, world. Another word here.

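I want to select everything between the uppercase words and the 8-digit number.

How can I do this?

Edit: The goal is to clean up a large text file using Notepad++. I am testing with both Notepad++ and Rubular.com.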

score 2 · Accepted Answer

Try something like this:

/(?<=[A-Z]+(?: [A-Z]+)*\b)(?:(?!\b\d{8}).)*/

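Basically:

Look behind for one or more uppercase words separated by spaces, followed by a word boundary.
Then start matching from that point and keep going until you reach a word boundary followed by 8 digits.

If your regex engine complains about the variable-length lookbehind (as mine does), try this instead; the wanted text is then in capture group 1: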

/(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/

This yields:

>> "TEST Hello, world, 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST Hello, world, " 1:" Hello, world, ">

>> "TEST TESTER Hello, world. Another word here. 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST TESTER Hello, world. Another word here. " 1:" Hello, world. Another word here. ">
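The capture-group variant also ports to other engines. Here is a minimal sketch in Python (an assumption; the answer itself uses Ruby), whose standard `re` module likewise rejects variable-length lookbehind. The helper names `lookbehind_is_supported` and `extract_between` are hypothetical, and stripping the surrounding whitespace is an extra step that is not part of the original pattern.

```python
import re

# Capture-group variant from the answer: skip the leading uppercase words,
# then capture everything up to a word boundary followed by 8 digits.
PATTERN = re.compile(r"(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)")


def lookbehind_is_supported() -> bool:
    """Return True if this engine compiles the variable-length lookbehind form."""
    try:
        re.compile(r"(?<=[A-Z]+(?: [A-Z]+)*\b)(?:(?!\b\d{8}).)*")
        return True
    except re.error:
        return False


def extract_between(line: str):
    """Hypothetical helper: return capture group 1, stripped, or None if no match."""
    m = PATTERN.search(line)
    return m.group(1).strip() if m else None


if __name__ == "__main__":
    print(lookbehind_is_supported())
    print(extract_between("TEST Hello, world, 75793250"))
    print(extract_between("TEST TESTER Hello, world. Another word here. 75793250"))
```

Without the `strip()` call, group 1 keeps the space that follows the last uppercase word, as the Ruby output above shows.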

score 1 · Accepted Answer

Try the following:

\b[A-Z]+\b\s+(.*)\d{8}

Modified to exclude all of the leading uppercase words. The text you are looking for is in capture group 1:

(?:\b[A-Z]+\b\s+)+(.*)\d{8}

If the uppercase words (tags) appear only at the start of a line:

^(?:\b[A-Z]+\b\s+)+(.*)\d{8}
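To see why the pattern was modified, the first and the anchored versions can be compared on the question's test strings. This is a rough sketch in Python (an assumption; the answer does not name an engine), using `re.MULTILINE` so that `^` matches at the start of every line. The variable names are illustrative only.

```python
import re

# First attempt from the answer: consumes only one leading uppercase word.
FIRST_ATTEMPT = re.compile(r"\b[A-Z]+\b\s+(.*)\d{8}")

# Anchored variant: consumes every leading uppercase word at the start of a line.
TAGGED_LINE = re.compile(r"^(?:\b[A-Z]+\b\s+)+(.*)\d{8}", re.MULTILINE)

line_two = "TEST TESTER Hello, world. Another word here. 75793250"
text = "TEST Hello, world, 75793250\n" + line_two + "\n"

# With a single capture group, findall returns that group's text per match.
captured = TAGGED_LINE.findall(text)
first_try = FIRST_ATTEMPT.search(line_two).group(1)

if __name__ == "__main__":
    print(repr(first_try))   # still contains the second tag, TESTER
    for item in captured:
        print(repr(item))    # tags removed, trailing space kept
```

The greedy `(.*)` backtracks just far enough for `\d{8}` to match, so the capture keeps the space before the number.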

score 1 · Accepted Answer

You can use the following Java code:

    String str = "TEST TESTER Hello, world. Another word here. 75793250";
    Pattern pattern = Pattern.compile("(([A-Z]+\\s)+)([^\n]*)([0-9]{8})");
    Matcher m = pattern.matcher(str);
    while (m.find()){
        System.out.println(m.group(3));
    }

score 0 · Accepted Answer

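Use a character class to create an atom that matches only uppercase letters: [A-Z]. You want to match it several times (at least once?), so [A-Z]+.

Next you want to match anything that might follow, .+, and since you want to capture it, wrap it in a named capture: (?<nameHere>.+).

Then you want to match the digits explicitly, so that the capture ends at the number and the digits do not end up inside it (since .+ matches anything). \d is the shorthand character class for digits, and we need one or more of them, so \d+.

Putting it all together, with whitespace (\s) between the parts: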

[A-Z]+\s+(?<nameHere>.+)\s+\d+

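Use the Match class to pull out the named capture. Assuming .NET, that is match.Groups["nameHere"].Value; Match.Captures only lists the captures of the overall match.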
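For readers outside .NET, the same idea can be sketched in Python (an assumption; the answer appears to target .NET). Python's `re` module spells named groups `(?P<name>...)`, and the helper name `named_capture` is hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling for the named group.
NAMED = re.compile(r"[A-Z]+\s+(?P<nameHere>.+)\s+\d+")


def named_capture(line: str):
    """Return the text of the 'nameHere' group, or None when nothing matches."""
    m = NAMED.search(line)
    return m.group("nameHere") if m else None


if __name__ == "__main__":
    print(named_capture("TEST Hello, world, 75793250"))
    print(named_capture("TEST TESTER Hello, world. Another word here. 75793250"))
```

Note that this pattern consumes only one uppercase word, so on the second test string the capture still begins with TESTER.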
