I have a set of strings of this form:

NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }

(there can be more than four elements inside the braces)

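I need to match the content inside the braces and extract (using groups) only the 'AAA', 'BBB', ... substrings. So the result for this example would be: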

group1 : AAA
group2 : BBB
group3 : CCC-CC
group4 : DDD

I tried this expression:

\{ (?:(\S+) \(\d+\),?\s?)+ \}

but it only returns the last matched group (so in this case, only "DDD"). What am I missing? Thanks
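To reproduce the behavior described above, here is a minimal sketch using Python's built-in `re` module (an assumption, since the question doesn't name its regex flavor). Like most flavors, `re` gives a capturing group that sits inside a repeated `(?:...)+` exactly one slot, and each iteration overwrites it:

```python
import re

s = "NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }"

# The question's pattern: one capturing group inside a repeated group.
pattern = re.compile(r"\{ (?:(\S+) \(\d+\),?\s?)+ \}")

m = pattern.search(s)
# There is only one group, and it holds the value from the last iteration.
print(m.groups())  # ('DDD',)
```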

1 Answer

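If you are using .NET regexes, your expression will work, because there a capturing group inside a repeated group records every value it captures (available through the group's Captures collection), not just the last one. In most other flavors only the last value is kept, so you have to use a trickier regex or do the matching in two steps: first match the { ... } group, then match the elements inside it.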
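The two-step approach can be sketched with Python's `re` module as follows. This is an illustration, not the answerer's code. The function name `extract_elements` and the element pattern `(\S+)\s+\(\d+\)` are assumptions based on the sample string:

```python
import re

# Step 1: the contents of each {...} group (no nested braces assumed).
BRACES = re.compile(r"\{([^{}]*)\}")
# Step 2: one element, e.g. "CCC-CC (3)"; only the name is captured.
ELEMENT = re.compile(r"(\S+)\s+\(\d+\)")


def extract_elements(s):
    """Return the element names found inside every {...} group in s."""
    names = []
    for block in BRACES.finditer(s):
        # findall with a single group returns a list of that group's values.
        names.extend(ELEMENT.findall(block.group(1)))
    return names


print(extract_elements("NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }"))
# ['AAA', 'BBB', 'CCC-CC', 'DDD']
```

Because step 2 runs only on the text between the braces, "NOOO (2)" is not picked up, and any number of elements is handled.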

The tricky regex would look like this:

(?:{|\G(?!^),)   # match a { or where the previous match ended followed by a ,
\s+              # space between elements
(\S+)\s+\(\d+\)  # an element
(?=[^{]*})       # make sure it's eventually followed by a }

You can use the expression exactly as written if you enable the /x (extended) flag, which can also be set inline by adding (?x) at the beginning of the expression.

The regex without the comments:

(?:{|\G(?!^),)\s+(\S+)\s+\(\d+\)(?=[^{]*})

This expression uses \G, which your regex flavor has to support. Many flavors do, including Perl, PCRE (PHP, etc.), .NET and Java; JavaScript and Python's built-in re module do not.
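Python's built-in `re` module rejects `\G` as a bad escape, but the same chaining can be emulated. `Pattern.match(string, pos)` only matches starting exactly at `pos`, so it can stand in for `\G` for every element after the first. This is a hypothetical sketch of that substitution, not the regex above run verbatim. `FIRST`, `NEXT` and `extract_chained` are illustrative names:

```python
import re

# "{" branch of the original regex: "{" followed by an element,
# with a "}" somewhere ahead and no "{" in between.
FIRST = re.compile(r"\{\s+(\S+)\s+\(\d+\)(?=[^{]*\})")
# "\G(?!^)," branch: a "," followed by another element.
NEXT = re.compile(r",\s+(\S+)\s+\(\d+\)(?=[^{]*\})")


def extract_chained(s):
    """Collect element names, chaining each match onto the previous one."""
    names = []
    pos = 0
    while (m := FIRST.search(s, pos)) is not None:
        names.append(m.group(1))
        # match() is anchored at m.end(), which emulates \G: the next
        # element must start exactly where the previous match ended.
        while (nxt := NEXT.match(s, m.end())) is not None:
            names.append(nxt.group(1))
            m = nxt
        pos = m.end()
    return names


print(extract_chained("NOOO (2), { AAA (1), BBB (2), CCC-CC (3), DDD (4) }"))
print(extract_chained("{ AAA (1), BBB (23), CCC, something invalid here #¤% ))),,,,!! }"))
```

As with the `\G` regex, the chain simply stops at the first element that doesn't fit. On the malformed second string it still returns AAA and BBB rather than rejecting the whole group.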

Note that this expression isn't perfect. For example, it would still capture AAA and BBB in the following string:

{ AAA (1), BBB (23), CCC, something invalid here #¤% ))),,,,!! }

That can be fixed if required, though (except for the counter).

answered 2012-06-05T12:58:08.670