
I have this regular expression:

("[^"]*")|('[^']*')|([^<>]+)

When it is given this input string:

<telerik:RadTab Text="RGB">

I want it to match "RGB". It doesn't, though, because the last alternative produces a longer string.

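What I would ideally like is this:

  1. If there is a double-quoted substring, match it, including the double quotes.
  2. Otherwise, if there is a single-quoted substring, match it, including the single quotes.
  3. Otherwise, if there is a string enclosed in angle brackets, match it, excluding the angle brackets.

Can this logic be expressed in a single regular expression?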

3 Answers

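    // Requires: using System; using System.Text.RegularExpressions;
    var strings = new[]
        {"<telerik:RadTab Text=\"RGB\">", "<telerik:RadTab Text=RGB>", "<telerik:RadTab Text='RGB'>"};
    // The first alternative only matches a <...> span that contains no quotes,
    // so for quoted input the engine scans on until a quoted string matches.
    var r = new Regex("<([^<\"']+[^>\"']+)>|(\"[^\"]*\")|('[^']*')");
    foreach (var s1 in strings)
    {
        Console.WriteLine(s1);
        var match = r.Match(s1);
        // match.Value would include the angle brackets for the unquoted case;
        // group 1 holds the same text without them.
        Console.WriteLine(match.Groups[1].Success ? match.Groups[1].Value : match.Value);
        Console.WriteLine();
    }
    Console.ReadLine();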
answered 2013-11-04T21:11:27.450

One solution to this problem is to use lookahead assertions:

(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)

Let's break the regex down to get a better view:

(?=                 # zero-width assertion: look ahead to see whether there is ...
    ("[^"]*")       # a double-quoted string; capture it in group 1
)                   # end of lookahead
|                   # or
(?=                 # zero-width assertion: look ahead to see whether there is ...
    ('[^']*')       # a single-quoted string; capture it in group 2
)                   # end of lookahead
|                   # or
(?=                 # zero-width assertion: look ahead to see whether there is ...
    <([^<>]+)>      # one or more characters other than < and > between < and >; capture them in group 3
)                   # end of lookahead

You might be thinking "what in the world is he doing?" No problem, let me first explain why your regex fails.

We have the following string: <telerik:RadTab Text="RGB">

<telerik:RadTab Text="RGB">
^ the regex engine starts here
since there is no match with ("[^"]*")|('[^']*')|([^<>]+)
it will look further !

<telerik:RadTab Text="RGB">
 ^ the regex engine will now take a look here
it will check if there is "[^"]*", well obviously there isn't
now since there is an alternation, the regex engine will
check if there is '[^']*', meh same thing
it will now check if there is [^<>]+, but hey it matches !

So your regex engine will "eat" it like so
<telerik:RadTab Text="RGB">
 ^^^^^^^^^^^^^^^^^^^^^^^^^ and match this, by eating I mean it's advancing
Now the regex engine is at this point
<telerik:RadTab Text="RGB">
                          ^ and obviously, there is no match
The problem is, you want it to "step" back to match "RGB"
The regex engine won't go back for you :(
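
The trace above can be checked with a few lines of code. This is only a sketch: it uses Python's re module for illustration (an assumption, since the thread is about .NET), which should be harmless here because the pattern from the question uses nothing engine-specific.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'''("[^"]*")|('[^']*')|([^<>]+)''')
text = '<telerik:RadTab Text="RGB">'

# search() returns the leftmost match, just like the first call to
# Regex.Match in .NET.
first = pattern.search(text)

# Prints: 1 telerik:RadTab Text="RGB"
print(first.start(), first.group(0))
```

The match starts at index 1 and comes from the third alternative. The quoted alternatives are tried first at every position, but they cannot match before index 21, and by then the engine has already consumed that part of the input.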

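That's why we wrap the groups in zero-width assertions: a lookahead doesn't eat anything (it doesn't advance), and if you put a group inside a lookahead you still get the captured group.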

<telerik:RadTab Text="RGB">
^ So when it comes here, it will match it with (?=<([^<>]+)>)
but it won't eat the whole matched string
Now obviously, the regex needs to continue to look for other matches
So it comes here:
<telerik:RadTab Text="RGB">
 ^ no match
<telerik:RadTab Text="RGB">
  ^ no match
.....
until
<telerik:RadTab Text="RGB">
                     ^ hey there is a match using (?=("[^"]*"))
it will then advance further
<telerik:RadTab Text="RGB">
                      ^ no match
.... until it reaches the end

Of course, if you have a string like <telerik:RadTab Text="RGB'lol'">, it will still match 'lol' inside the double-quoted value and put it in group 2.
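
Because every match of the lookahead pattern is zero-width, the useful text is in the groups, not in the match value. Here is a minimal sketch of reading them, written in Python's re module for illustration (an assumption; in .NET the same idea would go through Regex.Matches and Match.Groups). The helper names captures and preferred are hypothetical, and preferred adds a priority rule taken from the question, not from this answer.

```python
import re

# The lookahead pattern from this answer, unchanged.
lookahead = re.compile(r'''(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)''')

def captures(text):
    """Return (position, group number, captured text) for each zero-width match."""
    found = []
    for m in lookahead.finditer(text):
        for number in (1, 2, 3):
            if m.group(number) is not None:
                found.append((m.start(), number, m.group(number)))
    return found

def preferred(text):
    """Pick a capture by the question's priority: group 1, then 2, then 3."""
    hits = captures(text)
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[2]

# Prints: [(0, 3, 'telerik:RadTab Text="RGB"'), (21, 1, '"RGB"')]
print(captures('<telerik:RadTab Text="RGB">'))
# Prints: "RGB"
print(preferred('<telerik:RadTab Text="RGB">'))
```

The pattern reports every candidate it finds, so picking between them is left to the calling code. Choosing the lowest-numbered group is one simple way to do that.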

Online demo
Regex rocks!

answered 2013-11-04T21:18:35.513

Edit: consider the following regex...

(\".*?\"|\'.*?\'|(?<=\<).*?(?=\>))
answered 2013-11-04T21:23:24.667