I am currently trying to match and capture text in the following input:
field: one two three field: "moo cow" field: +this
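I can match field: with [a-z]*\: but I can't seem to match the rest of the content. So far my attempts have only resulted in capturing everything, which is not what I want.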
If you know that it is always the literal field: then there is absolutely no need for a regular expression:
var delimiters = new String[] {"field:"};
string[] values = input.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
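For illustration, here is a minimal runnable sketch of the Split approach applied to the input from the question. The Trim() call is an assumption on my part (that the spaces around each value are not wanted); it is not part of the snippet above.

```csharp
using System;
using System.Linq;

// Sample input taken from the question.
string input = "field: one two three field: \"moo cow\" field: +this";

// Split on the literal label. RemoveEmptyEntries drops the empty
// piece that appears before the first "field:".
string[] delimiters = { "field:" };
string[] values = input
    .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
    .Select(v => v.Trim())   // assumption: surrounding spaces are not part of the value
    .ToArray();

foreach (string value in values)
{
    Console.WriteLine($"[{value}]");
}
// Prints:
// [one two three]
// ["moo cow"]
// [+this]
```

Without the Trim() step each value would keep the spaces that sat between it and the neighbouring labels.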
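However, judging by your regular expression, I assume that the name field can vary, as long as it comes right before a colon. You could try to capture a word followed by : and then capture everything up to the next such word (using a lookahead).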
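foreach (Match match in Regex.Matches(input, @"([a-z]+):((?:(?![a-z]+:).)*)"))
{
    // Groups[1] is the field name, Groups[2] is everything up to the next "word:"
    string fieldName = match.Groups[1].Value;
    // Note: the value still includes the surrounding spaces; call Trim() if unwanted
    string value = match.Groups[2].Value;
}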
An explanation of the regex:
(        # opens a capturing group; the content can later be accessed with Groups[1]
[a-z]    # lower-case letter
+        # one or more of them
)        # end of capturing group
:        # a literal colon
(        # opens a capturing group; the content can later be accessed with Groups[2]
(?:      # opens a non-capturing group; just a necessary subpattern which we do not
         # need later any more
(?!      # negative lookahead; this will NOT match if the pattern inside matches
[a-z]+:  # a word followed by a colon; just the same as we used at the beginning of
         # the regex
)        # end of negative lookahead (note that this does not consume any characters;
         # it only LOOKS ahead)
.        # any character (except for line breaks)
)        # end of non-capturing group
*        # 0 or more of those
)        # end of capturing group
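So first we match anylowercaseword: and then we match one more character at a time, checking before each character that it is not the start of anotherlowercaseword: (that is, another lower-case word followed by a colon). Using the capturing groups, we can later retrieve the field's name and the field's value separately.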
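To make the walkthrough concrete, here is a small sketch that runs the same pattern over an input whose field names differ. The sample string, the variable names and the Trim() call are my own assumptions for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input with varying field names (not from the question).
string sample = "name: John Smith city: \"New York\" tag: +this";

var fields = new List<KeyValuePair<string, string>>();
foreach (Match match in Regex.Matches(sample, @"([a-z]+):((?:(?![a-z]+:).)*)"))
{
    string fieldName = match.Groups[1].Value;
    // The raw capture includes the spaces around the value, so trim it here.
    string fieldValue = match.Groups[2].Value.Trim();
    fields.Add(new KeyValuePair<string, string>(fieldName, fieldValue));
}

foreach (var field in fields)
{
    Console.WriteLine($"{field.Key} => [{field.Value}]");
}
// Prints:
// name => [John Smith]
// city => ["New York"]
// tag => [+this]
```

One limitation worth keeping in mind: any lower-case word directly followed by a colon ends the current value, so a value that itself contains something like http: would be cut off there and treated as a new field.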
Don't forget that you can actually match literal strings in a regular expression. If your pattern looks like this:
field\:
you will match "field:" literally, and nothing else.
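As a quick check of that claim, the following sketch (variable names are mine) runs the literal pattern against the input from the question. It finds only the three labels, not the values that follow them.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string text = "field: one two three field: \"moo cow\" field: +this";

// The pattern contains only literal characters, so each match is exactly "field:".
MatchCollection labels = Regex.Matches(text, @"field\:");

Console.WriteLine(labels.Count);     // 3
Console.WriteLine(labels[0].Value);  // field:
```

So a literal pattern alone locates the labels; capturing the values still needs something more, such as the Split call or the lookahead pattern shown earlier in the thread.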