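I need to tokenize a string so that each token is either:
- enclosed in double quotes
- delimited by whitespace
Quoted strings must handle escaped quotes. This input: this: "is included in \"single token\""
should become this:
[this:] [is included in "single token"]
or this:
[this:] [is included in \"single token\"]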
(An unquoted token is a run of @"[^\s]" characters, not @"\w" characters.)
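To make the difference between the two character classes concrete, here is a minimal sketch. The sample input and the Bracket helper are made up for illustration and are not part of my actual code; it only shows why @"\w" is too narrow for tokens such as this: and does not deal with quoting at all.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenClassDemo
{
    // Hypothetical helper: wraps every match of the pattern in square brackets.
    public static string Bracket(string input, string pattern) =>
        string.Join(" ", Regex.Matches(input, pattern)
                              .Cast<Match>()
                              .Select(m => "[" + m.Value + "]"));

    static void Main()
    {
        const string input = "this: kind$ #@";

        // [^\s]+ keeps punctuation attached to the token.
        Console.WriteLine(Bracket(input, @"[^\s]+")); // [this:] [kind$] [#@]

        // \w+ drops the punctuation, and "#@" yields no token at all.
        Console.WriteLine(Bracket(input, @"\w+"));    // [this] [kind]
    }
}
```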
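I have seen many approaches that each solve part of the problem:
- finding all quoted strings, which leaves out all the unquoted tokens
- finding all unquoted tokens, which ignores quoted strings
Unfortunately, I can't find a way to merge the solutions to these two problems into one...
This is what I have so far: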
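using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // A backtick stands in for a double quote so the literals stay readable;
        // it is swapped for a real '"' below.
        var inputs = new List<string>
        {
            @"bef\`ore`xy z`after",
            @"start `with simple` expression: `i am xprsion` and this is empty: `` ...",
            @"now `with some tabs` expression",
            @"nothing \but\ escapers\\\",
            @"some #@ other kind$ of whildcards...",
            @"and now `with \`allegedly\` escape` char",
            @"tight` or even `connected",
        }.Select(s => s.Replace('`', '"'));

        // Both alternatives capture into the same named group "i".
        var sections = new[]
        {
            @"(?<i>[^\s]+)",                  // unquoted: run of non-whitespace
            @"((?<!\\)`(?<i>.*?)(?<!\\)`)",   // quoted
        };
        var pattern = string.Join("|", sections).Replace("`", "\"");

        foreach (var i in inputs)
        {
            // Print each captured token wrapped in square brackets.
            Regex.Matches(i, pattern)
                .Cast<Match>()
                .Select(m => m.Groups["i"].Value)
                .ToList()
                .ForEach(s => Console.Write("[{0}]", s));
            Console.WriteLine();
        }
        Console.ReadKey();
    }
}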
But combining the quoted pattern with the whitespace-delimited pattern breaks everything...