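I have a requirement that is basically this. If I have a string of text such as

    "There once was an 'ugly' duckling but it could
    never have been \'Scarlett\' Johansen"

then I'd like to match the quotes that haven't already been escaped. These would be the ones around 'ugly', not the ones around 'Scarlett'.
I have spent quite a while using a little C# console app to test things and have come up with the following solution.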
    // Requires: using System; using System.Text.RegularExpressions;
    private static void RegexFunAndGames() {
        string result;
        string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
        string rePattern = @"\\'";   // matches a quote that is already escaped: \'
        string replaceWith = "'";
        Console.WriteLine(sampleText);
        Regex regEx = new Regex(rePattern);
        // Step 1: undo the existing escapes so every quote is bare.
        result = regEx.Replace(sampleText, replaceWith);
        // Step 2: escape every quote.
        result = result.Replace("'", @"\'");
        Console.WriteLine(result);
    }
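Basically what I'm doing is a two-step process: find the quotes that have already been escaped, undo the escaping, and then escape everything again. It sounds a bit clumsy and I feel there could be a better way.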
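Test information
I got two really good answers, so I thought it would be worth running a test to see which one performs better. I have these two functions: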
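    // Escapes every quote that is not already preceded by a backslash.
    // Note: a new Regex is constructed on every call.
    private static string RegexReplace(string sampleText) {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");
    }

    // Un-escapes the existing \' pairs, then escapes every quote.
    private static string ReplaceTest(string sampleText) {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }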
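Before timing the two, it seems worth confirming that they return the same string for the input being tested. A minimal sketch, assuming both functions are copied unchanged into a hypothetical class named `QuoteEscaping` (the class name and the `SameResult` helper are mine, not from the answers):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; only the two method bodies come from the post.
public static class QuoteEscaping
{
    // Escapes every single quote that is not already preceded by a backslash.
    public static string RegexReplace(string sampleText)
    {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");
    }

    // Un-escapes the existing \' pairs, then escapes every quote.
    public static string ReplaceTest(string sampleText)
    {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

    // True when both approaches produce the same string for the given input.
    public static bool SameResult(string sampleText)
    {
        return RegexReplace(sampleText) == ReplaceTest(sampleText);
    }
}
```

Calling `QuoteEscaping.SameResult(sampleText)` with the sample sentence should return true, since both approaches leave the already-escaped quote alone and escape the rest.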
I call them from the Main method of a console app:
    static void Main(string[] args) {
        string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        // Time one million calls to ReplaceTest.
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = ReplaceTest(sampleText);
        }
        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        // Time one million calls to RegexReplace.
        sw.Reset();
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = RegexReplace(sampleText);
        }
        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }
The ReplaceTest method takes 2068 milliseconds. The RegexReplace method takes 9372 milliseconds. I've run this test a few times and ReplaceTest always comes out fastest.
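One caveat about these numbers: RegexReplace constructs a new Regex object on every call, so part of its time may be construction rather than matching. A rough sketch of a variant that builds the Regex once and reuses it is below. The class name `CachedRegexTiming`, the method names, and the use of `RegexOptions.Compiled` are my own assumptions, and I haven't measured this version, so it may or may not close the gap.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical class; only the pattern and the replacement string come from the post.
public static class CachedRegexTiming
{
    // Built once and reused instead of being constructed on every call.
    // RegexOptions.Compiled is optional: it trades startup cost for faster matching.
    private static readonly Regex UnescapedQuote =
        new Regex(@"(?<!\\)'", RegexOptions.Compiled);

    // Same behaviour as RegexReplace, but using the shared Regex instance.
    public static string RegexReplaceCached(string sampleText)
    {
        return UnescapedQuote.Replace(sampleText, @"\'");
    }

    // Runs the replacement the given number of times and returns elapsed milliseconds.
    public static long TimeCached(string sampleText, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            RegexReplaceCached(sampleText);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Calling `CachedRegexTiming.TimeCached(sampleText, 1000000)` would give a figure to set beside the two above.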