c# - Using Regex Replace when looking for unescaped characters

Question

I have a requirement that basically comes down to this. If I have a string of text such as

"There once was an 'ugly' duckling but it could 
never have been \'Scarlett\' Johansen"

then I want to match the quotes that have not already been escaped. Those would be the ones around 'ugly', not the ones around 'Scarlett'.

I have spent quite a while testing things in a small C# console application and came up with the following solution.

private static void RegexFunAndGames() {

  string result;
  string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
  string rePattern = @"\\'";
  string replaceWith = "'";

  Console.WriteLine(sampleText);

  Regex regEx = new Regex(rePattern);
  result = regEx.Replace(sampleText, replaceWith);

  result = result.Replace("'", @"\'");

  Console.WriteLine(result);
}

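Basically, what I am doing is a two-step process: find the characters that have already been escaped, undo that, and then escape everything again. This feels a bit clumsy, and I suspect there is a better way.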

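Test information

I got two very good answers, so I thought it was worth running a test to see which one performs better. I have these two functions: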

    private static string RegexReplace(string sampleText) {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");           
    }

    private static string ReplaceTest(string sampleText) {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

I call them from the Main method of a console application:

    static void Main(string[] args) {

        string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = ReplaceTest(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        sw.Reset();
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = RegexReplace(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }

The ReplaceTest method took 2068 milliseconds. The RegexReplace method took 9372 milliseconds. I have run this test several times, and ReplaceTest is always the faster of the two.
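
One thing worth keeping in mind when reading these numbers: RegexReplace above constructs a new Regex object on every call, so the loop times that construction as well as the matching. The following is a minimal sketch, not a measured result, of a variant that builds the Regex once and reuses it. The names CachedRegexSketch and RegexReplaceCached are made up for illustration, and no timing is claimed for it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CachedRegexSketch
{
    // Built once and reused across calls. RegexOptions.Compiled is optional;
    // it generally trades a higher construction cost for faster matching.
    private static readonly Regex UnescapedQuote =
        new Regex(@"(?<!\\)'", RegexOptions.Compiled);

    // Hypothetical counterpart to RegexReplace: same pattern, same replacement.
    public static string RegexReplaceCached(string sampleText)
    {
        return UnescapedQuote.Replace(sampleText, @"\'");
    }

    public static void Main()
    {
        string sampleText = @"\'To Catch A Thief' but not in 'Stardust'";
        // Prints: \'To Catch A Thief\' but not in \'Stardust\'
        Console.WriteLine(RegexReplaceCached(sampleText));
    }
}
```

Swapping this in for RegexReplace in the loop above would show how much of the gap comes from object construction rather than from matching itself.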

score 4 · Accepted Answer

You can use a negative lookbehind to make sure the quote is not escaped. The expression below

(?<!\\)'

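matches a single quote unless it is immediately preceded by a backslash.

Note that the backslashes must be doubled when the expression goes into a regular (non-verbatim) string literal.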

var sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
var regEx = new Regex("(?<!\\\\)'");
var result = regEx.Replace(sampleText, "\\'");
Console.WriteLine(result);

The above prints

Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief\' but not in \'Stardust\' because they\'d stopped acting by then
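
As a small illustration of the doubling rule mentioned above, the sketch below compares the pattern written as a regular string literal with the same pattern written as a verbatim (@"...") literal. The class and field names are hypothetical; the point is only that both literals produce the same pattern text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatternSketch
{
    // Regular literal: every backslash is doubled, so the regex \\ is typed as four backslashes.
    public static readonly string Regular = "(?<!\\\\)'";

    // Verbatim literal: backslashes are taken as-is, so the regex is typed exactly as it reads.
    public static readonly string Verbatim = @"(?<!\\)'";

    public static void Main()
    {
        // Both literals yield the same eight-character pattern.
        Console.WriteLine(Regular == Verbatim); // True

        string sampleText = @"the film \'To Catch A Thief' but not in 'Stardust'";
        // Prints: the film \'To Catch A Thief\' but not in \'Stardust\'
        Console.WriteLine(Regex.Replace(sampleText, Verbatim, @"\'"));
    }
}
```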

score 3 · Answer

I am surprised that you are using RegEx for this. Why not simply use:

string result = sampleText.Replace(@"\'", "'").Replace("'", @"\'");

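This will escape every unescaped '.

It first unescapes every escaped ' (single quote) and then escapes them all.

That said, if RegEx is the requirement, you will want to accept the correct solution that has already been posted.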
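
To make the two steps easier to follow, here is a minimal sketch that splits the chained call into two helpers and prints the intermediate string. The helper names Unescape and EscapeAll are invented for this illustration and do not appear in the original code.

```csharp
using System;

public static class TwoStepReplaceSketch
{
    // Step 1: turn every \' back into a plain '.
    public static string Unescape(string text)
    {
        return text.Replace(@"\'", "'");
    }

    // Step 2: escape every ' (all of them are unescaped after step 1).
    public static string EscapeAll(string text)
    {
        return text.Replace("'", @"\'");
    }

    public static void Main()
    {
        string sampleText = @"in \'To Catch A Thief' but not in 'Stardust'";

        string unescaped = Unescape(sampleText);
        Console.WriteLine(unescaped); // in 'To Catch A Thief' but not in 'Stardust'

        string escaped = EscapeAll(unescaped);
        Console.WriteLine(escaped);   // in \'To Catch A Thief\' but not in \'Stardust\'

        // Running both steps on already-escaped text leaves it unchanged.
        Console.WriteLine(EscapeAll(Unescape(escaped)) == escaped); // True
    }
}
```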

score -1 · Answer

You could use

    string rePattern = @"[\\'|\']";

instead.
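
When weighing this suggestion, it may help to see what the pattern above actually matches. Square brackets form a character class, so, as far as .NET regex syntax goes, the pattern matches any single backslash, single quote, or pipe character rather than the two-character sequence \'. The sketch below counts the matches on a short made-up input; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassSketch
{
    // The pattern exactly as suggested in the answer above.
    private const string Pattern = @"[\\'|\']";

    // Counts how many single characters the character class matches.
    public static int CountMatches(string text)
    {
        return Regex.Matches(text, Pattern).Count;
    }

    public static void Main()
    {
        // Made-up input containing one backslash, two quotes, and one pipe.
        string input = @"a\'b'c|d";

        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Each match is exactly one character long.
            Console.WriteLine("matched '" + m.Value + "' at index " + m.Index);
        }

        Console.WriteLine(CountMatches(input)); // 4
    }
}
```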

Answered 2012-11-16T16:55:21.843