
I have a requirement that basically goes like this. If I have a string of text such as

"There once was an 'ugly' duckling but it could 
never have been \'Scarlett\' Johansen"

then I want to match the quotes that have not already been escaped. Those would be the ones around 'ugly', not the ones around 'Scarlett'.

I have spent quite a while testing things in a small C# console app and have come up with the following solution.

private static void RegexFunAndGames() {

  string result;
  string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
  // Regex for a literal backslash followed by a single quote (an already-escaped quote).
  string rePattern = @"\\'";
  string replaceWith = "'";

  Console.WriteLine(sampleText);

  // Step 1: undo the existing escapes so every quote is bare.
  Regex regEx = new Regex(rePattern);
  result = regEx.Replace(sampleText, replaceWith);

  // Step 2: escape every quote.
  result = result.Replace("'", @"\'");

  Console.WriteLine(result);
}

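Basically what I'm doing is a two-step process: find the quotes that have already been escaped, undo that, and then escape everything again. It sounds a bit clunky, and I feel there may be a better way.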

Test Info

I got two really good answers, so I thought it was worth running a test to see which one performs better. I have these two functions:

    private static string RegexReplace(string sampleText) {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");           
    }

    private static string ReplaceTest(string sampleText) {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

I call them from the Main method of a console app:

    static void Main(string[] args) {

        string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = ReplaceTest(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        sw.Reset();
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = RegexReplace(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }

The ReplaceTest method takes 2068 milliseconds. The RegexReplace method takes 9372 milliseconds. I have run this test a few times, and ReplaceTest is always the fastest.
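One thing that may be worth checking before reading too much into these numbers: RegexReplace constructs a new Regex on every call, so part of its measured time is pattern construction rather than matching. The sketch below is a variant that builds the Regex once and reuses it. The class name CachedRegexSketch, the method name RegexReplaceCached, and the use of RegexOptions.Compiled are assumptions made for illustration; they are not part of the original test, and no timing is claimed for them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original benchmark.
public static class CachedRegexSketch
{
    // Built once and reused across calls; RegexOptions.Compiled is optional.
    private static readonly Regex UnescapedQuote =
        new Regex(@"(?<!\\)'", RegexOptions.Compiled);

    // Same behaviour as RegexReplace, minus the per-call construction.
    public static string RegexReplaceCached(string sampleText)
    {
        return UnescapedQuote.Replace(sampleText, @"\'");
    }

    public static void Main()
    {
        string sampleText = @"in the film \'To Catch A Thief' but not in 'Stardust'";
        Console.WriteLine(RegexReplaceCached(sampleText));
    }
}
```

Swapping RegexReplaceCached into the second loop of Main would show how much of the gap, if any, comes from constructing the Regex each time.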


3 Answers

4

You can use a negative lookbehind to make sure the quote has not been escaped. The expression below

(?<!\\)'

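matches a single quote unless it is immediately preceded by a backslash.

Note that in a regular C# string constant each backslash must be doubled, which is why the pattern appears as "(?<!\\\\)'" in the code.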
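As a side note on the doubling, here is a minimal sketch comparing the regular string literal with a C# verbatim literal, which takes backslashes as written. The class name LookbehindLiteralDemo and the short test string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class LookbehindLiteralDemo
{
    // Regular literal: every regex backslash is written twice.
    public const string RegularLiteral = "(?<!\\\\)'";

    // Verbatim literal: backslashes are taken as-is.
    public const string VerbatimLiteral = @"(?<!\\)'";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(RegularLiteral == VerbatimLiteral); // True

        // Only the unescaped quote in "it's" gets a backslash added.
        Console.WriteLine(Regex.Replace(@"it's \'ok\'", VerbatimLiteral, @"\'"));
    }
}
```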

var sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
var regEx = new Regex("(?<!\\\\)'");
var result = regEx.Replace(sampleText, "\\'");
Console.WriteLine(result);

The above prints

Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief\' but not in \'Stardust\' because they\'d stopped acting by then

Link to ideone.
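Since the question asks about matching rather than replacing, the same pattern can also be used just to locate the unescaped quotes. The following is a rough sketch run against the duckling sentence from the question, joined onto one line; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class UnescapedQuoteFinder
{
    // Returns every single quote that is not directly preceded by a backslash.
    public static MatchCollection FindUnescapedQuotes(string text)
    {
        return Regex.Matches(text, @"(?<!\\)'");
    }

    public static void Main()
    {
        string text = @"There once was an 'ugly' duckling but it could never have been \'Scarlett\' Johansen";

        // Only the two quotes around "ugly" are reported.
        foreach (Match m in FindUnescapedQuotes(text))
        {
            Console.WriteLine("Unescaped quote at index " + m.Index);
        }
    }
}
```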

answered 2012-11-16T16:51:40.710
3

I'm surprised that you are using RegEx for this. Why not simply use:

string result = sampleText.Replace(@"\'", "'").Replace("'", @"\'");

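This will escape every unescaped '.

It first unescapes every escaped ' (single quote) and then escapes them all.

That said, if RegEx is a requirement, then accept the correct answer you have already been given.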
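A small sketch of the two steps, written out separately, may make the behaviour easier to see. It also checks that running the method a second time leaves the text unchanged, which follows from the first step undoing any existing escapes. The names TwoStepEscapeSketch and EscapeQuotes are invented for this example.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class TwoStepEscapeSketch
{
    public static string EscapeQuotes(string text)
    {
        // Step 1: strip the backslash from quotes that are already escaped.
        string unescaped = text.Replace(@"\'", "'");

        // Step 2: now that every quote is bare, escape all of them.
        return unescaped.Replace("'", @"\'");
    }

    public static void Main()
    {
        string sample = @"the film \'To Catch A Thief' because they'd stopped";
        string once = EscapeQuotes(sample);
        string twice = EscapeQuotes(once);

        Console.WriteLine(once);
        Console.WriteLine(once == twice); // True: a second pass changes nothing
    }
}
```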

answered 2012-11-16T18:56:20.447
-1

You can use

    string rePattern = @"[\\'|\']"; 

instead.

answered 2012-11-16T16:55:21.843