regex - Regex for quoted string with escaping quotes

Question

How can I get the substring " It's big \"problem " using a regular expression?

s = ' function(){  return " It\'s big \"problem  ";  }';

score 184 · Accepted Answer

/"(?:[^"\\]|\\.)*"/

Works in The Regex Coach and PCRE Workbench.

Example of a test in JavaScript:

    var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';
    var m = s.match(/"(?:[^"\\]|\\.)*"/);
    if (m != null)
        alert(m);
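
To collect every double-quoted string in a piece of text rather than only the first, the same pattern can be given the `g` flag. This is only a sketch: the helper name `findQuoted` and the sample input are made up for illustration.

```javascript
// Hypothetical helper (not part of the answer above): returns every
// double-quoted string found in `text`, quotes included.
function findQuoted(text) {
    // "        opening quote
    // [^"\\]   any character that is neither a quote nor a backslash
    // \\.      a backslash followed by any character (an escape pair)
    // "        closing quote
    const re = /"(?:[^"\\]|\\.)*"/g;
    return text.match(re) || [];
}

// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`f("a \"b\" c", "plain", "")`;
console.log(findQuoted(sample));
```

Because `.` does not match a line break by default, a backslash immediately followed by a newline is not treated as an escape pair here.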

score 37 · Accepted Answer

This one comes from nanorc.sample, which is available in many Linux distributions. It is used for syntax highlighting of C-style strings.

\"(\\.|[^\"])*\"

score 22 · Accepted Answer

As provided by ePharaoh, the answer is

/"([^"\\]*(\\.[^"\\]*)*)"/

To have the above apply to either single-quoted or double-quoted strings, use

/"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\'/
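
A minimal sketch of the combined pattern in JavaScript. The `g` flag, the helper name `findStrings` and the sample input are additions for illustration only.

```javascript
// Hypothetical helper: collects single- and double-quoted strings from `text`.
function findStrings(text) {
    // First alternative: double-quoted string; second: single-quoted string.
    // Each body is "plain characters, then any number of (escape pair + plain characters)".
    const re = /"([^"\\]*(\\.[^"\\]*)*)"|'([^'\\]*(\\.[^'\\]*)*)'/g;
    return text.match(re) || [];
}

const code = String.raw`a = 'it\'s'; b = "say \"hi\""; c = "it's";`;
console.log(findStrings(code));
```

Note that an unescaped apostrophe inside a double-quoted string (the third literal) does not start a new match, because the scan picks whichever opening quote comes first.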

score 11 · Accepted Answer

Most of the solutions provided here use alternative repetition paths i.e. (A|B)*.

You may encounter stack overflows on large inputs, since some pattern compilers implement this using recursion.

Java for instance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993

Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford, will reduce the number of parsing steps, avoiding most stack overflows.

score 9 · Accepted Answer

/(["\']).*?(?<!\\)(\\\\)*\1/is

Should work with any quoted string.
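
The pattern above is written with PHP-style delimiters and flags. A rough JavaScript equivalent, assuming an engine that supports lookbehind assertions and the `s` flag (both added in ES2018), could look like the sketch below. The helper name `findAnyQuoted` and the added `g` flag are not part of the answer.

```javascript
// Hypothetical helper: returns every single- or double-quoted string in `text`.
function findAnyQuoted(text) {
    // (["'])     opening quote, remembered as group 1
    // .*?        as few characters as possible (s flag: "." also matches newlines)
    // (?<!\\)    the position must not be preceded by a backslash
    // (\\\\)*    optionally followed by pairs of backslashes (escaped backslashes)
    // \1         the same quote character that opened the string
    const re = /(["']).*?(?<!\\)(\\\\)*\1/gis;
    return text.match(re) || [];
}

console.log(findAnyQuoted(String.raw`say "a \"b\" c" then 'it\'s'`));
console.log(findAnyQuoted(String.raw`x = "a\\";`));
```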

Answered 2008-10-30T10:58:32.710

score 9 · Accepted Answer

"(?:\\"|.)*?"

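The alternation of \" and . passes over escaped quotes, while the lazy quantifier *? ensures that you don't go past the end of the quoted string. Works with the .NET Framework RE classes.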

score 8 · Accepted Answer

/"(?:[^"\\]++|\\.)*+"/

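Taken straight from man perlre on a Linux system with Perl 5.22.0 installed. As an optimization, this regex uses the "possessive" form of both + and * to prevent backtracking, since it is known beforehand that a string without a closing quote wouldn't match in any case.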
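
JavaScript regular expressions have no possessive quantifiers, so the pattern above cannot be used there as written. A common workaround, shown here only as a sketch and not taken from perlre, is to capture the body inside a lookahead and then consume it with a backreference; a lookahead is not re-entered once it has succeeded, which gives a similar no-backtracking effect. The helper name `matchQuoted` is made up.

```javascript
// Hypothetical helper: returns the first double-quoted string in `text`, or null.
function matchQuoted(text) {
    // (?=( ... ))  capture the longest possible string body without consuming it
    // \1           consume exactly what was captured; the engine cannot give part of it back
    const re = /"(?=((?:[^"\\]+|\\.)*))\1"/;
    const m = text.match(re);
    return m ? m[0] : null;
}

console.log(matchQuoted(String.raw`say "a \"b\" c" end`));
console.log(matchQuoted('"abc'));  // no closing quote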

score 5 · Accepted Answer

This one works perfectly on PCRE and does not fail with a stack overflow.

"(.*?[^\\])??((\\\\)+)?+"

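Explanation:

1. Every quoted string starts with the character ";
2. It may contain any number of any characters: .*? {lazy match}; ending with a non-escape character [^\\];
3. Statement (2) is lazy(!) optional because the string can be empty (""). So: (.*?[^\\])??
4. Finally, every quoted string ends with the character ("), but it can be preceded by an even number of escape characters, i.e. pairs (\\\\)+; and it is greedy(!) optional: ((\\\\)+)?+ {greedy matching}, because the string can be empty or have no ending pairs!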

score 3 · Accepted Answer

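An option that has not been touched on before is:

1. Reverse the string.
2. Perform the matching on the reversed string.
3. Re-reverse the matched strings.

This has the added bonus of being able to correctly match escaped open tags.

Let's say you had the following string: String \"this "should" NOT match\" and "this \"should\" match". Here, \"this "should" NOT match\" should not be matched and "should" should be. On top of that, this \"should\" match should be matched and \"should\" should not.

First, an example.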

// The input string.
const myString = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"';

// The RegExp.
const regExp = new RegExp(
    // Match close
    '([\'"])(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))' +
    '((?:' +
        // Match escaped close quote
        '(?:\\1(?=(?:[\\\\]{2})*[\\\\](?![\\\\])))|' +
        // Match everything that's not the close quote
        '(?:(?!\\1).)' +
    '){0,})' +
    // Match open
    '(\\1)(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))',
    'g'
);

// Reverse the matched strings.
const matches = myString
    // Reverse the string.
    .split('').reverse().join('')
    // '"hctam "\dluohs"\ siht" dna "\hctam TON "dluohs" siht"\ gnirtS'

    // Match the quoted
    .match(regExp)
    // ['"hctam "\dluohs"\ siht"', '"dluohs"']

    // Reverse the matches
    .map(x => x.split('').reverse().join(''))
    // ['"this \"should\" match"', '"should"']

    // Re order the matches
    .reverse();
    // ['"should"', '"this \"should\" match"']

Okay, now to explain the RegExp. The regexp can easily be broken into three pieces, as follows:

# Part 1
(['"])         # Match a closing quotation mark " or '
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)
# Part 2
((?:          # Match inside the quotes
(?:           # Match option 1:
  \1          # Match the closing quote
  (?=         # As long as it's followed by
    (?:\\\\)* # Any number of pairs of escape characters
    \\        # and a single escape
    (?![\\])  # As long as that's not followed by an escape
  )
)|            # OR
(?:           # Match option 2:
  (?!\1).     # Any character that isn't the closing quote
)
)*)           # Match the group 0 or more times
# Part 3
(\1)           # Match an open quotation mark that is the same as the closing one
(?!            # As long as it's not followed by
  (?:[\\]{2})* # A pair of escape characters
  [\\]         # and a single escape
  (?![\\])     # As long as that's not followed by an escape
)

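This is probably a lot clearer in image form: generated using Jex's Regulex

Image on github (JavaScript Regular Expression Visualizer.) Sorry, I don't have a high enough reputation to include images, so it's just a link for now.

Here is a gist of an example function using this concept that's a little more advanced: https://gist.github.com/scagood/bd99371c072d49a4fee29d193252f5fc#file-matchquotes-js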

score 2 · Accepted Answer

Here is one that works with both " and ', and you can easily add others at the start.

("|')(?:\\\1|[^\1])*?\1

It uses the backreference (\1) to match exactly what is in the first group (" or ').
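
In JavaScript, `\1` inside a character class is not treated as a backreference, so `[^\1]` does not mean "anything except the opening quote" there. A variant that expresses the same idea with a negative lookahead might look like the sketch below. This is an adaptation, not the pattern from the answer, and the helper name `findBackrefQuoted` is made up.

```javascript
// Hypothetical helper: returns every quoted string, for either quote style.
function findBackrefQuoted(text) {
    // (["'])        opening quote, remembered as group 1
    // \\.           an escape pair: backslash plus any character
    // (?!\1)[^\\]   any non-backslash character that is not the opening quote
    // \1            closing quote, the same character as the opening one
    const re = /(["'])(?:\\.|(?!\1)[^\\])*\1/g;
    return text.match(re) || [];
}

console.log(findBackrefQuoted(String.raw`'it\'s' and "say 'x'"`));
```

Other delimiters can be added by extending the first character class, for example with a backtick.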

http://www.regular-expressions.info/backref.html

score 0 · Accepted Answer

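One has to remember that regexps aren't a silver bullet for everything string-related. Some things are simpler to do with a cursor and a linear, manual search. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).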
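
As a rough illustration of the cursor approach, here is a small hand-written scanner. It is a sketch with a made-up function name, it handles only double quotes, and it treats a backslash as escaping whatever single character follows it.

```javascript
// Hypothetical helper: returns the first double-quoted string in `text`
// (quotes included), or null if no properly terminated string is found.
function firstQuoted(text) {
    const open = text.indexOf('"');
    if (open === -1) return null;      // no opening quote at all

    let i = open + 1;                  // cursor: first character after the opening quote
    while (i < text.length) {
        const ch = text[i];
        if (ch === '\\') {
            i += 2;                    // skip the backslash and the character it escapes
        } else if (ch === '"') {
            return text.slice(open, i + 1);  // unescaped quote: end of the string
        } else {
            i += 1;                    // ordinary character
        }
    }
    return null;                       // ran off the end without a closing quote
}

console.log(firstQuoted(String.raw` function(){ return " It's big \"problem "; }`));
```

The loop visits each character at most once, so there is no backtracking to worry about.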

score 0 · Accepted Answer

If it is searched from the beginning, maybe this can work?

\"((\\\")|[^\\])*\"

Answered 2013-04-10T21:14:32.603

score 0 · Accepted Answer

Answered 2013-12-03T13:36:39.243

score 0 · Accepted Answer

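If your IDE is IntelliJ IDEA, you can forget all these headaches and store your regex in a String variable; when you copy-paste it inside the double quotes, it is automatically changed to a regex-acceptable format.

Example in Java: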

String s = "\"en_usa\":[^\\,\\}]+";

Now you can use this variable in your regexp or anywhere else.

score 0 · Accepted Answer

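I faced a similar problem while trying to remove quoted strings that could interfere with the parsing of some files.

I ended up with a two-step solution that beats any convoluted regex you can come up with: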

 line = line.replace("\\\"","\'"); // Replace escaped quotes with something easier to handle
 line = line.replaceAll("\"([^\"]*)\"","\"x\""); // Simple is beautiful

Easier to read and probably more efficient.

score -1 · Accepted Answer

Messed around with regexes and ended up with this one: (don't ask me how it works, I barely understand it even though I wrote it, lol)

"(([^"\\]?(\\\\)?)|(\\")+)+"
