php - Regex to match semicolons, but not in comments or quotes

Question

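I want to use a regex to return every matching semicolon, but only when it is outside quotes (including nested quotes) and not inside commented-out code.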

testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");

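The regex should only return the semicolon at the end of each example.

I have been playing with the regex below, but it fails on examples testfunc3 and testfunc9. It also does not ignore comments...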

/;(?=(?:(?:[^"']*+["']){2})*+[^"']*+\z)/g

Any help would be appreciated!

score 3 · Accepted Answer

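I don't have time to convert this to JS. Here is the regex in a Perl sample; the regex itself should work in JS as well.

C comments and double/single-quoted strings - taken from "strip C comments" by Jeffrey Friedl, later modified by Fred Curtis, and adapted (by me) to include C++ comments and the target semicolon.

Capture group 1 (optional) includes everything up to the semicolon; group 2 is the semicolon (but it could be anything).

The modifiers are //xsg.

The regex below is used in a substitution operator s/pattern/replace/xsg (i.e., replace with $1[$2]).

I think your post is just asking whether this can be done. I can include a commented regex if you really need one.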

$str = <<EOS;
testfunc();
testfunc2("test;test"); 
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
EOS

$str =~ s{
     ((?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\;]*))*?)(;)
 }
 {$1\[$2\]}xsg;

print $str;

Output

testfunc()[;]
testfunc2("test;test")[;]
testfunc3("test';test")[;]
testfunc4('test";test')[;]
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test"test")[;]

Expanded, with comments

 (  ## Optional non-greedy, Capture group 1
   (?:
      ## Comments
        (?:
            /\*         ##  Start of /* ... */ comment
            [^*]*\*+    ##  Non-* followed by 1-or-more *'s
            (?:
                [^/*][^*]*\*+
            )*          ##  0-or-more things which don't start with /
                        ##    but do end with '*'
            /           ##  End of /* ... */ comment
          |  
            //          ## Start of // ... comment
            (?:
                [^\\]         ## Any Non-Continuation character ^\
              |               ##   OR
                \\\n?         ## Any Continuation character followed by 0-1 newline \n

             )*?            ## To be done 0-many times, stopping at the first end of comment

             \n         ##  End of // comment
        )

     | ##  OR,  various things which aren't comments, group 2:
        (?:
            " (?: \\. | [^"\\] )* "  ## Double quoted text
          |
            ' (?: \\. | [^'\\] )* '  ## Single quoted text
          |
            .           ##  Any other char
            [^/"'\\;]*  ##  Chars which don't start a comment, string, escape
        )               ##  or continuation (escape + newline) AND are NOT semi-colon ;
   )*?
 )
  ## Capture group 2, the semi-colon
 (;)
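
The annotated pattern boils down to one idea: consume comments and quoted strings as whole tokens, so that a semicolon is only ever seen when it sits outside them. Below is a simplified sketch of that idea in JavaScript. It is a variant written for illustration rather than the answer's regex (it drops the backslash-newline continuation handling for // comments), and TOKEN and semicolonOffsets are hypothetical names.

```javascript
// Each alternative consumes one whole token; only the last one captures.
// Order: block comment | line comment | "string" | 'string' | bare semicolon
const TOKEN =
  /\/\*[\s\S]*?\*\/|\/\/[^\n]*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|(;)/g;

function semicolonOffsets(source) {
  const offsets = [];
  for (const match of source.matchAll(TOKEN)) {
    // Group 1 is only set when the token is a bare semicolon.
    if (match[1] !== undefined) offsets.push(match.index);
  }
  return offsets;
}

const code = String.raw`a("x;y"); // b();
/* c(); */ d('\';');
`;

// Print the text up to and including each semicolon that was found.
for (const offset of semicolonOffsets(code)) {
  console.log(JSON.stringify(code.slice(0, offset + 1)));
}
```

Returning offsets rather than doing a replacement keeps the match small: nothing has to be captured and written back, and the caller decides what to do with each position.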

score 1 · Accepted Answer

This works for all of your examples, but how well it holds up depends on how closely the code you apply it to resembles them:

;(?!\S|(?:[^;]*\*/))

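; - matches a semicolon

(?! - negative lookahead - makes sure that ->

\S - no non-whitespace character follows the semicolon

|(?:[^;]*\*/)) - or, if whitespace follows, that no */ appears before the next ;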
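
To see what the lookahead accepts and rejects, the pattern can be run over the question's sample. This is only a sketch: matchedLines is a hypothetical helper, and the g flag is added here so that every match is reported.

```javascript
// The lookahead-only pattern from this answer, with g to find every match.
const TRAILING_SEMICOLON = /;(?!\S|(?:[^;]*\*\/))/g;

// Hypothetical helper: 1-based numbers of the lines where the pattern matches.
function matchedLines(source, pattern) {
  return [...source.matchAll(pattern)].map(
    (m) => source.slice(0, m.index).split("\n").length
  );
}

const input = String.raw`testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
`;

console.log(matchedLines(input, TRAILING_SEMICOLON));
```

On this sample the pattern reports the five wanted semicolons plus the one in //testfunc5(); on line 5. It also depends on layout: in a();b(); the first semicolon is skipped, because a non-whitespace character follows it.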

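Let me know if you have any problems with this.

If this is something you want to use as a one-off, there is no harm in using a regex, but if it is something you may want to reuse later, a regex may prove not to be the most reliable tool.

Edit:

Fix for number 5 - the semicolon will now be in the first capture group: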

^(?:[^/]*)(;)(?!\S|(?:[^;]*\*/))
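
One thing worth checking with the edited pattern is how it behaves when run over a whole multi-line string with the g and m flags, since [^/]* can also consume newlines. The sketch below compares it with a variant that excludes \n from that class. The variant and the helper capturedLines are assumptions made for illustration and are not part of the answer.

```javascript
// The edited pattern from this answer; m makes ^ match at every line start.
const EDITED = /^(?:[^\/]*)(;)(?!\S|(?:[^;]*\*\/))/gm;
// Variant (an assumption, not from the answer): keep the prefix on one line.
const PER_LINE = /^(?:[^\/\n]*)(;)(?!\S|(?:[^;]*\*\/))/gm;

// Hypothetical helper: 1-based line number of each captured semicolon.
function capturedLines(source, pattern) {
  return [...source.matchAll(pattern)].map((m) => {
    // Group 1 (the semicolon) is always the last character of the match.
    const offset = m.index + m[0].length - 1;
    return source.slice(0, offset).split("\n").length;
  });
}

const text = String.raw`testfunc();
testfunc2("test;test");
testfunc3("test';test");
testfunc4('test";test');
//testfunc5();
/* testfunc6(); */
/*
  testfunc7();
*/
/*
  //testfunc8();
*/
testfunc9("test\"test");
`;

console.log("edited:  ", capturedLines(text, EDITED));
console.log("per line:", capturedLines(text, PER_LINE));
```

Run this way on the sample, the edited pattern captures only the semicolons on lines 4 and 13, because its first match swallows lines 1 to 4. The variant captures lines 1, 2, 3, 4 and 13. Both still give up on any line that contains a / before the semicolon, so neither is a general solution.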
