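I'm trying to write a regular expression that finds the for/if/while keywords in strings read from a C++ source file, while excluding any words that merely contain them, for example: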

WhatifYes()
Whatfor()
Waitforwhile()

I wrote my regular expression as follows:

if { [ regexp {(for|while|if)(\s+)(\()} $lineValue ] } { 

But it does not handle the following cases:

while(( int x = 0 ) > 0 );
while(( int x = 0 ) > 0 )
for(int y =0 ; ; )
for(int y =0 ; ; );
if( (int x = 9) > 0 )
if( (int x = 9) > 0 );

At first I thought this was because my regular expression is structured as follows:

if/for/while \s+ ( #space or multiple spaces

But I tried adding spaces to the examples above:

while (( int x = 0 ) > 0 );
while (( int x = 0 ) > 0 )
if ( (int x = 9) > 0 )
if ( (int x = 9) > 0 );

The regular expression still doesn't work. Please let me know what regular expression I should use to catch these cases.

2 Answers

Part of your problem is easy to solve, and another part is hard.

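The easy part is making sure that you have a whole word: the \m constraint escape matches only at the start of a word, and the \M constraint escape matches only at the end, so we can use: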

# Nothing capturing; you can add that as necessary
# Ellipsis for the bits I've not talked about yet
regexp {\m(?:while|if|for)\M\s*...} ...
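
As a minimal sketch of just the word-boundary part, the check below runs the keyword alternation with \m and \M against a few of the lines from the question. The proc name hasKeyword is hypothetical and only here for illustration; it does not look at the parentheses at all, and it would also report keywords that appear inside comments or string literals.

```tcl
# Hypothetical helper (not from the question): returns 1 if the line
# contains while/if/for as a whole word, 0 otherwise.
proc hasKeyword {line} {
    # \m matches only at the start of a word, \M only at the end of one
    return [regexp {\m(?:while|if|for)\M} $line]
}

# Sample lines taken from the question
foreach line {
    "WhatifYes()"
    "Whatfor()"
    "Waitforwhile()"
    "for(int y =0 ; ; )"
    "if ( (int x = 9) > 0 )"
} {
    puts [format "%-28s %d" $line [hasKeyword $line]]
}
```

The first three lines should report 0, because the keyword is embedded in a longer identifier, and the last two should report 1.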

The very hard part is matching the part in parentheses. The problem is that this is really a “language” (in the theoretical sense) that needs a different kind of parser than a regular expression to match (e.g., a recursive descent parser, which has a more complex state model than the finite automata used in RE matching). What's more, using () characters inside those expressions is common. The easiest approach is to match instead against a close parenthesis at the end of the line, possibly followed by a semicolon, but that is not strictly correct. Alternatively, supporting a limited number of levels of nested parens is also possible.

# Match a few levels...
regexp {\m(?:while|if|for)\M\s*\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)} ...

So, let's break that RE down:

\m                                Word start
(?:while|if|for)                  One of the keywords 
\M                                Word end
\s*                               Optional spaces
\(                                Open paren
  (?:                             Either...
    [^()]                           Non-paren...
  |                               Or...
    \(                              Open paren
      (?:                           Either...
        [^()]                         Non-paren...
      |                             Or...
        \(                            Open paren
          [^()]*                      Non-parens
        \)                            Close paren
      )*                            ... as many of the above as needed
    \)                              Close paren
  )*                              ... as many of the above as needed
\)                                Close paren

If you look at the above, you'll notice a pattern. Yes, you can keep on nesting to go as deep as you want. What you can't do is make the RE engine do that nesting for you.
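
If arbitrary nesting depth matters, one option is to let the RE find only the keyword and the opening parenthesis, and then count parentheses by hand. The following is a rough sketch under that assumption, not a real C++ parser: the proc name keywordCondition is hypothetical, it only looks at a single line, and it will be fooled by parentheses inside string literals, character literals, or comments.

```tcl
# Hypothetical helper: returns the parenthesised text that follows a
# while/if/for keyword, or "" if there is no keyword or the parentheses
# do not balance within the line.
proc keywordCondition {line} {
    # Locate the whole-word keyword and the "(" that follows it
    if {![regexp -indices {\m(?:while|if|for)\M\s*\(} $line match]} {
        return ""
    }
    set start [lindex $match 1]   ;# index of the opening "("
    set depth 0
    set len [string length $line]
    for {set i $start} {$i < $len} {incr i} {
        switch -- [string index $line $i] {
            "(" { incr depth }
            ")" {
                incr depth -1
                if {$depth == 0} {
                    # Found the ")" that closes the first "("
                    return [string range $line $start $i]
                }
            }
        }
    }
    return ""   ;# ran out of line before the parentheses balanced
}

puts [keywordCondition "while(( int x = 0 ) > 0 );"]
puts [keywordCondition "if ( ((a) && ((b) || (c))) )"]
```

Because the depth is tracked in a counter rather than in the pattern, the second call works even though its nesting is deeper than the three levels the RE above supports.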

answered 2012-11-01T10:00:50.887

In your regex you are using \s+. That means there must be at least one space/tab/line-break. Use \s* (0 or more whitespace) and add logic for what comes before:

if { [ regexp {(^|[ \t])(for|while|if)(\s*)(\()} $lineValue ] } { 
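
A quick way to check this pattern is to run it over some of the lines from the question. The sketch below does that; the variable name re and the output labels are made up for the example. Note that the (^|[ \t]) prefix only accepts a keyword at the start of the line or after a space or tab, so a keyword that directly follows some other character such as ; or } would not be reported.

```tcl
# The pattern from this answer, stored in a variable for reuse
set re {(^|[ \t])(for|while|if)(\s*)(\()}

# Sample lines taken from the question
foreach lineValue {
    "while(( int x = 0 ) > 0 );"
    "for(int y =0 ; ; )"
    "if ( (int x = 9) > 0 )"
    "Whatfor()"
} {
    if {[regexp $re $lineValue]} {
        puts "keyword:    $lineValue"
    } else {
        puts "no keyword: $lineValue"
    }
}
```

The first three lines should be reported as containing a keyword, and Whatfor() should not.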
answered 2012-11-01T10:10:14.687