regex - TCL: Regular expression to find if/for/while in a string

Question

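I'm trying to write a regular expression that searches for the for/if/while keywords in a string read from a C++ source file, while excluding any words that merely contain them, for example: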

WhatifYes()
Whatfor()
Waitforwhile()

I wrote my regular expression as follows:

if { [ regexp {(for|while|if)(\s+)(\()} $lineValue ] } {

But it doesn't handle the following cases:

while(( int x = 0 ) > 0 );
while(( int x = 0 ) > 0 )
for(int y =0 ; ; )
for(int y =0 ; ; );
if( (int x = 9) > 0 )
if( (int x = 9) > 0 );

At first I thought this was because my regular expression is structured as follows:

if/for/while \s+ ( #space or multiple spaces

But I tried adding spaces to the examples above:

while (( int x = 0 ) > 0 );
while (( int x = 0 ) > 0 )
if ( (int x = 9) > 0 )
if ( (int x = 9) > 0 );

The regular expression still doesn't work. Please let me know which regular expression I should use to catch them.

score 4 · Accepted Answer

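Part of your problem is easy to solve, and part of it is very hard.

The easy part is making sure you have a whole word: the \m constraint escape matches only at the start of a word, and the \M constraint escape matches only at the end of one, so we can use: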

# Nothing capturing; you can add that as necessary
# Ellipsis for the bits I've not talked about yet
regexp {\m(?:while|if|for)\M\s*...} ...
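
As a quick illustration of the word-boundary part on its own, here is a minimal sketch. The first three sample lines come from the question; the last two, and the variable names samples, keywordRE and hits, are made up for this example.

```tcl
# Sample lines: three from the question, two hypothetical ones.
set samples {
    {WhatifYes()}
    {Whatfor()}
    {Waitforwhile()}
    {if (x > 0)}
    {while(running)}
}

# \m and \M anchor the keyword at word boundaries, so the "if"
# buried inside "WhatifYes" is not matched.
set keywordRE {\m(?:while|if|for)\M}

set hits {}
foreach line $samples {
    if {[regexp -- $keywordRE $line]} {
        lappend hits $line
    }
}

# Print the lines that contain a standalone keyword.
foreach line $hits {
    puts $line
}
```

Only the two lines with a standalone keyword should end up in hits; the three identifiers that merely contain a keyword are skipped.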

The very hard part is matching the part in parentheses. The problem is that it is really a “language” (in the theoretical sense) that needs a different kind of parser than a regular expression to match it (i.e., a recursive descent parser, which has a more complex state model than the finite automata used in RE matching). What's more, using () characters inside those expressions is common. The easiest approach is to match instead against a close parenthesis at the end of the line, possibly followed by a semicolon, but that's definitely not strictly correct. Alternatively, supporting a limited number of levels of nested parens is also possible.

# Match a few levels...
regexp {\m(?:while|if|for)\M\s*\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)} ...

So, let's break that RE down:

\m                                Word start
(?:while|if|for)                  One of the keywords 
\M                                Word end
\s*                               Optional spaces
\(                                Open paren
  (?:                             Either...
    [^()]                           Non-paren...
  |                               Or...
    \(                              Open paren
      (?:                           Either...
        [^()]                         Non-paren...
      |                             Or...
        \(                            Open paren
          [^()]*                      Non-parens
        \)                            Close paren
      )*                            ... as many of the above as needed
    \)                              Close paren
  )*                              ... as many of the above as needed
\)                                Close paren

If you look at the above, you'll notice a pattern. Yes, you can keep on nesting to go as deep as you want. What you can't do is make the RE engine do that nesting for you.
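
Since the RE engine won't do the nesting itself, one option is to let Tcl generate the pattern. The following is only a sketch: parenPattern is a hypothetical helper name, and the depth of 3 is chosen to reproduce the hand-written pattern above.

```tcl
# Hypothetical helper: build a pattern for balanced parentheses
# nested up to $depth levels (depth 1 = no inner parentheses).
proc parenPattern {depth} {
    # Innermost level: a group with no parentheses inside it.
    set pat {\([^()]*\)}
    for {set i 1} {$i < $depth} {incr i} {
        # Wrap the previous level: non-paren chars or a nested group.
        set pat "\\((?:\[^()\]|$pat)*\\)"
    }
    return $pat
}

# Keyword as a whole word, optional spaces, then the condition.
set re "\\m(?:while|if|for)\\M\\s*[parenPattern 3]"

# Lines taken from the question.
set lines {
    {while(( int x = 0 ) > 0 );}
    {for(int y =0 ; ; )}
    {if ( (int x = 9) > 0 )}
    {Waitforwhile()}
}
foreach line $lines {
    puts [format "%-32s %d" $line [regexp -- $re $line]]
}
```

The three real statements should report 1 and Waitforwhile() should report 0. A condition nested more deeply than the chosen depth will not match, so the depth has to be picked with the input in mind.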

score 0 · Answer

In your regex you are using \s+. That means there must be at least one space/tab/line-break. Use \s* (0 or more whitespace) and add logic for what comes before:

if { [ regexp {(^|[ \t])(for|while|if)(\s*)(\()} $lineValue ] } {
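
A small check of this variant against lines from the question, as a sketch only; the variable names re and results are illustrative.

```tcl
# The pattern from this answer: keyword at line start or after a
# space/tab, then optional whitespace, then an open paren.
set re {(^|[ \t])(for|while|if)(\s*)(\()}

set results {}
foreach lineValue {
    {while(( int x = 0 ) > 0 );}
    {for(int y =0 ; ; )}
    {if ( (int x = 9) > 0 )}
    {Whatfor()}
} {
    set matched [regexp -- $re $lineValue]
    lappend results $matched
    puts [format "%-32s %d" $lineValue $matched]
}
```

Because the keyword must sit at the start of the line or directly after a space or tab, Whatfor() is rejected. The same rule means a keyword that directly follows some other character, such as ; or }, would be missed by this pattern.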
