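Part of your problem is easy to solve, and part of it is very hard.
The easy part is making sure you have a whole word: the \m constraint escape matches only at the start of a word, and the \M constraint escape matches only at the end, so we can use: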
# Nothing capturing; you can add that as necessary
# Ellipsis for the bits I've not talked about yet
regexp {\m(?:while|if|for)\M\s*...} ...
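To see what the word constraints buy you, here is a small comparison. It is a sketch only. The procedure names keywordThenParen and keywordThenParenLoose and the sample strings are hypothetical, and the "..." part of the pattern is replaced by nothing more than a check for the open paren. The strict version should reject elif and endwhile, which the loose version accepts because it finds if and while inside them.

```tcl
# Hypothetical helper: does the text contain one of the keywords as a whole
# word, followed by optional spaces and an open paren?
proc keywordThenParen {text} {
    regexp {\m(?:while|if|for)\M\s*\(} $text
}

# The same check without the \m ... \M word constraints, for comparison.
proc keywordThenParenLoose {text} {
    regexp {(?:while|if|for)\s*\(} $text
}

foreach s {"if (x)" "elif (x)" "endwhile (x)" "for(i = 0; i < n; i++)"} {
    puts [format "%-24s strict=%d loose=%d" $s \
        [keywordThenParen $s] [keywordThenParenLoose $s]]
}
```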
The very hard part is matching the part in parentheses. The problem is that this is really a "language" (in the theoretical sense) that needs a different kind of parser than a regular expression to match it (e.g., a recursive descent parser, which has a more complex state model than the finite automata used in RE matching). What's more, using () characters inside those expressions is common. The easiest approach is instead to match against a close parenthesis at the end of the line, possibly followed by a semicolon, but that is definitely not strictly correct. Alternatively, you can support a limited number of levels of nested parentheses.
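The end-of-line heuristic described above might look roughly like this. It is a minimal sketch that assumes each statement sits on its own line; the variable name eolRe and the sample input are made up for illustration. It also shows why the heuristic is not correct: on a line such as "if (a) foo(b)" it runs straight past the condition and swallows the call as well.

```tcl
# Hypothetical sketch of the end-of-line heuristic: from the open paren,
# take everything up to the last ")" on the line, allowing a trailing ";".
set eolRe {\m(?:while|if|for)\M\s*\(.*\);?$}

set code "while (f(g(x)) && y);\nif (a) foo(b)\n"
foreach line [split $code \n] {
    if {[regexp $eolRe $line match]} {
        puts "matched: $match"
    }
}
```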
# Match a few levels...
regexp {\m(?:while|if|for)\M\s*\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)} ...
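To check where the fixed-depth pattern gives up, you can run it against conditions with increasing nesting. This is an illustrative sketch; the sample strings are invented. The pattern handles up to three levels of parentheses in total (the outer pair plus two nested ones), so the last sample, which has four, should not match at all.

```tcl
# The fixed-depth pattern from above, kept in a variable for reuse.
set re {\m(?:while|if|for)\M\s*\((?:[^()]|\((?:[^()]|\([^()]*\))*\))*\)}

foreach s {
    "while (f(g(x)) && y) y--;"
    "if (a(b(c)))"
    "if (a(b(c(d))))"
} {
    if {[regexp $re $s match]} {
        puts "matched: $match"
    } else {
        puts "no match: $s"
    }
}
```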
So, let's break that RE down:
\m                  Word start
(?:while|if|for)    One of the keywords
\M                  Word end
\s*                 Optional spaces
\(                  Open paren
(?:                 Either...
    [^()]           Non-paren...
  |                 Or...
    \(              Open paren
    (?:             Either...
        [^()]       Non-paren...
      |             Or...
        \(          Open paren
        [^()]*      Non-parens
        \)          Close paren
    )*              ... as many of the above as needed
    \)              Close paren
)*                  ... as many of the above as needed
\)                  Close paren
If you look at the above, you'll notice a pattern. Yes, you can keep on nesting to go as deep as you want. What you can't do is make the RE engine do that nesting for you.
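Since the pattern repeats, the Tcl script (rather than the RE engine) can do the nesting by building the string for a chosen depth. This is a sketch; nestedParens and keywordCondition are hypothetical helper names. For depth 3 it reproduces exactly the hand-written pattern above. The depth is still fixed once the pattern is built, so this does not lift the limitation; it only saves typing.

```tcl
# Hypothetical helper: build a pattern matching a parenthesised group with
# up to $depth levels of parentheses (depth 1 means no nesting at all).
proc nestedParens {depth} {
    # Innermost level: a paren pair containing no parens.
    set body {\([^()]*\)}
    for {set i 1} {$i < $depth} {incr i} {
        # Wrap once more: a paren pair containing non-parens or copies of
        # the level built so far.
        set body "\\((?:\[^()\]|$body)*\\)"
    }
    return $body
}

# Hypothetical helper: keyword, optional spaces, then the nested group.
proc keywordCondition {depth} {
    return "\\m(?:while|if|for)\\M\\s*[nestedParens $depth]"
}

foreach depth {3 4} {
    set re [keywordCondition $depth]
    puts "depth $depth: [regexp $re {if (a(b(c(d))))}]"
}
```

Each extra level wraps the previous pattern once, so the pattern grows linearly with the depth you ask for.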