c++ - Adapting the perlfaq script that strips C and C++ comments to instead perform search and replace in C and C++ comments

Question

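I have been tasked with (attempting to) search and replace a word suffix across a large codebase, but only where it appears in comments. All comments are of the /* or // kind, but they are guaranteed to include most conceivable edge cases.

So I want to change this: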

/* blah blah something__suffix blah */

to this:

/* blah blah something blah */

But I also want to change this:

// blah blah something__suffix blah

to this:

// blah blah something blah

and this:

/*
 * blah blah something__suffix blah 
 */

to this:

/*
 * blah blah something blah 
 */

and this:

/** 

// blah blah something__suffix blah 

*/

to this:

/** 

// blah blah something blah 

*/

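Ad nauseam (literally).

Initially I felt this was a job for a parser, so I installed Coccinelle. It can indeed parse my comments, but it choked on my preprocessor macros, and the workarounds looked complicated for someone doing this as a one-off task. So now I am considering regular expressions.

I have not found much advice on doing a really robust search and replace inside C and C++ comments with regular expressions (other than "you need a parser"), but I did notice that the Perl FAQ has what appears to be a well road-tested script for stripping both styles of comment.

It is as follows: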

$/ = undef;
$_ = <>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

print;

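My question: how can I adapt this script so that, instead of stripping the comments, it searches the text it has identified as a comment for the suffix and removes only the suffix, leaving the rest of the comment intact?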

score 1 · Accepted Answer

You need to do it in two steps, because a single comment may contain more than one occurrence:

/* foo__suffix bar__suffix */

First extract the comment, then replace every __suffix inside that comment.

s{
   \G
   (?:(?!/[*/]).)*
   \K
   (   /[*] (?:(?![*]/).)* [*]/
   |   //   [^\n]*
   )
}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;   # /g so that every comment is processed, not only the first

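Notes:

(?:(?!STRING).) is to (?:STRING) as [^CHAR] is to CHAR.
My solution will break if you have // or /* inside a string literal.
If you are fine with also removing instances of __suffix that are not preceded by an identifier character, you can drop the (?<=\w).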
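
To make the first note concrete, here is a minimal sketch of the (?:(?!STRING).)* idiom. It is written in Python rather than Perl purely for illustration (the regex syntax is the same for this construct), and the names block_comment and source are made up for the example. re.DOTALL plays the role of Perl's /s flag.

```python
import re

# (?:(?!\*/).)* consumes one character at a time, but only while the
# text ahead is NOT the two-character terminator "*/". It behaves like
# a negated character class generalised to a multi-character string.
block_comment = re.compile(r"/\*(?:(?!\*/).)*\*/", re.DOTALL)

source = "a /* one * two */ b /* three */ c"

# A lone "*" inside the comment is consumed; only "*/" stops the scan.
print(block_comment.findall(source))
# ['/* one * two */', '/* three */']
```

A plain [^*]* would have stopped at the first lone asterisk, which is why the lookahead form is used for the comment body.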

If you are using Perl 5.14 or later, you can simplify

s{...}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;

to

s{...}{
   $1 =~ s/(?<=\w)__suffix//rg
}xesg;
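
The string-literal caveat in the notes can be worked around by matching literals in the same pass and leaving them untouched, which is the trick the perlfaq pattern uses. The following is a rough Python sketch of that idea, not the code from this answer: the names TOKEN and strip_suffix_in_comments are hypothetical, and it does not attempt to handle C++ raw string literals, backslash-continued // comments, or digit separators such as 1'000.

```python
import re

# One alternation, tried left to right at each position. Literals are
# matched as whole tokens, so a "//" inside a string is consumed with
# the string and can never start a comment.
TOKEN = re.compile(
    r"""
    (?P<comment> /\* .*? \*/ | // [^\n]* )
    | (?P<literal> "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' )
    """,
    re.DOTALL | re.VERBOSE,
)


def strip_suffix_in_comments(source, suffix="__suffix"):
    """Remove `suffix` after a word character, but only inside comments."""
    suffix_re = re.compile(r"(?<=\w)" + re.escape(suffix))

    def rewrite(match):
        if match.group("comment") is None:
            return match.group(0)  # string/char literal: copy through
        # Second step: replace every occurrence within this one comment.
        return suffix_re.sub("", match.group(0))

    return TOKEN.sub(rewrite, source)


sample = (
    'char *s = "// keep__suffix"; '
    "/* foo__suffix bar__suffix */ x__suffix(); // y__suffix\n"
)
print(strip_suffix_in_comments(sample), end="")
# char *s = "// keep__suffix"; /* foo bar */ x__suffix(); // y
```

Code outside comments and literals is never matched by TOKEN, so it passes through unchanged; only text captured by the comment group goes through the inner substitution.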

score 1 · Accepted Answer

I am not sure whether this is a good solution, but it does work.

use strict; use warnings; use feature qw(say);
my @lines = (
qq~Example 1:
/* blah blah something__suffix blah */~,
qq~Example 2:
// blah blah something__suffix blah needs a newline at the end
~,
qq~Example 3:
/*
 * blah blah something__suffix blah 
 */~,
qq~Example 4:
/** 

// blah blah something__suffix blah 

*/~,
qq~Example 5 (string):
foobar '// blah blah something__suffix blah '~,
qq~Example 6:
public void main { return; } // this does__suffix nothing but needs newline
~,
);

foreach (@lines) {
  print "Before:\n$_\n";
  s!/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
  { if (defined $3) { $3 } else { (my $temp = ${^MATCH}) =~ s/__suffix//g; $temp;} } 
  !gsepx;

  print "After:\n$_\n\n";
}

It may not be efficient, but I do not think that matters for a one-off job like yours.
