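Here's a lightly tested filter written in C that should perform the conversion you want. Some notes on what this filter does that would be difficult, if not impossible, to handle with a regex:
- It ignores comment-like sequences enclosed in quotes (since they aren't comments).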
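- If a C99 comment being converted contains something that would start or end a C89 comment, it alters that sequence so there won't be a nested comment or a premature end of the comment (a nested `/*` or `*/` is changed to `/+` or `*|`). I'm not sure whether you need this (if you don't, it should be easy to remove).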
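- The alteration of nested comment sequences described above happens only inside the C99 comments being converted; the contents of comments that are already C89 style are not changed.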
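- It doesn't handle trigraphs or digraphs (I think the only consequence is that it can miss an escape sequence or an end-of-line continuation that is introduced by the trigraph `??/`).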
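Because the filter does not look at trigraphs, one cautious option is to flag any input that contains `??/` for manual review before converting it. The following is only a sketch: `has_backslash_trigraph` is a hypothetical helper that is not part of the filter, and it assumes the text is already in memory as a NUL-terminated string.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the filter: returns 1 if `text`
 * contains the trigraph that stands for a backslash, 0 otherwise.
 * The search string is spelled with the \? escape so that this source
 * stays correct even when a compiler performs trigraph replacement.
 */
int has_backslash_trigraph(const char *text)
{
    assert(text != NULL);
    return strstr(text, "?\?/") != NULL;
}
```

A line such as `char *s = "abc??/" // not a comment";` would be flagged. With trigraph replacement enabled, `??/"` is an escaped quote, so the `//` is still inside the string, whereas the filter would treat it as the start of a comment.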
Of course, you'll need to perform your own testing to determine whether it's suitable for your purposes.
    #include <stdio.h>

    /* Test input: run the filter on its own source to exercise these lines. */
    char* a = " this is /* a test of \" junk // embedded in a '\' string";
    char* b = "it should be left alone//";
    // comment /* that should ***//// be converted.
    /* leave this alone*/// but fix this one
    // and "leave these \' \" quotes in a comment alone*
    /**** and these '\' too //
    */

    enum states {
        state_normal,
        state_double_quote,     /* inside a string literal */
        state_single_quote,     /* inside a character literal */
        state_c89_comment,      /* inside a slash-star comment */
        state_c99_comment       /* inside a double-slash comment */
    };

    enum states current_state = state_normal;

    void handle_char( char ch)
    {
        static char last_ch = 0;

        switch (current_state) {
            case state_normal:
                if ((last_ch == '/') && (ch == '/')) {
                    putchar( '*');  /* NOTE: changing to C89 style comment */
                    current_state = state_c99_comment;
                }
                else if ((last_ch == '/') && (ch == '*')) {
                    putchar( ch);
                    ch = 0; /* 'forget' the star so a slash right after the opener doesn't close the comment */
                    current_state = state_c89_comment;
                }
                else if (ch == '\"') {
                    putchar( ch);
                    current_state = state_double_quote;
                }
                else if (ch == '\'') {
                    putchar( ch);
                    current_state = state_single_quote;
                }
                else {
                    putchar( ch);
                }
                break;

            case state_double_quote:
                if ((last_ch == '\\') && (ch == '\\')) {
                    /* we want to output this \\ escaped sequence, but we */
                    /* don't want to 'remember' the current backslash -   */
                    /* otherwise we'll mistakenly treat the next character */
                    /* as being escaped                                   */
                    putchar( ch);
                    ch = 0;
                }
                else if ((ch == '\"') && (last_ch != '\\')) {
                    putchar( ch);
                    current_state = state_normal;
                }
                else {
                    putchar( ch);
                }
                break;

            case state_single_quote:
                if ((last_ch == '\\') && (ch == '\\')) {
                    /* same handling of \\ as in the double-quote case */
                    putchar( ch);
                    ch = 0;
                }
                else if ((ch == '\'') && (last_ch != '\\')) {
                    putchar( ch);
                    current_state = state_normal;
                }
                else {
                    putchar( ch);
                }
                break;

            case state_c89_comment:
                if ((last_ch == '*') && (ch == '/')) {
                    putchar( ch);
                    ch = 0; /* 'forget' the slash so it doesn't affect a possible slash that immediately follows */
                    current_state = state_normal;
                }
                else {
                    putchar( ch);
                }
                break;

            case state_c99_comment:
                if ((last_ch == '/') && (ch == '*')) {
                    /* we want to change any slash-star sequences inside */
                    /* what was a C99 comment to something else to avoid */
                    /* nested comments                                   */
                    putchar( '+');
                }
                else if ((last_ch == '*') && (ch == '/')) {
                    /* similarly for star-slash sequences inside */
                    /* what was a C99 comment                    */
                    putchar( '|');
                }
                else if (ch == '\n') {
                    puts( "*/");    /* close the comment; puts() supplies the newline */
                    current_state = state_normal;
                }
                else {
                    putchar( ch);
                }
                break;
        }

        last_ch = ch;
    }

    int main(void)
    {
        int c;

        while ((c = getchar()) != EOF) {
            handle_char( c);
        }

        /* input ended inside a converted comment with no final newline: close it */
        if (current_state == state_c99_comment) {
            fputs( "*/", stdout);
        }

        return 0;
    }
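The rewriting applied inside a converted comment can also be exercised in isolation, which may help with your own testing. The following is a sketch only: `convert_line_comment` is a hypothetical helper that is not part of the filter, and it assumes the caller passes a single `//` comment without its trailing newline, along with an output buffer at least three bytes larger than the input string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the filter: rewrites one "//" comment
 * (passed without its trailing newline) as a C89 comment, using the same
 * substitutions as the filter's state_c99_comment case.
 * Returns 0 on success, or -1 if `in` does not start with "//" or `out`
 * is too small.
 */
int convert_line_comment(const char *in, char *out, size_t out_size)
{
    size_t len, i, n = 0;
    char last_ch = '/';   /* the filter has just seen the second slash */

    assert(in != NULL && out != NULL);
    len = strlen(in);
    if (len < 2 || in[0] != '/' || in[1] != '/') {
        return -1;
    }
    /* output = input length + 2 closing characters + terminating NUL */
    if (out_size < len + 3) {
        return -1;
    }

    out[n++] = '/';
    out[n++] = '*';
    for (i = 2; i < len; i++) {
        char ch = in[i];
        if (last_ch == '/' && ch == '*') {
            out[n++] = '+';          /* slash-star becomes slash-plus */
        } else if (last_ch == '*' && ch == '/') {
            out[n++] = '|';          /* star-slash becomes star-bar */
        } else {
            out[n++] = ch;
        }
        last_ch = ch;
    }
    out[n++] = '*';
    out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

For instance, passing `// a /* b */ c` produces `/* a /+ b *| c*/`.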
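An indulgent aside: many years ago, a shop I worked at wanted to impose a coding standard that forbade C99-style comments, on the grounds that even though the compiler we were using at the time had no problem with them, the code might someday have to be ported to a compiler that didn't support them. I (and others) successfully argued that this possibility was so remote as to be essentially nonexistent, and that even if it did happen, a conversion program to make the comments compatible could easily be written. We were permitted to use C99/C++ style comments.
I now consider my oath fulfilled, and whatever curse may have been laid upon me to be lifted.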