c++ - Removing C/C++ style comments with boost::regex

Question

I'm trying to remove C and C++ style comments from a string using a regular expression. I found one for Perl that seems to handle both:

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

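But I'm not sure how to use it with boost::regex, or what I would need to do to convert it to boost::regex.

FYI: I found the regex here: perlfaq6, and it seems to cover every case I need.

I'd prefer not to use boost::spirit::qi for this, because it adds a lot of time to the project's compile.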

Edit:

#include <boost/regex.hpp>
#include <string>

std::string input = "hello /* world */ world";
boost::regex reg("(/\\*([^*]|(\\*+[^*/]))*\\*+/)|(//.*)");
input = boost::regex_replace(input, reg, "");  // input is now "hello  world"

So the shorter regex does work, but the longer one doesn't.

score 3 · Accepted Answer

It seems a bit odd to use a regex for this when Boost already has a C++ preprocessor library (Boost.Wave) that can be used to strip comments.

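#include <boost/wave/cpplexer/cpp_lex_token.hpp>     // lex_token
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp>  // lex_iterator
#include <string>

// Requires linking against the Boost.Wave library.
std::string strip_comments(std::string const& input) {
    std::string output;
    typedef boost::wave::cpplexer::lex_token<> token_type;
    typedef boost::wave::cpplexer::lex_iterator<token_type> lexer_type;
    typedef token_type::position_type position_type;

    position_type pos;

    // Tokenize the input as C++ (with the long long extension enabled).
    lexer_type it = lexer_type(input.begin(), input.end(), pos,
        boost::wave::language_support(
            boost::wave::support_cpp|boost::wave::support_option_long_long));
    lexer_type end = lexer_type();

    // Copy every token except /* */ (T_CCOMMENT) and // (T_CPPCOMMENT) comments.
    for (; it != end; ++it) {
        if (*it != boost::wave::T_CCOMMENT
         && *it != boost::wave::T_CPPCOMMENT) {
            output += std::string(it->get_value().begin(), it->get_value().end());
        }
    }
    return output;
}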

score 0 · Accepted Answer


If \* becomes \\*, then why doesn't [^\\] become [^\\\\]?

answered 2012-02-26T03:28:45.073
