lex - Remove comments with JFlex, but keep line terminators

Question

I'm writing a lexical specification for JFlex (it is like flex, but for Java). I have a problem with TraditionalComment (/* */) and DocumentationComment (/** */). So far I have this, taken from the JFlex user manual:

LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace     = {LineTerminator} | [ \t\f]

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}

TraditionalComment   = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment     = "//" {InputCharacter}* {LineTerminator}?
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent       = ( [^*] | \*+ [^/*] )*

%%

{Comment}           { /* Ignore comments */ }
{LineTerminator}    { return LexerToken.PASS; }

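LexerToken.PASS means that I later pass the line terminator through to the output. Now, what I want to do is:

Ignore everything inside a comment except the line terminators.

For example, consider input like this: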

/* Some
 * quite long comment. */

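In fact it is /* Some\n * quite long comment. */\n. With the current lexer it is turned into a single line: the output is a single '\n'. But I want two lines, '\n\n'. In general, I want my output to always have the same number of lines as the input. How can I do that?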
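The line counts in this example can be reproduced outside JFlex. The following is a minimal sketch, assuming a java.util.regex pattern is an acceptable stand-in for the Comment macro: JFlex's ~ ("upto") operator is approximated with a reluctant quantifier, and string literals that contain /* or // are not handled. Deleting every match, as the {Comment} rule does, leaves one line terminator where the input had two. Because EndOfLineComment also matches its trailing {LineTerminator}, a // comment loses its line in the same way. The class name CommentLineCount is illustrative only.

```java
import java.util.regex.Pattern;

public class CommentLineCount {
    // Rough java.util.regex approximation of the Comment macro (assumption):
    //   /* ... */ and /** ... */ -> reluctant match up to the first "*/"
    //   // ... <terminator>      -> like EndOfLineComment, includes the terminator
    static final Pattern COMMENT = Pattern.compile(
            "/\\*[\\s\\S]*?\\*/|//[^\\r\\n]*(?:\\r\\n|\\r|\\n)");

    // Counts line terminators the way the LineTerminator macro sees them:
    // \r\n counts once, a lone \r or \n counts once.
    static int countLineTerminators(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n') {
                count++;
            } else if (c == '\r') {
                count++;
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++; // skip the \n that completes a \r\n pair
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "/* Some\n * quite long comment. */\n";
        // Mimic the {Comment} rule: drop every comment entirely.
        String output = COMMENT.matcher(input).replaceAll("");
        System.out.println(countLineTerminators(input));  // 2
        System.out.println(countLineTerminators(output)); // 1
    }
}
```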

score 2 · Accepted Answer

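A few days later I found a solution. I'll post it here in case someone else runs into the same problem.

The trick is that once you have recognized a comment, you go through its body again and, if you find line terminators, pass them on instead of ignoring them: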

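%{
// Line terminators collected from the most recently matched comment.
public StringBuilder newLines;
%}

// ...

{Comment}           { 
                        // Keep only the comment's line terminators so the output has
                        // as many lines as the input. Both '\r' and '\n' are kept, so
                        // \r\n pairs and lone \r (allowed by LineTerminator) survive.
                        newLines = new StringBuilder();
                        for (char c : yytext().toCharArray())
                        {
                            if (c == '\r' || c == '\n')
                            {
                                newLines.append(c);
                            }
                        }
                        return LexerToken.NEW_LINES;
                    }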
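The action's loop can also be exercised without the generated lexer. The following sketch factors it into a helper and applies it to every comment in a string, using a java.util.regex approximation of the Comment macro (assumptions: string literals are not handled, and Matcher.replaceAll(Function) requires Java 9 or later). The names KeepCommentNewlines, lineTerminatorsOf and stripComments are hypothetical. Keeping both '\r' and '\n' reproduces \r\n pairs and lone \r terminators exactly, so the output has the same number of lines as the input.

```java
import java.util.regex.Pattern;

public class KeepCommentNewlines {
    // Same idea as the {Comment} action: collect only the line terminators
    // of the matched text, in their original form.
    static String lineTerminatorsOf(CharSequence text) {
        StringBuilder newLines = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                newLines.append(c);
            }
        }
        return newLines.toString();
    }

    // Rough stand-in for the Comment macro (assumption: no string literals).
    static final Pattern COMMENT = Pattern.compile(
            "/\\*[\\s\\S]*?\\*/|//[^\\r\\n]*(?:\\r\\n|\\r|\\n)");

    // Replace each comment with only its line terminators. The replacement
    // never contains '$' or backslash characters, so it needs no quoting.
    static String stripComments(String source) {
        return COMMENT.matcher(source).replaceAll(m -> lineTerminatorsOf(m.group()));
    }

    public static void main(String[] args) {
        String input = "/* Some\n * quite long comment. */\n";
        String output = stripComments(input);
        System.out.println(output.equals("\n\n")); // true
    }
}
```

In the real lexer the equivalent of lineTerminatorsOf runs on yytext() inside the {Comment} action; presumably whatever consumes LexerToken.NEW_LINES then writes the collected newLines to the output.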
