python - How to handle all possible C-like block comment styles in PyPEG

Question

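After giving up on Parsimonious, I tried PyPEG. I've had much more success with it, since I've achieved my original goal, but I can't seem to handle comments correctly.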

I have distilled the problem into the following code.

As you can see, not all test cases work: when a block comment is preceded by code (test cases 4 and 5), a Line is produced instead of a BlockComment.

Is there a way to make PyPEG do this by itself, or do I need to post-process the Lines to find BlockComments that span multiple lines?

import pypeg2 as pp
import re
import pprint

nl = pp.RegEx(r"[\r\n]+")
symbols = "\"\-\[\]\\!#$%&'()¬*+£,./:;<=>?@^_‘{|}~"

text = re.compile(r"[\w" + symbols + "]+", re.UNICODE)


# Partial definition as we use it before it's fully defined
class Code(pp.List):
    pass


class Text(str):
    grammar = text


class Line(pp.List):
    grammar = pp.maybe_some(Text), nl


class LineComment(Line):
    grammar = re.compile(r".*?//.*"), nl


class BlockComment(pp.Literal):
    grammar = pp.comment_c, pp.maybe_some(Text)


Code.grammar = pp.maybe_some([BlockComment, LineComment, Line])


comments = """
/*
Block comment 1
*/

// Line Comment1

Test2 // EOL Comment2

/*
Block comment 2*/

/* Block
comment 3 */

Test4 start /*
Block comment 4
*/ Test4 end

Test5 start /* Block comment 5 */ Test5 end

      /* Block comment 6 */

"""

parsed = pp.parse(comments, Code, whitespace=pp.RegEx(r"[ \t]"))
pprint.pprint(list(parsed))

score 1 · Accepted Answer

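Your text pattern also matches comments. Because it is applied greedily, a comment can never be matched unless it happens to sit at the start of a line. You therefore need to make sure the match stops when it reaches a comment delimiter.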
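To see this outside PyPEG, a minimal sketch is to run the question's text regex over test line 5 with plain re.findall. Since PyPEG skips the spaces and tabs between tokens, each match here corresponds roughly to one Text item. The symbol set is written as a raw string only to avoid Python's invalid-escape warnings (this also adds a literal backslash to the class, which does not matter for this input). The delimiters /* and */ come back as ordinary text.

```python
import re

# The question's symbol set (raw string to avoid invalid-escape warnings).
# It contains both "/" and "*", so comment delimiters look like text.
symbols = r"\"\-\[\]\\!#$%&'()¬*+£,./:;<=>?@^_‘{|}~"
text = re.compile(r"[\w" + symbols + "]+", re.UNICODE)

line = "Test5 start /* Block comment 5 */ Test5 end"

# Spaces are not in the class, so each match is one whitespace-separated token.
tokens = text.findall(line)
print(tokens)
# ['Test5', 'start', '/*', 'Block', 'comment', '5', '*/', 'Test5', 'end']
```

At the start of a line, BlockComment is tried before Line, so comment_c gets the first chance at /*; in the middle of a line, Text has already consumed the delimiter.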

You could try the following:

# I removed / from the list.
symbols = "\"\-\[\]\\!#$%&'()¬*+£,.:;<=>?@^_‘{|}~"

text = re.compile(r"([\w" + symbols + "]|/(?![/*]))+", re.UNICODE)
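A quick way to check the negative lookahead is to call match at a few positions, which mimics PyPEG trying Text at the current position. This is a sketch with plain re; text_at is a hypothetical helper for illustration, and the symbol set is again a raw string only to avoid invalid-escape warnings. Note that the pattern contains a capture group, so group(0) (the whole match) is used rather than findall, which would return only the last repetition of the group.

```python
import re

# The answer's symbol set with "/" removed.
symbols = r"\"\-\[\]\\!#$%&'()¬*+£,.:;<=>?@^_‘{|}~"
# A "/" is still allowed, but only when it does not start "//" or "/*".
text = re.compile(r"([\w" + symbols + "]|/(?![/*]))+", re.UNICODE)

def text_at(s):
    """Hypothetical helper: what text consumes at the start of s, or None."""
    m = text.match(s)
    return m.group(0) if m else None

print(repr(text_at("start/*")))   # 'start'  (stops before the block comment)
print(repr(text_at("/* Block")))  # None     (Text cannot start a comment)
print(repr(text_at("// EOL")))    # None
print(repr(text_at("and/or")))    # 'and/or' (a lone slash is still text)
```

Since * is still in the set, a stray */ would still be read as text; in the grammar this only matters if BlockComment has not already consumed the closing delimiter.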

That said, the symbols list seems somewhat arbitrary to me. I would use

text = re.compile(r"([^/\r\n]|/(?![/*]))+", re.UNICODE)
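A short check of this catch-all pattern with plain re, as a sketch: the first three strings are lines from the question's test input, the last two are made up for illustration, and text_at is a hypothetical helper. Because spaces and tabs are inside the negated class, one match can span several words, and every character other than / and line breaks counts as text, so no symbol list has to be maintained.

```python
import re

# Anything except "/" and line breaks, or a "/" that does not open a comment.
text = re.compile(r"([^/\r\n]|/(?![/*]))+", re.UNICODE)

def text_at(s):
    """Hypothetical helper: what text consumes at the start of s, or None."""
    m = text.match(s)
    # group(0) is the whole match; group(1) would hold only the last repetition.
    return m.group(0) if m else None

print(repr(text_at("Test4 start /*")))                               # 'Test4 start '
print(repr(text_at("Test5 start /* Block comment 5 */ Test5 end")))  # 'Test5 start '
print(repr(text_at("Test2 // EOL Comment2")))                        # 'Test2 '
print(repr(text_at("x = `y` § z")))                                  # 'x = `y` § z'
print(repr(text_at("Test2\nnext line")))                             # 'Test2'
```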
