pyparsing - parsing nested parentheses where one input yields multiple sequences

Question

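I want to parse a string with nested parentheses under these conditions:

Elements are delimited by a comma , or a bar |.
An element inside nested parentheses may be a single alphanumeric or another nested parenthesis.
Each nested-parenthesis element joined by a bar | creates a new sequence, which combines the preceding sequence elements with the following elements that are joined by a comma , outside that nested parenthesis.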

To clarify, here are some example input strings and the results they should return:

(a, b, c) should return: a, b, c

(a, (b | c)) should return: a, b and a, c

(a, b, (c | (d, e)), f) should return: a, b, c, f and a, b, d, e, f

(a, b, (c | (d, e) | f), g) should return: a, b, c, g and a, b, d, e, g and a, b, f, g

(a, b, c, ((d, (e | f)) | (g, h)), i) should return: a, b, c, d, e, i and a, b, c, d, f, i and a, b, c, g, h, i

((a | b), c) should return: a, c and b, c

score 3 · Accepted Answer

(From the pyparsing wiki) You can parse the string using infixNotation (formerly known as operatorPrecedence). Assuming ',' takes precedence over '|', this would look like:

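from pyparsing import alphas, infixNotation, oneOf, opAssoc

# operators are listed from highest to lowest precedence
variable = oneOf(list(alphas.lower()))
expr = infixNotation(variable,
            [
            (',', 2, opAssoc.LEFT),
            ('|', 2, opAssoc.LEFT),
            ])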

Converting your test cases into a small test harness, we can at least test the parsing part:

tests = [
    ("(a, b, c)", ["abc"]),
    ("(a, b | c)", ["ab", "c"]),
    ("((a, b) | c)", ["ab", "c"]),
    ("(a, (b | c))", ["ab", "ac"]),
    ("(a, b, (c | (d, e)), f)", ["abcf","abdef"]),
    ("(a, b, (c | (d, e) | f), g)", ["abcg", "abdeg", "abfg"]),
    ("(a, b, c, ((d, (e | f)) | (g, h)), i)",
      ["abcdei", "abcdfi", "abcghi"]),
    ("((a | b), c)", ["ac", "bc"]),
    ]

for test,expected in tests:
    # if your expected values *must* be lists and not strings, then
    # add this line
    # expected = [list(ex) for ex in expected]
    result = expr.parseString(test)
    print(result[0].asList())

This gives you something like:

['a', ',', 'b', ',', 'c']
[['a', ',', 'b'], '|', 'c']
[['a', ',', 'b'], '|', 'c']
['a', ',', ['b', '|', 'c']]
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e']], ',', 'f']
['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e'], '|', 'f'], ',', 'g']
['a', ',', 'b', ',', 'c', ',', [['d', ',', ['e', '|', 'f']], '|', ['g', ',', 'h']], ',', 'i']
[['a', '|', 'b'], ',', 'c']
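As an aside, the nested lists printed above already carry enough structure to enumerate the sequences the question asks for. The following is a minimal sketch in plain Python, not the object-based approach the answer goes on to describe. The function name expand is hypothetical, and the sketch assumes every nested list has the shape shown in the output: operands at even indexes and a single repeated operator at odd indexes.

```python
from itertools import product

def expand(node):
    """Yield every flat sequence described by a nested list such as
    ['a', ',', ['b', '|', 'c']]."""
    if isinstance(node, str):
        # A bare variable stands for exactly one single-element sequence.
        yield [node]
        return
    # Assumption: operands sit at even indexes, the operator at index 1.
    operands, operator = node[::2], node[1]
    if operator == '|':
        # Alternation: each operand contributes its own sequences in turn.
        for operand in operands:
            yield from expand(operand)
    else:
        # Sequence: pick one expansion per operand and concatenate them.
        for parts in product(*(list(expand(op)) for op in operands)):
            yield [item for part in parts for item in part]

# Nested list copied from the parser output for "(a, b, (c | (d, e) | f), g)".
tree = ['a', ',', 'b', ',', ['c', '|', ['d', ',', 'e'], '|', 'f'], ',', 'g']
sequences = ["".join(seq) for seq in expand(tree)]
print(sequences)  # ['abcg', 'abdeg', 'abfg']
```

Joining each sequence into a string makes the result directly comparable with the expected values in the tests list.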

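Parsing the string and reflecting operator precedence in the result structure is the easy part. Now, if you follow the regex inverter example, you will need to attach an object to each parsed bit, like this: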

class ParsedItem(object):
    def __init__(self, tokens):
        self.tokens = tokens[0]
class Var(ParsedItem): 
    """ TBD """
class BinaryOpn(ParsedItem):
    def __init__(self, tokens):
        self.tokens = tokens[0][::2]
class Sequence(BinaryOpn):
    """ TBD """
class Alternation(BinaryOpn):
    """ TBD """

variable = oneOf(list(alphas.lower())).setParseAction(Var)
expr = infixNotation(variable, 
            [
            (',', 2, opAssoc.LEFT, Sequence),
            ('|', 2, opAssoc.LEFT, Alternation),
            ])

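Now you have to implement the bodies of Var, Sequence, and Alternation. You will not get a list of values directly from pyparsing; instead you will get one of these object types. Then, instead of calling asList() as I did in the sample above, you would call something like generate or makeGenerator to get a generator from that object. You would then invoke that generator to have the objects produce all the different results for you.

I leave the rest as an exercise for you.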

-- Paul
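
For readers who want to check their work on that exercise, here is a minimal sketch of how the class bodies might be filled in. The method name generate is one of the two names the answer suggests. The bodies themselves are assumptions for illustration, not the answer author's code. So is the hand-built tree, which stands in for real parse results and is assumed to mimic the token shape the parse actions receive.

```python
from itertools import product

class ParsedItem(object):
    def __init__(self, tokens):
        self.tokens = tokens[0]

class Var(ParsedItem):
    def generate(self):
        # A variable yields a single one-element sequence.
        yield [self.tokens]

class BinaryOpn(ParsedItem):
    def __init__(self, tokens):
        # Keep the operands, skip the operator strings between them.
        self.tokens = tokens[0][::2]

class Sequence(BinaryOpn):
    def generate(self):
        # Combine one result from every operand, in order.
        for parts in product(*(list(t.generate()) for t in self.tokens)):
            yield [item for part in parts for item in part]

class Alternation(BinaryOpn):
    def generate(self):
        # Each alternative contributes its own results, one after another.
        for t in self.tokens:
            for seq in t.generate():
                yield seq

# Hand-built stand-in (an assumption) for what the parse actions would
# receive for "(a, (b | c))": one group of operands and operator strings.
a, b, c = Var(["a"]), Var(["b"]), Var(["c"])
tree = Sequence([[a, ",", Alternation([[b, "|", c]])]])
results = ["".join(seq) for seq in tree.generate()]
print(results)  # ['ab', 'ac']
```

With the grammar from the answer, where these classes are attached as parse actions, the first element of the parse result should be one of these objects, so calling generate() on it would take the place of asList() in the test loop.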
