python - pyparsing, Forward, and recursion

Question

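I'm using pyparsing to parse a vcd (value change dump) file. Essentially, I want to read the file in, parse it into an internal dictionary, and then manipulate the values.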

Without going into detail about the structure, my problem is with identifying nested categories.

In a vcd file you have "scopes", which contain wires and possibly some deeper (nested) scopes. Think of them as levels.

So in my file, I have:

$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end

So "toplevel" contains everything (a - i), "midlevel" has (a - b), "extralevel" has (f - g), etc.

Here is (a snippet of) my code for parsing this section:

from pyparsing import Forward, Group, Literal, Word, ZeroOrMore, alphas

scope_header = Group(Literal('$scope') + Word(alphas) + Word(alphas) +
                     Literal('$end'))

wire_map = Group(Literal('$var') + Literal('wire') + Word(alphas) +
                 Literal('$end'))

scope_footer = Group(Literal('$upscope') + Literal('$end'))

scope = Forward()
scope << (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(scope) +
          ZeroOrMore(wire_map) + scope_footer)

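What I want to happen is that, as it reaches each scope, it keeps track of each "level", so that I end up with a structure containing the nested scopes. However, it errors out at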

$scope module extralevel $end

saying it expected '$upscope'.

So I know I'm not using recursion correctly. Can someone help me out? Let me know if I need to provide more information.

Thanks!!!!

score 9 · Accepted Answer

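By your definition, a scope can't contain another scope, followed by some maps, followed by another scope.

If the parser had a debug mode that printed the parse tree, you would see this immediately. But in short, you're saying there are zero or more maps, then zero or more scopes, then zero or more maps. So if there is a scope followed by a map, you have already moved past the scope field, and any further scopes are invalid. If the language pyparsing uses supports an "or", you could use: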

scope << (scope_header + ZeroOrMore((wire_map | scope)) + scope_footer)
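A self-contained sketch for checking this against the sample from the question, assuming pyparsing is installed. The helper `build_scope` and the `VCD` string are illustrative names, not from either answer. The original grammar should raise a ParseException at 'extralevel', while the `wire_map | scope` version consumes the whole input.

```python
from pyparsing import (Forward, Group, Literal, ParseException, Word,
                       ZeroOrMore, alphas)

# The sample from the question
VCD = """\
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
"""

scope_header = Group(Literal("$scope") + Word(alphas) + Word(alphas) + Literal("$end"))
wire_map = Group(Literal("$var") + Literal("wire") + Word(alphas) + Literal("$end"))
scope_footer = Group(Literal("$upscope") + Literal("$end"))


def build_scope(fixed):
    """Hypothetical helper: the question's grammar (fixed=False) or the fix (fixed=True)."""
    scope = Forward()
    if fixed:
        # Wires and nested scopes may interleave in any order.
        scope <<= scope_header + ZeroOrMore(wire_map | scope) + scope_footer
    else:
        # Wires, then scopes, then wires, then the footer: once a wire follows
        # a nested scope, no further scope can match, so 'extralevel' fails.
        scope <<= (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(scope)
                   + ZeroOrMore(wire_map) + scope_footer)
    return scope


try:
    build_scope(fixed=False).parseString(VCD, parseAll=True)
    original_ok = True
except ParseException as err:
    original_ok = False
    print("original grammar fails:", err)

fixed_tokens = build_scope(fixed=True).parseString(VCD, parseAll=True)
print(len(fixed_tokens), "top-level groups")
```

Because `scope` itself is not wrapped in a `Group` here, the nested levels come back flattened into a single list of 15 groups; wrapping the expression in `Group` is what gives one sublist per scope.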

score 6 · Accepted Answer

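Please choose @ZackBloom's answer as the correct one; he intuited it right away, without even knowing pyparsing's syntax.

Just a few comments/suggestions on your grammar:

Using the answer posted above (with the scope expression also wrapped in a Group, so that each nested scope comes back as its own sublist), you can visualize the nesting with pprint and the asList() method of pyparsing's ParseResults: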

res = scope.parseString(vcd)

from pprint import pprint
pprint(res.asList())

Gives:

[[['$scope', 'module', 'toplevel', '$end'],
  [['$scope', 'module', 'midlevel', '$end'],
   ['$var', 'wire', 'a', '$end'],
   ['$var', 'wire', 'b', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'c', '$end'],
  ['$var', 'wire', 'd', '$end'],
  ['$var', 'wire', 'e', '$end'],
  [['$scope', 'module', 'extralevel', '$end'],
   ['$var', 'wire', 'f', '$end'],
   ['$var', 'wire', 'g', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'h', '$end'],
  ['$var', 'wire', 'i', '$end'],
  ['$upscope', '$end']]]

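So now you get nicely structured results. But you can clean things up a bit. For one thing, now that you have structure, you don't need all those $scope, $end, etc. tokens. You could certainly skip over them as you walk the parse results, but you can also have pyparsing drop them from the parsed output (since the results are now structured, you don't really lose anything). Change your parser definitions to: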

from pyparsing import Group, Literal, Suppress, Word, alphas

SCOPE, VAR, UPSCOPE, END = map(Suppress,
                               "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())

scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END)
scope_footer = (UPSCOPE + END)

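(There's no need to Group scope_footer: everything in that expression is suppressed, so Group would just give you an empty list.)

Now you can see the parts that really matter more clearly: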
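A runnable sketch of the suppressed-token grammar applied to the question's sample, assuming pyparsing is installed. One assumption: the definitions above omit the `scope` line, and this sketch wraps it in `Group` so that each nested scope stays its own sublist, which is the shape the output below reflects.

```python
from pyparsing import Forward, Group, Literal, Suppress, Word, ZeroOrMore, alphas

VCD = """\
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
"""

# Structural keywords are matched but dropped from the results.
SCOPE, VAR, UPSCOPE, END = map(Suppress, "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())

scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END)
scope_footer = UPSCOPE + END  # fully suppressed, so no Group needed

scope = Forward()
# Assumption: Group around the whole scope so the nesting survives.
scope <<= Group(scope_header + ZeroOrMore(wire_map | scope) + scope_footer)

result = scope.parseString(VCD, parseAll=True).asList()
print(result)
```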

[[['module', 'toplevel'],
  [['module', 'midlevel'], ['wire', 'a'], ['wire', 'b']],
  ['wire', 'c'],
  ['wire', 'd'],
  ['wire', 'e'],
  [['module', 'extralevel'], ['wire', 'f'], ['wire', 'g']],
  ['wire', 'h'],
  ['wire', 'i']]]

At the risk of too much grouping, I'd suggest you also Group the contents of your scope expression, like this:

scope << Group(scope_header + 
               Group(ZeroOrMore((wire_map | scope))) + 
               scope_footer)

This gives these results:

[[['module', 'toplevel'],
  [[['module', 'midlevel'], [['wire', 'a'], ['wire', 'b']]],
   ['wire', 'c'],
   ['wire', 'd'],
   ['wire', 'e'],
   [['module', 'extralevel'], [['wire', 'f'], ['wire', 'g']]],
   ['wire', 'h'],
   ['wire', 'i']]]]

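Now each scope result has 2 predictable elements: the module header, and a list of wires or subscopes. This predictability makes it much easier to write recursive code to navigate the results: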

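res = scope.parseString(vcd)

def dumpScope(parsedTokens, indent=''):
    # each scope is [header, contents]; header is ['module', name]
    module, contents = parsedTokens
    print(indent + '- ' + module[1])
    for item in contents:
        if item[0] == 'wire':
            # wire entries look like ['wire', name]
            print(indent + '  wire: ' + item[1])
        else:
            # anything else is a nested scope: recurse one level deeper
            dumpScope(item, indent + '  ')

dumpScope(res[0])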

The result looks like:

- toplevel
  - midlevel
    wire: a
    wire: b
  wire: c
  wire: d
  wire: e
  - extralevel
    wire: f
    wire: g
  wire: h
  wire: i
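Since the question's goal was an internal dictionary, here is a sketch that walks the same two-element [header, contents] shape and builds nested dicts instead of printing. The helpers `scope_to_dict` and `all_wires` and the dict keys are hypothetical, not part of pyparsing or either answer; `all_wires` also confirms the containment described in the question (toplevel holds a through i).

```python
from pyparsing import Forward, Group, Literal, Suppress, Word, ZeroOrMore, alphas

VCD = """\
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
"""

SCOPE, VAR, UPSCOPE, END = map(Suppress, "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())
scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END)
scope_footer = UPSCOPE + END

scope = Forward()
scope <<= Group(scope_header + Group(ZeroOrMore(wire_map | scope)) + scope_footer)


def scope_to_dict(parsed):
    """Hypothetical helper: turn one [header, contents] scope into a dict."""
    header, contents = parsed
    node = {"name": header[1], "wires": [], "scopes": []}
    for item in contents:
        if isinstance(item[0], str):  # wire entries start with the string 'wire'
            node["wires"].append(item[1])
        else:  # nested scope: its first element is the header group
            node["scopes"].append(scope_to_dict(item))
    return node


def all_wires(node):
    """Hypothetical helper: every wire in this scope, including nested scopes."""
    wires = list(node["wires"])
    for child in node["scopes"]:
        wires.extend(all_wires(child))
    return wires


tree = scope_to_dict(scope.parseString(VCD, parseAll=True)[0])
print(sorted(all_wires(tree)))
```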

Good first question, and welcome to SO and pyparsing!
