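This is my first attempt at using pyparsing. My parser isn't doing what I want; could someone take a look and see what's wrong? I'm trying to nest one OneOrMore inside another OneOrMore, which I thought should work, but it doesn't.

Here is the complete code: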

import pyparsing

status = """
    sale number       : 11/7 
    NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
    cross-cu-1       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
    cross-cu-2        918       1    104708K  07:38:19.08  24.02%   run          1d01h
    sale number       : 11/8 
    NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
    cross-cu-3       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
    cross-cu-4        918       1    104708K  07:38:19.08  24.02%   run          1d01h
    """

integer = pyparsing.Word(pyparsing.nums).setParseAction(lambda toks: int(toks[0]))
decimal = pyparsing.Word(pyparsing.nums + ".").setParseAction(lambda toks: float(toks[0]))
wordSuppress = pyparsing.Suppress(pyparsing.Word(pyparsing.alphas))
endOfLine = pyparsing.LineEnd().suppress()
colon = pyparsing.Suppress(":")

saleNumber = pyparsing.Regex(r"\d{2}/\d").setResultsName("saleNumber")
lineSuppress = pyparsing.Regex("NAME.*STOP") + endOfLine
saleRow = wordSuppress + wordSuppress + colon + saleNumber + endOfLine

name = pyparsing.Regex(r"cross-cu-\d").setResultsName("name")
id = integer.setResultsName("id")
pawn = integer.setResultsName("pawn")
price = integer.setResultsName("price") + "K"
time = pyparsing.Regex(r"\d{2}:\d{2}:\d{2}\.\d{2}").setResultsName("time")
c = decimal.setResultsName("c") + "%"
state = pyparsing.Word(pyparsing.alphas).setResultsName("state")
startStop = pyparsing.Word(pyparsing.alphanums).setResultsName("startStop")
row = name + id + pawn + price + time + c + state + startStop + endOfLine

table = pyparsing.OneOrMore(pyparsing.Group(saleRow + lineSuppress.suppress() + (pyparsing.OneOrMore(pyparsing.Group(row) | pyparsing.SkipTo(row).suppress())) ) | pyparsing.SkipTo(saleRow).suppress())

resultDic = [x.asDict() for x in table.parseString(status)]
print(resultDic)

It only returns [{'saleNumber': '11/7'}]. I would like to get a list of dicts like this:

[{ {'saleNumber': '11/7'},{ elements in cross-cu-1 line, elements in cross-cu-2 line } },
 { {'saleNumber': '11/8'},{ elements in cross-cu-3 line, elements in cross-cu-4 line } }]

Any help is appreciated! Please don't suggest other ways of producing this output; I'm also trying to learn pyparsing!

2 Answers

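pyparsing may be overkill in this case. Why not simply read the file line by line and parse the result?

The code would look like this:

Edit: I have updated the code to follow your example more closely.

from collections import defaultdict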

status = """
sale number       : 11/7
NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
cross-cu-1       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
cross-cu-2        918       1    104708K  07:38:19.08  24.02%   run          1d01h
sale number       : 11/8
NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
cross-cu-3       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
cross-cu-4        918       1    104708K  07:38:19.08  24.02%   run          1d01h
"""

sale_number = ''

sales = defaultdict(list)

for line in status.split('\n'):
    line = line.strip()
    if line.startswith("NAME"):
         continue
    elif line.startswith("sale number"):
         sale_number = line.split(':')[1].strip()
    elif not line or line.isspace() :
         continue
    else:
         # you can also use a regular expression here
         sales[sale_number].append(line.split())

for sale in sales:
    print(sale, sales[sale])
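
The comment in the loop above mentions that a regular expression could be used instead of `line.split()`. A minimal sketch of that idea follows, assuming every data row has exactly the column layout shown in `status`. The names `ROW_RE` and `parse_row` are hypothetical and only serve this illustration.

```python
import re

# Hypothetical pattern for one data row; assumes the column layout in `status`.
ROW_RE = re.compile(
    r"(?P<name>\S+)\s+"
    r"(?P<id>\d+)\s+"
    r"(?P<pawn>\d+)\s+"
    r"(?P<price>\d+)K\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<c>\d+(?:\.\d+)?)%\s+"
    r"(?P<state>[A-Za-z]+)\s+"
    r"(?P<startStop>\w+)$"
)


def parse_row(line):
    """Return a dict for one data row, or None if the line does not match."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    # Convert the numeric columns; the remaining fields stay as strings.
    for key in ("id", "pawn", "price"):
        row[key] = int(row[key])
    row["c"] = float(row["c"])
    return row


row = parse_row(
    "cross-cu-1       1055       1    106284K  07:49:36.19  25.05%   run          1d01h"
)
print(row)
```

Inside the loop, `sales[sale_number].append(parse_row(line))` would then store a dict per row instead of a list of strings. Lines that do not match the pattern yield `None`, so they may need to be filtered out.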
answered 2012-09-14T11:56:30.353

Does this work?

import pyparsing

status = """
sale number       : 11/7
NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
cross-cu-1       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
cross-cu-2        918       1    104708K  07:38:19.08  24.02%   run          1d01h
sale number       : 11/8
NAME               ID    PAWN    PRICE    TIME         %C     STATE     START/STOP
cross-cu-3       1055       1    106284K  07:49:36.19  25.05%   run          1d01h
cross-cu-4        918       1    104708K  07:38:19.08  24.02%   run          1d01h
"""

integer = pyparsing.Word(pyparsing.nums).setParseAction(lambda toks: int(toks[0]))
decimal = pyparsing.Word(pyparsing.nums + ".").setParseAction(lambda toks: float(toks[0]))
wordSuppress = pyparsing.Suppress(pyparsing.Word(pyparsing.alphas))
endOfLine = pyparsing.LineEnd().suppress()
colon = pyparsing.Suppress(":")

saleNumber = pyparsing.Regex(r"\d{2}/\d").setResultsName("saleNumber")
lineSuppress = pyparsing.Regex("NAME.*STOP") + endOfLine
saleRow = wordSuppress + wordSuppress + colon + saleNumber + endOfLine

name = pyparsing.Regex(r"cross-cu-\d").setResultsName("name")
id = integer.setResultsName("id")
pawn = integer.setResultsName("pawn")
price = integer.setResultsName("price") + "K"
time = pyparsing.Regex(r"\d{2}:\d{2}:\d{2}\.\d{2}").setResultsName("time")
c = decimal.setResultsName("c") + "%"
state = pyparsing.Word(pyparsing.alphas).setResultsName("state")
startStop = pyparsing.Word(pyparsing.alphanums).setResultsName("startStop")
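# Group each data row so that its named fields stay together.
row = pyparsing.Group(name + id + pawn + price + time + c + state + startStop + endOfLine)
# setResultsName() returns a copy of the expression, so it only takes effect
# when its return value is assigned or chained, as done for `rows` here.
rows = pyparsing.OneOrMore(row).setResultsName("rows")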

table = pyparsing.OneOrMore(pyparsing.Group(saleRow + lineSuppress + rows))

resultDic = [x.asDict() for x in table.parseString(status)]
print(resultDic)
answered 2012-09-14T12:26:18.207