python - PyParsing: Not all tokens passed to setParseAction()

Question

I am parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like:

[[("CS", 2110)], [("INFO", 3300)]]

To do this, I thought I could use setParseAction(). However, the print output from statementParse() suggests that only the last tokens are actually passed:

>>> statement.parseString("CS 2110 or INFO 3300")
Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8)
string CS 2110 or INFO 3300
loc: 7 
tokens: ['INFO', 3300]
Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

I expected all the tokens to be passed, but it is only ['INFO', 3300]. Am I doing something wrong? Or is there another way to produce the desired output?

Here is the pyparsing code:

from pyparsing import *

def statementParse(string, location, tokens):
    # Debug helper: show exactly what pyparsing passes to the parse action.
    print("string %s" % string)
    print("loc: %s " % location)
    print("tokens: %s" % tokens)

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

OR_CONJ = Suppress("or")

COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0]))

course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course")

statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()

score 2 · Accepted Answer

It works better if you set the parse action on both the course and the Optional together (you were setting it only on the Optional!):

>>> statement = (course + Optional(OR_CONJ + course)).setParseAction(statementParse).setDebug()
>>> statement.parseString("CS 2110 or INFO 3300")

gives

Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} at loc 0(1,1)
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110, 'INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} -> ['CS', 2110, 'INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

Though I suspect what you actually want is to set the parse action on each course, not on the statement:

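>>> course = (DEPT_CODE + COURSE_NUMBER.setResultsName("Course")).setParseAction(statementParse).setDebug()
>>> statement = course + Optional(OR_CONJ + course)
>>> statement.parseString("CS 2110 or INFO 3300")
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 0(1,1)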
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['CS', 2110]
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 10(1,11)
string CS 2110 or INFO 3300
loc: 10 
tokens: ['INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})
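
The difference between the two placements comes down to Python syntax: a method call binds more tightly than +, so in the question .setParseAction(...) applied to the Optional alone. The following is a minimal sketch of that effect, not the asker's code; the helper make_course and the two recording lists are hypothetical names introduced only for illustration.

```python
from pyparsing import Optional, Regex, Suppress


def make_course():
    # Hypothetical helper: builds a fresh "DEPT 1234" expression each time.
    dept = Regex(r"[A-Z]{2,}")
    number = Regex(r"[0-9]{4}").setParseAction(lambda s, l, toks: int(toks[0]))
    return dept + number


or_conj = Suppress("or")
seen_by_optional = []   # tokens handed to the action attached to Optional
seen_by_statement = []  # tokens handed to the action attached to the whole statement

# The method call binds tighter than "+": the action belongs to the Optional only.
optional_only = make_course() + Optional(or_conj + make_course()).setParseAction(
    lambda s, l, toks: seen_by_optional.append(toks.asList())
)

# Parenthesizing the sum first attaches the action to the complete statement.
whole_statement = (make_course() + Optional(or_conj + make_course())).setParseAction(
    lambda s, l, toks: seen_by_statement.append(toks.asList())
)

optional_only.parseString("CS 2110 or INFO 3300")
whole_statement.parseString("CS 2110 or INFO 3300")

print(seen_by_optional)   # [['INFO', 3300]]
print(seen_by_statement)  # [['CS', 2110, 'INFO', 3300]]
```

In both cases the final parse result contains all four tokens; only the slice of tokens handed to the parse action differs.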

score 2 · Accepted Answer

To preserve the token bits from "CS 2110" and "INFO 3300" as separate units, I suggest you wrap your definition of course in a Group:

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")
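
As a rough sketch of what Group changes, the snippet below parses the sentence from the question and prints the nested result. It assumes the same regular expressions as the question; the results name "Number" is an illustrative choice, not something from the original code.

```python
from pyparsing import Group, Optional, Regex, Suppress

DEPT_CODE = Regex(r"[A-Z]{2,}")
COURSE_NUMBER = Regex(r"[0-9]{4}").setParseAction(lambda s, l, toks: int(toks[0]))
OR_CONJ = Suppress("or")

# Group keeps each course's tokens (and results names) in their own sub-list.
# Calling an expression with a string is shorthand for setResultsName.
course = Group(DEPT_CODE("DeptCode") + COURSE_NUMBER("Number"))
statement = course + Optional(OR_CONJ + course)

result = statement.parseString("CS 2110 or INFO 3300")
print(result.asList())        # [['CS', 2110], ['INFO', 3300]]
print(result[1]["DeptCode"])  # INFO
print(result[1]["Number"])    # 3300
```

Because each group carries its own names, the department and number of the second course no longer collide with those of the first.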

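It looks like you are charging head-on at parsing some kind of search expression, such as "x and y or z". There is some subtlety to this problem, and I suggest you look at some of the examples on the pyparsing wiki to see how these kinds of expressions are built. Otherwise you will end up with a bird's nest of Optional("or" + this) and ZeroOrMore("and" + that) pieces. As a last resort, you could even just use something built on operatorPrecedence, like: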

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")        
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
course = Group(DEPT_CODE + COURSE_NUMBER)

courseSearch = operatorPrecedence(course, 
    [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
    ])

(You may need to download the latest 1.5.3 version from the SourceForge SVN to use this.)
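
For readers on a later pyparsing release, the same helper is available under the name infixNotation. The sketch below uses that name and is otherwise modeled on the grammar above; the sample query and the MATH 1920 course are made up for illustration.

```python
from pyparsing import Group, Regex, infixNotation, opAssoc

DEPT_CODE = Regex(r"[A-Z]{2,}")
COURSE_NUMBER = Regex(r"[0-9]{4}").setParseAction(lambda s, l, toks: int(toks[0]))
course = Group(DEPT_CODE + COURSE_NUMBER)

# Operators listed first bind most tightly: "not", then "and", then "or".
course_search = infixNotation(course, [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
])

parsed = course_search.parseString("CS 2110 and MATH 1920 or INFO 3300")
print(parsed.asList())
# [[[['CS', 2110], 'and', ['MATH', 1920]], 'or', ['INFO', 3300]]]
```

The nesting in the result mirrors operator precedence: the "and" pair is grouped first, and that group becomes the left operand of "or".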
