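I only started using pyparsing tonight, and I've already built a complex grammar that describes some of the resources I'm working with very effectively. It's simple and very powerful. However, I'm having trouble working with ParseResults. I need to iterate over the nested tokens in the order in which they were found, and I'm finding that frustrating. I've reduced my problem to a simple case: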
```python
# -*- coding: utf-8 -*-
# Python 2 script; the coding line is needed for the curly quotes below.
import pyparsing as pp

word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')

test_string = 'Lorem ipsum “dolor sit” amet, consectetur.'
r = sentence.parseString(test_string)
print r.asXML('div')
print ''
# Named results: all 'word' matches first, then all 'direct_speech' matches
for name, item in r.sentence.items():
    print name, item
print ''
# Tokens in parse order
for item in r.sentence:
    print item.getName(), item.asList()
```
As far as I can tell, this should work, shouldn't it? Here is the output:
```
<div>
  <sentence>
    <word>Lorem</word>
    <word>ipsum</word>
    <direct_speech>
      <word>dolor</word>
      <word>sit</word>
    </direct_speech>
    <word>amet,</word>
    <word>consectetur.</word>
  </sentence>
</div>

word ['Lorem', 'ipsum', 'amet,', 'consectetur.']
direct_speech [['dolor', 'sit']]

Traceback (most recent call last):
  File "./test.py", line 27, in <module>
    print item.getName(), item.asList()
AttributeError: 'str' object has no attribute 'getName'
```
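The XML output suggests that the string was parsed exactly the way I want, but I can't iterate over the sentence, for example to reconstruct it.
Is there a way to do what I need?
Thanks!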
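The traceback already shows what iterating the first grammar yields: leaf tokens from pp.Word come back as plain str, and only the Group produces a nested ParseResults. A minimal Python 3 sketch of reconstructing the sentence on that basis, assuming (as holds for this grammar) that every nested group is direct speech; rebuild is a hypothetical helper name, not part of pyparsing:

```python
import pyparsing as pp

# Same grammar as in the question.
word = pp.Word(pp.alphas + ',.')('word*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | direct_speech))('sentence')


def rebuild(tokens):
    """Join tokens back into text, walking nested groups in parse order."""
    parts = []
    for item in tokens:
        if isinstance(item, pp.ParseResults):
            # The only Group inside a sentence is direct speech; its quote
            # marks were suppressed, so put them back around the nested text.
            parts.append('“' + rebuild(item) + '”')
        else:
            # Leaf tokens from pp.Word are plain str, not ParseResults.
            parts.append(item)
    return ' '.join(parts)


r = sentence.parseString('Lorem ipsum “dolor sit” amet, consectetur.')
rebuilt = rebuild(r.sentence)
```

Tokens are rejoined with single spaces, so the original spacing is not preserved; that is a simplification of this sketch.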
Edit:
I've been using this:
```python
for item in r.sentence:
    if isinstance(item, basestring):
        print item
    else:
        print item.getName(), item
```
But that doesn't help me much, because I can't distinguish between different kinds of strings. Here is a slightly extended example:
```python
word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')

test_string = 'Lorem 14 ipsum “dolor 22 sit” amet, consectetur.'
r = sentence.parseString(test_string)
for i, item in enumerate(r.sentence):
    if isinstance(item, basestring):
        print i, item
    else:
        print i, item.getName(), item
```
The output is:
```
0 Lorem
1 14
2 ipsum
3 word ['dolor', '22', 'sit']
4 amet,
5 consectetur.
```
Not very helpful. I can't tell word and number apart, and the direct_speech element is labeled word?!
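In the extended grammar both word and number produce plain str leaves, so iteration alone cannot tell them apart. One stopgap, sketched here in Python 3 as an assumption rather than a recommended design, is to re-test each leaf against the sub-expressions with ParserElement.matches(), trying word before number as the grammar's word | number does. With this grammar that reproduces the parser's choice, because any position starting with "," or "." is claimed by word first, so number tokens always start with a digit. classify is a hypothetical helper name:

```python
import pyparsing as pp

# Same grammar as the extended example in the question.
word = pp.Word(pp.alphas + ',.')('word*')
number = pp.Word(pp.nums + ',.')('number*')
direct_speech = pp.Suppress('“') + pp.Group(pp.OneOrMore(word | number))('direct_speech*') + pp.Suppress('”')
sentence = pp.Group(pp.OneOrMore(word | number | direct_speech))('sentence')


def classify(item):
    """Label one element of r.sentence by re-testing it against the grammar."""
    if isinstance(item, pp.ParseResults):
        # In this grammar the only nested group is direct speech.
        return ('direct_speech', [classify(sub) for sub in item])
    # Leaves are plain str; test in the same order as the MatchFirst.
    if word.matches(item):
        return ('word', item)
    if number.matches(item):
        return ('number', item)
    raise ValueError('unrecognised token: %r' % item)


r = sentence.parseString('Lorem 14 ipsum “dolor 22 sit” amet, consectetur.')
labels = [classify(item) for item in r.sentence]
```

This re-parses every leaf, so it costs extra work and depends on the leaf expressions not overlapping in a way the ordering cannot resolve.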
I'm obviously missing something. All I want to do is:
```
for item in r.sentence:
    if (item is a number):
        do something
    elif (item is a word):
        do something else
    etc. ...
```
Should I be approaching this differently?
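One different approach, sketched in Python 3 under the assumption that the tokens may be replaced by small Python objects: attach a parse action to each element that returns an instance of a class for that token type, so the type travels with the token and the loop in the pseudocode above becomes a plain isinstance dispatch. setParseAction is pyparsing's standard parse-action hook; the classes WordToken, NumberToken, SpeechToken and the describe function are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import List, Union

import pyparsing as pp


@dataclass
class WordToken:
    text: str


@dataclass
class NumberToken:
    text: str


@dataclass
class SpeechToken:
    items: List[Union[WordToken, NumberToken]]


# Each parse action replaces the matched text with one typed object.
word = pp.Word(pp.alphas + ',.').setParseAction(lambda t: WordToken(t[0]))
number = pp.Word(pp.nums + ',.').setParseAction(lambda t: NumberToken(t[0]))
# The quote marks are suppressed, so t holds only the inner typed tokens.
direct_speech = (
    pp.Suppress('“') + pp.OneOrMore(word | number) + pp.Suppress('”')
).setParseAction(lambda t: SpeechToken(list(t)))
sentence = pp.OneOrMore(word | number | direct_speech)


def describe(item):
    """Dispatch on token type, mirroring the pseudocode loop."""
    if isinstance(item, NumberToken):
        return 'number ' + item.text
    elif isinstance(item, WordToken):
        return 'word ' + item.text
    elif isinstance(item, SpeechToken):
        return 'speech [' + ', '.join(describe(sub) for sub in item.items) + ']'
    raise TypeError('unexpected token: %r' % (item,))


result = sentence.parseString('Lorem 14 ipsum “dolor 22 sit” amet, consectetur.')
descriptions = [describe(item) for item in result]
```

The design choice is to move the type information into the token values themselves, so parse order and token type come from the same list and no results-name lookup is needed during iteration.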