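I have clauses like a(b,c(d,e(f,g),h(i))) and a string that contains many such clauses separated by commas, for example a(b,c(d,e (f,g),h(i))),a(b,c(d,e(f,g),h(i)))
Is there a way to extract the variable and function names in hierarchical order? Suppose I want to print them like this: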
a
 b
 c
  d
  e
   f
   g
  h
   i
How can I do this easily with a parser in Python? What regular expression should I use?
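Regular expressions on their own cannot match arbitrarily nested structures. A regex used only to split the string into tokens, plus a depth counter, is enough here: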
import re

s = "a(b,c(d,e(f,g),h(i)))"
level = 0
# Tokens are names or parentheses; commas and spaces are skipped.
for tok in re.finditer(r"\w+|[()]", s):
    tok = tok.group()
    if tok == "(":
        level += 1
    elif tok == ")":
        level -= 1
    else:
        print("%s%s" % (" " * level, tok))
Prints:
a
 b
 c
  d
  e
   f
   g
  h
   i
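The same loop also copes with a string that holds several comma-separated clauses, because the token pattern never matches commas. Below is a minimal sketch of that idea wrapped in a generator. The name `iter_names` and the balance checks are my own additions for illustration, not part of the answer above.

```python
import re


def iter_names(text):
    """Yield (depth, name) pairs for every identifier found in text.

    iter_names is a hypothetical helper name, not a library function.
    """
    depth = 0
    for match in re.finditer(r"\w+|[()]", text):
        token = match.group()
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared without a matching '('.
                raise ValueError("unbalanced ')' at offset %d" % match.start())
        else:
            yield depth, token
    if depth != 0:
        raise ValueError("%d unclosed '(' at end of input" % depth)


if __name__ == "__main__":
    clauses = "a(b,c(d,e (f,g),h(i))),a(b,c(d,e(f,g),h(i)))"
    for depth, name in iter_names(clauses):
        print(" " * depth + name)
```

Yielding pairs instead of printing directly keeps the depth information available to the caller, who can then choose any indentation or output format.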
>>> s = "a(b,c(d,e(f,g),h(i))),a(b,c(d,e(f,g),h(i)))"
>>> from pyparsing import nestedExpr, Word, alphas, Literal
>>> result = nestedExpr(content=Word(alphas)).ignore(Literal(',')).parseString('(' + s + ')')
>>> print(result.asList())
[['a', ['b', 'c', ['d', 'e', ['f', 'g'], 'h', ['i']]], 'a', ['b', 'c', ['d', 'e', ['f', 'g'], 'h', ['i']]]]]
>>> def dump(lst, indent=''):
...     for i in lst:
...         if isinstance(i, list):
...             dump(i, indent + ' ')
...         else:
...             print(indent, i)
...
>>> dump(result.asList())
  a
   b
   c
    d
    e
     f
     g
    h
     i
  a
   b
   c
    d
    e
     f
     g
    h
     i
Break the problem down into two steps:
1. Parse the data
2. Print the data
The best way to parse your data is to find a parser that already exists. If you have a say in the format, pick one that has already been devised instead of inventing your own. If you don't have a say in the format and are forced to write your own parser, heed Ned's advice and don't use regex. It will only end in tears.
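If you do end up writing the parser yourself, a small recursive-descent parser avoids regex entirely. The following is only a sketch under my own assumptions: names consist of letters, digits and underscores, whitespace is ignored, and the function name `parse_clauses` and the (name, children) tuple layout are hypothetical choices, not an existing API.

```python
def parse_clauses(text):
    """Parse 'a(b,c(d)),e' into a list of (name, children) tuples.

    Assumed grammar (an illustration, not a standard):
        list := term (',' term)*
        term := NAME ['(' list ')']
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_term():
        nonlocal pos
        skip_spaces()
        start = pos
        while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
            pos += 1
        if pos == start:
            raise ValueError("expected a name at offset %d" % pos)
        name = text[start:pos]
        skip_spaces()
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume '('
            children = parse_list()
            skip_spaces()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("expected ')' at offset %d" % pos)
            pos += 1  # consume ')'
        return name, children

    def parse_list():
        nonlocal pos
        items = [parse_term()]
        skip_spaces()
        while pos < len(text) and text[pos] == ",":
            pos += 1  # consume ','
            items.append(parse_term())
            skip_spaces()
        return items

    result = parse_list()
    skip_spaces()
    if pos != len(text):
        raise ValueError("unexpected character at offset %d" % pos)
    return result


if __name__ == "__main__":
    print(parse_clauses("a(b,c(d,e (f,g),h(i)))"))
```

Each grammar rule maps to one nested function, so nesting in the input is handled by ordinary recursion and malformed input is reported with its offset.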
Once you have parsed the data, print it out with the pprint module. It excels at printing things for human consumption!
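As a rough illustration of the printing step, the sketch below passes a nested list to pprint. The list is written out by hand to mirror the shape that asList() printed earlier in this thread, and the width of 30 is an arbitrary value I picked to force wrapping.

```python
from pprint import pformat, pprint

# Hand-written stand-in for a parsed clause (assumed shape, see lead-in).
parsed = ['a', ['b', 'c', ['d', 'e', ['f', 'g'], 'h', ['i']]]]

# pformat returns the formatted string; pprint writes it to stdout.
text = pformat(parsed, width=30)
pprint(parsed, width=30)
```

Note that pprint wraps long structures to fit the width. It does not indent one name per line by depth, so for the exact layout in the question a small recursive function is still needed.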