0

鉴于我如何提取以 <> 为界并以逗号分隔的引号中的元素列表中的解决方案 - python、正则表达式?,我能够捕获由尖括号内的 a 和值表示的所需模式的theprefix和 thevaluesCAPITALIZED.PREFIX< "value1" , "value2", ... >

"""calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "什么" >,\n THIS + ]."""

但是我遇到了问题,因为我有上面的字符串。所需的输出将是:

('ORTH.FOO', ['cali.ber,kl','calf','done'])
('ORHT2BAR', ['what so ever >', 'this that mess < up'])
('JOKE', ['whathe ', 'what'])

我尝试了以下方法,但它只给了我第一个元组,我如何获得所需输出中的所有可能的元组?

import re
intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up">\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30" ]."""
pattern = re.compile(r'.*?([A-Z0-9\.]*) < ([^>]*) >.*', flags=re.DOTALL)
f, v = pattern.match(intext).groups()
names = re.findall('[\'"](.*?)["\']', v)
print f, names
4

2 回答 2

1

呵呵傻了我。不知何故,我没有在我的机器上测试整个字符串^^;

无论如何,我使用了这个正则表达式并且它有效,您只需在列表中获得您正在寻找的结果,我想这没问题。我在python中不太好,不知道如何将这个列表转换为数组或元组:

>>> import re
>>> intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "what" >,\n THIS + ]."""
>>> results = re.findall('\\n .*?([A-Z0-9\.]*) < *((?:[^>\n]|>")*) *>.*?(?:\\n|$)', intext)
>>> print results
[('ORTH.FOO', '"cali.ber,kl", \'calf\', "done"'), ('ORHT2BAR', '"what so ever>", "this that mess < up"'), ('JOKE', '\'whatthe \', "what" ')]

括号表示第一级元素,单引号表示第二级元素。

于 2013-08-12T11:57:01.420 回答
1

正则表达式不支持“递归”解析。用正则表达式捕获后处理<和字符之间的组。>

shlex模块可以很好地解析您引用的字符串:

import shlex
import re

intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up">\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30" ]."""
pattern = re.compile(r'.*?([A-Z0-9\.]*) < ([^>]*) >.*', flags=re.DOTALL)
f, v = pattern.match(intext).groups()

parser = shlex.shlex(v, posix=True)
parser.whitespace += ','
names = list(parser)

print f, names

输出:

ORTH.FOO ['cali.ber,kl', 'calf', 'done']
于 2013-08-12T09:28:46.693 回答