python - Python findall and regular expressions

Question

I am parsing an XML file (called xml below) that contains two different types of lines:

1. <line a="a1" b="b1" c="c1">
2. <line a="a2" c="c2">

I am trying to extract a2 and c2 from the second type only, but this regular expression also captures the first type:

>>> matches = re.findall(r'<line a="(.*)" c="(.*)">', xml)
>>> print(matches)
[('a1" b="b1', 'c1'), ('a2', 'c2')]

How can I capture only the second type?

score 8 · Accepted Answer

This is a better fit for a proper XML parsing library such as ElementTree than for regular expressions. For example:

>>> xmlstr = """\
... <root>
...   <line a="a1" b="b1" c="c1"></line>
...   <line a="a2" c="c2"></line>
... </root>
... """
>>> import xml.etree.ElementTree as ET
>>> root = ET.XML(xmlstr)
>>> root.findall('./line')
[<Element 'line' at 0x226db70>, <Element 'line' at 0x226de48>]
>>> filtered = [line for line in root.findall('./line') if line.get('b') is None]
>>> for line in filtered:
...     print(ET.tostring(line, encoding='unicode'))
...
<line a="a2" c="c2" />

>>>
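If the goal is the attribute values themselves rather than the serialized elements, the same filter can return them directly. This is only a sketch: it assumes the input is well-formed XML wrapped in a root element, as in the session above (the lines in the question are not closed), and the name `pairs` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the session above (assumed well-formed).
xmlstr = """\
<root>
  <line a="a1" b="b1" c="c1"></line>
  <line a="a2" c="c2"></line>
</root>
"""

root = ET.fromstring(xmlstr)

# Keep only <line> elements that have no "b" attribute,
# and read their "a" and "c" values as a tuple.
pairs = [
    (line.get("a"), line.get("c"))
    for line in root.findall("./line")
    if "b" not in line.attrib
]

print(pairs)  # [('a2', 'c2')]
```

Testing `"b" not in line.attrib` has the same effect as `line.get('b') is None` here, and the result has the same shape as the list of tuples that `re.findall` returns.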

score 5 · Accepted Answer


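The * operator is greedy by default, and . also matches the " character, so the first group can run past the closing quote of a and swallow b="b1". Try ([^"]*) instead of (.*)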
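A minimal sketch of that suggestion, using the two sample lines from the question as input. The variable names `greedy`, `lazy` and `strict` are only illustrative.

```python
import re

# Sample input from the question; the name `xml` follows the question.
xml = '<line a="a1" b="b1" c="c1">\n<line a="a2" c="c2">\n'

# Greedy (.*) can run across quote characters, so the first line matches too.
greedy = re.findall(r'<line a="(.*)" c="(.*)">', xml)

# Non-greedy (.*?) does not help: it still expands until " c=" is found.
lazy = re.findall(r'<line a="(.*?)" c="(.*?)">', xml)

# [^"]* stops at the next double quote, so a value can never swallow b="...".
strict = re.findall(r'<line a="([^"]*)" c="([^"]*)">', xml)

print(greedy)  # [('a1" b="b1', 'c1'), ('a2', 'c2')]
print(lazy)    # [('a1" b="b1', 'c1'), ('a2', 'c2')]
print(strict)  # [('a2', 'c2')]
```

The negated character class works because an attribute value cannot contain an unescaped double quote. Switching to a lazy quantifier changes the order in which the engine tries lengths, but not which strings are allowed to match.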

answered 2011-03-19T01:54:51.300
