I have the following XML document:
<data>
<point address="com.example.www" time="Jul 30, 2013 10:02:56 PM" protocol="http" type="2" body="404 Not Found" name="Example Site" />
<point address="com.example.test" time="Jul 29, 2013 07:45:03 AM" protocol="https" type="2" body="This is a test" name="Test.example" />
.......
</data>
I am using the following Python code:
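import libxml2

# input_file is the already-parsed document (e.g. the result of libxml2.parseFile); its definition is not shown here.
def ReadValue(pn, dt):
    # Return the values of attribute dt for every <point> whose protocol is pn.
    return [attr.content for attr in input_file.xpathEval("/data/point[@protocol='%s']/@%s" % (pn, dt))]

protocol = ["http", "https"]
data_type = ["body", "type", "time", "name"]
for i in protocol:
    for j in data_type:
        print ReadValue(i, j)
exit()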
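When I parse more than 200k tags, I suspect that ReadValue is the bottleneck. It runs so slowly that I cannot even interrupt the script with Ctrl-C while it is running. Is there a better implementation than the code above?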
Thanks
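
One direction that might help, sketched under assumptions: the nested loops in the code above call ReadValue eight times (two protocols times four attributes), and each call evaluates a separate XPath query against the document. A single pass that groups the attribute values by protocol could look roughly like the following. It swaps libxml2 for the standard library's xml.etree.ElementTree.iterparse, and the names collect_values, PROTOCOLS and DATA_TYPES are hypothetical. Whether it is actually faster on the real 200k-tag input is untested.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical names mirroring the lists in the question.
PROTOCOLS = ("http", "https")
DATA_TYPES = ("body", "type", "time", "name")

def collect_values(source):
    """Walk the document once and group attribute values by (protocol, attribute)."""
    values = defaultdict(list)
    # iterparse yields each element when its end tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "point":
            protocol = elem.get("protocol")
            if protocol in PROTOCOLS:
                for data_type in DATA_TYPES:
                    value = elem.get(data_type)
                    if value is not None:
                        values[(protocol, data_type)].append(value)
            # Drop the element's attributes once they have been copied out.
            elem.clear()
    return values

# Small in-memory sample built from the two <point> elements in the question.
sample = b"""<data>
<point address="com.example.www" time="Jul 30, 2013 10:02:56 PM" protocol="http" type="2" body="404 Not Found" name="Example Site" />
<point address="com.example.test" time="Jul 29, 2013 07:45:03 AM" protocol="https" type="2" body="This is a test" name="Test.example" />
</data>"""

result = collect_values(io.BytesIO(sample))
for protocol in PROTOCOLS:
    for data_type in DATA_TYPES:
        print(result[(protocol, data_type)])
```

For a real file, collect_values should also accept a file path in place of the BytesIO object, since iterparse takes either a filename or a file object.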