python - Getting a field from XML at a URL with Python

Question

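I'm trying to read the value of a specific field from an XML file served at a URL. Before I even get to that step, I run into a strange error. Here is my code: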

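import urllib
from xml.dom import minidom
url1 = 'http://www.dac.unicamp.br/sistemas/horarios/grad/G5A0/indiceP.htm'
data1 = urllib.urlopen(url1)  # Python 2 API; returns a file-like object
xml1 = minidom.parse(data1)   # strict XML parsing of the response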

I get this error:

File "C:\Users\Administrator\Desktop\teste.py", line 15, in <module>
    xml1 = minidom.parse(data1)
  File "C:\Python27\lib\xml\dom\minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: not well-formed (invalid token): line 4, column 22

Am I doing something wrong? I copied these calls from a tutorial, so it seems like it should work.

score 1 · Accepted Answer

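Use lxml.html, which handles invalid XHTML much better. The URL points to an .htm page, and the ExpatError suggests its markup is not well-formed XML, so a strict parser like minidom rejects it.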

import lxml.html as lh
xml1 = lh.parse('http://www.dac.unicamp.br/sistemas/horarios/grad/G5A0/indiceP.htm')
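Once the page parses, pulling out individual fields could look roughly like the sketch below. It does not touch the real URL: `SAMPLE_HTML` is made-up markup standing in for the page, and `extract_links` is a hypothetical helper name, so the XPath expression would need adjusting to whatever elements the actual page uses. The unclosed `<p>` and `<br>` tags are the kind of thing a strict XML parser refuses but lxml.html tolerates.

```python
import lxml.html as lh

# Hypothetical sample markup (an assumption, not the real page content).
# The <p> and <br> tags are never closed, so this is not well-formed XML.
SAMPLE_HTML = """
<html>
  <body>
    <p>Course index<br>
    <a href="G5A0/A.htm">Letter A</a>
    <a href="G5A0/B.htm">Letter B</a>
  </body>
</html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every <a> element in the markup."""
    root = lh.fromstring(markup)  # lenient HTML parsing, no well-formedness required
    return [(a.text_content().strip(), a.get("href")) for a in root.xpath("//a")]


links = extract_links(SAMPLE_HTML)
for text, href in links:
    print("%s -> %s" % (text, href))
```

With a tree returned by `lh.parse(url)`, the same `xpath` call is available on the tree, so the query itself should carry over unchanged.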
