python - Downloading and caching XML files: how to handle the encoding declaration?

Question

from urllib.request import urlopen
from lxml import objectify

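I'm trying to write a program that downloads XML files into a cache and then opens them using objectify. If I download the files with urlopen(), I can read them just fine with objectify.fromstring():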

r = urlopen(my_url)
o = objectify.fromstring(r.read())

However, if I download them and write them to a file, I end up with an encoding declaration at the top of the file that objectify doesn't like. To wit:

# download the file
my_file = 'foo.xml'
r = urlopen(my_url)

# save locally
with open(my_file, 'wb') as fp:
    fp.write(r.read())

# open saved copy
with open(my_file, 'r') as fp:
    o1 = objectify.fromstring(fp.read())

The result is ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

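If I use objectify.parse(fp) instead, it works fine. I could go through and change all the client code to use parse(), but I feel that isn't the right approach. I have other XML files stored locally for which .fromstring() works fine; from a cursory review, they appear to be utf-8 encoded.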

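I just don't know what the right resolution is here. Should I change the encoding when saving the file? Should I strip the encoding declaration? Should I litter my code with try.. except ValueError clauses? Please advise.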

score 2 · Accepted Answer

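The file needs to be opened in binary mode rather than text mode. In text mode, fp.read() returns a decoded str, and objectify.fromstring() rejects a str that still carries an encoding declaration.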

open(my_file, 'rb') # b stands for binary

As the exception suggests: ... Please use bytes input ...
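To make the difference concrete, here is a minimal sketch of the save-then-load round trip. It is not the asker's actual code: SAMPLE_XML is a made-up payload standing in for r.read(), and the helper names save_to_cache, load_from_cache and load_from_cache_text_mode are hypothetical. It only assumes that lxml is installed.

```python
import tempfile
from pathlib import Path

from lxml import objectify

# Hypothetical payload standing in for the bytes returned by urlopen(my_url).read().
SAMPLE_XML = b'<?xml version="1.0" encoding="UTF-8"?>\n<root><item>42</item></root>'


def save_to_cache(path, payload):
    """Write the raw response bytes to the cache file unchanged."""
    Path(path).write_bytes(payload)


def load_from_cache(path):
    """Read the cached file as bytes so lxml can honor the encoding declaration."""
    with open(path, 'rb') as fp:
        return objectify.fromstring(fp.read())


def load_from_cache_text_mode(path):
    """The failing variant from the question: text mode yields a str."""
    with open(path, 'r', encoding='utf-8') as fp:
        return objectify.fromstring(fp.read())


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / 'foo.xml'
    save_to_cache(cache_file, SAMPLE_XML)

    # Binary read: parses even though the declaration is present.
    root = load_from_cache(cache_file)
    item_text = root.item.text

    # Text read: the str plus declaration combination raises ValueError.
    try:
        load_from_cache_text_mode(cache_file)
        text_mode_failed = False
    except ValueError:
        text_mode_failed = True
```

With this approach the cached file stays byte-for-byte identical to the download, so there is no need to re-encode it, strip the declaration, or wrap calls in try/except.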
