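In a previous post of mine, I had a problem reading and writing strings in languages other than English. The problem was in my system's encoding. ton1c mentioned that writing the strings to a txt file works fine, and it does! Now I am trying to pass these strings into a GML file, and I am running into an encoding problem again. Here is the code and the result.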
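import urllib2
import BeautifulSoup
import networkx as nx
url = 'http://www.bbc.co.uk/zhongwen/simp/'
# Download the page and decode the UTF-8 bytes into a unicode string
page = urllib2.urlopen(url).read().decode("utf-8")
dom = BeautifulSoup.BeautifulSoup(page)
# findAll returns a list of tags; the keywords are assumed to be in the first tag's content attribute
data = dom.findAll('meta', {'name' : 'keywords'})
# Re-encode the keywords as a UTF-8 byte string and split them on commas
data = data[0]['content'].encode("utf-8")
datalist = data.split(',')
# Store the keyword list as an attribute of a single node
G = nx.Graph()
G.add_node("name", Strings=datalist)
# This is the call that fails in the traceback
nx.write_gml(G, 'Gname')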
It returns:
File "C:\...\name.py", line 23, in <module>
nx.write_gml(G, 'Gname')
File "<string>", line 2, in write_gml
File "C:\Python27\lib\site-packages\networkx\utils\decorators.py", line 263, in _open_file
result = func(*new_args, **kwargs)
File "C:\Python27\lib\site-packages\networkx\readwrite\gml.py", line 392, in write_gml
path.write(line.encode('latin-1'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 13: ordinal not in range(128)
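One possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: in Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec. If the line handed to path.write already holds UTF-8 bytes (an assumption, though the earlier .encode("utf-8") makes it likely), that hidden decode would fail on the first non-ASCII byte. The snippet below performs the decode explicitly, so it behaves the same on Python 2.7 and Python 3. The helper name ascii_decode_error is made up for illustration.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for one Chinese character (U+4E2D); the first byte is 0xe4
raw = u"\u4e2d".encode("utf-8")


def ascii_decode_error(byte_string):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        # In Python 2, byte_string.encode('latin-1') runs this decode implicitly
        byte_string.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = ascii_decode_error(raw)
print(repr(err))
```

If this reading is right, the values should stay unicode objects until the moment they are written, instead of being encoded to UTF-8 beforehand.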
Any suggestions? I would also like to mention that the networkx site says the GML specification indicates that the file should only use 7-bit ASCII text encoding, iso8859-1 (latin-1). (http://networkx.lanl.gov/reference/generated/networkx.readwrite.gml.write_gml.html)
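Given the ASCII restriction, one workaround I can imagine is to escape every non-ASCII character as a numeric character reference (for example &#20013;) before the string goes into the graph, and to reverse that after reading the file back. This is only a sketch: it assumes that whatever consumes the GML file accepts such references, and the helper names to_gml_ascii and from_gml_ascii are hypothetical, not part of networkx. It uses only the standard library and is written to run on Python 2.7 as well as Python 3.

```python
# -*- coding: utf-8 -*-
import re

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3 has no unichr


def to_gml_ascii(text):
    """Escape a unicode string to ASCII; non-ASCII chars become &#NNNN;."""
    # text must be a unicode object (not a UTF-8 byte string) on Python 2
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_gml_ascii(text):
    """Turn decimal &#NNNN; references back into the original characters."""
    return re.sub(r"&#(\d+);", lambda m: unichr(int(m.group(1))), text)


# Two Chinese keywords written with escapes so the source file stays ASCII
keywords = u"\u4e2d\u6587,\u65b0\u95fb".split(u",")
escaped = [to_gml_ascii(k) for k in keywords]
restored = [from_gml_ascii(e) for e in escaped]
print(escaped)
```

The escaped strings contain only ASCII, so encoding them as latin-1 cannot fail. Whether write_gml accepts a list as a node attribute may still depend on the networkx version.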
PS: Please keep any suggestions compatible with Python 2.7.