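I'm trying to replace urllib2 with requests in this code, which simply pulls some information from a page.
I'm not 100% sure how I should be moving the library over. Here is what I have so far, along with the error. What am I doing wrong?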
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests, sys
from lxml import etree
# import urllib2
# UTF8
reload(sys)
sys.setdefaultencoding("utf-8")
# url = 'http://countrycode.org/Germany'
# opener = urllib2.build_opener()
# opener.addheaders = [('User-agent', 'USERAGENT')]
r = requests.get('http://countrycode.org/Germany')
response = r.text
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
countryCodeXpath = '//*[@id="main_table_blue_2"]/tr[3]/td[2]'
countryCode = tree.xpath(countryCodeXpath)
destCountryCode = countryCode[0].text
print destCountryCode
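For reference, the commented-out urllib2 lines attach a custom User-agent through an opener, while the new requests.get call drops it. Below is a minimal sketch of the requests equivalent. It assumes the same placeholder 'USERAGENT' value and targets Python 3. It passes the header as a plain dict and builds a PreparedRequest so the header can be inspected without touching the network. build_request is a hypothetical helper name.

```python
import requests

URL = "http://countrycode.org/Germany"


def build_request(url, user_agent="USERAGENT"):
    """Hypothetical helper: build (but do not send) a GET with a custom User-Agent.

    requests takes headers as a plain dict, replacing urllib2's
    opener.addheaders list of (name, value) tuples.
    """
    req = requests.Request("GET", url, headers={"User-Agent": user_agent})
    return req.prepare()  # PreparedRequest; nothing is sent over the network here


prepared = build_request(URL)
print(prepared.headers["User-Agent"])  # prints USERAGENT
```

To actually send it, use requests.get(URL, headers={"User-Agent": "USERAGENT"}). If several pages are fetched, a requests.Session with session.headers.update({...}) plays roughly the role of the reusable opener.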
Error:
Traceback (most recent call last):
File "/home/ubuntu/test.py", line 16, in <module>
tree = etree.parse(response, htmlparser)
File "lxml.etree.pyx", line 3196, in lxml.etree.parse (src/lxml/lxml.etree.c:64039)
File "parser.pxi", line 1549, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:91262)
File "parser.pxi", line 1578, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:91546)
File "parser.pxi", line 1478, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:90613)
File "parser.pxi", line 1025, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:87527)
File "parser.pxi", line 565, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:83101)
File "parser.pxi", line 656, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:84083)
File "parser.pxi", line 594, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:83379)
IOError: Error reading file '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<SNIP>
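The traceback comes from lxml.etree.parse. That function expects a filename, URL, or file-like object, so given the str in r.text it treats the whole HTML document as a path. That is why the message reads "Error reading file '<!DOCTYPE ...'". Below is a minimal sketch of one fix, targeting Python 3, that parses the markup directly with etree.fromstring. It assumes the page still contains a table with id main_table_blue_2, which is not verified here. extract_country_code and fetch_country_code are hypothetical helper names.

```python
from lxml import etree

COUNTRY_CODE_XPATH = '//*[@id="main_table_blue_2"]/tr[3]/td[2]'


def extract_country_code(html_bytes):
    # fromstring() parses the markup itself; parse() would treat a
    # str argument as a filename or URL, which caused the IOError above.
    root = etree.fromstring(html_bytes, etree.HTMLParser())
    cells = root.xpath(COUNTRY_CODE_XPATH)
    return cells[0].text if cells else None  # None if the table is missing


def fetch_country_code(url="http://countrycode.org/Germany"):
    # Imported here so extract_country_code can be used without requests installed.
    import requests

    r = requests.get(url, headers={"User-Agent": "USERAGENT"})
    r.raise_for_status()
    # Passing raw bytes (r.content) leaves encoding detection to lxml
    # instead of relying on requests' guess for r.text.
    return extract_country_code(r.content)
```

Keeping etree.parse but wrapping the text in io.StringIO(r.text) should also work. fromstring simply avoids the extra wrapper.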