I have this script:
import urllib2
from BeautifulSoup import BeautifulSoup
import html5lib
import lxml
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())
But it gives me the following error:
Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read())
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 56, column 872
Then I tried this code:
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml")
or
soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"html5lib")
Both give me this error:
Traceback (most recent call last):
  File "akaConnection.py", line 59, in <module>
    soup = BeautifulSoup(urllib2.urlopen("http://www.hitmeister.de").read(),"lxml")
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 156, in goahead
    k = self.parse_declaration(i)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1112, in parse_declaration
    j = HTMLParser.parse_declaration(self, i)
  File "/usr/lib/python2.6/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1097, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1030, in _toStringSubclass
    self.soup.endData(subclass)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1318, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
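Judging from the last frame, my guess is that in BeautifulSoup 3.x the second positional argument is bound to parseOnlyThese rather than being read as a parser name, so the string "lxml" ends up where a filter object with a .text attribute is expected. As far as I can tell, passing a parser name as the second argument is something the newer bs4 package supports, not this version. Below is a minimal sketch of what I think is happening. FakeSoup3, end_data and try_parse are hypothetical stand-ins I made up for illustration; they are not the real BeautifulSoup code.

```python
# Hypothetical stand-in, NOT the real BeautifulSoup class: it only mimics
# the argument order that the traceback suggests for BeautifulSoup 3.x,
# where the second positional parameter is parseOnlyThese.
class FakeSoup3(object):
    def __init__(self, markup="", parseOnlyThese=None):
        self.markup = markup
        self.parseOnlyThese = parseOnlyThese

    def end_data(self):
        # Mirrors the shape of the failing expression in endData():
        # the filter object is expected to expose a .text attribute.
        if self.parseOnlyThese is not None:
            return self.parseOnlyThese.text
        return None


def try_parse(markup, second_arg):
    """Return the AttributeError message raised, or None if nothing failed."""
    try:
        FakeSoup3(markup, second_arg).end_data()
    except AttributeError as exc:
        return str(exc)
    return None


# Passing a parser name as the second argument reproduces the same
# kind of failure as in the traceback above.
message = try_parse("<!DOCTYPE html><p>hi</p>", "lxml")
print(message)
```

If that reading is right, the second error is unrelated to the page's markup and would occur for any document that contains a declaration such as a doctype.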
I am running Ubuntu Linux 10.04 with Python 2.6.5, and my BeautifulSoup version is '3.1.0.1'. How can I fix my code, or what am I missing?