This is a very strange error that I can't seem to figure out.

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.crummy.com/software/BeautifulSoup/bs4/doc/'
soup = BeautifulSoup(urllib2.urlopen(url))

print soup.title

This returns

<title>Beautiful Soup Documentation — Beautiful Soup 4.0.0 documentation</title>

as expected, but if I change the last line to "print soup.title.string" (which should return everything above minus the HTML tags), I get

Traceback (most recent call last):
  File "C:\Users\MyName\Desktop\MyProgram\Python\test.py", line 7, in <module>
    print soup.title.string
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 74, in _reduce_ex
    getstate = self.__getstate__
RuntimeError: maximum recursion depth exceeded

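I have looked around and can't find anyone else who has run into this error. Any suggestions?

Edit: I tried the same code on a few other pages and it worked there. google.com works, for example. So the problem has something to do with how this particular page is built.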

1 Answer

Perhaps the problem is that the title contains non-ASCII characters. Change your print statement to this:

print soup.title.string.encode('ascii','ignore')
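
To see what that call does without fetching the page, here is a minimal sketch on a plain string. The helper name to_ascii and the sample title are assumptions made for illustration; in the question's code the input would be soup.title.string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_ascii(text):
    """Hypothetical helper: encode text as ASCII bytes, dropping anything else."""
    return text.encode('ascii', 'ignore')


# Stand-in for soup.title.string (assumed value; \u2014 is the dash in the title).
title = u"Beautiful Soup Documentation \u2014 Beautiful Soup 4.0.0 documentation"

ascii_title = to_ascii(title)

# The dash is dropped silently, so the two spaces around it end up adjacent.
print(ascii_title.decode('ascii'))
```

Note that 'ignore' discards the dash rather than replacing it. The result is also a plain byte string, not a BeautifulSoup object. That may be why it avoids the pickle call shown in the traceback, though this is a guess and not something the traceback confirms.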
answered 2013-07-31T07:27:16.330