
I wrote this script to download the lyrics of my songs and store each one in a text file:

>>> lis = os.listdir('D:\Phone\Sounds')
>>> for i in lis:
    print i

    br.open('http://www.azlyrics.com/') # THE PROBLEM

    br.select_form(nr=0)
    track = eyed3.load(i).tag
    if(track.artist != None):
        ft = track.artist.find('ft.')
        if(ft != -1):
            br['q'] = track.title + ' ' + track.artist[:ft]
        else:
            br['q'] = track.title + ' ' + track.artist
    else:
        br['q'] = track.title
    br.submit()
    s = BeautifulSoup(br.response().read())
    a = s.find('div',{'class':'sen'})
    if(a != None):
        s = BeautifulSoup(urllib.urlopen(a.find('a')['href']))
        file = open(i.replace('.mp3','.txt'),'w')
        file.write(str(s.find('div',{'style':'margin-left:10px;margin-right:10px;'})).replace('<br />','\n'))
        file.close()
    else:
        print 'Lyrics not found'

This worked for a while and I got the lyrics for some songs, then it suddenly raised a BadStatusLine error:

Heartbreaker.mp3
<response_seek_wrapper at 0x4af6f08L whose wrapped object = <closeable_response at 0x4cb9288L whose fp = <socket._fileobject object at 0x00000000047A2480>>>
<response_seek_wrapper at 0x4b1b888L whose wrapped object = <closeable_response at 0x4cc0048L whose fp = <socket._fileobject object at 0x00000000047A2570>>>
Heartless (The Fray Cover).mp3
<response_seek_wrapper at 0x4b22d08L whose wrapped object = <closeable_response at 0x4b15988L whose fp = <socket._fileobject object at 0x00000000047B2750>>>
<response_seek_wrapper at 0x4cb9388L whose wrapped object = <closeable_response at 0x4b1b448L whose fp = <socket._fileobject object at 0x000000000362AED0>>>
Lyrics not found
Heartless.mp3
<response_seek_wrapper at 0x4cc0288L whose wrapped object = <closeable_response at 0x4b01108L whose fp = <socket._fileobject object at 0x000000000362AE58>>>
<response_seek_wrapper at 0x4b15808L whose wrapped object = <closeable_response at 0x47a4508L whose fp = <socket._fileobject object at 0x000000000362A6D8>>>
Here Without You.mp3
<response_seek_wrapper at 0x4b1b3c8L whose wrapped object = <closeable_response at 0x4916508L whose fp = <socket._fileobject object at 0x000000000362A480>>>
<response_seek_wrapper at 0x47a4fc8L whose wrapped object = <closeable_response at 0x37830c8L whose fp = <socket._fileobject object at 0x000000000362A0C0>>>
Hero.mp3
<response_seek_wrapper at 0x4930408L whose wrapped object = <closeable_response at 0x4cced48L whose fp = <socket._fileobject object at 0x00000000047A2228>>>
<response_seek_wrapper at 0x453ca48L whose wrapped object = <closeable_response at 0x4b23f88L whose fp = <socket._fileobject object at 0x00000000047A2048>>>
Hey Jude.mp3
<response_seek_wrapper at 0x3783808L whose wrapped object = <closeable_response at 0x4cd71c8L whose fp = <socket._fileobject object at 0x00000000047A2A20>>>
<response_seek_wrapper at 0x4ccee48L whose wrapped object = <closeable_response at 0x4cd7c08L whose fp = <socket._fileobject object at 0x00000000047A2B10>>>
Hey, Soul Sister.mp3

Traceback (most recent call last):
  File "<pyshell#23>", line 3, in <module>
    br.open('http://www.azlyrics.com/')
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "build\bdist.win-amd64\egg\mechanize\_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1116, in do_open
    r = h.getresponse()
  File "D:\Programming\Python\lib\httplib.py", line 1027, in getresponse
    response.begin()
  File "D:\Programming\Python\lib\httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "D:\Programming\Python\lib\httplib.py", line 371, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

So why did br.open suddenly stop working? Thanks in advance.


1 Answer


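httplib raises this error when it cannot make sense of the status line in the response. In your traceback the status line is empty (''), which means the connection was closed without any response being sent. Quoting the documentation for BadStatusLine: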

A subclass of HTTPException. Raised if a server responds with a HTTP status code that we don't understand.

I don't get any error when I run br.open('http://www.azlyrics.com/'), so the problem is on your side.

Most likely you are behind a proxy; take a look at "Python's mechanize proxy support".
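To check whether a proxy is configured at all, you can ask the standard library which proxies it detects from the environment and system settings. This is only a minimal sketch: describe_proxies is a hypothetical helper name, not part of mechanize, and an empty result only means that no proxy was detected this way.

```python
try:
    from urllib import getproxies  # Python 2, as used in the question
except ImportError:
    from urllib.request import getproxies  # Python 3


def describe_proxies():
    """Return a sorted list of (scheme, proxy_url) pairs that Python detects."""
    return sorted(getproxies().items())


# Print one line per detected proxy; prints nothing if none is configured.
for scheme, proxy in describe_proxies():
    print("%s -> %s" % (scheme, proxy))
```

If this prints an http entry, requests to http://www.azlyrics.com/ may be going through that proxy.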

UPD: Try this:

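import mechanize

br = mechanize.Browser()
# Send a browser-like User-agent header instead of mechanize's default one
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Log the HTTP traffic, redirects and responses to see what comes back
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)

br.open('http://www.azlyrics.com')

print br.response().read()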
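Since your loop worked for several songs before failing, the error may be intermittent, and in that case retrying after a pause can help. The sketch below is an assumption on my part, not something mechanize provides: open_with_retry, retries, delay and sleep are hypothetical names, and the right pause length depends on the site.

```python
import time

try:
    from httplib import BadStatusLine  # Python 2, as in the traceback
except ImportError:
    from http.client import BadStatusLine  # Python 3


def open_with_retry(open_func, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call open_func(url), retrying when a BadStatusLine is raised.

    Waits delay, 2*delay, ... seconds between attempts and re-raises
    the last error once all attempts have failed.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return open_func(url)
        except BadStatusLine as err:
            last_error = err
            sleep(delay * (attempt + 1))  # back off a little more each time
    raise last_error
```

With the browser from your script, the call would be open_with_retry(br.open, 'http://www.azlyrics.com/') in place of the plain br.open call.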

Hope that helps.

answered 2013-08-18T12:06:57.680