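I'm trying to scrape a web page using Python 2.7 and BeautifulSoup, but I can't get past a protocol error that doesn't make much sense to me. It only happens on the one site I need to do this for: https://edd.telstra.com/telstra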

The code I'm using just for basic testing:

#! /usr/bin/python

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

# Copy all of the content from the provided web page
webpage = urlopen("https://edd.telstra.com/telstra/").read()

I get the following error (running on Ubuntu 12.10):

Traceback (most recent call last):
  File "e.py", line 8, in <module>
    webpage = urlopen("https://edd.telstra.com/telstra/").read()
  File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 436, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 780, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1165, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 143, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [Errno 1] _ssl.c:504: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac

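Can anyone tell me whether I need to specify some parameter to get this page to download in Python? The problem seems to be specific to this page, since the code above (plus many other variants I've tried) works fine on the other HTTPS/SSL pages I've tried.

Thanks for your help!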

1 Answer

I can recommend using the requests library:

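import requests

def get_page(login, password):
    '''Log in with the given credentials and return the page text.
    '''
    # Placeholder URL; replace with the page you need
    url = 'https://qwe.qwe'

    # Form fields sent with the login request
    payload = {
        'user': login,
        'pass': password
    }

    # A Session keeps cookies between the POST and the GET
    with requests.Session() as my_session:
        my_session.post(url, data=payload)
        data = my_session.get(url)
    return data.text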

More information: http://docs.python-requests.org/en/latest/user/advanced/#session-objects
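On the question of whether some parameter needs to be specified: one thing that can be tried for handshake errors is to control the TLS settings explicitly instead of relying on the defaults. The following is a minimal standard-library sketch, assuming Python 3.7 or later (not the Python 2.7 from the question). The helper names `make_tls_context` and `fetch` are hypothetical, and I have not confirmed that pinning the protocol version resolves the failure on this particular server.

```python
import ssl
import urllib.request


def make_tls_context(minimum=ssl.TLSVersion.TLSv1_2):
    """Build a client SSL context that verifies certificates and
    refuses protocol versions older than `minimum`.

    The TLS 1.2 default is an assumption; adjust to what the server supports.
    """
    context = ssl.create_default_context()
    context.minimum_version = minimum
    return context


def fetch(url, context=None, timeout=10):
    """Download `url` and return the raw response body as bytes."""
    if context is None:
        context = make_tls_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

Usage would be something like `webpage = fetch("https://edd.telstra.com/telstra/")`. If the handshake still fails, the exception raised by `urlopen` should at least say which stage was rejected.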

Answered 2015-01-20T11:47:57.750