I'm trying to scrape a website with BeautifulSoup. The site in question requires me to log in. Please take a look at my code.
from bs4 import BeautifulSoup as bs
import requests
import sys
user = 'user'
password = 'pass'
# URL of the login page
url = 'main url'
# Start a session (the config= argument with verbose logging only exists in requests < 1.0)
session = requests.session(config={'verbose': sys.stderr})
login_data = {
    'loginuser': user,
    'loginpswd': password,
    'submit': 'login',
}
r = session.post(url, data=login_data)
# Access the page to scrape
r = session.get('specific url')
# Name the parser explicitly so bs4 does not have to guess one
soup = bs(r.content, 'html.parser')
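I came up with this code after reading a few threads here, so I assumed it would work, but the page I get back still looks as if I were logged out.
When I run this code, it prints: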
2013-05-10T22:49:45.882000 POST >the main url to login<
2013-05-10T22:49:46.676000 GET >error page of the main url page as if the logging in failed<
2013-05-10T22:49:46.761000 GET >the specific url<
The login details are correct, of course. Any help would be appreciated, guys.
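For reference, the redirect to the error page that shows up in the log can also be read off the response object, since requests keeps every intermediate response in `r.history`. Below is a minimal sketch of how that could be inspected. The helper names `redirect_chain` and `looks_logged_out` and the `error_marker` argument are made up for illustration and are not part of requests; the only requests attributes relied on are `history`, `status_code` and `url`.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


def looks_logged_out(response, error_marker):
    """Rough guess: True if any hop landed on a URL containing error_marker.

    error_marker is a hypothetical substring of the site's error-page URL.
    """
    return any(error_marker in url for _, url in redirect_chain(response))


# Possible usage with the code from the post (not run here):
#   r = session.post(url, data=login_data)
#   print(redirect_chain(r))
```

If the POST really is bounced to an error page, the chain should show a 3xx entry for the login URL followed by the error page's URL.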
Edit:
How would I implement headers like these in the code above?
import urllib2  # Python 2 only
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
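As far as I can tell, the closest requests equivalent is to set the header on the session itself, so it is sent with every request made through that session. This is only a sketch under that assumption: `http://example.com/` is a placeholder rather than the real site, and the request is prepared but never sent, just to show which headers would go out.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it makes.
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# Build (but do not send) a request to inspect the outgoing headers.
# The URL is a placeholder, not the site from the post.
prepared = session.prepare_request(requests.Request('GET', 'http://example.com/'))
print(prepared.headers['User-Agent'])
```

The same `session` object could then be used for the `session.post(...)` and `session.get(...)` calls in the code above. A header can also be passed for a single request via the `headers=` argument of `session.get` or `session.post`.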