Below is my code for logging in to the page and fetching its source.

import requests
import sys
import urllib, urllib2, cookielib

USERNAME = ''
PASSWORD = ''

URL = 'http://coned.com'

def main():
    # Start a session so we can have persistent cookies
    session = requests.session()
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    # This is the form data that the page sends when logging in
    login_data = {
        'TxtUser': USERNAME,
        'TxtPwd': PASSWORD,
        'submit': 'Sign In',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://apps1.coned.com/cemyaccount/MemberPages/MyAccounts.aspx?lang=eng')
    resp = opener.open('https://apps1.coned.com/cemyaccount/MemberPages/MyAccounts.aspx?lang=eng')
    print resp
    print r.text


if __name__ == '__main__':
    main()

Here r.text does not give me what I need: I want the page's HTML after logging in. Can anyone help me with what to do here?

1 Answer

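Opening http://coned.com in Chrome with the Developer Tools pane open, I was able to trace my login attempt, shown below. I used testtesttest as the username and test as the password.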

Headers:

Request URL: https://apps2.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng
Request Method: POST
Status Code: 200 OK

Form data:

TxtUser:testtesttest
UserName:VALUE
UserName:0
TxtPwd:test
UserName2:VALUE
UserName2:0
ctl00$Main$Login1$LoginButton:Sign In

Knowing this, you should build the form data with the additional parameters:

URL = 'https://apps2.coned.com/cemyaccount/NonMemberPages/Login.aspx?lang=eng'

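# This is the form data that the page sends when logging in.
# UserName and UserName2 each appear twice in the captured request, so a
# list of pairs is used here; a dict would keep only the last value per key.
login_data = [
    ('TxtUser', USERNAME),
    ('UserName', 'VALUE'),
    ('UserName', '0'),
    ('TxtPwd', PASSWORD),
    ('UserName2', 'VALUE'),
    ('UserName2', '0'),
    ('ctl00$Main$Login1$LoginButton', 'Sign In'),
]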

# Authenticate to the login page
r = session.post(URL, data=login_data)

# Now r.text contains the HTML of the page you just requested; in this case,
# the response the login page redirects to.
# Print every line that mentions "success" (this also matches "not successful")...
print([line for line in r.text.splitlines() if 'success' in line.lower()])
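
One detail worth checking in the form data: the captured request repeats UserName and UserName2, and a Python dict literal silently keeps only the last value for a repeated key. The sketch below (Python 3, standard library only, with placeholder values) shows the difference between a dict and a list of pairs when the body is URL-encoded. requests accepts a list of tuples for data as well, so the same idea should carry over to session.post.

```python
from urllib.parse import urlencode

# Placeholder values; the field names come from the captured form data.
as_dict = {
    'TxtUser': 'testtesttest',
    'UserName': 'VALUE',
    'UserName': '0',  # overwrites the previous 'UserName' entry
}
as_pairs = [
    ('TxtUser', 'testtesttest'),
    ('UserName', 'VALUE'),
    ('UserName', '0'),
]

dict_body = urlencode(as_dict)    # only one UserName survives
pairs_body = urlencode(as_pairs)  # both UserName entries are sent

print(dict_body)
print(pairs_body)
```

Whether the server actually needs both copies of each field is not known from the trace; sending them as pairs simply reproduces what the browser sent.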

The site appears to show you the login page again, and if your login is invalid, that page contains an additional piece of HTML:

<span id="ctl00_Main_FailureMsg">Your sign In attempt was not successful. Please try again.  If you have not created your registry information you can register now.</span>
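
That element gives you a simple way to tell a failed login from a successful one. Here is a minimal sketch, assuming the ctl00_Main_FailureMsg span only appears in the markup when the login fails, as observed above. The helper name login_failed and the sample pages are made up for illustration.

```python
# The id of the element the site adds to the page when a login fails.
FAILURE_MARKER = 'id="ctl00_Main_FailureMsg"'


def login_failed(html):
    """Return True if the page contains the failed-login message element."""
    return FAILURE_MARKER in html


# Illustrative pages, not real responses from the site.
failed_page = ('<span id="ctl00_Main_FailureMsg">Your sign In attempt '
               'was not successful. Please try again.</span>')
other_page = '<html><body>My Accounts</body></html>'

print(login_failed(failed_page))
print(login_failed(other_page))
```

In the script you would call login_failed(r.text) right after the POST and only request MyAccounts.aspx when it returns False.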

Finally, you should also consider mechanize or scrapy. Both tools are well documented and built for exactly what you are after.
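
Part of what such tools do for you is read the form on the page, hidden inputs included, before submitting it. If you stay with requests, that step can be approximated with the standard library. The sketch below is Python 3 and rests on an assumption: the field names __VIEWSTATE and __EVENTVALIDATION in the sample are typical of ASP.NET WebForms pages, but they were not part of the trace above, so check the real login page for what it actually contains.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect (name, value) pairs for every hidden <input> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr = dict(attrs)
        is_hidden = (attr.get('type') or '').lower() == 'hidden'
        if is_hidden and attr.get('name'):
            self.fields.append((attr['name'], attr.get('value') or ''))


def hidden_fields(html):
    """Return the hidden form fields found in html, in document order."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Illustrative markup; the names and values are assumptions, not site data.
sample = '''
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="TxtUser" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
</form>
'''

print(hidden_fields(sample))
```

In practice you would fetch the login page with session.get first, pass its text to hidden_fields, and send the resulting pairs together with the username and password fields in the POST.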

Hope this points you in a better direction.

Answered 2013-11-06T22:11:35.383