
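I am trying to log in to OKCupid (www.okcupid.com/login) programmatically to fetch some user data. I have been trying to put together a Python script to do this, but I seem to be doing something wrong.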

The behavior I expect from this sample script is to log in, get redirected to the home page, and then print the HTML response. Here is what I have so far:

import urllib, urllib2, cookielib

# cookie storage
cj = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cj),
    urllib2.HTTPRedirectHandler
    )
# Useragent
opener.addheaders.append(('User-agent','Mozilla/4.0'))

url = 'http://www.okcupid.com/login'
login_data = urllib.urlencode({
    'username':'myusername',
    'password':'mypassword',
    })

req = urllib2.Request(url,login_data)
resp = urllib2.urlopen(req)
the_page = resp.read()

print the_page

3 Answers


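Building on what you have, here is an example of what I am doing: it logs in, then goes to the messages page and saves all the usernames.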

import urllib, urllib2, cookielib
import re
# cookie storage
cj = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cj),
    urllib2.HTTPRedirectHandler
    )
# Useragent
opener.addheaders.append(('User-agent','Mozilla/4.0'))


url = 'http://www.okcupid.com/login'
login_data = urllib.urlencode({
    'username':'Mobius1',
    'password':'raptor22',
    })

urllib2.install_opener(opener)

res = opener.open(url, login_data)

print res.url #should be http://www.okcupid.com/home if successful
res.close()

#navigate profile after successful login
res = opener.open('http://www.okcupid.com/messages')
the_page = res.read() #read content at URL

#find all usernames in the page content from links of the form /profile/username
#(the capture group is required, so a bare "/profile/" never yields an empty match)
UserNameList = re.findall(r'/profile/([\w.-]+)', the_page)
print UserNameList

with open('OkC_messages.html', 'w') as fid: #save the_page as html
    fid.write(the_page)
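
To check the username extraction without logging in, the regex step can be tried on a small inline sample. This is only a minimal sketch in Python 3: the HTML fragment, the usernames in it, and the helper name extract_usernames are made up for illustration, and the real messages page may be structured differently. It uses a /profile/... pattern like the one above and drops duplicates, since the same profile link often appears more than once on a page.

```python
import re

# Hypothetical fragment standing in for the messages page (not real site markup).
sample_html = '''
<a href="/profile/alice_01?cf=messages">alice_01</a>
<a href="/profile/bob.smith?cf=messages">bob.smith</a>
<a href="/profile/alice_01?cf=messages">alice_01</a>
'''


def extract_usernames(html):
    """Return usernames found in /profile/<name> links, in first-seen order."""
    names = re.findall(r'/profile/([\w.-]+)', html)
    # dict.fromkeys preserves insertion order while removing duplicates.
    return list(dict.fromkeys(names))


print(extract_usernames(sample_html))  # ['alice_01', 'bob.smith']
```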
answered 2013-11-09T16:01:19.290

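Figured out the error. I built the opener and then never used it: the request went through urllib2.urlopen, which knows nothing about the cookie jar. Here is the fix: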

import urllib, urllib2, cookielib

# cookie storage
cj = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cj),
    urllib2.HTTPRedirectHandler
    )
# Useragent
opener.addheaders.append(('User-agent','Mozilla/4.0'))

url = 'http://www.okcupid.com/login'
login_data = urllib.urlencode({
    'username':'myusername',
    'password':'mypassword',
    })

req = urllib2.Request(url,login_data)
resp = opener.open(req)
the_page = resp.read()

print the_page
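
For anyone on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the difference between the two calls can be reproduced locally. The following is a rough sketch, not OKCupid's actual login flow: DemoHandler is a made-up local server that sets a session cookie on POST /login and redirects to /home, and the cookie name and credentials are placeholders. The opener without a cookie processor stands in for what plain urlopen() does by default.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical login site: POST sets a cookie and redirects to /home."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume and ignore the form body
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        body = b"logged in" if "session=abc123" in cookie else b"anonymous"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    data = urllib.parse.urlencode({"username": "u", "password": "p"}).encode()

    # ProxyHandler({}) only keeps the local demo from using a configured proxy.
    no_proxy = urllib.request.ProxyHandler({})

    # Like default urlopen(): no cookie handling, so the session cookie is dropped.
    plain = urllib.request.build_opener(no_proxy)
    with plain.open(base + "/login", data) as resp:
        without_cookies = resp.read().decode()

    # The cookie-aware opener stores the cookie and resends it after the redirect.
    cj = CookieJar()
    opener = urllib.request.build_opener(
        no_proxy, urllib.request.HTTPCookieProcessor(cj)
    )
    with opener.open(base + "/login", data) as resp:
        with_cookies = resp.read().decode()
        final_url = resp.url

    server.shutdown()
    server.server_close()
    return without_cookies, with_cookies, final_url


if __name__ == "__main__":
    print(run_demo())
```

The first request ends up at /home without a session, while the second arrives with the cookie set during login, which is the same thing that changes when urllib2.urlopen(req) is replaced by opener.open(req).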
answered 2012-07-22T00:20:46.807

Check out https://github.com/IvanMalison/okcupyd. It does exactly what you need and provides some nice abstractions.

answered 2014-09-25T21:55:14.047