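I'm working on this as a spare-time project.
I want to use Python 3 to log in to a site (fill in a form and submit it), and then I plan to scrape some data from the page. I'm strictly looking for a Python 3 solution because I'm trying to learn more about Python and figured I would go straight to Python 3. I've seen several great tools, such as mechanize, but they seem to support only Python 2.
I plan to use this on a financial investment site, but let's just use Starbucks as an example.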
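import urllib.parse
import urllib.request

def loginToStockSite(username, pwd):
    url = "https://www.starbucks.com/account/signin"
    # Form field names taken from the sign-in page's <input> elements.
    values = {"Account.UserName": username,
              "Account.PassWord": pwd}
    # POST data must be bytes: URL-encode the dict, then encode as UTF-8.
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    # The response body is bytes; decode it back to a str.
    with urllib.request.urlopen(req) as sock:
        htmlSource = sock.read().decode('utf-8')
    return htmlSource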
I'm really confused by the various examples of encoding, decoding, URL openers, and so on. I haven't found a solution that works for me.
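For reference, here is a minimal sketch of how I understand the encode/decode steps to fit together in Python 3. The username and password are made-up placeholders, and nothing is sent over the network; the request body simply stands in for a response to show the round trip.

```python
import urllib.parse

# Placeholder credentials, for illustration only.
values = {"Account.UserName": "alice", "Account.PassWord": "s3cret!"}

# urlencode() turns the dict into one percent-encoded text string (str).
query = urllib.parse.urlencode(values)

# A POST body must be bytes, so the str is encoded before sending.
body = query.encode("utf-8")

# A response body arrives as bytes too; decode() turns it back into str.
text = body.decode("utf-8")

print(type(query).__name__, type(body).__name__, text == query)
```

So `urlencode` is about escaping form values, while `encode`/`decode` only convert between `str` and `bytes`.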
Thanks in advance for any help with my "fun" project.
Update
Here is the HTML I get back (with a lot cut out to fit the question's character limit):
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
<title>Account Sign In | Starbucks Coffee Company</title>
<meta name="viewport" content="width=device-width, initial-scale=1" />
</head>
<div class="fields">
<div class="region size1of1">
<div class="validation_summary warning validation_medium"><h2>Please Enable Cookies to Continue</h2><p>To sign in to the Starbucks web site, please enable cookies in your web browser.</p></div>
<div class="fields">
<form action="/account/signin" class="siren region size1of2" id="accountForm" method="post">
<fieldset class="login_position">
<legend class="hidden_visually">I have a Starbucks account.</legend>
<h2 class="existing_acc_h3">I have a Starbucks account.</h2>
<div id="user_info" class="region size1of2 reset">
Hi,
<span id="info_user"></span>
</div>
<div class="size1of2">
<a id="not_me" href="#notme">Not You?</a>
</div>
</div>
</div>
</div>
<div id="connect_cont" >
<div id="text_cont" class="hidden">
<h3>Have a Starbucks account?</h3>
<p>Connect your Facebook account to your Starbucks account by logging in here.</p>
</div>
</div>
</li>
<li>
<label for="Account_UserName">Username <span class="required">*</span></label>
<label for="Account_UserName" class="hidden">Starbucks Username <span class="required">*</span></label>
<input class="field_xlarge" id="Account_UserName" maxlength="200" name="Account.UserName" type="text" value="MY_USERNAME_WAS_HERE" />
</li>
<li>
<label for="Account_PassWord">Password <span class="required">*</span></label>
<label for="Account_PassWord" class="hidden">Starbucks Password <span class="required">*</span></label>
<input class="field_xlarge password" id="Account_PassWord" maxlength="200" name="Account.PassWord" type="password" value="MY_PASSWORD_WAS_HERE" />
</li>
<li>
Forgot your <a href="/account/forgot-username?AllowGuest=False">username</a> or <a href="/account/forgot-password?AllowGuest=False">password</a>?
<p class="privacy_policy hidden">
<a href="/about-us/company-information/online-policies/privacy-policy">Concerned about privacy?</a>
</p>
</li>
<li class="inline push">
<input type="checkbox" id="Account.IsRememberMe" name="Account.IsRememberMe" value="True" class="checkbox" />
<label for="Account.IsRememberMe">Keep me signed in.</label>
</li>
</ol>
</fieldset>
<fieldset>
<input id="ReturnUrl" name="ReturnUrl" type="hidden" value="" />
<input id="AllowGuest" name="AllowGuest" type="hidden" value="False" />
<input id="isJavaScriptDisabled" name="isJavaScriptDisabled" type="hidden" value="True" />
<span class="button"><button type="submit">Sign In</button></span>
</fieldset>
<fieldset class="submit">
<div id="fb_container">
<div id="fb_btn_cont">
Or log in using Facebook.
<p><a class="fb_button fb_button_medium" id="connect" href="#connect"><span class="fb_button_text">Login with Facebook</span></a></p>
</div>
</div>
</fieldset>
</form>
<div class="region size1of2 block_login">
<h2>I need a Starbucks account.</h2>
<p><span class="button"><a href="/account/create">Create An Account</a></span></p>
<p>With a Starbucks account you can register and manage your Starbucks Cards and participate in <a href="/card/rewards">My Starbucks Rewards</a>.</p>
<ul class ="basic">
<li>Enjoy a free drink on your birthday</li>
<li>Protect your balance if your Starbucks Card is missing or stolen.</li>
<li>Transfer money between cards.</li>
<li>Track your earnings in My Starbucks Rewards</li>
<li>Reload your Card balance automatically</li>
</ul>
</div>
</div>
<div class="fields">
<div class="region size1of1">
<ul id="breadcrumb">
<li><a href="/card">Card</a> .<ul>
<li><a href="/card/rewards">My Starbucks Rewards</a> .<ul>
<li>View Your Stars</li>
</ul></li>
</ul></li>
</ul>
</div>
</div>
</div>
<div id="footer">
<div class="container">
<form id="search" method="get" action="/search">
<fieldset>
<input id="searchbox" name="keywords" title="Search Keyword" maxlength="100" class="search_input" />
<span class="button button_search"><button id="submit_search_util" type="submit">Search</button></span>
</fieldset>
</form>
<div class="fields">
<div class="region size5of6 suffix1of6">
<div class="footer_categorical"><ol class="blocks blocks-five-up">
<li><h4>
<a href="/shop/card">Buy a Card</a>
</h4>
</li>
<li><h4>
<a href="/card">Manage Your Card</a>
</h4>
<ol>
<li><a href="/Card#cardBalanceWrapper">Check Balance</a></li>
<li><a href="/card/reload/one-time">Reload Your Card</a></li>
<li><a href="/card/manage/transfer">Transfer Funds</a></li>
<li><a href="/card/manage/history">View Transactions</a></li>
</ol>
</li>
<li><h4>
<a href="/card/rewards">My Starbucks Rewards</a>
</h4>
<ol>
<li><a href="/account/create/register">Register Your Card</a></li>
<li><a href="/account">View Your Stars</a></li>
<li><a href="/card/rewards/gold">Keep Your Gold Benefits</a></li>
<li><a href="/card/rewards/rewards-program-ts-and-cs">Rewards Program Terms and Conditions</a></li>
</ol>
</li>
<li><h4>
Learn More
</h4>
<ol>
<li><a href="/card/card-terms-and-conditions">Card Terms and Conditions</a></li>
<li><a href="/card/egift">What is a Starbucks Card eGift?</a></li>
<li><a href="/customer-service/faqs/card">Card FAQs</a></li>
<li><a href="/account">Manage Your Account</a></li>
<li><a href="http://mystarbucksidea.force.com/ideaList?ext=0&lsi=0&category=Starbucks+Card">My Starbucks Idea</a></li>
</ol>
</li>
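I was hoping to get back a page like the one I get when I log in with a browser, with a div containing the current balance of my registered default gift card: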
<div class="balance-amount numbers">
That is the visible div. I also found the balance elsewhere in the page source, though I don't see it rendered on the page:
<p class="card_balance numbers">
<span>$27.68</span> <span class="datestamp">3/10/2013 2:12 PM</span>
</p>
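Anyway, I want to enter the login information and use Python 3 to submit the form, or somehow POST the data to log in. Then (as a use case, but beyond what is needed to answer the question) I would extract the account balance from the HTML.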
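To make the use case concrete, here is a rough sketch of the extraction step using only the standard library's `html.parser`. It assumes the logged-in page contains the `<p class="card_balance numbers">` fragment shown above; `BalanceParser` and `extract_balance` are names I made up for illustration.

```python
from html.parser import HTMLParser


class BalanceParser(HTMLParser):
    """Grab the text of the first <span> inside <p class="card_balance ...">."""

    def __init__(self):
        super().__init__()
        self.in_balance_p = False
        self.in_span = False
        self.balance = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "card_balance" in classes:
            self.in_balance_p = True
        elif tag == "span" and self.in_balance_p and self.balance is None:
            # Only the first span in the paragraph holds the amount.
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False
        elif tag == "p":
            self.in_balance_p = False

    def handle_data(self, data):
        if self.in_span and data.strip():
            self.balance = data.strip()


def extract_balance(html_source):
    """Return the balance text, or None if the fragment is not present."""
    parser = BalanceParser()
    parser.feed(html_source)
    return parser.balance


sample = """<p class="card_balance numbers">
<span>$27.68</span> <span class="datestamp">3/10/2013 2:12 PM</span>
</p>"""
print(extract_balance(sample))  # $27.68
```

If the page markup differs from the fragment above, the class name checked in `handle_starttag` would need to change accordingly.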
I did notice that my username and password are apparently in their respective fields, and that the validation summary says:
<div class="validation_summary warning validation_medium"><h2>Please Enable Cookies to Continue</h2><p>To sign in to the Starbucks web site, please enable cookies in your web browser.</p></div>
I have seen some examples that deal with cookies. Is that the problem? I will look into possible solutions. In the meantime, I hope this helps. Thanks.
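If cookies are indeed the issue, something along these lines might be a starting point. It is only a sketch under that assumption: it uses `http.cookiejar` with `urllib.request.HTTPCookieProcessor` so that cookies set by the first request are sent back with the POST. The helper names (`make_cookie_opener`, `encode_form`, `login`) are my own, the hidden field values are copied from the HTML above, and I have not confirmed that the site needs nothing more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request

SIGNIN_URL = "https://www.starbucks.com/account/signin"


def make_cookie_opener():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(username, pwd):
    """URL-encode the form fields from the returned HTML as POST bytes."""
    values = {
        "Account.UserName": username,
        "Account.PassWord": pwd,
        # Hidden inputs from the form, with the values the page showed.
        "ReturnUrl": "",
        "AllowGuest": "False",
        "isJavaScriptDisabled": "True",
    }
    return urllib.parse.urlencode(values).encode("utf-8")


def login(username, pwd, url=SIGNIN_URL):
    """GET the sign-in page, then POST the form with the same cookies."""
    opener, _jar = make_cookie_opener()
    # First request: lets the server set whatever cookies it wants.
    opener.open(url).close()
    # Second request: same opener, so the stored cookies are sent back.
    with opener.open(url, encode_form(username, pwd)) as resp:
        return resp.read().decode("utf-8")
```

Calling `login(username, pwd)` would then return the HTML of whatever page the server responds with after the POST, which can be checked for the "Please Enable Cookies" warning.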