所以我正在抓取一个我可以通过 HTTPS 访问的网站,我可以登录并启动该过程,但每次我点击一个新页面 (URL) 时,cookie 会话 ID 都会发生变化。如何保留登录的 Cookie 会话 ID?
#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Debug qw(+);
use HTTP::Request;
use LWP::UserAgent;
use HTTP::Request::Common;
my $un = 'username';
my $pw = 'password';
my $url = 'https://subdomain.url.com/index.do';
my $agent = WWW::Mechanize->new(cookie_jar => {}, autocheck => 0);
$agent->{onerror}=\&WWW::Mechanize::_warn;
$agent->agent('Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100407 Ubuntu/9.10 (karmic) Firefox/3.6.3');
$agent->get($url);
$agent->form_name('form');
$agent->field(username => $un);
$agent->field(password => $pw);
$agent->click("Log In");
print "After Login Cookie: ";
print $agent->cookie_jar->as_string();
print "\n\n";
my $searchURL='https://subdomain.url.com/search.do';
$agent->get($searchURL);
print "After Search Cookie: ";
print $agent->cookie_jar->as_string();
print "\n";
输出:
After Login Cookie: Set-Cookie3: JSESSIONID=367C6D; path="/thepath"; domain=subdomina.url.com; path_spec; secure; discard; version=0
After Search Cookie: Set-Cookie3: JSESSIONID=855402; path="/thepath"; domain=subdomain.com.com; path_spec; secure; discard; version=0
另外我认为该站点需要 CERT(在浏览器中确实如此),这是添加它的正确方法吗?
$ENV{HTTPS_CERT_FILE} = 'SUBDOMAIN.URL.COM'; ## Insert this after the use HTTP::Request...
同样对于 CERT 在使用此列表中的第一个选项时,这是否正确?
X.509 Certificate (PEM)
X.509 Certificate with chain (PEM)
X.509 Certificate (DER)
X.509 Certificate (PKCS#7)
X.509 Certificate with chain (PKCS#7)