
For a Perl project, I've written several "parsers" that fetch websites with LWP::UserAgent. One website is giving me trouble: it behaves exactly as if I had visited it in my browser with cookies turned off, so instead of the page I want, it returns a page telling me that I must turn on cookies. The entire code of my script is below. Any ideas? Thanks in advance.

(Note that I looked at the following question, which seems to address my problem, but I was unable to get a working script based on its suggestion: Cookies in perl lwp.)

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
my $useragent = LWP::UserAgent->new;
$useragent->cookie_jar(HTTP::Cookies->new);
my $request = HTTP::Request->new(GET => "http://www.the-site-im-trying-to-parse.com");
my $response = $useragent->request($request);
print "Content-type: text/html\n\n";
print $response->as_string;

3 Answers

1

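All you are doing is downloading the HTML data over HTTP, so there is no browser interaction until you decide to view the results. That said, the HTTP server has no way of knowing whether your request comes from a cookie-enabled client, so enabling cookies would not actually change the result.

The WWW::Mechanize module is useful for traversing websites easily, but it does not solve the problem you are facing.

More realistically, once you download the file and display it in a browser, some client-side JavaScript code fails to work properly. That could be any number of things, such as violating a cross-domain policy implemented in the JavaScript code. Without the URL you are visiting, it is impossible to say.

answered 2012-02-04T21:17:11.437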
1

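Have you considered using the WWW::Mechanize module? By default, it collects cookies automatically. It is also easier to use, because it includes many very useful methods.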
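As a rough sketch of what that looks like (assuming WWW::Mechanize is installed; the URL is taken from the command line rather than hard-coded, and `autocheck => 0` is just a choice made here so that a failed request does not die):

```perl
use strict;
use warnings;
use WWW::Mechanize;

# WWW::Mechanize is a subclass of LWP::UserAgent that sets up an
# in-memory cookie jar by default, so no cookie_jar() call is needed.
my $mech = WWW::Mechanize->new( autocheck => 0 );
print "Cookie jar class: ", ref( $mech->cookie_jar ), "\n";

# Only touch the network when a URL is passed as the first argument.
if ( my $url = shift @ARGV ) {
    my $response = $mech->get($url);
    print $response->status_line, "\n";

    # Dump whatever cookies the site set during the request.
    print $mech->cookie_jar->as_string;
}
```

Run with the target URL as the only argument, this prints the status line and any cookies collected, which helps show whether the site sets a cookie at all.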

answered 2012-02-04T18:07:41.293
-1

Try setting cookie_jar to temporary storage (give it an empty hashref):

$useragent->cookie_jar( {} );
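For reference, passing an empty hashref makes LWP::UserAgent build an in-memory HTTP::Cookies object, the same kind of jar that `HTTP::Cookies->new` in the question produces. The sketch below checks the jar offline with a hand-built response; the host `www.example.com` and the `session=abc123` cookie are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->cookie_jar( {} );    # empty hashref => in-memory HTTP::Cookies jar

# Simulate a server reply that sets a cookie (hypothetical host and value).
my $first    = HTTP::Request->new( GET => 'http://www.example.com/' );
my $response = HTTP::Response->new( 200, 'OK' );
$response->header( 'Set-Cookie' => 'session=abc123; Path=/' );
$response->request($first);    # the jar needs the originating request

# Store the cookie, then let the jar decorate a follow-up request.
$ua->cookie_jar->extract_cookies($response);
my $second = HTTP::Request->new( GET => 'http://www.example.com/page' );
$ua->cookie_jar->add_cookie_header($second);

print "Jar class: ", ref( $ua->cookie_jar ), "\n";
print "Cookie header: ", ( $second->header('Cookie') // '(none)' ), "\n";
```

During a real `$ua->request(...)` call, the user agent performs the `extract_cookies` and `add_cookie_header` steps itself; they are called by hand here only to avoid a network request.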
answered 2012-02-04T19:00:42.663