
I once wrote a simple "crawler" in Java to download HTTP pages for me. Now I'm trying to rewrite the same thing in Perl using the LWP module.

Here is my Java code (it works fine):

String referer = "http://example.com";
String url = "http://example.com/something/cgi-bin/something.cgi";
String params = "a=0&b=1";

HttpState initialState = new HttpState();
HttpClient httpclient = new HttpClient();
httpclient.setState(initialState);
httpclient.getParams().setCookiePolicy(CookiePolicy.NETSCAPE);

PostMethod postMethod = new PostMethod(url);
postMethod.addRequestHeader("Referer", referer);
postMethod.addRequestHeader("User-Agent", " Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13");
postMethod.addRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
postMethod.addRequestHeader("Content-Type", "application/x-www-form-urlencoded");

String length = String.valueOf(params.length());
postMethod.addRequestHeader("Content-Length", length);
postMethod.setRequestBody(params);

httpclient.executeMethod(postMethod);

And here is the Perl version:

use LWP::UserAgent;

my $referer = "http://example.com/something/cgi-bin/something.cgi?module=A";
my $url = "http://example.com/something/cgi-bin/something.cgi";
my @headers = (
  'User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Referer' => $referer,
  'Content-Type' => 'application/x-www-form-urlencoded',
);

my @params = (
    'a' => '0',
    'b' => '1',
);

my $browser = LWP::UserAgent->new( );
$browser->cookie_jar({});

my $response = $browser->post($url, @params, @headers);
print $response->content;

The POST request executes correctly, but I get a different page back (the main page). It looks as if the cookies aren't working properly...

Any guesses what's wrong? Why do I get different results from the Java and Perl programs?


3 Answers


You can also use WWW::Mechanize, which is a wrapper around LWP::UserAgent. It gives you the cookie jar automatically.

answered 2011-02-11 15:03:45

You want to create hashes, not arrays. For example, instead of:

my @params = ( 'a' => '0', 'b' => '1', );

You should use:

my %params = ( a => 0, b => 1, );

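When passing the params to the LWP::UserAgent post method, you need to pass a reference to the hash; otherwise post() treats the flattened key/value pairs as header fields rather than form data. For example: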

$response = $browser->post($url, \%params, %headers);
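To see why the reference matters, here is a minimal offline sketch. LWP::UserAgent's post method passes its arguments on to the POST function of HTTP::Request::Common, so calling POST directly shows what would be sent without contacting any server. The URL is the placeholder from the question; the single Content-Type header stands in for the full header list.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $url     = 'http://example.com/something/cgi-bin/something.cgi';
my @params  = (a => 0, b => 1);
my @headers = ('Content-Type' => 'application/x-www-form-urlencoded');

# Flat list: POST() only treats its second argument as the form when it is
# a reference, so 'a' => 0 and 'b' => 1 are added as request headers here
# and the request goes out with an empty body (Content-Length: 0).
my $flat = POST($url, @params, @headers);

# Hash reference: the pairs are url-encoded into the request body.
my %params = @params;
my $ref = POST($url, \%params, @headers);

print "--- flat list ---\n", $flat->as_string;
print "--- hash ref ---\n",  $ref->as_string;
```

With the flat list, a and b show up as header lines and no form data is posted, which would plausibly make the CGI script fall back to its default (main) page.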

You could also look at the request you're sending to the server with:

print $response->request->as_string;

You can also use a handler to automatically dump requests and responses for debugging purposes:

$ua->add_handler("request_send", sub { shift->dump; return });
$ua->add_handler("response_done", sub { shift->dump; return });
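The request_send phase used by these handlers can also answer a request itself: per the LWP::UserAgent documentation, if a request_send handler returns an HTTP::Response, processing stops there and nothing is sent to the server. A minimal sketch, assuming you want to check the corrected post call offline; the stubbed response and the $sent variable are hypothetical test scaffolding that the real script does not need.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $url = 'http://example.com/something/cgi-bin/something.cgi';

my $browser = LWP::UserAgent->new;
$browser->cookie_jar({});

# Hypothetical stub: remember the outgoing request and return a canned
# response, so LWP never opens a network connection.
my $sent;
$browser->add_handler(request_send => sub {
    my ($request) = @_;
    $sent = $request;
    print $request->as_string;    # same idea as the dump handler above
    return HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/plain'], 'stubbed');
});

# Form data passed as a reference, headers as key/value pairs after it.
my $response = $browser->post($url, [ a => 0, b => 1 ],
    'Content-Type' => 'application/x-www-form-urlencoded');
print $response->content, "\n";
```

Removing the stub handler turns this back into a real request against the server.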

answered 2011-02-11 15:02:22

I believe it has to do with $response = $browser->post($url, @params, @headers);

From the LWP::UserAgent documentation:

$ua->post( $url, \%form )
$ua->post( $url, \@form )
$ua->post( $url, \%form, $field_name => $value, ... )
$ua->post( $url, $field_name => $value,... Content => \%form )
$ua->post( $url, $field_name => $value,... Content => \@form )
$ua->post( $url, $field_name => $value,... Content => $content )
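Because post forwards these arguments to HTTP::Request::Common's POST function, the header-first forms can be tried offline too. A sketch of the Content => form with an array reference, using the question's URL and the Java program's referer as placeholder values:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $url     = 'http://example.com/something/cgi-bin/something.cgi';
my $referer = 'http://example.com';

# Header pairs first, then the form under the special 'Content' key.
# An array reference keeps the fields in the order given.
my $req = POST(
    $url,
    'Referer' => $referer,
    Content   => [ a => 0, b => 1 ],
);

print $req->as_string;
```

An array reference reproduces the Java body "a=0&b=1" exactly, while a hash reference has no fixed order and may produce "b=1&a=0"; both are valid form encodings.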

Since your params and headers are key/value pairs, I would store them in hashes and pass the params as a reference:

use LWP::UserAgent;

my $referer = "http://example.com/something/cgi-bin/something.cgi?module=A";
my $url = "http://example.com/something/cgi-bin/something.cgi";
my %headers = (
  'User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; pl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Referer' => $referer,
  'Content-Type' => 'application/x-www-form-urlencoded',
);

my %params = (
    'a' => '0',
    'b' => '1',
);

my $browser = LWP::UserAgent->new( );
$browser->cookie_jar({});

my $response = $browser->post($url, \%params, %headers);
print $response->content;
answered 2011-02-11 15:01:22