I noticed a delay of about 120 seconds between the response_data and response_done events when fetching a given https site with WWW::Mechanize. I checked with an ordinary web browser and did not experience this slowness, so I suspect I must be doing something wrong.

Here is what I did to trace the events (for some reason use LWP::Debug qw(+) did nothing):

use WWW::Mechanize;
use Time::HiRes qw(gettimeofday);
use IO::Handle;

my $mech = WWW::Mechanize->new(
  timeout     => 3,
  autocheck   => 1,       # check success of each query
  stack_depth => 0,       # no keeping history
  keep_alive  => 50,      # connection pool
);

$mech->agent_alias( 'Windows IE 6' );
open my $debugfile, '>traffic.txt';
$debugfile->autoflush(1);

$mech->add_handler( request_send => sub {
    my $cur_time = gettimeofday();
    my $req = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP REQUEST ===\n";
    print $debugfile $req->dump();
    print $debugfile "\n$cur_time ===   END HTTP REQUEST ===\n";
    return
  }
);
$mech->add_handler( response_header => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === GOT RESPONSE HDRS ===\n";
    print $debugfile $res->dump();
    return
  }
);
$mech->add_handler( response_data => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    my $content_length = length($res->content);
    print $debugfile "$cur_time === Got response data chunk resp size = $content_length ===\n";
    return
  }
);
$mech->add_handler( response_done => sub {
    my $cur_time = gettimeofday();
    my $res = shift;
    print $debugfile "\n$cur_time === BEGIN HTTP RESPONSE ===\n";
    print $debugfile $res->dump();
    print $debugfile "\n===   END HTTP RESPONSE ===\n";
    return
  }
);

Here is an excerpt of the trace (URLs and cookies obfuscated):

1347463214.24724 === BEGIN HTTP REQUEST ===
GET https://...
Accept-Encoding: gzip
Referer: https://...
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Cookie: ...
Cookie2: $Version="1"

(no content)

1347463214.24724 ===   END HTTP REQUEST ===

1347463216.13134 === GOT RESPONSE HDRS ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

(no content)
1347463216.48305 === Got response data chunk resp size = 4096 ===

1347463337.98131 === BEGIN HTTP RESPONSE ===
HTTP/1.1 200 OK
Date: Wed, 12 Sep 2012 15:20:08 GMT
Accept-Ranges: bytes
...
Server: Lotus-Domino
Content-Length: 377806
Content-Type: application/octet-stream
Last-Modified: Fri, 07 Sep 2012 06:25:33 GMT
Client-Date: Wed, 12 Sep 2012 15:22:17 GMT
Client-Peer: ...
Client-Response-Num: 1
Client-SSL-Cert-Issuer: ...
Client-SSL-Cert-Subject: ...
Client-SSL-Cipher: DES-CBC3-SHA
Client-SSL-Socket-Class: IO::Socket::SSL

PK\3\4\24\0\6\0\10\0\0\0!\0\x88\xBC\21Xi\2\0\0\x84\22\0\0\23\0\10\2[Content_Types].xml \xA2...
(+ 377294 more bytes not shown)

===   END HTTP RESPONSE ===

You can see the 121.5-second gap between the "Got response data chunk" and "BEGIN HTTP RESPONSE" messages. I have the feeling that LWP::UserAgent sometimes hangs for two minutes after having received the whole data.

Do you have any clue where this could come from?

Edit: here is a screenshot from Wireshark: I receive the FIN/ACK message after 120 seconds…

Wireshark excerpt

Thanks

4 Answers

I think your transaction may actually be taking that long. The documentation for LWP::UserAgent says this:

[The response_data handler] needs to return a TRUE value to be called again for subsequent chunks of the same request

So, because your handler returns nothing, you are tracing only the first data packet returned.

According to your output, the first 4KB of data arrived in 2.2 seconds, or about 2KB per second. The whole data is 369KB long, so you would expect to receive a further 92 packets, and at 2KB per second the transfer would take three minutes. You are getting the response within two minutes, so I think your timings are reasonable.
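Under that reading of the documentation, a minimal sketch of a response_data handler that keeps firing for every chunk might look like the following (the make_chunk_tracer name is mine, not from the post; the handler arguments follow the LWP::UserAgent handler documentation):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Sketch: build a response_data handler that returns a TRUE value so
# LWP::UserAgent calls it again for every subsequent chunk of the
# same request, instead of only the first one.
sub make_chunk_tracer {
    my ($fh) = @_;
    return sub {
        my ( $res, $ua, $handler, $data ) = @_;
        printf {$fh} "%s === chunk of %d bytes, %d bytes so far ===\n",
            scalar gettimeofday(), length($data), length( $res->content );
        return 1;    # TRUE: keep calling this handler for later chunks
    };
}

# Usage (hypothetical):
#   $mech->add_handler( response_data => make_chunk_tracer($debugfile) );
```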

answered 2012-09-12T16:35:53.950
Thanks to Borodin's answer, I found a way to fix it:

I modified the response_data event handler sub this way:

if($res->header('Content-Length') == length($res->content)) {
    die "OK"; # Got whole data, not waiting for server to end the communication channel.
}
return 1; # In other cases make sure the handler is called for subsequent chunks

Then, if the X-Died header equals OK, the error is ignored in the caller.
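A hedged sketch of that caller-side check (the x_died_is_error helper is my name, not from the post): when a handler die()s, LWP records the message in the X-Died response header, and die "OK" actually yields a string like "OK at script.pl line 42.", so a prefix match is used rather than an exact comparison:

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether an X-Died header value represents
# a real error. die "OK" inside a handler arrives here as something
# like "OK at script.pl line 42.\n", hence the /^OK\b/ prefix match.
sub x_died_is_error {
    my ($died) = @_;
    return 0 unless defined $died;    # no handler died: nothing to ignore
    return 0 if $died =~ /^OK\b/;     # our deliberate early abort
    return 1;                         # anything else is a genuine error
}

# Usage (sketch):
#   $mech->get($url);
#   my $died = $mech->response->header('X-Died');
#   die "transfer failed: $died" if x_died_is_error($died);
```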

answered 2012-09-12T16:59:30.903
I know this is old, but I recently ran into the same problem. It happened only when the unencrypted HTTPS response (including headers) was exactly 1024 bytes in size. Benoit's responses seem to have been exactly 4096 bytes, so multiples of 1024 may be significant. I had no control over the server, so I could not produce test responses of arbitrary lengths, nor could I reproduce the problem on any other server. It was repeatable at 1024 bytes, though.

Looking around the LWP code (v6.05), I found that sysread is asked to read 1024 bytes at a time. So the first call returns all 1024 bytes. It is then immediately called a second time and, instead of returning 0 to indicate that there is no more data, it returns undef (indicating an error) and sets errno to EAGAIN, indicating that more data is expected but not yet available. This causes a select on the socket, which hangs because no more data will ever arrive. It takes the 120-second timeout to expire, after which the data we already have is returned, which happens to be the correct result. So we get no error, just a very long delay.
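The undef-versus-0 distinction described above can be demonstrated with a plain non-blocking socket, independent of LWP (a minimal sketch, assuming a Unix-ish platform where socketpair is available):

```perl
use strict;
use warnings;
use Socket;
use Errno qw(EAGAIN EWOULDBLOCK);
use IO::Handle;

# Sketch: on a non-blocking socket, sysread returning undef with $! set
# to EAGAIN means "no data *yet*"; returning 0 means true end-of-stream.
# Conflating the two makes a caller select() for data that never comes.
socketpair( my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
$reader->blocking(0);

# Nothing written yet: undef + EAGAIN ("try again later").
my $n = sysread( $reader, my $buf, 1024 );
my $would_block = !defined($n) && ( $! == EAGAIN || $! == EWOULDBLOCK );

# Peer closes the connection: sysread now returns 0, the real EOF marker.
close $writer;
my $eof = sysread( $reader, $buf, 1024 );

printf "would_block=%d eof=%d\n", $would_block ? 1 : 0, $eof;
```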

I didn't have convenient enough access to use Benoit's solution. Instead, my workaround was to extend the HTTPS handling code to check for the situation above and return 0 instead of undef:

package LWP::Protocol::https::Socket;

sub sysread {
    my $self = shift;
    my $result = $self->SUPER::sysread(@_);
    # If we get undef back then some error occurred. If it's EAGAIN
    # then that ought to mean that there is more data to read but
    # it's not available yet. We suspect the error may be false.
    # $_[2] is the offset, so if it's defined and non-zero we have
    # some data in the buffer.
    # $_[0] is the buffer, so check it for an entire HTTP response,
    # including the headers and the body. If the length specified
    # by Content-Length is exactly the length of the body we have in
    # the buffer, then take that as being complete and return a length
    # here instead. Since it's unlikely that anything was read, the
    # buffer will not have increased in size and the result will be zero
    # (which was the expected result anyway).
    if (!defined($result) &&
        $!{EAGAIN} &&
        $_[2] &&
        $_[0] =~ /^HTTP\/\d+\.\d+\s+\d+\s+.*\s+content-length\s*:\s*(\d+).*?\r?\n\r?\n(.*)$/si &&
        length($2) == $1) {
            return length($_[0]) - $_[2]; # bufferlen - offset
    }
    return $result;
}
answered 2016-11-29T17:28:50.637
Alan, I got the same behaviour on my system, for content lengths of 1024, 2048, 3072 bytes, and so on.

The solution to this problem was to upgrade Net::HTTP to version 6.09 or above.

answered 2017-06-29T15:54:22.997