I'm using Perl's threads module in a simple crawler I'm working on, so that I can download pages in parallel. Occasionally I get error messages like these:
Thread 7 terminated abnormally: read timeout at /usr/lib64/perl5/threads.pm line 101.
Thread 15 terminated abnormally: Can't connect to burgundywinecompany.com:80 (connect: timeout) at /usr/lib64/perl5/threads.pm line 101.
Thread 19 terminated abnormally: write failed: Connection reset by peer at /usr/lib64/perl5/threads.pm line 101.
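When I run the script serially, without threads, I don't get these errors. They look as if they come from the LWP::UserAgent module, but they don't seem like the kind of error that should make a thread exit abnormally. Are there extra precautions I need to take when using Perl's threads? Thanks!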
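Update:
I've tracked down the source of these abnormal terminations: they seem to happen whenever I make a request with LWP::UserAgent. If I remove the method call that downloads the web page, the errors stop.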
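Sample script
The script below triggers one of the errors I'm talking about. The last URL times out. That timeout should simply be reported as part of the HTTP::Response object, but instead it causes the thread to terminate abnormally: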
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use LWP::UserAgent;

my $THREADS = 10;                   # Number of threads
my $workq = Thread::Queue->new();   # Work to do
my @stufftodo = qw(http://www.collectorsarmoury.com/ http://burgundywinecompany.com/ http://beetreeminiatures.com/);

$workq->enqueue(@stufftodo);                        # Queue up some work to do
$workq->enqueue("EXIT") for (1..$THREADS);          # And tell each worker when to stop
threads->create("Handle_Work") for (1..$THREADS);   # Spawn our workers
$_->join for threads->list;                         # Wait for all workers to finish

sub Handle_Work {
    while (my $todo = $workq->dequeue()) {
        last if $todo eq 'EXIT';    # All done
        print "$todo\n";
        my $ua = LWP::UserAgent->new;
        my $RESP = $ua->get($todo);
    }
    threads->exit(0);
}
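
For comparison, here is a minimal sketch of a worker that traps exceptions with eval, so that a die raised during the download step is recorded as a failed item instead of ending the thread. It does not diagnose why the die happens in the first place. To stay runnable without network access, fetch_page is a hypothetical stand-in for the $ua->get call, and $resultq and the example.* URLs are likewise assumptions made for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $THREADS = 3;
my $workq   = Thread::Queue->new();   # Work to do
my $resultq = Thread::Queue->new();   # One status line per item (assumed helper queue)

# Hypothetical stand-in for the download step. It dies for one input
# to simulate an exception raised somewhere inside the fetch.
sub fetch_page {
    my ($url) = @_;
    die "read timeout\n" if $url =~ /slow/;
    return "ok";
}

sub handle_work {
    while (defined(my $todo = $workq->dequeue())) {
        last if $todo eq 'EXIT';
        # eval traps the die, so the loop moves on to the next item
        my $status = eval { fetch_page($todo) };
        if (!defined $status) {
            my $err = $@ || "unknown error";
            chomp $err;
            $status = "failed: $err";
        }
        $resultq->enqueue("$todo $status");
    }
    return;   # Returning normally lets join() succeed
}

my @urls = qw(http://fast.example/ http://slow.example/ http://other.example/);
$workq->enqueue(@urls);
$workq->enqueue('EXIT') for 1 .. $THREADS;
threads->create(\&handle_work) for 1 .. $THREADS;
$_->join for threads->list;

# Drain the result queue without blocking once all workers have finished
my @results;
while (defined(my $line = $resultq->dequeue_nb())) {
    push @results, $line;
}
print "$_\n" for sort @results;
```

In the real crawler the same eval block would wrap the $ua->get($todo) call, and the worker could then inspect $@ or the returned HTTP::Response to decide whether to retry or skip the URL.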