The Parallel::ForkManager module should work for you, but because it uses fork rather than threads, the variables in the parent and in each child process are separate, so they have to communicate in a different way. This program uses curl's -o option to save each page in a file. For example, the page http://mysite.com/page is saved in the file http/mysite.com/page, from where the parent process can retrieve it.
    use strict;
    use warnings;

    use Parallel::ForkManager;
    use URI;
    use File::Spec;
    use File::Path 'make_path';

    my $pm = Parallel::ForkManager->new(10);    # at most 10 children at a time

    foreach my $site (qw( http://mysite.com/page http://myothersite.com/page )) {
        my $pid = $pm->start;
        next if $pid;       # parent: move on to the next URL
        fetch($site);       # child: do the work
        $pm->finish;
    }

    $pm->wait_all_children;

    sub fetch {
        my ($url) = @_;
        my $uri      = URI->new($url);
        my $filename = File::Spec->catfile($uri->scheme, $uri->host, $uri->path);
        my ($vol, $dir, $file) = File::Spec->splitpath($filename);
        make_path $dir;
        # -m 2 gives curl a two-second timeout; -o writes the page to $filename
        print `curl -s $url -m 2 -o $filename`;
    }
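To make the URL-to-file mapping concrete, here is a small sketch of how the parent could rebuild the path a child wrote to, mirroring the scheme/host/path logic in fetch() above. The local_path helper name is my own, and it strips the URL path's leading slash so catfile produces a clean relative path:

```perl
use strict;
use warnings;
use URI;
use File::Spec;

# Hypothetical helper mirroring fetch()'s path construction:
# scheme/host/path, with the path's leading slash removed.
sub local_path {
    my ($url) = @_;
    my $uri  = URI->new($url);
    my $path = $uri->path;
    $path =~ s{^/}{};       # avoid a doubled separator in catfile
    return File::Spec->catfile($uri->scheme, $uri->host, $path);
}

print local_path('http://mysite.com/page'), "\n";
```

On Unix-like systems this prints http/mysite.com/page, the same location the child's -o flag wrote to.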
Update

Here is a version that uses threads with threads::shared to return each page into a hash shared among all the threads. The hash has to be marked as shared, and locked before it is modified, to guard against concurrent access.
    use strict;
    use warnings;

    use threads;
    use threads::shared;

    my %pages;
    my @threads;
    share %pages;           # make %pages visible to every thread

    foreach my $site (qw( http://mysite.com/page http://myothersite.com/page )) {
        my $thread = threads->new('fetch', $site);
        push @threads, $thread;
    }

    $_->join for @threads;

    for (scalar keys %pages) {
        printf "%d %s fetched\n", $_, $_ == 1 ? 'page' : 'pages';
    }

    sub fetch {
        my ($url) = @_;
        my $page = `curl -s $url -m 2`;
        lock %pages;        # released at the end of the enclosing block
        $pages{$url} = $page;
    }
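Once every worker has been joined, the main thread can read the shared hash directly; no further locking is needed because no other thread is still running. A minimal self-contained sketch of that pattern, with a fake page body standing in for real curl output:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %pages;
share %pages;               # shared between the main thread and workers

# A stand-in worker: stores a fake page body under its URL, locking
# the hash before writing, just as fetch() does above.
threads->new(sub {
    lock %pages;
    $pages{'http://mysite.com/page'} = '<html>...</html>';
})->join;

# After join, read the results without locking.
for my $url (sort keys %pages) {
    printf "%s: %d bytes\n", $url, length $pages{$url};
}
```

The lock is scoped to the anonymous sub's block, so it is released automatically when the worker returns.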