I have a working LWP::UserAgent script that should be applied to the following URL:
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5503
It has to run against many similar targets that differ only in their endings:
html?show_school=5503
html?show_school=9002
html?show_school=5512
I want to do this with LWP::UserAgent:
for my $i (0..10000) {
$ua->get(' [here the URL should be applied] ', id => 21, extern_uid => $i);
# process reply
}
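In any case, a loop like this is one way to get this kind of job done. I guess the LWP API is not meant to replace core Perl functionality, so I can use a plain Perl loop to query multiple URLs.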
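One detail about the loop above: in LWP::UserAgent, any extra arguments after the URL in get() are sent as HTTP header fields, not appended as query parameters, so the ID has to be part of the URL string itself. A minimal sketch of building the URLs, assuming the numeric show_school value is the only part that changes (the helper name school_url is hypothetical, the three IDs are the sample endings listed above, and no request is actually sent):

```perl
use strict;
use warnings;

# Base URL from the question; only the show_school value varies.
my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# Hypothetical helper: build the detail URL for one school ID.
sub school_url {
    my ($id) = @_;
    return sprintf '%s?show_school=%d', $base, $id;
}

# Sample IDs taken from the URL endings listed above.
for my $id (5503, 9002, 5512) {
    my $url = school_url($id);
    print "$url\n";
    # With a real user agent this would be: my $response = $ua->get($url);
}
```

Replacing the sample list with a range such as 0..10000 would give the full loop, though most of those IDs may not correspond to an existing school.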
Here is the code, which does not run yet because the loop still has to be worked in:
#use strict;
use DBI;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTML::TreeBuilder::XPath;
# first get a list of all schools
my $url = '[here the url should be applied] =';
for my $i (0..10000) {
$ua->get(' [here the url should be applied ] ', id => 21, extern_uid => $i);
# process reply
}
#my $request = POST $url,
# [
# Schulsuche=> "Ergebnisse anzeigen",
# order => "schule_ort",
# schulname => undef,
# schulort => undef,
# typid => "11",
# verbinder => "AND"
# ];
my $ua = LWP::UserAgent->new;
print "getting all schools - this could take some time\n";
my $response = $ua->request($request);
# extract the ids
my @ids = $response->content =~ /getSchoolDetail\((\d+)/gs;
print "found " . scalar @ids . " schools\n";
# for this demo we only do the first 5
my @ids_to_do = @ids[0..4];
# use your own user and password
my $dbh = DBI->connect("DBI:mysql:database=schulen", "user", "pass", { AutoCommit => 0 }) or die $DBI::errstr;
my $sth = $dbh->prepare(<<sqlend);
insert into schulen ( name , plz , ort, strasse , tel, fax , mail, quelle , original_id )
values ( ?, ?, ?, ?, ?, ?, ?, ?, ? )
sqlend
# now loop over ids
for my $id (@ids_to_do) {
# get detail information for id
my $res = $ua->get("[url]=> &gid=$id");
# parse the response
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($res->content);
my $xpath = q|//div[@id='MCinhview']//div[@class='contentitem']//table|;
my ($adress_table, $tel_table) = $tree->findnodes($xpath);
my ($adr) = $adress_table->find("td");
my ($name, $city, $street) = map { s/^\s*//; s/\s*$//; $_ } ($adr->content_list)[2,4,6];
my($plz, $ort) = $city =~ /^(\d+)\s*(.*)/;
my ($tel, $fax, $mail) = map { s/^\s*//; s/\s*$//; $_ } map { ($_->content_list)[1] } $tel_table->find("td");
$sth->execute($name, $plz, $ort, $street, $tel, $fax, $mail, "SA", $id);
$dbh->commit;
$tree->delete;
print "$name done\n";
}
Update, Sunday, October 25: I have applied OmnipotentEntity's suggestion.
#!/usr/bin/perl -W
use strict;
use warnings; # give out some warnings if something does not run well
use diagnostics; # tell me when something is wrong
use DBI;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTML::TreeBuilder::XPath;
# first get a list of all schools
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
#pretending to be firefox on linux.
for my $i (0..10000) {
my $request = HTTP::Request->new(GET => sprintf(" here to put the URL into =%d", $i));
$request->header('Accept' => 'text/html');
my $response = $ua->request($request);
if ($response->is_success) {
$pagecontent = $response -> content;
}
# now we can do whatever with the $pagecontent
}
my $request = POST $url,
[
order => "schule_ort",
schulname => undef,
Basisdaten => undef,
Profil => undef,
Schulort => undef,
typid => "11",
Fax => undef,
Homepage => undef,
verbinder => "AND"
];
print "getting all schools - this could take some time\n";
my $response = $ua->request($request);
# extract the ids
my @ids = $response->content =~ /getSchoolDetail\((\d+)/gs;
print "found " . scalar @ids . " schools\n";
# for this demo we only do the first 5
my @ids_to_do = @ids[0..4];
# use your own user and password
my $dbh = DBI->connect("DBI:mysql:database=schulen", "user", "pass", { AutoCommit => 0 }) or die $DBI::errstr;
my $sth = $dbh->prepare(<<sqlend);
insert into schulen ( name , plz , ort, strasse , tel, fax , mail, quelle , original_id )
values ( ?, ?, ?, ?, ?, ?, ?, ?, ? )
sqlend
# now loop over ids
for my $id (@ids_to_do) {
# get detail information for id
my $res = $ua->get(" here to put the URL into => &gid=$id");
# parse the response
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($res->content);
my $xpath = q|//div[@id='MCinhview']//div[@class='floatbox']//table|;
my ($adress_table, $tel_table) = $tree->findnodes($xpath);
my ($adr) = $adress_table->find("td");
my ($name, $city, $street) = map { s/^\s*//; s/\s*$//; $_ } ($adr->content_list)[2,4,6];
my($plz, $ort) = $city =~ /^(\d+)\s*(.*)/;
my ($tel, $fax, $mail) = map { s/^\s*//; s/\s*$//; $_ } map { ($_->content_list)[1] } $tel_table->find("td");
$sth->execute($name, $plz, $ort, $street, $tel, $fax, $mail, "SA", $id);
$dbh->commit;
$tree->delete;
print "$name done\n";
}
I want to loop over the results, so I tried to apply the corresponding URL, but I get a bunch of errors:
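suse-linux:/usr/perl # perl perl_mecha_example_two.pl
Global symbol "$pagecontent" requires explicit package name at perl_mecha_example_two.pl line 24.
Global symbol "$url" requires explicit package name at perl_mecha_example_two.pl line 29.
Execution of perl_mecha_example_two.pl aborted due to compilation errors (#1)
(F) You've said "use strict" or "use strict vars", which indicates
that all variables must either be lexically scoped (using "my" or "state"),
declared beforehand using "our", or explicitly qualified to say
which package the global variable is in (using "::").
Uncaught exception from user code:
Global symbol "$pagecontent" requires explicit package name at perl_mecha_example_two.pl line 24.
Global symbol "$url" requires explicit package name at perl_mecha_example_two.pl line 29.
Execution of perl_mecha_example_two.pl aborted due to compilation errors.
at perl_mecha_example_two.pl line 86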
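Now for the debugging part. What do I have to change? How do I apply the URL the right way?
Under use strict, a variable cannot be used before it has been declared. The usual fix is to add my at its first occurrence, e.g. my $url and my $pagecontent.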
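One thing to watch when adding my: a variable declared inside the if block goes out of scope at the closing brace, so if the page content is needed after the block, it should be declared before it. A small sketch of that pattern, assuming a stand-in for the real request (fake_fetch is a hypothetical stub that replaces $ua->request so the example runs without network access):

```perl
use strict;
use warnings;

# Hypothetical stand-in for $ua->request(...): returns (success flag, body).
sub fake_fetch {
    my ($i) = @_;
    return ($i % 2 == 0, "page $i");    # pretend only even IDs succeed
}

my @pages;
for my $i (0 .. 3) {
    my $pagecontent;                    # declared before the if, so visible after it
    my ($ok, $body) = fake_fetch($i);
    if ($ok) {
        $pagecontent = $body;
    }
    next unless defined $pagecontent;   # skip failed requests
    push @pages, $pagecontent;
}
print "$_\n" for @pages;
```

In the real script the same shape would apply, with $response->is_success as the condition and $response->content as the value assigned to $pagecontent. $url would likewise need a my declaration and a value before the POST request is built.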