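Thanks to the help on this site I've made some good progress with Perl, but I've hit a problem. One of the pages I'm scraping has changed and I no longer know how to access it. What I want to do is store a link to each page I want to visit. The problem is that these links sit inside the href attribute of a tags in the page source, and I don't know how to extract them. Can anyone help?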

The links I need are on lines 316 to 354 of this page's source: http://www.soccerbase.com/teams/home.sd

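I basically need to extract the links into variables so I can use them in my other scripts. As mentioned, I'm using WWW::Mechanize and HTML::TokeParser, and I'm hoping one of them has a method I can use but haven't found yet. Thanks in advance!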

1 Answer

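See the find_all_links method in WWW::Mechanize. There is no need to drive a parser by hand. You may want to relax the regex so that you get all of the roughly 1000 possible teams in one go.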

use WWW::Mechanize qw();
my $w = WWW::Mechanize->new;
$w->get('http://www.soccerbase.com/teams/home.sd');
for my $link ($w->find_all_links(url_regex => qr/comp_id=1\b/)) {
    # each $link is a WWW::Mechanize::Link: the competition page plus 20 teams
    printf "URL=%s\tTeam=%s\n", $link->url_abs, $link->text;
}

URL=http://www.soccerbase.com/tournaments/tournament.sd?comp_id=1       Team=Premier League
URL=http://www.soccerbase.com/teams/team.sd?team_id=142&comp_id=1       Team=Arsenal
URL=http://www.soccerbase.com/teams/team.sd?team_id=154&comp_id=1       Team=Aston Villa
URL=http://www.soccerbase.com/teams/team.sd?team_id=308&comp_id=1       Team=Blackburn
URL=http://www.soccerbase.com/teams/team.sd?team_id=354&comp_id=1       Team=Bolton
URL=http://www.soccerbase.com/teams/team.sd?team_id=536&comp_id=1       Team=Chelsea
URL=http://www.soccerbase.com/teams/team.sd?team_id=942&comp_id=1       Team=Everton
URL=http://www.soccerbase.com/teams/team.sd?team_id=1055&comp_id=1      Team=Fulham
URL=http://www.soccerbase.com/teams/team.sd?team_id=1563&comp_id=1      Team=Liverpool
URL=http://www.soccerbase.com/teams/team.sd?team_id=1718&comp_id=1      Team=Man City
URL=http://www.soccerbase.com/teams/team.sd?team_id=1724&comp_id=1      Team=Man Utd
URL=http://www.soccerbase.com/teams/team.sd?team_id=1823&comp_id=1      Team=Newcastle
URL=http://www.soccerbase.com/teams/team.sd?team_id=1855&comp_id=1      Team=Norwich
URL=http://www.soccerbase.com/teams/team.sd?team_id=2093&comp_id=1      Team=QPR
URL=http://www.soccerbase.com/teams/team.sd?team_id=2477&comp_id=1      Team=Stoke
URL=http://www.soccerbase.com/teams/team.sd?team_id=2493&comp_id=1      Team=Sunderland
URL=http://www.soccerbase.com/teams/team.sd?team_id=2513&comp_id=1      Team=Swansea
URL=http://www.soccerbase.com/teams/team.sd?team_id=2590&comp_id=1      Team=Tottenham
URL=http://www.soccerbase.com/teams/team.sd?team_id=2744&comp_id=1      Team=West Brom
URL=http://www.soccerbase.com/teams/team.sd?team_id=2783&comp_id=1      Team=Wigan
URL=http://www.soccerbase.com/teams/team.sd?team_id=2848&comp_id=1      Team=Wolves
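
Since the question asks for the links in variables rather than printed, here is a minimal sketch of one way to do that with the same find_all_links call. The helper name team_links, the relaxed pattern qr/team_id=\d+/, and the choice to key the hash by link text are all my assumptions, not something taken from the site or from WWW::Mechanize itself. Adjust the pattern to whatever the page's URLs actually look like.

```perl
use strict;
use warnings;
use WWW::Mechanize ();

# Hypothetical helper: returns a hash ref of link text => absolute URL
# for the page currently loaded in $mech.
sub team_links {
    my ($mech) = @_;
    my %url_for;
    # Relaxed pattern (assumption): any link that carries a team_id parameter.
    for my $link ($mech->find_all_links(url_regex => qr/team_id=\d+/)) {
        my $name = $link->text;
        next unless defined $name && length $name;   # skip image-only links
        # If the same name is linked more than once, the last URL wins.
        $url_for{$name} = $link->url_abs->as_string;
    }
    return \%url_for;
}

# Runs only when a URL is given on the command line, for example:
#   perl teams.pl http://www.soccerbase.com/teams/home.sd
if (my $url = shift @ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 1);  # die on a failed request
    $mech->get($url);
    my $url_for = team_links($mech);
    printf "%s => %s\n", $_, $url_for->{$_} for sort keys %$url_for;
}
```

The returned hash can then be passed to the other scripts, for example by looping over its values and calling get on each URL. Keying by team name is convenient for lookups, but if names are not unique on the page, an array of [name, url] pairs would be the safer structure.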
Answered 2012-02-21T20:33:04.593