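For a whole week I have been trying to write a program that downloads the links from a web page and then loops through each link to dump the content written on each linked page. The original page I download has 500 links, each pointing to a different page that contains information that matters to me. I only want to go one level down. However, I have a couple of problems.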
2) I have not been able to print the text content of the pages. I have been able to print the font details, but that is useless.
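# Load all the modules I used
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use HTML::FormatText;
use WWW::Mechanize;
use Data::Dumper;

# Download the original webpage and acquire the 500+ links
my $url = "http://wx.toronto.ca/festevents.nsf/all?openform";
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->get($url);    # fetch the page before asking for its title or links
my $title = $mechanize->title;
print "<b>$title</b><br />";
my @links = $mechanize->links;

# Create one user agent and open the output file once, outside the loop
my $ua = LWP::UserAgent->new;
open(my $out, '>:encoding(UTF-8)', 'TorontoParties.txt')
    or die "Cannot open TorontoParties.txt: $!";

foreach my $link (@links) {
    # Retrieve the link URL
    my $href = $link->url_abs;
    my $response = $ua->get($href);
    unless ($response->is_success) {
        # Report the failure and move on instead of stopping the whole run
        warn "$href: " . $response->status_line . "\n";
        next;
    }
    my $URL1 = $response->decoded_content;
    # print Dumper($URL1);    # uncomment to inspect the raw HTML while debugging

    # This part of the code is just to "clean up" the text
    my $tree   = HTML::TreeBuilder->new_from_content($URL1);
    my $Parsed = HTML::FormatText->new(leftmargin => 0, rightmargin => 80)->format($tree);
    $tree->delete;
    print $out $Parsed;
}
close($out);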
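For the text-content problem in 2), here is a minimal sketch of turning one page's HTML into plain text with HTML::TreeBuilder and HTML::FormatText, the two modules already loaded above. The helper name `html_to_text` and the sample markup are made up for illustration; the real pages may need different margins or extra cleanup.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# html_to_text is a hypothetical helper name, not part of either module.
sub html_to_text {
    my ($html) = @_;

    # Build a parse tree from an HTML string that has already been decoded
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Render the tree as plain text; the margins here are arbitrary choices
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text = $formatter->format($tree);

    $tree->delete;    # release the memory held by the parse tree
    return $text;
}

# Made-up sample standing in for one of the linked event pages
my $sample = '<html><body><h1>Street Festival</h1>'
           . '<p><b>Live music</b> all day.</p></body></html>';
print html_to_text($sample);
```

Inside the loop, the same helper could be called on `$response->decoded_content`, so that what gets printed is the visible text of the page and not its markup or font attributes.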