regex - Trouble following an image link with WWW::Mechanize

Question

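I'm writing a Perl script that fetches the "Astronomy Picture of the Day" and sets it as my wallpaper. I'll then set up a cron job to run it for me every day. However, I'm having trouble getting the script to follow the image link to the full-size image before downloading it. I'm trying something like the code below (bear in mind that I'm a Perl beginner who doesn't know much about Perl regular expressions):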

#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://apod.nasa.gov/apod/astropix.html";

my $mech = WWW::Mechanize->new();
$mech->get($url);
# debugging
if ($mech->follow_link(url_regex => qr/\.(?:jpg|png)$/)) {
    print "Following the image link...";
} else {
    print "Couldn't find the link...";
}

my @img = $mech->find_image(alt_regex => qr/image/i);

foreach my $img (@img) {
    $mech->get($img->url, ':content_file' => 'astro.jpg');
}

print "\n";

exit(0);

Any help would be greatly appreciated!

score 3 · Accepted Answer

Your script is almost right. The NASA page is structured like this:

<html>
<body>
  ...
  <a href="http://.../blah.jpg"><img src="http://.../blah-lowres.jpg"></a>
  ...
</body>
</html>

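So if $mech->follow_link succeeds, the full-size image is already in $mech->content. There is no need to call find_image afterwards: at that point the current "page" is the image itself, not an HTML document.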
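The link that gets followed is chosen by the url_regex pattern, which is tested against each link's URL. As a rough sketch of how that pattern behaves, the snippet below applies it to a few made-up URLs; the helper name looks_like_image_url and the sample URLs are assumptions for illustration and are not part of WWW::Mechanize or the NASA page.

```perl
use strict;
use warnings;

# Same pattern as the url_regex argument; \z anchors at the very end of the string.
my $image_url_re = qr/\.(?:jpg|png)\z/;

# Hypothetical helper: returns 1 if the URL ends in .jpg or .png, 0 otherwise.
sub looks_like_image_url {
    my ($url) = @_;
    return $url =~ $image_url_re ? 1 : 0;
}

# Made-up URLs, for illustration only.
for my $url ('image/full.jpg', 'image/full.png', 'astropix.html', 'image/full.jpg?size=big') {
    printf "%-26s %s\n", $url, looks_like_image_url($url) ? 'match' : 'no match';
}
```

Note that \z only matches at the true end of the string, whereas $ would also match just before a trailing newline. For URLs taken from a page the two usually behave the same, so this is a tightening rather than the fix itself.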

Try this:

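$mech->get($url) or die "unable to get $url";
$mech->follow_link(url_regex => qr/\.(jpg|png)\z/) or die "unable to follow image link";
# Three-argument open with an error check; binmode so the image bytes are written unmodified.
open(my $fh, '>', 'astro.jpg') or die "unable to open astro.jpg: $!";
binmode($fh);
print {$fh} $mech->content;
close($fh) or die "unable to close astro.jpg: $!";
print "saved image as astro.jpg\n";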
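One detail the code above glosses over: the pattern accepts both .jpg and .png links, but the file is always saved as astro.jpg. A minimal sketch of one way to pick the file name from the URL instead is shown below. The helper wallpaper_name is hypothetical, and the example URL is made up; in the real script the URL would presumably come from $mech->uri after follow_link succeeds.

```perl
use strict;
use warnings;

# Hypothetical helper: build a local file name from the image URL's extension.
sub wallpaper_name {
    my ($url) = @_;
    # Capture the extension if the URL ends in .jpg or .png; otherwise $ext stays undef.
    my ($ext) = $url =~ /\.(jpg|png)\z/;
    # Fall back to "jpg" when no known extension was found (needs Perl 5.10+ for //).
    return 'astro.' . ($ext // 'jpg');
}

# Made-up URL, for illustration only.
print wallpaper_name('http://example.com/image/full.png'), "\n";
```

The returned name could then be passed to open in place of the hard-coded 'astro.jpg'.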
