#!/usr/bin/perl

use strict;
use warnings;
use WWW::Mechanize;
use FindBin qw($Bin);
print $Bin;
my $folder = "$Bin/Resources";
mkdir($folder, 0700) unless(-d $folder );
chdir($folder) or die "can't chdir $folder\n";
my $url = 'http://www.ukgamingcomputers.co.uk/images/zalmanz11plus.jpg';
my $local_file_name = 'pic.jpg';
my $mech = WWW::Mechanize->new;
$mech->get( $url, ":content_file" => $local_file_name );

I'm currently using this code to download a .jpg and place it in a folder called Resources. I would like to download every .jpg in the http://www.ukgamingcomputers.co.uk/images/ directory. I have no idea how I would go about this. If you have a code solution, I would be very grateful!


3 Answers


I'm afraid you can't do that. It is also unlikely that the web site owner would want you to.

There is no practical problem with downloading an image in that path, but to fetch them all you need to know what they are called, and there is no way to get a directory listing using HTTP.

You could crawl the site, fetch all the HTML pages from it, and find the names of all the image files those pages link to, but that would be awkward to do and even less likely to be acceptable to the site owner. It would also get you only the images used on the site, and not all the images in the directory.
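For illustration only, a rough sketch of what that crawl might look like with WWW::Mechanize follows; the start URL, the same-host check, and the flat file naming are assumptions, and none of it makes helping yourself to the images any more polite:

#!/usr/bin/perl
# Sketch only: crawl pages on one host and collect the .jpg images they use.
use strict;
use warnings;

use WWW::Mechanize;
use URI;

my $start = 'http://www.ukgamingcomputers.co.uk/';    # assumed entry page
my $host  = URI->new($start)->host;
my $mech  = WWW::Mechanize->new( autocheck => 0 );

my @queue = ($start);
my ( %seen_page, %seen_image );

while ( my $page = shift @queue ) {
    next if $seen_page{$page}++;
    $mech->get($page);
    next unless $mech->success && $mech->is_html;

    # remember every .jpg referenced by an <img> tag on this page
    for my $img ( $mech->images ) {
        next unless defined $img->url;
        my $abs = $img->url_abs;
        $seen_image{"$abs"}++ if $abs =~ /\.jpg$/i;
    }

    # follow links, but stay on the same host
    for my $link ( $mech->links ) {
        next unless defined $link->url;
        my $abs = $link->url_abs;
        next unless $abs->scheme && $abs->scheme =~ /^https?$/;
        push @queue, "$abs" if $abs->host eq $host;
    }
}

# download whatever was found, naming each file after the last path segment
for my $url ( sort keys %seen_image ) {
    my $name = ( split "/", $url )[-1];
    print "Fetching $url -> $name\n";
    $mech->get( $url, ':content_file' => $name );
}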

Some HTTP servers are configured to return an HTML listing of the directory if the URL names no specific file and there is no default index.html to send, but that is unusual nowadays as it is a security risk.

If you think the site owner won't mind you helping yourself to his pictures, why not just send an email asking for a copy of them?

Answered 2013-03-15T17:50:36.727

Somewhat like your example, this pulls the jpgs from the site you listed.

#!/usr/bin/perl 
use strict;
use warnings;

use WWW::Mechanize;
use WWW::Mechanize::Link;
use Getopt::Long;

exit int main( parse_args() );

sub main {
    my $opts = shift;

    my $folder = $opts->{folder};
    chdir($folder) or die "can't chdir $opts->{folder}\n";

    my $mech = WWW::Mechanize->new;
    $mech->get( $opts->{url} );

    for my $link ( $mech->links() ) {
        # only follow links whose text looks like a .jpg file name
        next unless defined $link->text() && $link->text() =~ /\.jpg$/i;
        $mech->get( $link->url() );
        $mech->save_content( $link->text() );    # save under the link text
    }

    return 0;    # exit status for the exit() call above
}

sub parse_args {
    my %opts = (
        url    => "http://www.ukgamingcomputers.co.uk/images/",
        folder => "/home/kprice/tmp",
    );

    GetOptions( \%opts, 'url|u=s', 'folder|d=s', ) or die $!;

    return \%opts;
}
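With the Getopt::Long options above you can point it at another URL or folder; for example (assuming the script is saved as fetch_jpgs.pl, a name picked here for illustration):

$ perl fetch_jpgs.pl --url http://www.ukgamingcomputers.co.uk/images/ --folder /tmp/jpgs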

If you're on linux, this will also work, but it pulls everything from that link:

$ wget -r http://www.ukgamingcomputers.co.uk/images/
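If you only want the jpgs and not the rest of the site, GNU wget's standard options can narrow the recursion down (check your local wget supports these flags):

$ wget -r -np -nd -A jpg http://www.ukgamingcomputers.co.uk/images/

Here -np stops it climbing above /images/, -nd saves everything flat into the current directory, and -A jpg keeps only the .jpg files.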

EDIT: I corrected it slightly after a quick copy/paste.

Answered 2013-03-15T18:15:53.193

Do you have to use WWW::Mechanize?

Here's an example using HTML::LinkExtor and LWP::Simple.

EDIT: This actually pulls all the images from the given address.

#!/usr/bin/perl

use warnings;
use strict;

use LWP::Simple;
use HTML::LinkExtor;

die "usage: $0 url\n" if @ARGV != 1;
my $url = shift;
$|++;

if ( $url !~ /^http/ ) { 
  print "usage: url ( http(s)://www.example.com/  )\n"; 
  exit(1);
}

my %images = (); 
my $html = get($url) 
  or die "could not get $url\n";

my $parser = HTML::LinkExtor->new(undef, $url);
$parser->parse($html);

my @all_link_refs = $parser->links();

for my $link_ref ( @all_link_refs ) {
  my ( $html_tag, $attr_name, $this_url ) = @$link_ref;
  next unless $html_tag eq 'img';

  # name the local file after the last part of the URL path
  my $image_name = ( split "/", $this_url )[-1];
  $images{$image_name}++;

  # only download each image once, even if it appears on the page repeatedly
  if ( $images{$image_name} == 1 ) {
    print "Downloading $this_url to $image_name...\n";
    my $image = get($this_url);
    unless ( defined $image ) {
      warn "could not get $this_url\n";
      next;
    }
    open my $PIC, ">", $image_name or die "can't write $image_name: $!\n";
    binmode $PIC;    # image data is binary
    print $PIC $image;
    close $PIC;
  }
}

Output:

$ test.pl http://google.com
Downloading http://google.com/intl/en_ALL/images/srpr/logo1w.png to logo1w.png...
Answered 2013-03-15T17:44:12.400