perl - Downloading files using Perl LWP and LinkExtractor

Question

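I'm trying to download files from a web page.

First, I use HTML::LinkExtractor to get the links, and then I want to download them with LWP. I'm new to programming in Perl.

I wrote the following code...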

#!/usr/bin/perl

use strict;
use warnings;

use HTML::TableExtract;
use HTML::LinkExtractor;
use LWP::Simple qw(get);
use Archive::Zip;

my $html = get $ARGV[0];

my $te = HTML::TableExtract->new(
    keep_html => 1,
    headers => [qw( column1 column2 )],
);
$te->parse($html);

# I get only the first row
my ($row) = $te->rows;

my $LXM = new HTML::LinkExtractor(undef,undef,1);
$LXM->parse(\$$row[0]);
my ($t) = $LXM->links;

my $LXS = new HTML::LinkExtractor(undef,undef,1);
$LXS->parse(\$$row[1]);
my ($s) = $LXS->links;

#-------
for (my $i=0; $i < scalar(@$s); $i++) {
  print "$$s[$i]{_TEXT} $$s[$i]{href} $$t[$i]{href} \n";
  my $file = '/tmp/$$s[$i]{_TEXT}';
  my $url = $$s[$i]{href};
  my $content = getstore($url, $file);
  die "Couldn't get it!" unless defined $content;
}

I get the following error:

Undefined subroutine &main::getstore called at ./geturlfromtable.pl line 35.

Thanks in advance!

score 2 · Accepted Answer

LWP::Simple can be loaded in two different ways.

use LWP::Simple;

This loads the module and makes every function it exports by default available to your program.

use LWP::Simple qw(list of function names);

This loads the module and makes only the specific functions you request available.

You have this code:

use LWP::Simple qw(get);

This makes the get() function available, but not the getstore() function.

To fix this, add getstore() to your list of functions.

use LWP::Simple qw(get getstore);

Or (probably simpler) remove the list of functions.

use LWP::Simple;
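As a quick way to confirm which names an import list actually brings in, here is a minimal sketch. It assumes LWP::Simple is installed and only checks whether the subroutines exist in the current package; it does not make any network request.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Request both functions explicitly in the import list.
use LWP::Simple qw(get getstore);

# "defined &name" is true only if a subroutine with that name
# exists in the current package (main), i.e. it was imported.
print "get is imported\n"      if defined &get;
print "getstore is imported\n" if defined &getstore;
```

If you change the import list back to qw(get), the second line is no longer printed, which matches the "Undefined subroutine &main::getstore" error in the question.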

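Update: I hope you don't mind if I add a few style points.

Firstly, you're using a very old module, HTML::LinkExtractor. It hasn't been updated in almost fifteen years. I'd suggest looking at HTML::LinkExtor instead.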
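To give a rough idea of what HTML::LinkExtor looks like in use, here is a minimal sketch. The helper name extract_hrefs and the sample HTML are made up for illustration. One caveat: HTML::LinkExtor reports tags and their link attributes only, not the link text, so it is not a drop-in replacement for the _TEXT field used in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the href of every <a> tag in an HTML string.
sub extract_hrefs {
    my ($html) = @_;

    # With no callback, the parser collects the links internally.
    my $parser = HTML::LinkExtor->new;
    $parser->parse($html);
    $parser->eof;

    my @hrefs;
    for my $link ($parser->links) {
        # Each link is an array ref: [ $tag, attr_name => attr_value, ... ]
        my ($tag, %attr) = @$link;
        push @hrefs, $attr{href} if $tag eq 'a' && defined $attr{href};
    }
    return @hrefs;
}

# Sample input, for illustration only.
print "$_\n"
    for extract_hrefs('<a href="http://example.com/a.zip">A</a><img src="x.png">');
```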

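Secondly, your code uses a lot of references, but you're using them in a really over-complicated manner. For example, where you have \$$row[0], the arrow form \$row->[0] says the same thing more clearly (the leading backslash stays, because parse() is being passed a reference to the string). Similarly, $$s[$i]{href} is a lot easier for most people to understand if it's written $s->[$i]{href}.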
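The two spellings are interchangeable, which a small sketch can show. The data below is made up to resemble the structures in the question; only core Perl is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data shaped like the link list in the question:
# a reference to an array of hashes.
my $s = [ { _TEXT => 'file1.zip', href => 'http://example.com/file1.zip' } ];

# $$s[0]{href} and $s->[0]{href} reach the same hash value.
print "same value\n" if $$s[0]{href} eq $s->[0]{href};

# Hypothetical table row: a reference to an array of HTML strings.
my $row = [ '<a href="a.html">A</a>', '<a href="b.html">B</a>' ];

# \$$row[0] and \$row->[0] are references to the same array element,
# so they compare equal as addresses.
print "same reference\n" if \$$row[0] == \$row->[0];
```

Both messages are printed, so switching to the arrow form changes readability only, not behaviour.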

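Next, you're using a C-style for loop to iterate over the indexes of the array. It's usually simpler to use foreach to iterate from zero to the last index in the array.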

foreach my $i (0 .. $#$s) {
  print "$s->[$i]{_TEXT} $s->[$i]{href} $t->[$i]{href} \n";
  my $file = "/tmp/$s->[$i]{_TEXT}";
  my $url = $s->[$i]{href};
  my $content = getstore($url, $file);
  die "Couldn't get it!" unless defined $content;
}

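And finally, you seem to be a little confused about what getstore() returns. It returns the HTTP response code, so it will never be undefined and a test with defined will never fail. If there's a problem retrieving the content, you'll get a 500 or a 403 or something like that.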
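One way to act on that status code is to test it with is_success(), which LWP::Simple also exports. The sketch below is only an illustration: fetch_to_file is a made-up helper name, and the URL and path in the usage note are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: download $url into $file.
# Returns the HTTP status code on success and dies otherwise.
sub fetch_to_file {
    my ($url, $file) = @_;

    # getstore() returns the HTTP status code, not the content.
    my $status = getstore($url, $file);

    die "Couldn't get $url (HTTP status $status)\n"
        unless is_success($status);

    return $status;
}
```

Inside the loop above, the getstore() and die lines could then become a single call such as fetch_to_file($s->[$i]{href}, "/tmp/$s->[$i]{_TEXT}").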
