perl - Downloading in Firefox using Perl WWW::Mechanize::Firefox

Question

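I have a list of URLs of pdf files that I want to download from different sites.

In Firefox, I have selected the option to save PDF files directly to a particular folder.

My plan is to use WWW::Mechanize::Firefox in perl to download each file in the list (one by one) through Firefox, and to rename each file after it has been downloaded.

I used the following code to do this: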

    use WWW::Mechanize::Firefox;
    use File::Copy;

    # @list contains the list of links to pdf files
    foreach $x (@list) {
        my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);
        $mech->get($x);  #This downloads the file using firefox in desired folder

        opendir(DIR, "output/download");
        @FILES= readdir(DIR);
        my $old = "output/download/$FILES[2]";
        move ($old, $new);  # $new is the URL of the new filename
    }

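When I run the script, it opens the first link in Firefox, and Firefox downloads the file to the desired directory. After that, however, the new tab is not closed, the file is not renamed, the code keeps running (as if it had hit an infinite loop), and no further files are downloaded.

What is going on here? Why doesn't the code work? How do I close the tab and make the code process every file in the list? Is there an alternative way to download the files?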

score 2 · Accepted Answer

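Solved the issue.

The function

    $mech->get()

waits for Firefox to fire the 'DOMContentLoaded' event when the page loads. Since I had set Firefox to download the files automatically, no page was ever loaded, so the 'DOMContentLoaded' event was never fired. This caused my code to hang.

I told the function not to wait for the page to load by using the following option:

    $mech->get($x, synchronize => 0);

After this, I added a 60-second delay to give Firefox time to download the file before the code proceeds: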

    sleep 60;
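A fixed 60-second delay wastes time on small files and may be too short for large ones. As a rough sketch of an alternative, the helper below polls the download folder until the download appears to be finished. The name `wait_for_download` and the timeout value are hypothetical, and the sketch assumes two things that are not part of the original setup: that Firefox keeps a `.part` temporary file in the folder while a download is in progress, and that the folder is empty before each download starts.

```perl
use strict;
use warnings;

# Poll $dir until it holds at least one entry and no ".part" files
# (assumed to mark an unfinished Firefox download), or until $timeout
# seconds have passed. Returns 1 if the download looks complete, 0 on timeout.
sub wait_for_download {
    my ($dir, $timeout) = @_;
    my $deadline = time + $timeout;
    while (1) {
        opendir(my $dh, $dir) or die "Cannot open $dir: $!";
        my @entries = grep { !/^\.\.?$/ } readdir($dh);   # skip "." and ".."
        closedir($dh);

        my @partial = grep { /\.part$/ } @entries;
        return 1 if @entries && !@partial;
        return 0 if time >= $deadline;
        sleep 1;
    }
}

# Possible use in place of the fixed delay (path taken from the question):
#   wait_for_download("output/download", 300)
#       or warn "Download did not finish in time\n";
```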

So my final code looks like this:

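    use WWW::Mechanize::Firefox;
    use File::Copy;

    # @list contains the list of links to pdf files
    foreach my $x (@list) {
        my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);

        # Do not wait for a page-load event that never fires for a direct download
        $mech->get($x, synchronize => 0);
        sleep 60;  # give Firefox time to finish the download

        # The download folder is expected to hold only the file just downloaded
        opendir(my $dh, "output/download") or die "Cannot open output/download: $!";
        my @files = grep { !/^\.\.?$/ } readdir($dh);  # skip "." and ".."
        closedir($dh);
        my $old = "output/download/$files[0]";
        move($old, $new);  # $new is the path of the new file name
    }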
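The code above picks the downloaded file by its position in the directory listing, which only works when the download folder holds nothing else, and `readdir` does not promise any particular order. If that assumption might not hold, one option is to pick the most recently modified file instead. This is a minimal sketch only; `newest_file` is a hypothetical helper name, not something the modules above provide.

```perl
use strict;
use warnings;
use File::Spec;

# Return the path of the most recently modified regular file in $dir,
# or undef if the directory contains no regular files.
sub newest_file {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { -f $_ }
                map  { File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);

    # Sort by modification time (element 9 of stat), newest first
    my ($newest) = sort { (stat($b))[9] <=> (stat($a))[9] } @files;
    return $newest;
}

# Possible use inside the loop (path taken from the question):
#   my $old = newest_file("output/download");
#   move($old, $new) if defined $old;
```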

score 1 · Accepted Answer

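If I understood you correctly, you have links to the actual pdf files. In that case, WWW::Mechanize is most likely easier to use than WWW::Mechanize::Firefox. In fact, I think that is almost always the case. Then again, watching the browser work is certainly cooler.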

    use strict;
    use warnings;

    use WWW::Mechanize;

    # your code here
    # loop

        my $mech = WWW::Mechanize->new();    # Could (should?) be outside of the loop.
        $mech->agent_alias("Linux Mozilla"); # Optionally pretend to be whatever you want.

        $mech->get($link);                   # fetch the pdf directly, no browser involved
        $mech->save_content($new);           # write the response body to the file $new

    # end of the loop

If this is absolutely not what you wanted, my cover story will be that I didn't want to ruin my 666 rep!
