bash - BASH script: downloading consecutively numbered files with wget

Question

I have a web server that stores the numbered log files of a web application. Example file names:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits are the counter, and it can sometimes go up to 100.

I usually open a web browser and browse to a file like this:

http://someaddress.com/logs/dbsclog01s001.log

and save the file. This of course gets a bit annoying when you have 50 logs. I tried to come up with a BASH script that uses wget and passes it

http://someaddress.com/logs/dbsclog01s*.log

but I am having problems with my script. Anyway, does anyone have a sample of how to do this?

Thanks!

score 63 · Accepted Answer

#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]"
        exit 1
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

printf "$url_format\\n" `seq $seq_start $seq_end` | wget -i- "$@"

Save the above as seq_wget, give it execute permission (chmod +x seq_wget), and then run it, for example:

$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50

Or, if you have Bash 4.0, you can just type

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you can follow Dennis Williamson's answer.
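
The seq_wget script works because printf re-applies its format string once for every argument it is given. A minimal sketch of just that URL-generation step follows, as a dry run that prints the list rather than downloading. The function name gen_urls is made up for this illustration, and seq is assumed to be installed.

```shell
#!/bin/sh
# Dry run of the URL-generation step in seq_wget: printf re-applies the
# format string once per argument, so each number from seq yields one URL.
# The URL is the example address from the question.
url_format='http://someaddress.com/logs/dbsclog01s%03d.log'

# gen_urls is a hypothetical helper name used only in this sketch.
gen_urls() {
    # $1 = printf format, $2 = first number, $3 = last number
    # The seq output is left unquoted on purpose so it splits into arguments.
    printf "$1\n" $(seq "$2" "$3")
}

gen_urls "$url_format" 1 3

# To download instead of printing, pipe the list into wget:
#   gen_urls "$url_format" 1 50 | wget -i -
```

Printing the list first is a cheap way to check the format string before any request is sent.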

score 43 · Accepted Answer

curl seems to support ranges. From the man page:

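URL
       The URL syntax is protocol-dependent. You will find a detailed
       description in RFC 3986.

       You can specify multiple URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can
       use several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You can specify any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify a step counter for the ranges,
       so that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt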

You may have noticed that it says "with leading zeros"!
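
Applied to the log files from the question, the range syntax could look roughly like the sketch below. The helper range_url is a made-up name that only builds the bracketed URL. The curl calls are left commented out so the snippet runs without network access. They assume curl's -O (save under the remote name), -o with the #1 placeholder (the current range value), and -f (fail on HTTP errors) options.

```shell
#!/bin/sh
# range_url is a hypothetical helper: it builds a curl range URL such as
# http://someaddress.com/logs/dbsclog01s[001-050].log
range_url() {
    # $1 = URL prefix, $2 = first, $3 = last, $4 = digits, $5 = suffix
    # Padding both bounds tells curl to pad every number in between.
    printf "%s[%0${4}d-%0${4}d]%s\n" "$1" "$2" "$3" "$5"
}

url=$(range_url 'http://someaddress.com/logs/dbsclog01s' 1 50 3 '.log')
echo "$url"

# Quote the URL so the shell does not treat [ ] as a glob pattern.
# Uncomment one of these to download:
#   curl -f -O "$url"                        # keep the remote file names
#   curl -f -o "dbsclog01s#1.log" "$url"     # #1 = current range value
```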

score 19 · Accepted Answer

You can use echo-type sequences in the wget URL to download a string of numbers...

wget http://someaddress.com/logs/dbsclog01s00{1..3}.log

This also works with letters

{a..z} {A..Z}
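
Because the shell performs brace expansion before wget ever runs, echo is an easy way to preview what wget would receive. The sketch below assumes Bash (4.0 or later for the zero-padded range). It also shows one limitation worth knowing: a variable cannot be used as a range bound, so seq is used for that case.

```shell
#!/bin/bash
# Brace expansion happens in the shell, so echo shows exactly the
# arguments that wget would be given.
echo http://someaddress.com/logs/dbsclog01s00{1..3}.log

# Bash 4.0+ keeps the leading zeros written in the range itself.
echo dbsclog01s{008..011}.log

# Brace ranges are expanded before variables, so {1..$last} stays literal.
# With a variable bound, generate the numbers with seq instead.
last=3
for n in $(seq -f '%03g' 1 "$last"); do
    echo "dbsclog01s$n.log"
done
```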

score 14 · Accepted Answer

Not sure exactly what problems you were running into, but it sounds like a simple for loop in bash would do it for you.

for i in {001..999}; do   # zero-padded range (Bash 4.0+) to match names like s001
    wget -k "http://someaddress.com/logs/dbsclog01s$i.log" -O "your_local_output_dir_$i"
done

score 12 · Accepted Answer

You can use a for loop in bash together with the printf command (replacing echo with wget as needed, of course):

$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
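
One way to move from echoing to downloading is sketched below. FETCH and fetch_range are made-up names for this illustration. FETCH defaults to echo, so the loop only prints what it would fetch; running it with FETCH=wget would download instead. A plain while loop is used so the sketch also works in shells without brace expansion.

```shell
#!/bin/sh
# FETCH is a hypothetical knob for this sketch: it defaults to echo, so the
# loop only prints what it would fetch. Run with FETCH=wget to download.
FETCH=${FETCH:-echo}

fetch_range() {
    # $1 = first number, $2 = last number
    i=$1
    while [ "$i" -le "$2" ]; do
        # printf does the zero padding, as in the loop above
        "$FETCH" "$(printf 'http://www.com/myurl%03d.html' "$i")"
        i=$((i + 1))
    done
}

fetch_range 1 10
```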

score 3 · Accepted Answer

Interesting task, so I wrote a complete script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL address 'till logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # how many digits logfile's number have
DLDIR=~/Downloads               # download directory
TOUT=5                          # timeout for quit
# code
for((i=1;i<10**$DIGITS;++i))
do
        file=$PREF`printf "%0${DIGITS}d" $i`$POSTF   # local file name
        dl=$URL$file                                 # full URL to download    
        echo "$dl -> $DLDIR/$file"                   # monitoring, can be commented
        wget -T "$TOUT" -q "$dl" -O "$DLDIR/$file"   # save into the download directory
        if [ "$?" -ne 0 ]                            # test if we finished
        then
                exit
        fi
done

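At the beginning of the script you can set the URL, the log file prefix and suffix, how many digits the numbered part has, and the download directory. The loop downloads every log file it finds and exits automatically at the first one that does not exist (using wget's timeout).

Note that this script assumes the log file index starts at 1, not 0, as in the example you gave.

Hope this helps.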
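
The "stop at the first missing file" logic can be isolated into a small function, sketched here under a few assumptions. The names download_until_missing and wget_fetch are made up. The fetch command is passed in as an argument so the loop can be tried without a network. The configuration values are the placeholders from the script above, and DLDIR defaults to the current directory here.

```shell
#!/bin/sh
# Placeholder settings, mirroring the variables of the script above.
URL=http://domain.com/logs/
PREF=logprefix
POSTF=.log
DIGITS=3
DLDIR=${DLDIR:-.}

# A fetcher built on wget, with the same options as the script above.
wget_fetch() { wget -T 5 -q "$1" -O "$2"; }

# Hypothetical helper: fetch files 1, 2, 3, ... until one fails.
download_until_missing() {
    # $1 = fetch function, called as: fetch URL OUTFILE
    # $2 = highest index to try
    i=1
    while [ "$i" -le "$2" ]; do
        file=$PREF$(printf "%0${DIGITS}d" "$i")$POSTF
        if ! "$1" "$URL$file" "$DLDIR/$file"; then
            rm -f "$DLDIR/$file"   # a failed wget -O may leave an empty file
            break
        fi
        i=$((i + 1))
    done
    echo "$((i - 1))"              # number of files fetched
}

# Example call (commented out so the sketch does not touch the network):
#   download_until_missing wget_fetch 999
```

Passing the fetcher in keeps the stop condition separate from the download tool, so the same loop could be pointed at curl or at a test stub.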

score 0 · Accepted Answer

Here you can find a Perl script that looks like what you want

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
    $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
    $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}

score 0 · Accepted Answer

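I just had a look at the wget manpage discussion of 'globbing':

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... won't work with globbing.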

score 0 · Accepted Answer

Check whether your system has seq; if it does, this is easy:

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done
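
If a system has neither seq nor jot, the same zero-padded sequence can be produced with shell arithmetic and printf alone. The sketch below is a dry run: padded_seq is a made-up helper name, and echo stands in for wget.

```shell
#!/bin/sh
# padded_seq is a hypothetical helper: it prints FIRST..LAST, one per line,
# zero-padded to WIDTH digits, using only POSIX shell arithmetic and printf.
padded_seq() {
    # $1 = first, $2 = last, $3 = width
    n=$1
    while [ "$n" -le "$2" ]; do
        printf "%0${3}d\n" "$n"
        n=$((n + 1))
    done
}

for i in $(padded_seq 1 3 3); do
    echo "http://.../dbsclog${i}.log"   # replace echo with wget to download
done
```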

score 0 · Accepted Answer

Oh! This is similar to a problem I ran into when I was learning bash to automate manga downloads.

Something like this should work:

for a in `seq 1 999`; do
if [ ${#a} -eq 1 ]; then
    b="00"
elif [ ${#a} -eq 2 ]; then
    b="0"
else
    b=""    # three digits: no padding needed
fi
echo "$a of 999"
wget -q http://site.com/path/fileprefix$b$a.jpg

done

score -1 · Accepted Answer

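Late to the party, but a really easy solution that requires no coding is the DownThemAll Firefox add-on, which can retrieve ranges of files. That was my solution when I needed to download 800 consecutively numbered files.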
