image - Downloading pictures from a website

Question

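I want to have a local copy of a gallery on a website. The gallery shows the pictures at domain.com/id/1 (the id increases in increments of 1), and each image is stored at pics.domain.com/pics/original/image.format. The exact line for the image in the HTML is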

<div id="bigwall" class="right"> 
    <img border=0 src='http://pics.domain.com/pics/original/image.jpg' name='pic' alt='' style='top: 0px; left: 0px; margin-top: 50px; height: 85%;'> 
</div>

So I want to write a script that does something like this (in pseudocode):

for(id = 1; id <= 151468; id++) {
     page = "http://domain.com/id/" + id.toString();
     src = returnSrc(page); // Searches the html of page for the img with name='pic' and returns the image location as a string
     getImg(src); // Downloads the file named in src
}

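I'm not sure how to do this, though. I suppose I could do it in bash: use wget to download the HTML, search the HTML by hand for http://pics.domain.com/pics/original/, use wget again to save the file, delete the HTML file, increment the id, and repeat. The only problem is that I'm not good at handling strings, so if anyone can tell me how to search for the URL and replace the *s with the file name and format, I should be able to do the rest. Or, if my method is stupid and you have a better one, please share.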

score 25 · Accepted Answer

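# get all pages; curl expands [1-151468] itself and '#1' becomes the current id
curl 'http://domain.com/id/[1-151468]' -o '#1.html'

# get all image URLs; match up to the closing quote so any format (jpg, png, ...) is caught
grep -oh "http://pics\.domain\.com/pics/original/[^'\"]*" *.html >urls.txt

# download all images, skipping duplicate URLs
sort -u urls.txt | wget -i-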
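
For the string handling the question asks about, here is a minimal sketch in POSIX shell. It assumes the markup looks like the snippet in the question (single-quoted attributes, with src appearing before name='pic'). The function names extract_src, file_name, file_format and fetch_range are made up for illustration and are not part of the answer above.

```shell
# Print the src attribute of the <img> tag whose name is 'pic'.
# Reads HTML on stdin. Assumes single-quoted attributes and that
# src comes before name, as in the question's snippet.
extract_src() {
    sed -n "s/.*<img[^>]*src='\([^']*\)'[^>]*name='pic'.*/\1/p"
}

# Split a URL into file name and format with parameter expansion.
file_name()   { printf '%s\n' "${1##*/}"; }  # drop everything up to the last '/'
file_format() { printf '%s\n' "${1##*.}"; }  # drop everything up to the last '.'

# Hypothetical per-page loop mirroring the question's pseudocode.
# It is only defined here, not called, because it would hit the network.
# Note: id and src are plain global variables in this sketch.
fetch_range() {
    id=$1
    while [ "$id" -le "$2" ]; do
        # Fetch the page to stdout and pull out the image URL.
        src=$(wget -qO- "http://domain.com/id/$id" | extract_src)
        if [ -n "$src" ]; then
            # Prefix the id so images with the same file name do not collide.
            wget -q -O "${id}_$(file_name "$src")" "$src"
        fi
        id=$((id + 1))
    done
}
```

Something like `fetch_range 1 151468` would then walk the whole gallery one page at a time without keeping the HTML files around. The batch approach in the answer is probably faster in practice, since curl handles the whole id range in a single invocation.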
