bash - wget a Wikimedia image?

Question

I'm trying to download an image from Wikimedia Commons using the URL of a page in the File namespace:

wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG

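All I get is a JPG file that cannot be opened. When you visit the link in a browser, you actually see a page rather than the image itself, but that page has a link called "Full resolution" that takes you to the real image URL: http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG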

How can I download this file when I only have the first link?

score 2 · Accepted Answer

You can try the following:

wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG -O output.html; wget $(cat output.html | grep fullMedia | sed 's/\(.*href="\/\/\)\([^ ]*\)\(" class.*\)/\2/g')

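The first wget fetches the page you specified and saves it as output.html. I looked at a few pages and found that the high-resolution image is linked inside the div with class=fullMedia. The grep/sed pipeline parses the image's URL out of that line, and the second wget fetches the image.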
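The same idea can be sketched as a small bash function that reads the page from a pipe instead of a temporary file. The name extract_full_media_url is hypothetical, and the sketch assumes, as observed above, that the fullMedia marker and its link appear on the same line of the HTML. It also adds an explicit https: prefix when the href is protocol-relative (//upload...), which is the part the original sed expression stripped off.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a Commons file page's HTML on stdin and print
# the URL of the first link that follows class="fullMedia".
# Assumption: that link is on the same line as the fullMedia marker.
extract_full_media_url() {
  local href
  # Keep the text from the fullMedia marker onward, then take its first href.
  href=$(grep -o 'class="fullMedia".*' | grep -o 'href="[^"]*"' | head -n 1)
  href=${href#href=\"}
  href=${href%\"}
  # No fullMedia link found: signal failure instead of printing nothing.
  [ -n "$href" ] || return 1
  # Protocol-relative links ("//upload...") need an explicit scheme.
  case $href in
    //*) href="https:$href" ;;
  esac
  printf '%s\n' "$href"
}

# Example (needs network access):
#   wget "$(wget -qO- 'https://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG' | extract_full_media_url)"
```

Like the one-liner, this still scrapes HTML with text tools, so it will break if the page markup changes.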

PS: As suggested above, bash is not a good tool for this. You should look into something that parses the DOM tree.

score 2 · Accepted Answer

Extract the title without the namespace (A_golden_tree_during_the_golden_season.JPG) and pass it to Special:Redirect:

wget "http://commons.wikimedia.org/wiki/Special:Redirect/file/$(echo 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG' | sed 's/.*\/File:\(.*\)/\1/')"
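The title extraction can also be done without sed, using bash parameter expansion to drop everything up to /File:. A minimal sketch, where commons_redirect_url is a hypothetical helper name; it only builds the Special:Redirect URL, and wget follows the resulting HTTP redirect to the original file on upload.wikimedia.org.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a Commons "File:" page URL into the
# Special:Redirect/file URL, which redirects to the original upload.
commons_redirect_url() {
  local page_url=$1
  # Strip the longest prefix ending in "/File:", leaving just the title.
  local title=${page_url##*/File:}
  # If nothing was stripped, the input had no "/File:" segment.
  if [ "$title" = "$page_url" ]; then
    return 1
  fi
  printf 'https://commons.wikimedia.org/wiki/Special:Redirect/file/%s\n' "$title"
}

# Example (needs network access):
#   wget "$(commons_redirect_url 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG')"
```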

score 0 · Accepted Answer

wget http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG

You are fetching the web page, not the image itself.
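The 9/92 part of that URL is not arbitrary: Commons stores originals in MediaWiki's hashed upload layout, where the two directories are the first hex digit and the first two hex digits of the MD5 hash of the file name. A minimal sketch that rebuilds the URL from the name alone, assuming the name is given exactly as it appears after File: in the page URL (underscores instead of spaces, first letter capitalized, not percent-encoded) and that md5sum from GNU coreutils is available. commons_upload_url is a hypothetical name, and Special:Redirect is the more robust route because it does not depend on this storage detail.

```bash
#!/usr/bin/env bash
# Sketch: build the upload.wikimedia.org URL of an original Commons file
# from its name, assuming MediaWiki's hashed upload directory layout:
#   <first hex digit of md5(name)>/<first two hex digits>/<name>
# Assumes the name is already normalized as in the File: URL and that
# md5sum (GNU coreutils) is installed.
commons_upload_url() {
  local name=$1 hash
  # md5sum prints "<32 hex chars>  -"; keep only the hash.
  hash=$(printf '%s' "$name" | md5sum | cut -c1-32)
  printf 'https://upload.wikimedia.org/wikipedia/commons/%s/%s/%s\n' \
    "${hash:0:1}" "${hash:0:2}" "$name"
}

# Example (needs network access):
#   wget "$(commons_upload_url 'A_golden_tree_during_the_golden_season.JPG')"
```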

score 0 · Accepted Answer

You can retrieve it using this link: https://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG. I ran into the same problem; clicking on the image gives you the link above. I hope this helps.
