post - How can I programmatically get the image on this page?

Question

The URL http://www.fourmilab.ch/cgi-bin/Earth shows a live map of the Earth.

If I issue this URL in my browser (FF), the image shows up just fine. But when I try 'wget' to fetch the same page, I fail!

Here's what I tried first:

wget -p http://www.fourmilab.ch/cgi-bin/Earth

Thinking that all the other form fields were probably required too, I did a 'View Source' on the above page, noted down the various field values, and then issued the following request:

wget --post-data "opt=-p&lat=7°27'&lon=50°49'&ns=North&ew=East&alt=150889769&img=learth.evif&date=1&imgsize=320&daynight=-d" http://www.fourmilab.ch/cgi-bin/Earth

Still no image!

Can someone please tell me what is going on here...? Are there any 'gotchas' with CGI and/or form-POST based wgets? Where (book or online resource) would such concepts be explained?

score 2 · Accepted Answer

If you inspect the page source, you will find an img tag whose src points to the image of the Earth. For example:

<img
 src="/cgi-bin/Earth?di=570C6ABB1F33F13E95631EFF088262D5E20F2A10190A5A599229"
 ismap="ismap" usemap="#zoommap" width="320" height="320" border="0" alt="" />

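Without the 'di' parameter, you are just requesting the whole web page, which references this image, rather than the image itself.

Edit: The 'di' parameter encodes which "part" of the Earth you want to receive. Anyway, try for example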

wget http://www.fourmilab.ch/cgi-bin/Earth?di=F5AEC312B69A58973CCAB756A12BCB7C47A9BE99E3DDC5F63DF746B66C122E4E4B28ADC1EFADCC43752B45ABE2585A62E6FB304ACB6354E2796D9D3CEF7A1044FA32907855BA5C8F
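Since the di value has to be read out of the page first, the extraction step can be sketched as a small bash function. This is only a sketch under assumptions: the function name extract_di_src is made up for illustration, the src attribute is assumed to be double-quoted, and the di value is assumed to be a hex string as in the examples above.

```shell
#!/bin/bash
# Print the first /cgi-bin/Earth?di=... path found in a src attribute.
# Reads HTML on stdin; prints nothing if no such attribute is present.
# Assumption: the di value consists of hex digits only, as in the samples.
extract_di_src() {
    tr '\n' ' ' |                                        # join lines: src may sit on its own line
        grep -o 'src="/cgi-bin/Earth?di=[0-9A-Fa-f]*"' |  # "?" is literal in a basic regex
        head -n 1 |
        sed -e 's/^src="//' -e 's/"$//'                  # strip the attribute wrapper
}

# Hypothetical usage (needs network access):
#   path=$(wget -q -O - 'http://www.fourmilab.ch/cgi-bin/Earth' | extract_di_src)
#   wget -q -O earth.jpg "http://www.fourmilab.ch${path}"
```

The extracted path already starts with a slash, so it can be appended to the host name directly.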

score 1 · Accepted Answer

Use GET instead of POST. To the CGI program behind the page, the two are completely different.

Answered 2009-09-03T11:12:17.473
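To illustrate why the two methods look different to a CGI program: under the CGI convention, GET parameters arrive in the QUERY_STRING environment variable, while a POSTed form body arrives on standard input, with its length in CONTENT_LENGTH. The sketch below is generic; the function name read_cgi_params is hypothetical and this is not the actual Fourmilab program.

```shell
#!/bin/bash
# Print the raw parameter string the way a simple CGI script would obtain it.
# GET:  parameters come from the QUERY_STRING environment variable.
# POST: parameters come from stdin, CONTENT_LENGTH bytes long.
read_cgi_params() {
    if [ "$REQUEST_METHOD" = "POST" ]; then
        head -c "${CONTENT_LENGTH:-0}"
    else
        printf '%s' "$QUERY_STRING"
    fi
}

# Simulated GET request: parameters travel in the URL's query string.
REQUEST_METHOD=GET QUERY_STRING='opt=-p&imgsize=320' read_cgi_params
echo

# Simulated POST request: the same parameters travel in the request body.
printf 'opt=-p&imgsize=320' |
    REQUEST_METHOD=POST CONTENT_LENGTH=18 read_cgi_params
echo
```

A program that only looks at QUERY_STRING never sees fields sent with wget --post-data, which may be why the POST attempt did not behave like the browser's request.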

score 1 · Accepted Answer

Following on from Ravadre,

wget -p http://www.fourmilab.ch/cgi-bin/Earth

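downloads an XHTML file that contains an <img> tag.

I edited the XHTML to remove everything except the img tag and turned it into a bash script containing another wget -p command, escaping the ? and =

When I ran this, I got a 14kB file, which I renamed to earth.jpg

Not really programmatic, the way I did it, but I think it could be done.

But as @somedeveloper said, the di value keeps changing (because it depends on the time).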

score 0 · Accepted Answer

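Guys, here is what I finally did. I am not fully satisfied with this solution, because I was (and still am) hoping for a better way... one that gets the image on the first wget itself... giving me the same user experience I get when browsing with Firefox.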

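#!/bin/bash
# Fetch the page, pull the time-dependent image path out of the <img> tag, then fetch the image.

tmpf=/tmp/delme.jpeg
base=http://www.fourmilab.ch
# URLs are quoted so the shell does not treat "?" as a glob character.
liveurl=$(wget -O - "$base/cgi-bin/Earth?opt=-p" 2>/dev/null | perl -0777 -nle 'if(m@<img \s+ src \s* = \s* "(/cgi-bin/Earth\?di= .*? )" @gsix) { print "$1\n" }' )
# The captured path already starts with "/", so it is appended to $base directly.
wget -O "$tmpf" "$base$liveurl" &>/dev/null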

score 0 · Accepted Answer

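You are downloading the whole HTML page, not the image. To download the image and the other page elements as well, you need the --page-requisites option (and possibly --convert-links). Unfortunately, because robots.txt disallows access to URLs under /cgi-bin/, wget will not download the image, which is located under /cgi-bin/. AFAIK there is no dedicated command-line option for disabling the robots protocol, although the wgetrc setting can be passed on the command line as -e robots=off.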
