bash - Wget and cURL are not working with Wikipedia

Question

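I'm trying to download the source of a particular Wikipedia article to my computer. However, the wget and curl tools aren't working, and I'm not sure why. Whenever I type something like wget http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters or curl http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters, I get gibberish (the same happens with both curl and wget).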

The first line of the output I get:
??N?????g???????^??L??~???IR?OX/?џ??X???4????b???m??Jk??o߾5E_S???D?xT????y???>??b?C?g?B?#?}????ŏ?Hv?K?dڛ?L˿l?K??,???T?c????n?????F*???'???w??z??d??? ???Y1Id?z?:7C?'W2??(?%>?~ԫ?|~7??4?%qz?r???H?]??P?PH 77I??Z6~{z??UG?~???]?.?#?G?F\????ӓ???8??ߞ?

Any ideas why this happens?

score 3 · Accepted Answer

curl --compressed http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters

For wget, see: http://www.commandlinefu.com/commands/view/7180/get-gzip-compressed-web-page-using-wget
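curl's --compressed option asks the server for a compressed response and has curl decompress it for you. wget does not do that here, so the usual workaround is to decompress the body yourself. A minimal sketch, assuming GNU gzip (whose -d -c -f combination copies non-gzip input through unchanged, like zcat -f); the helper name maybe_gunzip is hypothetical, and the runnable lines use local sample strings rather than fetching the page:

```shell
#!/bin/sh
# Live usage (needs network access, so not run here):
#   wget -q -O - http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters | maybe_gunzip

# maybe_gunzip (hypothetical helper): decompress stdin if it is gzip data,
# otherwise copy it to stdout unchanged. Assumes GNU gzip, where -d -c -f
# together make non-gzip input pass straight through.
maybe_gunzip() {
    gzip -dcf
}

# Plain text passes through untouched...
printf 'plain text\n' | maybe_gunzip
# ...and gzip-compressed data is decoded.
printf 'was compressed\n' | gzip -c | maybe_gunzip
```

Passing the data through unchanged when it is not compressed means the same pipeline keeps working if the server stops sending gzip.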

score 2 · Accepted Answer

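The reason you're getting gzip data is that Wikipedia sends its pages gzip-compressed by default. You can see this if you inspect the response headers (you can do this in a tool such as Fiddler):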

HTTP/1.0 200 OK
Date: Tue, 08 May 2012 03:45:40 GMT
Server: Apache
X-Content-Type-Options: nosniff
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Tue, 08 May 2012 02:33:41 GMT
Content-Length: 83464
Content-Type: text/html; charset=UTF-8
Age: 6415
X-Cache: HIT from cp1008.eqiad.wmnet
X-Cache-Lookup: HIT from cp1008.eqiad.wmnet:3128
X-Cache: MISS from cp1018.eqiad.wmnet
X-Cache-Lookup: MISS from cp1018.eqiad.wmnet:80
Connection: close
Content-Encoding: gzip

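The last line of the headers, Content-Encoding: gzip, is the clue to what you're seeing. So you can stream the output from Wikipedia and pipe it through gunzip to get the response you want.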
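The header-driven approach described above can be sketched as follows, assuming the headers and body have been saved to separate files (curl's -D option dumps the response headers to a file, -o writes the body). The helper decode_body and the file names are hypothetical, and the runnable part uses placeholder files instead of a live request:

```shell
#!/bin/sh
# Live usage (needs network access, so not run here):
#   curl -s -D headers.txt -o body.bin http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters
#   decode_body headers.txt body.bin > page.html

# decode_body HEADERS BODY (hypothetical helper): if the saved headers
# contain "Content-Encoding: gzip", pipe the body through gunzip;
# otherwise print the body as-is.
decode_body() {
    if grep -qi '^content-encoding:[[:space:]]*gzip' "$1"; then
        gunzip -c < "$2"
    else
        cat "$2"
    fi
}

# Offline check with simulated files (placeholder content, not real Wikipedia data).
tmp=$(mktemp -d)
printf 'HTTP/1.0 200 OK\r\nContent-Encoding: gzip\r\n\r\n' > "$tmp/headers.txt"
printf '<html>placeholder</html>\n' | gzip -c > "$tmp/body.bin"
decode_body "$tmp/headers.txt" "$tmp/body.bin"
rm -rf "$tmp"
```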

score 1 · Accepted Answer

I suspect there's a problem with your terminal. Try this:

wget -q -O - http://en.wikipedia.org/wiki/List_of_current_NFL_team_rosters
