
I am building a program that reads web pages. I tried to read

    http://en.wikipedia.org/wiki/France

but I got this response:

    HTTP/1.0 301 Moved Permanently.

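What I can't understand is that the new link (in the Location field) is the same as the one I requested. So where is the new link for the redirected page?

This is the full response: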

    HTTP/1.0 301 Moved Permanently
    Date: Wed, 16 Jan 2013 22:26:03 GMT
    Server: Apache
    X-Content-Type-Options: nosniff
    Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
    Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
    Last-Modified: Wed, 16 Jan 2013 22:26:03 GMT
    Location: http://en.wikipedia.org/wiki/France
    Content-Length: 0
    Content-Type: text/html; charset=utf-8
    X-Cache: MISS from sq64.wikimedia.org
    X-Cache-Lookup: HIT from sq64.wikimedia.org:3128
    Age: 45
    X-Cache: HIT from amssq32.esams.wikimedia.org
    X-Cache-Lookup: HIT from amssq32.esams.wikimedia.org:3128
    X-Cache: MISS from amssq35.esams.wikimedia.org
    X-Cache-Lookup: MISS from amssq35.esams.wikimedia.org:80
    Connection: close

Thanks


Following Eric's answer, I tested my program again.

I sent the following request:

    GET http://www.wikipedia.org/wiki/france HTTP/1.1

The response was:

    HTTP/1.0 301 Moved Permanently
    Date: Thu, 17 Jan 2013 22:36:04 GMT
    Server: Apache
    Location: http://en.wikipedia.org/wiki/france
    Content-Length: 243
    Content-Type: text/html; charset=iso-8859-1
    X-Cache: MISS from sq64.wikimedia.org
    X-Cache-Lookup: MISS from sq64.wikimedia.org:3128
    X-Cache: MISS from amssq45.esams.wikimedia.org
    X-Cache-Lookup: MISS from amssq45.esams.wikimedia.org:3128
    X-Cache: MISS from knsq26.knams.wikimedia.org
    X-Cache-Lookup: MISS from knsq26.knams.wikimedia.org:80
    Connection: close

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <h1>Moved Permanently</h1>
    <p>The document has moved <a href="http://en.wikipedia.org/wiki/france">here</a>.</p>
    </body></html>

That is expected, of course. Next I sent:

    GET http://en.wikipedia.org/wiki/france HTTP/1.1

The response was:

    HTTP/1.0 301 Moved Permanently
    Date: Wed, 16 Jan 2013 22:26:03 GMT
    Server: Apache
    X-Content-Type-Options: nosniff
    Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
    Vary: Accept-Encoding,X-Forwarded-Proto,Cookie
    Last-Modified: Wed, 16 Jan 2013 22:26:03 GMT
    Location: http://en.wikipedia.org/wiki/France
    Content-Length: 0
    Content-Type: text/html; charset=utf-8
    X-Cache: MISS from sq64.wikimedia.org
    X-Cache-Lookup: HIT from sq64.wikimedia.org:3128
    Age: 45
    X-Cache: HIT from amssq32.esams.wikimedia.org
    X-Cache-Lookup: HIT from amssq32.esams.wikimedia.org:3128
    X-Cache: MISS from amssq35.esams.wikimedia.org
    X-Cache-Lookup: MISS from amssq35.esams.wikimedia.org:80
    Connection: close

I tried wget:

    wget.exe http://en.wikipedia.org/wiki/france   

It works! The page loaded:

    wget.exe http://en.wikipedia.org/wiki/france
    --2013-01-18 00:43:06--  http://en.wikipedia.org/wiki/france
    Resolving en.wikipedia.org... 91.198.174.225
    Connecting to en.wikipedia.org|91.198.174.225|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://en.wikipedia.org/wiki/France [following]
    --2013-01-18 00:43:06--  http://en.wikipedia.org/wiki/France
    Reusing existing connection to en.wikipedia.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 854896 (835K) [text/html]
    Saving to: `France'

    100%[======================================>] 854,896      573K/s   in 1.5s

    2013-01-18 00:43:08 (573 KB/s) - `France' saved [854896/854896]

So what is wrong with my program?


2 Answers


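The Location header is where the new address should appear. A redirect that points back to the same URL would make a web browser keep reloading the page until it gives up with a "too many redirects" error.

If I use the URL you gave above, I get a 200 back. If I run wget --server-response http://wikipedia.org/wiki/France, the interesting part is: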

    wget --server-response http://wikipedia.org/wiki/France
    --2013-01-16 18:10:59--  http://wikipedia.org/wiki/France
    Resolving wikipedia.org... 208.80.152.201, 2620:0:860:ed1a::1
    Connecting to wikipedia.org|208.80.152.201|:80... connected.
    HTTP request sent, awaiting response...
      HTTP/1.0 301 Moved Permanently
      Date: Wed, 16 Jan 2013 23:10:59 GMT
      Server: Apache
      Location: http://www.wikipedia.org/wiki/France
      Content-Length: 244
      Content-Type: text/html; charset=iso-8859-1
      X-Pad: avoid browser bug
      X-Cache: MISS from sq65.wikimedia.org
      X-Cache-Lookup: MISS from sq65.wikimedia.org:3128
      X-Cache: MISS from sq64.wikimedia.org
      X-Cache-Lookup: MISS from sq64.wikimedia.org:80
      Connection: keep-alive
    Location: http://www.wikipedia.org/wiki/France [following]
    --2013-01-16 18:10:59--  http://www.wikipedia.org/wiki/France
    Resolving www.wikipedia.org... 208.80.154.225, 2620:0:861:ed1a::1
    Connecting to www.wikipedia.org|208.80.154.225|:80... connected.
    HTTP request sent, awaiting response...
      HTTP/1.0 301 Moved Permanently
      Date: Wed, 16 Jan 2013 23:11:00 GMT
      Server: Apache
      Location: http://en.wikipedia.org/wiki/France
      Content-Length: 243
      Content-Type: text/html; charset=iso-8859-1
      X-Cache: MISS from cp1019.eqiad.wmnet
      X-Cache-Lookup: MISS from cp1019.eqiad.wmnet:3128
      X-Cache: MISS from cp1018.eqiad.wmnet
      X-Cache-Lookup: MISS from cp1018.eqiad.wmnet:80
      Connection: keep-alive
    Location: http://en.wikipedia.org/wiki/France [following]
    --2013-01-16 18:11:00--  http://en.wikipedia.org/wiki/France
    Resolving en.wikipedia.org... 208.80.154.225, 2620:0:861:ed1a::1
    Reusing existing connection to www.wikipedia.org:80.
    HTTP request sent, awaiting response...
      HTTP/1.0 200 OK

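As you can see, wget follows the redirects from wikipedia.org to www.wikipedia.org and finally to en.wikipedia.org. I would recheck your URL and make sure you are not using www.wikipedia.org. If you are not, it must be a temporary error on their servers.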
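For a program that has to do what wget does here, the redirect-following logic might look roughly like the sketch below. It is only an illustration: `follow_redirects` and the `fetch` callable (which takes a URL and returns a status code plus the Location header, or None) are hypothetical names, not part of any library, and the actual HTTP request is left to the caller. It also stops on a Location that points back to an already visited URL, which is the situation a browser reports as "too many redirects".

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical interface: fetch(url) -> (status_code, location_header_or_None).
# How the request is actually sent (sockets, http.client, ...) is up to the caller.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> str:
    """Return the final URL after following redirects.

    Raises RuntimeError on a redirect loop or after max_hops requests.
    """
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            # The server sent us back to a URL we already requested.
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)

        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the final URL

        # Location may be relative; resolve it against the current URL.
        # The path is used exactly as the server sent it (case preserved).
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

Calling it with a `fetch` that reproduces the wget trace above would walk wikipedia.org, then www.wikipedia.org, then en.wikipedia.org, and return the last URL once a non-redirect status comes back.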

answered 2013-01-16T23:08:35.373

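The problem was that my program internally converted the given address to lowercase. As a result, the request used "france" with a lowercase "f" instead of "France" with the uppercase "F" that the 301 response asks for.

I thought URLs were case-insensitive, but now I know:

"While domain names are not case sensitive, the rest of the URL may be" (http://www.wisegeek.com/are-urls-case-sensitive.htm)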
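If the program still needs to normalize addresses, one option is to lowercase only the scheme and the host and leave everything else alone. The sketch below uses Python's standard urllib.parse; `normalize_url` is a hypothetical helper name, not my program's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase only the case-insensitive parts of a URL (scheme and host).

    The path, query and fragment are returned exactly as given, because the
    server may treat them as case-sensitive.
    """
    parts = urlsplit(url)
    # Keep any "user:password@" prefix untouched; only the host:port part
    # of the network location is safe to lowercase.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        parts.path,
        parts.query,
        parts.fragment,
    ))
```

With this, an input such as HTTP://EN.Wikipedia.ORG/wiki/France keeps its /wiki/France path, whereas lowercasing the whole string would turn it into /wiki/france and bring back the 301.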

answered 2013-01-17T23:07:21.850