
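Clearly, a web server must decode any escaped unreserved characters (such as alphanumerics) before it can compare URIs. For example, http://www.example.com/~user/index.htm should be treated as identical to http://www.example.com/%7Euser/index.htm.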

My question is: what should we do with escaped reserved characters?

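Take %2F, the escape for /, as an example. If a request URI contains %2F, should the web server's parser replace it with /? In the example above, that would mean http://www.example.com/~user%2Findex.htm is the same as http://www.example.com/~user/index.htm. However, when I tried this on an Apache server (2.2.17, Unix), it returned a "404 Not Found" error.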
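To see why decoding %2F is not harmless, here is a minimal Python sketch (the helper name `decoded_segments` is hypothetical) that splits the path on literal slashes first and only then percent-decodes each segment. In that order an encoded slash stays data inside its segment; decoding the whole path first makes the two URIs from the example indistinguishable. This illustrates the distinction and is not a description of Apache's internals. For the 404 itself, Apache httpd's `AllowEncodedSlashes` directive defaults to `Off`, and with that setting URLs containing %2F in the path are refused with a 404.

```python
from urllib.parse import unquote, urlsplit


def decoded_segments(url: str) -> list[str]:
    """Split the path on literal '/' first, then percent-decode each segment.

    Splitting before decoding keeps an encoded slash (%2F) inside its
    segment instead of turning it into an extra path separator.
    """
    path = urlsplit(url).path
    return [unquote(segment) for segment in path.split("/")]


plain = "http://www.example.com/~user/index.htm"
encoded = "http://www.example.com/~user%2Findex.htm"

print(decoded_segments(plain))    # ['', '~user', 'index.htm']
print(decoded_segments(encoded))  # ['', '~user/index.htm']

# Decoding the whole path before splitting erases the difference:
print(unquote(urlsplit(plain).path) == unquote(urlsplit(encoded).path))  # True
```

The first URI has two path segments, the second has one segment whose name happens to contain a slash, so they name different resources.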

So does this mean that %2F and other escaped reserved characters should be left alone (at least prior to URI comparison)?

Background:

RFC 2616 (HTTP/1.1) addresses the decoding of escapes in two places:

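The Request-URI is transmitted in the format specified in section 3.2.1. If the Request-URI is encoded using the "% HEX HEX" encoding [42], the origin server MUST decode the Request-URI in order to properly interpret the request. Servers SHOULD respond to invalid Request-URIs with an appropriate status code.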

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

(According to http://trac.tools.ietf.org/wg/httpbis/trac/ticket/2, "unsafe" is a mistake and should be removed from the spec, so here we only consider "reserved".)

For reference, RFC 2396 defines these character classes as:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

unreserved = alphanum | mark

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
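The RFC 2396 classes above translate directly into Python sets. The sketch below also encodes one reading of the RFC 2616 section 3.2.3 rule: an escape counts as equivalent to its literal character unless that character is reserved (the "unsafe" set is ignored, as ticket 2 suggests). The function name `escape_equivalent_2616` is hypothetical, and it assumes its input is exactly one well-formed `%XX` triplet.

```python
import string

# Character classes transcribed from the RFC 2396 grammar.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")
ALPHANUM = set(string.ascii_letters + string.digits)
UNRESERVED = ALPHANUM | MARK


def escape_equivalent_2616(escape: str) -> bool:
    """Return True if a "%XX" escape may be treated as equal to the
    character it encodes, i.e. that character is not reserved.

    Assumes `escape` is exactly '%' followed by two hex digits.
    """
    char = chr(int(escape[1:3], 16))
    return char not in RESERVED


print(escape_equivalent_2616("%7E"))  # True:  '~' is a mark, so %7E == ~
print(escape_equivalent_2616("%2F"))  # False: '/' is reserved
```

Under this reading %7E and ~ are interchangeable while %2F and / are not, which is consistent with treating the two URIs in the question as different resources.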


tl;dr:

Decode percent-encoded unreserved characters,
keep percent-encoded reserved characters.


The URI standard is STD 66, which currently is RFC 3986.

Section 6 is about Normalization and Comparison, where section 6.2.2.2 explains what to do with percent-encoded octets:

These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character […]

As explicitly stated in section 2 (bold emphasis mine):

  • Unreserved characters:

    URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent

  • Reserved characters:

    URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.
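The rule above can be sketched in Python. This is a minimal illustration, assuming the RFC 3986 definition unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~", which is narrower than the RFC 2396 set quoted in the question (RFC 3986 moves !, *, ', ( and ) into the reserved sub-delims). The names `normalize_percent_encoding` and `equivalent` are hypothetical. Besides decoding unreserved escapes (section 6.2.2.2), the sketch uppercases the hex digits of every escape it keeps, as section 6.2.2.1 recommends, so %2f and %2F compare equal. It deliberately skips the other normalizations in section 6, such as lowercasing scheme and host or removing dot segments.

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# One percent-encoded octet: '%' followed by two hex digits.
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri: str) -> str:
    """Decode escapes of unreserved characters; keep all other escapes
    but uppercase their hex digits."""
    def fix(match):
        hex_digits = match.group(1)
        char = chr(int(hex_digits, 16))
        if char in UNRESERVED:
            return char                  # e.g. %7E -> ~
        return "%" + hex_digits.upper()  # e.g. %2f -> %2F, stays encoded

    return PCT_ENCODED.sub(fix, uri)


def equivalent(a: str, b: str) -> bool:
    """Compare two URIs after percent-encoding normalization only."""
    return normalize_percent_encoding(a) == normalize_percent_encoding(b)


base = "http://www.example.com/~user/index.htm"
print(equivalent(base, "http://www.example.com/%7Euser/index.htm"))  # True
print(equivalent(base, "http://www.example.com/~user%2Findex.htm"))  # False
```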

answered 2015-04-15T18:02:25.970