Why does EscapeDataString behave differently between .NET 4 and 4.5? The outputs are

In .NET 4:
Uri.EscapeDataString("-_.!~*'()") => "-_.!~*'()"

In .NET 4.5:
Uri.EscapeDataString("-_.!~*'()") => "-_.%21~%2A%27%28%29"
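The documentation

By default, the EscapeDataString method converts all characters except for RFC 2396 unreserved characters to their hexadecimal representation. If International Resource Identifiers (IRIs) or Internationalized Domain Name (IDN) parsing is enabled, the EscapeDataString method converts all characters, except for RFC 3986 unreserved characters, to their hexadecimal representation. All Unicode characters are converted to UTF-8 format before being escaped.

For reference, unreserved characters are defined as follows in RFC 2396: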
unreserved = alphanum | mark
mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" |
             "(" | ")"
And in RFC 3986:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
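
To make the difference between the two grammars concrete, here is a minimal sketch that lists the mark characters RFC 2396 treats as unreserved but RFC 3986 does not. The class and member names (`UnreservedSets`, `Rfc2396Marks`, `Rfc3986Marks`, `OnlyInRfc2396`) are made up for illustration; the two strings are simply copied from the rules quoted above.

```csharp
using System;
using System.Linq;

static class UnreservedSets
{
    // Mark characters copied from the two grammar rules quoted above
    // (hypothetical constants, not framework members).
    public const string Rfc2396Marks = "-_.!~*'()";
    public const string Rfc3986Marks = "-._~";

    // Characters that are unreserved under RFC 2396 but not under RFC 3986.
    public static string OnlyInRfc2396()
    {
        return new string(
            Rfc2396Marks.Where(c => Rfc3986Marks.IndexOf(c) < 0).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(OnlyInRfc2396()); // prints !*'()
    }
}
```

These five characters are exactly the ones that appear percent-encoded in the .NET 4.5 output and unescaped in the .NET 4 output.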
The source code

It looks like whether EscapeDataString escapes each character is decided roughly like this:
is unicode above \x7F
  ? PERCENT ENCODE
  : is a percent symbol
    ? is an escape char
      ? LEAVE ALONE
      : PERCENT ENCODE
    : is a forced character
      ? PERCENT ENCODE
      : is an unreserved character
        ? LEAVE ALONE
        : PERCENT ENCODE
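
As a rough illustration of that decision, the sketch below is a simplified stand-in, not the framework's implementation. It leaves out the percent-symbol and forced-character branches and keeps only the non-ASCII and unreserved checks, with the set of unreserved marks passed in as a parameter. `EscapeSketch` and `Escape` are hypothetical names.

```csharp
using System;
using System.Text;

static class EscapeSketch
{
    static bool IsAsciiLetterOrDigit(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Hypothetical helper: percent-encodes every UTF-8 byte except ASCII
    // letters, digits and the marks listed in unreservedMarks.
    public static string Escape(string input, string unreservedMarks)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(input))
        {
            char c = (char)b;
            bool keep = b < 0x80
                && (IsAsciiLetterOrDigit(c) || unreservedMarks.IndexOf(c) >= 0);
            if (keep)
                sb.Append(c);                            // unreserved: leave alone
            else
                sb.Append('%').Append(b.ToString("X2")); // everything else: %XX
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Escape("-_.!~*'()", "-_.!~*'()")); // -_.!~*'()
        Console.WriteLine(Escape("-_.!~*'()", "-._~"));      // -_.%21~%2A%27%28%29
    }
}
```

With the RFC 2396 marks the sketch reproduces the .NET 4 output, and with the RFC 3986 marks it reproduces the .NET 4.5 output, which is consistent with the unreserved check being the only step that differs.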
The final "is an unreserved character" check is where the choice between RFC 2396 and RFC 3986 is made. The source code of that method, verbatim, is
internal static unsafe bool IsUnreserved(char c)
{
    if (Uri.IsAsciiLetterOrDigit(c))
    {
        return true;
    }
    if (UriParser.ShouldUseLegacyV2Quirks)
    {
        return (RFC2396UnreservedMarks.IndexOf(c) >= 0);
    }
    return (RFC3986UnreservedMarks.IndexOf(c) >= 0);
}
And that code refers to
private static readonly UriQuirksVersion s_QuirksVersion =
    (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5
     // || BinaryCompatibility.TargetsAtLeast_Silverlight_V6
     // || BinaryCompatibility.TargetsAtLeast_Phone_V8_0
    ) ? UriQuirksVersion.V3 : UriQuirksVersion.V2;

internal static bool ShouldUseLegacyV2Quirks {
    get {
        return s_QuirksVersion <= UriQuirksVersion.V2;
    }
}
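
Since `ShouldUseLegacyV2Quirks` is internal, one way to see which branch a given process takes is to probe the observable behaviour instead. The following is only a sketch under that assumption: `QuirksProbe` and `EscapesRfc2396OnlyMarks` are hypothetical names, and the check relies on nothing more than `!` being unreserved in RFC 2396 but not in RFC 3986.

```csharp
using System;

static class QuirksProbe
{
    // Hypothetical helper: returns true when "!" gets percent-encoded,
    // i.e. when the RFC 3986 unreserved set appears to be in effect.
    public static bool EscapesRfc2396OnlyMarks()
    {
        return Uri.EscapeDataString("!") == "%21";
    }

    static void Main()
    {
        Console.WriteLine(EscapesRfc2396OnlyMarks()
            ? "RFC 3986 unreserved set in effect"
            : "RFC 2396 unreserved set in effect");
    }
}
```

Running this in builds that target different framework versions, with and without IRI/IDN parsing enabled, would show which of the two settings actually changes the result.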
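The confusion

The documentation says the output of EscapeDataString depends on whether IRI/IDN parsing is enabled, whereas the source code says the output is determined by the value of TargetsAtLeast_Desktop_V4_5. Could someone clear this up?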