If we type this into Firefox or Chrome:

http://☃.net/

It takes us to

http://xn--n3h.net/

Which is a mirror of unicodesnowmanforyou.com

What I don't understand is by what rules the Unicode snowman can turn into xn--n3h. It doesn't look anything like UTF-8 or URL encoding.

I think I found a hint while mucking around in Python 3, because:

>>> '☃'.encode('punycode')
b'n3h'

But I still don't understand the xn-- part. How are domain names internationalised, what is the standard, and where is this stuff documented?

1 Answer

It uses an encoding scheme called Punycode (as you've already discovered from your Python testing), which can represent Unicode characters in plain ASCII.

Each label (labels are delimited by dots, so get.me.a.coffee.com has five labels) that contains Unicode characters is encoded in Punycode and prefixed with the string xn--.
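As a rough illustration of that per-label rule, here is a minimal Python sketch. The helper name to_ascii_hostname is made up for this example, and it skips the normalization and validity checks that a real IDNA implementation performs, so treat it as a sketch under those assumptions rather than a complete converter. It needs Python 3.7+ for str.isascii.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Encode each non-ASCII label with Punycode and prefix it with 'xn--'.

    Hypothetical helper for illustration only: no case folding,
    normalization, or length checks are performed.
    """
    encoded_labels = []
    for label in hostname.split("."):
        if label.isascii():
            # Pure-ASCII labels are passed through untouched.
            encoded_labels.append(label)
        else:
            # Python's built-in 'punycode' codec returns the bare encoding;
            # the 'xn--' prefix has to be added separately.
            encoded_labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded_labels)


print(to_ascii_hostname("☃.net"))                # xn--n3h.net
print(to_ascii_hostname("get.me.a.coffee.com"))  # unchanged, all five labels are ASCII
```

This is why the Python test in the question printed only n3h: the punycode codec works on a single label and leaves the prefix out.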

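The label encoding first copies all the ASCII characters, then appends the encoded Unicode characters. The encoded Unicode characters always come after the last - in the label, so one is added after the ASCII characters if needed.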
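The layout of an encoded label can be checked with Python's built-in punycode codec. The sketch below uses a made-up helper, split_punycode, and the label bücher as an assumed example; neither comes from the question.

```python
def split_punycode(label: str):
    """Return (encoded, ascii_part, unicode_part) for a single label.

    Hypothetical helper for illustration: it splits the encoded label
    at its last '-' to show where the copied ASCII characters end.
    """
    encoded = label.encode("punycode").decode("ascii")
    # Everything before the last '-' is the ASCII characters copied as-is;
    # everything after it is the encoded non-ASCII characters.
    ascii_part, _delimiter, unicode_part = encoded.rpartition("-")
    return encoded, ascii_part, unicode_part


print(split_punycode("bücher"))  # ('bcher-kva', 'bcher', 'kva')
print(split_punycode("☃"))       # ('n3h', '', 'n3h'): no ASCII part, so no '-'

# Decoding reverses the process.
print(b"bcher-kva".decode("punycode"))  # bücher
```

With the prefix added, bücher becomes xn--bcher-kva, while the snowman has no ASCII characters to copy and so becomes just xn--n3h.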

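More detail can be found on the W3 site and in RFC 3987. Punycode itself is specified in RFC 3492, and the IDNA rules for domain names in RFC 5890 and its companion documents. For details on how Punycode actually encodes labels, see the Wikipedia page.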

Answered 2015-06-11T02:00:52.987