I have some questions about encoding in Python 2.7.
1. The Python code is as follows:
#s = u"严"
s = u'\u4e25'
print 's is:', s
print 'len of s is:', len(s)
s1 = "a" + s
print 's1 is:', s1
print 'len of s1 is:', len(s1)
The output is:
s is: 严
len of s is: 1
s1 is: a严
len of s1 is: 2
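I am confused: why is len(s) equal to 1? How can 4e25 be stored in 1 byte? I also noticed that UCS-2 is 2 bytes long and UCS-4 is 4 bytes long, so why is the length of the unicode string s equal to 1?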
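To make the question concrete, here is a small sketch that compares what len() reports for the unicode string with the number of bytes in a few encoded forms of it. The choice of encodings is my own for illustration; my understanding is that len() on a unicode object counts characters (code units), not bytes, and the sketch is meant to check that.

```python
# Sketch only: character count of a unicode string versus the byte counts
# of some encoded forms. Written to run on Python 2.7 and Python 3.3+.
s = u'\u4e25'  # the same one-character string as in the question

char_count = len(s)                     # length of the unicode object itself
utf8_len = len(s.encode('utf-8'))       # bytes when encoded as UTF-8
utf16_len = len(s.encode('utf-16-le'))  # bytes as UTF-16, no byte order mark
utf32_len = len(s.encode('utf-32-le'))  # bytes as UTF-32, no byte order mark
gbk_len = len(s.encode('gbk'))          # bytes when encoded as GBK

print('characters: %d' % char_count)
print('utf-8 bytes: %d' % utf8_len)
print('utf-16 bytes: %d' % utf16_len)
print('utf-32 bytes: %d' % utf32_len)
print('gbk bytes: %d' % gbk_len)
```

If that reading is right, the 2-byte and 4-byte figures describe how a character is stored or encoded, while len(s) only says how many characters the string holds.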
2. (1) Create a new file named a.py with Notepad++ (Windows 7) and set the file's encoding to ANSI. The code in a.py is as follows:
# -*- encoding:utf-8 -*-
import sys
print sys.getdefaultencoding()
s = "严"
print "s:", s
print "type of s:", type(s)
The output is:
ascii
s: 严
type of s: <type 'str'>
(2) Create a new file named b.py with Notepad++ (Windows 7) and set the file's encoding to UTF-8. The code in b.py is as follows:
# -*- encoding:gbk -*-
import sys
print sys.getdefaultencoding()
s = "严"
print "s:", s
print "type of s:", type(s)
The output is:
File "D:\pyws\code\\b.py", line 1
SyntaxError: encoding problem: utf-8
(3) Change the file b.py as follows (the file's encoding is UTF-8):
import sys
print sys.getdefaultencoding()
s = "严"
print "s:", s
print "type of s:", type(s)
The output is:
ascii
s: 涓
type of s: <type 'str'>
(4) Change the file a.py as follows (the file's encoding is ANSI):
import sys
print sys.getdefaultencoding()
s = "严"
print "s:", s
print "type of s:", type(s)
The output is:
File "D:\pyws\code\a1.py", line 3
SyntaxError: Non-ASCII character '\xd1' in file D:\pyws\code\a1.py on
line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Why do the outputs differ across these 4 cases in question 2? Can anyone explain this in detail?
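For comparison, here is a minimal sketch of the bytes I believe are involved. It rests on two assumptions that are not confirmed above: that Notepad++'s "ANSI" means GBK (code page 936) on this machine, and that the console decodes printed bytes as GBK. The variable names are only for illustration.

```python
# Sketch only: the byte sequences that the string literal would have on disk
# under each file encoding. ASSUMPTION: "ANSI" here means GBK (cp936).
# Written to run on Python 2.7 and Python 3.3+.
s = u'\u4e25'  # the character used in a.py and b.py

ansi_bytes = s.encode('gbk')    # assumed bytes of the literal in a.py: D1 CF
utf8_bytes = s.encode('utf-8')  # assumed bytes of the literal in b.py: E4 B8 A5

print(repr(ansi_bytes))
print(repr(utf8_bytes))

# ASSUMPTION: a GBK console receives the three UTF-8 bytes unchanged.
# GBK is a double-byte encoding, so the first two bytes pair up into a
# single, different character and the third byte is left over.
garbled = utf8_bytes[:2].decode('gbk')
leftover = utf8_bytes[2:]

print(repr(garbled))
print(repr(leftover))
```

Under those assumptions, the first byte of ansi_bytes is the '\xd1' named in the SyntaxError of case (4), and garbled should be the single wrong character printed in case (3).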