python - Bug with Python UTF-16 output and Windows line endings?

Question

With this code:

test.py

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

print "test1"
print "test2"

Then I run it as:

test.py > test.txt

In Python 2.6 on Windows 2000, I find that each newline is output as the byte sequence \x0D\x0A\x00, which is of course wrong for UTF-16.

Am I missing something, or is this a bug?

score 3 · Accepted Answer

Try this:

import sys
import codecs

if sys.platform == "win32":
    import os, msvcrt
    # Put stdout into binary mode so Windows stops rewriting 0x0A bytes.
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

class CRLFWrapper(object):
    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace("\n", "\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)

sys.stdout = CRLFWrapper(codecs.getwriter('utf-16')(sys.stdout))
print "test1"
print "test2"
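
The wrapper can be checked without a Windows console. The following is a minimal sketch, not part of the original answer: it assumes an in-memory io.BytesIO buffer as a stand-in for a stdout that is already in binary mode, and it uses 'utf-16-le' rather than 'utf-16' so that no BOM is written and the bytes do not depend on the machine's native byte order.

```python
import codecs
import io


class CRLFWrapper(object):
    # Same idea as the wrapper above: expand LF to CR LF before encoding.
    def __init__(self, output):
        self.output = output

    def write(self, s):
        self.output.write(s.replace(u"\n", u"\r\n"))

    def __getattr__(self, key):
        return getattr(self.output, key)


# Assumption: BytesIO plays the role of a binary-mode stdout.
raw = io.BytesIO()
out = CRLFWrapper(codecs.getwriter("utf-16-le")(raw))
out.write(u"test1\n")
out.write(u"test2\n")

result = raw.getvalue()
print(repr(result))
```

Because the replacement happens on the text before it is encoded, CR and LF each become a full two-byte code unit.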

score 3 · Accepted Answer

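The newline translation happens inside the stdout file. You are writing "test1\n" to sys.stdout (a StreamWriter). The StreamWriter translates this to "t\x00e\x00s\x00t\x001\x00\n\x00" and sends it to the real file, the original sys.stdout.

That file does not know that you have converted the data to UTF-16; all it knows is that any \n byte in the output stream needs to be converted to \x0D\x0A, which results in the output you are seeing.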
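
The effect can be reproduced on any platform by imitating the translation by hand. This is only an illustrative sketch: simulate_text_mode is a hypothetical helper that mimics what a Windows text-mode file does to a byte stream, and 'utf-16-le' is used so the bytes are easy to read without a BOM.

```python
def simulate_text_mode(data):
    # Hypothetical stand-in for Windows text mode: every 0x0A byte
    # is replaced by 0x0D 0x0A, with no knowledge of the encoding.
    return data.replace(b"\n", b"\r\n")


encoded = u"test1\n".encode("utf-16-le")    # what the StreamWriter produces
mangled = simulate_text_mode(encoded)       # what ends up in the file
correct = u"test1\r\n".encode("utf-16-le")  # what a valid UTF-16 CRLF looks like

print(repr(mangled))
print(repr(correct))
```

The mangled result contains \x0D\x0A\x00 and has an odd number of bytes, so it is no longer valid UTF-16, while the correct form ends in \x0D\x00\x0A\x00.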

score 0 · Accepted Answer

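So far I have found two solutions, but neither of them produces UTF-16 output with Windows-style line endings.

First, redirect the Python print statement to a file with UTF-16 encoding (this outputs Unix-style line endings):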

import sys
import codecs

sys.stdout = codecs.open("outputfile.txt", "w", encoding="utf16")

print "test1"
print "test2"

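Second, redirect to stdout with UTF-16 encoding, without the line-ending translation corruption (this also outputs Unix-style line endings; thanks to this ActiveState recipe):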

import sys
import codecs

sys.stdout = codecs.getwriter('utf-16')(sys.stdout)

if sys.platform == "win32":
    import os, msvcrt
    # fileno() is forwarded by the StreamWriter to the underlying stdout.
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

print "test1"
print "test2"
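
For the file case, one possible way to get Windows-style line endings is io.open with an explicit newline argument, which expands "\n" at the text layer before the data is encoded. This is a sketch rather than a tested fix for the original Windows 2000 setup: the temporary path is made up for the illustration, and explicit write calls with unicode strings are used because in Python 2 the io text layer accepts only unicode.

```python
import io
import os
import tempfile

# Hypothetical output location, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "outputfile.txt")

# newline="\r\n" makes the text layer turn each "\n" into "\r\n"
# before encoding, so CR and LF are encoded as separate characters.
with io.open(path, "w", encoding="utf-16", newline="\r\n") as f:
    f.write(u"test1\n")
    f.write(u"test2\n")

# Read the raw bytes back to inspect what was actually written.
with open(path, "rb") as f:
    data = f.read()

text = data.decode("utf-16")  # the BOM written by 'utf-16' is consumed here
print(repr(text))
```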
