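How do I convince email.generator.Generator to work with binary files in Python 3.2? This seems like exactly the use case for the policy framework introduced in Python 3.3, but I want my code to run on 3.2.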

from email.parser import Parser
from email.generator import Generator
from io import BytesIO, StringIO

data = "Key: \N{SNOWMAN}\r\n\r\n"
message = Parser().parse(StringIO(data))
with open("/tmp/rfc882test", "w") as out:
    Generator(out, maxheaderlen=0).flatten(message)

This fails with UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 0: ordinal not in range(128)

2 Answers

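Your data is not a valid RFC 2822 header, and I suspect that is what is misleading you. It is a Unicode string, but RFC 2822 headers are always ASCII only. To use non-ASCII characters, you need to encode them with a character set and either base64 or quoted-printable encoding.

So valid code would look like this: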

from email.parser import Parser
from email.generator import Generator
from io import BytesIO, StringIO

data = "Key: =?utf8?b?4piD?=\r\n\r\n"
message = Parser().parse(StringIO(data))
with open("/tmp/rfc882test", "w") as out:
    Generator(out, maxheaderlen=0).flatten(message)

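This of course avoids the error entirely.

The question is then how to generate a header like =?utf8?b?4piD?=. The answer lies in the email.header module.

I produced the example above like this: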

>>> from email import header
>>> header.Header('\N{SNOWMAN}', 'utf8').encode()
'=?utf8?b?4piD?='
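To make the round trip concrete, here is a minimal sketch that encodes a non-ASCII value with email.header.Header and decodes it again with email.header.decode_header. The helper names encode_header_value and decode_header_value are made up for this illustration and are not part of the email package.

```python
from email.header import Header, decode_header

def encode_header_value(text, charset="utf-8"):
    # Hypothetical helper: produce an ASCII-only encoded word for a header value.
    return Header(text, charset).encode()

def decode_header_value(encoded):
    # Hypothetical helper: turn encoded words back into a Unicode string.
    parts = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the charset they declared.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain ASCII headers come back as str with charset None.
            parts.append(chunk)
    return "".join(parts)

encoded = encode_header_value("\N{SNOWMAN}")
decoded = decode_header_value(encoded)
print(encoded)
print(decoded == "\N{SNOWMAN}")
```

The encoded form contains only ASCII characters, so it can be written through a text-mode Generator without triggering the UnicodeEncodeError from the question.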

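For processing files in Key: Value format, the email module is the wrong solution. Handling such files without the email module is easy, and you don't have to work around the limitations of RFC 2822. For example: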

# -*- coding: UTF-8 -*-
import io
import sys
if sys.version_info > (3,):
    def u(s): return s
else:
    def u(s): return s.decode('unicode-escape')

def parse(infile):
    # Collect "Key: Value" lines into a dict, rejecting duplicate keys.
    res = {}

    for line in infile:
        key, value = line.strip().split(': ',1)
        if key in res:
            raise ValueError(u("Key {0} appears twice").format(key))
        res[key] = value
    return res

def generate(outfile, data):
    for key in data:
        outfile.write(u("{0}: {1}\n").format(key, data[key]))


if __name__ == "__main__":
    # Ensure roundtripping:
    data = {u('Key'): u('Value'), u('Foo'): u('Bar'), u('Frötz'): u('Öpöpöp')}
    with io.open('/tmp/outfile.conf', 'wt', encoding='UTF8') as outfile:
        generate(outfile, data)

    with io.open('/tmp/outfile.conf', 'rt', encoding='UTF8') as infile:
        res = parse(infile)

    assert data == res

That code took 15 minutes to write, and it works on both Python 2 and Python 3. If you want line continuations and the like, those are easy to add too.
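As one possible way to add line continuations, here is a sketch written for Python 3 only. It assumes a continuation line is any line that starts with a space or tab, and that continued parts are joined with a single space; both conventions are assumptions for this illustration, as is the function name parse_with_continuations.

```python
import io

def parse_with_continuations(infile):
    """Parse 'Key: Value' lines; indented lines extend the previous value."""
    res = {}
    last_key = None
    for raw in infile:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Skip blank lines instead of failing on them.
            continue
        if line[0] in " \t":
            # Continuation: append to the value of the most recent key.
            if last_key is None:
                raise ValueError("Continuation line before any key")
            res[last_key] += " " + line.strip()
            continue
        key, value = line.split(": ", 1)
        if key in res:
            raise ValueError("Key {0} appears twice".format(key))
        res[key] = value
        last_key = key
    return res

sample = io.StringIO("Key: first part\n  second part\nFoo: Bar\n")
result = parse_with_continuations(sample)
print(result)
```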

There is also a more complete version that supports comments and so on.

answered 2012-08-19T19:16:45.240

A useful solution comes from http://mail.python.org/pipermail/python-dev/2010-October/104409.html

from email.parser import Parser
from email.generator import BytesGenerator

# How do I get surrogateescape from a BytesIO/StringIO?
data = "Key: \N{SNOWMAN}\r\n\r\n" # write this to headers.txt
headers = open("headers.txt", "r", encoding="ascii", errors="surrogateescape")
message = Parser().parse(headers)
with open("/tmp/rfc882test", "wb") as out:
    BytesGenerator(out, maxheaderlen=0).flatten(message)

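This works for a program that wants to read and write a binary Key: value file without caring about the encoding. If you only need to consume the headers as decoded text, and do not need to write them back with Generator(), then Parser().parse(open("headers.txt", "r", encoding="utf-8")) should be sufficient.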
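The question in the code comment above (how to get surrogateescape from an in-memory buffer) can probably be handled with io.TextIOWrapper around a BytesIO. The sketch below tries that, and also tries email.parser.BytesParser, which as far as I know applies the same ASCII plus surrogateescape decoding internally. The helper flatten_to_bytes is a made-up name for this illustration.

```python
import io
from email.parser import Parser, BytesParser
from email.generator import BytesGenerator

# Raw header bytes containing a non-ASCII character encoded as UTF-8.
raw = "Key: \N{SNOWMAN}\r\n\r\n".encode("utf-8")

def flatten_to_bytes(message):
    # Hypothetical helper: serialize a message into an in-memory binary buffer.
    out = io.BytesIO()
    BytesGenerator(out, maxheaderlen=0).flatten(message)
    return out.getvalue()

# Option 1: wrap the binary buffer so undecodable bytes become surrogates.
text_view = io.TextIOWrapper(
    io.BytesIO(raw), encoding="ascii", errors="surrogateescape"
)
via_wrapper = flatten_to_bytes(Parser().parse(text_view))

# Option 2: let BytesParser do the decoding from bytes directly.
via_bytes_parser = flatten_to_bytes(BytesParser().parsebytes(raw))

print(via_wrapper)
print(via_bytes_parser)
```

In both cases the original UTF-8 bytes of the header value should come out unchanged, because BytesGenerator turns the surrogates back into the bytes they stood for.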

answered 2012-08-20T13:31:04.920