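I'm writing some Python 3 code that fetches NNTP messages, parses the headers, and processes the data. It works fine for the first few hundred messages, and then an exception is raised.
The exception is: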
sys.exc_info()
(<class 'UnicodeDecodeError'>, UnicodeDecodeError('utf-8', b"Ana\xefs's", 3, 4, 'invalid continuation byte'), <traceback object at 0x7fe325261c08>)
The problem comes from trying to parse the subject. The raw content of the message is:
{'subject': 'Re: Mme. =?UTF-8?B?QW5h73Mncw==?= Computer Died', 'from': 'Fred Williams <unclefred@webruler.com>', 'date': 'Sun, 05 Aug 2007 18:55:22 -0400', 'message-id': '<13bclaqcdot4s55@corp.supernews.com>', 'references': '<mq0cb35ci3tv53hnahmnognh2rauqpveqb@4ax.com>', ':bytes': '1353', ':lines': '14', 'xref': 'number1.nntp.dca.giganews.com rec.pets.cats.community:171958'}
That ?UTF-8? part is what I don't know how to handle. The snippet that chokes is:
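import nntplib

# overviews: the (article number, overview dict) pairs fetched from the server
for msgId, msg in overviews:
    print(msgId)
    hdrs = {}
    if msgId == 171958:
        # Dump the raw overview of the message that triggers the failure
        print(msg)
    try:
        for k in msg.keys():
            hdrs[k] = nntplib.decode_header(msg[k])
    except UnicodeDecodeError:
        # Raised when a header's bytes do not match its declared charset
        print('Unicode error!')
        continue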
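
For reference, here is a minimal sketch of what appears to be going on, plus one possible workaround. The `=?UTF-8?B?...?=` token is an RFC 2047 encoded word: a declared charset (`UTF-8`), an encoding flag (`B` for base64), and the payload. Decoding the payload by hand gives the same bytes that appear in the exception, and `\xef` is "ï" in Latin-1, so the sender most likely mislabelled Latin-1 text as UTF-8. The helper name `decode_header_lenient` and the choice of `latin-1` as the fallback are my own assumptions, not part of `nntplib`; the real charset could just as well be something like cp1252.

```python
import base64
from email.header import decode_header

subject = 'Re: Mme. =?UTF-8?B?QW5h73Mncw==?= Computer Died'

# The base64 payload of the encoded word, decoded by hand.
payload = base64.b64decode('QW5h73Mncw==')  # the bytes shown in the traceback


def decode_header_lenient(raw, fallback='latin-1'):
    """Hypothetical helper: decode an RFC 2047 header, and fall back to
    another charset (an assumption, latin-1 by default) when the declared
    charset does not match the actual bytes."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # Headers without any encoded word come back as plain str.
            parts.append(chunk)
            continue
        try:
            parts.append(chunk.decode(charset or 'ascii'))
        except (UnicodeDecodeError, LookupError):
            # Declared charset is wrong or unknown: guess instead of failing.
            parts.append(chunk.decode(fallback, errors='replace'))
    return ''.join(parts)


print(decode_header_lenient(subject))
```

In the loop above, this helper could stand in for `nntplib.decode_header(msg[k])`, so that one mislabelled header no longer causes the whole message to be skipped. The trade-off is that the fallback is only a guess, so some subjects may come out with the wrong characters.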