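I am using the code below to export all the emails in a specific Gmail folder.

It works well in that it pulls out all the emails I expect, but it (or I) seems to be mangling the encoding of the CR / newline characters.

Code: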

import imaplib
import email
import codecs
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myUser@gmail.com', 'myPassword')  #user / password
mail.list()
mail.select("myFolder") # connect to folder with matching label

result, data = mail.uid('search', None, "ALL") # search and return uids instead
i = len(data[0].split())

for x in range(i):
    latest_email_uid = data[0].split()[x]
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    raw_email = email_data[0][1]
    email_message = email.message_from_string(raw_email)
    save_string = "C:\\googlemail\\boxdump\\email_" + str(x) + ".eml"  # set to save location
    myfile = open(save_string, 'a')
    myfile.write(str(email_message))  # write() needs a string, not a Message object
    myfile.close()

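My problem is that by the time I get to the object, it is littered with '=0A' everywhere, which I assume is a newline or carriage return marker being interpreted incorrectly.

I can find it in hex, [d3 03 03 0a], but because this is not a "character" I can't find a way for str.replace() to take the parts out. I don't actually want the newline markers.

I could convert the whole string to hex and do some kind of replace / regex on it, but that seems like overkill when the problem is in the encoding / reading of the source data.

What I see: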

====
CAUTION:  This email message and any attachments con= tain information that may be confidential and may be LEGALLY PRIVILEGED. If yo= u are not the intended recipient, any use, disclosure or copying of this messag= e or attachments is strictly prohibited. If you have received this email messa= ge in error please notify us immediately and erase all copies of the message an= d attachments. Thank you.
==== 

What I want:

====
CAUTION:  This email message and any attachments contain information that may be confidential and may be LEGALLY PRIVILEGED. If you are not the intended recipient, any use, disclosure or copying of this message or attachments is strictly prohibited. If you have received this email message in error please notify us immediately and erase all copies of the message and attachments. Thank you.
====  

2 Answers

What you are looking at is Quoted Printable encoding.

Try changing:

email_message = email.message_from_string(raw_email)

to:

email_message = str(email.message_from_string(raw_email)).decode("quoted-printable")

For details, see Standard Encodings in the Python codecs module.

Answered 2011-12-06T01:41:27.817
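
The one-liner above relies on Python 2, where `str` has a `.decode()` method. On Python 3 that method no longer exists on `str`, so here is a minimal sketch of the same idea using the standard-library `quopri` module. The sample bytes are made up for illustration: they contain one soft line break (an `=` at the end of a line) and one encoded newline (`=0A`), which are the two patterns described in the question.

```python
import quopri

# Hypothetical sample, not taken from a real message: a quoted-printable body
# with a soft line break ("=" at end of line) and an encoded newline ("=0A").
encoded = b"any attachments con=\ntain information.=0AThank you."

# decodestring() works on bytes: it drops soft line breaks and turns =XX
# escapes back into the bytes they stand for.
decoded_bytes = quopri.decodestring(encoded)

# The sample is plain ASCII; a real message should use the charset it declares.
decoded = decoded_bytes.decode("ascii")
print(decoded)
```

Note that this decodes a body that is already known to be quoted-printable. Running it over an entire raw message would also touch the headers and any parts that use a different transfer encoding.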

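Just 2 additional points, having thought this over after a day of pain with it.

1. Do this at the payload level, so that you can still process email_message to get the email addresses etc. out of your mail.

2. You also need to decode the character set. I had trouble when people copied and pasted HTML from web pages, content from Word documents and the like into the emails I was then trying to process.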

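text = ''
maintype = email_message.get_content_maintype()
if maintype == 'multipart':
    for part in email_message.get_payload():
        if part.get_content_type() == 'text/plain':
            # Python 2: undo quoted-printable first, then decode the part's charset
            text += part.get_payload().decode("quoted-printable").decode(part.get_content_charset())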
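
For readers on Python 3, the same two points (decode per payload, then decode the charset) can be sketched with `get_payload(decode=True)`, which undoes the part's Content-Transfer-Encoding and returns bytes. This is only an illustration under stated assumptions: the helper name `extract_plain_text`, the sample message and the UTF-8 fallback for parts that declare no charset are all made up here and are not part of the original answer.

```python
import email

# Hypothetical sample message: one text/plain part, quoted-printable,
# declared as iso-8859-1, with a soft line break in the body.
raw_email = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/alternative; boundary=\"sep\"\r\n"
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain; charset=\"iso-8859-1\"\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"caf=E9 con=\r\n"
    b"tain\r\n"
    b"--sep--\r\n"
)


def extract_plain_text(raw_bytes):
    """Return the decoded text of every text/plain part in a raw message."""
    message = email.message_from_bytes(raw_bytes)
    chunks = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True reverses quoted-printable / base64 and yields bytes.
        payload = part.get_payload(decode=True)
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        chunks.append(payload.decode(charset, errors="replace"))
    return "".join(chunks)


text = extract_plain_text(raw_email)
print(text)
```

Because the headers are never passed through the decoder, `message` can still be used for things like `message["From"]` after the body text has been extracted.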

Hope this helps someone!

Dave

Answered 2013-07-01T10:52:04.700