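I am using the following code to export all the emails in a specific Gmail folder.
It works well in that it pulls out all the emails I expect, but it (or I) seems to mangle the encoding of the CR / newline characters.
Code: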
import imaplib
import email

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myUser@gmail.com', 'myPassword')  # user / password
mail.list()
mail.select("myFolder")  # connect to folder with matching label
result, data = mail.uid('search', None, "ALL")  # search and return uids instead
uids = data[0].split()  # split the space-separated uid list once
for x, latest_email_uid in enumerate(uids):
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    raw_email = email_data[0][1]  # raw message bytes as returned by the server
    email_message = email.message_from_bytes(raw_email)
    save_string = "C:\\googlemail\\boxdump\\email_" + str(x) + ".eml"  # set to save location
    with open(save_string, 'a') as myfile:
        # write() needs a string, so serialize the Message object first
        myfile.write(email_message.as_string())
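My problem is that by the time I get to the object, it is full of '= 0A', which I assume are incorrectly interpreted newline or carriage-return markers.
I can find it in hex, [d3 03 03 0a], but because these are not "characters" I can't find a way to get str.replace() to take the parts out. I don't actually want the newline markers.
I could convert the whole string to hex and do some kind of replace or regex on it, but that seems like overkill when the problem lies in the encoding / reading of the source data.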
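For reference, one possible explanation (an assumption, nothing above confirms it) is that the body is quoted-printable encoded. In that encoding, "=" at the end of a line is a soft line break and "=0A" is an encoded line feed, so decoding would be the fix rather than replacing. A minimal sketch with the standard-library quopri module; the function name decode_quoted_printable and the sample bytes are made up for illustration:

```python
import quopri


def decode_quoted_printable(data: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes and return text.

    Assumes the payload really is quoted-printable and that `charset`
    matches the charset declared in the message headers.
    """
    return quopri.decodestring(data).decode(charset)


if __name__ == "__main__":
    # Hypothetical sample: a soft line break ("=" + CRLF) and an encoded LF ("=0A").
    sample = b"any attachments con=\r\ntain information.=0ASecond line."
    print(decode_quoted_printable(sample))
```

This only illustrates the encoding itself; it does not know about headers, multipart messages, or the declared charset.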
What I see:
====
CAUTION: This email message and any attachments con= tain information that may be confidential and may be LEGALLY PRIVILEGED. If yo= u are not the intended recipient, any use, disclosure or copying of this messag= e or attachments is strictly prohibited. If you have received this email messa= ge in error please notify us immediately and erase all copies of the message an= d attachments. Thank you.
====
What I want:
====
CAUTION: This email message and any attachments contain information that may be confidential and may be LEGALLY PRIVILEGED. If you are not the intended recipient, any use, disclosure or copying of this message or attachments is strictly prohibited. If you have received this email message in error please notify us immediately and erase all copies of the message and attachments. Thank you.
====
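A rough sketch of how text like the "what I want" sample might be obtained, assuming the stray "=" signs come from a quoted-printable Content-Transfer-Encoding (not verified against the real messages). It lets the standard-library email package undo the transfer encoding instead of patching the string afterwards. The helper name extract_text_body and the sample message are hypothetical:

```python
import email
from email import policy


def extract_text_body(raw_email: bytes) -> str:
    """Return the decoded text body of a raw RFC 822 message.

    With policy.default the parser builds an EmailMessage, whose
    get_content() reverses the Content-Transfer-Encoding and charset.
    """
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    # Prefer a text/plain part, fall back to text/html if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    return body.get_content()


if __name__ == "__main__":
    # Hypothetical message with a quoted-printable soft line break.
    sample = (
        b"Subject: test\n"
        b"Content-Type: text/plain; charset=utf-8\n"
        b"Content-Transfer-Encoding: quoted-printable\n"
        b"\n"
        b"This email message and any attachments con=\n"
        b"tain information.\n"
    )
    print(extract_text_body(sample))
```

In the loop from the question, raw_email could be passed to this helper in place of writing the serialized message. That changes what is saved (decoded body text rather than a full .eml), so whether it fits depends on what the dump is for.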