
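I'm trying to figure out how to get only the text part of an email. With the code below I can get the body, but it is always followed by the email's HTML, which I don't need. How do I tell my script to ignore the HTML?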

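import imaplib
import email

def extract_body(payload):
    # A non-multipart message's payload is a str; a multipart
    # message's payload is a list of sub-messages, so recurse into it.
    if isinstance(payload, str):
        return payload
    else:
        return '\n'.join([extract_body(part.get_payload()) for part in payload])

conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
conn.login("username", "password")
conn.select()
typ, data = conn.search(None, 'UNSEEN')
try:
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                # Python 3: the fetched message is bytes, so parse it with
                # message_from_bytes (message_from_string expects a str).
                msg = email.message_from_bytes(response_part[1])
                subject = msg['subject']
                print(subject)
                payload = msg.get_payload()
                body = extract_body(payload)  # joins text/plain AND text/html parts
                print(body)
        # Mark the message as read on the server.
        typ, response = conn.store(num, '+FLAGS', r'(\Seen)')
finally:
    try:
        conn.close()
    except imaplib.IMAP4.error:
        pass  # close() fails if no mailbox is selected
    conn.logout()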

1 Answer


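You are calling get_payload() on every item of the multipart container and joining them all together, so the text/html part ends up in the output alongside the text/plain part. Instead, iterate over each payload in the multipart container and pick only the one whose Content-Type is the one you are looking for (text/plain, in your case).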
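A minimal sketch of that approach, assuming Python 3.5+ and a message already parsed with email.message_from_bytes. The helper name extract_text_body is hypothetical; it uses Message.walk() to visit every part, keeps only non-attachment text/plain leaves, and decodes their transfer encoding and charset:

```python
import email
from email.message import Message


def extract_text_body(msg: Message) -> str:
    """Hypothetical helper: return only the text/plain parts of msg."""
    texts = []
    # walk() yields the message itself and then every subpart, depth first.
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        if part.get_content_type() != "text/plain":
            continue  # skip text/html, images, and other content types
        if part.get_content_disposition() == "attachment":
            continue  # skip .txt files attached to the mail
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if raw is None:
            continue
        charset = part.get_content_charset() or "utf-8"  # assume UTF-8 if unset
        texts.append(raw.decode(charset, errors="replace"))
    return "\n".join(texts)


if __name__ == "__main__":
    # A tiny multipart/alternative message standing in for response_part[1].
    raw = (
        b"MIME-Version: 1.0\r\n"
        b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
        b"\r\n"
        b"--XYZ\r\n"
        b'Content-Type: text/plain; charset="utf-8"\r\n'
        b"\r\n"
        b"Hello in plain text\r\n"
        b"--XYZ\r\n"
        b'Content-Type: text/html; charset="utf-8"\r\n'
        b"\r\n"
        b"<p>Hello in HTML</p>\r\n"
        b"--XYZ--\r\n"
    )
    print(extract_text_body(email.message_from_bytes(raw)))
```

In the question's loop this would replace the two lines that call get_payload() and extract_body(payload) with body = extract_text_body(msg). On Python 3.6+, parsing with policy=email.policy.default gives an EmailMessage, whose get_body(preferencelist=('plain',)) returns the preferred body part directly; the manual walk above is shown because it also works with the default compat32 parser.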

answered 2012-10-05T02:12:55.223