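This is a self-answered question, and I'd value any input, comments or code review. I think it works, but I'm not certain. My tests seem to produce valid results, but email is a subtle and complex beast, and I'm not sure my logic is sound. If you want to test it, save a raw email file and put its filename in the code (it should be obvious where). Is there a better way? If so, I'd love to hear it.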
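Testing needs a raw email file. Instead of saving a real one, you can build a reproducible test message with the standard library's `email.mime` classes. The sketch below is an assumption, not part of the original code: the helper name `write_test_eml` and the message layout (an HTML body in `multipart/alternative` plus one PDF attached alongside it in `multipart/mixed`) are hypothetical. By construction the message has exactly one user-added attachment.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def write_test_eml(path):
    """Hypothetical helper: write an HTML email with one user attachment to
    `path` and return the raw message text."""
    outer = MIMEMultipart('mixed')
    outer['Subject'] = 'attachment count test'
    # message body: plain-text and HTML renderings of the same content
    body = MIMEMultipart('alternative')
    body.attach(MIMEText('Hello', 'plain'))
    body.attach(MIMEText('<p>Hello</p>', 'html'))
    outer.attach(body)
    # one user-added attachment, a sibling of the body inside multipart/mixed
    pdf = MIMEApplication(b'%PDF-1.4 placeholder', _subtype='pdf')
    pdf.add_header('Content-Disposition', 'attachment', filename='report.pdf')
    outer.attach(pdf)
    raw = outer.as_string()
    with open(path, 'w') as f:
        f.write(raw)
    return raw
```

Calling `write_test_eml('xxx.eml')` produces a file you can point the script at. Changing the layout (for example, adding a second attachment or using a plain-text body) gives you further cases with a known expected count.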
Python 2.7 code:
```python
import email

filename = 'xxx.eml'
with open(filename, 'rb') as f:
    msg = email.message_from_file(f)

# count the number of attachments in an email
# this determines the 'real' attachments, i.e. those that a user might have attached to the email
# it does not include the parts that make up the message content itself
totalattachments = 0
firsttextattachmentseen = False
lastseenboundary = ''

# .walk() steps depth-first through every part of the email, including multipart containers and attachments
for part in msg.walk():
    if part.is_multipart():
        # this is a multipart container (its children are separated by a boundary), not an attachment,
        # so we record its type as the last seen boundary and continue to the next part
        lastseenboundary = part.get_content_type()
        continue
    if lastseenboundary == 'multipart/alternative':
        # for HTML emails, the multipart/alternative part contains the HTML and its alternative
        # plain-text representation, so we skip anything within the multipart/alternative boundary
        continue
    if part.get_content_type() == 'text/plain':
        # if this is a plain-text email, then the first text part is the message body,
        # so we do not count it as an attachment; later text parts are counted
        if not firsttextattachmentseen:
            firsttextattachmentseen = True
        else:
            totalattachments += 1
        continue
    # any other part we encounter we assume is a user-added attachment
    totalattachments += 1

# Python 2.7's print statement would show a tuple for print(a, b, c), so format a single string instead
print('%d: %s' % (totalattachments, filename))
```
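One thing worth checking in the loop above is that `lastseenboundary` is only updated when `walk()` reaches another multipart container. It is never reset when the walk leaves a `multipart/alternative`. In a typical HTML email laid out as `multipart/mixed[alternative[plain, html], pdf]`, `walk()` yields mixed, alternative, plain, html, pdf in that order. The PDF is therefore still seen while `lastseenboundary == 'multipart/alternative'` and is skipped.

The sketch below keeps the same heuristic but decides each part's role from its actual position in the MIME tree, using recursion over `get_payload()` instead of a flat `walk()`. It is one possible approach, not a definitive one. The function and helper names are hypothetical. It also makes one deliberate change, which is an assumption on my part: the whole `multipart/alternative` subtree is treated as the body, so a later `text/plain` file is counted as an attachment rather than mistaken for the body.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def count_attachments(msg):
    """Count user-added attachments with the question's heuristic, but decide
    each part's role from its real position in the MIME tree."""
    state = {'body_seen': False}  # shared flag: has the message body been found?

    def visit(part):
        ctype = part.get_content_type()
        if ctype == 'multipart/alternative':
            # the whole alternative subtree (text + HTML, and anything nested
            # inside it) is treated as the message body, so do not descend
            state['body_seen'] = True
            return 0
        if part.is_multipart():
            # mixed/related/etc.: sum over the direct children only
            return sum(visit(sub) for sub in part.get_payload())
        if ctype == 'text/plain' and not state['body_seen']:
            # first plain-text leaf outside an alternative block is the body
            state['body_seen'] = True
            return 0
        # any other leaf is assumed to be a user-added attachment
        return 1

    return visit(msg)


def build_html_email_with_pdf():
    """Hypothetical test message: multipart/mixed[alternative[plain, html], pdf]."""
    outer = MIMEMultipart('mixed')
    body = MIMEMultipart('alternative')
    body.attach(MIMEText('Hello', 'plain'))
    body.attach(MIMEText('<p>Hello</p>', 'html'))
    outer.attach(body)
    pdf = MIMEApplication(b'%PDF-1.4 placeholder', _subtype='pdf')
    pdf.add_header('Content-Disposition', 'attachment', filename='report.pdf')
    outer.attach(pdf)
    # round-trip through text so the parser, not the builder, produces the tree
    return email.message_from_string(outer.as_string())


msg = build_html_email_with_pdf()
print(count_attachments(msg))
```

For this message the sketch counts the PDF and prints 1. Running the question's `walk()` loop over the same parsed message gives 0. Like the original, it still counts inline images in a top-level `multipart/related` as attachments. Checking each leaf's `Content-Disposition` header would be one way to refine that.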