linux - procmail recipe to remove footer

Question

I'm having some trouble writing a procmail recipe.

Here is what I have so far:

   :0
   * ^X-Loop: myemail@gmail\.com
   /dev/null

   :0
   # filtering email by number 60
   * ^Subject:.*(60)
   {
     :0c:
     ${DEFAULT}

     #trying to take out input from the body
     :0fb
     | head -10

     #Forward it to the other folder
     :0
     mytest/
   }

The problem shows up when procmail reads the email body. It produces output like this:

   +96szV6aBDlD/F7vuiK8fUYVknMQPfPmPNikB+fdYLvbwsv9duz6HQaDuwhGn6dh9w2U
   1sABcykpdyfWqWhLt5RzCqppYr5I4yCmB1CNOKwhlzI/w8Sx1QTzGT32G/ERTlbr91BM VmNQ==
   MIME-Version: 1.0
   Received: by 10.52.97.41 with SMTP id dx9mr14500007vdb.89.1337845760664; Thu,
   24 May 2012 00:49:20 -0700 (PDT)
   Received: by 10.52.34.75 with HTTP; Thu, 24 May 2012 00:49:20 -0700 (PDT)
   Date: Thu, 24 May 2012 15:49:20 +0800
   Message-ID: <CAE1Fe-r4Lid+YSgFTQdpsniE_wzeGjETWLLJJxat+HK94u1=AQ@mail.gmail.com>
   Subject: 60136379500
   From: my email <my email@gmail.com>
   To: your email <your email@gmail.com>
   Content-Type: multipart/alternative; boundary=20cf307f380654240604c0c37d07

   --20cf307f380654240604c0c37d07
   Content-Type: text/plain; charset=ISO-8859-1

   hi
   there
   how
   are
   you

   --20cf307f380654240604c0c37d07
   +96szV6aBDlD/F7vuiK8fUYVknMQPfPmPNikB+fdYLvbwsv9duz6HQaDuwhGn6dh9w2U
   1sABcykpdyfWqWhLt5RzCqppYr5I4yCmB1CNOKwhlzI/w8Sx1QTzGT32G/ERTlbr91BM VmNQ==

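I managed to get the output, but it doesn't work if the sender writes fewer than 3 lines, because the output then also includes the email's footer (since it falls within the range of head -10).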

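I only want the email body to be filtered in procmail (and printed to a text file). Is that possible? Can anyone point me in the right direction? I'm at my wits' end. Thanks.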

score 1 · Accepted Answer

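Attempting to treat a MIME multipart message as just a lump of text is fraught with peril. To process the body properly, you should use a MIME-aware tool. But if you just want to assume that the first part is a text part and drop all the other parts, you can create something fairly simple and robust.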

# Truncate everything after first body part:
# Change second occurrence of --$MATCH to --$MATCH--
# and trim anything after it
:0fb
* ^Content-type: multipart/[a-z]+; boundary="\/[^"]+
| sed -e "1,/^--$MATCH$/b" -e "/^--$MATCH$/!b" -e 's//&--/' -eq

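For elegance points, you could perhaps extend the script to perform your 10-line body truncation at the same time, but at the very least this should get you started. (I would switch to awk or Perl at this point.)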

:0fb
* ^Content-type: multipart/[a-z]+; boundary="\/[^"]+
| awk -v "b=--$MATCH" ' \
    ($0 == b || $0 == b "--") && seen++ { print b "--"; exit } \
    !seen || p++ < 10'

Properly, the headers of the MIME part should not count toward the line count.
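
One way to leave the part headers out of the count is sketched below as a standalone shell function. It is an illustration, not part of the recipe above: the name first_part_text, the second argument, and the sample boundary BOUNDARY42 are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: keep the first MIME part, count only its body lines,
# and close the multipart at the second boundary.
# $1: boundary string (what procmail puts in $MATCH)
# $2: maximum number of body lines to keep (defaults to 10)
first_part_text() {
    awk -v "b=--$1" -v "max=${2:-10}" '
        $0 == b || $0 == b "--" {
            if (seen++) { print b "--"; exit }  # second boundary: terminate and stop
            in_headers = 1                      # first boundary: part headers follow
            print; next
        }
        !seen      { print; next }              # preamble before the first boundary
        in_headers {                            # part headers and the blank separator
            print
            if ($0 == "") in_headers = 0
            next
        }
        kept++ < max                            # body lines: print at most max of them
    '
}

# Sample body: keep at most 2 lines of the first part.
printf '%s\n' \
    '--BOUNDARY42' \
    'Content-Type: text/plain; charset=ISO-8859-1' \
    '' \
    'hi' \
    'there' \
    'how' \
    '--BOUNDARY42' \
    'Content-Type: text/html' \
    '' \
    '<p>hi there how</p>' \
    '--BOUNDARY42--' |
first_part_text BOUNDARY42 2
```

Saved as an executable script, something like this could replace the awk one-liner in the pipe action, with "$MATCH" passed as the first argument.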

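This is slightly speculative; I assume that by "footer" you mean the ugly base64-encoded attachment after the first body part. And of course, this recipe does nothing at all for single-part messages. Maybe you want to fall back to your original recipe for those.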

score 0 · Accepted Answer

I had a similar problem recently and solved it with this (adapted to the OP)...

#trying to take out input from the body
:0fb
| sed -n '/^Content-Type/,/^--/ { /^Content-Type/b; /^--/b; p }'

Explanation: the general form is...

sed -n '/begin/,/end/ { /begin/b; /end/b; p }'

-n:         --> turn automatic printing off
/begin/     --> start of the pattern range (the remaining commands only apply inside the range)
,/end/      --> , end of the sed pattern range
{ /begin/b; --> /b branch causes lines matching /begin/ to skip the remaining commands
/end/b;     --> (same as above), these lines will skip the upcoming (p)rint command
p }'        --> prints the lines inside the range that made it to this command
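
To see what the range does on a body shaped like the one in the question, here is a small sketch. The wrapper name first_text_block and the sample boundary are made up for illustration, and the semicolon-separated form inside the braces assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command from this answer.
first_text_block() {
    sed -n '/^Content-Type/,/^--/ { /^Content-Type/b; /^--/b; p }'
}

# Sample body: one text part, then a boundary followed by base64 residue.
printf '%s\n' \
    '--BOUNDARY42' \
    'Content-Type: text/plain; charset=ISO-8859-1' \
    '' \
    'hi' \
    'there' \
    '--BOUNDARY42' \
    'QUJDRA==' |
first_text_block
```

Only the lines strictly between the Content-Type header and the next line starting with -- are printed, including the blank separator line after the header. Note that a later part carrying its own Content-Type header would start the range again, so its lines would be printed as well.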
