5
    #!/usr/bin/php -q
    <?php
    $savefile = "savehere.txt";
    $sf = fopen($savefile, 'a') or die("can't open file");
    ob_start();

    // read from stdin
    $fd = fopen("php://stdin", "r");
    $email = "";
    while (!feof($fd)) {
        $email .= fread($fd, 1024);
    }
    fclose($fd);
    // handle email
    $lines = explode("\n", $email);

    // empty vars
    $from = "";
    $subject = "";
    $headers = "";
    $message = "";
    $splittingheaders = true;

    for ($i=0; $i < count($lines); $i++) {
        if ($splittingheaders) {
            // this is a header
            $headers .= $lines[$i]."\n";

            // look out for special headers
            if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
                $subject = $matches[1];
            }
            if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
                $from = $matches[1];
            }
            if (preg_match("/^To: (.*)/", $lines[$i], $matches)) {
                $to = $matches[1];
            }
        } else {
            // not a header, but message
            $message .= $lines[$i]."\n";




        }

        if (trim($lines[$i])=="") {
            // empty line, header section has ended
            $splittingheaders = false;
        }
    }
/*$headers is ONLY included in the result at the last section of my question here*/
    fwrite($sf,"$message");
    ob_end_clean();
    fclose($sf);
    ?>

这是我尝试的一个例子。问题是我在文件中得到了太多。这是写入文件的内容:(如您所见,我刚刚向它发送了一堆垃圾)

From xxxxxxxxxxxxx Tue Sep 07 16:26:51 2010
Received: from xxxxxxxxxxxxxxx ([xxxxxxxxxxx]:3184 helo=xxxxxxxxxxx)
    by xxxxxxxxxxxxx with esmtpa (Exim 4.69)
    (envelope-from <xxxxxxxxxxxxxxxx>)
    id 1Ot4kj-000115-SP
    for xxxxxxxxxxxxxxxxxxx; Tue, 07 Sep 2010 16:26:50 -0400
Message-ID: <EE3B7E26298140BE8700D9AE77CB339D@xxxxxxxxxxx>
From: "xxxxxxxxxxxxx" <xxxxxxxxxxxxxx>
To: <xxxxxxxxxxxxxxxxxxxxx>
Subject: stackoverflow is helping me
Date: Tue, 7 Sep 2010 16:26:46 -0400
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_0169_01CB4EA9.773DF5E0"
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-Mailer: Microsoft Windows Live Mail 14.0.8089.726
X-MIMEOLE: Produced By Microsoft MimeOLE V14.0.8089.726

This is a multi-part message in MIME format.

------=_NextPart_000_0169_01CB4EA9.773DF5E0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

111
222
333
444
------=_NextPart_000_0169_01CB4EA9.773DF5E0
Content-Type: text/html;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content=3Dtext/html;charset=3Diso-8859-1 =
http-equiv=3DContent-Type>
<META name=3DGENERATOR content=3D"MSHTML 8.00.6001.18939"></HEAD>
<BODY style=3D"PADDING-LEFT: 10px; PADDING-RIGHT: 10px; PADDING-TOP: =
15px"=20
id=3DMailContainerBody leftMargin=3D0 topMargin=3D0 =
CanvasTabStop=3D"true"=20
name=3D"Compose message area">
<DIV><FONT face=3DCalibri>111</FONT></DIV>
<DIV><FONT face=3DCalibri>222</FONT></DIV>
<DIV><FONT face=3DCalibri>333</FONT></DIV>
<DIV><FONT face=3DCalibri>444</FONT></DIV></BODY></HTML>

------=_NextPart_000_0169_01CB4EA9.773DF5E0--

我在四处搜索时发现了这一点,但不知道如何实现或在我的代码中插入的位置,或者它是否可以工作。

preg_match("/boundary=\".*?\"/i", $headers, $boundary);
$boundaryfulltext = $boundary[0];

if ($boundaryfulltext!="")
{
$find = array("/boundary=\"/i", "/\"/i");
$boundarytext = preg_replace($find, "", $boundaryfulltext);
$splitmessage = explode("--" . $boundarytext, $message);
$fullmessage = ltrim($splitmessage[1]);
preg_match('/\n\n(.*)/is', $fullmessage, $splitmore);

if (substr(ltrim($splitmore[0]), 0, 2)=="--")
{
$actualmessage = $splitmore[0];
}
else
{
$actualmessage = ltrim($splitmore[0]);
}

}
else
{
$actualmessage = ltrim($message);
}

$clean = array("/\n--.*/is", "/=3D\n.*/s");
$cleanmessage = trim(preg_replace($clean, "", $actualmessage)); 

那么,我怎样才能将电子邮件的纯文本区域放入我的文件或脚本中以进行进一步处理?

提前致谢。堆栈溢出很棒!

4

2 回答 2

18

为了隔离电子邮件正文的纯文本部分,您必须采取四个步骤:

1.获取MIME边界字符串

我们可以使用正则表达式来搜索您的标题(假设它们在一个单独的变量中$headers):

$matches = array();
preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $headers, $matches);
list(, $boundary) = $matches;

正则表达式将搜索Content-Type包含边界字符串的标头,然后将其捕获到第一个捕获组中。然后我们将该捕获组复制到变量$boundary中。

2. 将邮件正文分割成段

一旦我们有了边界,我们就可以将正文分割成不同的部分(在您的消息正文中,正文将在--每次出现时作为前缀)。根据MIME 规范,第一个边界之前的所有内容都应该被忽略。

$email_segments = explode('--' . $boundary, $message);
array_shift($email_segments); // drop everything before the first boundary

这将为我们留下一个包含所有段的数组,忽略第一个边界之前的所有内容。

3. 确定哪个段是纯文本。

纯文本段将具有Content-TypeMIME-type 的标题text/plain。我们现在可以在每个段中搜索具有该标头的第一个段:

foreach ($email_segments as $segment)
{
  if (stristr($segment, "Content-Type: text/plain") !== false)
  {
    // We found the segment we're looking for!
  }
}

由于我们正在寻找的是一个常量,我们可以使用stristr(它在字符串中查找子字符串的第一个实例,不区分大小写)而不是正则表达式。如果Content-Type找到标题,我们就得到了我们的段。

4.从段中删除任何标题

现在我们需要从我们找到的段中删除任何标题,因为我们只想要实际的消息内容。这里可以出现四个MIME 标头Content-Type正如我们之前看到的Content-IDContent-DispositionContent-Transfer-Encoding. 标头终止于,\r\n因此我们可以使用它来确定标头的结尾:

$text = preg_replace('/Content-(Type|ID|Disposition|Transfer-Encoding):.*?\r\n/is', "", $segment);

正则表达式末尾的修饰符s 使点匹配任何换行符。.*?将收集尽可能少的字符(即所有字符\r\n);the是 .上?的一个惰性修饰符.*

而在此之后,$text将包含您的电子邮件内容。

所以把它和你的代码放在一起:

<?php
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd))
{
    $email .= fread($fd, 1024);
}
fclose($fd);

$matches = array();
preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $email, $matches);
list(, $boundary) = $matches;

$text = "";
if (isset($boundary) && !empty($boundary)) // did we find a boundary?
{
  $email_segments = explode('--' . $boundary, $email);

  foreach ($email_segments as $segment)
  {
    if (stristr($segment, "Content-Type: text/plain") !== false)
    {
      $text = trim(preg_replace('/Content-(Type|ID|Disposition|Transfer-Encoding):.*?\r\n/is', "", $segment));
      break;
    }
  }
}

// At this point, $text will either contain your plain text body,
// or be an empty string if a plain text body couldn't be found.

$savefile = "savehere.txt";
$sf = fopen($savefile, 'a') or die("can't open file");
fwrite($sf, $text);
fclose($sf);
?>
于 2010-09-07T21:03:44.647 回答
0

这里有一个答案:

您只需要更改以下 2 行:

require_once('/path/to/class/rfc822_addresses.php');
require_once('/path/to/class/mime_parser.php');
于 2012-12-21T23:42:34.090 回答