regex - Regular expression to parse a mail subject with multiple encodings

Question

Hi there!

I want to match all inline-encoded parts of a mail subject and build the subject string in UTF-8.

Some examples:

[Listname | Topic123] =?utf-8?Q?encodedtext?=
=?iso-8859-1?q?this=20is=20some=20text?=
Plain-text subject
[Listname | Topic123] =?utf-8?Q?encodedtext?= =?iso-8859-1?q?this=20is=20some=20text?=
=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

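I also received a mail whose subject uses two different encodings (the last example above).

In an email, the subject can also be split across multiple lines, where every line except the first starts with at least one whitespace character.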
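
For what it's worth, the folding described above can be undone before any decoding. This is a minimal sketch, assuming the raw header text is available as a string; `unfold_header` is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: undo header folding before decoding the subject.
function unfold_header(string $raw): string
{
    // Drop a line break (CRLF or bare LF) only when the next character is a
    // space or tab, i.e. when the following line is a continuation line.
    return preg_replace('/\r?\n(?=[ \t])/', '', $raw);
}

$folded = "Subject: first part\r\n second part";
echo unfold_header($folded), "\n"; // prints: Subject: first part second part
```

The leading whitespace of the continuation line is kept on purpose, so a folded plain-text subject does not lose the space between words.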

So I am looking for a regular expression that parses:

Part+

where Part is one of:

text with whitespace
=?charset?encoding?encoded-text?=

I think it would look something like this:

ENC = (=\?)([A-Za-z0-9-]*)(\?)([A-Za-z0-9-]*)(\?)([any character])(\?=)
Part = ENC, or any run of text that does not match ENC
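
The Part+ grammar above could be expressed roughly as follows. This is only a sketch: `split_subject_parts` is a made-up name, the charset character class is taken from the ENC draft above, and restricting the encoding to Q or B and the payload to non-whitespace, non-`?` characters are my assumptions.

```php
<?php
// Hypothetical helper: split a subject into plain-text runs and encoded-words.
function split_subject_parts(string $subject): array
{
    // Alternative 1: an encoded-word =?charset?encoding?payload?= (groups 1-3).
    // Alternative 2: the shortest run of text that ends right before the next
    // encoded-word start or at the end of the string (group 4).
    $pattern = '/=\?([A-Za-z0-9-]+)\?([QqBb])\?([^?\s]*)\?='
             . '|(.+?)(?==\?[A-Za-z0-9-]+\?[QqBb]\?|\z)/s';

    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    $parts = [];
    foreach ($matches as $m) {
        if ($m[1] !== '') {
            // Group 1 is non-empty only when the encoded-word alternative matched.
            $parts[] = [
                'type'     => 'encoded',
                'charset'  => $m[1],
                'encoding' => $m[2],
                'payload'  => $m[3],
            ];
        } else {
            $parts[] = ['type' => 'text', 'value' => $m[4]];
        }
    }
    return $parts;
}

$subject = '[Listname | Topic123] =?utf-8?Q?encodedtext?= =?iso-8859-1?q?this=20is=20some=20text?=';
foreach (split_subject_parts($subject) as $part) {
    echo $part['type'], ': ', $part['payload'] ?? $part['value'], "\n";
}
```

A subject without any encoded-word simply comes back as a single text part, so the same loop covers the plain-text case as well.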

score 0 · Accepted Answer

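// Decode a mail subject that mixes plain text and encoded-words.
// $source_enc: charset assumed for the unencoded text.
// $dest_enc:   charset of the returned string.
function decode ($string, $source_enc, $dest_enc)
{
    // With PREG_SPLIT_DELIM_CAPTURE the result repeats in groups of four:
    // plain text, charset, encoding, encoded text.
    $parts = preg_split (
        '/=\?([^?]+)\?([^?]+)\?([^?]+)\?=/',
        $string,
        -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = "";

    for ($i = 0; $i < count ($parts); $i++)
    {
        $part = $parts [$i];

        if ($i % 4 == 0)
            // Plain text between two encoded-words.
            $result .= iconv ($source_enc, $dest_enc, $part);
        else
        {
            // Consume the three captured groups of one encoded-word.
            $charset = $parts [$i++];
            $encoding = $parts [$i++];
            $text = $parts [$i];

            if ($encoding == 'Q' || $encoding == 'q')
                // In the Q encoding "_" stands for a space, which
                // quoted_printable_decode does not handle on its own.
                $text = quoted_printable_decode (str_replace ('_', ' ', $text));
            else if ($encoding == 'B' || $encoding == 'b')
                $text = base64_decode ($text);

            $result .= iconv ($charset, $dest_enc, $text);
        }
    }

    return $result;
}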

echo (decode ("=?utf-8?Q?encodedtext?= =?iso-8859-1?q?this=20is=20some=20text?=
=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=", 
    "ISO-8859-1", "ISO-8859-1"));

My output is:

encodedtext this is some text If you can read this yo u understand the example.
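
Note the "yo u" in that output: RFC 2047 says whitespace between two adjacent encoded-words should be ignored when displaying, and the function above keeps it. One possible fix is to strip that whitespace before decoding. The sketch below assumes encoded-words contain no whitespace or `?` in their payload, and `join_adjacent_encoded_words` is a name I made up for illustration.

```php
<?php
// Hypothetical pre-processing step: remove whitespace (including folded line
// breaks) that sits between two adjacent encoded-words.
function join_adjacent_encoded_words(string $subject): string
{
    // One encoded-word: =?charset?Q-or-B?payload?=
    $word = '=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=';

    // Keep the first word, drop the whitespace, and leave the second word in
    // place (lookahead) so it can pair up with a third word as well.
    return preg_replace('/(' . $word . ')\s+(?=' . $word . ')/', '$1', $subject);
}

$subject = "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n"
         . "    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=";

echo join_adjacent_encoded_words($subject), "\n";
```

Passing the joined string to decode() should then give "If you can read this you understand the example." for the last example. Whitespace between an encoded-word and plain text is left alone. PHP also ships iconv_mime_decode() and mb_decode_mimeheader(), which may be worth trying before maintaining a hand-written decoder.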
