I have a webhook that posts to a form in my web application, and I need to parse the email addresses out of the email headers.

Here is the raw text:

Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: "Lastname, Firstname" <firstname_lastname@domain.com>
To: <testto@domain.com>, testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]

This is what I am trying to extract:

<testto@domain.com>, testto1@domain.com, testto2@domain.com

I have been struggling with regular expressions all day without any luck.

5 Answers

Contrary to some of the posts here, I have to agree with mmutz: you cannot reliably parse email addresses with a regex. See this section of the RFC:

https://www.rfc-editor.org/rfc/rfc2822#section-3.4.1

3.4.1. Addr-spec specification

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.

"Locally interpreted" means that only the receiving server is expected to be able to interpret the part before the "@".

If I were going to try to solve this, I would find the contents of the "To" line, split them apart, and attempt to parse each segment with System.Net.Mail.MailAddress.

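using System;
using System.Net.Mail;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = @"Thread-Topic: test subject
Thread-Index: AcwE4mK6Jj19Hgi0SV6yYKvj2/HJbw==
From: ""Lastname, Firstname"" <firstname_lastname@domain.com>
To: <testto@domain.com>, ""Yes, this is valid""@[emails are hard to parse!], testto1@domain.com, testto2@domain.com
Cc: <testcc@domain.com>, test3@domain.com
X-OriginalArrivalTime: 27 Apr 2011 13:52:46.0235 (UTC) FILETIME=[635226B0:01CC04E2]";

        // Grab everything after "To:" on its own line (i = ignore case, m = multiline).
        Regex toline = new Regex(@"(?im-:^To\s*:\s*(?<to>.*)$)");
        string to = toline.Match(input).Groups["to"].Value;

        int from = 0;   // where to search for the next comma
        int pos = 0;    // start of the address currently being assembled
        int found;
        string test;

        while (from < to.Length)
        {
            // Position of the next comma, or the end of the string if there is none.
            found = to.IndexOf(',', from);
            if (found < 0) found = to.Length;
            from = found + 1;
            test = to.Substring(pos, found - pos);

            try
            {
                // The chunk parsed: print it and start the next chunk after this comma.
                MailAddress addy = new MailAddress(test.Trim());
                Console.WriteLine(addy.Address);
                pos = found + 1;
            }
            catch (FormatException)
            {
                // The comma was part of the address (e.g. a quoted local part), so
                // leave pos alone; the next pass extends the chunk to the next comma.
            }
        }
    }
}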

Output from the above program:

testto@domain.com
"Yes, this is valid"@[emails are hard to parse!]
testto1@domain.com
testto2@domain.com
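If you only need to split a whole To: or Cc: value, System.Net.Mail also has MailAddressCollection, whose Add(string) overload accepts a comma-separated list and handles commas inside quoted display names. The sketch below is an alternative to the manual comma loop above, not what this answer used; ParseAddressList is a hypothetical helper name, and it assumes you have already isolated the header value.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

static class AddressListParser
{
    // Hypothetical helper: parses a complete To:/Cc: header value in one call.
    // MailAddressCollection.Add(string) splits a comma-separated list itself and
    // throws FormatException if any entry cannot be parsed.
    public static List<string> ParseAddressList(string headerValue)
    {
        var collection = new MailAddressCollection();
        collection.Add(headerValue.Trim());

        var result = new List<string>();
        foreach (MailAddress address in collection)
            result.Add(address.Address);   // bare addr-spec, without <> or display name
        return result;
    }
}
```

For example, ParseAddressList on the question's To: value should return the three plain addresses, and a value such as "Lastname, Firstname" <firstname_lastname@domain.com> stays a single entry despite the comma in the display name.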
answered 2011-04-27T17:16:25.980

The RFC 2822-compliant regex for email addresses is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

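Just run it over your text, and each match is an email address.

Of course, where a regex is not the best tool, you always have the option of not using one. But it's up to you!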
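A minimal sketch of "just run it over your text" in C#, assuming you run it over the To: line only (over the full header it would also pick up the From: and Cc: addresses). The pattern is the one above, unchanged except that each double quote is doubled for a C# verbatim string; the class and method names are hypothetical, and RegexOptions.IgnoreCase is an added assumption so that upper-case letters also match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class Rfc2822RegexDemo
{
    // The regex from the answer; "" inside a verbatim string is one literal quote.
    const string AddrSpec =
        @"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])";

    static readonly Regex AddrSpecRegex = new Regex(AddrSpec, RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every substring of `text` the regex matches.
    public static List<string> FindAddresses(string text)
    {
        var result = new List<string>();
        foreach (Match m in AddrSpecRegex.Matches(text))
            result.Add(m.Value);
        return result;
    }
}
```

Because the surrounding < and > are not part of an addr-spec, the matches come back without angle brackets.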

answered 2011-04-27T15:33:34.413

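You cannot parse RFC 2822 mail with regular expressions, because its grammar contains recursive productions (off the top of my head, the one for comments, which can nest: (a (nested) comment)), and that makes the grammar non-regular. Regular expressions (as the name suggests) can only parse regular grammars.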
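To make the nesting point concrete, here is a small hand-written scanner (a hypothetical StripComments helper, not part of any library) that removes comments by keeping a depth counter. That unbounded counter is exactly the state a finite automaton, and therefore a strictly regular expression, cannot hold. It is a simplified sketch: it handles quoted strings and backslash escapes but ignores header folding and the rest of the RFC 2822 grammar.

```csharp
using System;
using System.Text;

static class CommentStripper
{
    // Removes (possibly nested) comments such as "(a (nested) comment)".
    public static string StripComments(string input)
    {
        var sb = new StringBuilder();
        int depth = 0;          // current comment nesting level
        bool inQuote = false;   // inside a "quoted string"?

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // Inside a quoted string or comment, a backslash escapes the next char.
            if ((inQuote || depth > 0) && c == '\\' && i + 1 < input.Length)
            {
                if (depth == 0) { sb.Append(c); sb.Append(input[i + 1]); }
                i++;
                continue;
            }
            // Quotes only count outside comments; parentheses only outside quotes.
            if (depth == 0 && c == '"') { inQuote = !inQuote; sb.Append(c); continue; }
            if (!inQuote && c == '(') { depth++; continue; }
            if (!inQuote && c == ')' && depth > 0) { depth--; continue; }

            if (depth == 0) sb.Append(c);   // keep everything outside comments
        }
        return sb.ToString();
    }
}
```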

For more details, see also "RegEx match open tags except XHTML self-contained tags".

answered 2011-04-27T15:48:54.017

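As Blindy suggested, sometimes you can parse it the old-fashioned way.

If you would rather do that, here is a quick way, assuming the email header text is in a string called header: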

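int start = header.IndexOf("To: ") + "To: ".Length;    // skip past the "To: " label itself
int end = header.IndexOf("Cc: ", start);               // in this sample the Cc: line follows To:
string x = header.Substring(start, end - start).Trim(); // Trim() drops the trailing line break

IndexOf returns the position where "To: " begins, so its length has to be added to skip the label, and Trim() removes the line break before "Cc: ". Of course, you also have to be sure your headers always contain a Cc: line after the To: line, otherwise this won't work.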
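One way to drop the dependency on a Cc: line is to walk the header line by line. The sketch below is an assumption-laden variant, not the answerer's code: GetHeaderValue is a hypothetical helper, it takes the first header with the given name (case-insensitive), and it unfolds continuation lines that start with a space or tab, as RFC 2822 allows for long headers.

```csharp
using System;
using System.Text;

static class HeaderReader
{
    // Hypothetical helper: returns the value of the first header called `name`,
    // or null if there is none. Folded continuation lines are joined back on.
    public static string GetHeaderValue(string headers, string name)
    {
        string[] lines = headers.Replace("\r\n", "\n").Split('\n');
        string prefix = name + ":";

        for (int i = 0; i < lines.Length; i++)
        {
            if (!lines[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;

            var value = new StringBuilder(lines[i].Substring(prefix.Length));
            // Unfold: following lines that begin with whitespace belong to this header.
            while (i + 1 < lines.Length && lines[i + 1].Length > 0 &&
                   (lines[i + 1][0] == ' ' || lines[i + 1][0] == '\t'))
            {
                value.Append(lines[++i]);
            }
            return value.ToString().Trim();
        }
        return null;
    }
}
```

GetHeaderValue(header, "To") then works whether or not a Cc: line is present, and the result can be handed to MailAddress or a regex for the per-address step.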

answered 2011-04-27T15:49:14.600

There is a breakdown of validating email addresses with regular expressions that cites a more practical implementation of RFC 2822:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

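It looks like you only want to pull the email addresses out of the "To" field, and you also have to deal with the <>, so something like the following might work: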

^To: ((?:\<?[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\>?,?(?:\s*))*)

Again, as others have mentioned, you might not want to do this. But if you want a regex that turns that input into <testto@domain.com>, testto1@domain.com, testto2@domain.com, that will do it.
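A sketch of applying that pattern from C#, under a few assumptions of mine: RegexOptions.Multiline so that ^ can match at the start of the To: line rather than only at the start of the whole header, RegexOptions.IgnoreCase so upper-case addresses are accepted, and a Trim() because the trailing \s* also swallows the line break after the last address. ExtractToList is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

static class ToLineRegex
{
    // The ^To: pattern from the answer, verbatim. In .NET, \< and \> are simply
    // escaped angle brackets.
    const string Pattern = @"^To: ((?:\<?[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\>?,?(?:\s*))*)";

    static readonly Regex ToRegex =
        new Regex(Pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns group 1 (the address list) or null if there
    // is no To: line.
    public static string ExtractToList(string headers)
    {
        Match m = ToRegex.Match(headers);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```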

answered 2011-04-27T16:02:08.627