
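I am trying to collect all the unique email addresses in an HTML page into an array. The file is large, and the emails do not follow any consistent pattern that would make them easy to pull out.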

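Here is a sample HTML file named GetEmails.html. The actual file will contain CSS and much more markup to sift through. In this sample, note the different ways the emails appear. In short, not all of them are separated by spaces; some are separated by commas, semicolons, and so on.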

<html>
<body>
<p>This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong>
</p>
<p><u>There will be pages and pages and pages of text to sift thru so get the emails into an array.</u></p>
<p>This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong> and repeat This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong></p>
<p>&nbsp;</p>
</body>
</html>

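I thought of using explode() on spaces, but that might not work and might use too many resources. I am just wondering whether PHP has a simple function that would help me get all the emails into an array. Here is what I tried.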

<?php

$lines = file('GetEmails.html');
$TheEmails = array();

foreach ($lines as $line_num => $line) {

    // Check whether the line contains an email address.
    if (preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $line)) {

        // Split the tag-stripped line into an array of words.
        $words = explode(" ", strip_tags($line));

        // Keep only the items that contain an @ sign.
        $fl_array = preg_grep("/@/", $words);

        // Trim each item and add it to the list of emails.
        // (trim() takes a string, so it is applied per item, not to the array.)
        foreach ($fl_array as $email) {
            $TheEmails[] = trim($email);
        }
    }
}

// Keep only the unique emails.
$UniqueEmails = array_unique($TheEmails);

?>

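The code above works, but with the huge file I will be using, I am afraid it will use resources unnecessarily. It also does not account for emails separated by commas, such as ed@ed.com,mike@mike.com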

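Any ideas on the best way to do this? At the very least, even if I can only get the emails that are separated by spaces and the like, learning the best way to do it would be very helpful...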

Hope this makes sense. Thanks very much!


1 Answer


Here is a more complete regular expression for matching any RFC 5322-valid email address:

Using a regular expression to validate an email address
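
Whichever pattern is chosen, one way to get every address into an array regardless of the separator is preg_match_all() followed by array_unique(). The following is only a minimal sketch under some assumptions: the function name extractUniqueEmails is hypothetical, the sample string is a shortened version of the HTML in the question, and the pattern is the simplified one from the question rather than a full RFC 5322 expression.

```php
<?php

// Hypothetical helper (not from the original post): returns the unique
// email addresses found in a block of text or HTML.
function extractUniqueEmails(string $text): array
{
    // Assumption: the simplified pattern from the question, not a full
    // RFC 5322 expression. The "i" flag makes it case-insensitive.
    $pattern = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';

    // preg_match_all() collects every match in the text, so it does not
    // matter whether addresses are separated by spaces, commas,
    // semicolons, or HTML tags.
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches; array_unique() drops exact
    // duplicates and array_values() reindexes the result from 0.
    return array_values(array_unique($matches[0]));
}

// Shortened sample based on the HTML in the question.
$html = '<p>hello@hotmail.com; mike@hello.com, Bill@John.com bill@hot.tv,mary@Mary.us '
      . 'Email:<strong>Ed@myemail.com</strong> and again mike@hello.com</p>';

$emails = extractUniqueEmails($html);
print_r($emails);
```

For a very large file, the same helper could be applied to one line at a time (for example with fgets()) and the per-line results merged, so the whole file never has to be held in memory at once. Note that array_unique() compares strings exactly, so Ed@Ed.com and ed@ed.com would be kept as two entries unless the matches are lowercased first.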

Answered 2013-03-22T03:29:54.630