
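fff.html is an email that contains email addresses. Some of them are wrapped in href mailto links and some are not. I want to scrape them and output them in the following format: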

Lorem@ipsum.com,dolor@sit.com,amet@consectetur.com

I have a simple scraper to grab the ones that are href links, but something odd is happening:

  <?php
    $url = "fff.html";
    $raw = file_get_contents($url);

    $newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
    $content = str_replace($newlines, "", html_entity_decode($raw));

    $start = strpos($content,'<a href="mailto:');
    $end = strpos($content,'"',$start) + 8;
    $mail = substr($content,$start,$end-$start);

    print "$mail<br />";
    ?>

I should get bonus points for using lorem ipsum in the first place.


1 Answer


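The problem is what happens when the HTML page contains more than one email address: substr will only return the first instance. Here is a script that parses all of the mailto addresses. You may need to tweak it a bit for your own use. It outputs the results in the CSV format you asked for.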

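    <?php
    $url = "fff.html";
    $raw = file_get_contents($url);

    // Strip tabs, line breaks, double spaces, and other control characters.
    $newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
    $content = str_replace($newlines, "", html_entity_decode($raw));

    // Limit the search to the <body> element; fall back to the whole
    // document if the tags are missing (strpos returns false in that case).
    $start = stripos($content, '<body');
    $end = stripos($content, '</body>');
    $data = ($start !== false && $end !== false)
        ? substr($content, $start, $end - $start)
        : $content;

    // Capture the address from every <a ... href="mailto:..."> tag.
    $pattern = '#<a\b[^>]+href="mailto:([^"]+)"[^>]*?>#is';
    preg_match_all($pattern, $data, $matches);

    // $matches[1] holds the captured addresses (an empty array if none matched).
    $emails = $matches[1];
    echo implode(',', $emails);
    ?>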
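
The question also mentions addresses that are not wrapped in a mailto link, which the script above will not pick up. As a rough sketch, assuming it is acceptable to match anything that merely looks like an address, one option is to run a loose pattern over the whole decoded document and de-duplicate the hits. The helper name `extractEmails` and the sample markup are made up for illustration, and the pattern is deliberately simple, so it will miss some valid addresses:

```php
<?php
// Hypothetical helper (not part of the original answer): collects addresses
// from mailto: links and from plain text alike.
function extractEmails(string $html): array
{
    // Decode entities first so that e.g. &#64; becomes "@".
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Loose, illustrative pattern; it does not cover every valid address.
    $pattern = '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i';
    preg_match_all($pattern, $text, $matches);

    // A linked address usually appears twice (href and link text),
    // so drop exact duplicates and reindex the result.
    return array_values(array_unique($matches[0]));
}

// Sample markup, assumed for this example only.
$html = '<body><a href="mailto:Lorem@ipsum.com">Lorem</a> dolor@sit.com'
      . ' <a href="mailto:amet@consectetur.com">amet@consectetur.com</a></body>';

echo implode(',', extractEmails($html));
// prints: Lorem@ipsum.com,dolor@sit.com,amet@consectetur.com
```

To use it on the real file, pass in `file_get_contents("fff.html")` instead of the sample string.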
Answered 2010-08-12T20:48:00.137