killSpam()
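Features:
- Works with single- and double-quoted href attributes.
- Tolerates invalid HTML (for example a closing tag written as < /a >).
- ftp://
- http://
- https://
- file://
- mailto: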
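To make the first two points concrete, here is a minimal sketch that runs the anchor pattern from killSpam() against three small inputs. The sample URLs (a.example and so on) are made up for illustration; only the pattern itself comes from the function.

```php
<?php
// Same anchor pattern as in killSpam(): group 2 is the href value, group 3 the link text.
$pattern = '%(<(?:\s+)?a.*?href=["|\'](.*?)["|\'].*?>(.*?)<(?:\s+)?/(?:\s+)?a(?:\s+)?>)%sm';

// Hypothetical sample inputs, not taken from the original post.
$samples = [
    '<a href="http://a.example/x">double</a>',   // double-quoted href
    "<a href='http://b.example/y'>single</a>",   // single-quoted href
    '<a href="http://c.example/z">sloppy< /a >', // malformed closing tag
];

$hrefs = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        $hrefs[] = $m[2]; // collect the captured href value
    }
}

print_r($hrefs);
```

All three samples match, so $hrefs ends up holding the three href values in order.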
function killSpam($html, $whitelist){
    // Process HTML links. Groups: 1 = whole <a> element, 2 = href value, 3 = link text.
    preg_match_all('%(<(?:\s+)?a.*?href=["|\'](.*?)["|\'].*?>(.*?)<(?:\s+)?/(?:\s+)?a(?:\s+)?>)%sm', $html, $match, PREG_PATTERN_ORDER);
    foreach ($match[1] as $anchor) {
        // The whitelist is tested against the whole element, not just the href.
        if (!preg_match("/$whitelist/", $anchor)) {
            // str_replace() needs no regex escaping, so a "%" in the link cannot break the replacement.
            $html = str_replace($anchor, " (SPAM) ", $html);
        }
    }
    // Process cleartext links: bare or quoted URLs, www./ftp. hosts and e-mail addresses.
    preg_match_all('/(\b(?:(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[A-Z0-9+&@#\/%?=~_|$!:,.;-]*[A-Z0-9+&@#\/%=~_|$-]|((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)|"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^"\r\n]+"|\'(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^\'\r\n]+\')/i', $html, $match2, PREG_PATTERN_ORDER);
    foreach ($match2[1] as $spamsite) {
        if (!preg_match("/$whitelist/", $spamsite)) {
            $html = str_replace($spamsite, " (SPAM) ", $html);
        }
    }
    return $html;
}
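One thing to be aware of: the whitelist is an unanchored regex, so a spam URL that merely contains a whitelisted domain somewhere (say, in its query string) also passes. If that matters for your input, a stricter check could compare the parsed host instead. The sketch below is an alternative, not part of killSpam(); isWhitelistedHost() and the sample URLs are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper (not in the original answer): host-based whitelist check.
function isWhitelistedHost(string $url, array $allowedDomains): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no parsable host, e.g. a scheme-less "www.example.com"
    }
    $host = strtolower($host);
    foreach ($allowedDomains as $domain) {
        $domain = strtolower($domain);
        // Accept the domain itself or any subdomain of it.
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$allowed = ['google.com', 'bing.com'];
var_dump(isWhitelistedHost('http://www.bing.com', $allowed));                 // bool(true)
var_dump(isWhitelistedHost('http://spam.example/?ref=google.com', $allowed)); // bool(false)
```

Note that this only works for URLs with a scheme; bare www. links and e-mail addresses would still need the regex approach.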
Usage:
$html = <<< LOB
<p>Hello world, thanks to <a href="http://mywebsite.com/about" rel="nofollow">http://mywebsite/about</a> I learned a lot. I found
you on <a href="http://www.bing.com" rel="nofollow">http://www.bing.com</a>, <a href="https://google.com/search" rel="nofollow">https://google.com/search</a> and on some <a href="http://www.spamwebsite.com" rel="nofollow">www.spamwebsite.com/refid=spammer2< /a >. www.spamme.com, http://morespam.com/?aff=122, http://crazyspammer.com/?money=22 and spam@email.com, file://spamfile.com/file.txt ftp://spamftp.com/file.exe </p>
LOB;
$whitelist = "(google\.com|yahoo\.com|bing\.com|nicesite\.com|mywebsite\.com)";
$noSpam = killSpam($html, $whitelist);
echo $noSpam;
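If you would rather keep the allowed domains in an array than hand-escape the dots, something like the following could build the whitelist string. buildWhitelist() is a hypothetical helper, not part of the original function; it assumes killSpam() keeps wrapping the whitelist in / delimiters, as it does above.

```php
<?php
// Hypothetical helper: builds the alternation string that killSpam() expects.
function buildWhitelist(array $domains): string
{
    // Escape regex metacharacters, plus the "/" delimiter used by killSpam().
    $escaped = array_map(function ($domain) {
        return preg_quote($domain, '/');
    }, $domains);

    return '(' . implode('|', $escaped) . ')';
}

$whitelist = buildWhitelist(['google.com', 'yahoo.com', 'bing.com']);
echo $whitelist, "\n"; // (google\.com|yahoo\.com|bing\.com)
```

The result can be passed to killSpam() as the second argument in place of the hand-written string.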
Spam example:
I can't post the spam HTML here; I guess this site has its own killSpam()... See it at http://pastebin.com/HXCkFeGn
Hello world, thanks to http://mywebsite/about I learned a lot. I found you on http://www.bing.com, https://google.com/search and on some www.spamwebsite.com/refid=spammer2. www.spamme.com, http://morespam.com/?aff=122, http://crazyspammer.com/?money=22 and spam@email.com, file://spamfile.com/file.txt ftp://spamftp.com/file.exe
Output:
Hello world, thanks to (SPAM) I learned a lot. I found you on http://www.bing.com, https://google.com/search and on some (SPAM) . (SPAM) , (SPAM) , (SPAM) and (SPAM) , (SPAM) (SPAM)
Demo:
http://ideone.com/9IxFrB