
I want to use Simple HTML DOM Parser to search the content of an HTML page for email addresses and replace them.

The replacement consists of a span element and a bit of JavaScript (which is supposed to obfuscate the address).

At the moment, it works like this:

        $pattern =
            "/(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";

        preg_match_all( $pattern, $content, $matches );

        foreach ( $matches[ 0 ] as $email ) {
             $content = $this->searchDOM(
                $content,
                $email,
                $this->hide_email($email)
            );
        }

This is the searchDOM method:

private function searchDOM( $content, $search, $replace, $excludedParents = [] )
{
    $dom = HtmlDomParser::str_get_html(
        $content,
        true,
        true,
        DEFAULT_TARGET_CHARSET,
        false,
        DEFAULT_BR_TEXT,
        DEFAULT_SPAN_TEXT
    );

    foreach ( $dom->find( 'text' ) as $element ) {

        if ( !in_array( $element->parent()->tag, $excludedParents ) ) {
            $element->innertext = preg_replace(
                '/(?<!\w)' . preg_quote( $search, "/" ) . '(?!\w)/i',
                $replace,
                $element->innertext
            );
        }
    }

    return $dom->save();
}

This is the hide_email method:

function hide_email( $email )
{
    $character_set = '+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz';

    $key         = str_shuffle( $character_set );
    $cipher_text = '';
    $id          = 'e' . rand( 1, 999999999 );

    for ( $i = 0; $i < strlen( $email ); $i += 1 )
        $cipher_text .= $key[ strpos( $character_set, $email[ $i ] ) ];

    $script = 'var a="' . $key . '";var b=a.split("").sort().join("");var c="' . $cipher_text . '";var d="";';

    $script .= 'for(var e=0;e<c.length;e++)d+=b.charAt(a.indexOf(c.charAt(e)));';

    $script .= 'document.getElementById("' . $id . '").innerHTML="<a href=\\"mailto:"+d+"\\">"+d+"</a>"';

    $script = "eval(\"" . str_replace( [ "\\", '"' ], [ "\\\\", '\"' ], $script ) . "\")";

    $script = '<script type="text/javascript">/*<![CDATA[*/' . $script . '/*]]>*/</script>';

    return '<span id="' . $id . '">[javascript protected email address]</span>' . $script;

}
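
The generated script is hard to read in its escaped, eval-wrapped form, so here is a minimal sketch of what it is meant to do once it runs in the browser. The function names `encodeEmail` and `decodeEmail` are hypothetical and exist only for illustration; `encodeEmail` mirrors the PHP loop in hide_email, and `decodeEmail` mirrors the `a`/`b`/`c`/`d` logic of the emitted JavaScript, without the DOM update.

```javascript
// Same character set as $character_set in hide_email (ascending code-point order).
var CHARACTER_SET = "+-.0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper mirroring the PHP loop: each character is replaced by
// the character at the same position in the shuffled key.
// Assumes every character of `email` occurs in CHARACTER_SET.
function encodeEmail(key, email) {
    var cipherText = "";
    for (var i = 0; i < email.length; i++) {
        cipherText += key.charAt(CHARACTER_SET.indexOf(email.charAt(i)));
    }
    return cipherText;
}

// Hypothetical helper mirroring the emitted script: sorting the key restores
// the original character set, so a position in the key maps back to the
// plain-text character at that position.
function decodeEmail(key, cipherText) {
    var alphabet = key.split("").sort().join("");
    var plain = "";
    for (var i = 0; i < cipherText.length; i++) {
        plain += alphabet.charAt(key.indexOf(cipherText.charAt(i)));
    }
    return plain;
}
```

With any permutation of the character set as the key, `decodeEmail(key, encodeEmail(key, address))` should return the original address, as long as the address only uses characters from the set. The real script additionally writes the decoded address into the span as a mailto link.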

Well, this does not work as expected: the rendered page only shows "[javascript protected email address]". If I look at the page source, the a tag is missing.
