php - Best PHP script for making links clickable

Question

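I have found many PHP scripts that convert URLs in text into clickable links, but most of them don't work properly and some produce serious errors. Some convert links that are already clickable, others don't work at all, and still others turn parts of ordinary text into links. I need a script that detects only links, not ordinary text, and that doesn't convert links that are already clickable, because the result is very ugly.

I found this code, which seems to be the best of those I have tested, but it has some bugs: it converts links that are already clickable. Like this:

Original: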

<a href="http://www.netload.in/dateiySgPP2b14W/1409423417ExpFut.pdf.htm" target="_blank">http://www.netload.in/dateiySgPP2b14W/1409...7ExpFut.pdf.htm</a>

Converted:

http://www.netload.in/dateiySgPP2b14W/1409423417ExpFut.pdf.htm" target="_blank">http://www.netload.in/dateiySgPP2b14W/1409...7ExpFut.pdf.htm

Here is the code:

function parse_urls($text, $maxurl_len = 35, $target = '_self') // Make URLs Clickable
{
    if (preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si', $text, $urls))
    {
        $offset1 = ceil(0.65 * $maxurl_len) - 2;

        $offset2 = ceil(0.30 * $maxurl_len) - 1;

        foreach (array_unique($urls[1]) AS $url)
        {
            if ($maxurl_len AND strlen($url) > $maxurl_len)
            {
                $urltext = substr($url, 0, $offset1) . '...' . substr($url, -$offset2);
            }
            else
            {
                $urltext = $url;
            }

            $text = str_replace($url, '<a href="'. $url .'" target="'. $target .'" title="'. $url .'">'. $urltext .'</a>', $text);
        }
    }

    return $text;
}

score 2 · Accepted Answer

I just threw this together.

<?php
function replaceUrlsWithLinks($text){
    $dom = new DOMDocument;
    $dom->loadXML($text);
    $xpath = new DOMXpath($dom);
    $query = $xpath->query('//text()[not(ancestor-or-self::a)]');
    foreach($query as $item){
        $content = $item->textContent;
        if(preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si',$content,$matches,PREG_SET_ORDER | PREG_OFFSET_CAPTURE)){
            foreach($matches as $match){
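                $newA = $dom->createElement('a');
                // Add the URL as a text node so that "&" in query strings is escaped properly.
                $newA->appendChild($dom->createTextNode($match[0][0]));
                $newA->setAttribute('href',$match[0][0]);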
                $newA->setAttribute('target','_blank');
                $a = $item->splitText($match[0][1]);
                $b = $a->splitText(strlen($match[0][0]));
                $a->parentNode->replaceChild($newA,$a);
            }
        }
    }
    return $dom->saveHtml();
}
// The HTML to process ...
$html = <<<HTML
<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff http://google.com</b>
asdf http://google.com ffaa 
</block>
HTML;
// Process the HTML and echo it out.
echo replaceUrlsWithLinks($html);
?>

The output would be:

<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff <a href="http://google.com" target="_blank">http://google.com</a></b>
asdf <a href="http://google.com" target="_blank">http://google.com</a> ffaa 
</block>

You shouldn't use regular expressions to manipulate HTML.

Hope this helps.

Kyle

-- Edit --

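The previous code is more efficient, but it breaks when a single text node contains two or more URLs: the first replacement changes the DOM tree, so the offsets of the remaining matches no longer fit the node. To fix this, you can use this slower version, which re-runs the query after every replacement: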

<?php
function replaceUrlsWithLinks($text){
    $dom = new DOMDocument;
    $dom->loadXML($text);
    $xpath = new DOMXpath($dom);
    while(true){
        $shouldBreak = false;
        $query = $xpath->query('//text()[not(ancestor-or-self::a)]');
        foreach($query as $item){
            $shouldBreak = false;
            $content = $item->textContent;
            if(preg_match_all('/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si',$content,$matches,PREG_SET_ORDER | PREG_OFFSET_CAPTURE)){
                foreach($matches as $match){
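                    $newA = $dom->createElement('a');
                    // Add the URL as a text node so that "&" in query strings is escaped properly.
                    $newA->appendChild($dom->createTextNode($match[0][0]));
                    $newA->setAttribute('href',$match[0][0]);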
                    $newA->setAttribute('target','_blank');
                    $a = $item->splitText($match[0][1]);
                    $b = $a->splitText(strlen($match[0][0]));
                    $a->parentNode->replaceChild($newA,$a);
                    $shouldBreak = true;
                    break;
                }
            }
            if($shouldBreak == true)break;
        }
        if($shouldBreak == true){
            continue;
        }
        else {
            break;
        }
    }
    return $dom->saveHtml();
}

$html = <<<HTML
<block>
<a href="http://google.com">http://google.com</a>
<b>Stuff http://google.com</b>
asdf http://google.com ffaa  http://google.com
</block>
HTML;

echo replaceUrlsWithLinks($html);
?>
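
As a possible alternative to re-running the query after each replacement, the matches inside a text node can be replaced from last to first, so the offsets of the earlier matches stay valid. This is only a minimal sketch: `linkifyTextNodes` and `URL_PATTERN` are hypothetical names, the regex is the one used above, and it assumes the mbstring extension is available and the input is well-formed XML in UTF-8.

```php
<?php
// Hypothetical constant: the same URL regex as in the code above.
const URL_PATTERN = '/((ht|f)tps?:\/\/([\w\.]+\.)?[\w-]+(\.[a-zA-Z]{2,4})?[^\s\r\n\(\)"\'<>\,\!]+)/si';

// Hypothetical helper: wraps every URL found outside <a> elements in a link.
function linkifyTextNodes(string $xml): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    foreach ($xpath->query('//text()[not(ancestor::a)]') as $textNode) {
        $content = $textNode->textContent;
        if (!preg_match_all(URL_PATTERN, $content, $matches, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // Walk the matches backwards: each split only shortens the tail of
        // $textNode, so the offsets of earlier matches remain usable.
        foreach (array_reverse($matches[0]) as [$url, $byteOffset]) {
            // preg offsets are in bytes, splitText() counts characters.
            $charOffset = mb_strlen(substr($content, 0, $byteOffset), 'UTF-8');
            $urlNode = $textNode->splitText($charOffset);
            $urlNode->splitText(mb_strlen($url, 'UTF-8'));

            $link = $dom->createElement('a');
            $link->appendChild($dom->createTextNode($url));
            $link->setAttribute('href', $url);
            $link->setAttribute('target', '_blank');
            $urlNode->parentNode->replaceChild($link, $urlNode);
        }
    }
    return $dom->saveHTML();
}
```

Calling `linkifyTextNodes($html)` with the `$html` from the example above should link all three bare URLs and leave the existing anchor alone.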

score 0 · Accepted Answer

This function wraps text like http://www.domain.com in an anchor tag. What I see here is that you are trying to convert an anchor tag into an anchor tag, which of course won't work. So: don't write the anchors in your text, and let the script create them for you.

score 0 · Accepted Answer

You're running into the usual problems that happen when you try to parse HTML with regexes. You need a proper HTML parser. Have a look at this thread.
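
To illustrate what a parser gives you here, the following sketch loads an HTML fragment with PHP's DOM extension and lists only the text that sits outside existing `<a>` elements, which is the text that is safe to scan for URLs. The helper name `textOutsideAnchors` and the wrapping `<div>` are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the non-empty text chunks found outside <a> elements.
function textOutsideAnchors(string $html): array
{
    $dom = new DOMDocument;
    // Real-world HTML is rarely well-formed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $chunks = [];
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        $trimmed = trim($node->textContent);
        if ($trimmed !== '') {
            $chunks[] = $trimmed;
        }
    }
    return $chunks;
}
```

Unlike a regex run over the raw markup, this never sees the URL inside an `href` attribute or inside the text of an existing link.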
