php - 用 HTML 链接替换文本中的 URL

Question

这是一个设计：例如，我放了一个链接，例如

http://example.com

在文本区域。如何让 PHP 检测到它是一个http://链接，然后将其打印为

print "<a href='http://www.example.com'>http://www.example.com</a>";

我记得以前做过类似的事情，但它并不是万无一失的，它会不断破坏复杂的链接。

另一个好主意是，如果您有一个链接，例如

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

修复它

print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>";
print "http://example.com/test.php";
print "</a>";

这只是事后的想法.. stackoverflow 也可能使用它：D

有任何想法吗

score 122 · Accepted Answer

让我们看看要求。您有一些用户提供的纯文本，您希望使用超链接 URL 显示这些纯文本。

“http://”协议前缀应该是可选的。
应接受域和 IP 地址。
应接受任何有效的顶级域，例如 .aero 和 .xn--jxalpdlp。
应该允许端口号。
在正常的句子上下文中必须允许 URL。例如，在“访问 stackoverflow.com.”中，最后一个句点不是 URL 的一部分。
您可能还希望允许“https://” URL，也许还希望允许其他 URL。
与在 HTML 中显示用户提供的文本一样，您希望防止跨站点脚本(XSS)。此外，您还希望将 URL 中的 & 符号正确转义为 &。
您可能不需要支持 IPv6 地址。
编辑：如评论中所述，对电子邮件地址的支持绝对是一个加分项。
编辑：仅支持纯文本输入 - 输入中的 HTML 标记不应受到尊重。（Bitbucket 版本支持 HTML 输入。）

编辑：查看GitHub以获取最新版本，支持电子邮件地址、经过身份验证的 URL、引号和括号中的 URL、HTML 输入以及更新的 TLD 列表。

这是我的看法：

<?php
$text = <<<EOD
Here are some URLs:
stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
Ports: 192.168.0.1:8080, https://example.net:1234/.
Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
And remember.Nobody is perfect.

<script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
EOD;

$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

// Solution 1:

function callback($match)
{
    // Prepend http:// if no protocol specified
    $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

    return '<a href="' . $completeUrl . '">'
        . $match[2] . $match[3] . $match[4] . '</a>';
}

print "<pre>";
print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
    'callback', htmlspecialchars($text));
print "</pre>";

为了正确转义 < 和 & 字符，我在处理之前通过 htmlspecialchars 抛出整个文本。这并不理想，因为 html 转义会导致错误检测 URL 边界。
正如“记住。没有人是完美的”所证明的那样。行（其中 remember.Nobody 被视为 URL，因为缺少空格），可能需要进一步检查有效的顶级域。

编辑：下面的代码修复了上述两个问题，但由于我或多或少地重新实现preg_replace_callback使用preg_match.

// Solution 2:

$validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);

$position = 0;
while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, &$match, PREG_OFFSET_CAPTURE, $position))
{
    list($url, $urlPosition) = $match[0];

    // Print the text leading up to the URL.
    print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));

    $domain = $match[2][0];
    $port   = $match[3][0];
    $path   = $match[4][0];

    // Check if the TLD is valid - or that $domain is an IP address.
    $tld = strtolower(strrchr($domain, '.'));
    if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
    {
        // Prepend http:// if no protocol specified
        $completeUrl = $match[1][0] ? $url : "http://$url";

        // Print the hyperlink.
        printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
    }
    else
    {
        // Not a valid URL.
        print(htmlspecialchars($url));
    }

    // Continue text parsing from after the URL.
    $position = $urlPosition + strlen($url);
}

// Print the remainder of the text.
print(htmlspecialchars(substr($text, $position)));

score 15 · Accepted Answer

这是我发现的经过尝试和测试的东西

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}

这个对我有用。它适用于电子邮件和 URL，很抱歉回答我自己的问题。:(

但这是唯一有效的

这是我找到它的链接：http ://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html

提前为它是一个专家交流。

score 15 · Accepted Answer

你们正在谈论推进和复杂的东西的方式，这对某些情况很有好处，但大多数情况下我们需要一个简单粗心的解决方案。简单的这个怎么样？

preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg);

试试吧，让我知道它不满足哪些疯狂的 url。

score 4 · Accepted Answer

这是在函数中使用正则表达式的代码

<?php
//Function definations
function MakeUrls($str)
{
$find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

$replace=array('<a href="$1" target="_blank">$1</a>', '<a href="http://$1" target="_blank">$1</a>');

return preg_replace($find,$replace,$str);
}
//Function testing
$str="www.cloudlibz.com";
$str=MakeUrls($str);
echo $str;
?>

score 4 · Accepted Answer

我一直在使用这个功能，它对我有用

function AutoLinkUrls($str,$popup = FALSE){
    if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)){
        $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
        for ($i = 0; $i < count($matches['0']); $i++){
            $period = '';
            if (preg_match("|\.$|", $matches['6'][$i])){
                $period = '.';
                $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
            }
            $str = str_replace($matches['0'][$i],
                    $matches['1'][$i].'<a href="http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'"'.$pop.'>http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'</a>'.
                    $period, $str);
        }//end for
    }//end if
    return $str;
}//end AutoLinkUrls

所有学分都转到 - http://snipplr.com/view/68586/

享受！

score 1 · Accepted Answer

此 RegEx 应匹配除这些新的 3+ 字符顶级域之外的任何链接...

{
  \\b
  # 匹配前导部分（proto://hostname，或者只是主机名）
  (
    # http:// 或 https:// 前导部分
    (https?)://[-\\w]+(\\.\\w[-\\w]*)+
  |
    # 或者，尝试查找具有更具体子表达式的主机名
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \\. )+ # 子域
    # 现在以 .com 等结尾。对于这些，需要小写
    (?-i: com\\b
        | 教育\\b
        | 商务\\b
        | 政府\\b
        | in(?:t|fo)\\b # .int 或 .info
        | 米尔\b
        | 网络\\b
        | 组织\\b
        | [az][az]\\.[az][az]\\b # 两个字母的国家代码
    )
  )

  # 允许一个可选的端口号
  ( : \\d+ )?

  # URL 的其余部分是可选的，以 / 开头
  (
    /
    # 其余的都是启发式的，似乎运作良好
    [^.!,?;"\\'()\[\]\{\}\s\x7F-\\xFF]*
    (
      [.!,?]+ [^.!,?;"\\'()\\[\\]\{\\}\s\\x7F-\\xFF]+
    )*
  )?
}ix

这不是我写的，我不太清楚我从哪里得到它，对不起，我不能给予信任......

score 1 · Accepted Answer

这应该会为您提供电子邮件地址：

$string = "bah bah steve@gmail.com foo";
$match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array);
print_r($array);

// outputs:
Array
(
    [0] => steve@gmail.com
)

score 1 · Accepted Answer

我知道这个答案已被接受，并且这个问题已经很老了，但它对于寻找其他实现的其他人可能很有用。

这是Angel.King.47在2009年7月27日发布的代码的修改版本：

$text = preg_replace(
 array(
   '/(^|\s|>)(www.[^<> \n\r]+)/iex',
   '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
   '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
 ),  
 array(
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
 ),  
 $text
);

变化：

我删除了规则 #2 和 #3（我不确定在哪些情况下有用）。
删除了电子邮件解析，因为我真的不需要它。
我又添加了一条规则，允许识别以下形式的 URL：[域]/*（不带 www）。例如：“example.com/faq/”（多个 tld：domain.{2-3}.{2-4}/）
解析以“http://”开头的字符串时，会将其从链接标签中删除。
为所有链接添加了“target='_blank'”。
可以在 any(?) 标记之后指定 URL。例如：<b>www.example.com</b>

正如“Søren Løvborg”所说，这个函数不会转义 URL。我尝试了他/她的课程，但它并没有像我预期的那样工作（如果您不信任您的用户，请先尝试他/她的代码）。

score 1 · Accepted Answer

正如我在上面的评论之一中提到的，运行 php 7 的 VPS 开始发出警告Warning: preg_replace(): The /e 修饰符不再受支持，请改用 preg_replace_callback。替换后的缓冲区为空/假。

我重写了代码并做了一些改进。如果您认为您应该在作者部分，请随时编辑函数 make_links_blank 名称上方的评论。我故意不使用关闭 php ?> 以避免在输出中插入空格。

<?php

class App_Updater_String_Util {
    public static function get_default_link_attribs( $regex_matches = [] ) {
        $t = ' target="_blank" ';
        return $t;
    }

    /**
     * App_Updater_String_Util::set_protocol();
     * @param string $link
     * @return string
     */
    public static function set_protocol( $link ) {
        if ( ! preg_match( '#^https?#si', $link ) ) {
            $link = 'http://' . $link;
        }
        return $link;
    }

/**
     * Goes through text and makes whatever text that look like a link an html link
     * which opens in a new tab/window (by adding target attribute).
     * 
     * Usage: App_Updater_String_Util::make_links_blank( $text );
     * 
     * @param str $text
     * @return str
     * @see http://stackoverflow.com/questions/1188129/replace-urls-in-text-with-html-links
     * @author Angel.King.47 | http://dashee.co.uk
     * @author Svetoslav Marinov (Slavi) | http://orbisius.com
     */
    public static function make_links_blank( $text ) {
        $patterns = [
            '#(?(?=<a[^>]*>.+?<\/a>)
                 (?:<a[^>]*>.+<\/a>)
                 |
                 ([^="\']?)((?:https?|ftp):\/\/[^<> \n\r]+)
             )#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = empty( $r2 ) ? '' : App_Updater_String_Util::set_protocol( $r2 );
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
             },

            '#(^|\s)((?:https?://|www\.|https?://www\.)[^<>\ \n\r]+)#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = ! empty( $r2 ) ? App_Updater_String_Util::set_protocol( $r2 ) : '';
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
            },

            // Remove any target attribs (if any)
            '#<a([^>]*)target="?[^"\']+"?#si' => '<a\\1',

            // Put the target attrib
            '#<a([^>]+)>#si' => '<a\\1 target="_blank">',

            // Make emails clickable Mailto links
            '/(([\w\-]+)(\\.[\w\-]+)*@([\w\-]+)
                (\\.[\w\-]+)*)/six' => function ( $matches ) {

                $r = $matches[0];
                $res = ! empty( $r ) ? "<a href=\"mailto:$r\">$r</a>" : $r;
                $res = stripslashes( $res );

                return $res;
            },
        ];

        foreach ( $patterns as $regex => $callback_or_replace ) {
            if ( is_callable( $callback_or_replace ) ) {
                $text = preg_replace_callback( $regex, $callback_or_replace, $text );
            } else {
                $text = preg_replace( $regex, $callback_or_replace, $text );
            }
        }

        return $text;
    }
}

score 0 · Accepted Answer

类似的东西：

<?php
if(preg_match('@^http://(.*)\s|$@g', $textarea_url, $matches)) {
    echo '<a href=http://", $matches[1], '">', $matches[1], '</a>';
}
?>

score 0 · Accepted Answer

这class会将网址更改为文本，同时保持主页网址不变。我希望这对您有所帮助并节省时间。享受。

class RegClass 
{ 

     function preg_callback_url($matches) 
     { 
        //var_dump($matches); 
        //Get the matched URL  text <a>text</a>
        $text = $matches[2];
        //Get the matched URL link <a href ="http://www.test.com">text</a>
        $url = $matches[1];

        if($url=='href ="http://www.test.com"'){
         //replace all a tag as it is
         return '<a href='.$url.' rel="nofollow"> '.$text.' </a>'; 

         }else{
         //replace all a tag to text
         return " $text " ;
         }
} 
function ParseText($text){ 

    $text = preg_replace( "/www\./", "http://www.", $text );
        $regex ="/http:\/\/http:\/\/www\./"
    $text = preg_replace( $regex, "http://www.", $text );
        $regex2 = "/https:\/\/http:\/\/www\./";
    $text = preg_replace( $regex2, "https://www.", $text );

        return preg_replace_callback('/<a\s(.+?)>(.+?)<\/a>/is',
                array( &$this,        'preg_callback_url'), $text); 
      } 

} 
$regexp = new RegClass();
echo $regexp->ParseText($text);

score 0 · Accepted Answer

如果您想信任 IANA，您可以获得当前使用的官方支持的 TLD 列表，例如：

  $validTLDs = 
explode("\n", file_get_contents('http://data.iana.org/TLD/tlds-alpha-by-domain.txt')); //get the official list of valid tlds
  array_shift($validTLDs); //throw away first line containing meta data
  array_pop($validTLDs); //throw away last element which is empty

使 Søren Løvborg 的解决方案 #2 不那么冗长，省去了更新列表的麻烦，现在新的 tld 被如此粗心地丢弃了；）

score 0 · Accepted Answer

这对我有用（将答案之一变成了 PHP 函数）

function make_urls_from_text ($text){
   return preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1 </a>', $text);
}

score 0 · Accepted Answer

我创建的这个类可以满足我的需要，但诚然它确实需要一些工作；

class addLink
{
    public function link($string)
    {
        $expression = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,63}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
        if(preg_match_all($expression, $string, $matches) == 1)// If the pattern is found then
        {
            $string = preg_replace($expression, '<a href="'.$matches[0][0].'" target="_blank">$1</a>', $string);
        }

        return $string;       
    }
}

使用此代码的示例；

include 'PHP/addLink.php';

if(class_exists('addLink')) 
{                  
    $al = new addLink();                  
}
else{
    echo 'Class not found...';
} 

$paragraph = $al->link($paragraph);

score 0 · Accepted Answer

这只是Dharmendra Jadon发布的解决方案的一个变体，所以如果你喜欢它，请投票给他！

我刚刚添加了一个参数以使在新窗口中打开链接 (target="_blank") 成为可选的，因为我在其他一些解决方案中看到了这一点，并且喜欢这种灵活性：

function MakeUrls($str, $popup = FALSE)
{
    $find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

    $replace=array('<a href="$1"' . ($popup ? ' target="_blank"' : '') . '>$1</a>', '<a href="http://$1"' . ($popup ? ' target="_blank"' : '') . '>$1</a>');

    return preg_replace($find,$replace,$str);
}

score -1 · Accepted Answer

这应该可以让您的 Twitter 句柄不涉及您的电子邮件 /(?<=^|(?<=[^a-zA-Z0-9- .]))@([A-Za-z]+[A-Za -z0-9 ]+)/i

score -2 · Accepted Answer

虽然匹配完整的 url 规范很困难，但这里有一个通常做得很好的正则表达式：

([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

但是，要在 preg_replace 中使用它，您需要对其进行转义。如此：

$pattern = "/([\\w-]+(\\.[\\w-]+)*@([a-z0-9-]+(\\.[a-z0-9-]+)*?\\.[a-z]{2,6}|(\\d{1,3}\\.){3}\\d{1,3})(:\\d{4})?)/";
$replaced_texttext = preg_replace($pattern, '<a href="$0" title="$0">$0</a>', $text);

php - 用 HTML 链接替换文本中的 URL

17 回答 17

Related

Reference