
I have been looking for a regular expression that replaces plain-text URLs in a string (the string can contain more than one URL) with:

 <a href="url">url</a>

I found this: http://mathiasbynens.be/demo/url-regex

I would like to use diegoperini's regex (the best one according to the tests):

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

But I want it to replace all URLs in the string globally. When I use this:

/_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS/g

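it doesn't work. How can I make this regex global, and what do the underscore at the beginning and the "_iuS" at the end mean?

I want to use it with PHP, so I am using: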

preg_replace($regex, '<a href="$0">$0</a>', $examplestring);

2 Answers


The underscores are the regex delimiters, and i, u and S are pattern modifiers:

i (PCRE_CASELESS)

If this modifier is set, letters in the pattern match both upper and lower 
case letters.

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible 
with Perl. Pattern and subject strings are treated as UTF-8.

(Not to be confused with the uppercase U, PCRE_UNGREEDY, which inverts the 
greediness of quantifiers and is not used in this pattern.)

S

When a pattern is going to be used several times, it is worth spending more 
time analyzing it in order to speed up the time taken for matching. If this 
modifier is set, then this extra analysis is performed. At present, studying 
a pattern is useful only for non-anchored patterns that do not have a single 
fixed starting character.

For more information, see http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
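As a quick illustration of what the i and u modifiers change, here is a minimal sketch. The patterns are short stand-ins written for this example (an assumption, not the full expression from the question), and the sample strings are made up:

```php
<?php
// "i": letters in the pattern match both upper and lower case.
$withI    = preg_match('_^https?://_i', 'HTTP://EXAMPLE.COM'); // matches
$withoutI = preg_match('_^https?://_', 'HTTP://EXAMPLE.COM');  // does not match

// "u": pattern and subject are treated as UTF-8, so \x{...} code points
// above 0xFF (as used in the host-name part of the pattern) can be matched.
// "\u{00FC}" is the letter u-umlaut (PHP 7+ string escape).
$withU = preg_match('_^[a-z\x{00a1}-\x{ffff}]+$_iu', "m\u{00FC}nchen");

var_dump($withI, $withoutI, $withU);
```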

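When you added / ... /g, you added a second pair of regex delimiters plus the modifier g, which does not exist in PCRE. That is why it doesn't work. No global flag is needed anyway: preg_replace() replaces every match by default.

answered 2012-09-10T13:43:17.120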
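A minimal sketch of the point about delimiters and the missing g flag. The pattern is a deliberately simplified stand-in (an assumption for brevity, not diegoperini's full expression), and $text is a made-up sample string:

```php
<?php
// Simplified stand-in pattern (assumption): the underscores are the
// delimiters and "i" is the only modifier. PCRE has no "g" modifier.
$pattern = '_https?://\S+_i';

// Hypothetical sample input containing two URLs.
$text = 'See http://example.com/a and HTTPS://example.org/b for details.';

// preg_replace() replaces all matches unless a limit is passed
// (its $limit parameter defaults to -1, meaning "no limit").
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $linked, PHP_EOL;
```

Both URLs are wrapped in a single call, which is the behaviour the /g suffix was meant to request.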

I agree with @verdesmarald and used this pattern in the following function:

$string = preg_replace_callback(
        "_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS",
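        // An anonymous function replaces create_function(), which was
        // deprecated in PHP 7.2 and removed in PHP 8.0.
        function ($match) {
            // Build a lower-cased label without the scheme or "www." prefix.
            $m = trim(strtolower($match[0]));
            $m = str_replace(array("http://", "https://", "ftp://", "www."), "", $m);

            // Shorten long labels; the href keeps the full matched URL.
            if (strlen($m) > 25)
            {
                $m = substr($m, 0, 25) . "...";
            }

            return "<a href=\"$match[0]\">$m</a>";
        }, $string);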

    return $string;

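It seems to do the trick and solves the problem I was having. As @verdesmarald said, removing the ^ and $ characters makes the pattern work even in my preg_replace_callback().

The only thing that worries me is how efficient the pattern is. If it is used in a busy, high-traffic web application, could it become a bottleneck?

Update

The regex pattern above breaks if the path part of the URL is followed by a trailing dot, as in http://www.mydomain.com/page. (the dot ends up inside the match). To fix this, I modified the last part of the pattern by adding ^. so that the final character class looks like this: [^\s^.]. As I read it: do not match trailing whitespace or dots. Be aware that inside a character class the second ^ is a literal caret, so [^\s.] is the tidier spelling, and that the class excludes every dot in the path, not only a trailing one, so a path such as /page.html is cut off at the first dot.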
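To see why the anchors matter, here is a minimal sketch that compares an anchored and an unanchored pattern. Both are simplified stand-ins (an assumption, not the full expression above), and the sample text is made up:

```php
<?php
// Simplified stand-in patterns (assumption), with and without anchors.
$anchored   = '_^https?://\S+$_i';
$unanchored = '_https?://\S+_i';

// Hypothetical sample text with a URL in the middle of a sentence.
$text = 'Docs are at http://example.com/docs today.';

// Anchored: the whole subject would have to be a single URL, so no match.
$anchoredCount = preg_match_all($anchored, $text, $anchoredMatches);

// Unanchored: the URL is found wherever it occurs in the subject.
$unanchoredCount = preg_match_all($unanchored, $text, $unanchoredMatches);

var_dump($anchoredCount, $unanchoredCount, $unanchoredMatches[0]);
```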

So far, in my tests, it seems to work fine.

answered 2013-01-19T01:33:38.287
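The trailing-dot case can also be handled outside the regex, in the callback. This is only a sketch of one possible alternative, not the approach used above: linkify() is a hypothetical helper name, the pattern is a simplified stand-in, and the set of punctuation characters to strip is an assumption:

```php
<?php
// Simplified stand-in pattern (assumption); the full pattern could be used instead.
$pattern = '_https?://\S+_i';

// linkify() is a hypothetical helper, not part of PHP.
function linkify($text, $pattern)
{
    return preg_replace_callback($pattern, function ($match) {
        // Split off trailing sentence punctuation so it stays outside the link.
        $url  = rtrim($match[0], '.,;:!?');
        $tail = substr($match[0], strlen($url));

        return '<a href="' . $url . '">' . $url . '</a>' . $tail;
    }, $text);
}

$sample = 'Read http://www.mydomain.com/page.html or http://www.mydomain.com/page.';
$result = linkify($sample, $pattern);

echo $result, PHP_EOL;
```

With this approach dots inside the path (page.html) are kept, and only punctuation at the very end of the match is moved out of the link.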