
My problem is that the same content contains iframes, image tags, etc. Each of them already has a regex that converts it into the correct format.

The last thing left is plain URLs. I need a regex that finds all links that are just bare links, not inside an iframe, img or any other tag. The tags in this case are regular HTML tags, not BBCode.

Currently I have the code below as the last pass of the content rendering. But it also reacts to everything the earlier passes produced (the rendered iframes and images), so it swaps out the URLs inside those tags as well.

$output = preg_replace(array(
    '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s'
), array(
    'test'
), $output);

And my content looks something like this:

# dont want these to be touched
<iframe width="640" height="360" src="http://somedomain.com/but-still-its-a-link-to-somewhere/" frameborder="0"></iframe>
<img src="http://someotherdomain.com/here-is-a-img-url.jpg" border="0" />

# and only these converted
http://google.com
http://www.google.com
https://www2.google.com<br />
www.google.com

As you can see, there may also be something right after the link. After a full day of trying to get regexes to work, that trailing <br /> has been a nightmare for me.
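For what it is worth, the body of the pattern above uses `[^\s()<>]+`, which already excludes `<`, so on its own it stops in front of a trailing `<br />`. A quick check with `preg_match_all` on two of the sample lines (the input string is taken from the sample content):

```php
<?php
// Same URL pattern as in the question.
$pattern = '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

// Two bare URLs from the sample, one of them followed directly by <br />.
preg_match_all($pattern, "https://www2.google.com<br />\nwww.google.com", $m);

// $m[0] holds the full matches; the <br /> is not part of the first one.
print_r($m[0]);
```

So the trailing tag is not the hard part; the hard part is avoiding URLs that sit inside tag attributes.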


1 Answer


Description

This solution matches URLs that are not inside a tag's attribute values and replaces them with new content.

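The regex matches both the things you want to skip and the things you want to replace. preg_replace_callback then runs an inline function that checks whether capture group 1 (the text you actually want) is populated. If it is, the function returns the replacement; otherwise it returns the unwanted text unchanged.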
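Stripped down to its core, the skip-or-capture idea looks like the sketch below. Both sub-patterns here are deliberately simplified and are not the ones from this answer: `<[^>]*>` assumes no `>` appears inside attribute values, and `https?://[^\s<>"]+` is only a rough URL matcher. The `.example` hosts are placeholders.

```php
<?php
// Branch 1 (no group) matches a tag so it is consumed and skipped;
// branch 2 captures a URL in group 1.
$regex = '~<[^>]*>|(https?://[^\s<>"]+)~';

$input = '<img src="http://a.example/x.jpg" /> see http://b.example/';

$output = preg_replace_callback($regex, function (array $m): string {
    // Group 1 is only populated when the URL branch matched.
    if (isset($m[1]) && $m[1] !== '') {
        return '<a href="' . $m[1] . '">' . $m[1] . '</a>';
    }
    // Otherwise a tag matched: hand it back untouched.
    return $m[0];
}, $input);

echo $output, "\n";
```

Because the engine consumes a whole tag before it can look inside it, the URL in `src` is never offered to the URL branch.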

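I used your URL-matching regex with a few small modifications, such as converting the unused capture groups (...) into non-capturing groups (?:...). This avoids storing captures nobody uses, which can make the engine slightly faster, and it leaves group 1 as the only capture group, which makes the expression easier to modify.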
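The effect on group numbering is easy to see with `preg_match`. The first pattern is the one from the question; the second is the URL half of this answer's expression, rewritten here with `%` delimiters so the slashes need no escaping.

```php
<?php
// Question's pattern: three capturing groups.
$withCapturing    = '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';
// Answer's URL part: inner groups made non-capturing, so only group 1 remains.
$withNonCapturing = '%((?:[\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|(?:[^[:punct:]\s]|/)))%s';

preg_match($withCapturing, 'http://google.com', $a);
preg_match($withNonCapturing, 'http://google.com', $b);

print_r($a); // [0] full match, [1] the URL, [2] "http://", [3] "m"
print_r($b); // [0] full match, [1] the URL
```

With only one capturing group, the callback can rely on `$matches[1]` meaning "the URL" and nothing else.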

Raw expression: <(?:[^'">=]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>|((?:[\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|(?:[^[:punct:]\s]|\/)))
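One way to read the raw expression is to write it out with PHP's `x` (extended) modifier, where whitespace outside character classes is ignored and `#` starts a comment. The sketch below is intended to be the same pattern, only laid out with comments, and checks that both forms give identical results on a short sample string (the `x.example` host is a placeholder).

```php
<?php
// The answer's expression, compact form.
$compact = '/<(?:[^\'">=]*|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*>|((?:[\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|(?:[^[:punct:]\s]|\/)))/ims';

// The same expression, commented, using the x modifier and ~ delimiters.
$extended = <<<'REGEX'
~
    # Branch 1: a complete HTML tag, matched only so it can be skipped.
    <
    (?:
        [^'">=]*            # tag name, attribute names, spaces, a trailing slash
      | ='[^']*'            # ='single-quoted value'
      | ="[^"]*"            # ="double-quoted value"
      | =[^'"][^\s>]*       # =unquoted value
    )*
    >
  |
    # Branch 2: a bare URL, captured in group 1.
    (
        (?:[\w-]+://?|www[.])    # a scheme such as http:// or a leading www.
        [^\s()<>]+               # URL body: stops at whitespace, parentheses, < and >
        (?:                      # the URL must end with one of:
            \([\w\d]+\)          #   a (word) group in parentheses
          | (?:[^[:punct:]\s]|/) #   a non-punctuation character or a slash
        )
    )
~imsx
REGEX;

$sample = "<img src=\"http://x.example/a.jpg\" />\nhttp://google.com<br />\nwww.google.com";

preg_match_all($compact, $sample, $c);
preg_match_all($extended, $sample, $e);

var_dump($c === $e); // bool(true)
```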


Example

Code

<?php

$string = '# dont want these to be touched
<iframe width="640" height="360" src="http://somedomain.com/but-still-its-a-link-to-somewhere/" frameborder="0"></iframe>
<img src="http://someotherdomain.com/here-is-a-img-url.jpg" border="0" />

# and only these converted
http://google.com
http://www.google.com
https://www2.google.com<br />
www.google.com';


=]*">
// Branch 1 matches a whole HTML tag (quoted attribute values included) so it is skipped;
// branch 2 captures a bare URL in group 1.
$regex = '/<(?:[^\'">=]*|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*>|((?:[\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|(?:[^[:punct:]\s]|\/)))/ims';

$output = preg_replace_callback(
    $regex,
    function ($matches) {
        // Group 1 only exists when the URL branch matched.
        if (array_key_exists(1, $matches)) {
            return '<a href="' . $matches[1] . '">' . $matches[1] . '</a>';
        }
        // A tag matched: return it unchanged.
        return $matches[0];
    },
    $string
);
echo $output;

Output

# dont want these to be touched
<iframe width="640" height="360" src="http://somedomain.com/but-still-its-a-link-to-somewhere/" frameborder="0"></iframe>
<img src="http://someotherdomain.com/here-is-a-img-url.jpg" border="0" />

# and only these converted
<a href="http://google.com">http://google.com</a>
<a href="http://www.google.com">http://www.google.com</a>
<a href="https://www2.google.com">https://www2.google.com</a><br />
<a href="www.google.com">www.google.com</a>
answered 2013-07-20T18:29:44.290