
I'm building code that matches and replaces several types of patterns (bbCode). One of the matches I'm trying to make is [url=http:example.com], which should be replaced with an anchor link. I'm also trying to match plain-text URLs and replace them with anchor links. The combination of these two is where I'm running into trouble.

Since my routine is recursive, matching and replacing across the entire text on each run, I'm having trouble NOT replacing URLs that are already inside anchors.

This is the recursive routine I'm running:

if(text.search(p.pattern) !== -1) {
    text = text.replace(p.pattern, p.replace);
}
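A minimal sketch that reproduces the problem, assuming the bbCode form is [url=...]text[/url] and that plain URLs are wrapped as <a href="$&">$&</a>. The patterns table and the render function are hypothetical names for illustration; the plain-URL rule is the regex from this question. Because the second rule runs over the output of the first, it wraps the URL that is already inside href:

```javascript
// Hypothetical pattern table: a bbCode [url=...]...[/url] rule, then the
// plain-URL rule from the question. Both replacement strings are assumptions.
const patterns = [
  { pattern: /\[url=(.*?)\](.*?)\[\/url\]/ig, replace: '<a href="$1">$2</a>' },
  { pattern: /(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig, replace: '<a href="$&">$&</a>' },
];

// Apply each pattern in turn, as in the question's routine.
function render(text) {
  for (const p of patterns) {
    if (text.search(p.pattern) !== -1) {
      text = text.replace(p.pattern, p.replace);
    }
  }
  return text;
}

// The plain-URL rule runs over the anchor produced by the bbCode rule
// and wraps the URL inside href a second time, so this prints broken markup.
console.log(render("[url=http://example.com]site[/url]"));
```

The printed result is <a href="<a href="http://example.com">site</a>">http://example.com">site</a></a>, i.e. the URL in the href attribute has been matched again.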

This is my regexp for plain urls so far:

/(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig

URLs can start with http, https, ftp or ftps, may contain any text after that, and end at whitespace or a punctuation mark (. ! ? ,).
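One likely reason the regex above does not help: (?!href="|>) is a lookahead, so it inspects the characters at the start of the match itself, which are always ht or f, and it never sees the href=" or > that comes before the URL. A lookbehind (?<!...) checks the preceding text instead. The sketch below swaps in a negative lookbehind (an ES2018 JavaScript feature); it is only a partial fix, since URLs preceded by anything other than exactly href=" or > (for example src=") are still matched. The sample strings are made up for illustration.

```javascript
// The question's regex: the negative lookahead tests the URL's own first characters.
const withLookahead = /(?!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig;
// Same regex with a negative lookbehind: rejects URLs directly preceded by href=" or >.
const withLookbehind = /(?<!href="|>)(ht|f)tps?:\/\/.*?(?=\s|$)/ig;

const anchorHtml = '<a href="http://example.com">http://example.com</a>';
const plainText = "see http://example.com now";

console.log(anchorHtml.match(withLookahead));  // matches from inside href to the end of the string
console.log(anchorHtml.match(withLookbehind)); // null
console.log(plainText.match(withLookbehind));  // [ 'http://example.com' ]
```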

Just to be absolutely clear, I'm using this as a test for matches:

Should match:

Should not match:

I would really appreciate any help I can get here.

EDIT: The first solution I accepted, by jkshah below, has some flaws. For instance, it matches

<img src="http://www.example.com/test.jpg">

The comments on Jerry's solution, however, made me want to try it again, and that solution handles this case as well, so I accepted it instead. Thank you all for your kind help on this. :)


3 Answers


Maybe something like this?

/(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm

Then trim the final period, if there is one.

regex101 demo

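It may cause some problems if the links contain more punctuation, though... I would suggest capturing the link first and then removing trailing punctuation with a second replace.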

[^<>\]]+ matches any characters except <, > and ].

(?![^<>\]]*([>]|<\/)) prevents matching links inside HTML tags or between an opening and a closing tag.

(?=[\s!,?\]]|$) handles the punctuation and whitespace that end a link.
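A minimal sketch of the two-step approach suggested above, assuming the goal is to wrap each match in an <a> tag: the regex finds the link, and a second replace moves any trailing periods outside it. linkifyTrimmed is a hypothetical helper name, and only trailing periods are trimmed here.

```javascript
// Regex from the answer above.
const urlRe = /(?:(?:ht|f)tps?:\/\/|www)[^<>\]]+?(?![^<>\]]*([>]|<\/))(?=[\s!,?\]]|$)/gm;

// Hypothetical helper: wrap each matched URL, keeping trailing periods outside the link.
function linkifyTrimmed(text) {
  return text.replace(urlRe, (url) => {
    const trimmed = url.replace(/\.+$/, ""); // second replace: drop trailing dots
    const trailing = url.slice(trimmed.length);
    return '<a href="' + trimmed + '">' + trimmed + "</a>" + trailing;
  });
}

console.log(linkifyTrimmed("Visit http://example.com/test. Done"));
// Visit <a href="http://example.com/test">http://example.com/test</a>. Done
console.log(linkifyTrimmed('<img src="http://www.example.com/test.jpg">'));
// unchanged: the negative lookahead finds the > that closes the tag
```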

answered 2013-09-27T21:53:01.217

The following regex should work. It gives the desired result on your sample input.

/((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm

See it in action here

(?!.*<\/a>) - negative lookahead for the anchor: the match is rejected if </a> appears later on the same line.

The matched URL is captured in $1, which can be used in the replacement string.
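For example, assuming the replacement wraps each URL in an anchor, $1 can be used like this. The two sample lines are made up for illustration; the anchor line has a space before </a>, and because . does not match a newline, the lookahead only looks ahead on the current line.

```javascript
// Regex from the answer above; $1 holds the whole URL.
const answerRe = /((?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

const plainLine = "see http://example.com now";
const anchorLine = '<a href="http://www.example.com">http://www.example.com </a>';

console.log(plainLine.replace(answerRe, '<a href="$1">$1</a>'));
// see <a href="http://example.com">http://example.com</a> now
console.log(anchorLine.replace(answerRe, '<a href="$1">$1</a>'));
// unchanged: every candidate URL is followed by </a> on the same line
```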

EDIT

To avoid matching <img src .., you can use the following:

(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))
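A quick check of the edited regex, using the <img> line from the question's EDIT: the (?!.*<img\s+src) lookahead rejects that line. Note that because the pattern begins with ^ under the m flag, it only matches a URL that starts a line, so a URL in the middle of a sentence is not matched either. The other sample strings are made up for illustration.

```javascript
// Edited regex from the answer above.
const editedRe = /(^(?!.*<img\s+src)(?:(?:ht|f)tps?:\/\/|www)[^\s,?!]+(?!.*<\/a>))/gm;

console.log('<img src="http://www.example.com/test.jpg">'.match(editedRe)); // null
console.log("http://example.com/test".match(editedRe)); // [ 'http://example.com/test' ]
console.log("see http://example.com/test".match(editedRe)); // null: URL does not start the line
```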
answered 2013-09-27T22:33:56.780

Can p.replace be a function? If so:

var text = 'http://www.example.com \n' +
           'http://www.example.com/test \n' +
           'http://example.com/test \n' +
           'www.example.com/test \n' +
           '<a href="http://www.example.com">http://www.example.com </a>\n' +
           '<a href="http://www.example.com/test">http://www.example.com/test </a>\n' +
           '<a href="http://example.com/test">http://example.com/test </a>\n' +
           '<a href="www.example.com/test">www.example.com/test </a>';
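var p = {
    flag: true, // set whenever a pass replaces a URL, so the loop runs again
    pattern: /(<a[^<]*<\/a>)|((ht|f)tps?:\/\/|www\.).*?(?=\s|$)/ig,
    replace: function ($0, $1) {
                 if ($1) {
                     // $1 is set: the match is a whole <a>...</a>, leave it unchanged
                     return $0;
                 } else {
                     // a bare URL: record that this pass made a replacement
                     p.flag = true;
                     return "construct replacement string here";
                 }
    }
};
// repeat until a full pass makes no replacement
while(p.flag){
    p.flag = false;
    text = text.replace(p.pattern, p.replace);
}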

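The part I added to the regex, (<a[^<]*<\/a>)|, checks whether the URL is anywhere inside an anchor; if it is, the replace function ignores it and returns the match unchanged.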

If you only want to skip URLs inside the <a href="..."> tag itself, but still replace other URLs inside the anchor, change (<a[^<]*<\/a>)| to (<a[^>]*>)|
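A sketch comparing the two variants, assuming a replacement that wraps each bare URL in an anchor; wrapUnskipped is a hypothetical helper and the sample line is made up for illustration. With the first variant the whole anchor is skipped; with the second only the opening tag is skipped, so the URL in the link text gets wrapped (which nests one anchor inside another).

```javascript
// Hypothetical helper: wrap every URL that the skip group (group 1) did not consume.
function wrapUnskipped(text, pattern) {
  return text.replace(pattern, (match, skipped) =>
    skipped ? match : '<a href="' + match + '">' + match + "</a>"
  );
}

// Skip whole anchors, including URLs in the link text.
const skipAnchors = /(<a[^<]*<\/a>)|((ht|f)tps?:\/\/|www\.).*?(?=\s|$)/ig;
// Skip only the opening <a ...> tag; URLs in the link text are still wrapped.
const skipOpenTag = /(<a[^>]*>)|((ht|f)tps?:\/\/|www\.).*?(?=\s|$)/ig;

const sample = '<a href="http://example.com">see http://example.com/docs </a>';

console.log(wrapUnskipped(sample, skipAnchors)); // unchanged
console.log(wrapUnskipped(sample, skipOpenTag));
// <a href="http://example.com">see <a href="http://example.com/docs">http://example.com/docs</a> </a>
```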

answered 2013-09-28T00:06:49.280