
I'm trying to find some words (or expressions, e.g. two-word phrases) in a string that are not inside the anchor text of a link (the string contains HTML and is usually UTF-8 encoded). After that, I plan to replace those words with links.

I'm not very good with regex. I've searched the web and Stack Overflow and found two regex patterns that help, but each of them has an issue. I'm hoping someone can help me combine the two into one that works.

First pattern: /('.$tag.')(?![^<]*<\/a>)/is

This pattern finds the words, but if, for example, I'm trying to find "express" in the string:

In computing, a regular expression provides a concise and flexible means...

I don't expect a match there, but the pattern matches inside the word "expression".
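A minimal PHP reproduction of this, as a sketch: it assumes $tag holds a plain word and passes it through preg_quote(), which the original patterns don't do. It shows the first pattern matching inside "expression", while the same pattern with \b on both sides of the term does not.

```php
<?php
// Hypothetical search term; in the question it is $tag.
$tag = 'express';
$text = 'In computing, a regular expression provides a concise and flexible means...';

// First pattern from the question: no word boundaries around the term.
$loose = '/(' . preg_quote($tag, '/') . ')(?![^<]*<\/a>)/is';

// Same pattern with \b on both sides, so only whole words match.
$bounded = '/\b(' . preg_quote($tag, '/') . ')\b(?![^<]*<\/a>)/is';

var_dump(preg_match($loose, $text));   // int(1): "express" found inside "expression"
var_dump(preg_match($bounded, $text)); // int(0): no standalone "express"
```

This fixes the ASCII case, but \b on its own still misbehaves with UTF-8 text, which is the issue described next.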

Second pattern: \'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'is

This pattern doesn't have the previous issue, but if the word or expression I'm trying to find ends with a special UTF-8 character, I don't get a match.

Example word: apă

Example string: ...care transformă umiditatea din aer în apă potabilă. Dacă iniţial a fost creată pentru situaţia ţărilor...
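A quick way to see why the second pattern misses "apă" (a sketch, assuming the script file is saved as UTF-8): "ă" is two bytes in UTF-8, and without the u modifier PCRE works on raw bytes. Neither byte counts as a word character, so the closing \b, sitting between "ă" and the following space, has non-word characters on both sides and never matches.

```php
<?php
// Assumes this file is saved as UTF-8.
$tag = 'apă';
$text = '...care transformă umiditatea din aer în apă potabilă. Dacă iniţial a fost creată pentru situaţia ţărilor...';

// "ă" (U+0103) is encoded as two bytes in UTF-8.
echo bin2hex('ă'), "\n"; // c483

// Second pattern from the question, without the u modifier (byte mode).
$pattern = '\'(?!((<.*?)|(<a.*?)))(\b' . preg_quote($tag, "'") . '\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'is';
var_dump(preg_match($pattern, $text)); // int(0): the trailing \b fails after "ă"
```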


1 Answer


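Assuming the second regex works for you (I haven't tested it, and I really don't think you should use regex for this kind of thing), all you need to do is add the u modifier, as @hakre said. Without it, PCRE works on raw bytes, so the two bytes of "ă" are not treated as word characters and the trailing \b fails: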

\'(?!((<.*?)|(<a.*?)))(\b'.$tag.'\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'isu
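Since the end goal is to turn the words into links, here is a sketch of that pattern used with preg_replace(). The sample HTML, the link URL and the preg_quote() call are assumptions for illustration. $0 inserts the whole match, which is just the word because all the lookarounds are zero-width. Note that with u the subject must be valid UTF-8, otherwise preg_replace() returns null.

```php
<?php
// Assumes this file is saved as UTF-8.
$tag = preg_quote('apă', "'"); // escape regex metacharacters; "'" is the delimiter
$html = 'Vezi <a href="/x">apă</a> și apă potabilă.';

// The answer's pattern, with the u modifier added.
$pattern = '\'(?!((<.*?)|(<a.*?)))(\b' . $tag . '\b)(?!(([^<>]*?)>)|([^>]*?</a>))\'isu';

// Hypothetical link target; the existing anchor text is skipped by the
// final lookahead, which rejects words followed by "</a>".
$linked = preg_replace($pattern, '<a href="https://example.com/tags/apa">$0</a>', $html);
echo $linked, "\n";
// Vezi <a href="/x">apă</a> și <a href="https://example.com/tags/apa">apă</a> potabilă.
```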

Personally, I would use DOMDocument for this task.
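A minimal sketch of the DOMDocument route, assuming PHP 7.1+; linkWord() is a hypothetical helper, not part of any library. It selects text nodes that have no <a> ancestor with DOMXPath, splits each one on a Unicode-aware \b...\b pattern, and rebuilds it with new <a> elements, so existing anchors and tag attributes are never touched.

```php
<?php
/**
 * Hypothetical helper: wrap whole-word occurrences of $word in <a href="$href">,
 * skipping any text that is already inside an <a> element.
 */
function linkWord(DOMDocument $doc, string $word, string $href): void
{
    $xpath = new DOMXPath($doc);
    // Snapshot of body text nodes that have no <a> ancestor.
    $nodes = iterator_to_array($xpath->query('//body//text()[not(ancestor::a)]'));
    // One capture group, so preg_split returns [text, match, text, match, ...].
    $re = '/\b(' . preg_quote($word, '/') . ')\b/u';

    foreach ($nodes as $textNode) {
        $parts = preg_split($re, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no match in this text node
        }
        $parent = $textNode->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes are the captured matches: wrap them in a link.
                $link = $doc->createElement('a');
                $link->setAttribute('href', $href);
                $link->appendChild($doc->createTextNode($part));
                $parent->insertBefore($link, $textNode);
            } elseif ($part !== '') {
                // Even indexes are the surrounding plain text.
                $parent->insertBefore($doc->createTextNode($part), $textNode);
            }
        }
        $parent->removeChild($textNode);
    }
}

$html = '<p>Vezi <a href="/x">apă</a> și apă potabilă.</p>';
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
// The meta tag tells libxml the input is UTF-8 (it assumes ISO-8859-1 otherwise).
$doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
    . $html . '</body></html>');
linkWord($doc, 'apă', 'https://example.com/tags/apa');

foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    echo $doc->saveHTML($child), "\n";
}
```

Because the regex only ever sees text node contents, it never has to reason about markup, so the lookaheads from both patterns in the question are no longer needed.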

answered 2012-05-13T16:14:57.173