1

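This is driving me crazy: it seems so simple, but I can't figure out the right regex. I need a regular expression that matches a blacklisted word, e.g. "ass".

For example, in this string: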

<span class="bob">Blacklisted word was here</span>bass

I tried this regex:

((?!class)ass)

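It was supposed to match the "ass" in the word "bass" but NOT the one in "class". Instead, this regex flags "ass" in both cases. I checked multiple negative-lookahead examples on Google, but none of them worked.

NOTE: This is for a CMS, so that moderators can easily spot potentially bad words. I know you can't rely on a computer to do the filtering.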


3 Answers

4

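If you have lookbehind available (which, IIRC, JavaScript does not, and that seemed likely to be what you're using this for) (just noticed the PHP tag; you probably do have lookbehind available), this is very trivial: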

(?<!cl)(ass)
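
As a rough check of the difference, here is a minimal sketch using Python's `re` module as a stand-in for PHP's PCRE (an assumption; the two engines should behave the same way for these patterns). The variable names are only illustrative.

```python
import re

sample = '<span class="bob">Blacklisted word was here</span>bass'

# "ass" is matched only when the two characters right before it are not "cl".
with_lookbehind = re.compile(r"(?<!cl)(ass)")

# The question's attempt, for comparison: the lookahead is tested at the
# position where "ass" begins, so it can never see the "cl" before it.
question_attempt = re.compile(r"((?!class)ass)")

lookbehind_hits = [m.start(1) for m in with_lookbehind.finditer(sample)]
attempt_hits = [m.start(1) for m in question_attempt.finditer(sample)]

print(lookbehind_hits)  # a single hit: the "ass" in "bass"
print(attempt_hits)     # two hits: the ones in "class" and in "bass"
```

The lookahead in the original attempt inspects the text starting at "ass" itself, which is why it never rejects anything.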

If you don't have lookbehind, you'd need to do something like this:

(?:(?!cl)..|^.?)(ass)

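That is, ass preceded by any two characters that are not cl, or ass sitting zero or one characters after the beginning of the line.

Note that this is probably not the best way to implement a blacklist, though. You probably want this: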
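
A small sketch of how that workaround behaves, again in Python's `re` purely for illustration (Python does have lookbehind, so this only exercises the pattern). The helper name `flagged_at` is hypothetical.

```python
import re

# Either two characters that are not "cl" come first, or "ass" sits
# zero or one characters after the start of the string.
no_lookbehind = re.compile(r"(?:(?!cl)..|^.?)(ass)")

def flagged_at(text):
    """Return the offset of the flagged "ass", or None if nothing is flagged."""
    m = no_lookbehind.search(text)
    return m.start(1) if m else None

sample = '<span class="bob">Blacklisted word was here</span>bass'

print(flagged_at("class"))   # None: the two preceding characters are "cl"
print(flagged_at("bass"))    # 1: handled by the "^.?" branch
print(flagged_at("ass"))     # 0: handled by the "^.?" branch with nothing before
print(flagged_at(sample) == sample.rfind("ass"))  # True: only "bass" is flagged
```

Group 1 carries the blacklisted word, since the overall match also includes the characters consumed in front of it.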

\bass\b

This will match the word ass but not any word that merely contains ass (like association or bass or whatever).

answered 2012-09-19T16:26:12.330
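
To illustrate the word-boundary version, a minimal sketch in Python's `re` (chosen only for illustration; `\b` works the same way in PCRE). The test strings are made up.

```python
import re

# \b anchors the match to word boundaries, so only the standalone word counts.
whole_word = re.compile(r"\bass\b")

texts = ["ass", "bass", "class", "association", "what an ass!"]
flagged = [t for t in texts if whole_word.search(t)]

print(flagged)  # ['ass', 'what an ass!']
```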
2

It seems to me that you're actually trying to use two lists here: one for words that should be excluded (even if one is a part of some other word), and another for words that should not be changed at all - even though they have the words from the first list as substrings.

The trick here is to know where to use the lookbehind:

/ass(?<!class)/

In other words, the good word negative lookbehind should follow the bad word pattern, not precede it. Then it would work correctly.

You can even get some of them in a row:

/ass(?<!class)(?<!pass)(?<!bass)/

This, though, will skip the ass in both passhole and pass, because the lookbehind only sees the four characters ending the match. To make it even more bullet-proof, we can add checking the word boundaries:

/ass(?<!\bclass\b)(?<!\bpass\b)(?<!\bbass\b)/
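
A quick sketch comparing the two trailing-lookbehind variants, written in Python's `re` for illustration only (it accepts these fixed-width lookbehinds just as PCRE does). The word list is made up for the example.

```python
import re

# Whitelist checked after the match, without and with word boundaries.
plain = re.compile(r"ass(?<!class)(?<!pass)(?<!bass)")
bounded = re.compile(r"ass(?<!\bclass\b)(?<!\bpass\b)(?<!\bbass\b)")

words = ["class", "pass", "bass", "passhole", "ass"]

plain_flagged = [w for w in words if plain.search(w)]
bounded_flagged = [w for w in words if bounded.search(w)]

print(plain_flagged)    # ['ass']
print(bounded_flagged)  # ['passhole', 'ass']
```

With the boundaries in place, a whitelisted word is only excused when it stands alone, so a longer word that merely starts with one is still flagged.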

UPDATE: of course, it's more efficient to check for parts of the string, with (?<!cl)(?<!b) etc. But my point was that you can still use the whole words from whitelist in the regex.

Then again, perhaps it'd be wise to prepare the whitelists accordingly (so shorter patterns will have to be checked).

answered 2012-09-19T16:35:15.490
-1

Is this what you want? (?<!class)(\w+ass)

answered 2012-09-19T16:05:26.990