python - Profanity in Django comments

Question

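Since Django doesn't handle profanity filtering, does anyone have suggestions for a simple way to implement some kind of natural language processing / profanity filtering in Django?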

score 7 · Accepted Answer

Django does handle profanity filtering.

From https://docs.djangoproject.com/en/1.4/ref/settings/#profanities-list:

PROFANITIES_LIST

Default: () (Empty tuple)

A tuple of profanities, as strings, that will be forbidden in comments when COMMENTS_ALLOW_PROFANITIES is False.

That said, you still need to populate that list yourself. Some links to get you started.
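
As a rough sketch, populating the list in a Django 1.4-era settings.py might look like the following. It uses only the two setting names from the docs quoted above. The words are placeholders, and since these settings belong to the old comments framework, check whether your Django version still honors them.

```python
# settings.py (sketch for Django 1.4, per the docs quoted above)

# Forbid comments that contain any entry from PROFANITIES_LIST.
COMMENTS_ALLOW_PROFANITIES = False

# Placeholder entries (assumption): replace with your own word list.
PROFANITIES_LIST = (
    'spam',
    'eggs',
)
```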

I would also get familiar with the Scunthorpe problem.
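
For anyone who hasn't run into it, the Scunthorpe problem is what happens when a filter matches a banned string inside an innocent word. A minimal sketch of the difference, with a made-up word list and hypothetical helper names:

```python
import re

banned = ['hell']  # placeholder word list (assumption)


def naive_hit(text):
    # Plain substring search: also fires inside longer, innocent words.
    lowered = text.lower()
    return any(word in lowered for word in banned)


def whole_word_hit(text):
    # \b only lets a match start and end at a word boundary.
    pattern = r'\b(?:%s)\b' % '|'.join(re.escape(word) for word in banned)
    return re.search(pattern, text, re.IGNORECASE) is not None


print(naive_hit('Hello from my shell script'))       # True (false positive)
print(whole_word_hit('Hello from my shell script'))  # False
print(whole_word_hit('What the hell?'))              # True
```

Whole-word matching removes this class of false positives, but it also means a banned word glued to other letters goes unnoticed.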

score 2 · Accepted Answer

Personally I'd say... don't bother. If you build a better filter, people will just type the words in a different way...

However, here's a simple example:

import re
bad_words = ['spam', 'eggs']
# The \b is a word boundary, so you don't get the Scunthorpe problem: http://en.wikipedia.org/wiki/Scunthorpe_problem
# re.escape keeps any regex metacharacters in the words literal.
pattern = re.compile(
    r'\b(%s)\b' % '|'.join(re.escape(word) for word in bad_words),
    re.IGNORECASE,
)

some_text = "This text contains some profane words like spam and eggs. But it won't match spammy stuff."
print(some_text)
# This text contains some profane words like spam and eggs. But it won't match spammy stuff.

clean_text = pattern.sub('XXX', some_text)
print(clean_text)
# This text contains some profane words like XXX and XXX. But it won't match spammy stuff.
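
To illustrate the "don't bother" caveat at the top of this answer, here is the same idea wrapped in a small function. The name censor and its parameters are made up for this sketch. Trivially altered spellings pass straight through, which is why a word list alone won't stop a determined commenter.

```python
import re


def censor(text, bad_words, mask='XXX'):
    # An empty word list would produce a pattern that matches everywhere,
    # so return the text untouched in that case.
    if not bad_words:
        return text
    # One case-insensitive, whole-word pattern built from the word list.
    pattern = re.compile(
        r'\b(?:%s)\b' % '|'.join(re.escape(word) for word in bad_words),
        re.IGNORECASE,
    )
    return pattern.sub(mask, text)


words = ['spam', 'eggs']
print(censor('SPAM and Eggs', words))     # XXX and XXX
print(censor('sp4m and e.g.g.s', words))  # sp4m and e.g.g.s
```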
