javascript - How to ensure users only submit English text

Question

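I'm building a project that involves natural language processing. Since the NLP module currently only handles English text, I have to make sure that user-submitted content (not long, just a few words) is in English. Is there an established way to achieve this? Python or JavaScript approaches preferred.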

score 7 · Accepted Answer

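If the content is long enough, I would suggest doing some frequency analysis on the letters.

But for just a few words, I think your best bet is to compare them against an English dictionary and accept the input if half of them match.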
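
The dictionary comparison could be sketched roughly as follows. This is only an illustration: `ENGLISH_WORDS` is a tiny stand-in for a real word list, and the function name `looks_english` and the 0.5 threshold (taken from the "half of them match" suggestion) are assumptions, not part of any library.

```python
import re

# Tiny stand-in word list (assumption); a real check would load a full English word list.
ENGLISH_WORDS = {"the", "a", "is", "cat", "on", "mat", "sat", "hello", "world"}


def looks_english(text, words=ENGLISH_WORDS, threshold=0.5):
    """Return True if at least `threshold` of the tokens are found in `words`."""
    # Runs of letters (any script), lowercased; digits and punctuation are ignored.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in words)
    return hits / len(tokens) >= threshold


print(looks_english("The cat sat on the mat"))
print(looks_english("bonjour le monde"))
```

With only a few words of input, a single unknown word shifts the ratio a lot, so the threshold probably needs tuning against real submissions.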

score 6 · Accepted Answer

Check the Language Recognition Chart.

answered 2008-10-13T08:05:26.553

score 5 · Accepted Answer

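I think the most effective way is to ask users to submit English text only :)

You could show a language-selection dropdown above the text area, with English/Other as the options. When the user selects "Other", disable the text area and show a message that only English is supported [for now].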

score 5 · Accepted Answer

Google has a JavaScript API that includes an implementation of language detection. I've only played around with it and have never used it in production.

http://code.google.com/apis/ajaxlanguage/documentation/#Detect

score 3 · Accepted Answer

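Try statistical language identification based on n-grams. Here is a link to a demo of an algorithm that uses this technique, along with a link to a paper describing the algorithm. Try the demo; it performs well even on very short texts (3-4 words).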
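
To give a feel for the technique, here is a minimal sketch of one common n-gram approach: rank each language's most frequent character n-grams and compare the ranks of the input against each profile (an "out-of-place" distance). It is not necessarily the algorithm from the linked demo or paper. The two training sentences and all function names are assumptions made for illustration; a usable profile needs far more training text.

```python
from collections import Counter

# Toy training text (assumption); real profiles are built from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "fr": "le renard brun saute par dessus le chien paresseux parce que le chat dort",
}


def ngram_profile(text, n_max=3, top=300):
    """Map each of the `top` most frequent character n-grams (1..n_max) to its rank."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(sample, reference):
    """Sum of rank differences; n-grams missing from `reference` get the maximum penalty."""
    max_penalty = len(reference)
    return sum(
        abs(rank - reference[gram]) if gram in reference else max_penalty
        for gram, rank in sample.items()
    )


def guess_language(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    sample = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(sample, profiles[lang]))


PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
print(guess_language("the dog and the cat", PROFILES))
```

To accept only English, the input would be rejected whenever the closest profile is not `"en"`.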

score 3 · Accepted Answer

You're already doing NLP. If your module can't make sense of the text, then either the module isn't working or the input is in the wrong language.

score 1 · Accepted Answer

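Try:

http://wordlist.sourceforge.net/

for English word lists.

You'll need to watch out for names such as "Canberra" or "Bill Clinton". These won't appear in the word lists. As a first attempt, I suggest simply checking whether the first letter is capitalized.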
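
The capitalization check might look something like this sketch. The word list here is a tiny stand-in (an assumption; the real one would come from a downloaded list), and `unknown_words` is a hypothetical helper name.

```python
import re

# Tiny stand-in word list (assumption); load a real list in practice.
ENGLISH_WORDS = {"i", "met", "in", "visited", "with", "and"}


def unknown_words(text, words=ENGLISH_WORDS):
    """Return tokens that are not in `words` and are not capitalized (likely names)."""
    tokens = re.findall(r"[^\W\d_]+", text)
    return [
        token for token in tokens
        if token.lower() not in words and not token[0].isupper()
    ]


print(unknown_words("I met Bill Clinton in Canberra"))  # prints []
print(unknown_words("I met bonjour in Canberra"))       # prints ['bonjour']
```

Note that this lets any capitalized word through, including capitalized words from other languages, so it is only a first approximation.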

score 0 · Accepted Answer

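You could break the phrase into words and look them up in a dictionary (there are some you can download, which might be of interest), but this requires the dictionary you use to be good enough.

It will also fail on proper nouns (my name isn't in the dictionary, for example).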

score 0 · Accepted Answer

The Dictionary Switcher Firefox extension has an option to detect the right dictionary as I type.
I guess it checks the words against the installed dictionaries and picks the one with the fewest errors...

You can't expect all the words of the text to be in the dictionary: abbreviations, proper nouns, typos... Besides, some words are common to several languages: a French rock group even chose titles for their disks that have a (different) meaning in both French and English. So it is a statistical matter: if more than x% of the words are found in a good English dictionary, chances are the user is typing in that language (even if there are mistakes, as there probably are in this answer, since I am not a native English speaker).
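
A rough sketch of that idea: score the input against several dictionaries, pick the best one, and require at least x% of hits. The toy dictionaries, the 60% cutoff, and the function names are all assumptions for illustration. The shared entries ("chat", "pain", "table") stand in for words that exist in both languages.

```python
import re

# Toy dictionaries (assumption); "chat", "pain" and "table" are valid in both languages.
DICTIONARIES = {
    "en": {"the", "cat", "is", "on", "a", "table", "pain", "chat"},
    "fr": {"le", "chat", "est", "sur", "la", "un", "table", "pain"},
}


def hit_ratio(tokens, words):
    """Fraction of tokens found in `words`."""
    return sum(token in words for token in tokens) / len(tokens)


def best_language(text, dictionaries=DICTIONARIES, min_ratio=0.6):
    """Return the best-matching dictionary's key, or None if it scores below `min_ratio`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return None
    scores = {lang: hit_ratio(tokens, words) for lang, words in dictionaries.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_ratio else None


print(best_language("the cat is on the table"))   # prints en
print(best_language("le chat est sur la table"))  # prints fr
```

Input made up only of shared words, such as "chat pain", scores equally in both dictionaries, which is exactly the ambiguity described above.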

score 0 · Accepted Answer

Perhaps the article "Ensure users only submit English text [PHP]" will help you. The code is written in PHP, but it is small enough to be rewritten easily.