python-3.x - Is there a way in Polyglot to permanently "fix" the language code of Hebrew text from "iw" to "he"?

Translated from: https://stackoverflow.com/questions/55403414 2019-03-28T17:14:06.630

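I want to run a simple sentiment analysis on Hebrew text with Polyglot in Python 3.6. The problem is that Polyglot detects the text's language code as "iw" instead of "he", and therefore cannot process it.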
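
For context, "iw" is the withdrawn ISO 639-1 code for Hebrew and "he" is its current replacement, so both codes name the same language. A minimal sketch of normalizing such codes in plain Python follows; the helper name `normalize_language_code` is hypothetical and is not part of Polyglot.

```python
# Withdrawn ISO 639-1 codes mapped to their current replacements.
# The helper below is a hypothetical illustration, not a Polyglot API.
WITHDRAWN_ISO_639_1 = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_language_code(code: str) -> str:
    """Return the current ISO 639-1 code for a possibly withdrawn one."""
    code = code.strip().lower()
    # Codes that are not in the table are returned unchanged.
    return WITHDRAWN_ISO_639_1.get(code, code)


print(normalize_language_code("iw"))  # he
print(normalize_language_code("he"))  # he
```

This only rewrites the code string; it does not by itself change what a Polyglot object reports.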

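As suggested in "using polyglot package for Named Entity Recognition in hebrew", I added hint_language_code = 'he' to the Text call, but that only changes the language of the top-level Text object, not of its children (such as sentences or words).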

For example:

Input:

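from polyglot.text import Text

article = 'איך ניתן לנתח טקסט בעברית? והאם ניתן לשנות את הקידוד?'

# Without a hint, the detected language code is used.
txt = Text(article)
print(txt.language.code)

# With a hint, the top-level Text object reports the hinted code.
txt = Text(article, hint_language_code='he')
print(txt.language.code)

# The second sentence still reports the detected code, not the hint.
sent = txt.sentences[1]
print(sent.language.code)
print(sent)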

Output:

iw
he
iw
והאם ניתן לשנות את הקידוד?

How can I permanently change the text's language_code from 'iw' to 'he'?
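
One possible workaround, consistent with the output above, is to wrap each sentence in its own hinted Text object, since the hint is honored on the object it is passed to. The sketch below is an assumption-laden illustration: `rewrap_sentences` and `make_text` are hypothetical names, and whether Polyglot's sentiment lookup then works on the re-wrapped objects is not confirmed here.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def rewrap_sentences(text_obj, make_text: Callable[[str, str], T],
                     code: str = "he") -> List[T]:
    """Rebuild each sentence of text_obj as a standalone object hinted with code.

    text_obj only needs a .sentences iterable whose items convert to str;
    make_text is a caller-supplied factory (hypothetical name).
    """
    return [make_text(str(sentence), code) for sentence in text_obj.sentences]


# Assumed usage with Polyglot (not executed here):
#   from polyglot.text import Text
#   txt = Text(article, hint_language_code="he")
#   fixed = rewrap_sentences(txt, lambda s, c: Text(s, hint_language_code=c))
#   print([s.language.code for s in fixed])
```

The factory is passed in so that the helper does not depend on Polyglot internals; the same idea could be applied to individual words if needed.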
