python - How to add new words to .dic/.aff files using pyhunspell?

Question

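I am using pyhunspell, a Python wrapper around HunSpell, which is a .dic/.aff-file-based spell checker, stemmer, and word analyzer. The documentation for pyhunspell can be found here. Unfortunately, the documentation page does not show how to add new words to the dictionary (that is, extend the dictionary) from a Python script. The source code of pyhunspell does contain an add() function, but unlike the other functions, add() comes with no explanation, for example of which arguments it expects. Has anyone managed to call this function before, and could you write me an example of how to use add()?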

Here is the C source code of the function I would like to call, but my C is too limited to understand what is going on here.

static PyObject *
HunSpell_add(HunSpell * self, PyObject *args)
{
    char *word;
    int retvalue;

    if (!PyArg_ParseTuple(args, "s", &word))
        return NULL;
    retvalue = Hunspell_add(self->handle, word);

    return Py_BuildValue("i", retvalue);
}

static PyObject *
HunSpell_add_with_affix(HunSpell * self, PyObject *args)
{
    char *word, *example;
    int retvalue;

    if (!PyArg_ParseTuple(args, "ss", &word, &example))
        return NULL;
    retvalue = Hunspell_add_with_affix(self->handle, word, example);

    return Py_BuildValue("i", retvalue);
}

Thank you.

UPDATE:

As @RedX hinted, I tried calling the add() function with 1 or 2 arguments. Here are my findings:

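As an example I use the hu_HU (Hungarian) dictionary files (.dic and .aff), which are the ones I need to extend with a specialized domain vocabulary for an application. To keep the example transparent for English speakers, I picked a name (McNamara) that is not yet in the hu_HU dictionary. Because Hungarian is a morphologically very rich language, I need to take care of the declension of the word, otherwise stemming of the word will not work.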

McNamara follows the same declension pattern as Tamara, which is already recognized and can be stemmed correctly, e.g. for the word Tamarával ("with Tamara"):

import hunspell

hobj = hunspell.HunSpell('/usr/share/hunspell/hu_HU.dic', '/usr/share/hunspell/hu_HU.aff')
stem = hobj.stem("Tamarával")
print(stem)

This outputs ['Tamara'], which is correct.

Now, if I try to call add() with the new word and the example:

import hunspell

hobj = hunspell.HunSpell('/usr/share/hunspell/hu_HU.dic', '/usr/share/hunspell/hu_HU.aff')
hobj.add("McNamara", "Tamara")

This gives me a TypeError: function takes exactly 1 argument (2 given). Yet @RedX's suggestion, which was based on the C code, seemed logical.

Also, if I call add("McNamara") with a single argument, it seems to add the new word only for the current session, not for the next run of the script, e.g.:

hobj.add("McNamara")
print(hobj.spell("McNamara"))

This prints True, but the next time I run the script with only the last line, it returns False.

score 1 · Accepted Answer

You missed a detail in the C binding code. There are two different functions.

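The first one is add, which adds a word to the dictionary currently in use (at runtime only). It lets you call spell on that word.
The second one is add_with_affix, which lets you add a word to the dictionary and copy the flags from another word.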

For example (working with a French dictionary):

>>> hf.spell("pipoteuse")
False  # word not in the dict
>>> hf.stem("pipoteuses")  # try some classic plural stem
[]  # no stem
>>> hf.analyze("pipoteuse")
[]  # no analysis
>>> hf.add_with_affix("pipoteuse", "chanteuse")
0  # 0 = successful operation
>>> hf.spell("pipoteuse")
True   # word in the dict now
>>> hf.analyze('pipoteuse')
[b' st:pipoteuse is:fem is:sg']  # flags copied from "chanteuse": feminine singular, and the stem is the word itself (like chanteuse)
>>> hf.stem("pipoteuses")
[b'pipoteuse']  # now stem the plural of this fake word
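
The interactive session above can be wrapped in a small helper for use in a script. This is only a sketch: add_like is a hypothetical name, not part of pyhunspell, and it assumes, as the session above suggests, that add_with_affix returns 0 on success. It works with any object that exposes add_with_affix and spell, such as a hunspell.HunSpell instance.

```python
def add_like(hobj, word, example):
    """Add `word` to the runtime dictionary, copying affix flags from `example`.

    `hobj` is expected to be a hunspell.HunSpell instance (or any object with
    the same add_with_affix/spell methods). Hypothetical helper, not part of
    pyhunspell itself.
    """
    # The example word has to be known already; its flags are what gets copied.
    if not hobj.spell(example):
        raise ValueError("example word %r is not in the dictionary" % example)

    # Assumption: a return value of 0 means success, as in the session above.
    status = hobj.add_with_affix(word, example)
    if status != 0:
        raise RuntimeError("add_with_affix returned %d for %r" % (status, word))

    # Report whether the new word is now accepted by the spell checker.
    return hobj.spell(word)
```

With the Hungarian dictionary from the question, the call would be add_like(hobj, "McNamara", "Tamara"). Whether stemming of inflected forms then works depends on the flags that Tamara carries in hu_HU.dic.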

A few link updates along the way:

The new repository is here: https://github.com/blatinier/pyhunspell
The latest version (0.4.0) now has some pydoc for all functions. (There is no online documentation, though.)
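
Since add and add_with_affix only change the dictionary at runtime, the added words are gone when the script exits. One possible workaround, sketched here under assumptions, is to keep the extra words in a separate text file and replay them each time a HunSpell object is created. The file format (one word per line, optionally followed by a tab and an example word) and the names save_word, load_words, and replay_words are made up for this illustration; only add and add_with_affix come from pyhunspell.

```python
def save_word(path, word, example=None):
    """Append a word (and an optional example word) to the personal word file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(word if example is None else word + "\t" + example)
        fh.write("\n")


def load_words(path):
    """Return a list of (word, example_or_None) tuples read from the file."""
    entries = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            word, _, example = line.partition("\t")
            entries.append((word, example or None))
    return entries


def replay_words(hobj, path):
    """Re-add every stored word to a freshly created HunSpell-like object."""
    for word, example in load_words(path):
        if example is None:
            hobj.add(word)  # plain word, no affix flags
        else:
            hobj.add_with_affix(word, example)  # copy flags from the example
```

A script would call save_word("user_words.txt", "McNamara", "Tamara") once, and replay_words(hobj, "user_words.txt") right after constructing hobj on every run. The file name is a placeholder, and this approach leaves the installed .dic/.aff files untouched.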
