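This may not be the best answer, but I hope these suggestions help.
You could use lemmatization instead of stemming to reduce unacceptable results. A short, dense explanation: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html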
The goal of both stemming and lemmatization is to reduce inflectional forms and
sometimes derivationally related forms of a word to a common base form.
However, the two words differ in their flavor. Stemming usually refers to a crude
heuristic process that chops off the ends of words in the hope of achieving this
goal correctly most of the time, and often includes the removal of derivational
affixes. Lemmatization usually refers to doing things properly with the use of a
vocabulary and morphological analysis of words, normally aiming to remove
inflectional endings only and to return the base or dictionary form of a word,
which is known as the lemma.
An example:
go, goes, going -> Lemma: go, go, go || Stemming: go, goe, go
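To make the difference concrete, here is a minimal sketch in plain Python. It is a toy, not a real algorithm: `SUFFIXES`, `LEMMAS`, `crude_stem`, and `lookup_lemma` are hypothetical names invented for this illustration. The stemmer just chops suffixes, while the lemmatizer consults a (tiny, hand-written) vocabulary. In practice you would more likely use a library such as NLTK, which provides `PorterStemmer` and `WordNetLemmatizer`.

```python
# Toy illustration only: a crude suffix stripper versus a dictionary lookup.
# Neither is a real algorithm such as Porter stemming or WordNet lemmatization.

SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule set

# Hypothetical mini vocabulary mapping inflected forms to their lemma.
LEMMAS = {"goes": "go", "going": "go", "went": "go", "gone": "go"}


def crude_stem(word: str) -> str:
    """Chop off the first matching suffix, without consulting any vocabulary."""
    for suffix in SUFFIXES:
        # Keep at least two characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


words = ["go", "goes", "going"]
stems = [crude_stem(w) for w in words]
lemmas = [lookup_lemma(w) for w in words]
print(stems)   # ['go', 'goe', 'go']
print(lemmas)  # ['go', 'go', 'go']
```

The stemmer produces the non-word "goe" because it has no notion of vocabulary, which is the kind of unacceptable result lemmatization avoids.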
You can also apply a predefined set of rules so that contracted forms are expanded. For example:
I'm -> I am
shouldn't -> should not
can't -> can not
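A rule set like this could be sketched with a lookup table and one regular expression. The table `CONTRACTIONS` and the function `expand_contractions` are hypothetical names, and the entries are only a starting point; this sketch assumes straight ASCII apostrophes and does not preserve sentence-initial capitalization.

```python
import re

# Hypothetical rule set; extend it for your own data.
CONTRACTIONS = {
    "i'm": "I am",
    "shouldn't": "should not",
    "can't": "can not",
    "won't": "will not",
}

# One alternation over all known contractions, matched case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded form."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


print(expand_contractions("I'm sure you shouldn't and can't do that."))
# I am sure you should not and can not do that.
```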
How to handle parentheses in a sentence:
This is a dog(Its name is doggy)
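Text inside parentheses usually refers to an alias of the entity just mentioned. You can either remove it, or run coreference analysis and treat it as a new sentence.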
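Both options can be sketched with a regular expression. The function names below are hypothetical, and the pattern assumes parentheses are not nested. Note that this only splits the text: actually linking "Its" back to "dog" would need a separate coreference resolver, which is not shown here.

```python
import re
from typing import List

# Matches one non-nested parenthesized span plus any whitespace before it.
_PAREN = re.compile(r"\s*\(([^()]*)\)")


def drop_parentheticals(text: str) -> str:
    """Option 1: remove parenthesized text entirely."""
    return _PAREN.sub("", text).strip()


def split_parentheticals(text: str) -> List[str]:
    """Option 2: keep the main sentence and return each parenthetical separately."""
    extras = [m.strip() for m in _PAREN.findall(text) if m.strip()]
    return [drop_parentheticals(text)] + extras


sentence = "This is a dog(Its name is doggy)"
print(drop_parentheticals(sentence))   # This is a dog
print(split_parentheticals(sentence))  # ['This is a dog', 'Its name is doggy']
```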