I'm looking for a Java library (or similar tool) that can stem Italian words.

The goal is to compare Italian words. At the moment, words like "attacco", "attacchi", "attaccare", etc. are treated as different; instead, I want the comparison to return true for them.

I found things like Lucene and snowball.tartarus.org. Is there anything else useful, and how can I use these from Java?

Thanks for any answers.

1 Answer

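Download Snowball for Java from the Snowball site (snowball.tartarus.org).

It includes a class named org.tartarus.snowball.ext.italianStemmer, which extends SnowballStemmer.

To use a SnowballStemmer, see the following test code for the present tense of the verb attaccare: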

import org.junit.Assert;
import org.junit.Test;
import org.tartarus.snowball.SnowballStemmer;
import org.tartarus.snowball.ext.italianStemmer;

public class SnowballItalianStemmerTest {

    @Test
    public void testSnowballItalianStemmerAttaccare() {

        // italianStemmer extends SnowballStemmer, so no cast is needed
        SnowballStemmer stemmer = new italianStemmer();

        // Present-tense forms of the verb "attaccare"
        String[] tokens = "attacco attacchi attacca attacchiamo attaccate attaccano".split(" ");
        for (String token : tokens) {
            stemmer.setCurrent(token);   // load the word to stem
            stemmer.stem();              // run the Italian stemming rules
            String stemmed = stemmer.getCurrent();
            Assert.assertEquals("attacc", stemmed);
            System.out.println(stemmed);
        }

    }

}

Output:

attacc
attacc
attacc
attacc
attacc
attacc
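
Since every form reduces to the same stem, the true/false comparison asked for in the question can be built by comparing stems instead of the raw words. The following is only a minimal sketch under stated assumptions: the class StemCompare and the methods sameStem and toyStem are hypothetical names, and toyStem is a crude stand-in that strips a handful of endings so the example runs without the Snowball jar. It is not the Snowball algorithm.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class StemCompare {

    // Returns true when both words reduce to the same stem under the given stemmer.
    public static boolean sameStem(UnaryOperator<String> stemmer, String a, String b) {
        String stemA = stemmer.apply(a.toLowerCase(Locale.ITALIAN));
        String stemB = stemmer.apply(b.toLowerCase(Locale.ITALIAN));
        return stemA.equals(stemB);
    }

    // Hypothetical stand-in, NOT Snowball: removes the first matching ending
    // from a short hard-coded list, keeping at least three characters.
    public static String toyStem(String word) {
        String[] suffixes = {"hiamo", "are", "ate", "ano", "hi", "o", "a"};
        for (String suffix : suffixes) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // With the Snowball jar on the classpath, the stemmer argument could instead be:
        // word -> { SnowballStemmer s = new italianStemmer();
        //           s.setCurrent(word); s.stem(); return s.getCurrent(); }
        System.out.println(sameStem(StemCompare::toyStem, "attacco", "attacchi"));
        System.out.println(sameStem(StemCompare::toyStem, "Attaccare", "attacca"));
        System.out.println(sameStem(StemCompare::toyStem, "attacco", "difesa"));
    }
}
```

Passing the stemmer in as a function keeps the comparison independent of the stemming library, so the toy function can be swapped for the italianStemmer calls shown in the test above.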

For another usage example, see TestApp.java, which is included in the same tgz file.

Lucene, which is written in Java, uses Snowball for stemming, for example as a filter in SnowballFilter.

Answered 2012-11-14T15:24:53.313