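While browsing, I came across a spell checker program built on Lucene. I'm interested in adding the phonetix add-on from tangentum (specifically metaphone). Is there a way to integrate metaphone into my program? If so, how do I integrate it?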

package com.lucene.spellcheck;

import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SimpleSuggestionService {
    private static final String F_WORD = null;

    public static void main(String[] args) throws Exception {
        File dir = new File("e:/spellchecker/");
        Directory directory = FSDirectory.open(dir);
        SpellChecker spellChecker1 = new SpellChecker(directory);
        spellChecker1.indexDictionary(
                new PlainTextDictionary(new File("c:/fulldictionary00.txt")));
        String wordForSuggestions = "noveil";
        int suggestionsNumber = 5;
        String[] suggestions = spellChecker1.
                suggestSimilar(wordForSuggestions, suggestionsNumber);
        if (suggestions != null && suggestions.length > 0) {
            for (String word : suggestions) {
                System.out.println("Did you mean:" + word);
            }
        } else {
            System.out.println("No suggestions found for word:" + wordForSuggestions);
        }
    }
}
1 Answer

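You can pass in a custom StringDistance implementation that uses the phonetic algorithm you want, or one that combines it in some way with another similarity algorithm (such as the standard LevensteinDistance). All you need to do is implement the getDistance(String, String) method in your StringDistance implementation. Note that the spell checker treats the returned value as a similarity score between 0 and 1, where 1 means the strings are identical. Perhaps something like this: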

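public class MetaphoneDistance implements StringDistance {
    // Keep the encoder in a field so getDistance can reach it.
    private final Metaphone metaphone;

    public MetaphoneDistance() {
        metaphone = new Metaphone();
    }

    //I'm not really familiar with the library you mentioned, but I assume generateKeys performs a double metaphone?
    //(that is, it returns two keys per word; the method name is unverified)
    public float getDistance(String str1, String str2) {
        String[] keys1 = metaphone.generateKeys(str1);
        String[] keys2 = metaphone.generateKeys(str2);
        float result = 0;
        // Compare with equals(): == on strings only compares references.
        if (keys1[0].equals(keys2[0]) || keys1[0].equals(keys2[1])) result += .5f;
        if (keys1[1].equals(keys2[0]) || keys1[1].equals(keys2[1])) result += .5f;
        return result;
    }
}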
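
If you would rather blend the phonetic signal with plain edit distance, as mentioned above, here is a rough, self-contained sketch of that idea. Everything in it is an assumption made for illustration: the local `StringDistance` interface is only a stand-in that mirrors the shape of Lucene's interface so the code compiles without Lucene, `CrudePhoneticKey` is a deliberately simplistic key function (not Metaphone), and `EditSimilarity`, `PhoneticBlendDistance` and the 0.5 weight are hypothetical names and values.

```java
import java.util.Locale;

// Stand-in for org.apache.lucene.search.spell.StringDistance (assumption: same shape,
// returns a similarity in [0, 1] where 1 means identical).
interface StringDistance {
    float getDistance(String s1, String s2);
}

// Normalized Levenshtein similarity: 1 - editDistance / length of the longer string.
class EditSimilarity implements StringDistance {
    public float getDistance(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1f;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return 1f - (float) prev[b.length()] / maxLen;
    }
}

// Hypothetical, crude phonetic key (NOT Metaphone): lower-cases the word, skips
// non-letters and immediately repeated characters, and drops every vowel except
// a leading one.
class CrudePhoneticKey {
    static String encode(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < w.length(); i++) {
            char c = w.charAt(i);
            if (!Character.isLetter(c)) continue;
            if (i > 0 && w.charAt(i - 1) == c) continue;
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && key.length() > 0) continue;
            key.append(c);
        }
        return key.toString();
    }
}

// Weighted blend: phoneticWeight * similarity of the keys
// plus (1 - phoneticWeight) * similarity of the raw spellings.
class PhoneticBlendDistance implements StringDistance {
    private final StringDistance edit = new EditSimilarity();
    private final float phoneticWeight;

    PhoneticBlendDistance(float phoneticWeight) {
        this.phoneticWeight = phoneticWeight;
    }

    public float getDistance(String s1, String s2) {
        float phonetic = edit.getDistance(CrudePhoneticKey.encode(s1), CrudePhoneticKey.encode(s2));
        float spelling = edit.getDistance(s1, s2);
        return phoneticWeight * phonetic + (1f - phoneticWeight) * spelling;
    }
}
```

With the real Lucene interface in place of the stand-in, an instance could be handed to the spell checker through the `SpellChecker(Directory, StringDistance)` constructor or through `setStringDistance(...)`, for example `spellChecker1.setStringDistance(new PhoneticBlendDistance(0.5f))`. Replacing `CrudePhoneticKey.encode` with the phonetix encoder would then give you real metaphone keys while keeping spelling closeness as a tie-breaker.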
answered 2013-06-02T17:22:27.783