lucene - Lucene spellchecker 3.6 charset

Question

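I need help setting the charset for the Lucene spellchecker (version 3.6, both Lucene core and the spellchecker). My dictionary ("D:\dictionary.txt") contains English and Russian words. My code works for English text: for example, it returns the correct spelling of the word "hello". It does not work for Russian, though. When I misspell a Russian word, an exception is thrown (Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0), and no suggestions are found for any Russian word.

Here is my code: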

        RAMDirectory spellCheckerDir = new RAMDirectory();
        SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("D:\\dictionary.txt")), "UTF-8");
        PlainTextDictionary dictionary = new PlainTextDictionary(isr);
        spellChecker.indexDictionary(dictionary, config, true);
        String[] suggestions = spellChecker.suggestSimilar("hwllo", 1); // 'hello' misspelled as 'hwllo'

score 0 · Accepted Answer

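Based on your code, this is the best I can offer (your code was useful, thanks). I loaded the two dictionaries separately; it should also work with a combined file.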

    RAMDirectory spellCheckerDir = new RAMDirectory();
    SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, analyzer);
    InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/English/words.english")), "UTF-8");
    PlainTextDictionary dictionary = new PlainTextDictionary(isr);
    spellChecker.indexDictionary(dictionary, config, true);
    isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/Swedish/words.swedish")), "UTF-8");
    PlainTextDictionary swdictionary = new PlainTextDictionary(isr);
    spellChecker.indexDictionary(swdictionary, config, true);
    String wordForSuggestions = "hwllo";
    int suggestionsNumber = 5;

    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber); // 'hello' misspelled as 'hwllo'
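
Separately from the charset question, the `ArrayIndexOutOfBoundsException: 0` in the question is consistent with reading `suggestions[0]` when the returned array is empty, which is what I would expect when no suggestion is found. The line that reads the array is not shown in the question, so this is an assumption. A minimal sketch of a guard follows; `SuggestionGuard` and `firstOrDefault` are hypothetical names used for illustration and are not part of Lucene:

```java
// Hypothetical helper (not part of Lucene): reads the first suggestion
// without assuming that the array has any elements.
final class SuggestionGuard {

    private SuggestionGuard() {
        // static helper, no instances
    }

    // Returns suggestions[0] if present, otherwise the supplied fallback.
    // Assumption: "no suggestions" shows up as a null or zero-length array.
    static String firstOrDefault(String[] suggestions, String fallback) {
        if (suggestions == null || suggestions.length == 0) {
            return fallback;
        }
        return suggestions[0];
    }
}
```

With this in place, something like `SuggestionGuard.firstOrDefault(suggestions, wordForSuggestions)` falls back to the original word instead of throwing when nothing is found. It only hides the symptom; the Russian words still have to be indexed and matched correctly for suggestions to come back.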
