hibernate - Error on a specific query

Question

Lucene newbie here. I'm using it with Hibernate in a Java client, and I get this error on one particular query:

HSEARCH000146: The query string 'a' applied on field 'name' has no meaningfull tokens to  
be matched. Validate the query input against the Analyzer applied on this field.

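Search works for all other queries, even those that return an empty result set. My test database does have a record containing 'a'. What could be wrong here?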

score 9 · Accepted Answer

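'a' is a stop word, and the StandardAnalyzer filters it out of your query. Stop words are words so common in the language you are searching that they are not considered meaningful for producing search results. It's a short list, but 'a' is on it.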

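Since the analyzer has removed that term, and it was the only term present, you are now sending an empty query. That is not acceptable, so the search fails.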

For the curious, these are the standard Lucene English stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by",
"for", "if", "in", "into", "is", "it",
"no", "not", "of", "on", "or", "such",
"that", "the", "their", "then", "there", "these",
"they", "this", "to", "was", "will", "with"
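
To make the failure mode concrete, here is a minimal plain-Java sketch of what stop-word filtering does to the query string. It is not Lucene's actual tokenizer: the class name `StopWordDemo` and the `analyze` helper are hypothetical, and the splitting rule (lowercase, then split on anything that is not a letter or digit) is a simplifying assumption. Only the stop word list is taken from above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {
    // The English stop words quoted in the list above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Simplified stand-in for an analyzer (an assumption, not Lucene's code):
    // lowercase, split on non-letter/non-digit characters, drop stop words.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("a"));        // prints []
        System.out.println(analyze("a house"));  // prints [house]
    }
}
```

The query 'a' leaves nothing behind, which is the situation HSEARCH000146 reports, while 'a house' still has one meaningful token and would be searchable.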

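If you don't want stop words to be removed, set up your Analyzer without a StopFilter, or with an empty stop word set. In the case of StandardAnalyzer, you can pass a custom stop set to the constructor: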

Analyzer analyzer = new StandardAnalyzer(CharArraySet.EMPTY_SET);

score 1 · Accepted Answer

You can put

@Analyzer(impl=KeywordAnalyzer.class)

on your field to avoid this problem.
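
As a rough illustration of where the annotation goes, here is a sketch of an indexed entity. It assumes Hibernate Search 5.x annotations and Lucene's `org.apache.lucene.analysis.core.KeywordAnalyzer` (the package differs in older Lucene versions). The `id` field is hypothetical, and the `name` field is taken from the error message in the question.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class MyModelClass {

    @Id
    private Long id; // hypothetical identifier, not from the original post

    // KeywordAnalyzer emits the whole field value as a single token and
    // applies no stop-word filtering, so a value such as "a" is kept.
    @Field
    @Analyzer(impl = KeywordAnalyzer.class)
    private String name;

    // getters and setters omitted
}
```

Keep the trade-off in mind: because the whole value becomes one token with no lowercasing, the field only matches on its exact, complete value rather than on individual words.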

answered 2016-06-30T03:40:45.167

score 1 · Accepted Answer

Suggested workaround

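@femtoRgon has already explained the cause of this error. The problem also occurs when you split user input into a list of strings yourself and then feed each string into a Hibernate Search query. As soon as one of those strings is a stop word, Hibernate Search does not know what to do with it.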

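However, you can parse and validate the input with the same analyzer before sending it to the Hibernate Search query. This way you extract exactly the words the analyzer would keep and avoid the error, without switching to a different analyzer class.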

Retrieve the current analyzer from your entity class MyModelClass.class

FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search
    .getFullTextEntityManager(entityManager);

QueryBuilder builder = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(MyModelClass.class).get();

Analyzer customAnalyzer = fullTextEntityManager.getSearchFactory()
    .getAnalyzer(MyModelClass.class);

Tokenize the input

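import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Run the input through the given analyzer and return the surviving terms.
 * @param analyzer the analyzer used for the indexed field
 * @param string the raw user input
 * @return the tokens the analyzer produces; empty if nothing meaningful is left
 */
public static List<String> tokenizeString(Analyzer analyzer, String string)
{
    List<String> result = new ArrayList<String>();
    // try-with-resources closes the stream even if tokenizing fails
    try (TokenStream stream = analyzer.tokenStream(null, new StringReader(string)))
    {
        CharTermAttribute term = stream.getAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken())
        {
            result.add(term.toString());
        }
        stream.end(); // signal end of stream, as the TokenStream workflow expects
    } catch (IOException e)
    {
        throw new RuntimeException(e);
    }
    return result;
}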

Validate the input

Now you can simply run your input string through the same analyzer and receive a correctly tokenized list of strings, like this:

List<String> keywordsList = tokenizeString(customAnalyzer, "This is a sentence full of the evil stopwords");

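and you will receive a list like this (the exact contents depend on your analyzer's stop set; with the default English stop words listed in the first answer, "this" would be dropped as well):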

[this, sentence, full, evil, stopwords]

My answer is based on this and this SO post.
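
The token list can then drive the actual query. The following is only a sketch, assuming the Hibernate Search 5.x query DSL: the class `SafeSearch` and method `searchByName` are hypothetical names, the field name `name` comes from the error message in the question, and combining the keywords with `must` (all terms required) is just one possible choice.

```java
import java.util.Collections;
import java.util.List;

import org.apache.lucene.search.Query;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.query.dsl.BooleanJunction;
import org.hibernate.search.query.dsl.QueryBuilder;

public final class SafeSearch {

    @SuppressWarnings("rawtypes")
    public static List<?> searchByName(FullTextEntityManager fullTextEntityManager,
                                       QueryBuilder builder,
                                       List<String> keywordsList) {
        // Nothing survived the analyzer (e.g. the input was just "a"):
        // skip the search instead of triggering HSEARCH000146.
        if (keywordsList.isEmpty()) {
            return Collections.emptyList();
        }

        // Require every remaining keyword to match the "name" field.
        BooleanJunction<BooleanJunction> bool = builder.bool();
        for (String keyword : keywordsList) {
            bool.must(builder.keyword().onField("name").matching(keyword).createQuery());
        }
        Query luceneQuery = bool.createQuery();

        return fullTextEntityManager
            .createFullTextQuery(luceneQuery, MyModelClass.class)
            .getResultList();
    }
}
```

Returning an empty list for stop-word-only input mirrors how the other queries in the question behave when nothing matches. Whether that or an explicit message to the user is more appropriate depends on the application.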
