elasticsearch - How to handle wildcards in Elasticsearch structured queries

Question

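My use case requires querying our Elasticsearch domain with trailing wildcards. I would like your opinion on best practices for handling such wildcards in queries.

Do you think adding the following clause to the query is good practice: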

"query" : { 
    "query_string" : { 
        "query" :   "attribute:postfix*",
        "analyze_wildcard" : true,
        "allow_leading_wildcard" : false,
        "use_dis_max" : false
    } 
}

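I am disallowing leading wildcards because they are a heavy operation. However, I wonder how well analyzing wildcards on every query request holds up in the long run. My understanding is that if the query does not actually contain any wildcard, analyze_wildcard has no effect. Is that correct?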

score 2 · Accepted Answer

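If you are able to change the mapping type and index settings, the right way to go is to create a custom analyzer with an edge-n-gram token filter, which indexes all prefixes of the attribute field (up to max_gram characters per token).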

curl -XPUT http://localhost:9200/your_index -d '{
    "settings": {
        "analysis": {
            "filter": {
                "edge_filter": {
                    "type": "edgeNGram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "attr_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "edge_filter"]
                }
            }
        }
    },
    "mappings": {
        "your_type": {
            "properties": {
                "attribute": {
                    "type": "string",
                    "analyzer": "attr_analyzer",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}'

Then, when you index a document, an attribute field value such as postfixing will be indexed as the following tokens: p, po, pos, post, postf, postfi, postfix, postfixi, postfixin, postfixing.
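To make the token list above concrete, here is a minimal Python sketch of what an edge n-gram filter does to a single token. It is only an illustration, not Elasticsearch's implementation: the function names edge_ngrams and analyze are hypothetical, and splitting on whitespace is a rough stand-in for the standard tokenizer.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text: str) -> List[str]:
    # Simplified stand-in for "standard" tokenizer + "lowercase" filter:
    # lowercase the text and split on whitespace (an assumption of this sketch).
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


if __name__ == "__main__":
    print(analyze("postfixing"))
```

With min_gram 1 and max_gram 15, a 10-character value like postfixing yields its 10 prefixes; a token longer than 15 characters would be cut off at its 15-character prefix.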

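Finally, you can easily query the attribute field for the value postfix with a simple match query like the one below. There is no need for a poorly performing wildcard in a query_string query.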

{
  "query": {
     "match" : {
        "attribute" : "postfix"
     }
  }
}
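Why a plain match query is enough can be sketched with a toy inverted index. This is a simplified model under stated assumptions, not how Elasticsearch is implemented: index_tokens, search_tokens, match, and the sample documents are all hypothetical, and whitespace splitting stands in for the standard analyzer.

```python
from collections import defaultdict


def index_tokens(text, min_gram=1, max_gram=15):
    """Index-time analysis (simplified): lowercase, split, emit edge n-grams."""
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.add(word[:n])
    return tokens


def search_tokens(text):
    """Search-time analysis (simplified): lowercase and split, no n-grams."""
    return text.lower().split()


# Hypothetical sample values for the attribute field.
docs = {1: "postfixing", 2: "postal", 3: "prefix"}

# Build the inverted index: token -> set of document ids.
inverted = defaultdict(set)
for doc_id, value in docs.items():
    for tok in index_tokens(value):
        inverted[tok].add(doc_id)


def match(query):
    # Each query term is an exact dictionary lookup; terms are OR-ed together.
    hits = set()
    for tok in search_tokens(query):
        hits |= inverted.get(tok, set())
    return hits


if __name__ == "__main__":
    print(sorted(match("postfix")))
    print(sorted(match("post")))
```

Because the prefixes were already written at index time, the search side only needs an exact term lookup. That is the reason the mapping uses a different search_analyzer: if the query were also edge-n-grammed, searching for postfix would match every document sharing the prefix p.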
