elasticsearch - Elasticsearch range query does not work with icu_collation for Turkish words

Question

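My documents contain Turkish words such as "şa, za, sb, şc, sd, şe" in the customer_address property.

I indexed my documents as described in the "Sorting and Collations" documentation, because I want to sort documents by the customer_address field. Sorting works well.

Now I am trying to apply a range query to the "customer_address" field. When I send the query below, I get an empty result. (Expected result: sb, sd, şa, şd)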

curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"query":{"bool":{"filter":[{"range":{"customer_address.sort":{"from":"plaj","to":"şcam","include_lower":true,"include_upper":true,"boost":1.0}}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}}}'

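When I query the field with a terms aggregation, I can see that its values are stored in encoded form (as collation keys), as the documentation describes.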

curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"aggs":{"myaggregation":{"terms":{"field":"customer_address.sort","size":10000}}},"size":0}'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "a" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "⚕䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "⚗䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "✁ੀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ୀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ీ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "ⶔ䁁䀠怀\u0001",
          "doc_count" : 1
        }
      ]
    }
  }
}
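A possible reading of the empty result, sketched in Python: assuming the range bounds are compared against the stored terms as plain strings, without being passed through the field's analyzer (an assumption about how this Elasticsearch version treats range queries on text fields, not something confirmed above), the raw bounds "plaj" and "şcam" sit entirely below the collation keys shown in the aggregation. Only the first character of each bucket key is used here, which is enough to decide the comparison.

```python
# First character of each of the six bucket keys from the aggregation output.
key_prefixes = ["⚕", "⚗", "✁", "✁", "✁", "ⶔ"]

# Raw, un-encoded bounds exactly as sent in the range query.
lower, upper = "plaj", "şcam"

# Python compares strings by code point, which matches UTF-8 byte order.
matches = [key for key in key_prefixes if lower <= key <= upper]

# Every key starts with a character above "ş" (U+015F), so nothing matches.
print(matches)
```

Under that assumption, the bounds would have to be converted into the same encoded form as the indexed terms before they could match anything.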

So how should I send my parameters in the range query to get the expected result?

Thanks in advance.

My mapping:

curl -XGET http://localhost:9200/sampleindex?pretty
{
  "sampleindex" : {
    "aliases" : { },
    "mappings" : {
      "invoice" : {
        "properties" : {
          "customer_address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              },
              "sort" : {
                "type" : "text",
                "analyzer" : "turkish",
                "fielddata" : true
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "sampleindex",
        "max_result_window" : "2147483647",
        "creation_date" : "1521732167023",
        "analysis" : {
          "filter" : {
            "turkish_phonebook" : {
              "variant" : "@collation=phonebook",
              "country" : "TR",
              "language" : "tr",
              "type" : "icu_collation"
            },
            "turkish_lowercase" : {
              "type" : "lowercase",
              "language" : "turkish"
            }
          },
          "analyzer" : {
            "turkish" : {
              "filter" : [
                "turkish_lowercase",
                "turkish_phonebook"
              ],
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "ChNGX459TUi8VnBLTMn-Ng",
        "version" : {
          "created" : "5020099"
        }
      }
    }
  }
}

score 0 · Accepted Answer

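I solved my problem by defining an analyzer with a char filter when creating the index. I don't know whether this is a good solution, but I could not solve it with the ICU "turkish_phonebook" filter, and this approach seems to work for now.

First, I created an index with a "turkish_collation_analyzer". Then, for each property that needs it, I created a sub-field "property.tr" that uses this analyzer. Finally, when running range queries, I convert my values into the form that field expects.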

"settings": {
  "index": {
    "number_of_shards": "5",
    "provided_name": "sampleindex",
    "max_result_window": "2147483647",
    "creation_date": "1522050241730",
    "analysis": {
      "analyzer": {
        "turkish_collation_analyzer": {
          "char_filter": [
            "turkish_char_filter"
          ],
          "tokenizer": "keyword"
        }
      },
      "char_filter": {
        "turkish_char_filter": {
          "type": "mapping",
          "mappings": [
            "a => x01",
            "b => x02",
            .,
            .,
            .,

          ]
        }
      }
    },
    "number_of_replicas": "1",
    "uuid": "hiEqIpjYTLePjF142B8WWQ",
    "version": {
      "created": "5020099"
    }
  }
}
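The last step, converting the range bounds the same way the char filter converts indexed values, could be sketched in Python as follows. This is an illustration under stated assumptions, not the exact mapping used above: the two-digit codes, the helper names, and the field name "customer_address.tr" are hypothetical, and the input is assumed to be lowercase already.

```python
# Turkish alphabet in dictionary order (lowercase). Letters outside it
# (q, w, x, digits, punctuation) are left unmapped in this sketch.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"

# Hypothetical fixed-width codes: "a" -> "01", "b" -> "02", ..., "z" -> "29".
CODES = {ch: f"{i:02d}" for i, ch in enumerate(TURKISH_ALPHABET, start=1)}


def char_filter_mappings():
    """Rules in the "key => value" syntax used by a mapping char filter."""
    return [f"{ch} => {code}" for ch, code in CODES.items()]


def encode(text):
    """Apply the same per-character substitution to a query value."""
    return "".join(CODES.get(ch, ch) for ch in text)


def range_query(field, lower, upper):
    """Build a range query body whose bounds are encoded like the indexed terms."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {field: {"gte": encode(lower), "lte": encode(upper)}}}
                ]
            }
        }
    }


body = range_query("customer_address.tr", "plaj", "şcam")
```

Because every letter becomes a code of the same width, plain string comparison of the encoded values follows Turkish alphabetical order. With the six sample values from the question, sb, sd, şa and şc fall inside the encoded bounds for "plaj" and "şcam".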
