elasticsearch - Pattern analyzer does not work for UUIDs in Elasticsearch

Question

I am using Elasticsearch 7.x and created an accounts index with the following mapping.

curl --location --request PUT 'http://localhost:9200/accounts' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
            "properties": {
                "type": {"type": "keyword"},
                "id": {"type": "keyword"},
                "label": {"type": "keyword"},
                "lifestate": {"type": "keyword"},
                "name": {"type": "keyword"},
                "users": {"type": "text"}
            }
    }
}'

I store the users as an array. In my use case, an account can have n users, so I store them in the following format.

curl --location --request PUT 'http://localhost:9200/accounts/_doc/account3' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id" : "account_uuid",
    "name" : "Account_Description",
    "users" : [
        "id:6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
        "id:9611e2be-784f-4a07-b5de-564b3820a660~~status:INACTIVE"
    ]
}'

To search by user ID and status, I created a pattern analyzer that splits on the ~~ symbol, as shown below.

curl --location --request PUT 'http://localhost:9200/accounts/_settings' \
--header 'Content-Type: application/json' \
--data-raw '{
  "settings": {
    "analysis": {
      "analyzer": {
        "p_analyzer": { 
          "type": "pattern",
          "pattern" :"~~"
        }
      }
    }
  }
}'

The search query call is

curl --location --request GET 'http://localhost:9200/accounts/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
        "bool": {
            "filter": [ 
                { "term": {"id": "account_uuid"} },
                { "match" : {"users" : {
                    "query" : "id:<user_id>",
                    "analyzer" : "p_analyzer"
                }}}
            ]   
        }
    }
}'

This works if the user ID is a plain string, that is, if the user ID is stored in a non-UUID format. But it does not work for IDs in UUID format. How can I make this work?

score 2 · Accepted Answer

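Modify your analyzer to include the hyphen `-` in its pattern. That should solve your problem, because the UUID is then split into tokens at each hyphen.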

{
  "settings": {
    "analysis": {
      "analyzer": {
        "p_analyzer": {
          "type":      "pattern",
          "pattern":   "~~|-",
          "lowercase": true
        }
      }
    }
  }
}
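
Note that the mapping in the question declares `users` as a plain `text` field with no analyzer, and a text field without an explicit analyzer falls back to the standard analyzer. As a minimal sketch, assuming the index can be recreated, the analyzer settings could be combined with the question's mapping so that `users` is analyzed with `p_analyzer` at index time. The `"analyzer": "p_analyzer"` entry on `users` is my assumption and does not appear in the original post. The Python below only builds and prints the request body; it does not contact Elasticsearch.

```python
import json

# Index body combining the pattern analyzer with the question's mapping.
# ASSUMPTION: the "analyzer" entry on "users" is not in the original post;
# it is added here so the field would be tokenized by p_analyzer at index time.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "p_analyzer": {
                    "type": "pattern",
                    "pattern": "~~|-",  # split on "~~" or on a hyphen
                    "lowercase": True,
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},
            "id": {"type": "keyword"},
            "label": {"type": "keyword"},
            "lifestate": {"type": "keyword"},
            "name": {"type": "keyword"},
            "users": {"type": "text", "analyzer": "p_analyzer"},
        }
    },
}

# Serialize to JSON so it can be used as a request body.
body_json = json.dumps(index_body, indent=2)
print(body_json)
```

The printed JSON could then be sent with the same kind of `curl --request PUT 'http://localhost:9200/accounts'` call the question uses to create the index.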

The analyzer above generates the following tokens.

POST /your-index/_analyze

{
  "text" : "6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE",
  "analyzer" : "p_analyzer"
}

Generated tokens

{
    "tokens": [
        {
            "token": "6de57db5",
            "start_offset": 0,
            "end_offset": 8,
            "type": "word",
            "position": 0
        },
        {
            "token": "8fdb",
            "start_offset": 9,
            "end_offset": 13,
            "type": "word",
            "position": 1
        },
        {
            "token": "4a39",
            "start_offset": 14,
            "end_offset": 18,
            "type": "word",
            "position": 2
        },
        {
            "token": "ab46",
            "start_offset": 19,
            "end_offset": 23,
            "type": "word",
            "position": 3
        },
        {
            "token": "21af623692ea",
            "start_offset": 24,
            "end_offset": 36,
            "type": "word",
            "position": 4
        },
        {
            "token": "status:active",
            "start_offset": 38,
            "end_offset": 51,
            "type": "word",
            "position": 5
        }
    ]
}

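Now a search for 6de57db5-8fdb-4a39-ab46-21af623692ea is broken into 6de57db5, 8fdb, 4a39, and so on. These match the tokens generated at index time, so the document appears in the search results.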
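
To see why the tokens line up, here is a rough local approximation in Python of what a pattern analyzer does: split the text on the regular expression, drop empty pieces, and lowercase the rest. The helper name `pattern_tokens` is hypothetical, and this is only a sketch of the behavior, not Elasticsearch's actual implementation.

```python
import re


def pattern_tokens(text, pattern="~~|-", lowercase=True):
    """Approximate a pattern analyzer: split on the regex, drop empty
    pieces, and optionally lowercase each remaining token."""
    pieces = [piece for piece in re.split(pattern, text) if piece]
    return [piece.lower() for piece in pieces] if lowercase else pieces


# Tokens for a stored value and for a UUID-only search string.
indexed = pattern_tokens("6de57db5-8fdb-4a39-ab46-21af623692ea~~status:ACTIVE")
query = pattern_tokens("6de57db5-8fdb-4a39-ab46-21af623692ea")

print(indexed)
print(query)

# Every query token also occurs among the indexed tokens.
print(set(query) <= set(indexed))
```

With the `~~|-` pattern, both the stored value and the search string break into the same UUID segments, which is what lets the `match` query find the document.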
