elasticsearch - How to index an array of multi-part fields in Elasticsearch

Question

I will have a type with an array of multi-part fields. The data in that field will look like this:

grp type num
111 ABC 112233445566
123 DEF 192898048901
222 ABC 180920948012
333 XWZ 112233445566

I want to search on num to find my documents. I also want to be able to search on type and num together, and optionally on all three: grp=111 type=ABC num=112233445566

What I don't want is cross-matching between these compound values. For example, type=XWZ and num=192898048901 would be a false hit.

So do I implement these as multi_fields with a custom tokenizer? (Presumably concatenating the parts to create the three key types.)

Or would a compound word token filter or some other technique help me achieve this? TIA

score 0 · Accepted Answer

You can index the combinations as additional fields:

"doc" : {
    "properties" : {
...
        "array_type" : {
            "type" : "object",
            "properties" : {
                "grp" : { "type" : "integer", "index" : "not_analyzed" },
                "type" : { "type" : "string", "index" : "not_analyzed" },
                "num" : { "type" : "long", "index" : "not_analyzed" },
                "type_num" : { "type" : "string", "index" : "not_analyzed" },
                "grp_type_num" : { "type" : "string", "index" : "not_analyzed" }
                }
            },
...
    }
}
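The mapping above only declares the combined fields; the client still has to fill them in before indexing. A minimal sketch of that step follows, assuming the parts are joined with a single space (inferred from the query value "XWZ 112233445566" in this answer). The helper name with_composite_keys is hypothetical, and the grp_type_num format is an assumption.

```python
def with_composite_keys(row):
    """Return a copy of one array entry with the combined key fields added.

    Assumption: parts are joined with a single space, matching the
    "XWZ 112233445566" value used in the answer's term query.
    """
    entry = dict(row)  # copy so the caller's row is left untouched
    entry["type_num"] = f"{row['type']} {row['num']}"
    entry["grp_type_num"] = f"{row['grp']} {row['type']} {row['num']}"
    return entry


# Sample rows taken from the question.
rows = [
    {"grp": 111, "type": "ABC", "num": 112233445566},
    {"grp": 123, "type": "DEF", "num": 192898048901},
    {"grp": 222, "type": "ABC", "num": 180920948012},
    {"grp": 333, "type": "XWZ", "num": 112233445566},
]

# Document body to send to Elasticsearch; "array_type" matches the mapping.
doc = {"array_type": [with_composite_keys(r) for r in rows]}
```

Because each combined value is built from a single row, a term match on type_num or grp_type_num cannot mix the type of one entry with the num of another.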

At query time, use the field that matches the information you have. For example, to search on type and num, you can write a query like this:

{
  "size": 20,
  "from": 0,
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "term": {
              "type_num": "XWZ 112233445566"
            }
          }
        ]
      }
    }
  }
}
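Choosing the field from the available information can be sketched as a small helper. This is an illustration only: the function name build_query is hypothetical, the values are assumed to be space-joined as in the query above, and the single term filter is placed directly under "filter" rather than inside a one-element "and" list.

```python
def build_query(num, type_=None, grp=None, size=20, from_=0):
    """Build a filtered term query on the most specific field available.

    Supported combinations (from the question): num alone, type + num,
    or grp + type + num. Values are assumed to be space-joined.
    """
    if grp is not None and type_ is None:
        raise ValueError("grp can only be used together with type and num")

    if grp is not None:
        field, value = "grp_type_num", f"{grp} {type_} {num}"
    elif type_ is not None:
        field, value = "type_num", f"{type_} {num}"
    else:
        field, value = "num", num

    return {
        "size": size,
        "from": from_,
        "query": {"filtered": {"filter": {"term": {field: value}}}},
    }


query = build_query(num=112233445566, type_="XWZ")
```

The resulting dictionary can be serialized with json.dumps and sent as the search request body.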

score 0 · Accepted Answer

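Well, I found a simpler approach... The key is that I only need to be able to search by the three possible combinations; I don't actually need to reference grp, type, or num directly.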

The path_hierarchy analyzer does what I want:

# Create a new index with custom path_hierarchy analyzer 
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
curl -XPUT "localhost:9200/accts-test" -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "accts-analyzer": {
                    "type": "custom",
                    "tokenizer": "accts-tokenizer"
                }
            },
            "tokenizer": {
                "accts-tokenizer": {
                    "type": "path_hierarchy",
                    "delimiter": "-",
                    "reverse": "true"
                }
            }
        }
    },
    "mappings": {
        "_default_": {
          "_timestamp" : {
            "enabled" : true,
            "store" : true
          }
        },
        "doc": {
            "properties": {
                "name": { "type": "string"},
                "accts": {
                    "type": "string",
                    "index_name": "acct",
                    "index_analyzer": "accts-analyzer",
                    "search_analyzer": "keyword"
               }
            }
        }
    }
}'

Testing it through the _analyze endpoint then shows:

# curious about path analyzer? test it:
echo testing analyzer
curl -XGET 'localhost:9200/accts-test/_analyze?analyzer=accts-analyzer&pretty=1' -d '111-BBB-2233445566'
echo
{
  "tokens" : [ {
    "token" : "111-BBB-2233445566",
    "start_offset" : 0,
    "end_offset" : 18,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "BBB-2233445566",
    "start_offset" : 4,
    "end_offset" : 18,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "2233445566",
    "start_offset" : 8,
    "end_offset" : 18,
    "type" : "word",
    "position" : 1
  } ]
}
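The token list above explains why cross-matches do not occur: every indexed token is a suffix of one complete grp-type-num value, so a type from one entry is never paired with the num of another. The following is a rough pure-Python simulation of that behavior, not Elasticsearch code. The function name reverse_path_tokens is hypothetical, the hyphen-joined values are an assumption built from the rows in the question, and edge cases such as empty segments may differ from the real tokenizer.

```python
def reverse_path_tokens(value, delimiter="-"):
    """Approximate a path_hierarchy tokenizer with reverse=true.

    Emits the full value, then each suffix obtained by dropping
    leading segments one at a time.
    """
    parts = value.split(delimiter)
    return [delimiter.join(parts[i:]) for i in range(len(parts))]


# Rows from the question, joined as grp-type-num (assumed format).
accts = [
    "111-ABC-112233445566",
    "123-DEF-192898048901",
    "222-ABC-180920948012",
    "333-XWZ-112233445566",
]

# Every token the index would hold for this document.
indexed = {tok for acct in accts for tok in reverse_path_tokens(acct)}

# The search side uses the keyword analyzer, so the whole query string
# must equal one indexed token.
print("112233445566" in indexed)          # num only
print("ABC-112233445566" in indexed)      # type + num from the same row
print("XWZ-192898048901" in indexed)      # type and num from different rows
```

The first two lookups succeed and the third fails, which is the behavior asked for in the question.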
