
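I'd like advice on completing a term based on tokens. I want something like Google-style autocomplete, but it should complete only a single token or word.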

I want to search filenames, which are tokenized. For example, "BRAND_Connect_A1233.jpg" is tokenized into "brand", "connect", "a1233" and "jpg".

Now I want suggestions for "Con". Each suggestion should be a complete matching token, not a complete filename:

  • Connect
  • Contour
  • Concept
  • ...

Suggestions for "A12" should be "A1234", "A1233", ...
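
To pin down the behaviour being asked for, here is a minimal sketch in plain POSIX shell. It assumes tokens are formed by lowercasing the filename and splitting on "_", ".", ";" and "/", which approximates the intended tokenization but is not Elasticsearch's pattern tokenizer. The helpers tokenize_filename and suggest_tokens are hypothetical. They collect the distinct tokens of the sample filenames used below and keep only the tokens that start with the typed prefix.

```shell
# Sketch only: approximates the intended tokenization (lowercase, then split
# on "_", ".", ";" and "/"). NOT Elasticsearch's pattern tokenizer.
# tokenize_filename and suggest_tokens are hypothetical helper names.
tokenize_filename() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr '_.;/' '\n\n\n\n' \
    | grep -v '^$'            # drop empty tokens, e.g. from "B1234_.jpg"
}

# Print every distinct token that starts with prefix $1, drawn from the
# filenames passed as the remaining arguments.
suggest_tokens() {
  prefix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  shift
  for f in "$@"; do
    tokenize_filename "$f"
  done | LC_ALL=C sort -u | awk -v p="$prefix" 'index($0, p) == 1'
}

# The sample filenames from the question (word-split on whitespace below).
files="BRAND_ConnectBlue_A1234.jpg BRAND_Connect_A1233.jpg
BRAND_ConceptSpace_A1244.jpg COMPANY_Connect_A1222.jpg
COMPANY_Concept_A1233.jpg DEALER_Connect_B1234_.jpg
DEALER_Contour21_B1233.jpg DEALER_ConceptCube_B2233.jpg"

suggest_tokens con $files
suggest_tokens A12 $files
```

With these filenames, the prefix "con" yields concept, conceptcube, conceptspace, connect, connectblue and contour21. The prefix "A12" yields a1222, a1233, a1234 and a1244.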

Example

Searching with queries, facets and filters works fine.

First, I created a mapping with a tokenizer and filters:

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "search_analyzer" : "filename_search",
               "index_analyzer" : "filename_index"
            }
         }
      }
   }
}'
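
In this mapping, filename_index expands every token into its leading prefixes, while filename_search only lowercases. A search for "con" therefore matches the indexed gram "con". The sketch below mimics what a front edge n-gram filter with min_gram 2 and max_gram 20 emits for one already lowercased token. The edge_ngrams helper is hypothetical and is not part of Elasticsearch.

```shell
# Sketch: mimic a front-side edge n-gram filter for one lowercased token.
# edge_ngrams is a hypothetical helper, not an Elasticsearch API.
edge_ngrams() {
  # $1 = token, $2 = min_gram, $3 = max_gram
  printf '%s\n' "$1" | awk -v min="$2" -v max="$3" '{
    # emit prefixes of length min..max, never longer than the token itself
    for (n = min + 0; n <= length($0) && n <= max + 0; n++)
      print substr($0, 1, n)
  }'
}

edge_ngrams connect 2 20
```

For "connect" this prints co, con, conn, conne, connec and connect. The grams let a query match documents, but a hit is the whole document rather than the full token the gram came from. That is presumably why queries work, but token-level suggestions do not fall out of this mapping on their own.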

Both analyzers work as expected:

curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'

Then I added some sample data:

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg"}'
curl -X POST "localhost:9200/files/_refresh"

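None of the approaches I tried returns the expected suggestions. I tried naming the analyzer explicitly, and I tried various combinations of analyzers and wildcards.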

curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "text" : "con",
    "simple_phrase" : {
      "phrase" : {
        "field" : "filename",
        "size" : 15,
        "real_word_error_likelihood" : 0.75,
        "max_errors" : 0.1,
        "gram_size" : 3
      }
    }
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "my-suggestion" : {
    "text" : "con",
    "term" : {
        "field" : "filename",
        "analyzer": "filename_index"
        }
    }
}'

1 Answer


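To use the completion suggester, you need to add a dedicated completion field to the mapping, as described in the official Elasticsearch documentation. I have modified your example to show how it works.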

First, create the index. Note the filename_suggest mapping.

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "analyzer": "filename_index",
               "search_analyzer" : "filename_search"
            },
            "filename_suggest": {
              "type": "completion",
              "analyzer": "simple",
              "search_analyzer": "simple",
              "payloads": true
            }
         }
      }
   }
}'

Next, add some data. Note that filename_suggest has an input field containing the keywords to match.

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg", "filename_suggest": { "input": ["BRAND", "ConnectBlue", "A1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg", "filename_suggest": { "input": ["BRAND", "Connect", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg", "filename_suggest": { "input": ["BRAND", "ConceptSpace", "A1244", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg", "filename_suggest": { "input": ["COMPANY", "Connect", "A1222", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg", "filename_suggest": { "input": ["COMPANY", "Concept", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg", "filename_suggest": { "input": ["DEALER", "Connect", "B1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg", "filename_suggest": { "input": ["DEALER", "Contour21", "B1233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg", "filename_suggest": { "input": ["DEALER", "ConceptCube", "B2233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/_refresh"
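
Typing every input array by hand is error-prone. The hypothetical helper make_doc builds the same request body from the filename alone and keeps the original case, as in the examples above. This is a sketch that assumes filenames split cleanly on "_" and ".", and that they contain no characters needing JSON escaping.

```shell
# Sketch (assumption): derive the "input" array from the filename itself.
# make_doc is a hypothetical helper. It splits on "_" and "." only, keeps the
# original case, and does NOT JSON-escape quotes or backslashes.
make_doc() {
  inputs=$(printf '%s\n' "$1" | tr '_.' '\n\n' | grep -v '^$' \
    | awk '{ printf "%s\"%s\"", sep, $0; sep = ", " }')
  printf '{ "filename" : "%s", "filename_suggest": { "input": [%s], "payload": {} } }\n' "$1" "$inputs"
}

make_doc "DEALER_Connect_B1234_.jpg"
```

Usage, for example: curl -X POST "localhost:9200/files/file" -d "$(make_doc BRAND_Connect_A1233.jpg)"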

Now run the query:

curl -XPOST 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "filename_suggest" : {
        "text" : "con",
        "completion": {
            "field": "filename_suggest", "size": 10
        }
    }
}'

Result:

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "filename_suggest" : [ {
    "text" : "con",
    "offset" : 0,
    "length" : 3,
    "options" : [ {
      "text" : "Connect",
      "score" : 2.0,
      "payload":{}
    }, {
      "text" : "Concept",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConceptSpace",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConnectBlue",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "Contour21",
      "score" : 1.0,
      "payload":{}
    } ]
  } ]
}

answered 2016-09-16T17:56:22.863