database - Elasticsearch - River and nGrams

Question

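I am using ES with the river plugin because my data lives in CouchDB, and I am trying to query with nGrams. I have basically everything I need working, except that the query does not behave correctly when someone types a space. This is because ES tokenizes each element of the query, splitting it on whitespace.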

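Here is what I need to do:

Query for part of the text in a string:

Query: "Hello Wor" Response: "Hello World, Hello Word" / excluding "Hello, World, Word"
Sort the results by criteria I specify;
Be case-insensitive.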

Here is what I did, following this question: How to search for a part of a word with ElasticSearch

curl -X PUT  'localhost:9200/_river/myDB/_meta' -d '
{
"type" : "couchdb",
"couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "db" : "myDB",
    "filter" : null
},
   "index" : {
    "index" : "myDB",
    "type" : "myDB",
    "bulk_size" : "100",
    "bulk_timeout" : "10ms",
    "analysis" : {
               "index_analyzer" : {
                          "my_index_analyzer" : {
                                        "type" : "custom",
                                        "tokenizer" : "standard",
                                        "filter" : ["lowercase", "mynGram"]
                          }
               },
               "search_analyzer" : {
                          "my_search_analyzer" : {
                                        "type" : "custom",
                                        "tokenizer" : "standard",
                                        "filter" : ["standard", "lowercase", "mynGram"]
                          }
               },
               "filter" : {
                        "mynGram" : {
                                   "type" : "nGram",
                                   "min_gram" : 2,
                                   "max_gram" : 50
                        }
               }
    }
}
}
'

Then I add a mapping for sorting:

curl -s -XGET 'localhost:9200/myDB/myDB/_mapping' 
{
"sorting": {
       "Title": {
            "fields": {
                "Title": {
                     "type": "string"
                  }, 
                "untouched": {
                    "include_in_all": false, 
                    "index": "not_analyzed", 
                    "type": "string"
                    }
               }, 
              "type": "multi_field"
         },
        "Year": {
              "fields": {
                   "Year": {
                       "type": "string"
                       }, 
                       "untouched": {
                           "include_in_all": false, 
                           "index": "not_analyzed", 
                           "type": "string"
                         }
                     }, 
                    "type": "multi_field"
        }
     }
    }
   }'

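I have included all the information I use, just for completeness. I thought this setup should work, but whenever I try to fetch results, the space is still used to split my query, for example: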

  http://localhost:9200/myDB/myDB/_search?q=Title:(Hello%20Wor)&pretty=true

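This returns anything containing "Hello" and anything containing "Wor" (I do not normally use the parentheses, but I have seen them in examples; the results look very similar either way).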
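To see why the space still splits the query, here is a rough Python simulation of the analysis chain configured above (standard tokenizer, lowercase, then an nGram filter with min_gram 2 and max_gram 50). It is only a sketch: the regular expression is a simplified stand-in for the standard tokenizer, and the helper names `ngrams` and `analyze` are made up for illustration, not part of Elasticsearch.

```python
import re

def ngrams(token, min_gram=2, max_gram=50):
    """Return every substring of token whose length is within [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams

def analyze(text, min_gram=2, max_gram=50):
    # Simplified stand-in for: standard tokenizer -> lowercase -> nGram filter.
    # Words are split first, so no gram can ever contain the space.
    words = re.findall(r"\w+", text.lower())
    return [gram for word in words for gram in ngrams(word, min_gram, max_gram)]

terms = analyze("Hello Wor")
print(terms)
```

Because the text is split into words before the grams are built, the query turns into a bag of short fragments of "hello" and "wor". With the default OR operator, a document only has to match one of them, which is consistent with the overly broad results described above.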

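Any help is greatly appreciated, as this has been bothering me a lot.

Update: In the end, I realized I did not need nGrams. A normal index is enough; simply replacing the spaces in the query with " AND " does the job.

Example: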

 Query: "Hello World"  --->  Replaced as "(*Hello And World*)"
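The replacement described in the update can be sketched in a few lines of Python. This is an illustration under assumptions, not the asker's actual code: the function names `to_and_query` and `search_url` are hypothetical, the host, index and type are copied from the question, and the operator is written in uppercase because the Lucene query string syntax treats AND as an operator only in that form.

```python
from urllib.parse import quote

def to_and_query(user_input, wildcard=True):
    """Join whitespace-separated terms with the uppercase AND operator."""
    terms = user_input.split()
    if not terms:
        return ""
    joined = " AND ".join(terms)
    if wildcard:
        # Mirrors the "(*Hello And World*)" shape from the example above.
        joined = "*" + joined + "*"
    return "(" + joined + ")"

def search_url(field, user_input):
    # Host, index and type are taken from the question; adjust for a real setup.
    query = field + ":" + to_and_query(user_input)
    return "http://localhost:9200/myDB/myDB/_search?q=" + quote(query) + "&pretty=true"

print(to_and_query("Hello World"))
print(search_url("Title", "Hello World"))
```

Note that this sketch does not escape characters that are special in the query string syntax (such as quotes, colons or parentheses), so raw user input would need extra handling before being passed in.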

score 1 · Accepted Answer

I don't have an Elasticsearch setup at hand right now, but maybe this part of the documentation helps?

http://www.elasticsearch.org/guide/reference/query-dsl/match-query.html

Types of Match Queries

boolean

The default match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The operator flag can be set to or or and to control the boolean clauses (defaults to or).

The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the field's explicit mapping definition, or the default search analyzer.

fuzziness can be set to a value (depending on the relevant type; for string types it should be a value between 0.0 and 1.0) to construct fuzzy queries for each term analyzed. The prefix_length and max_expansions can be set in this case to control the fuzzy process. If the fuzzy option is set, the query will use constant_score_rewrite as its rewrite method; the rewrite parameter allows control over how the query will get rewritten.

Here is an example when providing additional parameters (note the slight change in structure, message is the field name):

{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
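Applied to the question, the same structure could be built for the Title field. The following Python sketch only constructs and prints the request body; the helper name `match_all_terms` is hypothetical, and the "query" wrapper is the usual top-level key of a search request body. It has not been tested against the river setup above.

```python
import json

def match_all_terms(field, text):
    """Build a search body whose match query requires every analyzed term."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",  # the default would be "or"
                }
            }
        }
    }

body = json.dumps(match_all_terms("Title", "Hello Wor"), indent=4)
print(body)
```

The printed JSON could then be sent as the body of a request to localhost:9200/myDB/myDB/_search. Whether "Wor" matches "World" still depends on how the Title field is analyzed at index and search time.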
