elasticsearch - How to percolate simple_query_string/query_string queries

Question

Index:

{
    "settings": {
        "index.percolator.map_unmapped_fields_as_text": true
    },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            }
        }
    }
}

This test percolator query works:

{
    "query": {
        "match": {
            "message": "blah"
        }
    }
}

This query does not work:

{
    "query": {
        "simple_query_string": {
            "query": "bl*"
        }
    }
}

Results:

{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}

Why doesn't this simple_query_string query match the document?

score 3 · Accepted Answer

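I'm not sure what you are asking either. Perhaps you are not very familiar with how the percolator works? Here is an example I just tried.

Let's say you have an index (call it test) in which you want to index some documents. This index has the following mapping (just a random test index from my test setup):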

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

Notice that it has a custom email analyzer that splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, com.
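
One way to check this token list yourself is the _analyze API. A minimal sketch, assuming the test index above has already been created with these settings: send the body below to GET /test/_analyze. The sample text is the address used in this answer, and the response should list the tokens the email analyzer produces for it.

```json
{
    "analyzer": "email",
    "text": "foo@bar.com"
}
```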

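As the documentation says, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. Even though the percolator index does not contain the documents, it should still hold the mapping of the index that is supposed to hold them (test in our case).

Here is the mapping of the percolator index (which I called percolator_index). It also has the special analyzer used for splitting the email field: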

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            },
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

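Its mapping and settings are almost identical to those of my original index; the only difference is the additional query field of type percolator added to the mapping.

The query you are interested in, the simple_query_string, should go inside a document in percolator_index. Like this: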

PUT /percolator_index/_doc/1?refresh
{
    "query": {
        "simple_query_string" : {
            "query" : "month foo@bar.com",
            "fields": ["part", "email"]
        }
    }
}

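To make it more interesting, I added the email field to the fields that the query specifically searches (by default, all fields are searched).

Now, the aim is to test a document that should eventually go into your test index against that simple_query_string query from your percolator index. For example: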

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
      }
    }
  }
}

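The content under document is, obviously, your future (not yet existing) document. It is matched against the simple_query_string defined above and results in a match: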

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.39324823,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.39324823,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

What if I percolate this document instead:

{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
      }
    }
  }
}

(Notice that the email is just foo.) This is the result:

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.26152915,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.26152915,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

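Notice that the score is slightly lower than for the first percolated document. This is probably because foo (my email) matched only one of the terms produced by analyzing foo@bar.com, whereas foo@bar.com would match all of them (thus giving a better score).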
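
Both documents can also be checked in one request by using the documents parameter of the percolate query in place of document. A sketch, assuming the same percolator_index and stored query as above, sent to GET /percolator_index/_search. In the response, _percolator_document_slot should then indicate which of the submitted documents (by position, starting at 0) matched the stored query.

```json
{
  "query": {
    "percolate": {
      "field": "query",
      "documents": [
        {"date": "2004-07-31T11:57:52.000Z", "part": "month", "code": 109, "val": 0, "email": "foo@bar.com"},
        {"date": "2004-07-31T11:57:52.000Z", "part": "month", "code": 109, "val": 0, "email": "foo"}
      ]
    }
  }
}
```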

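I'm not sure which analyzer you are referring to. I think the example above covers the only "analyzer" issue/unknown that I can see as potentially confusing.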
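
Applied to the index from the question, the same approach could look like the sketch below. It assumes the documents have a message field of type text (inferred from the match query in the question) and that the index is called my-index (taken from the result shown there); treat it as an untested sketch rather than a verified fix. First, the index body (sent to PUT /my-index when creating the index), with the document field mapped next to the percolator field:

```json
{
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            },
            "message": {
                "type": "text"
            }
        }
    }
}
```

Then the stored query with the field named explicitly, sent to PUT /my-index/_doc/2?refresh (the id 2 is arbitrary):

```json
{
    "query": {
        "simple_query_string": {
            "query": "bl*",
            "fields": ["message"]
        }
    }
}
```

Percolating a document such as {"message": "blah"} against the query field, in the same way as the examples above, should then be able to match this stored query.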
