Index:

{
    "settings": {
        "index.percolator.map_unmapped_fields_as_text": true
    },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            }
        }
    }
}

This test percolator query works:

{
    "query": {
        "match": {
            "message": "blah"
        }
    }
}

This query does not work:

{
    "query": {
        "simple_query_string": {
            "query": "bl*"
        }
    }
}

Result:

{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}

Why doesn't this simple_query_string query match the document?

1 Answer

I don't quite understand what you are asking either. Maybe you are not very familiar with percolators? Here is an example I tried just now.

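Suppose you have an index (let's call it test) into which you want to index some documents. The index has the following mapping (just a random test index from my test setup):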

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

Notice that it has a custom email analyzer, which splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, com.
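To make the token list above easier to check without a cluster, here is a rough Python sketch that mimics the email analyzer for a single address. It is only an approximation under stated assumptions: the uax_url_email tokenizer is assumed to keep the address as one token, \p{L} is replaced with [^\W\d_] because Python's built-in re module has no \p{L}, and the helper name email_tokens is made up for this illustration. The token order in real Elasticsearch output may differ.

```python
import re

# Patterns copied from the "email" pattern_capture filter above.
# Assumption: \p{L} (any Unicode letter) is approximated by [^\W\d_],
# since Python's built-in re module does not support \p{L}.
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def email_tokens(text):
    """Roughly mimic the email analyzer for one address-like token."""
    tokens = [text]  # preserve_original: true keeps the input itself
    for pattern in PATTERNS:
        # Every match of the single capture group becomes a token.
        tokens.extend(re.findall(pattern, text))
    lowered = [token.lower() for token in tokens]  # "lowercase" filter
    return list(dict.fromkeys(lowered))  # "unique" filter, first-seen order


print(email_tokens("foo@bar.com"))
# -> ['foo@bar.com', 'foo', 'bar.com', 'bar', 'com']
```

On a real cluster, the _analyze API with the email analyzer is the authoritative way to see the tokens; the sketch only shows why those five come out.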

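As the documentation says, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. And even though the percolator index does not contain the documents, it should still carry the mapping of the index that will hold them (test in our case).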

Here is the mapping of the percolator index (I call it percolator_index); it also has the special analyzer used to split the email field:

{  
    "settings": {
        "analysis": {
          "filter": {
            "email": {
              "type": "pattern_capture",
              "preserve_original": true,
              "patterns": [
                "([^@]+)",
                "(\\p{L}+)",
                "(\\d+)",
                "@(.+)",
                "([^-@]+)"
              ]
            }
          },
          "analyzer": {
            "email": {
              "tokenizer": "uax_url_email",
              "filter": [
                "email",
                "lowercase",
                "unique"
              ]
            }
          }
        }
      },
    "mappings": {
        "properties": {
            "query": {
                "type": "percolator"
            },
            "code": {
                "type": "long"
            },
            "date": {
                "type": "date"
            },
            "part": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "val": {
                "type": "long"
            },
            "email": {
              "type": "text",
              "analyzer": "email"
            }
        }
    }
}

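Its mapping and settings are almost identical to those of my original index; the only difference is the additional query field of type percolator added to the mapping.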
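Since the two index bodies differ only by that one field, the percolator body can be derived from the original instead of being copied by hand. A minimal Python sketch, assuming the index body is available as a plain dict; the function name make_percolator_index_body and the trimmed-down test_index_body are hypothetical and only stand in for the full body shown earlier:

```python
import copy


def make_percolator_index_body(source_index_body, query_field="query"):
    """Return a copy of an index body with an extra percolator field."""
    body = copy.deepcopy(source_index_body)  # leave the original untouched
    properties = body.setdefault("mappings", {}).setdefault("properties", {})
    if query_field in properties:
        raise ValueError(f"field {query_field!r} already exists in the mapping")
    properties[query_field] = {"type": "percolator"}
    return body


# Hypothetical, shortened stand-in for the body of the test index.
test_index_body = {
    "settings": {},
    "mappings": {
        "properties": {
            "code": {"type": "long"},
            "part": {"type": "text"},
        }
    },
}

percolator_index_body = make_percolator_index_body(test_index_body)
print(sorted(percolator_index_body["mappings"]["properties"]))
# -> ['code', 'part', 'query']
```

Settings (including the analysis section) are copied as-is, so fields that reference a custom analyzer keep working in the percolator index.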

The query you are interested in, simple_query_string, should go into a document inside percolator_index. Like this:

PUT /percolator_index/_doc/1?refresh
{
    "query": {
        "simple_query_string" : {
            "query" : "month foo@bar.com",
            "fields": ["part", "email"]
        }
    }
}

To make it more interesting, I named the fields to search explicitly and included email (by default, all fields are searched).

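Now the goal is to test a document, one that should eventually go into the test index, against that simple_query_string query stored in your percolator index. For example: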

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo@bar.com"
      }
    }
  }
}

What sits under document is obviously your future (not yet existing) document. It matches the simple_query_string defined above, so the result is a hit:

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.39324823,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.39324823,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}
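The percolate request only changes in its document part from one trial to the next, so when testing several candidate documents it may be handy to build the body with a small helper. A minimal sketch in Python; percolate_body is a hypothetical name, and sending the body to GET /percolator_index/_search is left to whatever client you use:

```python
import json


def percolate_body(document, field="query"):
    """Build a search body that percolates one candidate document."""
    return {"query": {"percolate": {"field": field, "document": document}}}


# The candidate document from the request above.
candidate = {
    "date": "2004-07-31T11:57:52.000Z",
    "part": "month",
    "code": 109,
    "val": 0,
    "email": "foo@bar.com",
}

full_email_body = percolate_body(candidate)
# Same document, but with a shortened email value.
short_email_body = percolate_body({**candidate, "email": "foo"})

print(json.dumps(short_email_body, indent=2))
```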

What if I percolate this document instead:

{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date":"2004-07-31T11:57:52.000Z","part":"month","code":109,"val":0,"email":"foo"
      }
    }
  }
}

(Note that the email is just foo.) Here is the result:

{
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.26152915,
        "hits": [
            {
                "_index": "percolator_index",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.26152915,
                "_source": {
                    "query": {
                        "simple_query_string": {
                            "query": "month foo@bar.com",
                            "fields": [
                                "part",
                                "email"
                            ]
                        }
                    }
                },
                "fields": {
                    "_percolator_document_slot": [
                        0
                    ]
                }
            }
        ]
    }
}

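Note that the score is slightly lower than for the first percolated document. This is probably because foo (my email) matches only one of the terms that foo@bar.com is analyzed into, whereas foo@bar.com matches all of them (and thus gets a better score).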
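The term overlap behind that guess can be counted directly. This is only an illustration in Python, not how Elasticsearch computes the score (the real score comes from its similarity function, not from a raw overlap count). The term sets are taken from the token list given earlier; the set for foo is an assumption (no @ in the value, so only the token itself is produced).

```python
# Terms the answer lists for "foo@bar.com" after the email analyzer.
query_terms = {"foo@bar.com", "foo", "bar.com", "bar", "com"}

# Assumed analyzed terms for the email value of each percolated document.
doc_terms = {
    "foo@bar.com": {"foo@bar.com", "foo", "bar.com", "bar", "com"},
    "foo": {"foo"},
}

# Count how many query terms each document's email field shares.
overlap = {email: len(query_terms & terms) for email, terms in doc_terms.items()}
print(overlap)
# -> {'foo@bar.com': 5, 'foo': 1}
```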

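I don't know which analyzer you are talking about. I think the example above covers the only "analyzer" issue/unknown that I thought might be a bit confusing.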

Answered 2019-11-03T23:18:04.073