
I have a JSON structure like the following:

{"DocumentName":"es","DocumentId":"2","Content": [{"PageNo":1,"Text": "The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing."},{"PageNo":2,"Text": "The query string is processed using the same analyzer that was applied to the field during indexing."}]}

I need stemming applied to the Content.Text field. To do that, I defined a mapping when creating the index, as follows:

curl -X PUT "localhost:9200/myindex?pretty" -H "Content-Type: application/json" -d"{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stemmer"]
                }
            },
            "filter": {
                "my_stemmer": {
                    "type": "stemmer",
                    "name": "english"
                }
            }
        }
    }
}, {
    "mappings": {
        "properties": {
            "DocumentName": {
                "type": "text"
            },
            "DocumentId": {
                "type": "keyword"
            },
            "Content": {
                "properties": {
                    "PageNo": {
                        "type": "integer"
                    },
                    "Text": "_all": {
                        "type": "text",
                        "analyzer": "my_analyzer",
                        "search_analyzer": "my_analyzer"
                    }
                }
            }
        }
    }
}
}"

I checked the analyzer I created:

curl -X GET "localhost:9200/myindex/_analyze?pretty" -H "Content-Type: application/json" -d"{\"analyzer\":\"my_analyzer\",\"text\":\"indexing\"}"

It gave this result:

{
  "tokens" : [
    {
      "token" : "index",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

But after uploading the JSON to the index, a search for "index" returns 0 results.

res = requests.get('http://localhost:9200') 
es = Elasticsearch([{'host': 'localhost', 'port': '9200'}])
res= es.search(index='myindex', body={"query": {"match": {"Content.Text": "index"}}})

Any help would be appreciated. Thanks in advance.


1 Answer


Ignore my comment. The stemmer is working. Try the following:

Mapping:

curl -X DELETE "localhost:9200/myindex"

curl -X PUT "localhost:9200/myindex?pretty" -H "Content-Type: application/json" -d'
{ 
    "settings":{ 
       "analysis":{ 
          "analyzer":{ 
             "english_exact":{ 
                "tokenizer":"standard",
                "filter":[ 
                   "lowercase"
                ]
             }
          }
       }
    },
    "mappings":{ 
       "properties":{ 
          "DocumentName":{ 
             "type":"text"
          },
          "DocumentId":{ 
             "type":"keyword"
          },
          "Content":{ 
             "properties":{ 
                "PageNo":{ 
                   "type":"integer"
                },
                "Text":{ 
                   "type":"text",
                   "analyzer":"english",
                   "fields":{ 
                      "exact":{ 
                         "type":"text",
                         "analyzer":"english_exact"
                      }
                   }
                }
             }
          }
       }
    }
 }'
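
As a rough sketch, the same index definition can be written as a Python dict, which makes it easy to confirm that `settings` and `mappings` are siblings inside one JSON object before sending it. The name `index_body` is an assumption for illustration, and the commented-out call assumes the 7.x Python client and a running local node.

```python
import json

# Hypothetical name: index_body mirrors the curl request body above.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Lowercase-only analyzer for exact (unstemmed) matching.
                "english_exact": {
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "DocumentName": {"type": "text"},
            "DocumentId": {"type": "keyword"},
            "Content": {
                "properties": {
                    "PageNo": {"type": "integer"},
                    "Text": {
                        "type": "text",
                        "analyzer": "english",  # built-in analyzer with stemming
                        "fields": {
                            "exact": {"type": "text", "analyzer": "english_exact"}
                        },
                    },
                }
            },
        }
    },
}

# Round-trip through JSON to check the body is one well-formed object.
payload = json.dumps(index_body)
assert set(json.loads(payload)) == {"settings", "mappings"}

# With a running cluster and the 7.x client, this could be sent as:
# from elasticsearch import Elasticsearch
# Elasticsearch().indices.create(index="myindex", body=index_body)
```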

Data:

curl -XPOST "localhost:9200/myindex/_doc/1" -H "Content-Type: application/json" -d'
{ 
   "DocumentName":"es",
   "DocumentId":"2",
   "Content":[ 
      { 
         "PageNo":1,
         "Text":"The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing."
      },
      { 
         "PageNo":2,
         "Text":"The query string is processed using the same analyzer that was applied to the field during indexing."
      }
   ]
}'

Query:

curl -XGET 'localhost:9200/myindex/_search?pretty' -H "Content-Type: application/json"  -d '
{ 
   "query":{ 
      "simple_query_string":{ 
         "fields":[ 
            "Content.Text"
         ],
         "query":"index"
      }
   }
}'

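This returns exactly one document, as expected. I also tested the following stems, and they all work with the suggested mapping: apply (applied), text (texts), use (using).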
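
To illustrate why a search for "index" finds a document containing "indexing": the field text and the query string pass through the same analysis, so both end up as the same stem. The sketch below is a toy stand-in, not Elasticsearch's `english` analyzer. The names `toy_stem`, `toy_analyze`, `toy_match` and the suffix list are assumptions made for illustration only.

```python
import re

# Toy suffix list; the real english analyzer uses a proper stemmer,
# stop words, and more. This only illustrates the idea.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def toy_analyze(text):
    """Lowercase, split on non-alphanumerics, then stem each token."""
    return [toy_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def toy_match(field_text, query):
    """True if any analyzed query token appears among the field tokens."""
    indexed = set(toy_analyze(field_text))
    return any(tok in indexed for tok in toy_analyze(query))

page = "applied to the field during indexing."
print(toy_analyze("indexing"))      # ['index']
print(toy_match(page, "index"))     # True
print(toy_match(page, "analyzer"))  # False
```

The point is only that matching happens on analyzed tokens, so the stored text and the query need compatible analyzers.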

Python example:

import requests
from elasticsearch import Elasticsearch

# Confirm the node is reachable, then create the client.
requests.get('http://localhost:9200').raise_for_status()
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
res = es.search(index='myindex', body={"query": {"match": {"Content.Text": "index"}}})

print(res)
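
When checking the result in Python, it may help to print just the hit count rather than the whole response. In Elasticsearch 7.x the search response reports `hits.total` as an object with `value` and `relation`, whereas earlier versions returned a plain integer. The helper name `total_hits` is hypothetical, and `sample` is a hand-written stand-in for a response, not real output.

```python
def total_hits(response):
    """Return the hit count from a search response dict.

    Handles both the 7.x object form ({"value": n, "relation": ...})
    and the plain-integer form used by earlier versions.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total

# Hand-written stand-in for a response, for illustration only.
sample = {"hits": {"total": {"value": 1, "relation": "eq"}, "hits": []}}
print(total_hits(sample))  # 1
```

With the client code above, this would be called as `total_hits(res)`.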

Tested on Elasticsearch 7.4.

answered 2019-11-19T12:39:15.403