elasticsearch - Removing stop words when querying with a GET request in Elasticsearch

Question

I am trying to implement the Stop Token Filter in an Elasticsearch index. I took the following code from here.

PUT /test1
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

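My data is stored as JSON and has a field named "Ingredients" that contains stop words. I want to search the whole index (close to 80,000 records) for the 100 most frequent values in the Ingredients field. The query I use to retrieve the results is: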

GET test1/_search?size=0&pretty
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "Ingredients",
        "size": 100,
        "exclude": "[0-9].*"
      }
    }
  }
}

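I also need to exclude numbers from the results, which is what exclude is for. But when I run the query above in Kibana, the stop words are not removed and still show up in the response. According to the documentation the filter should remove stop words, but it does not. I am new to Elasticsearch and cannot find the cause. I am using elasticsearch-7.3.1 and Kibana-7.3.1. I have been working on this for about two days and nothing has worked. Any help would be much appreciated. Thanks!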

If I try it the following way, it works, but it does not work at all when I issue the GET request defined above.

POST test1/_analyze
{
  "analyzer": "my_stop",
  "text": "House of Dickson<br> corp"
}

My mapping

    {
      "recipe_test" : {
"aliases" : { },
"mappings" : {
  "properties" : {
    "Author" : {
      "properties" : {
        "additionalInfo" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "eval" : {
          "type" : "boolean"
        },
        "url" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "value" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "Category" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "Channel" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "Cousine" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "Ingredients" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      },
      "fielddata" : true
    },
    "Keywords" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "MakingMethod" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "Publication" : {
      "properties" : {
        "additionalInfo" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "eval" : {
          "type" : "boolean"
        },
        "published" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "url" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "value" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "Rating" : {
      "properties" : {
        "bestRating" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ratingCount" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ratingValue" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "worstRating" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "Servings" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "Timings" : {
      "properties" : {
        "cookTime" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "prepTime" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "totalTime" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "Title" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "description" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "recipe_url" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }
  }
},
"settings" : {
  "index" : {
    "number_of_shards" : "1",
    "provided_name" : "recipe_test",
    "creation_date" : "1567443878756",
    "analysis" : {
      "filter" : {
        "english_stop" : {
          "type" : "stop",
          "stopwords" : "_english_"
        }
      },
      "analyzer" : {
        "rebuilt_stop" : {
          "filter" : [
            "asciifolding",
            "lowercase",
            "english_stop"
          ],
          "tokenizer" : "standard"
        }
      }
    },
    "number_of_replicas" : "1",
    "uuid" : "K-FrOyc6QlWokGQoN6HxCg",
    "version" : {
      "created" : "7030199"
    }
  }
}

} }

My sample data

{
"recipe_url": "http1742637/bean-and-pesto-mash",
"Channel": "waqas",
 "recipe_id":"31",
"Title": "Bean & pesto mash",
"Rating": {
    "ratingValue": "4.625",
    "bestRating": "5",
    "worstRating": "1",
    "ratingCount": "8"
},
"Timings": {
    "cookTime": "PT5M",
    "prepTime": "PT5M",
    "totalTime": "PT10M"
},
"Author": {
    "eval": false,
    "value": "dfgkkdfgdfgfmes",
    "url": "https://www.example.com/",
    "additionalInfo": "Recipe from Good Food magazine, ",
    "description": "Substitute potatoes with pulses for a healthy alternative mash with a chunky texture",
    "published": "November 2011"
},
"Publication": {
    "eval": false,
    "value": "",
    "url": "",
    "additionalInfo": "",
    "published": ""
},
"Nutrition": "per serving",
"NutritionContents": {
    "kcal": "183",
    "fat": "5g",
    "saturates": "1g",
    "carbs": "25g",
    "sugars": "3g",
    "fibre": "7g",
    "protein": "11g",
    "salt": "0.84g"
},
"SkillLevel": "Easy",
"Ingredients": [
   "drizzle", "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" , "Asparagus" 

 ],
"MakingMethod": [
    "Heat the oil in a large saucepan. Add the beans and cook for 3-4 mins until hot through. Lightly mash with a potato masher for a chunky texture. Stir through the pesto and season. To serve, drizzle with a little olive oil, if you like."
],
"Keywords": [
    "Cannellini bean",
    "Cannellini beans",
    "Mash",
    "Beans",
    "Super healthy",
    "Pulses",
    "5-a-day",
    "Low fat",
    "Diet",
    "Dieting",
    "Side dish",
    "Bangers and mash",
    "Sausage and mash",
    "Texture",
    "Fireworks",
    "Pesto",
    "Easy",
    "Vegetarian",
    "Healthy",
    "Bonfire Night"
],
"Category": [
    "Side dish",
    "Dinner"
],
"Cousine": "British",
"Servings": "Serves 4"

}

score 1 · Accepted Answer

There is no easy way to do this.

Option 1

Enable fielddata on a text field that has the right analyzer applied. Something like this:

{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        }
      },
      "analyzer": {
        "rebuilt_stop": {
          "filter": [
              "asciifolding",
              "lowercase",
              "english_stop"
            ],
            "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
        "Ingredients": {
            "type": "text",
            "analyzer": "rebuilt_stop",
            "fielddata": true
        }
    }
  }
}

Then you run your terms aggregation. Downside: because fielddata is used, it can consume a lot of memory.
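To see why Option 1 changes the buckets, here is a rough pure-Python sketch of what happens once the analyzer is actually attached to the field. It does not talk to Elasticsearch. The stop word set is a small illustrative subset of _english_, and the tokenizer and folding steps are crude stand-ins for the real ones, so treat the names and behavior as assumptions.

```python
import re
import unicodedata
from collections import Counter

# Illustrative subset of the _english_ stop word list (not the full list).
ENGLISH_STOP = {"a", "an", "and", "of", "the", "to", "with", "in", "for", "or"}

def rebuilt_stop(text):
    """Mimic standard tokenizer -> asciifolding -> lowercase -> english_stop."""
    # Crude stand-in for asciifolding: decompose and drop combining accents.
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Crude stand-in for the standard tokenizer: runs of word characters.
    tokens = re.findall(r"\w+", folded.lower())
    return [t for t in tokens if t not in ENGLISH_STOP]

def top_terms(docs, size=100, exclude=r"[0-9].*"):
    """Count documents per term, like a terms aggregation on an analyzed field."""
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            terms.update(rebuilt_stop(value))
        # The exclude regex has to match the whole term to drop it.
        counts.update(t for t in terms if not re.fullmatch(exclude, t))
    return counts.most_common(size)

# Two made-up documents, each with a list of Ingredients values.
docs = [
    ["a drizzle of olive oil", "Asparagus"],
    ["2 tbsp of the pesto", "asparagus"],
]
result = top_terms(docs)
```

The point is that stop words disappear from the buckets only because they never become indexed terms. Defining a stop filter in the index settings without referencing it from an analyzer on the field leaves the indexed terms unchanged.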

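Option 2

Use the Term Vectors API. Since you are interested in the most used "values"/"terms" in the Ingredients field, you can call this API on one of the documents in the index and get back the overall (index-wide) term frequency for every term in that specific document. Downside: you need to specify a document ID, and only the terms from that document are reported.

Something like this: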

GET /test/_termvectors/1
{
  "fields" : ["Ingredients"],
  "offsets" : false,
  "payloads" : false,
  "positions" : false,
  "term_statistics" : true,
  "field_statistics" : false
}
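The response then has to be ranked on the client side. A minimal sketch of that step in Python, assuming the response has already been parsed into a dict. The helper name top_terms_from_termvectors is hypothetical, and the sample response is hand-written with made-up numbers in the shape returned when term_statistics is true.

```python
def top_terms_from_termvectors(response, field, size=100):
    """Rank one document's terms by index-wide total term frequency (ttf)."""
    terms = response["term_vectors"][field]["terms"]
    ranked = sorted(terms.items(), key=lambda kv: kv[1]["ttf"], reverse=True)
    return [(term, stats["ttf"]) for term, stats in ranked[:size]]

# Hand-written sample; the numbers are invented for illustration only.
sample_response = {
    "_index": "test",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "Ingredients": {
            "terms": {
                "drizzle": {"doc_freq": 40, "ttf": 55, "term_freq": 1},
                "asparagus": {"doc_freq": 120, "ttf": 450, "term_freq": 9},
            }
        }
    },
}

top = top_terms_from_termvectors(sample_response, "Ingredients")
```

Here term_freq is the count inside the requested document, while ttf and doc_freq describe the term across the index. Terms that do not occur in the requested document never show up, which is the limitation mentioned above.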

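Option 3

Probably the ugliest one. Something along these lines: Elasticsearch: index a field with keyword tokenizer but without stopwords

Upside: it does not use fielddata (heap memory). Downside: you have to define the stop words manually in the char_filter definition.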
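The idea behind Option 3 can be sketched in plain Python: strip the stop words from the raw string with a regular expression (roughly the job a pattern_replace char filter would do), then keep whatever is left as a single token, as the keyword tokenizer does. The stop word list and the helper name are hypothetical, and the regex is only an approximation of what you would put in the char_filter.

```python
import re
from collections import Counter

# Hypothetical hand-maintained list; with this approach the built-in
# _english_ list is not available, so the words are spelled out.
STOP_WORDS = ["a", "an", "and", "of", "the", "with"]

# One alternation with word boundaries, matched case-insensitively.
STOP_PATTERN = re.compile(
    r"\b(?:%s)\b" % "|".join(map(re.escape, STOP_WORDS)), re.IGNORECASE
)

def keyword_without_stopwords(value):
    """Strip stop words from the raw string, then keep it as one lowercase token."""
    stripped = STOP_PATTERN.sub(" ", value)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(stripped.split()).lower()

# Made-up Ingredients values.
values = ["A drizzle of olive oil", "drizzle olive oil", "The Asparagus"]
counts = Counter(keyword_without_stopwords(v) for v in values)
```

Unlike Option 1, each value stays a single term here, so an aggregation would count whole ingredient phrases rather than individual words.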
