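I'm new to Elasticsearch. I'm trying to build an analyzer or an ingest pipeline that creates word collocations (unigrams, bigrams, and trigrams, with a skip of up to 2). I know this is doable in Python, but I'm only interested in an ES solution. As far as I understand it, shingles are the tool for this, so I tried the following: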

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": "token.getPosition() % 2 == 0"
      }
    },
    {
      "type": "shingle",
      "max_shingle_size": 5,
      "min_shingle_size": 3,
      "output_unigrams": false,
      "token_separator": " ",
      "filler_token": ""
    },
    "trim",
    "unique",
    {
      "type": "pattern_replace",
      "pattern": "\\s+",
      "replacement": " "
    }
  ],
  "text": "aerial photo airplane taken with a nice camera"
}

It gives me this output:

{
  "tokens" : [
    {
      "token" : "aerial airplane",
      "start_offset" : 0,
      "end_offset" : 21,
      "type" : "shingle",
      "position" : 0
    },
    {
      "token" : "aerial airplane with",
      "start_offset" : 0,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "airplane",
      "start_offset" : 13,
      "end_offset" : 28,
      "type" : "shingle",
      "position" : 1
    },
    {
      "token" : "airplane with",
      "start_offset" : 13,
      "end_offset" : 32,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "airplane with nice",
      "start_offset" : 13,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 3
    },
    {
      "token" : "with",
      "start_offset" : 28,
      "end_offset" : 35,
      "type" : "shingle",
      "position" : 2
    },
    {
      "token" : "with nice",
      "start_offset" : 28,
      "end_offset" : 39,
      "type" : "shingle",
      "position" : 2,
      "positionLength" : 2
    },
    {
      "token" : "nice",
      "start_offset" : 35,
      "end_offset" : 46,
      "type" : "shingle",
      "position" : 3
    }
  ]
}

But my ideal output would be (tokens only):

['aerial', 'photo', 'airplane', 'taken', 'with', 'a', 'nice', 'camera', ('aerial', 'photo'), ('aerial', 'airplane'), ('aerial', 'taken'), ('photo', 'airplane'), ('photo', 'taken'), ('photo', 'with'), ('airplane', 'taken'), ('airplane', 'with'), ('airplane', 'a'), ('taken', 'with'), ('taken', 'a'), ('taken', 'nice'), ('with', 'a'), ('with', 'nice'), ('with', 'camera'), ('a', 'nice'), ('a', 'camera'), ('nice', 'camera'), ('aerial', 'photo', 'airplane'), ('aerial', 'photo', 'taken'), ('aerial', 'photo', 'with'), ('aerial', 'airplane', 'taken'), ('aerial', 'airplane', 'with'), ('aerial', 'taken', 'with'), ('photo', 'airplane', 'taken'), ('photo', 'airplane', 'with'), ('photo', 'airplane', 'a'), ('photo', 'taken', 'with'), ('photo', 'taken', 'a'), ('photo', 'with', 'a'), ('airplane', 'taken', 'with'), ('airplane', 'taken', 'a'), ('airplane', 'taken', 'nice'), ('airplane', 'with', 'a'), ('airplane', 'with', 'nice'), ('airplane', 'a', 'nice'), ('taken', 'with', 'a'), ('taken', 'with', 'nice'), ('taken', 'with', 'camera'), ('taken', 'a', 'nice'), ('taken', 'a', 'camera'), ('taken', 'nice', 'camera'), ('with', 'a', 'nice'), ('with', 'a', 'camera'), ('with', 'nice', 'camera'), ('a', 'nice', 'camera')]
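
To make the target precise, here is a minimal Python sketch of the rule I believe the list above follows: each n-gram starts at a token and takes its remaining n-1 tokens, in order, from the next n-1+k tokens, with k = 2 skips. The names `skipgrams` and `collocations` are only illustrative, and the whitespace split is an assumption standing in for the standard tokenizer. This is just to define the output I want, not the ES solution I'm asking for.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Yield n-grams that allow up to k skipped tokens after the first one."""
    for i, head in enumerate(tokens):
        # The remaining n-1 tokens are picked, in order, from this window.
        window = tokens[i + 1 : i + n + k]
        for tail in combinations(window, n - 1):
            yield (head, *tail)


def collocations(text, max_n=3, k=2):
    """Return unigrams (as strings) followed by 2..max_n skip-grams (as tuples)."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    result = list(tokens)
    for n in range(2, max_n + 1):
        result.extend(skipgrams(tokens, n, k))
    return result


if __name__ == "__main__":
    for item in collocations("aerial photo airplane taken with a nice camera"):
        print(item)
```

For the sample sentence this yields 8 unigrams, 18 bigrams, and 28 trigrams, in the same order as the list above.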