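I'm new to Elasticsearch. I'm trying to build an analyzer or an ingest pipeline that creates word collocations (unigrams, bigrams, and trigrams, with a skip of up to 2). I know this is doable in Python, but I'm only interested in an Elasticsearch solution. Based on what I've read, I tried to do this with shingles, like this: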
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "predicate_token_filter",
      "script": {
        "source": "token.getPosition() % 2 == 0"
      }
    },
    {
      "type": "shingle",
      "max_shingle_size": 5,
      "min_shingle_size": 3,
      "output_unigrams": false,
      "token_separator": " ",
      "filler_token": ""
    },
    "trim",
    "unique",
    {
      "type": "pattern_replace",
      "pattern": "\\s+",
      "replacement": " "
    }
  ],
  "text": "aerial photo airplane taken with a nice camera"
}
It gives me this output:
{
"tokens" : [
{
"token" : "aerial airplane",
"start_offset" : 0,
"end_offset" : 21,
"type" : "shingle",
"position" : 0
},
{
"token" : "aerial airplane with",
"start_offset" : 0,
"end_offset" : 32,
"type" : "shingle",
"position" : 0,
"positionLength" : 3
},
{
"token" : "airplane",
"start_offset" : 13,
"end_offset" : 28,
"type" : "shingle",
"position" : 1
},
{
"token" : "airplane with",
"start_offset" : 13,
"end_offset" : 32,
"type" : "shingle",
"position" : 1,
"positionLength" : 2
},
{
"token" : "airplane with nice",
"start_offset" : 13,
"end_offset" : 39,
"type" : "shingle",
"position" : 1,
"positionLength" : 3
},
{
"token" : "with",
"start_offset" : 28,
"end_offset" : 35,
"type" : "shingle",
"position" : 2
},
{
"token" : "with nice",
"start_offset" : 28,
"end_offset" : 39,
"type" : "shingle",
"position" : 2,
"positionLength" : 2
},
{
"token" : "nice",
"start_offset" : 35,
"end_offset" : 46,
"type" : "shingle",
"position" : 3
}
]
}
But my ideal output would be (tokens only):
['aerial', 'photo', 'airplane', 'taken', 'with', 'a', 'nice', 'camera', ('aerial', 'photo'), ('aerial', 'airplane'), ('aerial', 'taken'), ('photo', 'airplane'), ('photo', 'taken'), ('photo', 'with'), ('airplane', 'taken'), ('airplane', 'with'), ('airplane', 'a'), ('taken', 'with'), ('taken', 'a'), ('taken', 'nice'), ('with', 'a'), ('with', 'nice'), ('with', 'camera'), ('a', 'nice'), ('a', 'camera'), ('nice', 'camera'), ('aerial', 'photo', 'airplane'), ('aerial', 'photo', 'taken'), ('aerial', 'photo', 'with'), ('aerial', 'airplane', 'taken'), ('aerial', 'airplane', 'with'), ('aerial', 'taken', 'with'), ('photo', 'airplane', 'taken'), ('photo', 'airplane', 'with'), ('photo', 'airplane', 'a'), ('photo', 'taken', 'with'), ('photo', 'taken', 'a'), ('photo', 'with', 'a'), ('airplane', 'taken', 'with'), ('airplane', 'taken', 'a'), ('airplane', 'taken', 'nice'), ('airplane', 'with', 'a'), ('airplane', 'with', 'nice'), ('airplane', 'a', 'nice'), ('taken', 'with', 'a'), ('taken', 'with', 'nice'), ('taken', 'with', 'camera'), ('taken', 'a', 'nice'), ('taken', 'a', 'camera'), ('taken', 'nice', 'camera'), ('with', 'a', 'nice'), ('with', 'a', 'camera'), ('with', 'nice', 'camera'), ('a', 'nice', 'camera')]
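To pin down exactly what I mean, here is a minimal Python sketch that reproduces the list above. The helper names `skipgrams` and `collocations` are only illustrative (they do not come from any library), and I'm assuming plain whitespace tokenization. It is just a reference for the target output, not the Elasticsearch solution I'm looking for.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Yield n-grams that start at each token and draw the remaining
    n - 1 tokens, in order, from the next n + k - 1 tokens
    (so at most k tokens are skipped in total)."""
    for i, head in enumerate(tokens):
        # Tokens that may follow `head` within a window of n + k tokens.
        tail = tokens[i + 1 : i + n + k]
        for rest in combinations(tail, n - 1):
            yield (head, *rest)


def collocations(text, max_n=3, max_skip=2):
    """Return unigrams (as strings) followed by skip-grams of size 2..max_n."""
    tokens = text.split()  # assumption: whitespace tokenization is enough here
    result = list(tokens)
    for n in range(2, max_n + 1):
        result.extend(skipgrams(tokens, n, max_skip))
    return result


result = collocations("aerial photo airplane taken with a nice camera")
print(len(result))  # 54: 8 unigrams + 18 bigrams + 28 trigrams
```

With `max_skip=2`, a pair like `('aerial', 'taken')` is produced because only two tokens are skipped, while `('aerial', 'with')` is not.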