
I'm new to Elasticsearch. I have mapped a field as "string" in my Elasticsearch index. I need to retrieve documents whose field value contains the given search text.

JSON1 : "{\"id\":\"1\",\"message\":\"Welcome to elastic search\"}"
JSON2 : "{\"id\":\"2\",\"message\":\"elasticsearch\"}"

If I search for "elastic", I need to get both records, but I only get the first one.

Right now I am getting documents based on full-text search. Please guide me on how to achieve something like psql's LIKE/ILIKE in Elasticsearch.
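
A query of this shape reproduces the problem (the index name is made up for this example):

curl -XGET 'localhost:9200/myindex/_search' -d '{
    "query": {
        "match": {
            "message": "elastic"
        }
    }
}'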

Thanks in advance.


1 Answer


This is a tokenizer issue. You can take a look at the NGram tokenizer: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/

You can test it using the /_analyze endpoint.

This is how Elasticsearch tokenizes by default. Note that "elasticsearch" is kept as a single token, which is why a search for "elastic" does not match it:

curl -XGET 'localhost:9200/_analyze?tokenizer=standard' -d 'this is a test elasticsearch'

{
    "tokens": [{
        "token": "this",
        "start_offset": 0,
        "end_offset": 4,
        "type": "<ALPHANUM>",
        "position": 1
    }, {
        "token": "is",
        "start_offset": 5,
        "end_offset": 7,
        "type": "<ALPHANUM>",
        "position": 2
    }, {
        "token": "a",
        "start_offset": 8,
        "end_offset": 9,
        "type": "<ALPHANUM>",
        "position": 3
    }, {
        "token": "test",
        "start_offset": 10,
        "end_offset": 14,
        "type": "<ALPHANUM>",
        "position": 4
    }, {
        "token": "elasticsearch",
        "start_offset": 15,
        "end_offset": 28,
        "type": "<ALPHANUM>",
        "position": 5
    }
    ]
}

Here is an example using nGram with its default settings (1- and 2-character grams):

curl -XGET 'localhost:9200/_analyze?tokenizer=nGram' -d 'this is a test elasticsearch'

{
    "tokens": [{
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 1
        }, {
            "token": "h",
            "start_offset": 1,
            "end_offset": 2,
            "type": "word",
            "position": 2
        }, {
            "token": "i",
            "start_offset": 2,
            "end_offset": 3,
            "type": "word",
            "position": 3
        }, {
            "token": "s",
            "start_offset": 3,
            "end_offset": 4,
            "type": "word",
            "position": 4
        }, {
            "token": " ",
            "start_offset": 4,
            "end_offset": 5,
            "type": "word",
            "position": 5
        }, {
            "token": "i",
            "start_offset": 5,
            "end_offset": 6,
            "type": "word",
            "position": 6
        }, {
            "token": "s",
            "start_offset": 6,
            "end_offset": 7,
            "type": "word",
            "position": 7
        }, {
            "token": " ",
            "start_offset": 7,
            "end_offset": 8,
            "type": "word",
            "position": 8
        }, {
            "token": "a",
            "start_offset": 8,
            "end_offset": 9,
            "type": "word",
            "position": 9
        }, {
            "token": " ",
            "start_offset": 9,
            "end_offset": 10,
            "type": "word",
            "position": 10
        }, {
            "token": "t",
            "start_offset": 10,
            "end_offset": 11,
            "type": "word",
            "position": 11
        }, {
            "token": "e",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 12
        }, {
            "token": "s",
            "start_offset": 12,
            "end_offset": 13,
            "type": "word",
            "position": 13
        }, {
            "token": "t",
            "start_offset": 13,
            "end_offset": 14,
            "type": "word",
            "position": 14
        }, {
            "token": " ",
            "start_offset": 14,
            "end_offset": 15,
            "type": "word",
            "position": 15
        }, {
            "token": "e",
            "start_offset": 15,
            "end_offset": 16,
            "type": "word",
            "position": 16
        }, {
            "token": "l",
            "start_offset": 16,
            "end_offset": 17,
            "type": "word",
            "position": 17
        }, {
            "token": "a",
            "start_offset": 17,
            "end_offset": 18,
            "type": "word",
            "position": 18
        }, {
            "token": "s",
            "start_offset": 18,
            "end_offset": 19,
            "type": "word",
            "position": 19
        }, {
            "token": "t",
            "start_offset": 19,
            "end_offset": 20,
            "type": "word",
            "position": 20
        }, {
            "token": "i",
            "start_offset": 20,
            "end_offset": 21,
            "type": "word",
            "position": 21
        }, {
            "token": "c",
            "start_offset": 21,
            "end_offset": 22,
            "type": "word",
            "position": 22
        }, {
            "token": "s",
            "start_offset": 22,
            "end_offset": 23,
            "type": "word",
            "position": 23
        }, {
            "token": "e",
            "start_offset": 23,
            "end_offset": 24,
            "type": "word",
            "position": 24
        }, {
            "token": "a",
            "start_offset": 24,
            "end_offset": 25,
            "type": "word",
            "position": 25
        }, {
            "token": "r",
            "start_offset": 25,
            "end_offset": 26,
            "type": "word",
            "position": 26
        }, {
            "token": "c",
            "start_offset": 26,
            "end_offset": 27,
            "type": "word",
            "position": 27
        }, {
            "token": "h",
            "start_offset": 27,
            "end_offset": 28,
            "type": "word",
            "position": 28
        }, {
            "token": "th",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 29
        }, {
            "token": "hi",
            "start_offset": 1,
            "end_offset": 3,
            "type": "word",
            "position": 30
        }, {
            "token": "is",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 31
        }, {
            "token": "s ",
            "start_offset": 3,
            "end_offset": 5,
            "type": "word",
            "position": 32
        }, {
            "token": " i",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 33
        }, {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 34
        }, {
            "token": "s ",
            "start_offset": 6,
            "end_offset": 8,
            "type": "word",
            "position": 35
        }, {
            "token": " a",
            "start_offset": 7,
            "end_offset": 9,
            "type": "word",
            "position": 36
        }, {
            "token": "a ",
            "start_offset": 8,
            "end_offset": 10,
            "type": "word",
            "position": 37
        }, {
            "token": " t",
            "start_offset": 9,
            "end_offset": 11,
            "type": "word",
            "position": 38
        }, {
            "token": "te",
            "start_offset": 10,
            "end_offset": 12,
            "type": "word",
            "position": 39
        }, {
            "token": "es",
            "start_offset": 11,
            "end_offset": 13,
            "type": "word",
            "position": 40
        }, {
            "token": "st",
            "start_offset": 12,
            "end_offset": 14,
            "type": "word",
            "position": 41
        }, {
            "token": "t ",
            "start_offset": 13,
            "end_offset": 15,
            "type": "word",
            "position": 42
        }, {
            "token": " e",
            "start_offset": 14,
            "end_offset": 16,
            "type": "word",
            "position": 43
        }, {
            "token": "el",
            "start_offset": 15,
            "end_offset": 17,
            "type": "word",
            "position": 44
        }, {
            "token": "la",
            "start_offset": 16,
            "end_offset": 18,
            "type": "word",
            "position": 45
        }, {
            "token": "as",
            "start_offset": 17,
            "end_offset": 19,
            "type": "word",
            "position": 46
        }, {
            "token": "st",
            "start_offset": 18,
            "end_offset": 20,
            "type": "word",
            "position": 47
        }, {
            "token": "ti",
            "start_offset": 19,
            "end_offset": 21,
            "type": "word",
            "position": 48
        }, {
            "token": "ic",
            "start_offset": 20,
            "end_offset": 22,
            "type": "word",
            "position": 49
        }, {
            "token": "cs",
            "start_offset": 21,
            "end_offset": 23,
            "type": "word",
            "position": 50
        }, {
            "token": "se",
            "start_offset": 22,
            "end_offset": 24,
            "type": "word",
            "position": 51
        }, {
            "token": "ea",
            "start_offset": 23,
            "end_offset": 25,
            "type": "word",
            "position": 52
        }, {
            "token": "ar",
            "start_offset": 24,
            "end_offset": 26,
            "type": "word",
            "position": 53
        }, {
            "token": "rc",
            "start_offset": 25,
            "end_offset": 27,
            "type": "word",
            "position": 54
        }, {
            "token": "ch",
            "start_offset": 26,
            "end_offset": 28,
            "type": "word",
            "position": 55
        }
    ]
}

Here is a link with an example of setting up the right analyzer/tokenizer for your index: How to setup a tokenizer in elasticsearch
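
For illustration, such a setup might look like the sketch below. This is only a sketch under assumptions: the index, type, and analyzer/tokenizer names are placeholders, and the gram sizes are just an example (max_gram must be at least as long as the longest term you want to search for). Using the standard analyzer at search time keeps the query terms themselves from being split into grams:

curl -XPUT 'localhost:9200/myindex' -d '{
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 10
                }
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram_tokenizer",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "mytype": {
            "properties": {
                "message": {
                    "type": "string",
                    "index_analyzer": "my_ngram_analyzer",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}'

Keep in mind that analyzers are applied at index time, so existing documents need to be reindexed after a mapping change like this.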

Then your query should return the expected documents.
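
For example, with a mapping like the sketch above, the same match query from the question (same placeholder index name) should now find both documents:

curl -XGET 'localhost:9200/myindex/_search' -d '{
    "query": {
        "match": {
            "message": "elastic"
        }
    }
}'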

answered 2013-05-29T17:23:56.497