I have some Twitter data I want to work with, and I want to be able to search it by name. When I try to generate ngrams of 'name' and '_id', I run into trouble.

First, I created the analyzers:

curl -XPUT 'localhost:9200/twitter_users' -d '
{
    "settings": {
        "analysis": {
            "analyzer": {
                "str_search_analyzer": {
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ]
                },
                "str_index_analyzer": {
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "ngram"
                    ]
                }
            },
            "filter": {
                "ngram": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 20
                }
            }
        }
    }
}'

Then I defined my mappings:

curl -XPUT 'http://localhost:9200/twitter_users/users/_mapping' -d '
{
    "users": {
        "type" : "object",
        "properties": {
            "_id": {
                "type": "string",
                "copy_to": "id"
            },
            "id": {
                "type": "string",
                "search_analyzer": "str_search_analyzer",
                "index_analyzer": "str_index_analyzer",
                "index": "analyzed"
            },
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "ngrams": {
                        "type": "string",
                        "search_analyzer": "str_search_analyzer",
                        "index_analyzer": "str_index_analyzer",
                        "index": "analyzed"
                    }
                }
            }
        }
    }
}'

And inserted some test data:

curl -XPUT "localhost:9200/twitter_users/users/johndoe" -d '{
    "_id" : "johndoe",
    "name" : "John Doe"
}'

curl -XPUT "localhost:9200/twitter_users/users/janedoe" -d '{
    "_id" : "janedoe",
    "name" : "Jane Doe"
}'

Querying by name gets me the expected results:

curl -XPOST "http://localhost:9200/twitter_users/users/_search" -d '{
    "query": {
        "match": {
            "name.ngrams": "doe"
        }
    }
}'

But querying on the id gives me no results:

curl -XPOST "http://localhost:9200/twitter_users/users/_search" -d '{
    "query": {
        "match": {
            "id": "doe"
        }
    }
}'

I also tried making _id a multi-field, like I did with name, but that didn't work either.

Does _id behave differently from other fields, or am I doing something wrong here?

Edit: I'm using Elasticsearch v1.1.2 and pulling the data from MongoDB with a river plugin.

Thanks for your help,

Mirko

1 Answer

It looks like "copy_to" is the problem, but why not just insert the "id" value directly into the "id" field?

curl -XPUT "localhost:9200/twitter_users/users/johndoe" -d '{
    "id" : "johndoe",
    "name" : "John Doe"
}'

curl -XPUT "localhost:9200/twitter_users/users/janedoe" -d '{
    "id" : "janedoe",
    "name" : "Jane Doe"
}'
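
To see why a value indexed directly into "id" should then match a query for "doe", here is a rough Python sketch of what str_index_analyzer from the question does to a value (keyword tokenizer, then lowercase, then ngrams of 3 to 20 characters). This is only an emulation for illustration, not Elasticsearch code; the function names index_ngrams and search_term are made up, and it ignores details such as token positions.

```python
def index_ngrams(text, min_gram=3, max_gram=20):
    """Emulate str_index_analyzer: keyword tokenizer -> lowercase -> ngram filter.

    Returns the set of terms that would end up in the index for this value.
    (Hypothetical helper; Elasticsearch does this internally.)
    """
    # The keyword tokenizer keeps the whole input as a single token,
    # and the lowercase filter lowercases it.
    token = text.lower()
    grams = set()
    # The ngram filter emits every substring of length min_gram..max_gram.
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def search_term(text):
    """Emulate str_search_analyzer: keyword tokenizer -> lowercase (no ngrams)."""
    return text.lower()


if __name__ == "__main__":
    # A match query on "id" succeeds if the search term is one of the indexed grams.
    print(search_term("doe") in index_ngrams("johndoe"))
    print(search_term("doe") in index_ngrams("John Doe"))
    # Terms shorter than min_gram are never indexed, so they cannot match.
    print(search_term("jo") in index_ngrams("johndoe"))
```

So as long as the value actually reaches the analyzed "id" field, "doe" is one of the indexed terms for both "johndoe" and "janedoe". Note that search terms shorter than min_gram (3 here) will not match anything.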
answered 2014-07-07T08:23:52.807