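I'm using Django 1.5 with django-haystack 2.0 and an elasticsearch backend. I'm trying to search by an exact attribute match. However, I get "similar" results even though I use both the __exact operator and the Exact() class. How can I prevent this behavior?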

For example:

# models.py
class Person(models.Model):
    name = models.TextField()


# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr="name")

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


# templates/search/indexes/people/person_text.txt
{{ object.name }}


>>> p1 = Person(name="Simon")
>>> p1.save()
>>> p2 = Person(name="Simons")
>>> p2.save()

$ ./manage.py rebuild_index

>>> person_sqs = SearchQuerySet().models(Person)
>>> person_sqs.filter(name__exact="Simons")
[<SearchResult: people.person (name=u'Simon')>
 <SearchResult: people.person (name=u'Simons')>]
>>> person_sqs.filter(name=Exact("Simons", clean=True))
[<SearchResult: people.person (name=u'Simon')>
 <SearchResult: people.person (name=u'Simons')>]

I only want the search result for "Simons"; the "Simon" result should not show up.

3 Answers

Python 3, Django 1.10, Elasticsearch 2.4.4.

TL;DR: define a custom tokenizer (not a filter)


Detailed explanation

a) Use EdgeNgramField:

# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):

    text = indexes.EdgeNgramField(document=True, use_template=True)
    ...

b) Template:

# templates/search/indexes/people/person_text.txt
{{ object.name }}

c) Create a custom search backend:

# backends.py
from django.conf import settings

from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)


class CustomElasticsearchSearchBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticsearchSearchBackend, self).__init__(
            connection_alias, **connection_options)

        # Replace Haystack's default index settings with the ones from settings.py
        self.DEFAULT_SETTINGS = settings.ELASTICSEARCH_INDEX_SETTINGS


class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):

    backend = CustomElasticsearchSearchBackend

d) Define a custom tokenizer (not a filter!):

# settings.py
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'apps.persons.backends.CustomElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_ngram_tokenizer",
                    "filter": ["asciifolding", "lowercase"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "custom_edgengram_tokenizer",
                    "filter": ["asciifolding", "lowercase"]
                }
            },
            "tokenizer": {
                "custom_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 12,
                    "token_chars": ["letter", "digit"]
                },
                "custom_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 12,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    }
}

HAYSTACK_DEFAULT_OPERATOR = 'AND'

e) Use AutoQuery (more general):

# views.py
from haystack.inputs import AutoQuery
from haystack.query import SearchQuerySet

search_value = 'Simons'
...
person_sqs = SearchQuerySet().models(Person).filter(
    content=AutoQuery(search_value)
)

f) Reindex after the changes:

$ ./manage.py rebuild_index
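
To get a feel for what the tokenizer from step d) stores in the index, here is a rough pure-Python emulation. It is only a sketch under assumptions: the function name edge_ngram_tokenize is made up for illustration, the asciifolding filter is left out, and the real tokenization happens inside Elasticsearch, not in Python.

```python
import re


def edge_ngram_tokenize(text, min_gram=2, max_gram=12):
    """Approximate an edgeNGram tokenizer with token_chars=["letter", "digit"],
    followed by a lowercase filter (asciifolding is not emulated)."""
    tokens = []
    # token_chars=["letter", "digit"]: split on everything else
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        # emit the leading substrings of length min_gram..max_gram
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


print(edge_ngram_tokenize("Simons"))
print(edge_ngram_tokenize("J. Simon"))
```

Every stored token is a prefix of a word, and words shorter than min_gram (the "J" above) produce no token at all.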
answered 2017-03-09T13:22:46.033

I faced a similar problem. If you change the settings of Haystack's elasticsearch backend, for example:

DEFAULT_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_ngram", "lowercase"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["haystack_edgengram", "lowercase"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 6,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 6,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 6,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 6,
                    "max_gram": 15
                }
            }
        }
    }
}

then it will only generate tokens for terms that are at least 6 characters long (min_gram is 6).
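
A minimal sketch of that threshold in plain Python, assuming that a term shorter than min_gram yields no grams at all. The helper edge_grams is hypothetical and only mimics the filter settings shown above; it is not Haystack or Elasticsearch code.

```python
def edge_grams(term, min_gram=6, max_gram=15):
    """Leading substrings of `term` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(term))
    # range() is empty when the term is shorter than min_gram
    return [term[:size] for size in range(min_gram, longest + 1)]


for term in ("simon", "simons", "simonsen"):
    print(term, edge_grams(term))
```

With these settings the five-letter "simon" produces nothing, while "simons" produces exactly one gram.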

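If you want results like "xyzsimonsxyz", you need to use the ngram analyzer rather than EdgeNGram, or you can use both, depending on your requirements. EdgeNGram only generates tokens starting from the beginning of a term.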

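With NGram, 'simons' will be one of the tokens generated for the term xyzsimonsxyz, assuming max_gram >= 6 (and min_gram <= 6), so you will get the expected results. The search_analyzer also needs to be different from the index analyzer, otherwise you will get odd results.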

Also, the index can become very large with ngrams if you have a lot of text.
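
The difference between the two, and the effect on index size, can be sketched in plain Python. The helpers ngrams and edge_ngrams and the 3..6 gram range are assumptions chosen for illustration; this is not what Elasticsearch literally runs.

```python
def ngrams(term, min_gram, max_gram):
    """All substrings of `term` with lengths min_gram..max_gram."""
    return [
        term[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(term) - size + 1)
    ]


def edge_ngrams(term, min_gram, max_gram):
    """Only the substrings that start at the beginning of `term`."""
    return [term[:size] for size in range(min_gram, min(max_gram, len(term)) + 1)]


term = "xyzsimonsxyz"
all_grams = ngrams(term, 3, 6)
front_grams = edge_ngrams(term, 3, 6)

print("simons" in all_grams)    # found somewhere inside the term
print("simons" in front_grams)  # not a prefix of the term
print(len(all_grams), len(front_grams))
```

The gram count for plain ngrams grows with every extra character and every extra gram length, which is where the larger index comes from.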

answered 2015-12-28T05:13:34.540

Use EdgeNgramField instead of CharField.

# search_indexes.py
class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.EdgeNgramField(model_attr="name")

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

And instead of using filter, use autocomplete:

person_sqs = SearchQuerySet().models(Person)
person_sqs.autocomplete(name="Simons")

Source: http://django-haystack.readthedocs.org/en/v2.0.0/autocomplete.html

answered 2013-10-23T18:31:12.177