
I am using elasticsearch-py for Elasticsearch operations.

I am trying to use elasticsearch.helpers.bulk to create or update multiple records.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_op_type": "create",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print helpers.bulk(es, data)

Is there a way to do this?

Right now we can only set _op_type to create or update. If we set it to update and the record does not exist, it raises an error:

Traceback (most recent call last):
  File "/tmp/test.py", line 37, in <module>
    print helpers.bulk(es, data)
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/local/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 155, in streaming_bulk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{u'update': {u'status': 404, u'_type': u'external', u'_id': u'3', u'error': u'DocumentMissingException[[customer][-1] [external][3]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'4', u'error': u'DocumentMissingException[[customer][-1] [external][4]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'5', u'error': u'DocumentMissingException[[customer][-1] [external][5]: document missing]', u'_index': u'customer'}}, {u'update': {u'status': 404, u'_type': u'external', u'_id': u'6', u'error': u'DocumentMissingException[[customer][-1] [external][6]: document missing]', u'_index': u'customer'}}])
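
The error list in the traceback is fairly regular, so it can be reduced to the ids that failed. The following is only a sketch: summarize_bulk_errors is a hypothetical helper name, and it assumes each error item is a one-key dict shaped like the ones printed above (the list being the second argument of the exception).

```python
def summarize_bulk_errors(errors):
    """Return (op_type, _id, status) tuples from a list of bulk error items."""
    summary = []
    for item in errors:
        # Each item is assumed to be a one-key dict: {op_type: {details}}
        for op_type, details in item.items():
            summary.append((op_type, details.get("_id"), details.get("status")))
    return summary


# Two items copied (abridged) from the traceback above
sample_errors = [
    {"update": {"status": 404, "_type": "external", "_id": "3", "_index": "customer"}},
    {"update": {"status": 404, "_type": "external", "_id": "4", "_index": "customer"}},
]

# Ids whose update failed because the document does not exist yet
missing_ids = [
    doc_id
    for op_type, doc_id, status in summarize_bulk_errors(sample_errors)
    if status == 404
]
```

Here missing_ids ends up holding the ids that update could not find, which are the documents that would have to be created instead.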

2 Answers


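According to the _bulk endpoint documentation, you can and should use the index operation for this, provided your documents always have the same identifier.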

create is useful when a document is being created for the first time, while update is better suited to partial and/or scripted updates.
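
If partial updates are what you need, one possible alternative (not part of the suggestion above) is an update action with doc_as_upsert set, which asks Elasticsearch to use the partial document as the new document when none exists. A minimal sketch, assuming the same index and type as in the question; build_upsert_actions is a hypothetical helper name, and nothing is sent to a cluster here.

```python
def build_upsert_actions(index, doc_type, docs_by_id):
    """Build bulk update actions that create the document when it is missing."""
    actions = []
    for doc_id in sorted(docs_by_id):
        actions.append({
            "_op_type": "update",
            "_index": index,
            "_type": doc_type,          # document types only exist in older Elasticsearch versions
            "_id": doc_id,
            "doc": docs_by_id[doc_id],  # partial document to merge into the stored one
            "doc_as_upsert": True,      # use doc as the full document if none exists yet
        })
    return actions


actions = build_upsert_actions(
    "customer",
    "external",
    {3: {"name": "test"}, 4: {"name": "test"}},
)
# With a reachable cluster, the list would be passed on as: helpers.bulk(es, actions)
```

Unlike index, this keeps existing fields that are not mentioned in doc, at the cost of a slightly heavier operation.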

You can also leave _op_type unspecified; index is then used by default.
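
As a rough sketch of what that looks like, the hypothetical helper below builds actions with no _op_type at all, so they fall back to index. It puts the document under _source, a key the bulk helper accepts for the body. As far as I can tell, with an index action any other leftover key (such as a doc wrapper) would be stored as a field of the document itself, so treat that detail as an assumption to verify against your client version.

```python
def build_index_actions(index, doc_type, docs_by_id):
    """Build bulk actions without _op_type, so the default (index) applies."""
    actions = []
    for doc_id in sorted(docs_by_id):
        actions.append({
            "_index": index,
            "_type": doc_type,              # document types only exist in older Elasticsearch versions
            "_id": doc_id,
            "_source": docs_by_id[doc_id],  # full document; replaces any existing one with this id
        })
    return actions


actions = build_index_actions(
    "customer",
    "external",
    {5: {"name": "test"}, 6: {"name": "test"}},
)
# With a reachable cluster, the list would be passed on as: helpers.bulk(es, actions)
```

Because index overwrites the whole document, running the same actions twice creates the documents the first time and replaces them the second time.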

Answered 2015-08-21T06:39:35.310

I tried the solution suggested by @Val and it works like a charm.

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch()

data = [
    {
        "_index": "customer",
        "_type": "external",
        "_id": 3,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 4,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 5,
        "doc" : {"name": "test"}
    },
    {
        "_index": "customer",
        "_type": "external",
        "_id": 6,
        "doc" : {"name": "test"}
    },
]


print helpers.bulk(es, data)
Answered 2015-08-21T06:42:20.823