19

I have a JSON file that I need to index on an Elasticsearch server.

The JSON file looks like this:

{
    "sku": "1",
    "vbid": "1",
    "created": "Sun, 05 Oct 2014 03:35:58 +0000",
    "updated": "Sun, 06 Mar 2016 12:44:48 +0000",
    "type": "Single",
    "downloadable-duration": "perpetual",
    "online-duration": "365 days",
    "book-format": "ePub",
    "build-status": "In Inventory",
    "description": "On 7 August 1914, a week before the Battle of Tannenburg and two weeks before the Battle of the Marne, the French army attacked the Germans at Mulhouse in Alsace. Their objective was to recapture territory which had been lost after the Franco-Prussian War of 1870-71, which made it a matter of pride for the French. However, after initial success in capturing Mulhouse, the Germans were able to reinforce more quickly, and drove them back within three days. After forty-three years of peace, this was the first test of strength between France and Germany. In 1929 Karl Deuringer wrote the official history of the battle for the Bavarian Army, an immensely detailed work of 890 pages; First World War expert and former army officer Terence Zuber has translated this study and edited it down to more accessible length, to produce the first account in English of the first major battle of the First World War.",
    "publication-date": "07/2014",
    "author": "Deuringer, Karl",
    "title": "The First Battle of the First World War: Alsace-Lorraine",
    "sort-title": "First Battle of the First World War: Alsace-Lorraine",
    "edition": "0",
    "sampleable": "false",
    "page-count": "0",
    "print-drm-text": "This title will only allow printing of 2 consecutive pages at a time.",
    "copy-drm-text": "This title will only allow copying of 2 consecutive pages at a time.",
    "kind": "book",
    "fro": "false",
    "distributable": "true",
    "subjects": {
      "subject": [
        {
          "-schema": "bisac",
          "-code": "HIS027090",
          "#text": "World War I"
        },
        {
          "-schema": "coursesmart",
          "-code": "cs.soc_sci.hist.milit_hist",
          "#text": "Social Sciences -> History -> Military History"
        }
      ]
    },   
   "pricelist": {
      "publisher-list-price": "0.0",
      "digital-list-price": "7.28"
    },
    "publisher": {
      "publisher-name": "The History Press",
      "imprint-name": "The History Press Ireland"
    },
    "aliases": {
      "eisbn-canonical": "1",
      "isbn-canonical": "1",
      "print-isbn-canonical": "9780752460864",
      "isbn13": "1",
      "isbn10": "0750951796",
      "additional-isbns": {
        "isbn": [
          {
            "-type": "print-isbn-10",
            "#text": "0752460862"
          },
          {
            "-type": "print-isbn-13",
            "#text": "97807524608"
          }
        ]
      }
    },
    "owner": {
      "company": {
        "id": "1893",
        "name": "The History Press"
      }
    },
    "distributor": {
      "company": {
        "id": "3658",
        "name": "asc"
      }
    }
  }

But when I try to index this JSON file with the command

curl -XPOST 'http://localhost:9200/_bulk' -d @1.json

I get this error:

{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"},"status":400}

I can't tell where I went wrong.

4 Answers

34

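Elasticsearch's bulk API uses a special syntax: the request body is a sequence of JSON documents, each written on a single line. The file in the question is one pretty-printed document with no action line, so it is not in that format. Take a look at the documentation.

The syntax is simple. For index, create, and update, you need two single-line JSON documents: the first line specifies the action, and the second supplies the document to index, create, or update. To delete a document, only the action line is needed. For example (from the documentation):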

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }   
{ "doc" : {"field2" : "value2"} }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }

Don't forget to end the file with a newline. Then, to call the bulk API, use the following command:

curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"

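From the documentation:

If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d

(Plain -d strips the newlines from the file, and the bulk format depends on them.)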
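As a rough sketch of how the file from the question could be wrapped into this format, the function below writes an action line followed by the document collapsed onto one line. The function name make_bulk_file and the index name "books" are assumptions made for illustration. On older Elasticsearch versions, such as the one the example above targets, the action line would also need a "_type" field.

```shell
#!/bin/bash
# Sketch: wrap one pretty-printed JSON document in bulk format.
# make_bulk_file and the index name "books" are hypothetical.

# make_bulk_file SRC DEST INDEX ID
make_bulk_file() {
    local src="$1" dest="$2" index="$3" id="$4"
    {
        # Action line: tells Elasticsearch what to do with the next line.
        printf '{ "index" : { "_index" : "%s", "_id" : "%s" } }\n' "$index" "$id"
        # Source line: the document collapsed onto a single line. Deleting raw
        # newlines is safe because JSON strings cannot contain them unescaped.
        tr -d '\r\n' < "$src"
        # Final newline that terminates the last line of the bulk body.
        printf '\n'
    } > "$dest"
}

# Usage (not run here; the Content-Type header is required on
# Elasticsearch 6 and later):
#   make_bulk_file 1.json requests books 1
#   curl -s -XPOST localhost:9200/_bulk \
#        -H 'Content-Type: application/x-ndjson' --data-binary "@requests"
```

Collapsing the document with tr avoids any dependency on a JSON tool. It leaves the original indentation spaces in the line, which the parser ignores.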

answered 2016-04-21T12:27:26.583
1

Adding a trailing newline (press Enter in Postman, or append "\n" if you are sending the JSON as the body from a client API) did the job for me.
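For a bulk file on disk, the same fix can be sketched as a small check that appends a newline only when one is missing. The function name ensure_trailing_newline is hypothetical.

```shell
#!/bin/bash
# ensure_trailing_newline FILE: append "\n" only if the last byte is not one.
# (Hypothetical helper, for illustration.)
ensure_trailing_newline() {
    local file="$1"
    # Command substitution strips a trailing newline, so the result is
    # non-empty only when the file's last byte is some other character.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

# Usage (not run here):
#   ensure_trailing_newline requests
```

Running it twice is harmless, because the second call sees the newline and leaves the file alone.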

answered 2021-02-13T19:11:45.880
0

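I had a similar problem: I wanted to delete specific documents of a specific type, and with the answer above I finally got my simple bash script working!

I have a file (document_id.txt) with one document_id per line, and with the bash script below I can delete the documents of a given type that have those document_ids.

This is what the file looks like: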

c476ce18803d7ed3708f6340fdfa34525b20ee90
5131a30a6316f221fe420d2d3c0017a76643bccd
08ebca52025ad1c81581a018febbe57b1e3ca3cd
496ff829c736aa311e2e749cec0df49b5a37f796
87c4101cb10d3404028f83af1ce470a58744b75c
37f0daf7be27cf081e491dd445558719e4dedba1

The bash script looks like this:

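#!/bin/bash

es_cluster="http://localhost:9200"
index="some-index"
doc_type="some-document-type"

# Send one bulk request per document id listed in document_id.txt.
for doc_id in $(cat document_id.txt)
do
    # A delete needs only the action line; the index comes from the URL.
    request_string="{\"delete\" : { \"_type\" : \"${doc_type}\", \"_id\" : \"${doc_id}\" } }"
    # -e makes echo expand the escapes so the body ends with newlines;
    # --data-binary @- reads the body from stdin without stripping them.
    echo -e "${request_string}\r\n\r\n" | curl -s -XPOST "${es_cluster}/${index}/${doc_type}/_bulk" --data-binary @-
    echo
done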

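After a lot of frustration, the trick was to use echo's -e option and append \n\n (written as \r\n\r\n in the script) to echo's output before piping it into curl.

Then, in curl, I set the --data-binary option to stop it from stripping the \n\n that the _bulk endpoint requires, followed by the @- option to make it read from standard input!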
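Since the bulk endpoint accepts many action lines in one body, a possible variation is to build all the delete lines first and send them in a single request instead of one curl call per id. This is only a sketch: build_delete_body is a hypothetical helper, and it has not been tried against a live cluster.

```shell
#!/bin/bash
# build_delete_body FILE TYPE: print one "delete" action line per non-empty id.
# (Hypothetical helper, for illustration.)
build_delete_body() {
    local id_file="$1" doc_type="$2" doc_id
    # IFS= and -r keep each line intact; the || clause also handles a
    # last line that has no trailing newline.
    while IFS= read -r doc_id || [ -n "$doc_id" ]; do
        [ -z "$doc_id" ] && continue
        # printf ends every line with \n, so the body ends with a newline.
        printf '{ "delete" : { "_type" : "%s", "_id" : "%s" } }\n' "$doc_type" "$doc_id"
    done < "$id_file"
}

# Usage (not run here), reusing the variables from the script above:
#   build_delete_body document_id.txt "$doc_type" |
#       curl -s -XPOST "${es_cluster}/${index}/${doc_type}/_bulk" --data-binary @-
```

Using printf instead of echo -e removes the need for manually appended escape sequences, because each action line is written with its own terminating newline.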

answered 2017-02-03T18:55:22.780
0

In my case it was an odd mistake: I was creating the bulkRequest object and then clearing it before sending it to Elasticsearch.

The line that caused the problem:

bulkRequest.requests().clear();
answered 2020-12-22T07:32:07.490