elasticsearch - Dumping all documents from Elasticsearch

Question

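Is there a way to create a dump file that contains all the data of an index, together with its settings and mappings?

Something similar to what mongodump does for MongoDB, or to the way Solr's data folder is copied to a backup location.

Cheers!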

score 61 · Accepted Answer

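Here is a new tool we have been working on for exactly this purpose: https://github.com/taskrabbit/elasticsearch-dump. You can export indices to JSON files and import them from JSON files, or copy them from one cluster to another.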

score 32 · Accepted Answer

Elasticsearch supports a snapshot feature out of the box:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html

score 14 · Accepted Answer

For your case, Elasticdump is the perfect answer.
First dump the mapping, then dump the index data:

# Install the elasticdump 
npm install elasticdump -g

# Dump the mapping 
elasticdump --input=http://<your_es_server_ip>:9200/index --output=es_mapping.json --type=mapping

# Dump the data
elasticdump --input=http://<your_es_server_ip>:9200/index --output=es_index.json --type=data
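Loading the two files back is the same command with input and output swapped. A minimal sketch, assuming the file names from the dump commands above; the function name `restore_index` and the target URL are hypothetical:

```shell
# Sketch: load files produced by the dump commands back into a cluster.
# restore_index and the target URL are illustrative, not part of elasticdump.
restore_index() {  # restore_index <target_index_url>
  # Load the mapping first so documents are indexed with the intended field types.
  elasticdump --input=es_mapping.json --output="$1" --type=mapping || return 1
  # Then load the documents themselves.
  elasticdump --input=es_index.json --output="$1" --type=data
}
```

Usage would be something like `restore_index http://<your_es_server_ip>:9200/index`.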

If you want to dump data on an arbitrary server, I recommend installing esdump via Docker. More information is available at this blog link.
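A rough sketch of the Docker route, in case it helps. It assumes the image is published as `elasticdump/elasticsearch-dump` (check the project README for the current name); the function name and the host directory argument are hypothetical:

```shell
# Sketch: run elasticdump from a container instead of installing it with npm.
# Assumption: the image name elasticdump/elasticsearch-dump; verify it before use.
dump_with_docker() {  # dump_with_docker <index_url> <host_dir>
  # Mount the host directory at /tmp inside the container so the dump file
  # is still there after the container exits.
  docker run --rm -v "$2:/tmp" elasticdump/elasticsearch-dump \
    --input="$1" --output=/tmp/es_index.json --type=data
}
```

For example, `dump_with_docker http://<your_es_server_ip>:9200/index /data` would leave `es_index.json` in `/data` on the host.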

score 13 · Accepted Answer

We can use elasticdump for backup and restore, and to move data from one server/cluster to another.

1. Commands to move the data of a single index from one server/cluster to another using elasticdump.

# Copy an index from production to staging with analyzer and mapping:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=analyzer
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

2. Commands to move the data of all indices from one server/cluster to another using multielasticdump.

Backup

multielasticdump \
  --direction=dump \
  --match='^.*$' \
  --limit=10000 \
  --input=http://production.es.com:9200 \
  --output=/tmp

Restore

multielasticdump \
  --direction=load \
  --match='^.*$' \
  --limit=10000 \
  --input=/tmp \
  --output=http://staging.es.com:9200

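Notes:

If --direction is dump (the default), --input must be the URL of the base location of an Elasticsearch server (e.g. http://localhost:9200) and --output must be a directory. A data, mapping, and analyzer file is created for each matching index.
To load files that were dumped with multielasticdump, --direction must be set to load, --input must be the directory of a multielasticdump dump, and --output must be an Elasticsearch server URL.
The second command backs up the settings, mappings, templates, and the data itself as JSON files.
--limit should not exceed 10000, otherwise an exception is raised.
More details are available here.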

score 11 · Accepted Answer

Elasticsearch itself provides a way to back up and restore data. A simple command to do this is:

curl -XPUT 'localhost:9200/_snapshot/<backup_folder name>/<backupname>' -H 'Content-Type: application/json' -d '{
    "indices": "<index_name>",
    "ignore_unavailable": true,
    "include_global_state": false
}'

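How to create this folder, how to add its path to the Elasticsearch configuration so that Elasticsearch can use it, and how to restore are all explained well here. For a practical demo, see here.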
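For orientation, the surrounding steps can be sketched roughly as follows. This assumes a shared-filesystem repository whose directory is already listed under `path.repo` in `elasticsearch.yml` on every node; the function names and all arguments are hypothetical placeholders:

```shell
# Sketch of the steps around the snapshot call, using the snapshot REST API.
# Assumption: elasticsearch.yml on every node already contains something like
#   path.repo: ["/mnt/es_backups"]
register_repo() {  # register_repo <es_url> <repo_name> <location>
  # Register a filesystem repository that snapshots are written to.
  curl -s -XPUT "$1/_snapshot/$2" -H 'Content-Type: application/json' \
    -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$3\"}}"
}
restore_snapshot() {  # restore_snapshot <es_url> <repo_name> <snapshot_name> <index_name>
  # Restore one index from an existing snapshot, leaving cluster state untouched.
  curl -s -XPOST "$1/_snapshot/$2/$3/_restore" -H 'Content-Type: application/json' \
    -d "{\"indices\": \"$4\", \"include_global_state\": false}"
}
```

Note that a restore only succeeds if the target index is closed or does not exist yet.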

score 2 · Accepted Answer

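The data itself is one or more Lucene indices, since you can have multiple shards. What you also need to back up is the cluster state, which contains all sorts of information about the cluster, the available indices, their mappings, the shards they are composed of, and so on.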

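It all lives in the data directory, though, and you can simply copy it. Its structure is fairly intuitive. Before copying, it is best to disable automatic flushing (to back up a consistent view of the index and avoid writes to it while the files are being copied), issue a manual flush, and disable allocation as well. Remember to copy the directory from all nodes.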
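Two of these preparation steps can be sketched with the REST API. This is only an illustration using the setting name from recent Elasticsearch versions (`cluster.routing.allocation.enable`); older releases used different names, and disabling automatic flush is left out because that setting is version-specific. The function names are hypothetical:

```shell
# Sketch: quiesce a cluster before copying its data directories.
prepare_for_copy() {  # prepare_for_copy <es_url>
  # Stop shard allocation so shards do not move between nodes mid-copy.
  curl -s -XPUT "$1/_cluster/settings" -H 'Content-Type: application/json' \
    -d '{"persistent": {"cluster.routing.allocation.enable": "none"}}' || return 1
  # Force a flush so pending operations are written to the Lucene files on disk.
  curl -s -XPOST "$1/_flush"
}
resume_after_copy() {  # resume_after_copy <es_url>
  # Setting the value to null restores the default allocation behaviour.
  curl -s -XPUT "$1/_cluster/settings" -H 'Content-Type: application/json' \
    -d '{"persistent": {"cluster.routing.allocation.enable": null}}'
}
```

Call `prepare_for_copy` before copying the data directory of every node and `resume_after_copy` once the copy has finished.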

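In addition, the next major version of Elasticsearch will provide a new snapshot/restore API that lets you take incremental snapshots and restore them through the API. Here is the related GitHub issue: https://github.com/elasticsearch/elasticsearch/issues/3826.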

score 1 · Accepted Answer

You can also dump Elasticsearch data as JSON over plain HTTP requests, using the scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
curl -XPOST 'https://ES/INDEX/_search?scroll=10m'
curl -XPOST 'https://ES/_search/scroll' -H 'Content-Type: application/json' -d '{"scroll": "10m", "scroll_id": "ID"}'
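The two requests are meant to be chained: the first opens the scroll, and the second is repeated with the returned scroll id until a page comes back empty. A minimal sketch of that loop, assuming compact (non-pretty) JSON responses; the function name, page size, and output file are hypothetical, and the sed/grep parsing is deliberately naive (a JSON tool such as jq would be more robust):

```shell
# Sketch: page through an index with the scroll API and append each raw
# response page as one line to a file.
scroll_dump() {  # scroll_dump <es_url> <index> <output_file>
  es_url=$1 index=$2 out=$3
  # First request: opens a scroll context that is kept alive for 10 minutes.
  page=$(curl -s -XPOST "$es_url/$index/_search?scroll=10m" \
    -H 'Content-Type: application/json' \
    -d '{"size": 1000, "query": {"match_all": {}}}')
  # Stop as soon as a page comes back with an empty hits array.
  while ! printf '%s' "$page" | grep -q '"hits":\[\]'; do
    # Naive extraction of the scroll id; assumes compact JSON output.
    scroll_id=$(printf '%s' "$page" | sed -n 's/.*"_scroll_id":"\([^"]*\)".*/\1/p')
    # No scroll id means an error response or a failed request: give up.
    [ -n "$scroll_id" ] || break
    printf '%s\n' "$page" >> "$out"
    # Follow-up request: fetch the next page for this scroll id.
    page=$(curl -s -XPOST "$es_url/_search/scroll" \
      -H 'Content-Type: application/json' \
      -d "{\"scroll\": \"10m\", \"scroll_id\": \"$scroll_id\"}")
  done
}
```

For example, `scroll_dump https://ES INDEX dump.jsonl` writes one response page per line; the documents are under `hits.hits` in each page.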

score 1 · Accepted Answer

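To export all documents from Elasticsearch as JSON, you can use the esbackupexporter tool. It works with index snapshots: it takes a container with snapshots (S3, Azure blob, or a file directory) as input and outputs one or more compressed JSON files per index per day. This is very handy for exporting historical snapshots. To export hot index data, you may need to take a snapshot first (see the answers above).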

score 1 · Accepted Answer

At the time of writing this answer (2021), the official way to back up an Elasticsearch cluster is to snapshot it. Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html

score 0 · Accepted Answer

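If you want to massage the data on its way out of Elasticsearch, you may want to use Logstash. It has a handy Elasticsearch Input Plugin.

From there you can export to anything, from a CSV file to reindexing the data on another Elasticsearch cluster. For the latter, though, you also have Elasticsearch's own Reindex API.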
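As an illustration of the Reindex route across clusters, the request is sent to the destination cluster, which then pulls the documents from the source. A minimal sketch, assuming the source host is allowed in `reindex.remote.whitelist` on the destination cluster; the function name and all arguments are hypothetical:

```shell
# Sketch: ask the destination cluster to pull an index from a remote source.
# Assumption: the source host is listed in reindex.remote.whitelist in the
# destination cluster's elasticsearch.yml.
reindex_from_remote() {  # reindex_from_remote <dest_es_url> <source_es_url> <source_index> <dest_index>
  curl -s -XPOST "$1/_reindex" -H 'Content-Type: application/json' \
    -d "{\"source\": {\"remote\": {\"host\": \"$2\"}, \"index\": \"$3\"}, \"dest\": {\"index\": \"$4\"}}"
}
```

Reindex copies documents only, so the destination index should be created with the desired settings and mappings beforehand.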
