lucene - What is the best way to run Lucene/Solr on Hadoop?

Question

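We run Solr on Amazon Web Services EC2 instances with a 1TB EBS volume that stores the index, so that we can easily spin up additional servers with the same (read-only) index. However, our index will soon exceed 1TB, and I really don't want to deal with striping multiple EBS volumes to hold it. Regenerating the index is also very slow. I'd like to move index generation (and possibly hosting) to Hadoop, preferably to Amazon's Elastic MapReduce, although I can set up separate Hadoop servers if needed. We use RightScale, so we can use their ServerTemplates library.

What is the best starting point for getting started with Lucene/Solr on Hadoop?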

score 1 · Accepted Answer

Is your index sharded? You can shard the index and distribute the shards across multiple instances.
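To make the sharding idea concrete, here is a minimal sketch of hash-based routing, where each document ID is mapped to one of a fixed number of shards. The function names `shard_for` and `partition` are hypothetical, and the MD5-modulo scheme is an assumption chosen for illustration; it is not necessarily how Solr routes documents internally.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number using a stable hash.

    A cryptographic digest is used (rather than Python's built-in hash())
    so that the mapping is identical across processes and machines.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards
```

With a scheme like this, each shard could live on its own instance with its own, smaller EBS volume, and a query would be sent to every shard and the results merged. Note that changing `num_shards` remaps most documents, so the shard count is best chosen with growth in mind.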

Answered 2011-07-10T13:28:29.270

score 1 · Accepted Answer

Take a look at ElasticSearch. You can index to ElasticSearch from Hadoop for bulk loading. Infochimps has open sourced an ElasticSearch bulk indexer called Wonderdog that you can look at for a proof of concept.

https://github.com/infochimps/wonderdog http://www.elasticsearch.com

It's cloud-friendly (see the cloud-aws plugin for discovery) and can scale up or down by adding or removing the nodes that hold the index.
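As a rough illustration of what bulk loading involves, the sketch below builds a request body in the newline-delimited JSON format that ElasticSearch's `_bulk` endpoint accepts: one action line followed by one document line per record. This is not Wonderdog's implementation; the function name `build_bulk_body` and the index name `articles` are hypothetical, and older ElasticSearch releases may also require a `_type` field in the action line.

```python
import json


def build_bulk_body(docs, index_name):
    """Build a newline-delimited JSON body for a bulk index request.

    `docs` is an iterable of (doc_id, source_dict) pairs. Each pair
    becomes two lines: an "index" action line and the document itself.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index_name, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        ("1", {"title": "first document"}),
        ("2", {"title": "second document"}),
    ]
    print(build_bulk_body(sample, "articles"), end="")
```

In a Hadoop job, each task could build bodies like this from its input split and POST them in batches to the cluster, so indexing throughput grows with the number of tasks and ElasticSearch nodes.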
