performance - Solr performance for a large index with 4 servers

Question

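We have 4 servers (2 servers with 48GB RAM, 24 cores, 2.4GHz, and 2 servers with 64GB RAM, 24 cores, 2.4GHz). We use 4 shards (1 shard per server). Each shard's index is about 500GB.

We are using the edismax parser and the surround query parser to handle phrase, proximity, and wildcard searches.

Even simple wildcard/proximity searches take 10-20 seconds.

We have the same setup on a single server (24 cores, 64GB RAM, 2.4GHz) with 8 shards (each shard's index is 250GB).

Compared with the 4-server setup, the single-server setup performs almost 2x better.

We set up the 4-server SolrCloud to improve performance, but performance got worse. Is there anything we might be missing here?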

score 1 · Accepted Answer

This question looks like a sister to "CPU usage when searching using solr", and the problem is the same: you are CPU-bound because your queries are very heavy. A query is matched against each shard in a single-threaded manner, so your 4-machine setup has 4 threads each working on 500GB of index, while your single-machine setup has 8 threads each working on 250GB of index. Since you have more than enough CPU cores, the setup with the smaller shards finishes first.

If you split the shards further, to e.g. 50GB each, you will have 40 shards. If you spread them across the 4 machines at 10 shards per machine, each request keeps 10 of a machine's 24 cores busy, so you can support 2 (in reality more like 3) concurrent requests at full CPU speed. Ideally, that should give you 5 times the speed of your single-machine setup, because each thread scans 50GB instead of 250GB.
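The arithmetic above can be checked with a rough back-of-the-envelope sketch. It assumes that the time to search a shard grows linearly with shard size, that each shard is searched by exactly one thread, and that merge overhead, caching, and disk I/O can be ignored; none of that is guaranteed in practice. The `Layout` class and its field names are hypothetical and are not part of Solr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of a sharded deployment (not a Solr API)."""
    name: str
    machines: int
    cores_per_machine: int
    shards_per_machine: int
    shard_size_gb: float

    @property
    def total_shards(self) -> int:
        return self.machines * self.shards_per_machine

    @property
    def total_index_gb(self) -> float:
        return self.total_shards * self.shard_size_gb

    def full_speed_concurrency(self) -> int:
        # One request uses one thread (one core) per shard on each machine,
        # so this many requests fit before cores have to be shared.
        return self.cores_per_machine // self.shards_per_machine


def relative_speed(layout: Layout, baseline: Layout) -> float:
    # Assumption: a request is as slow as its slowest shard, and shard search
    # time is proportional to shard size. All shards here are equal in size.
    return baseline.shard_size_gb / layout.shard_size_gb


four_servers = Layout("4 servers, 1 x 500GB each", 4, 24, 1, 500)
one_server = Layout("1 server, 8 x 250GB", 1, 24, 8, 250)
split_small = Layout("4 servers, 10 x 50GB each", 4, 24, 10, 50)

for layout in (four_servers, one_server, split_small):
    print(
        f"{layout.name}: {layout.total_shards} shards, "
        f"{layout.total_index_gb:.0f}GB total, "
        f"{layout.full_speed_concurrency()} full-speed concurrent requests, "
        f"{relative_speed(layout, one_server):.1f}x the single-server speed"
    )
```

Under these assumptions, the model reproduces both observations in this thread: the 4-server layout comes out at half the speed of the single server, and the 40-shard layout at five times its speed. Real numbers will be lower, since the work of merging results from 40 shards is not modeled.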
