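Bulk indexing performance with the .NET NEST client and ElasticSearch is degrading over time for me, even though the number of indices and the number of documents stay constant.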
We are running ElasticSearch version 0.19.11 (JVM 23.5-b02) on an m1.large Amazon instance with Ubuntu Server 12.04.1 LTS 64-bit and Sun Java 7. Nothing else runs on this instance apart from what ships with the Ubuntu installation.
Amazon M1 Large instance specs (from http://aws.amazon.com/ec2/instance-types/):

- 7.5 GiB memory
- 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
- 850 GB instance storage
- 64-bit platform
- I/O Performance: High
- EBS-Optimized Available: 500 Mbps
- API name: m1.large
ES_MAX_MEM is set to 4g and ES_MIN_MEM is set to 2g.
Every night we index/reindex about 15,000 documents from our .NET application using NEST. At any given time there is only one index, and it holds at most 15,000 documents.
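When the server was first installed, indexing and searching were fast for the first few days; then indexing started getting slower and slower. The bulk indexing sends 100 documents at a time, and after a while a single bulk operation took up to 15 seconds to complete. After that we started seeing many occurrences of the following exception, and indexing stopped: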
```
System.Net.WebException: The request was aborted: The request was canceled.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
```
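A minimal sketch of the 100-document batching described above, with per-batch timing added so that a gradual slowdown becomes visible before requests start being canceled. The `BatchIndexer` class and the `indexBatch` delegate are hypothetical; `indexBatch` stands in for whatever actually sends one bulk request (for example the NEST `IndexMany` call in the code below).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class BatchIndexer
{
    // Splits the source sequence into consecutive chunks of at most batchSize items.
    // The argument check runs lazily, when enumeration starts.
    public static IEnumerable<T[]> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var buffer = new List<T>(batchSize);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == batchSize)
            {
                yield return buffer.ToArray(); // copy, so clearing the buffer is safe
                buffer.Clear();
            }
        }
        if (buffer.Count > 0)
            yield return buffer.ToArray(); // last, possibly shorter, batch
    }

    // Hands each batch to indexBatch (a hypothetical stand-in for the bulk call)
    // and records how long each batch took.
    public static List<TimeSpan> IndexInBatches<T>(IEnumerable<T> source, int batchSize, Action<T[]> indexBatch)
    {
        var durations = new List<TimeSpan>();
        foreach (var batch in Batch(source, batchSize))
        {
            var sw = Stopwatch.StartNew();
            indexBatch(batch);
            sw.Stop();
            durations.Add(sw.Elapsed);
        }
        return durations;
    }
}
```

Logging the returned durations for each nightly run would show whether individual 100-document batches creep from sub-second toward the 15 seconds mentioned above.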
The indexing implementation looks like this:

```csharp
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using Nest;

// Creates a NEST client for the configured host, with "products" as the default index.
private ElasticClient GetElasticClient()
{
    var setting = new ConnectionSettings(ConfigurationManager.AppSettings["elasticSearchHost"], 9200);
    setting.SetDefaultIndex("products");
    var elastic = new ElasticClient(setting);
    return elastic;
}

// Turns off periodic refresh while the bulk load runs.
private void DisableRefreshInterval()
{
    var elasticClient = GetElasticClient();
    var s = elasticClient.GetIndexSettings("products");
    var settings = s != null && s.Settings != null ? s.Settings : new IndexSettings();
    settings["refresh_interval"] = "-1";
    var result = elasticClient.UpdateSettings(settings);
    if (!result.OK)
        _logger.Warn("unable to set refresh_interval to -1, {0}", result.ConnectionStatus == null || result.ConnectionStatus.Error == null ? "" : result.ConnectionStatus.Error.ExceptionMessage);
}

// Restores a 1-second refresh interval after the bulk load.
private void EnableRefreshInterval()
{
    var elasticClient = GetElasticClient();
    var s = elasticClient.GetIndexSettings("products");
    var settings = s != null && s.Settings != null ? s.Settings : new IndexSettings();
    settings["refresh_interval"] = "1s";
    var result = elasticClient.UpdateSettings(settings);
    if (!result.OK)
        _logger.Warn("unable to set refresh_interval to 1s, {0}", result.ConnectionStatus == null || result.ConnectionStatus.Error == null ? "" : result.ConnectionStatus.Error.ExceptionMessage);
}

// Bulk-indexes the given products with refresh disabled for the duration of the call.
public void Index(IEnumerable<Product> products)
{
    var enumerable = products as Product[] ?? products.ToArray();
    var elasticClient = GetElasticClient();
    try
    {
        DisableRefreshInterval();
        _logger.Info("Indexing {0} products", enumerable.Length);
        var status = elasticClient.IndexMany(enumerable as IEnumerable<Product>, "products");
        if (status.Items != null)
            _logger.Info("Done, Indexing {0} products, duration: {1}", status.Items.Count(), status.Took);
        // Guard against a missing ConnectionStatus before reading its Error.
        if (status.ConnectionStatus != null && status.ConnectionStatus.Error != null)
        {
            _logger.Error(status.ConnectionStatus.Error.OriginalException);
        }
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
    }
    finally
    {
        EnableRefreshInterval();
    }
}
```
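For comparison, the refresh interval can also be toggled by sending only that one setting to the index update-settings endpoint (`PUT /products/_settings` with a body such as `{"index":{"refresh_interval":"-1"}}`), instead of reading back and re-sending the full settings object as the code above does. This is a sketch using plain `HttpClient` rather than NEST; the `RefreshIntervalSketch` class, its method names, and the host value are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class RefreshIntervalSketch
{
    // Builds the minimal update-settings body, e.g. {"index":{"refresh_interval":"-1"}}.
    // Assumes a simple value such as "-1" or "1s"; no JSON escaping is performed.
    public static string BuildBody(string refreshInterval)
    {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval + "\"}}";
    }

    // Hypothetical helper: PUTs the body to http://{host}:9200/{index}/_settings and
    // returns true when the server answers with a 2xx status code.
    public static async Task<bool> SetRefreshIntervalAsync(HttpClient http, string host, string index, string refreshInterval)
    {
        var uri = new Uri("http://" + host + ":9200/" + index + "/_settings");
        using (var content = new StringContent(BuildBody(refreshInterval), Encoding.UTF8, "application/json"))
        using (var response = await http.PutAsync(uri, content))
        {
            return response.IsSuccessStatusCode;
        }
    }
}
```

Usage would be a call with `"-1"` before the bulk request and with `"1s"` afterwards, mirroring `DisableRefreshInterval` and `EnableRefreshInterval`.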
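Restarting the elasticsearch daemon doesn't seem to make any difference, but deleting the index and reindexing everything does. After a few days, however, the same slow-indexing problem comes back.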
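I just deleted the index and added an optimize call right after re-enabling the refresh interval at the end of each bulk index operation, hoping this will keep indexing performance from degrading: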
```csharp
...
...
finally
{
    EnableRefreshInterval();
    elasticClient.Optimize("products");
}
```
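For reference, index optimization in this Elasticsearch generation is exposed as `POST /{index}/_optimize` (later renamed `_forcemerge`), with an optional `max_num_segments` parameter controlling how far segments are merged; that NEST's `Optimize("products")` issues this request is an assumption here. The sketch below uses plain `HttpClient`, and the `OptimizeSketch` class and its methods are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class OptimizeSketch
{
    // Builds e.g. http://localhost:9200/products/_optimize?max_num_segments=1;
    // without maxNumSegments the server decides how far to merge.
    public static Uri BuildOptimizeUri(string host, string index, int? maxNumSegments)
    {
        var url = "http://" + host + ":9200/" + index + "/_optimize";
        if (maxNumSegments.HasValue)
            url += "?max_num_segments=" + maxNumSegments.Value;
        return new Uri(url);
    }

    // Hypothetical helper: POSTs an empty body to the optimize endpoint and
    // returns true on a 2xx response.
    public static async Task<bool> OptimizeAsync(HttpClient http, string host, string index, int? maxNumSegments)
    {
        using (var content = new StringContent(""))
        using (var response = await http.PostAsync(BuildOptimizeUri(host, index, maxNumSegments), content))
        {
            return response.IsSuccessStatusCode;
        }
    }
}
```

Since the `finally` block above runs once per `Index` call, and each call handles one 100-document batch, this placement issues one optimize request per batch rather than one per nightly run.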
Am I doing something wrong here?