python - Speeding up an elasticsearch-py scan

Question

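I am looking for a way to speed up a scroll scan over my Elasticsearch data.
The Python code below scans several indices and writes the results both to the console and to a file.
In my tests this approach is extremely slow (roughly 10 events per second?). I suspect some internal default or limit is the cause.
Is there a way to design this for better performance?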

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

client = Elasticsearch(
    [
        'http://localhost:9201/',
    ],
    verify_certs=True
)

search = Search(using=client, index="test1,test2,test3") \
    .filter(Q("wildcard", name="bob*") & Q("term", color="green")) \
    .filter('range', **{'@timestamp':{'gte': 'now-2d', 'lt': 'now'}}) \
    .sort('@timestamp') \
    .params(preserve_order=True)



# Append each matching message to the output file; "with" closes it even on error.
with open("X:/files/people.txt", "a") as file:
    for hit in search.scan():
        line = hit.message + "\n"
        file.write(line)
        print(line)

Thanks for looking into this :)

score 1 · Accepted Answer

An old question, but this may help others:

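Two other things to try. First, tune size to find the best value for your environment. Second, if you do not need the full _source, trim the returned fields with _source_exclude or _source_include; I have seen performance improve when using these.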
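As a rough sketch of those two suggestions, assuming the `search` object from the question and that only the `message` field is needed. `tune_search` is a hypothetical helper name, and 5000 is an arbitrary starting value to benchmark, not a recommendation.

```python
def tune_search(search, fields=("message",), batch_size=5000):
    """Return a Search that fetches only `fields`, `batch_size` hits per scroll page.

    `search` is expected to be an elasticsearch_dsl Search object.
    """
    # Search.source() limits which _source fields come back for each hit.
    # Keyword arguments given to Search.params() are forwarded by Search.scan()
    # to the scan helper, the same way preserve_order is in the question.
    return search.source(list(fields)).params(size=batch_size)
```

With the question's code this would be used as `for hit in tune_search(search).scan(): ...`. Try a few batch sizes and measure, since the best value depends on document size and cluster.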

score 0 · Accepted Answer

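The best way to make it faster is to remove the sort and preserve_order parameters. You can also look at sliced scroll to run several scans in parallel with multiprocessing; there is an example at (0). Hope this helps!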

0 - https://github.com/elastic/elasticsearch-dsl-py/issues/817#issuecomment-372271460
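A minimal sketch of the sliced-scroll idea, not a copy of the linked example. It reuses the host, indices and filters from the question, drops sort and preserve_order, and makes several assumptions: 4 slices, one output file per slice (the `people_slice_N.txt` names are hypothetical), and a server that supports sliced scroll (Elasticsearch 5.0 or later, as far as I know). `slice_spec`, `dump_slice` and `main` are illustrative names.

```python
from multiprocessing import Pool

SLICES = 4  # assumption: number of parallel slices, tune for your cluster


def slice_spec(slice_id, max_slices=SLICES):
    """Build the "slice" section of a sliced scroll request body."""
    return {"id": slice_id, "max": max_slices}


def dump_slice(slice_id):
    """Scan one slice, write its messages to its own file, return the hit count."""
    # Imported here so that every worker process creates its own client.
    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, Q

    client = Elasticsearch(["http://localhost:9201/"])
    search = (
        Search(using=client, index="test1,test2,test3")
        .filter(Q("wildcard", name="bob*") & Q("term", color="green"))
        .filter("range", **{"@timestamp": {"gte": "now-2d", "lt": "now"}})
        # No sort / preserve_order: hits arrive in whatever order is cheapest.
        .extra(slice=slice_spec(slice_id))
    )
    count = 0
    # Hypothetical per-slice file name, so workers never share a file handle.
    with open(f"people_slice_{slice_id}.txt", "a") as out:
        for hit in search.scan():
            out.write(hit.message + "\n")
            count += 1
    return count


def main():
    """Run one worker per slice and report the total number of hits written."""
    with Pool(SLICES) as pool:
        counts = pool.map(dump_slice, range(SLICES))
    print(f"wrote {sum(counts)} hits from {SLICES} slices")
```

Call `main()` from under an `if __name__ == "__main__":` guard, which multiprocessing requires on Windows. The output is no longer ordered by @timestamp; if order matters, sort the files afterwards.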
