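I am using Elasticsearch 6.5.x and StormCrawler 1.10. How can I speed up the rate at which the crawler fetches records? When I check the metrics, they show an average of 0.4 pages per second. Is there anything I need to change in the crawler configuration below?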
Crawler conf:
config:
  topology.workers: 2
  topology.message.timeout.secs: 300
  topology.max.spout.pending: 100
  topology.debug: false
  fetcher.server.delay: .25
  fetcher.threads.number: 200
  fetcher.threads.per.queue: 5
  worker.heap.memory.mb: 2048
  topology.kryo.register:
    - com.digitalpebble.stormcrawler.Metadata
  http.content.limit: -1
  fetchInterval.default: 1440
  fetchInterval.fetch.error: 120
  fetchInterval.error: -1
  topology.metrics.consumer.register:
    - class: "org.apache.storm.metric.LoggingMetricsConsumer"
      parallelism.hint: 1
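
To put the 0.4 pages per second in context, here is a rough back-of-the-envelope bound implied by the fetcher settings above. This is a deliberately simplified model and not StormCrawler's actual scheduling logic: it assumes one queue per host and that each fetch thread waits `fetcher.server.delay` after every fetch. The function name, `num_hosts`, and `avg_fetch_time_s` are hypothetical and not part of my configuration; the other arguments are taken from it.

```python
def max_pages_per_second(num_hosts, threads_total, threads_per_queue,
                         server_delay_s, avg_fetch_time_s):
    """Rough upper bound on fetch throughput under a simplified politeness model.

    Assumption (not StormCrawler's real scheduler): one queue per host, and each
    thread performs a fetch, then waits server_delay_s before the next one.
    """
    # Pages one thread can fetch per second on a single host queue.
    per_thread_rate = 1.0 / (avg_fetch_time_s + server_delay_s)
    # Threads that can be busy at once: limited by the global pool
    # (fetcher.threads.number) and by hosts * fetcher.threads.per.queue.
    active_threads = min(threads_total, num_hosts * threads_per_queue)
    return active_threads * per_thread_rate


if __name__ == "__main__":
    # 200, 5 and 0.25 come from the config above; the host counts and the
    # 1.0 s average fetch time are made-up inputs for illustration only.
    for hosts in (1, 10, 100):
        bound = max_pages_per_second(hosts, 200, 5, 0.25, 1.0)
        print(f"{hosts} host(s): at most about {bound:.1f} pages/s")
```

Under these assumptions, a measured rate far below the bound would suggest that the fetcher thread and delay settings are not what limits the crawl.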