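I'm new to Camus. I plan to run the Camus job once an hour. We receive about 80,000,000 messages per hour (average size about 4 KB).
How should I set the following properties: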
# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3
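I can't quite make sense of these settings. Should I set the days property to 1 and the hours property to 2? How does Camus pull data? I also frequently see the following error: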
ERROR kafka.CamusJob: Offset range from kafka metadata is outside the previously persisted offset
Please check whether kafka cluster configuration is correct. You can also specify config parameter: kafka.move.to.earliest.offset to start processing from earliest kafka metadata offset.
How should I set these properties so that the job runs once an hour and avoids this error?
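
For a sense of scale, here is a rough back-of-the-envelope sketch of the volume each hourly run would have to pull. It assumes the ~4 KB average holds and that 1 KB = 1024 bytes; the function name `hourly_volume` is just a helper for illustration and is not part of Camus.

```python
def hourly_volume(messages_per_hour: int, avg_message_bytes: int) -> dict:
    """Estimate data volume and sustained throughput for one hour of messages."""
    total_bytes = messages_per_hour * avg_message_bytes
    return {
        # Total payload produced in one hour, in GiB
        "total_gib": total_bytes / 1024**3,
        # Average message rate the topic sustains
        "messages_per_second": messages_per_hour / 3600,
        # Average byte rate the topic sustains, in MiB/s
        "mib_per_second": total_bytes / 3600 / 1024**2,
    }


if __name__ == "__main__":
    # Figures from the question: ~80,000,000 messages/hour at ~4 KB each (assumed 4096 bytes)
    stats = hourly_volume(80_000_000, 4 * 1024)
    for name, value in stats.items():
        print(f"{name}: {value:.1f}")
```

With these numbers that is roughly 305 GiB per hour, or about 22,000 messages per second, so each hourly run has to keep up with that before the Kafka retention window removes the data.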