
I'm using a pykafka group consumer with gevent, but the results contain duplicate data. Here is my code:

import gevent
from pykafka import KafkaClient

topic_name = 'test2'
bootstrap_servers = '192.168.199.228:9094,192.168.199.228:9092,192.168.199.228:9093'
group = 'test_g'


def get_consumer():
    client = KafkaClient(hosts=bootstrap_servers, use_greenlets=True)
    topic = client.topics[topic_name.encode()]

    consumer = topic.get_simple_consumer(auto_commit_interval_ms=10000,
                                         consumer_group=group.encode(),
                                         auto_commit_enable=True,
                                         )
    return consumer


def worker(worker_id):
    consumer = get_consumer()
    for msg in consumer:
        print('worker {} partition: {}, offset: {}'.format(worker_id, msg.partition, msg.offset))


if __name__ == '__main__':
    tasks = [gevent.spawn(worker, *(i, )) for i in range(3)]
    ret = gevent.joinall(tasks)

Result: every worker receives the same messages. Can anyone tell me how to make this work? Does pykafka not support gevent?


1 Answer


I'd bet this issue has nothing to do with your use of gevent. The reason you're seeing duplicate data across consumers is that you're using a SimpleConsumer instead of a BalancedConsumer. SimpleConsumer doesn't perform automatic balancing - it simply consumes the entire topic starting from its initial offset. So if you have many SimpleConsumer instances running side by side, each of them will independently consume the entire topic from its own starting offset. BalancedConsumer (topic.get_balanced_consumer(consumer_group='mygroup')) is probably what you want. It uses a consumer rebalancing algorithm to ensure that consumers running in the same group don't receive the same messages. For this to work, your topic needs to have at least as many partitions as there are processes consuming it. See the pykafka README for more information.

answered 2018-03-14T15:55:59.583