python - How to get the latest offset of a Kafka topic partition?

Question

I am using the Python high-level consumer for Kafka and want to know the latest offset for each partition of a topic. However, I cannot get it to work.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print("For partition %s highwater is %s" % (p.partition, con.highwater(p)))

print("Subscription = %s" % con.subscription())
print("con.seek_to_beginning() = %s" % con.seek_to_beginning())

But the output I get is

For partition 0 highwater is None
For partition 1 highwater is None
For partition 2 highwater is None
For partition 3 highwater is None
For partition 4 highwater is None
For partition 5 highwater is None
....
For partition 96 highwater is None
For partition 97 highwater is None
For partition 98 highwater is None
For partition 99 highwater is None
Subscription = None
con.seek_to_beginning() = None
con.seek_to_end() = None

I have an alternative approach using assign, but the result is the same

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print("For partition %s highwater is %s" % (p.partition, con.highwater(p)))

print("Subscription = %s" % con.subscription())
print("con.seek_to_beginning() = %s" % con.seek_to_beginning())
print("con.seek_to_end() = %s" % con.seek_to_end())

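From some of the documentation, it seems I may get this behavior if a fetch has not been issued yet. But I cannot find a way to force one. What am I doing wrong?

Or is there a different/simpler way to get the latest offsets of a topic?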

score 35 · Accepted Answer

Finally, after spending a day on this and several false starts, I found a solution and got it working. Posting it here so that others can refer to it.

from kafka import SimpleClient
from kafka.protocol.offset import OffsetRequest, OffsetResetStrategy
from kafka.common import OffsetRequestPayload

# SimpleClient is the legacy client API; it was removed in kafka-python 2.0
client = SimpleClient(brokers)

partitions = client.topic_partitions[topic]
# time=-1 asks for the latest offset; max_offsets=1 returns one offset per partition
offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]

offsets_responses = client.send_offset_request(offset_requests)

for r in offsets_responses:
    print("partition = %s, offset = %s" % (r.partition, r.offsets[0]))

score 27 · Accepted Answer

If you would rather use the Kafka shell scripts in kafka/bin, you can get the latest and earliest offsets with kafka-run-class.sh.

To get the latest offset, the command looks like this

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic topiname

To get the earliest (smallest) offset, the command looks like this

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -2 --topic topiname

Hope this helps!
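
If the result is needed in Python, the tool's output can be parsed. A minimal sketch, assuming GetOffsetShell prints one topic:partition:offset line per partition; parse_get_offset_shell is a hypothetical helper name, not part of Kafka, and the sample text is made up for illustration:

```python
def parse_get_offset_shell(output):
    """Parse 'topic:partition:offset' lines into {(topic, partition): offset}."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the last two fields are partition and offset.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


# Illustrative sample, not real broker output.
sample = "topiname:0:42\ntopiname:1:17\n"
print(parse_get_offset_shell(sample))
```

The text could come from subprocess.check_output running the command above; that part is left out so the sketch does not depend on a local Kafka installation.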

score 17 · Accepted Answer

from kafka import KafkaConsumer, TopicPartition

TOPIC = 'MYTOPIC'
GROUP = 'MYGROUP'
BOOTSTRAP_SERVERS = ['kafka01:9092', 'kafka02:9092']

consumer = KafkaConsumer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        group_id=GROUP,
        enable_auto_commit=False
    )


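for p in consumer.partitions_for_topic(TOPIC):
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    # committed() returns None if the group has not committed an offset for this partition
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    # after seek_to_end, position() resolves to the log-end offset
    last_offset = consumer.position(tp)
    lag = last_offset - committed if committed is not None else None
    print("topic: %s partition: %s committed: %s last: %s lag: %s" % (TOPIC, p, committed, last_offset, lag))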

consumer.close(autocommit=False)
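
The same lag calculation can probably be done without assigning and seeking one partition at a time, assuming kafka-python>=1.3.4 where end_offsets is available. This is only a sketch: partition_lag is a hypothetical helper name, and it accepts any object that exposes committed() and end_offsets() the way KafkaConsumer does. Lag is reported as None when the group has no committed offset.

```python
def partition_lag(consumer, partitions):
    """Return {partition: (committed, end_offset, lag)} for the given partitions."""
    # One call fetches the log-end offset of every partition.
    end_offsets = consumer.end_offsets(partitions)
    report = {}
    for tp in partitions:
        committed = consumer.committed(tp)  # None if nothing was committed yet
        end = end_offsets[tp]
        lag = end - committed if committed is not None else None
        report[tp] = (committed, end, lag)
    return report
```

With the consumer above it could be called as partition_lag(consumer, [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]).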

score 11 · Accepted Answer

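With kafka-python>=1.3.4 you can use:

kafka.KafkaConsumer.end_offsets(partitions)

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.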

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.end_offsets(ps)
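
Because the end offset is the offset of the next message, subtracting the beginning offset gives a rough count of the messages currently retained in each partition. A minimal sketch, assuming a consumer with beginning_offsets() and end_offsets() as in kafka-python>=1.3.4; retained_counts is a hypothetical helper name. On compacted or transactional topics the offsets are not contiguous, so treat the result as an upper bound.

```python
def retained_counts(consumer, partitions):
    """Return {partition: end_offset - beginning_offset} for the given partitions."""
    begin = consumer.beginning_offsets(partitions)  # earliest available offset
    end = consumer.end_offsets(partitions)          # offset of the next message
    return {tp: end[tp] - begin[tp] for tp in partitions}
```

With the snippet above, retained_counts(con, ps) would return a dict keyed by TopicPartition.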

score 3 · Accepted Answer

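Another way to achieve this is to poll the consumer so that it picks up the last consumed offset, and then use the seek_to_end method to move to the most recent available offset of each partition.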

from kafka import KafkaConsumer
consumer = KafkaConsumer('my-topic',
                     group_id='my-group',
                     bootstrap_servers=['localhost:9092'])
consumer.poll()
consumer.seek_to_end()

This approach is especially useful when using consumer groups.
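
To actually read the offsets after seeking, position() can be queried for each assigned partition. A rough sketch under the assumption that a single poll is enough for the group to assign partitions (it may not be, in which case an empty dict comes back); latest_positions is a hypothetical helper name, and any records returned by poll() are discarded here.

```python
def latest_positions(consumer, timeout_ms=1000):
    """Poll once, seek to the end, and return {partition: position}."""
    # poll() gives the group coordinator a chance to assign partitions.
    consumer.poll(timeout_ms=timeout_ms)
    assigned = consumer.assignment()
    if not assigned:
        # seek_to_end() fails when nothing is assigned; a longer poll may be needed.
        return {}
    consumer.seek_to_end()
    # After seeking, position() reports the offset of the next message to arrive.
    return {tp: consumer.position(tp) for tp in assigned}
```

Seeking moves the consumer's position, so with auto-commit enabled the group may skip messages it has not read yet. Creating the consumer with enable_auto_commit=False avoids that.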

score 2 · Accepted Answer

Using confluent-kafka-python

You can use position:

Retrieve the current positions (offsets) for the list of partitions.

from confluent_kafka import Consumer, TopicPartition


consumer = Consumer({"bootstrap.servers": "localhost:9092"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in list(topic.topics['topicName'].partitions.keys())] 

offset_per_partition = consumer.position(partitions)

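Alternatively, you can use get_watermark_offsets, but you have to pass one partition at a time, so it takes multiple calls:

Retrieve the low and high offsets for a partition.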

from confluent_kafka import Consumer, TopicPartition


consumer = Consumer({"bootstrap.servers": "localhost:9092"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in list(topic.topics['topicName'].partitions.keys())] 

for p in partitions:
    low_offset, high_offset = consumer.get_watermark_offsets(p)
    print(f"Latest offset for partition {p}: {high_offset}")
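
The loop above can be wrapped into a helper that collects the results and tolerates timeouts. A minimal sketch, assuming get_watermark_offsets returns a (low, high) tuple on success and None on timeout; high_watermarks is a hypothetical helper name and the 5 second timeout is an arbitrary choice.

```python
def high_watermarks(consumer, partitions, timeout=5.0):
    """Return {(topic, partition): high_offset}, skipping partitions that time out."""
    result = {}
    for tp in partitions:
        watermarks = consumer.get_watermark_offsets(tp, timeout=timeout)
        if watermarks is None:
            # The broker query timed out; leave this partition out of the result.
            continue
        low, high = watermarks
        result[(tp.topic, tp.partition)] = high
    return result
```

With the snippet above it could be called as high_watermarks(consumer, partitions).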

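Using kafka-python

You can use end_offsets:

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.

This method does not change the current consumer position of the partitions.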

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer


consumer = KafkaConsumer(bootstrap_servers = "localhost:9092" )
partitions = [TopicPartition('myTopic', p) for p in consumer.partitions_for_topic('myTopic')]
last_offset_per_partition = consumer.end_offsets(partitions)
