
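I know that Kafka fetches events in batches. I am trying to understand the following scenario:

  • I have a topic with 4 partitions.
  • I have 1 consumer, and Kafka assigns all 4 partitions to it.
  • Assume each batch the Kafka client fetches from Kafka contains 5 messages.

What I want to understand is whether the events in one batch all come from the same partition, with the consumer then moving on to the next partition for the next batch, or whether a single batch can already contain events from different partitions.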


1 Answer


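I can't give you a definitive answer, but I found the question interesting enough to test it.

To do so, I created a topic with four partitions and used the kafka-producer-perf-test command line tool to produce some messages into it. Since the performance test tool does not set any keys, the messages are written to the topic's partitions in a round-robin fashion.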

kafka-producer-perf-test --topic test --num-records 1337 --throughput -1 --record-size 128 --producer-props key.serializer=org.apache.kafka.common.serialization.StringSerializer --producer-props value.serializer=org.apache.kafka.common.serialization.StringSerializer --producer-props bootstrap.servers=localhost:9092

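After that, I created a simple KafkaConsumer configured with max.poll.records=5 to match your question. The consumer simply prints the offset and partition of every message it consumes: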

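import java.time.Duration;
import java.util.Arrays;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// `settings` (consumer Properties) and `topic` (topic name) are defined elsewhere
int counter = 0;

// consume messages with `poll` call and print out results
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(settings)) {
    consumer.subscribe(Arrays.asList(topic));
    while (true) {
        // one poll() call corresponds to one "batch" in the output
        System.out.printf("Batch = %d\n", counter);
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, partition = %d\n", record.offset(), record.partition());
        }
        counter += 1;
    }
}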
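The `settings` object passed to the consumer above is not shown. As a rough sketch, it could be built like this. The broker address matches the producer command, but the group id `batch-test`, the `earliest` offset reset, and the class name `ConsumerSettings` are assumptions for illustration and not part of the original test:

```java
import java.util.Properties;

public class ConsumerSettings {

    // Builds consumer properties that cap every poll() at 5 records.
    // The broker address and group id are passed in by the caller.
    public static Properties build(String bootstrapServers, String groupId) {
        Properties settings = new Properties();
        settings.put("bootstrap.servers", bootstrapServers);
        settings.put("group.id", groupId);
        settings.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        settings.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Assumption: start from the beginning of the topic for a fresh group.
        settings.put("auto.offset.reset", "earliest");
        // The setting under test: at most 5 records returned per poll() call.
        settings.put("max.poll.records", "5");
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical values; adjust to your environment.
        Properties settings = build("localhost:9092", "batch-test");
        settings.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Only `java.util.Properties` is needed to build the settings; the Kafka client parses the string values when the consumer is created.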

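The result, to answer your question, is that the consumer tries to fetch as much data as possible from one partition before moving on to the next. Only when all messages from partition 1 had been consumed and the max.poll.records limit of 5 had not yet been reached did it add two messages from partition 2 to the same batch.

Here are some printouts for better understanding.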

Batch = 0
offset = 310, partition = 0
offset = 311, partition = 0
offset = 312, partition = 0
offset = 313, partition = 0
offset = 314, partition = 0

Batch = 1
offset = 315, partition = 0
offset = 316, partition = 0
offset = 317, partition = 0
offset = 318, partition = 0
offset = 319, partition = 0

# only offsets with partition 0

Batch = 45
offset = 525, partition = 0
offset = 526, partition = 0
offset = 527, partition = 0
offset = 528, partition = 0
offset = 529, partition = 0
Batch = 46
offset = 728, partition = 1
offset = 729, partition = 1
offset = 730, partition = 1
offset = 731, partition = 1
offset = 732, partition = 1

# only offsets with partition 1

Batch = 86
offset = 928, partition = 1
offset = 929, partition = 1
offset = 930, partition = 1
offset = 931, partition = 1
offset = 932, partition = 1
Batch = 87
offset = 465, partition = 2
offset = 466, partition = 2
offset = 933, partition = 1
offset = 934, partition = 1
offset = 935, partition = 1
Batch = 88
offset = 467, partition = 2
offset = 468, partition = 2
offset = 469, partition = 2
offset = 470, partition = 2
offset = 471, partition = 2

## and so on
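The pattern in these printouts can be mimicked with a small toy model. This is only a sketch of the observed behavior: it does not use the Kafka client, makes no claim about how the real fetcher is implemented, and the class name `DrainFirstModel` and the record counts are made up for illustration. It drains one partition before touching the next and fills a batch from the next partition only when the current one runs out before the limit is reached:

```java
import java.util.ArrayList;
import java.util.List;

public class DrainFirstModel {
    private final int[] remaining;   // unread records left in each partition
    private final int[] nextOffset;  // next offset to hand out per partition
    private final int maxPollRecords;
    private int current = 0;         // partition currently being drained

    public DrainFirstModel(int[] recordsPerPartition, int maxPollRecords) {
        this.remaining = recordsPerPartition.clone();
        this.nextOffset = new int[recordsPerPartition.length];
        this.maxPollRecords = maxPollRecords;
    }

    // Returns up to maxPollRecords entries, moving to the next partition
    // only once the current one is exhausted.
    public List<String> poll() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxPollRecords && current < remaining.length) {
            if (remaining[current] == 0) {
                current++;           // partition drained, move on
                continue;
            }
            batch.add("offset = " + nextOffset[current] + ", partition = " + current);
            nextOffset[current]++;
            remaining[current]--;
        }
        return batch;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 7 records in partition 0, 5 in partition 1.
        DrainFirstModel model = new DrainFirstModel(new int[] {7, 5}, 5);
        int counter = 0;
        List<String> batch;
        while (!(batch = model.poll()).isEmpty()) {
            System.out.println("Batch = " + counter++);
            batch.forEach(System.out::println);
        }
    }
}
```

With these made-up sizes, the second batch mixes the last two records of partition 0 with the first three of partition 1, which corresponds to the mixed batch 87 in the real output. The order of records inside a mixed batch may differ from what the real consumer prints.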
answered 2020-10-19T13:31:11.023