Can messages from a given partition ever be divided among multiple threads? Let's say that I have a single partition and a hundred processes with a hundred threads each. Will the messages from my single partition be given to only one of those 10,000 threads?
Multiple threads cannot consume the same partition unless those threads are in different consumer groups. Only a single thread will consume the messages from the single partition, even though you have many idle consumers.
The number of partitions is the unit of parallelism in Kafka. To get more consumers working in parallel, you must increase the number of partitions of the topic to match the parallelism you want to achieve, or put every thread into its own consumer group (each group then receives every message), but I think the latter is not desirable.
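To make the "partition is the unit of parallelism" point concrete, here is a small simulation of the question's scenario. It is a toy model, not Kafka's actual assignor (Kafka ships several assignment strategies); the function assign_partitions and the thread names are hypothetical. It only encodes the rule that, within one consumer group, each partition goes to exactly one consumer.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment within ONE consumer group (hypothetical helper).

    Each partition is given to exactly one consumer; consumers beyond the
    number of partitions receive nothing. Not Kafka's real assignor code.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# The question's scenario: 1 partition, 100 processes x 100 threads.
threads = [f"proc{p}-thread{t}" for p in range(100) for t in range(100)]
assignment = assign_partitions(["topic-0"], threads)
active = [c for c, parts in assignment.items() if parts]
print(len(threads), len(active))  # 10000 1

# With 100 partitions, 100 of the 10,000 threads get work.
wide = assign_partitions([f"topic-{i}" for i in range(100)], threads)
print(sum(1 for parts in wide.values() if parts))  # 100
```

The number of busy consumers is min(consumers, partitions), which is why adding threads beyond the partition count buys nothing.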
If you have multiple consumers consuming from the same topic under the same consumer group, then the messages in the topic are distributed among those consumers. In other words, each consumer will get a non-overlapping subset of the messages. The following lines are taken from the Kafka FAQ page:
Should I choose multiple group ids or a single one for the consumers?
If all consumers use the same group id, messages in a topic are distributed among those consumers. In other words, each consumer will get a non-overlapping subset of the messages. Having more consumers in the same group increases the degree of parallelism and the overall throughput of consumption. See the next question for the choice of the number of consumer instances. On the other hand, if each consumer is in its own group, each consumer will get a full copy of all messages.
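The difference between one shared group id and one group per consumer can be sketched as follows. This is a hypothetical simulation (deliver, the group ids, and the consumer names are made up for illustration), using the same simplified one-owner-per-partition rule inside each group; every group independently reads the whole topic.

```python
from collections import defaultdict


def deliver(messages_by_partition, groups):
    """Simulate one topic being read by several consumer groups (toy model).

    messages_by_partition: {partition: [messages]}
    groups: {group_id: [consumer names]}
    Returns {(group_id, consumer): [messages that consumer receives]}.
    """
    received = defaultdict(list)
    partitions = sorted(messages_by_partition)
    for group_id, consumers in groups.items():
        # Inside a group, each partition has a single owner.
        for i, p in enumerate(partitions):
            owner = consumers[i % len(consumers)]
            received[(group_id, owner)].extend(messages_by_partition[p])
    return dict(received)


topic = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
result = deliver(topic, {"shared": ["c1", "c2", "c3"], "solo-1": ["x"], "solo-2": ["y"]})

# Same group: disjoint subsets.
print(result[("shared", "c1")], result[("shared", "c2")], result[("shared", "c3")])
# ['a', 'b'] ['c'] ['d', 'e']

# Own group: a full copy.
print(result[("solo-1", "x")])  # ['a', 'b', 'c', 'd', 'e']
```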
Why some of the consumers in a consumer group never receive any message?
Currently, a topic partition is the smallest unit that we distribute messages among consumers in the same consumer group. So, if the number of consumers is larger than the total number of partitions in a Kafka cluster (across all brokers), some consumers will never get any data. The solution is to increase the number of partitions on the broker.
Not always: in extreme cases, two threads can consume the same partition at the same time.
The Kafka high-level consumer is designed to ensure that each message is consumed only once, and that a partition is consumed by at most one thread most of the time.
The exception comes from the local queue inside the high-level consumer: the consumer considers a message consumed as soon as it has been polled out of that local queue.
Consider the following sequence of events:
1. Thread 1 consumes partition 0.
2. Thread 1 polls message m0; messages m1, m2, ... are already sitting in the local queue.
3. A rebalance occurs: Kafka clears the local queue and the consumers re-register.
4. Thread 2 now consumes partition 0, but thread 1 is still processing m0.
5. Thread 2 can now poll m1, m2, ...
At this point, two threads are consuming the same partition.
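The sequence above can be replayed with a toy model. This is not the real high-level consumer: PartitionState, the prefetch size of 3, and the thread names are hypothetical, and, following the answer's description, a message counts as consumed (the position advances) the moment it is polled from the local queue.

```python
from collections import deque


class PartitionState:
    """Toy model of one partition as seen by the old high-level consumer."""

    def __init__(self, log):
        self.log = log
        self.position = 0            # next offset not yet considered consumed
        self.local_queue = deque()
        self.owner = None

    def assign(self, thread, prefetch=3):
        # (Re)registration: the new owner prefetches from the current position.
        self.owner = thread
        self.local_queue = deque(self.log[self.position:self.position + prefetch])

    def poll(self, thread):
        assert thread == self.owner, f"{thread} does not own the partition"
        msg = self.local_queue.popleft()
        self.position += 1           # polled == consumed in this model
        return msg

    def rebalance(self, new_thread):
        self.local_queue.clear()     # the local queue is cleared on rebalance
        self.assign(new_thread)


part = PartitionState(["m0", "m1", "m2", "m3"])
in_flight = {}                       # thread -> message it is still processing

part.assign("thread-1")
in_flight["thread-1"] = part.poll("thread-1")   # thread 1 starts work on m0
part.rebalance("thread-2")                       # thread 1 is still busy with m0
in_flight["thread-2"] = part.poll("thread-2")   # thread 2 starts work on m1

print(in_flight)  # {'thread-1': 'm0', 'thread-2': 'm1'}
```

Nothing stops thread 1 from finishing m0 after it has lost ownership, so for a short window both threads are working on messages from partition 0.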
Instead of adding threads, it is better to increase the number of consumers and partitions to get better throughput and better control.