Question 1

Explain Kafka consumer group rebalancing and how to prevent rebalance storms.

Accepted Answer

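Rebalancing occurs when a consumer joins or leaves a group, or when the partition count of a subscribed topic changes, forcing the group coordinator to reassign partitions. A rebalance storm happens when consumers process batches too slowly, exceed `max.poll.interval.ms` between `poll()` calls, and are removed from the group; they then rejoin and trigger yet another rebalance, so the group keeps rebalancing instead of making progress. Prevent storms by increasing `max.poll.interval.ms`, lowering `max.poll.records` so each batch finishes in time, and keeping `heartbeat.interval.ms` well below `session.timeout.ms`. Static membership (`group.instance.id`) and the cooperative sticky assignor further reduce the impact of restarts.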
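As a rough illustration, the consumer settings involved could be collected as below. This is a sketch, not a recommendation: the group name and every numeric value are assumptions, and the settings other than `max.poll.interval.ms` and `heartbeat.interval.ms` are additions for illustration. Only `java.util.Properties` is used, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

final class RebalanceTuning {
    // Builds consumer settings aimed at avoiding unnecessary rebalances.
    // All values are illustrative assumptions; tune them against real processing times.
    static Properties consumerProps(String instanceId) {
        Properties p = new Properties();
        p.setProperty("group.id", "orders-service");        // hypothetical group name
        p.setProperty("max.poll.interval.ms", "600000");    // allow up to 10 min between poll() calls
        p.setProperty("max.poll.records", "100");           // smaller batches finish sooner
        p.setProperty("session.timeout.ms", "45000");
        p.setProperty("heartbeat.interval.ms", "15000");    // at most one third of the session timeout
        p.setProperty("group.instance.id", instanceId);     // static membership: restarts keep assignments
        p.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return p;
    }

    public static void main(String[] args) {
        consumerProps("orders-service-0").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object would be passed to the `KafkaConsumer` constructor together with the bootstrap servers and deserializers.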

Question 2

Explain Kafka producer configurations for message delivery: acks=0, acks=1, and acks=all.

Accepted Answer

The `acks` parameter controls write confirmations:
- `acks=0`: Producer does not wait for confirmations, maximizing throughput but risking data loss.
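- `acks=1`: Producer waits until the leader has appended the record to its local log (not necessarily flushed to disk). Acknowledged records can still be lost if the leader fails before followers replicate them.
- `acks=all` (or `-1`): Producer waits until the leader and all current In-Sync Replicas (ISR) have the record. Combined with `min.insync.replicas` greater than 1, this gives the strongest durability guarantee at the cost of latency.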
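A durability-oriented producer configuration might look like the sketch below. The timeout value and the helper name `producerProps` are assumptions for illustration; the property keys themselves are standard producer settings.

```java
import java.util.Properties;

final class DurableProducerConfig {
    // Producer settings that favor durability over latency.
    static Properties producerProps(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("acks", "all");                   // wait for the leader and all in-sync replicas
        p.setProperty("enable.idempotence", "true");    // retries do not create duplicates
        p.setProperty("delivery.timeout.ms", "120000"); // illustrative upper bound on send + retries
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("localhost:9092").getProperty("acks"));
    }
}
```

Switching `acks` to `"1"` or `"0"` in this map is all it takes to trade durability for lower latency.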

Question 3

How does Kafka guarantee message ordering within a topic partition?

Accepted Answer

Kafka guarantees strict message ordering *only* within a single partition. To preserve ordering, publish related messages with the same record key (the default partitioner routes them to the same partition) and keep retries from reordering batches: either enable the idempotent producer (`enable.idempotence=true`, which preserves order with up to 5 in-flight requests) or set `max.in.flight.requests.per.connection=1`.
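Why a shared key preserves ordering can be seen in a simplified model of key-based partitioning. Note the assumption: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch substitutes `Arrays.hashCode` purely to stay dependency-free, so the partition numbers will differ from a real cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitioning {
    // Simplified stand-in for the default partitioner: hash(key bytes) mod partition count.
    // Kafka uses murmur2; Arrays.hashCode is a placeholder for illustration only.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        // Every event for the same order lands on the same partition, so order is kept.
        System.out.println(partitionFor("order-42", 6));
        System.out.println(partitionFor("order-42", 6));
    }
}
```

Because the result depends on the partition count, adding partitions to a topic changes where existing keys are routed.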

Question 4

How do you write integration tests for Kafka producers and consumers using Testcontainers?

Accepted Answer

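Use Testcontainers to run a real broker in Docker during the test. In the test setup, instantiate a Kafka container: `static KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:latest"))` (pin a specific image tag rather than `latest` for reproducible builds). Start the container, pass `kafka.getBootstrapServers()` to the producer and consumer as `bootstrap.servers`, produce and consume messages, and assert on the received payloads.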

Question 5

Explain In-Sync Replicas (ISR) and partition leader elections in Kafka.

Accepted Answer

The ISR list contains the replicas (including the leader) that are caught up with the partition leader. If the leader crashes, Kafka elects a follower *only* if it is in the ISR list. If no live replica is in the ISR, the partition stays offline unless `unclean.leader.election.enable` is true, in which case Kafka elects an out-of-sync replica and any records it had not yet replicated are lost.
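The election rule can be modeled in a few lines. This is a simplified sketch, not the controller's actual code: the method name and its parameters are hypothetical, and broker IDs are plain integers.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class LeaderElectionModel {
    // Picks a new leader for one partition after the old leader fails.
    static Optional<Integer> electLeader(List<Integer> assignedReplicas, Set<Integer> isr,
                                         Set<Integer> liveBrokers, boolean uncleanEnabled) {
        // Clean election: first assigned replica that is alive and in sync.
        for (int broker : assignedReplicas) {
            if (liveBrokers.contains(broker) && isr.contains(broker)) {
                return Optional.of(broker);
            }
        }
        // Unclean election: any live replica, accepting possible data loss.
        if (uncleanEnabled) {
            for (int broker : assignedReplicas) {
                if (liveBrokers.contains(broker)) {
                    return Optional.of(broker);
                }
            }
        }
        return Optional.empty(); // partition stays offline
    }

    public static void main(String[] args) {
        // Broker 1 (old leader) is down; broker 2 is in the ISR and takes over.
        System.out.println(electLeader(List.of(1, 2, 3), Set.of(1, 2), Set.of(2, 3), false));
    }
}
```

With an ISR of only `{1}` and broker 1 down, the same call returns `Optional.empty` unless `uncleanEnabled` is true.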

Question 6

Explain how Kafka achieves high throughput using Zero-Copy and Page Cache techniques.

Accepted Answer

Kafka achieves high throughput by: 
1. Page Cache: Leveraging OS page caches in RAM instead of buffering in JVM heap.
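2. Zero-Copy: Bypassing user-space memory copies. When a consumer reads, Kafka uses the `sendfile` system call (via Java's `FileChannel.transferTo`) to transfer log bytes from the page cache directly to the network socket. This path applies to plaintext connections; with TLS enabled the broker must encrypt data in user space, so the zero-copy benefit is lost.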
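In Java, the zero-copy path is exposed through `FileChannel.transferTo`, which the JVM can map to `sendfile` when the target is a socket. The sketch below shows the call pattern on a temporary file; the class and method names are hypothetical, and the in-memory target used in the demo does not actually exercise `sendfile`, it only demonstrates the API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySend {
    // Transfers `count` bytes starting at `position` from a log file to the target channel
    // without reading them into an application-level buffer.
    static long send(Path logSegment, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel source = FileChannel.open(logSegment, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) { // transferTo may move fewer bytes than requested
                long n = source.transferTo(position + sent, count - sent, target);
                if (n <= 0) break; // end of file reached
                sent += n;
            }
            return sent;
        }
    }

    // Demo: write a small file, then send the byte range [6, 11) to an in-memory channel.
    static String demo() {
        try {
            Path file = Files.createTempFile("segment", ".log");
            try {
                Files.write(file, "hello kafka".getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                send(file, 6, 5, Channels.newChannel(out));
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "kafka"
    }
}
```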

Question 7

How do you monitor and resolve consumer lag in production?

Accepted Answer

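Consumer lag is the difference between a partition's log end offset (the latest produced message) and the consumer group's committed offset. Monitor it with `kafka-consumer-groups.sh --describe`, the consumer's `records-lag-max` metric, or a dedicated tool such as Burrow. Resolve it by adding consumers to the group (consumers beyond the partition count sit idle), speeding up per-record processing, or tuning fetch settings such as `max.poll.records` and `fetch.min.bytes`.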
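The lag arithmetic itself is simple, as the following sketch shows. It assumes the offsets have already been fetched into maps keyed by partition number (in practice they would come from the admin client or a monitoring tool), and it treats a partition with no committed offset as starting from 0, which is a simplification.

```java
import java.util.Map;

final class ConsumerLag {
    // Lag for one partition: how far the group's committed offset trails the log end.
    static long partitionLag(long logEndOffset, long committedOffset) {
        return Math.max(0, logEndOffset - committedOffset);
    }

    // Total lag across partitions; a missing committed offset is treated as 0 (assumption).
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            total += partitionLag(e.getValue(), committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> ends = Map.of(0, 100L, 1, 250L);
        Map<Integer, Long> committed = Map.of(0, 90L, 1, 250L);
        System.out.println(totalLag(ends, committed)); // prints 10
    }
}
```

A lag that keeps growing over time, rather than its absolute value, is usually the signal worth alerting on.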

Question 8

How do you mock Kafka producers and consumers in unit tests?

Accepted Answer

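Use the `MockProducer` and `MockConsumer` classes from the `org.apache.kafka.clients.producer` and `org.apache.kafka.clients.consumer` packages. They never connect to a broker: `MockProducer` records sent messages in memory (inspect them with `history()`), and `MockConsumer` returns the records you add with `addRecord()`, letting you test message serialization and polling logic in unit tests.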

Question 9

Explain how log cleaner processes execute log compaction.

Accepted Answer

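Log cleaner threads run in the background on topics configured with `cleanup.policy=compact`. For each log, the cleaner builds a map from key to latest offset, then rewrites the closed segments, dropping every record that has a newer record with the same key. The latest value for each key is retained. A delete is written as a tombstone (a record with a null value), which is kept for `delete.retention.ms` so that consumers can observe the deletion and is then removed as well, saving space.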
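The core of compaction (keep only the newest record per key) can be modeled as a two-pass filter. This is a toy model under stated assumptions: `LogRecord` and `compact` are hypothetical names, the input is assumed to be in ascending offset order, and tombstone expiry via `delete.retention.ms` is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompactionModel {
    static final class LogRecord {
        final long offset;
        final String key;
        final String value; // null marks a tombstone

        LogRecord(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    static List<LogRecord> compact(List<LogRecord> segment) {
        // Pass 1: remember the highest offset seen for each key.
        Map<String, Long> latestOffset = new HashMap<>();
        for (LogRecord r : segment) {
            latestOffset.put(r.key, r.offset);
        }
        // Pass 2: keep a record only if it is the latest one for its key.
        List<LogRecord> cleaned = new ArrayList<>();
        for (LogRecord r : segment) {
            if (latestOffset.get(r.key) == r.offset) {
                cleaned.add(r);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<LogRecord> log = List.of(
                new LogRecord(0, "a", "1"), new LogRecord(1, "b", "1"),
                new LogRecord(2, "a", "2"), new LogRecord(3, "b", null));
        for (LogRecord r : compact(log)) {
            System.out.println(r.offset + " " + r.key + "=" + r.value);
        }
    }
}
```

Surviving records keep their original offsets, which is why a compacted log has gaps in its offset sequence.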

Question 10

What is partition skew and how does it degrade throughput?

Accepted Answer

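Partition skew occurs when messages are distributed unevenly across partitions, typically because a few hot keys dominate traffic or the key space is small. The brokers leading the hot partitions see high CPU, disk, and network load while others sit idle, and because each partition is read by only one consumer in a group, the consumer that owns a hot partition becomes the bottleneck for the whole group.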
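One simple way to quantify skew is the ratio of the busiest partition's message count to the mean. The metric and the name `skewRatio` are illustrative choices for this sketch, not something Kafka reports directly.

```java
final class PartitionSkew {
    // Ratio of the busiest partition's count to the mean; 1.0 means perfectly even.
    static double skewRatio(long[] messagesPerPartition) {
        long max = 0;
        long sum = 0;
        for (long count : messagesPerPartition) {
            max = Math.max(max, count);
            sum += count;
        }
        if (sum == 0) {
            return 1.0; // no traffic (or no partitions): treat as even
        }
        double mean = (double) sum / messagesPerPartition.length;
        return max / mean;
    }

    public static void main(String[] args) {
        System.out.println(skewRatio(new long[] {100, 100, 100, 100}));
        System.out.println(skewRatio(new long[] {700, 100, 100, 100}));
    }
}
```

In the second call one partition carries 700 of 1000 messages, so its broker and its consumer do 2.8 times the average work.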

Question 11

Explain Kafka transaction processing and transactional IDs.

Accepted Answer

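To write to multiple topics and partitions atomically (the read-process-write pattern), configure the producer with a stable `transactional.id`, call `initTransactions()` once at startup, and wrap each batch in `beginTransaction()` and `commitTransaction()` (or `abortTransaction()` on failure). Commit the consumed offsets inside the same transaction with `sendOffsetsToTransaction()`. Consumers that set `isolation.level=read_committed` then see only committed data. The `transactional.id` also lets the coordinator fence off an older ("zombie") instance of the same producer after a restart.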

Question 12

What is the difference between offset commit strategies: auto commit vs manual commit?

Accepted Answer

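- Auto Commit (`enable.auto.commit=true`): The consumer commits offsets in the background every `auto.commit.interval.ms`. This is simple, but records processed since the last commit are redelivered after a crash, and offsets can be committed for records that are still in flight if processing is handed to other threads.
- Manual Commit: The consumer calls `commitSync()` or `commitAsync()` after processing messages. Committing after processing gives at-least-once delivery: a crash between processing and commit causes reprocessing, so handlers should be idempotent. It does not by itself give exactly-once processing.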
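The at-least-once behavior of commit-after-processing can be seen in a small simulation. Everything here is a model: the log is a list, the "crash" is an early return, and `runOnce` is a hypothetical name. The point is only that work done before a crash is repeated when the committed offset has not advanced.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitAfterProcessing {
    // Processes records from `committed` onward and returns the new committed offset.
    // If crashAfter >= 0, the run "crashes" after that many records, before committing.
    static long runOnce(List<String> log, long committed, int crashAfter, List<String> processed) {
        long position = committed;
        int handled = 0;
        while (position < log.size()) {
            if (handled == crashAfter) {
                return committed; // crash: the commit never happened
            }
            processed.add(log.get((int) position));
            position++;
            handled++;
        }
        return position; // stands in for commitSync() after the batch
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");
        List<String> processed = new ArrayList<>();
        long committed = runOnce(log, 0, 2, processed);   // crashes after "a" and "b"
        committed = runOnce(log, committed, -1, processed); // restart from offset 0
        System.out.println(processed + " committed=" + committed);
    }
}
```

After the restart, "a" and "b" are processed a second time, which is why downstream handlers need to tolerate duplicates.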

Question 13

How do you test Kafka schema validations in CI/CD pipelines?

Accepted Answer

Integrate with the Confluent Schema Registry. Write tests that register Avro/JSON schemas, validate that producers reject mismatched payloads, and verify that each new schema version is backward compatible with the registered versions before it is deployed.

Question 14

Explain Kafka Streams API and stateless vs stateful operations.

Accepted Answer

Kafka Streams is a client library for building stream processing applications:
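- Stateless: Operations such as `map` and `filter` that handle each message independently.
- Stateful: Aggregations, joins, and windowed operations that depend on earlier records with the same key. Their state lives in local state stores (RocksDB by default) and is backed up to changelog topics for recovery.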
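The distinction can be illustrated with a plain-Java analogy. This is deliberately not the Kafka Streams API: the class and method names are hypothetical, and a `HashMap` stands in for the state store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class StatelessVsStateful {
    // Stateless: each value is handled on its own, with no memory of earlier values.
    static List<String> filterAndUppercase(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (!v.isEmpty()) {
                out.add(v.toUpperCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Stateful: the result for a key depends on every earlier record with that key.
    // The map plays the role of the local state store.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> store = new HashMap<>();
        for (String k : keys) {
            store.merge(k, 1L, Long::sum);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(filterAndUppercase(List.of("click", "", "view")));
        System.out.println(countByKey(List.of("user-1", "user-2", "user-1")));
    }
}
```

Losing the map would lose the counts, which is why a stream processor has to persist and restore state for stateful operations but not for stateless ones.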

Question 15

What is segment size in Kafka logs and how does it affect compaction?

Accepted Answer

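Kafka splits each partition log into segment files on disk (`log.segment.bytes`, default 1 GiB). Compaction and retention-based deletion apply only to closed segments; the active segment is never cleaned. On low-traffic topics a segment can stay active for a long time, so stale records persist until the segment rolls by size or by age (`segment.ms`), which matters for disk sizing and for how promptly superseded or deleted keys disappear.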

Question 16

How do you manage Kafka client connection leaks?

Accepted Answer

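Share one `KafkaProducer` per application (it is thread-safe) instead of creating one per request, keep each `KafkaConsumer` confined to a single thread (it is not thread-safe), and close every client in a shutdown hook or `try`-with-resources block. Leaked clients hold open sockets, buffers, and background threads; on the broker side they consume file descriptors and connection quota until new connections are rejected or time out.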

Question 17

Explain Kafka Exactly-Once Semantics (EOS), detailing how idempotent producers, transactional coordinators, and 2PC transactions work.

Accepted Answer

Kafka Exactly-Once Semantics (EOS) guarantees that the effects of a read-process-write cycle are applied exactly once, provided that both the input and the output are Kafka topics. Key components:
1. Idempotent Producers: The broker assigns each producer a producer ID (PID), and the producer attaches a per-partition sequence number to every batch. If a broker receives an already-seen sequence number due to a network retry, it discards the batch, avoiding duplicate writes.
2. Transaction Coordinator: A broker-side component that owns the transaction's state, persists it in the internal `__transaction_state` topic, and fences older producers that reuse the same `transactional.id`.
3. Two-Phase Commit (2PC): On commit, the coordinator first writes a prepare-commit record to `__transaction_state`, then writes commit markers to every partition that took part in the transaction, and finally records the transaction as complete. Consumers configured with `isolation.level=read_committed` read the data only after the commit marker appears.
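The duplicate-detection idea behind idempotent producers can be sketched as a toy partition log that tracks the last sequence number per producer ID. This is a simplified model, not broker code: the class name is hypothetical, producer epochs are ignored, and one sequence number is used per record rather than per batch.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentPartitionLog {
    private final Map<Long, Integer> lastSequenceByProducer = new HashMap<>();
    private long recordCount = 0;

    // Returns true if the record was appended, false if it was a duplicate retry.
    boolean append(long producerId, int sequence) {
        int last = lastSequenceByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written: a retry of an acknowledged send
        }
        if (sequence != last + 1) {
            // A gap means an earlier record is missing; reject rather than reorder.
            throw new IllegalStateException("out-of-order sequence " + sequence);
        }
        lastSequenceByProducer.put(producerId, sequence);
        recordCount++;
        return true;
    }

    long size() {
        return recordCount;
    }

    public static void main(String[] args) {
        IdempotentPartitionLog log = new IdempotentPartitionLog();
        log.append(7L, 0);
        log.append(7L, 1);
        log.append(7L, 1); // network retry of the same record, discarded
        System.out.println(log.size()); // prints 2
    }
}
```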

Question 18

How would you optimize a Kafka cluster experiencing high controller election times and disk I/O bottlenecks under heavy traffic?

Accepted Answer

Optimize Kafka clusters by:
1. Controller Optimization: Reduce partition counts. Having too many partitions (e.g. > 10k per broker) slows down controller metadata updates and lengthens leader elections when a broker crashes. Running the cluster in KRaft mode instead of ZooKeeper can also shorten controller failover.
2. Disk I/O: Spread log directories across separate physical SSDs. Leave most RAM to the OS page cache by keeping the broker JVM heap modest, set the kernel parameter `vm.dirty_background_ratio = 5` so dirty pages start flushing to disk early, and raise the broker setting `num.io.threads` if the I/O threads are saturated.

Question 19

Explain how to secure a Kafka cluster using SASL/SCRAM, SSL/TLS encryption, and ACLs.

Accepted Answer

Secure Kafka by:
1. Encryption in Transit: Enable SSL/TLS encryption for all client-broker and inter-broker communication.
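2. Authentication: Configure SASL/SCRAM or SASL/OAUTHBEARER authentication to verify client identities, and run it over TLS (`SASL_SSL`) so credentials are not exposed on the wire.
3. Authorization: Use Access Control Lists (ACLs) to restrict each principal to specific operations on specific resources (e.g. allowing read/write only on topics whose names match a literal or prefixed pattern).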
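On the client side, the encryption and authentication settings above translate into a handful of properties. The sketch below assumes SCRAM-SHA-512 and a JKS or PKCS12 truststore; the helper name and its parameters are hypothetical, and real credentials should come from a secret store rather than source code.

```java
import java.util.Properties;

final class SecureClientConfig {
    // Client settings for SASL/SCRAM authentication over TLS.
    static Properties clientProps(String username, String password,
                                  String truststorePath, String truststorePassword) {
        Properties p = new Properties();
        p.setProperty("security.protocol", "SASL_SSL");  // TLS encryption + SASL authentication
        p.setProperty("sasl.mechanism", "SCRAM-SHA-512");
        p.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"" + username + "\" password=\"" + password + "\";");
        p.setProperty("ssl.truststore.location", truststorePath);
        p.setProperty("ssl.truststore.password", truststorePassword);
        return p;
    }

    public static void main(String[] args) {
        Properties p = clientProps("app-user", "change-me", "/path/to/truststore.jks", "change-me");
        System.out.println(p.getProperty("security.protocol"));
    }
}
```

Authorization is enforced on the broker: the authenticated principal (here `User:app-user`) must be granted ACLs for the topics and consumer groups it uses.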

Senior Kafka Interview Questions (5+ Years Experience) (2026)

What is Kafka and Why is it Critical in Modern Engineering?

Core Architectural Concepts in Kafka

Log-Structured Appending

Consumer Group Balancing

Partition Replications

Offset Commit Modes

Zero-Copy Data Pipelines

Why Modern Companies Choose Kafka

Strategic Preparation Tips

Crucial Mistakes to Avoid

Hiring Trends & Career Outlook (2026)

Performance