Felpfe Inc.
Close this search box.
call 24/7

+484 237-1364‬

Close this search box.

Fault-Tolerance in Kafka: Understanding Replication and Leader Election


In the world of distributed systems, fault tolerance is a critical aspect. With several components working together, any single component’s failure shouldn’t halt the entire system. Apache Kafka, a distributed streaming platform, also adheres to this principle through its replication mechanism and leader election process. This post delves into the ins and outs of Kafka’s approach to achieving fault tolerance, enabling your applications to handle failures gracefully.

Kafka Replication

Replication in Kafka is an essential feature that ensures the availability and durability of data. Kafka stores replicas of each topic partition across a configurable number of servers (brokers), which allows it to continue functioning even in the face of hardware failures.

1. Creating a Topic with Replication

Let’s start by creating a topic with a replication factor.

kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic replicated-topic

In this command, replication-factor 3 denotes that Kafka will maintain three copies of this topic across the brokers.

2. Describing a Topic

We can describe the topic to view its details.

kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic replicated-topic

This command provides information about the replicated-topic, including its replication factor and the status of its replicas.

Kafka Leader Election

In Kafka’s replication design, every partition of a topic has one server acting as the “leader” and others as “followers”. The leader handles all read-write operations for the partition, while the followers replicate the leader’s data. If the leader fails, one of the followers will automatically become the new leader – a process known as “Leader Election”.

3. Configuring Unclean Leader Election

By default, Kafka only allows a follower that is in-sync to become the leader. However, in some cases, you may prefer availability over durability. Kafka provides a configuration for this: unclean.leader.election.enable.

# server.properties

Setting this property to true means that an out-of-sync replica can be elected leader, potentially increasing availability but risking data loss.

4. Controlled Shutdown

In the event of planned downtime, Kafka supports a controlled shutdown process to limit the impact on the service.


This command will trigger a controlled shutdown of the Kafka server, transferring leadership roles before stopping.

5. Forced Leader Election

In some scenarios, it may be beneficial to force a leader election manually. This can be achieved using the kafka-preferred-replica-election.sh tool.

kafka-preferred-replica-election.sh --bootstrap-server localhost:9092 --path-to-json-file preferred_replica_election.json

This command triggers a preferred replica leader election based on the replicas specified in preferred_replica_election.json.

Partition Reassignment

Reassigning partitions can be a useful technique when scaling or recovering from a failed broker.

6. Generate Current Partition Assignments

First, we need to generate a JSON file containing the current partition assignments.

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --generate --broker-list "1,2" --topics-to-move-json-file topics.json

This command generates the current partition assignments for the topics listed in topics.json and outputs it to the console.

7. Execute Partition Reassignment

Next, we need to execute the reassignment based on a JSON file.

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --execute --reassignment-json-file reassignment.json

This command initiates the partition reassignment process based on reassignment.json.

8. Verify Partition Reassignment

Finally, we should verify the status of the reassignment.

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --verify --reassignment-json-file reassignment.json

This command checks the status of the reassignment process.

Broker Configuration for High Availability

Lastly, let’s configure our Kafka brokers for high availability.

9. Configuring Replication Factors

As discussed, the replication factor is a critical configuration for fault tolerance. It can be set in the broker’s configuration file.

# server.properties

In this property, we set the default replication factor for automatically created topics to 3.

10. Min In-sync Replicas

This configuration denotes the minimum number of in-sync replicas required to acknowledge a produce request.

# server.properties

Setting min.insync.replicas=2 means at least two replicas must confirm they’ve received the data before the producer gets a successful response.


Fault tolerance in Apache Kafka relies heavily on its replication and leader election processes. By understanding these mechanisms and the various configurations that govern them, you can build Kafka-based systems resilient to failures and capable of providing continuous service.

As we’ve seen, these configurations allow you to balance between availability, durability, and performance, depending on your specific needs. Remember, managing a distributed system like Kafka is an ongoing task that requires monitoring, fine-tuning, and timely responses to emerging situations. With this guide, you should now be better equipped to navigate the complexities of Kafka’s fault tolerance features and build robust, reliable Kafka deployments.

Leave a Reply

About Author
Ozzie Feliciano CTO @ Felpfe Inc.

Ozzie Feliciano is a highly experienced technologist with a remarkable twenty-three years of expertise in the technology industry.

Stream Dream: Diving into Kafka Streams
In “Stream Dream: Diving into Kafka Streams,”...
Talking in Streams: KSQL for the SQL Lovers
“Talking in Streams: KSQL for the SQL Lovers”...
Stream Symphony: Real-time Wizardry with Spring Cloud Stream Orchestration
Description: The blog post, “Stream Symphony:...
Kafka Chronicles: Saga of Resilient Microservices Communication with Spring Cloud Stream
“Kafka Chronicles: Saga of Resilient Microservices...
Tackling Security in Kafka: A Comprehensive Guide on Authentication and Authorization
As the usage of Apache Kafka continues to grow in organizations...
1 2 3 58
90's, 2000's and Today's Hits
Decades of Hits, One Station

Listen to the greatest hits of the 90s, 2000s and Today. Now on TuneIn. Listen while you code.