Leader Election in Distributed Systems

Learn Leader Election from the ground up. Understand why leader election is needed, how distributed systems elect a leader, heartbeat mechanisms, election timeouts, quorum voting, leader failover, split-brain prevention, and how systems like Kubernetes, ZooKeeper, etcd, and Apache Kafka coordinate distributed clusters.

Introduction

Imagine you're designing an enterprise banking platform.

The application runs on multiple servers deployed across several AWS regions:

US-East-1
US-West-2
Europe

Each region contains multiple application instances.

All servers can receive requests simultaneously.

Now imagine this situation:

Customer transfers $10,000.

Two different servers receive the same request.

Server A

Withdraw $10,000

Server B

Withdraw $10,000

If both servers process the request, the customer is debited $20,000 instead of $10,000.

This is unacceptable.

A common way to prevent this in distributed systems is to elect one server as the Leader.

Only the leader performs critical operations.

All remaining servers become Followers.

This process is known as Leader Election.

Learning Objectives

By the end of this article, you'll understand:

What is Leader Election?
Why Leader Election is Required
Leader vs Followers
Active-Active vs Active-Passive
Leader Responsibilities
Follower Responsibilities
Heartbeats
Election Timeout
Leader Failure Detection
Quorum
Split Brain
Production Examples

Why Do We Need Leader Election?

Suppose you have three application servers.

flowchart TD
    C[Client]

    S1[Server 1]
    S2[Server 2]
    S3[Server 3]

    C --> S1
    C --> S2
    C --> S3

If every server modifies shared data independently, concurrent updates can conflict.

Problems Without Leader Election

Imagine three servers updating inventory.

Inventory = 5

Server A

Sell 3

Server B

Sell 4

Server C

Sell 2

Each server checks stock on its own and believes enough inventory is available, yet together they sell 9 units when only 5 exist.

Result

Negative Inventory

Overselling

Data Corruption
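
The overselling above can be reproduced with a small simulation. This is a minimal sketch, not a real inventory service: the function name `sell_without_coordination` and the shared `snapshot` variable are assumptions made for illustration, standing in for three servers that all read the stock level before any of them writes.

```python
def sell_without_coordination(inventory, orders):
    """Simulate servers that check stock against a stale read.

    Every server compares its order with the value it read at the start,
    so every check passes even though the total exceeds the stock.
    """
    snapshot = inventory  # all servers read the same starting value
    for quantity in orders.values():
        if quantity <= snapshot:   # check against stale data
            inventory -= quantity  # act on the shared value
    return inventory


orders = {"Server A": 3, "Server B": 4, "Server C": 2}
print(sell_without_coordination(5, orders))  # -4
```

Each individual check looks valid; the problem only appears when the three results are combined.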

Real Banking Example

Customer Balance

$50,000

ATM

Withdraw $20,000

Mobile Banking

Withdraw $15,000

Internet Banking

Withdraw $30,000

If all three withdrawals execute simultaneously against the same $50,000 balance, $65,000 is paid out and the final balance becomes incorrect.

Distributed System With Leader

flowchart TD
    CLIENT[Clients]

    LB[Load Balancer]

    LEADER[(Leader)]

    F1[(Follower 1)]

    F2[(Follower 2)]

    CLIENT --> LB

    LB --> LEADER

    LEADER --> F1
    LEADER --> F2

Only one node performs writes.

Followers replicate data.
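
To see why a single writer helps, the sketch below sends the three withdrawals from the banking example through one leader. The `Leader` class and its `withdraw` method are hypothetical names chosen for this illustration; the point is only that the balance check and the update happen in one place, one request at a time.

```python
class Leader:
    """Hypothetical single writer that applies requests one at a time."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # The check and the update cannot interleave with another
        # request, because only the leader modifies the balance.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


leader = Leader(50_000)
results = [leader.withdraw(amount) for amount in (20_000, 15_000, 30_000)]
print(results, leader.balance)  # [True, True, False] 15000
```

The third withdrawal is rejected because the leader sees the balance left by the first two.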

What is Leader Election?

Leader Election is the process of selecting one node to coordinate the cluster.

At any given time, at most one node should act as the leader.

Remaining nodes become followers.

Leader Responsibilities

The Leader performs

Write Operations
Transaction Coordination
Data Replication
Heartbeats
Cluster Metadata Updates
Configuration Changes
Distributed Lock Management

Leader Architecture

flowchart TD
    LEADER[Leader]

    WRITE[Write Requests]

    REPLICATION[Replication]

    HEARTBEAT[Heartbeats]

    CONFIG[Cluster Configuration]

    LEADER --> WRITE
    LEADER --> REPLICATION
    LEADER --> HEARTBEAT
    LEADER --> CONFIG

Follower Responsibilities

Followers

Replicate Data
Receive Heartbeats
Participate in Elections
Become Leader if necessary
Optionally Serve Read Requests

Cluster Architecture

flowchart LR
    L[(Leader)]

    F1[(Follower)]

    F2[(Follower)]

    F3[(Follower)]

    L --> F1
    L --> F2
    L --> F3

Leader vs Followers

Leader	Followers
Accepts Writes	Replicate Data
Sends Heartbeats	Receive Heartbeats
Coordinates Cluster	Wait for Leader
One Node	Multiple Nodes

Client Request Flow

sequenceDiagram
    participant Client
    participant Leader
    participant Follower1
    participant Follower2

    Client->>Leader: Update Order

    Leader->>Follower1: Replicate

    Leader->>Follower2: Replicate

    Follower1-->>Leader: ACK

    Follower2-->>Leader: ACK

    Leader-->>Client: Success
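
The request flow in the diagram could be sketched roughly as follows. The `Leader` and `Follower` classes are hypothetical, the network is replaced by direct method calls, and answering the client once a majority of copies exist is an assumption of this sketch (the diagram waits for both followers).

```python
class Follower:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.log = []

    def replicate(self, entry):
        """Store the entry and return True as the ACK, or False if unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []

    def write(self, entry):
        """Replicate the entry and report success once a majority holds it."""
        self.log.append(entry)
        acks = 1  # the leader's own copy counts
        for follower in self.followers:
            if follower.replicate(entry):
                acks += 1
        cluster_size = len(self.followers) + 1
        return acks >= cluster_size // 2 + 1


followers = [Follower("Follower1"), Follower("Follower2", reachable=False)]
leader = Leader(followers)
print(leader.write("Update Order"))  # True: 2 of 3 nodes hold the entry
```

A real system would also need to handle the case where the write fails, for example by retrying or discarding the uncommitted entry; that is left out here.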

Why Followers Cannot Accept Writes

If followers also accepted writes, the cluster would effectively have more than one leader. Imagine two of them.

Leader A

Inventory = 10

Leader B

Inventory = 8

Different clients receive different values.

Eventually

Database Corruption

Active-Passive Architecture

One active node.

Others remain passive.

flowchart LR
    CLIENT[Client]

    ACTIVE[(Leader)]

    PASSIVE1[(Follower)]

    PASSIVE2[(Follower)]

    CLIENT --> ACTIVE

    ACTIVE --> PASSIVE1
    ACTIVE --> PASSIVE2

This pattern is common in banking, where consistency matters more than write throughput.

Active-Active Architecture

Multiple servers process requests.

flowchart LR
    CLIENT[Client]

    NODE1[(Node 1)]

    NODE2[(Node 2)]

    NODE3[(Node 3)]

    CLIENT --> NODE1
    CLIENT --> NODE2
    CLIENT --> NODE3

Because several nodes accept writes at the same time, this design requires sophisticated conflict resolution.

Heartbeats

How do followers know the leader is alive?

The leader periodically sends heartbeat messages to every follower.

Heartbeat Flow

sequenceDiagram
    participant Leader
    participant Follower1
    participant Follower2

    loop Every 50 ms
        Leader->>Follower1: Heartbeat
        Leader->>Follower2: Heartbeat
    end

Heartbeats are tiny messages, sent at an interval much shorter than the election timeout.

Purpose

Verify leader is alive
Prevent unnecessary elections
Synchronize metadata
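
On the leader side, heartbeat scheduling might look something like this. It is a minimal sketch: `HeartbeatSender`, `due`, and `mark_sent` are hypothetical names, times are plain millisecond numbers passed in by the caller rather than read from a real clock, and the 50 ms interval in the usage lines is an arbitrary value for illustration.

```python
class HeartbeatSender:
    """Hypothetical leader-side helper; all times are in milliseconds."""

    def __init__(self, followers, interval_ms):
        self.interval_ms = interval_ms
        # Time of the last heartbeat sent to each follower (None = never).
        self.last_sent = {name: None for name in followers}

    def due(self, now_ms):
        """Return the followers that should receive a heartbeat now."""
        return [
            name
            for name, sent in self.last_sent.items()
            if sent is None or now_ms - sent >= self.interval_ms
        ]

    def mark_sent(self, name, now_ms):
        self.last_sent[name] = now_ms


sender = HeartbeatSender(["Follower1", "Follower2"], interval_ms=50)
for name in sender.due(0):
    sender.mark_sent(name, 0)
print(sender.due(20))  # []
print(sender.due(50))  # ['Follower1', 'Follower2']
```

Passing the current time in as an argument keeps the logic easy to test without sleeping.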

Heartbeat Architecture

flowchart TD
    LEADER[(Leader)]

    HB1[Heartbeat]

    HB2[Heartbeat]

    F1[(Follower)]

    F2[(Follower)]

    LEADER --> HB1
    HB1 --> F1

    LEADER --> HB2
    HB2 --> F2

Election Timeout

Each follower runs an election timer and resets it whenever a heartbeat arrives.

If the timer expires before the next heartbeat arrives, the follower assumes the leader has failed.

Example

Node	Timeout
Node A	150 ms
Node B	220 ms
Node C	310 ms

Randomized timeouts make it unlikely that several followers time out at the same moment and start competing elections.
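
Picking a randomized timeout can be as simple as the sketch below. The function name `pick_election_timeout` and the 150 to 350 ms range are assumptions chosen to cover the values in the table; real systems make this range configurable.

```python
import random


def pick_election_timeout(low_ms=150, high_ms=350, rng=random):
    """Return a random election timeout in milliseconds."""
    return rng.uniform(low_ms, high_ms)


rng = random.Random(42)  # seeded only so the example is repeatable
timeouts = {
    node: pick_election_timeout(rng=rng)
    for node in ("Node A", "Node B", "Node C")
}
# The node with the smallest timeout is the first to start an election.
first_candidate = min(timeouts, key=timeouts.get)
print(first_candidate)
```

Because each node draws its own value, one node usually times out clearly before the others and can win the election before they become candidates.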

Timeout Flow

flowchart TD
    START[Receive Heartbeat]

    WAIT[Wait]

    CHECK{Heartbeat Received Before Timeout?}

    RESET[Reset Timer]

    ELECTION[Start Election]

    START --> WAIT
    WAIT --> CHECK

    CHECK -->|Yes| RESET
    CHECK -->|No| ELECTION

    RESET --> WAIT
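
The timeout flow maps onto a very small follower-side state holder. This is a sketch under assumptions: `ElectionTimer` and its method names are hypothetical, and time is passed in as a millisecond number instead of being read from a real clock.

```python
class ElectionTimer:
    """Hypothetical follower-side timer; all times are in milliseconds."""

    def __init__(self, timeout_ms, now_ms=0):
        self.timeout_ms = timeout_ms
        self.last_heartbeat_ms = now_ms

    def on_heartbeat(self, now_ms):
        # "Reset Timer" in the flow: remember when the leader was last heard.
        self.last_heartbeat_ms = now_ms

    def should_start_election(self, now_ms):
        # "Start Election" once the timeout passes with no heartbeat.
        return now_ms - self.last_heartbeat_ms >= self.timeout_ms


timer = ElectionTimer(timeout_ms=150)
timer.on_heartbeat(100)
print(timer.should_start_election(200))  # False: only 100 ms of silence
print(timer.should_start_election(250))  # True: 150 ms of silence
```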

Leader Failure

Suppose the leader crashes.

flowchart TD
    LEADER[(Leader)]

    F1[(Follower)]

    F2[(Follower)]

    F3[(Follower)]

    LEADER -. Crash .-> F1

    LEADER -. Crash .-> F2

    LEADER -. Crash .-> F3

Followers stop receiving heartbeats, their election timers expire, and an election starts.

Failure Detection Timeline

sequenceDiagram
    participant Leader
    participant Follower

    Leader->>Follower: Heartbeat
    Leader->>Follower: Heartbeat

    Note over Leader: Server Crash

    Note over Follower: Timeout Expires

    Follower->>Follower: Start Election
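
How long the gap between "Server Crash" and "Timeout Expires" lasts depends on when the crash happens relative to the last heartbeat. The helper below is a back-of-the-envelope sketch, assuming a fixed heartbeat interval, negligible network delay, and a timer that restarts on every heartbeat; `detection_window` is a hypothetical name.

```python
def detection_window(heartbeat_interval_ms, election_timeout_ms):
    """Return (shortest, longest) delay between a crash and its detection."""
    if election_timeout_ms < heartbeat_interval_ms:
        raise ValueError("timeout must not be shorter than the heartbeat interval")
    # Crash just before the next heartbeat was due: one full interval of
    # the timeout has already elapsed since the last heartbeat.
    shortest = election_timeout_ms - heartbeat_interval_ms
    # Crash right after sending a heartbeat: the follower waits out the
    # whole timeout before reacting.
    longest = election_timeout_ms
    return shortest, longest


print(detection_window(50, 150))  # (100, 150)
```

A shorter timeout detects failures faster but raises the risk of electing a new leader when the old one was merely slow.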

Leader Election Steps

flowchart TD
    A[Leader Failure]

    B[Heartbeat Timeout]

    C[Follower Becomes Candidate]

    D[Request Votes]

    E[Receive Majority]

    F[Become Leader]

    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
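
The steps from "Follower Becomes Candidate" to "Become Leader" can be sketched as a vote count. This is a simplified illustration, not a complete consensus algorithm: the `Node` class, `run_election`, and the numbered `term` used to limit each node to one vote per election round are assumptions, and checks that real protocols perform (such as comparing how up to date the candidate's data is) are omitted.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.voted_in_term = {}  # term -> name of the candidate voted for

    def request_vote(self, term, candidate):
        """Grant at most one vote per term, first come first served."""
        granted_to = self.voted_in_term.setdefault(term, candidate)
        return granted_to == candidate


def run_election(candidate, peers, term):
    """Return True if the candidate collects a majority of the cluster."""
    # The candidate votes for itself, unless it already voted this term.
    votes = 1 if candidate.request_vote(term, candidate.name) else 0
    for peer in peers:
        if peer.request_vote(term, candidate.name):
            votes += 1
    cluster_size = len(peers) + 1
    return votes >= cluster_size // 2 + 1


a, b, c = Node("A"), Node("B"), Node("C")
print(run_election(a, [b, c], term=1))  # True
print(run_election(b, [a, c], term=1))  # False: every vote already went to A
```

Limiting each node to one vote per term is what stops two candidates from both reaching a majority in the same round.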

Quorum

Leader election requires

Majority Votes

Formula

Majority = floor(N / 2) + 1

where N is the number of nodes in the cluster.

Quorum Table

Nodes	Votes Required
3	2
5	3
7	4
9	5
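
The table can be reproduced with integer division. The function name `majority` is just an illustrative choice; the third printed column, the number of node failures the cluster can absorb while still forming a majority, follows directly from the same arithmetic.

```python
def majority(nodes):
    """Smallest number of votes that is more than half of the cluster."""
    return nodes // 2 + 1


for n in (3, 5, 7, 9):
    # nodes, votes required, failures tolerated
    print(n, majority(n), n - majority(n))
# 3 2 1
# 5 3 2
# 7 4 3
# 9 5 4
```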

Quorum Architecture

flowchart TD
    CANDIDATE[(Candidate)]

    N1[(Node)]

    N2[(Node)]

    N3[(Node)]

    N4[(Node)]

    CANDIDATE --> N1
    CANDIDATE --> N2
    CANDIDATE --> N3
    CANDIDATE --> N4

In a 5-node cluster, the candidate votes for itself and asks the other four nodes for their votes.

If it collects 3 votes in total, it becomes the Leader.

Why Majority?

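Imagine a five-node cluster in which two votes were enough to win.

A network partition splits the cluster into a group of two nodes and a group of three.

Each group can gather two votes, so each elects its own leader.

Now two leaders exist.

Majority voting prevents this: two separate groups cannot both contain three of the five nodes, so at most one side can elect a leader.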
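
This property can be checked by brute force. The sketch below tries every two-way split of a cluster and counts how many sides are large enough to elect a leader; `can_elect` and `sides_that_can_elect` are hypothetical helper names.

```python
def can_elect(group_size, cluster_size):
    """A group can elect a leader only if it holds a majority of the cluster."""
    return group_size >= cluster_size // 2 + 1


def sides_that_can_elect(cluster_size):
    """For every two-way split, count how many sides can elect a leader."""
    return {
        (a, cluster_size - a): int(can_elect(a, cluster_size))
        + int(can_elect(cluster_size - a, cluster_size))
        for a in range(cluster_size + 1)
    }


print(sides_that_can_elect(5)[(2, 3)])  # 1: only the group of three
print(max(sides_that_can_elect(5).values()))  # 1: never two leaders
```

With an even cluster size, an exact half-and-half split leaves neither side able to elect a leader, which is one reason odd cluster sizes are popular.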

Split Brain

Split brain is one of the most damaging failures in a distributed system.

It occurs when two nodes both believe they are the leader at the same time.

flowchart LR
    L1[(Leader A)]

    L2[(Leader B)]

    CLIENT1[Client]

    CLIENT2[Client]

    CLIENT1 --> L1
    CLIENT2 --> L2

    L1 -. Network Partition .- L2

Both accept writes.

Data becomes inconsistent.
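
One common safeguard, shown here as a hedged sketch rather than something described in this article, is to give every newly elected leader a higher number and have the storage layer refuse writes that carry an older one. The `FencedStore` class and the `epoch` parameter are hypothetical names for this illustration.

```python
class FencedStore:
    """Hypothetical storage that rejects writes from a superseded leader."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False  # the writer is an old leader; ignore it
        self.highest_epoch = epoch
        self.data[key] = value
        return True


store = FencedStore()
print(store.write(1, "inventory", 10))  # True: Leader A, epoch 1
print(store.write(2, "inventory", 8))   # True: Leader B, elected later
print(store.write(1, "inventory", 9))   # False: Leader A is now stale
print(store.data["inventory"])          # 8
```

Even if the old leader keeps running on the far side of a partition, its writes no longer reach the shared data once a newer leader has written.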

Real World Examples

Banking

Leader coordinates

Money Transfers
Account Updates
Ledger Entries

Apache Kafka

Each partition has its own leader replica, which handles

Message Writes
Partition Coordination

Followers replicate logs.

Kubernetes

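Control-plane components run several replicas, and each component elects one active instance.

The active instance handles

Scheduling (kube-scheduler)
Controller Loops (kube-controller-manager)

Cluster state itself is stored in etcd.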
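
Kubernetes components commonly choose their active instance by competing for a shared lease record that the holder must keep renewing. The sketch below imitates that idea with an in-memory object. `LeaseStore`, `try_acquire`, the replica names, and the 15-unit duration are hypothetical, and this is not the Kubernetes client API; a real implementation has to make the check-and-update atomic in the shared store.

```python
class LeaseStore:
    """In-memory stand-in for a shared lease record (hypothetical)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now, duration):
        """Take the lease if it is free or expired, or renew it if already held."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + duration
            return True
        return False


lease = LeaseStore()
print(lease.try_acquire("replica-a", now=0, duration=15))   # True: becomes leader
print(lease.try_acquire("replica-b", now=5, duration=15))   # False: lease is held
print(lease.try_acquire("replica-a", now=10, duration=15))  # True: renewal
print(lease.try_acquire("replica-b", now=30, duration=15))  # True: lease expired
```

If the active replica crashes, it stops renewing, the lease expires, and a standby replica takes over without any explicit vote.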

ZooKeeper

The leader orders every write to the data that applications keep in ZooKeeper, such as

Configuration
Locks
Metadata
Cluster Membership

etcd

Leader coordinates

Kubernetes State
Configuration
Distributed Locks

Advantages of Leader Election

Prevents conflicting writes
Simplifies distributed coordination
Supports automatic failover
Maintains consistency
Enables distributed locking
Foundation for consensus algorithms

Challenges

Leader failure
Election latency
Network partitions
Split brain
Leader bottleneck
Cluster reconfiguration

Summary

In this part, we learned:

What is Leader Election?
Why Leader Election is required
Leader and Follower architecture
Active-Passive vs Active-Active
Heartbeats
Election Timeout
Leader Failure Detection
Quorum
Split Brain
Real-world examples from Banking, Kafka, Kubernetes, ZooKeeper, and etcd
