Distributed Key-Value Store

A practical, step-by-step implementation of a distributed key-value store with persistence, consistent hashing, replication, and load balancing.

Note: This is a learning project and is not intended for production use.

Features

Step 1: Single in-memory FastAPI node with GET, POST, DELETE endpoints
Step 2: Persistent storage with Write-Ahead Log (WAL) and checkpointing
Step 3: Consistent hashing for distributed data placement
Step 4: Distributed cluster with automatic request forwarding
Step 5: Replication for fault tolerance (primary-replica model)
Step 6: Load balancer with smart routing

Architecture

                    [Load Balancer]
                         |
        +----------------+----------------+
        |                |                |
    [Node 1]         [Node 2]         [Node 3]
    (Primary)      (Replica)        (Replica)

Consistent Hashing: Each key is mapped to an owner node on a consistent hash ring, so the Primary and Replica roles shown above apply per key, not per node
Replication: Each write is copied to N nodes (REPLICATION_FACTOR, default: 3) for fault tolerance
Persistence: Write-Ahead Log (WAL) + periodic snapshots for durability
Load Balancing: The load balancer computes each key's owner and routes the request directly to that node

Installation

Clone this repository and change into its directory:

cd <path-to-repository>

Install dependencies:

pip install -r requirements.txt

Running the System

Steps 1-5: Running Database Nodes

Each node needs to know:

Its own URL (CURRENT_NODE_URL)
All cluster nodes (CLUSTER_NODES)
Optional: Replication factor (REPLICATION_FACTOR, default: 3)

Example: Running 3 Nodes

Terminal 1 - Node 1 (Port 8001):

export CURRENT_NODE_URL="http://localhost:8001"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node1"
export PORT=8001
python main.py

Terminal 2 - Node 2 (Port 8002):

export CURRENT_NODE_URL="http://localhost:8002"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node2"
export PORT=8002
python main.py

Terminal 3 - Node 3 (Port 8003):

export CURRENT_NODE_URL="http://localhost:8003"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node3"
export PORT=8003
python main.py
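The three terminals above differ only in port and node name. As a convenience, a small launcher can build each node's environment and start all nodes from one terminal. This is a sketch, not part of the repository: `build_node_env` and `launch_cluster` are hypothetical helpers, and the sketch assumes the nodes run on localhost and that `main.py` reads exactly the variables shown above.

```python
import os
import subprocess
import sys
from typing import Dict, List


def build_node_env(index: int, ports: List[int]) -> Dict[str, str]:
    """Environment for node number `index` (1-based) in a localhost cluster."""
    urls = [f"http://localhost:{port}" for port in ports]
    port = ports[index - 1]
    return {
        "CURRENT_NODE_URL": urls[index - 1],
        "CLUSTER_NODES": ",".join(urls),   # identical list for every node
        "NODE_NAME": f"node{index}",
        "PORT": str(port),
    }


def launch_cluster(ports: List[int]) -> List[subprocess.Popen]:
    """Start one `python main.py` process per port, each with its own environment."""
    processes = []
    for index in range(1, len(ports) + 1):
        env = {**os.environ, **build_node_env(index, ports)}
        processes.append(subprocess.Popen([sys.executable, "main.py"], env=env))
    return processes
```

Calling `launch_cluster([8001, 8002, 8003])` from the repository root starts the same three-node cluster; call `terminate()` on each returned process to stop it.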

Step 6: Running the Load Balancer

Terminal 4 - Load Balancer (Port 9000):

export DATABASE_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export LB_PORT=9000
python load_balancer.py

API Usage

Using the Load Balancer (Recommended)

All requests go through the load balancer on port 9000:

Set a key-value pair:

curl -X POST "http://localhost:9000/v1/my_key" \
  -H "Content-Type: application/json" \
  -d '{"value": "my_value"}'

Get a value:

curl "http://localhost:9000/v1/my_key"

Delete a key:

curl -X DELETE "http://localhost:9000/v1/my_key"
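The same three calls can be made from Python with only the standard library. This sketch assumes the nodes respond with a JSON body; the helper names (`set_request`, `get_request`, `delete_request`, `send`) are illustrative and not part of the project.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9000"  # load balancer address used in the curl examples


def _key_url(key: str, base_url: str) -> str:
    # Percent-encode the key so characters like "/" or spaces cannot break the path.
    return f"{base_url}/v1/{urllib.parse.quote(key, safe='')}"


def set_request(key: str, value: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Equivalent of the POST curl call: the body is {"value": <value>}."""
    body = json.dumps({"value": value}).encode("utf-8")
    return urllib.request.Request(
        _key_url(key, base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_request(key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(_key_url(key, base_url), method="GET")


def delete_request(key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(_key_url(key, base_url), method="DELETE")


def send(req: urllib.request.Request, timeout: float = 5.0):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the cluster running, `send(set_request("my_key", "my_value"))` followed by `send(get_request("my_key"))` mirrors the curl commands. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, for example if a node reports a missing key with an error status.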

Direct Node Access

You can also access nodes directly (they will forward requests to the correct owner):

# Set a key
curl -X POST "http://localhost:8001/v1/test_key" \
  -H "Content-Type: application/json" \
  -d '{"value": "test_value"}'

# Get a key
curl "http://localhost:8001/v1/test_key"

# Delete a key
curl -X DELETE "http://localhost:8001/v1/test_key"
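Forwarding comes down to one decision per request: serve the key locally or pass it to its owner. The sketch below assumes an `owner_of(key)` callable that stands in for the ring lookup in `hashing.py`, whose real interface is not shown here; `route_request` is a hypothetical name.

```python
from typing import Callable, Optional, Tuple


def route_request(
    key: str, current_node_url: str, owner_of: Callable[[str], str]
) -> Tuple[str, Optional[str]]:
    """Return ("local", None) if this node owns `key`,
    else ("forward", <owner's /v1/{key} URL>)."""
    owner = owner_of(key).rstrip("/")
    # Compare without trailing slashes so "http://host:8001/" matches "http://host:8001".
    if owner == current_node_url.rstrip("/"):
        return ("local", None)
    return ("forward", f"{owner}/v1/{key}")
```

Forwarding only converges if every node computes the same owner, so `CLUSTER_NODES` should presumably be identical (same URLs, same spelling) on all nodes; otherwise two nodes could keep forwarding a key to each other.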

Environment Variables

Database Node (`main.py`)

CURRENT_NODE_URL: The URL of this node (e.g., http://localhost:8001)
CLUSTER_NODES: Comma-separated list of all cluster nodes
NODE_NAME: Name/identifier for this node (optional, default: node1)
PORT: Port to run on (optional, default: 8000)
REPLICATION_FACTOR: Number of replicas (optional, default: 3)

Load Balancer (`load_balancer.py`)

DATABASE_NODES: Comma-separated list of database nodes
LB_PORT: Port for load balancer (optional, default: 9000)
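A node or the load balancer might read these settings as sketched below. The defaults mirror the lists above; treating `CURRENT_NODE_URL`, `CLUSTER_NODES`, and `DATABASE_NODES` as required is an assumption, since no default is documented for them, and the dataclass and function names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def _split_urls(raw: str) -> List[str]:
    """Split a comma-separated URL list, ignoring blanks and surrounding spaces."""
    return [url.strip() for url in raw.split(",") if url.strip()]


@dataclass
class NodeConfig:
    current_node_url: str
    cluster_nodes: List[str]
    node_name: str = "node1"
    port: int = 8000
    replication_factor: int = 3


@dataclass
class LoadBalancerConfig:
    database_nodes: List[str]
    lb_port: int = 9000


def load_node_config(env: Mapping[str, str] = os.environ) -> NodeConfig:
    """Settings for main.py; raises KeyError if a required variable is missing."""
    return NodeConfig(
        current_node_url=env["CURRENT_NODE_URL"],
        cluster_nodes=_split_urls(env["CLUSTER_NODES"]),
        node_name=env.get("NODE_NAME", "node1"),
        port=int(env.get("PORT", "8000")),
        replication_factor=int(env.get("REPLICATION_FACTOR", "3")),
    )


def load_lb_config(env: Mapping[str, str] = os.environ) -> LoadBalancerConfig:
    """Settings for load_balancer.py."""
    return LoadBalancerConfig(
        database_nodes=_split_urls(env["DATABASE_NODES"]),
        lb_port=int(env.get("LB_PORT", "9000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to test. With the three-node example and the default `REPLICATION_FACTOR` of 3, every node would end up holding every key.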

File Structure

.
├── main.py              # Database node server (Steps 1-5)
├── load_balancer.py     # Load balancer server (Step 6)
├── hashing.py           # Consistent hashing implementation (Step 3)
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── wal.log              # Write-Ahead Log (created at runtime)
└── snapshot.db          # Database snapshot (created at runtime)

How It Works

Consistent Hashing

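Keys are hashed and assigned to nodes using a consistent hash ring: each key belongs to the first node found moving clockwise from the key's position on the ring. This provides:

Roughly even distribution of keys across nodes
Minimal remapping when nodes are added or removed: only the keys in the affected segment of the ring move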
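The ring can be sketched in a few dozen lines. This is an illustrative implementation, not the contents of `hashing.py`: the `HashRing` class, the use of MD5, and the 100 virtual nodes per physical node are all assumptions. Virtual nodes place each server at many points on the ring, which is what smooths out the key distribution.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative, not hashing.py)."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._owners: Dict[int, str] = {}  # ring position -> node URL
        self._positions: List[int] = []    # sorted ring positions
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 is used only for its even spread of positions, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            del self._owners[pos]
            self._positions.remove(pos)

    def get_node(self, key: str) -> str:
        """Owner of `key`: the first virtual node clockwise from hash(key)."""
        if not self._positions:
            raise ValueError("hash ring is empty")
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]
```

Adding a node with `add_node` should only move keys onto the new node; keys never shuffle between the existing nodes, which is the minimal-remapping property described above.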

Replication

Each write is replicated to N nodes:

Primary node receives the write
Primary node writes to its WAL and database
Primary node replicates to replica nodes asynchronously
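The README does not spell out how replica nodes are chosen. A common approach, assumed here and borrowed from Dynamo-style stores, is to walk clockwise from the key's position and take the next distinct nodes as replicas, treating N as the total number of copies including the primary. The sketch also shows the asynchronous step: the primary applies the write, then starts one background task per replica without waiting. `preference_list`, `write_on_primary`, and the `send` callable (for example an HTTP POST to the replica) are hypothetical, and the ring uses one point per node for brevity.

```python
import asyncio
import bisect
import hashlib
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical transport: send(replica_url, key, value), e.g. an HTTP POST.
SendFn = Callable[[str, str, str], Awaitable[None]]


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def preference_list(nodes: List[str], key: str, n: int) -> List[str]:
    """Owner first, then the next distinct nodes clockwise (one ring point per node)."""
    ring = sorted((_hash(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    start = bisect.bisect(positions, _hash(key))
    count = min(n, len(ring))  # cannot place more copies than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


async def write_on_primary(
    nodes: List[str], key: str, value: str, n: int,
    store: Dict[str, str], send: SendFn,
) -> Tuple[str, List["asyncio.Task[None]"]]:
    """Apply a write on the primary, then replicate it in the background."""
    primary, *replicas = preference_list(nodes, key, n)
    store[key] = value  # stands in for "write to WAL, then update the database"
    # Asynchronous replication: schedule the sends but do not await them here.
    tasks = [asyncio.create_task(send(replica, key, value)) for replica in replicas]
    return primary, tasks  # keep task references so they are not garbage-collected
```

Because the primary returns before the replica tasks finish, a read served by a replica immediately afterwards may not see the write yet, which is the eventual consistency mentioned under Notes.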

Persistence

Write-Ahead Log (WAL): All writes are logged to wal.log before updating the database
Checkpointing: Every 5 minutes, the database state is saved to snapshot.db
Recovery: On startup, the system loads the snapshot and replays the WAL
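The three mechanisms fit together as sketched below. The JSON-lines WAL format, the JSON snapshot, and the `DurableStore` class are assumptions for illustration; the project's actual file formats are not documented here. Truncating the WAL after a successful checkpoint is also an assumption, though it follows from the snapshot already containing those entries. The 5-minute schedule would be a background task calling `checkpoint()` and is omitted.

```python
import json
import os
from typing import Dict, Optional


class DurableStore:
    """Dict-backed store made durable by a JSON-lines WAL plus a JSON snapshot."""

    def __init__(self, wal_path: str = "wal.log", snapshot_path: str = "snapshot.db"):
        self.wal_path = wal_path
        self.snapshot_path = snapshot_path
        self.data: Dict[str, str] = {}
        self._recover()
        self._wal = open(self.wal_path, "a", encoding="utf-8")

    def _recover(self) -> None:
        # Recovery step 1: load the latest snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Recovery step 2: replay WAL entries on top of the snapshot, in order.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        entry = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final line from a crash mid-write: stop replaying
                    self._apply(entry)

    def _apply(self, entry: dict) -> None:
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            self.data.pop(entry["key"], None)

    def _write(self, entry: dict) -> None:
        # Write-ahead: the entry reaches disk (flush + fsync) before memory changes.
        self._wal.write(json.dumps(entry) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self._apply(entry)

    def set(self, key: str, value: str) -> None:
        self._write({"op": "set", "key": key, "value": value})

    def delete(self, key: str) -> None:
        self._write({"op": "delete", "key": key})

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def checkpoint(self) -> None:
        # Write the snapshot to a temp file, then atomically swap it into place.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        # The snapshot now covers every logged entry, so start a fresh WAL.
        self._wal.close()
        self._wal = open(self.wal_path, "w", encoding="utf-8")

    def close(self) -> None:
        self._wal.close()
```

If the process crashes after the snapshot is replaced but before the WAL is truncated, recovery replays entries that are already in the snapshot. That is harmless here because replaying the same sets and deletes in order ends in the state the snapshot already holds.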

Request Flow

Client sends request to load balancer
Load balancer calculates owner node using consistent hashing
Load balancer forwards request to owner node
Owner node processes request and replicates to replicas (if write operation)
Response is returned to client
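Steps 2 and 3 of this flow can be sketched with the standard library. `owner_for` places one ring point per node and only stands in for the real `hashing.py`; it is correct only if it matches the hash function and node list the database nodes use, otherwise the load balancer sends requests to a non-owner, which then has to forward them again. `build_forward_request` and `forward` are hypothetical names.

```python
import bisect
import hashlib
import urllib.request
from typing import List, Optional


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner_for(key: str, nodes: List[str]) -> str:
    """Step 2: the owner is the first node clockwise from hash(key)."""
    ring = sorted((_hash(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect(positions, _hash(key)) % len(ring)
    return ring[idx][1]


def build_forward_request(
    nodes: List[str], method: str, key: str, body: Optional[bytes] = None
) -> urllib.request.Request:
    """Step 3: rebuild the client's request against the owner node.

    `key` is assumed to be URL-safe already, as received in the incoming path.
    """
    owner = owner_for(key, nodes)
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        f"{owner}/v1/{key}", data=body, headers=headers, method=method
    )


def forward(req: urllib.request.Request, timeout: float = 5.0) -> bytes:
    """Steps 4-5: the owner handles the request; its response body goes back to the client."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```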

Testing

Test Basic Operations

# Set some keys
curl -X POST "http://localhost:9000/v1/user1" -H "Content-Type: application/json" -d '{"value": "Alice"}'
curl -X POST "http://localhost:9000/v1/user2" -H "Content-Type: application/json" -d '{"value": "Bob"}'
curl -X POST "http://localhost:9000/v1/user3" -H "Content-Type: application/json" -d '{"value": "Charlie"}'

# Get values
curl "http://localhost:9000/v1/user1"
curl "http://localhost:9000/v1/user2"

# Delete a key
curl -X DELETE "http://localhost:9000/v1/user3"

Test Fault Tolerance

Start 3 nodes and load balancer
Write some data
Stop one node
Try to read data - should still work (reads from replicas)
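The test above expects reads to survive one stopped node. One way to get that behavior, sketched here as an assumption rather than a description of `load_balancer.py`, is to try the key's owner and then fall back to its replicas in ring order. `read_with_fallback` and the `fetch` callable are hypothetical; with real HTTP calls, `fetch` would need to translate connection failures into `ConnectionError` or `TimeoutError`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetch(node_url, key) -> value or None; expected to raise
# ConnectionError or TimeoutError when the node cannot be reached.
FetchFn = Callable[[str, str], Optional[str]]


def read_with_fallback(
    key: str, candidates: List[str], fetch: FetchFn
) -> Tuple[str, Optional[str]]:
    """Try the owner first, then each replica, skipping unreachable nodes.

    Returns (node_that_answered, value).
    """
    errors: Dict[str, Exception] = {}
    for node in candidates:
        try:
            return node, fetch(node, key)
        except (ConnectionError, TimeoutError) as exc:
            errors[node] = exc  # node stopped or too slow: try the next candidate
    raise RuntimeError(f"no reachable node for {key!r}; tried {list(errors)}")
```

A value of `None` from a replica means that replica does not hold the key, which is different from the node being down.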

Check Node Status

# Check load balancer health
curl "http://localhost:9000/health"

# Check individual node status
curl "http://localhost:8001/"

Notes

This is a learning project and not production-ready
Error handling is minimal
No authentication or authorization
Network partitions are not handled
Conflict resolution is not implemented
Replicas are only eventually consistent: because replication is asynchronous, a replica may briefly serve stale data

Next Steps

Potential improvements:

Add authentication/authorization
Implement stronger consistency models
Add monitoring and metrics
Handle network partitions
Implement node join/leave protocols
Add data compression
Implement backups
Add transaction support
