A practical, step-by-step implementation of a distributed key-value store with persistence, consistent hashing, replication, and load balancing.
- Step 1: Single in-memory FastAPI node with GET, POST, DELETE endpoints
- Step 2: Persistent storage with Write-Ahead Log (WAL) and checkpointing
- Step 3: Consistent hashing for distributed data placement
- Step 4: Distributed cluster with automatic request forwarding
- Step 5: Replication for fault tolerance (primary-replica model)
- Step 6: Load balancer with smart routing
[Load Balancer]
|
+----------------+----------------+
| | |
[Node 1] [Node 2] [Node 3]
(Primary) (Replica) (Replica)
- Consistent Hashing: Keys are distributed across nodes using consistent hashing
- Replication: Data is replicated to N nodes (default: 3) for fault tolerance
- Persistence: Write-Ahead Log (WAL) + periodic snapshots for durability
- Load Balancing: Smart routing directly to the correct owner node
- Clone this repository:
cd /Users/huseyin/tet- Install dependencies:
pip install -r requirements.txtEach node needs to know:
- Its own URL (
CURRENT_NODE_URL) - All cluster nodes (
CLUSTER_NODES) - Optional: Replication factor (
REPLICATION_FACTOR, default: 3)
Terminal 1 - Node 1 (Port 8001):
export CURRENT_NODE_URL="http://localhost:8001"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node1"
export PORT=8001
python main.pyTerminal 2 - Node 2 (Port 8002):
export CURRENT_NODE_URL="http://localhost:8002"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node2"
export PORT=8002
python main.pyTerminal 3 - Node 3 (Port 8003):
export CURRENT_NODE_URL="http://localhost:8003"
export CLUSTER_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export NODE_NAME="node3"
export PORT=8003
python main.pyTerminal 4 - Load Balancer (Port 9000):
export DATABASE_NODES="http://localhost:8001,http://localhost:8002,http://localhost:8003"
export LB_PORT=9000
python load_balancer.pyAll requests go through the load balancer on port 9000:
Set a key-value pair:
curl -X POST "http://localhost:9000/v1/my_key" \
-H "Content-Type: application/json" \
-d '{"value": "my_value"}'Get a value:
curl "http://localhost:9000/v1/my_key"Delete a key:
curl -X DELETE "http://localhost:9000/v1/my_key"You can also access nodes directly (they will forward requests to the correct owner):
# Set a key
curl -X POST "http://localhost:8001/v1/test_key" \
-H "Content-Type: application/json" \
-d '{"value": "test_value"}'
# Get a key
curl "http://localhost:8001/v1/test_key"
# Delete a key
curl -X DELETE "http://localhost:8001/v1/test_key"CURRENT_NODE_URL: The URL of this node (e.g.,http://localhost:8001)CLUSTER_NODES: Comma-separated list of all cluster nodesNODE_NAME: Name/identifier for this node (optional, default:node1)PORT: Port to run on (optional, default:8000)REPLICATION_FACTOR: Number of replicas (optional, default:3)
DATABASE_NODES: Comma-separated list of database nodesLB_PORT: Port for load balancer (optional, default:9000)
.
├── main.py # Database node server (Steps 1-5)
├── load_balancer.py # Load balancer server (Step 6)
├── hashing.py # Consistent hashing implementation (Step 3)
├── requirements.txt # Python dependencies
└── README.md # This file
# Runtime files (created automatically)
├── wal.log # Write-Ahead Log
└── snapshot.db # Database snapshot
Keys are hashed and assigned to nodes using a consistent hash ring. This ensures:
- Even distribution of keys
- Minimal rehashing when nodes are added/removed
Each write is replicated to N nodes:
- Primary node receives the write
- Primary node writes to its WAL and database
- Primary node replicates to replica nodes asynchronously
- Write-Ahead Log (WAL): All writes are logged to
wal.logbefore updating the database - Checkpointing: Every 5 minutes, the database state is saved to
snapshot.db - Recovery: On startup, the system loads the snapshot and replays the WAL
- Client sends request to load balancer
- Load balancer calculates owner node using consistent hashing
- Load balancer forwards request to owner node
- Owner node processes request and replicates to replicas (if write operation)
- Response is returned to client
# Set some keys
curl -X POST "http://localhost:9000/v1/user1" -H "Content-Type: application/json" -d '{"value": "Alice"}'
curl -X POST "http://localhost:9000/v1/user2" -H "Content-Type: application/json" -d '{"value": "Bob"}'
curl -X POST "http://localhost:9000/v1/user3" -H "Content-Type: application/json" -d '{"value": "Charlie"}'
# Get values
curl "http://localhost:9000/v1/user1"
curl "http://localhost:9000/v1/user2"
# Delete a key
curl -X DELETE "http://localhost:9000/v1/user3"- Start 3 nodes and load balancer
- Write some data
- Stop one node
- Try to read data - should still work (reads from replicas)
# Check load balancer health
curl "http://localhost:9000/health"
# Check individual node status
curl "http://localhost:8001/"- This is a learning project and not production-ready
- Error handling is minimal
- No authentication or authorization
- Network partitions are not handled
- Conflict resolution is not implemented
- Data consistency is eventual (for replication)
Potential improvements:
- Add authentication/authorization
- Implement stronger consistency models
- Add monitoring and metrics
- Handle network partitions
- Implement node join/leave protocols
- Add data compression
- Implement backups
- Add transaction support