Data engineer who builds infrastructure from scratch to understand how it actually works.
Based in Mombasa, Kenya. I write Python, SQL, and whatever gets the pipeline running.
I work on data pipelines and infrastructure. Most of my recent work is a series of from-scratch implementations of tools I use daily. No frameworks, no dependencies, just the core algorithms:

| Project | What it is |
|---|---|
| streamlite | Stream processing engine - windowing, watermarks, keyed state, checkpoints. Flink internals demystified. |
| brokerlite | Message broker with pub/sub, consumer groups, WAL, dead letter queues. Kafka-inspired. |
| raftkv | Distributed key-value store with Raft consensus - leader election, log replication, strong consistency. |
| queryforge | SQL query engine - lexer, parser, optimizer, executor. SELECT, JOIN, GROUP BY, subqueries over CSV/JSON. |
| searchlite | Full-text search engine - inverted index, BM25 scoring, Porter stemmer, faceted search. |
| cachelite | In-memory cache with LRU/LFU/FIFO eviction, TTL, snapshots, HTTP API. |
| cronlite | Task scheduler - POSIX cron syntax, priority queues, DAG dependencies, retry strategies, SQLite persistence. |
| vaultlite | Secrets manager with AES-128 implemented from scratch. Envelope encryption, seal/unseal, audit logging, versioning. |
| gatelite | API gateway - routing, rate limiting, JWT auth, circuit breaking, load balancing, caching. |
| tracelite | Distributed tracing - W3C Trace Context, sampling, critical path analysis, waterfall visualization. |
| servekit | HTTP/1.1 server built from raw TCP sockets. |
| tinylang | Programming language interpreter - lexer, parser, AST, closures, first-class functions. |

Every one of these has zero dependencies and uses only the Python standard library.

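
Here is what "just the core algorithms" looks like in practice: a minimal sketch of BM25 ranking over an inverted index, using only the standard library. This is not searchlite's code. The function names, the naive whitespace tokenizer (no Porter stemming), and the defaults `k1=1.2`, `b=0.75` (common textbook values) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: list[str]) -> tuple[dict[str, dict[int, int]], list[int]]:
    """Map each term to {doc_id: term_frequency}; also return document lengths."""
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()  # naive tokenizer: no stemming, no stopwords
        lengths.append(len(terms))
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query: str, index, lengths: list[int],
                k1: float = 1.2, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    if not lengths:
        return []
    n_docs = len(lengths)
    avgdl = sum(lengths) / n_docs or 1.0  # guard against all-empty documents
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})  # .get avoids creating empty postings
        df = len(postings)
        if df == 0:
            continue
        # Lucene-style IDF, which stays positive even for very common terms
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: long documents need more hits to score high
            norm = k1 * (1 - b + b * lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
index, lengths = build_index(docs)
# Only document 0 matches: without a stemmer, "cats" never matches "cat"
print(bm25_search("cat mat", index, lengths))
```

The missed "cats" match is exactly the gap a stemmer closes, which is why a full search engine normalizes terms before indexing rather than at query time only.
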
| Project | Stack |
|---|---|
| afridata-pipeline | World Bank API to DuckDB star-schema warehouse. Dimensional modeling, data quality checks, Vercel dashboard. |
| realtime-event-pipeline | Kafka + DuckDB streaming pipeline. Ingestion, transformation, enrichment, OLAP analytics. |
| dbt-ecommerce-warehouse | dbt + DuckDB analytics warehouse. Star schema, 50+ tests, custom macros, incremental models. |
| stock-market-data-pipeline | Real-time stock tracking. Airflow, Spark, Slack alerts, Metabase dashboards. |
| datapact | Data quality and contract validation library. Declare expectations, enforce in pipelines and CI. |
| datadrift | Drift detection framework - schema changes, distribution shifts, statistical testing, HTML reports. |
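
Drift detection of the kind datadrift describes often starts with a two-sample test per numeric column. Below is a minimal stdlib sketch assuming the two-sample Kolmogorov-Smirnov test with its large-sample critical value. The function names are hypothetical and this is not datadrift's actual API.

```python
import bisect
import math


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # Both CDFs are step functions that only jump at observed values,
    # so checking every observed value is enough to find the maximum gap.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def has_drifted(reference: list[float], current: list[float], alpha: float = 0.05) -> bool:
    """Flag drift when D exceeds the large-sample critical value at significance alpha."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))


baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]
print(ks_statistic(baseline, shifted), has_drifted(baseline, shifted))
```

The critical-value formula is an asymptotic approximation; for small samples, an exact table or a proper p-value is the safer choice.
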

| Project | What it does |
|---|---|
| documind | RAG document Q&A. Hybrid search (BM25 + TF-IDF), cited answers, pluggable LLMs. |
| ai-agent-toolkit | Composable agent framework - tool use, memory, multi-agent orchestration. Core is under 1,000 lines. |
| pipeforge | CI/CD pipeline generator - analyzes codebases and outputs GitHub Actions, GitLab CI, Docker configs. |
| vectorlite | Vector search engine - Flat, IVF, HNSW indexes with cosine/euclidean/dot product. |
| airbnb-clone | Full-stack MERN app. MongoDB, Express, React, Node. Auth, search, bookings, image upload. |
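
Of the index types listed for vectorlite, Flat is the exact baseline that IVF and HNSW approximate. Here is a minimal sketch of exact k-nearest-neighbour search under the three named metrics, in pure standard-library Python. The `FlatIndex` class and its methods are hypothetical and not vectorlite's API.

```python
import heapq
import math


class FlatIndex:
    """Brute-force (exact) vector index: the query is compared against every stored vector."""

    def __init__(self, metric: str = "cosine") -> None:
        if metric not in ("cosine", "euclidean", "dot"):
            raise ValueError(f"unknown metric: {metric}")
        self.metric = metric
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, vec_id: str, vector: list[float]) -> None:
        self.ids.append(vec_id)
        self.vectors.append(list(vector))

    def _similarity(self, a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        if self.metric == "dot":
            return dot
        if self.metric == "euclidean":
            # Negate the distance so "higher is better" holds for every metric
            return -math.dist(a, b)
        norm = math.hypot(*a) * math.hypot(*b)
        return dot / norm if norm else 0.0

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k most similar (id, score) pairs, best first."""
        scored = ((self._similarity(query, v), vid) for vid, v in zip(self.ids, self.vectors))
        return [(vid, score) for score, vid in heapq.nlargest(k, scored)]


index = FlatIndex("cosine")
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [1.0, 1.0])
print(index.search([1.0, 0.1], k=2))
```

Every query scans all vectors, so cost grows linearly with the collection. IVF (scan only the nearest clusters) and HNSW (walk a proximity graph) give up a little recall to avoid that full scan.
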
- BSc Mathematics and Computer Science, JKUAT
- Data Engineering certs from ExploreAI Academy and Wizeline Academy
- AWS Certified Cloud Practitioner
- Day-to-day: Python, SQL, dbt, Airflow, Spark, Kafka, DuckDB, BigQuery, Docker, GCP, Azure


