Agentic coding harness for small models
A 12-phase deterministic pipeline that makes small and open-weight models reliable on real repository tasks. Tree-sitter symbol extraction builds a code graph for repo intelligence. SecretGuard credential scanning. 4-tier command-risk gating. Docker sandboxing with network isolation. Mock mode for keyless evaluation. Solo-built end-to-end.

CUDA kernels for SVD-based model optimization
Parameter-efficient fine-tuning via SVD. Decomposes weight matrices, freezes the U/V bases, and learns lightweight scalars. Hand-written CUDA kernels managing GPU shared memory, thread synchronization, and warp-level execution. Profiled with Nsight Compute. ONNX export for cross-architecture deployment. HuggingFace Trainer integration. Full test suite, CI/CD. 700+ installs.

221M-parameter transformer from scratch
Full decoder-only transformer built at the numerical level in PyTorch. Implements RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU gated activations, grouped multi-head attention, mixed-precision training (fp16/bf16), and gradient checkpointing. Trained on a single 4GB GPU with reproducible scripts. No cloud budget, no framework wrappers, just raw PyTorch.

Mixed-precision quantization pipeline
Custom mixed int6/int8 quantization with per-row clip-search calibration for 24–27M parameter transformers. Per-layer numerical error analysis under strict model-size constraints. Investigated accuracy-compression tradeoffs across quantization configurations.
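The per-row clip search mentioned for the quantization pipeline can be sketched roughly as below. This is a minimal illustration in plain Python, not the pipeline's actual code: the function names (`quantize_row`, `clip_search`, `calibrate`), the linear grid of candidate clip thresholds, and the mean-squared-error criterion are all assumptions made for the example.

```python
def quantize_row(row, bits, clip):
    """Symmetric uniform quantization of one row with a given clip threshold."""
    qmax = 2 ** (bits - 1) - 1          # 31 for int6, 127 for int8
    scale = clip / qmax if clip > 0 else 1.0
    # Round to the integer grid, clamp to [-qmax, qmax], then dequantize.
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return [c * scale for c in codes], scale


def clip_search(row, bits, num_candidates=20):
    """Pick the clip threshold (a fraction of max|x|) that minimizes MSE."""
    max_abs = max(abs(x) for x in row)
    if max_abs == 0.0:
        return 0.0, 0.0                 # all-zero row: nothing to calibrate
    best_clip, best_err = max_abs, float("inf")
    for i in range(1, num_candidates + 1):
        clip = max_abs * i / num_candidates
        dequant, _ = quantize_row(row, bits, clip)
        err = sum((a - b) ** 2 for a, b in zip(row, dequant)) / len(row)
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


def calibrate(matrix, bits_per_row):
    """Run the clip search independently per row (hypothetical helper).

    bits_per_row holds the assumed bit width (e.g. 6 or 8) for each row.
    """
    return [clip_search(row, bits) for row, bits in zip(matrix, bits_per_row)]
```

For a row with one large outlier, a smaller clip can trade saturation of the outlier for finer resolution on the remaining values. The search keeps whichever candidate gives the lowest reconstruction error for that row.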
📌 TransKV: Transactional KV Staging for Speculative Decoding under Paged KV Memory · TechRxiv (IEEE), 2026 · Under submission to an EMNLP 2026 workshop
Speculative decoding inflates KV cache pages and fragments memory under load. TransKV isolates speculative state in staging blocks, commits only accepted tokens, and rolls back the rest. It achieves a 14–41% reduction in committed-cache write traffic across Qwen2.5 model pairs, with formal output-equivalence proofs.
📌 Blockchain-Based E-Voting with Proof-of-Work and ML — IET Blockchain, 2023 (peer-reviewed)
📌 NeRF: A Comprehensive Survey — IJISRT, 2023
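The stage, commit, and rollback flow described for TransKV above can be pictured with a toy data structure. This is only a sketch of the transactional idea under simplifying assumptions, not the paper's implementation: the class and method names are hypothetical, and plain Python lists stand in for paged KV blocks of tensors.

```python
class StagedKVCache:
    """Toy model: committed KV entries plus a separate staging area."""

    def __init__(self):
        self.committed = []        # entries for tokens that are part of the output
        self.staging = []          # entries for draft tokens awaiting verification
        self.committed_writes = 0  # number of writes into the committed cache

    def stage(self, draft_entries):
        """Hold speculative KV entries outside the committed cache."""
        self.staging = list(draft_entries)

    def resolve(self, num_accepted):
        """Commit the accepted prefix and discard the rejected tail.

        Returns the number of staged entries that were rolled back.
        """
        num_accepted = max(0, min(num_accepted, len(self.staging)))
        accepted = self.staging[:num_accepted]
        self.committed.extend(accepted)
        self.committed_writes += len(accepted)
        rolled_back = len(self.staging) - num_accepted
        self.staging = []          # rejected entries never reach the committed cache
        return rolled_back
```

Staging four draft entries and then calling `resolve(2)` commits the first two and drops the other two. In this toy model, rejected draft tokens cost no committed-cache writes.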
I contribute to ML infrastructure that other engineers depend on.
vLLM · PR #44693 — Runtime memory optimization and KV-cache scheduling
CocoIndex · PR #1010 — Native Rust/PyO3 crate, throughput bottleneck fix
sglang · Structured generation, inference scheduling
nano-vllm · Lightweight LLM inference engine
3× Kaggle Master · Codeforces · AWS ML Specialty · Previously at Aerolift.AI and JPMorgan Chase




