🚀
God Speed
  • United States
  • LinkedIn in/anayd

MrAnayDongre/README.md

Anay Dongre

ML Systems Engineer

I build LLM inference systems, agentic AI platforms, and GPU kernels.



🔨 Shipped

Agentic coding harness for small models

12-phase deterministic pipeline that makes small and open-weight models reliable on real repository tasks. Tree-sitter symbol extraction builds a code graph for repo intelligence. SecretGuard credential scanning. 4-tier command-risk gating. Docker sandboxing with network isolation. Mock mode for keyless evaluation. Solo-built end-to-end.

8 LLM providers including
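As a rough illustration of what tiered command-risk gating can look like, here is a minimal sketch. It is not the harness's actual code: the tier names, the `RULES` table, and the `classify`/`gate` helpers are all hypothetical, and a real gate would need far more careful parsing than a lookup on the first token.

```python
import shlex
from enum import IntEnum


class Risk(IntEnum):
    """Four hypothetical tiers, ordered from least to most risky."""
    SAFE = 0       # read-only inspection
    CAUTION = 1    # changes files inside the workspace
    DANGEROUS = 2  # network access, installs, deletions
    BLOCKED = 3    # never executed


# Hypothetical rule table keyed by program name.
RULES = {
    "ls": Risk.SAFE, "cat": Risk.SAFE, "grep": Risk.SAFE,
    "git": Risk.CAUTION, "mv": Risk.CAUTION, "pytest": Risk.CAUTION,
    "pip": Risk.DANGEROUS, "curl": Risk.DANGEROUS, "rm": Risk.DANGEROUS,
    "sudo": Risk.BLOCKED, "mkfs": Risk.BLOCKED,
}

SHELL_META = (";", "&", "|", "`", "$(")


def classify(command: str) -> Risk:
    """Map a shell command string to a risk tier."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Risk.BLOCKED  # unparseable input (e.g. unbalanced quotes)
    if not tokens:
        return Risk.BLOCKED
    # Unknown programs default to DANGEROUS rather than SAFE.
    risk = RULES.get(tokens[0], Risk.DANGEROUS)
    # Chaining or substitution can hide a second command, so escalate.
    if any(meta in command for meta in SHELL_META):
        risk = max(risk, Risk.DANGEROUS)
    return risk


def gate(command: str, max_auto: Risk = Risk.CAUTION) -> str:
    """Return 'run', 'ask' (needs approval), or 'deny'."""
    risk = classify(command)
    if risk is Risk.BLOCKED:
        return "deny"
    return "run" if risk <= max_auto else "ask"
```

For example, `gate("ls -la")` returns `"run"`, `gate("pip install requests")` returns `"ask"`, and `gate("sudo reboot")` returns `"deny"`. Defaulting unknown programs to the `DANGEROUS` tier keeps the gate conservative when the rule table is incomplete.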

CUDA kernels for SVD-based model optimization

Parameter-efficient fine-tuning via singular value decomposition (SVD). Decomposes weight matrices, freezes the U/V bases, and learns lightweight scalars. Hand-written CUDA kernels manage GPU shared memory, thread synchronization, and warp-level execution. Profiled with Nsight Compute. ONNX export for cross-architecture deployment. HuggingFace Trainer integration. Full test suite, CI/CD.

700+ installs · pip install eigentune
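The "freeze the bases, learn scalars" idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the eigentune API: the per-singular-value scale vector `z`, the helper names, and the squared-error objective are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "pretrained" weight matrix of shape (out_features, in_features).
W = rng.standard_normal((8, 6))

# Decompose once; U, s and Vt stay frozen afterwards.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# The only trainable parameters: one scalar per singular value (assumed form).
z = np.ones_like(s)


def forward(x, z):
    """Compute x @ W_adapted.T with W_adapted = U @ diag(s * z) @ Vt."""
    return ((x @ Vt.T) * (s * z)) @ U.T


def loss(x, target, z):
    """Half the mean squared error over the batch."""
    return 0.5 * np.mean(np.sum((forward(x, z) - target) ** 2, axis=1))


def grad_z(x, target, z):
    """Analytic gradient of the loss with respect to z."""
    h = x @ Vt.T
    err = (forward(x, z) - target) / len(x)
    return ((err @ U) * h * s).sum(axis=0)


# Toy regression target generated by a different, hidden scaling.
x = rng.standard_normal((32, 6))
z_true = rng.uniform(0.5, 1.5, size=s.shape)
target = forward(x, z_true)

loss_before = loss(x, target, z)
for _ in range(50):
    z = z - 0.01 * grad_z(x, target, z)  # plain gradient descent
loss_after = loss(x, target, z)
```

With `z` fixed at ones the layer reproduces the original weights exactly, and the adapter here trains 6 scalars instead of the 48 entries of `W`.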

221M-parameter transformer from scratch

Full decoder-only transformer built at the numerical level in PyTorch. Implements RMSNorm, Rotary Positional Embeddings (RoPE), SwiGLU gated activations, grouped multi-head attention, mixed-precision training (fp16/bf16), and gradient checkpointing. Trained on a single 4GB GPU with reproducible scripts. No cloud budget, no framework wrappers, just raw PyTorch.
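For readers unfamiliar with three of the components named above, here is a minimal NumPy sketch of RMSNorm, RoPE, and a SwiGLU feed-forward block. It is a reference-style illustration under stated assumptions, not the repository's PyTorch code: the "rotate-half" RoPE layout, the base of 10000, and all function and variable names are choices made for this example.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    """Scale each vector by the reciprocal of its root-mean-square."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def rope(x, positions, base=10000.0):
    """Rotate feature pairs (i, i + d/2) by position-dependent angles."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angles = np.asarray(positions)[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def silu(x):
    return x / (1.0 + np.exp(-x))


def swiglu(x, w_gate, w_up, w_down):
    """Gated feed-forward: (silu(x W_gate) * (x W_up)) W_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # (sequence length, model dim)

normed = rms_norm(x, np.ones(8))
rotated = rope(x, np.arange(5))  # in a transformer this is applied to q and k
out = swiglu(
    x,
    rng.standard_normal((8, 16)),
    rng.standard_normal((8, 16)),
    rng.standard_normal((16, 8)),
)
```

Two properties make RoPE convenient: the rotation preserves vector norms, and the dot product between a rotated query and key depends only on the distance between their positions.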

Parameter Golf

Mixed-precision quantization pipeline

Custom mixed int6/int8 quantization with per-row clip-search calibration for 24–27M parameter transformers. Per-layer numerical error analysis under strict model-size constraints. Investigated accuracy-compression tradeoffs across quantization configurations.
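A per-row clip search can be sketched as follows. This is a simplified illustration rather than the pipeline itself: symmetric quantization, the grid of clip ratios, the mean-squared-error criterion, and the name `quantize_rows` are all assumptions for the example.

```python
import numpy as np


def quantize_rows(W, bits, ratios=np.linspace(0.5, 1.0, 11)):
    """Symmetric per-row quantization; picks the clip ratio with lowest MSE.

    Returns (integer codes, per-row scales, per-row reconstruction MSE).
    """
    qmax = 2 ** (bits - 1) - 1  # 31 for int6, 127 for int8
    absmax = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-12)
    best_err = np.full(W.shape[0], np.inf)
    best_q = np.zeros(W.shape, dtype=np.int8)
    best_scale = np.ones((W.shape[0], 1))
    for r in ratios:
        scale = r * absmax / qmax  # clip threshold is r * absmax
        q = np.clip(np.round(W / scale), -qmax, qmax)
        err = ((q * scale - W) ** 2).mean(axis=1)
        better = err < best_err  # rows where this ratio improves the error
        best_err[better] = err[better]
        best_q[better] = q[better].astype(np.int8)
        best_scale[better] = scale[better]
    return best_q, best_scale, best_err


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 64))
W[0, 0] = 12.0  # an outlier that makes clipping worthwhile for row 0

q6, s6, e6 = quantize_rows(W, bits=6)
q8, s8, e8 = quantize_rows(W, bits=8)
_, _, e6_noclip = quantize_rows(W, bits=6, ratios=(1.0,))
```

Comparing `e6` and `e8` row by row (or layer by layer) is one simple way to decide which tensors can tolerate int6 and which should stay at int8 under a fixed size budget.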


📄 Research

 📌 TransKV: Transactional KV Staging for Speculative Decoding under Paged KV Memory · TechRxiv (IEEE), 2026 · Under submission to EMNLP 2026 workshop

Speculative decoding inflates KV cache pages and fragments memory under load. TransKV isolates speculative state in staging blocks, commits only accepted tokens, and rolls back the rest. Across Qwen2.5 model pairs, it reduces committed-cache write traffic by 14–41%, with formal output-equivalence proofs.
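The stage/commit/rollback pattern can be illustrated with a toy cache. This is a conceptual sketch only, not the algorithm from the paper: the `StagedKVCache` class, its method names, and the use of plain Python lists as "pages" are hypothetical.

```python
class StagedKVCache:
    """Toy paged cache where draft tokens live in a staging area."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = []           # committed pages
        self.staging = []          # speculative entries, not yet committed
        self.committed_writes = 0  # writes that touched committed pages

    def _write(self, kv):
        # Open a new page when the last one is full.
        if not self.blocks or len(self.blocks[-1]) == self.block_size:
            self.blocks.append([])
        self.blocks[-1].append(kv)
        self.committed_writes += 1

    def append(self, kv):
        """Write a verified (non-speculative) entry directly."""
        self._write(kv)

    def stage(self, kv):
        """Hold a draft token's entry outside the committed pages."""
        self.staging.append(kv)

    def commit(self, n_accepted):
        """Commit the accepted prefix and roll back the rejected tail."""
        if not 0 <= n_accepted <= len(self.staging):
            raise ValueError("n_accepted out of range")
        for kv in self.staging[:n_accepted]:
            self._write(kv)
        self.staging.clear()

    def tokens(self):
        return [kv for block in self.blocks for kv in block]


cache = StagedKVCache(block_size=4)
for kv in ["p0", "p1", "p2"]:      # prompt entries
    cache.append(kv)
for kv in ["d0", "d1", "d2", "d3"]:  # four draft tokens
    cache.stage(kv)
cache.commit(n_accepted=2)           # verifier accepted the first two
```

In this toy run the committed pages receive five writes instead of seven, because the two rejected draft entries never leave the staging area.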

 📌 Blockchain-Based E-Voting with Proof-of-Work and ML · IET Blockchain, 2023 (peer-reviewed)

 📌 NeRF: A Comprehensive Survey · IJISRT, 2023


🔧 Open Source

I contribute to ML infrastructure that other engineers depend on.

vLLM · PR #44693 — Runtime memory optimization and KV-cache scheduling  

CocoIndex · PR #1010 — Native Rust/PyO3 crate, throughput bottleneck fix  

sglang · Structured generation, inference scheduling  

nano-vllm · Lightweight LLM inference engine  


✍️ Writing


3× Kaggle Master · Codeforces · AWS ML Specialty · Previously at Aerolift.AI and JPMorgan Chase

Pinned

  1. eigentune (Public)

    Python · 1 star

  2. PatchQuest (Public)

    Local-first agentic coding harness for small and open-weight models.

    Python · 1 star

  3. cocoindex (Public)

    Forked from cocoindex-io/cocoindex

    Data transformation framework for AI. Ultra performant, with incremental processing.

    Python · 1 star

  4. Inference-Kernels (Public)

    LLM inference kernels from scratch in Triton: KV cache, FlashAttention, PagedAttention, RMSNorm, RoPE, SwiGLU, and benchmarks.

    Python · 1 star

  5. nano-vllm (Public)

    Forked from GeeeekExplorer/nano-vllm

    Nano vLLM

    Python

  6. Machine-Learning-Collection (Public template)

    Repo for Implementing Research Papers & Projects related to Machine Learning

    Python · 13 stars · 4 forks