Predict LLM performance, memory scaling, and OOM risks before renting a single physical GPU.
llm-twin is a lightweight, zero-dependency CLI utility that acts as an architecture validator for distributed LLM deployments. By modeling non-linear scaling laws, attention overhead, and dynamic KV Cache expansion, it evaluates your workload configurations against hardware topologies to flag deployment bottlenecks before your code hits production.
- Zero-Dependency Core: Pure Python implementation that runs instantly in any environment.
- Dynamic KV Cache Modeling: Evaluates true context window VRAM demand, catching the hidden Out-of-Memory (OOM) failures that simple parameter-counting misses.
- Hardware Profile Validation: Ready-made templates, from single A100/H100 setups to 32x distributed clusters.
- CI/CD Ready: Returns structured simulation telemetry, ideal for gating performance regressions in pull requests.
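To make the KV cache point concrete, here is a minimal sketch of the standard back-of-the-envelope estimate. It is not llm-twin's internal model: the function names, the example model shape (32 layers, 32 KV heads, head dimension 128, roughly 7B parameters in fp16), and the 80 GiB budget are all illustrative assumptions, and activations and framework overhead are ignored.

```python
GIB = 1024 ** 3


def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Standard KV cache estimate: one K and one V tensor per layer,
    each holding kv_heads * head_dim values per token per sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def weight_bytes(params, bytes_per_param=2):
    """Memory for the weights alone (fp16/bf16 by default)."""
    return params * bytes_per_param


def fits(vram_bytes, params, layers, kv_heads, head_dim, seq_len, batch):
    """Return (fits, bytes_needed) for weights plus KV cache only."""
    need = weight_bytes(params) + kv_cache_bytes(
        layers, kv_heads, head_dim, seq_len, batch
    )
    return need <= vram_bytes, need


# Hypothetical 7B-class model on a single 80 GiB device (assumed numbers).
for batch in (1, 32, 64):
    ok, need = fits(80 * GIB, 7_000_000_000, 32, 32, 128, 4096, batch)
    print(f"batch={batch:>2}  need={need / GIB:6.1f} GiB  fits={ok}")
```

Under these assumptions the weights stay constant while the cache grows linearly with batch size and context length, so a configuration that passes a parameter-count check can still exceed the budget at larger batch sizes.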
- Clone the repository:

      git clone https://github.com/yourusername/llm-twin.git
      cd llm-twin
      chmod +x twin
This CLI serves as the open-source entry point to a comprehensive, event-driven hardware simulation platform. We are actively working on:
- Trace-driven Emulation: Simulating raw PyTorch/vLLM execution traces via discrete-event network models.
- Interconnect Topologies: Modeling NVLink bandwidth degradation and InfiniBand cluster bottlenecks.
- GitHub Action Gatekeeper: Automatically commenting on PRs with infrastructure scaling and cost changes.
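As a rough illustration of what a PR gate could compare, the sketch below diffs two telemetry snapshots and flags regressions. This is a hypothetical example, not the planned implementation: the metric names (`peak_vram_gib`, `cost_per_hour_usd`), the `compare_runs` helper, and the 5% tolerance are assumptions, and it treats every metric as lower-is-better with a nonzero baseline.

```python
def compare_runs(baseline, candidate, tolerance=0.05):
    """Compare two metric dicts (hypothetical schema, lower is better).

    Returns (report_lines, ok), where ok is False if any metric grew
    by more than `tolerance` relative to the baseline.
    """
    lines, ok = [], True
    for key in sorted(baseline):
        old, new = baseline[key], candidate[key]
        change = (new - old) / old  # assumes a nonzero baseline value
        regressed = change > tolerance
        ok = ok and not regressed
        flag = "REGRESSION" if regressed else "ok"
        lines.append(f"{key}: {old:g} -> {new:g} ({change:+.1%}) {flag}")
    return lines, ok


# Illustrative snapshots; these numbers are made up for the example.
baseline = {"peak_vram_gib": 40.0, "cost_per_hour_usd": 4.0}
candidate = {"peak_vram_gib": 50.0, "cost_per_hour_usd": 4.0}

report, ok = compare_runs(baseline, candidate)
print("\n".join(report))
```

A CI job could post the report lines as a PR comment and exit with a nonzero status when `ok` is `False`.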
We are actively shifting from our analytical model to a full-scale hardware simulation environment.
👉 [View our interactive Live Development Roadmap Board](https://github.com/users/Tejas163/projects/1) to see what we're building this week, track current sprints, or contribute!