Yashvardhan Gupta BrutalCaeser

👋 Hey, I'm Yash

I'm an MS AI student at Northeastern University (Silicon Valley), trying to understand how generative models learn the structure of language, video, and the physical world. I started out in mechanical engineering, so I care less about chasing leaderboards than about the math underneath. Calculus and Bayesian probability are the parts that feel right to me, and they happen to be the spine of diffusion and flow-based modeling.

Most of what I build doesn't work the first time. I've had research bets collapse (entropy-weighted masking, SIGReg representation lines), and the rigor of chasing them down is the real asset. So I write up the negative results too, and I'm still hunting for the problem that feels inevitable and mine.

🔬 Working on diffusion / flow-matching language models, video world models, and representation collapse in self-supervised learning.
🛠️ Currently an AI/GenAI Engineering Co-op at NovasIQ, building deterministic agentic systems on Claude + MCP.
🎓 Reinforcement Learning teaching assistant at Khoury College (DQN, PPO, policy gradients).
🌱 Long game: a startup in robotics / physical intelligence, or an AI layer over the legacy software that bottlenecks real industries.
🔗 Full portfolio, projects & writing → brutalcaeser.github.io

🧭 What I'm working on now


🎯 reinforcing_dLLMs	Reinforcement learning (diffu-GRPO) for reasoning in diffusion LLMs. I validated the one-step log-prob estimator the method hinges on, then showed that RL lifts held-out Countdown performance by ~4 pp on a single GPU (a faithful run is ~24 GPU-days).
🧩 block-diffusion-pareto	Mapped the full quality↔throughput frontier for block-diffusion LMs and found that generation throughput peaks at block size 32, reportedly the same (unpublished) value a leading commercial model runs.
🌀 phantom-gradients	When a model's useful features live in far fewer dimensions than its embedding, training fights noise in the empty ones. A coherence-guided sampler recovers the structure without knowing the true dimension, beating even an oracle that does.
🌊 Flow-Language-Model	Reproducing and extending Flow-Map language models (one-step text generation), where I found a quality curve the original paper missed.
🤖 physical_ai	A 16-week sim-to-real track: Isaac Lab → GR00T → SO-ARM101.

✍️ Recent writing

From Noise to Shakespeare — building a diffusion language model from scratch
From K-Means to Gaussian Mixtures — the math, the intuition, and the EM algorithm
Lagrangians, KKT & SVMs — a mathematical journey to the margin

📌 A few things I've built

Diffusion_Robot_Control_Policy
microDLM ⭐ 1	From-scratch discrete diffusion language model on Tiny Shakespeare: 5 changes from GPT
spatial-jepa-sigreg	GAP 1: Distributional Regularization Meets Spatial Structure, SIGReg × Patch-Level JEPA Representations (N…
matrix-game-hpc	Matrix-Game-2.0 deployment on Northeastern Explorer HPC: docs, logs, and scripts
storyverse	Turn any children's storybook into an animated film: FLUX illustrations, Wan 2.2 animation, Edge TTS narra…
minigenie	Flow matching video world model for Procgen games, built from scratch in PyTorch

🧰 Tools I reach for

+ JAX/Flax · CUDA · Hugging Face · Claude + MCP · ONNX


Away from the screen: chess, tennis, and a slow walk through a philosophy reading list. Right now it's To Kill a Mockingbird, with Russell's The Problems of Philosophy on deck. Borrowed beliefs bore me.

📊 The "things I've built" list and the snake both refresh themselves on a schedule, so this page is never quite stale.
