I'm an MS AI student at Northeastern University (Silicon Valley), trying to understand how generative models learn the structure of language, video, and the physical world. I started out in mechanical engineering, so I care less about chasing leaderboards than about the math underneath. Calculus and Bayesian probability are the parts that feel right to me, and they happen to be the spine of diffusion and flow-based modeling.
Most of what I build doesn't work the first time. I've had research bets collapse (entropy-weighted masking, SIGReg representation lines), and the rigor of chasing them down is the real asset. So I write up the negative results too, and I'm still hunting for the problem that feels inevitable and mine.
- Working on diffusion / flow-matching language models, video world models, and representation collapse in self-supervised learning.
- Currently an AI/GenAI Engineering Co-op at NovasIQ, building deterministic agentic systems on Claude + MCP.
- Reinforcement Learning teaching assistant at Khoury College (DQN, PPO, policy gradients).
- Long game: a startup in robotics / physical intelligence, or an AI layer over the legacy software that bottlenecks real industries.
- Full portfolio, projects & writing → brutalcaeser.github.io
| Project | Summary |
| --- | --- |
| reinforcing_dLLMs | Reinforcement learning (diffu-GRPO) for reasoning in diffusion LLMs. I validated the one-step log-prob estimator the method hinges on, then showed RL lifts held-out Countdown ~4 pp on a single GPU (a faithful run is ~24 GPU-days). |
| block-diffusion-pareto | Mapped the full quality-throughput frontier for block-diffusion LMs and found generation throughput peaks at block size 32, the unpublished value a leading commercial model reportedly runs. |
| phantom-gradients | When a model's useful features live in far fewer dimensions than its embedding, training fights noise in the empty ones. A coherence-guided sampler recovers the structure without knowing the true dimension, beating even an oracle that does. |
| Flow-Language-Model | Reproducing and extending Flow-Map language models (one-step text generation), where I found a quality curve the original paper missed. |
| physical_ai | A 16-week sim-to-real track: Isaac Lab → GR00T → SO-ARM101. |
- From Noise to Shakespeare: building a diffusion language model from scratch
- From K-Means to Gaussian Mixtures: the math, the intuition, and the EM algorithm
- Lagrangians, KKT & SVMs: a mathematical journey to the margin
| Repository | Description |
| --- | --- |
| Diffusion_Robot_Control_Policy | |
| microDLM (★ 1) | From-scratch discrete diffusion language model on Tiny Shakespeare, 5 changes from GPT |
| spatial-jepa-sigreg | GAP 1: Distributional Regularization Meets Spatial Structure, SIGReg × Patch-Level JEPA Representations (N… |
| matrix-game-hpc | Matrix-Game-2.0 deployment on Northeastern Explorer HPC: docs, logs, and scripts |
| storyverse | Turn any children's storybook into an animated film: FLUX illustrations, Wan 2.2 animation, Edge TTS narra… |
| minigenie | Flow matching video world model for Procgen games, built from scratch in PyTorch |
+ JAX/Flax · CUDA · Hugging Face · Claude + MCP · ONNX
Away from the screen: chess, tennis, and a slow walk through a philosophy reading list. Right now it's To Kill a Mockingbird, with Russell's The Problems of Philosophy on deck. Borrowed beliefs bore me.
The "things I've built" list and the snake both refresh themselves on a schedule, so this page is never quite stale.