I'm a research engineer working on language-model evaluations, interpretability, and alignment. I build experiments that test what models represent internally, how their behavior changes across contexts, and where current monitoring methods fail.
I'm currently looking for full-time research engineering opportunities.
- llm_coherence — I investigate whether language-model preferences remain stable and transitive under parametric variation. This work supports "Incoherent Values? Probing LLM Preferences Through Parametric Variation."
- SPEC-GAP — I'm developing white-box probes and trajectory-level measurements for adversarial instructions that propagate through multi-agent, tool-using systems.
- femmestral — I fine-tuned Mistral 7B to address misinformation in women's health.
- self-correction-cot — I studied trajectory robustness under chain-of-thought perturbations.
- mess3-belief-geometry — I investigated the geometric structure of belief states in transformers.
- medical-safety-steering — I applied activation steering to Qwen 2.5 for medical-AI safety.
Website | Google Scholar | LinkedIn | Twitter

