🔗 Bridging Modal Isolation in Interleaved Thinking: Supervising Modality Transitions via Stepwise Reinforcement
reinforcement-learning ball-tracking sokoban spatial-reasoning visual-reasoning maze-solving chain-of-thought flow-matching multimodal-reasoning visual-generation unified-multimodal-models interleaved-thinking vlm-as-judge flow-grpo
Updated Jun 10, 2026 - Python