The Next-Generation Framework for Multimodal Model Training with Agents
by MVP Lab.
We HATE over-abstraction.
MVP Engine is a lightweight training engine for multimodal model research. Its core principle is simple: keep stable orchestration in mvp_engine/, and keep experiment-specific model, data, optimizer, scheduler, and training logic in recipes/.
Most training frameworks become heavily abstracted because they need to support every model family, data format, parallel strategy, and training trick through one reusable API surface. That pressure is real, but the result is often a deep stack of config switches, adapters, hooks, and indirection that makes simple experiments hard to read and hard to modify.
MVP Engine resolves this tension with two agent-era interfaces: kits and skills. The core engine stays small and boring: launch, config merge, distributed setup, logging, checkpointing, and the training loop. Stable reusable training capabilities live as callable mvp_engine.kit APIs. Variable model- or recipe-specific workflows live as agent-facing skills/ instructions that explain how to use, extend, or deliberately bypass those APIs.
Kits are intentionally practical: a kit is a small suite of APIs that a user or agent can call from an engine to get real work done. For example, an MLLM recipe can call MLLMDataKit for processor/dataset/packing/collation orchestration, MLLMModelKit for model setup features, TokenNormedLossKit for token-normalized loss, MFUKit for MFU logging, and OptimKit for optimizer/scheduler construction. A skill should not re-teach an agent to hand-write behavior already covered by a kit; it should explain the kit API, its boundaries, and the extension points.
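Token-normalized loss is a good example of a capability that belongs in a kit. Below is a minimal plain-Python sketch of the general technique. It assumes that per-token negative log-likelihoods and a validity mask are already available for each micro-batch. The idea is to sum the loss over all valid tokens and divide by the total valid-token count, rather than averaging per-micro-batch means. That way, micro-batches with many padded or masked tokens do not get inflated weight. The function names are hypothetical and are not the `TokenNormedLossKit` API.

```python
from typing import List, Sequence, Tuple

# Each micro-batch is (per-token NLL values, validity mask); this layout is an assumption.
MicroBatch = Tuple[Sequence[float], Sequence[bool]]


def microbatch_sums(nll: Sequence[float], mask: Sequence[bool]) -> Tuple[float, int]:
    """Return (sum of NLL over valid tokens, number of valid tokens)."""
    total = sum(x for x, m in zip(nll, mask) if m)
    count = sum(1 for m in mask if m)
    return total, count


def token_normed_loss(batches: List[MicroBatch]) -> float:
    """Sum loss over every valid token, then divide once by the global token count."""
    total, count = 0.0, 0
    for nll, mask in batches:
        s, c = microbatch_sums(nll, mask)
        total += s
        count += c
    # In a distributed run, total and count would be summed across ranks before dividing.
    return total / max(count, 1)


def per_batch_mean_loss(batches: List[MicroBatch]) -> float:
    """Naive alternative for comparison: average of per-micro-batch means."""
    means = []
    for nll, mask in batches:
        s, c = microbatch_sums(nll, mask)
        if c:
            means.append(s / c)
    return sum(means) / len(means)


batches = [
    ([1.0, 1.0, 1.0, 1.0], [True, True, True, True]),  # 4 valid tokens
    ([3.0, 0.0], [True, False]),                         # 1 valid token, 1 padding
]
print(token_normed_loss(batches))    # 1.4
print(per_batch_mean_loss(batches))  # 2.0
```

In this example the token-normalized loss is 7 / 5 = 1.4. Averaging per-batch means gives (1.0 + 3.0) / 2 = 2.0, because the single valid token in the second micro-batch counts as much as all four tokens in the first. In a multi-GPU run, the numerator and denominator would typically be summed across ranks (for example with `torch.distributed.all_reduce`) before the division.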
- Engine as the orchestration layer: `mvp_engine/engine/engine.py` defines the base `Engine` class and the train workflow (`before_train` -> `do_train` -> `after_train`). Subclasses implement `prepare_*` methods and step hooks such as `train_pre_step` and `forward_step`.
- Core-only shared package: common code in `mvp_engine/` should stay generic, minimal, and stable.
- Kits as callable API suites: `mvp_engine/kit/` groups stable capabilities that recipes can call directly from their engines. Kits should have clear boundaries, small public APIs, and concrete extension points.
- Hydra configuration: `mvp_engine/launch.py` merges the default config with recipe configs and launches the requested workflow (`train`, `evaluate`, or custom).
- Logging system: metrics are aggregated and dispatched to terminal/file backends; additional backends can be added with minimal changes.
- Skills: reusable agent workflows that explain how to use kits, extend them, or implement model-specific glue when no clean kit API fits.
- Recipes: experiment-specific logic still lives in each recipe, so task-specific formats and model behavior can evolve without adding brittle abstractions to the core engine.
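The engine bullet describes a template-method design. The base class owns the fixed workflow, and subclasses fill in `prepare_*` methods and step hooks. Below is a minimal pure-Python sketch of that shape. Only `before_train`, `do_train`, `after_train`, `train_pre_step`, and `forward_step` come from the description above. `prepare_model`, `prepare_data`, `ToyEngine`, the `events` trace, and all signatures are hypothetical. The real `Engine` in `mvp_engine/engine/engine.py` will differ; for instance, it also handles distributed setup, backward passes, optimizer steps, logging, and checkpointing.

```python
from typing import Any, Dict, List


class Engine:
    """Hypothetical minimal base class: train() runs before_train -> do_train -> after_train."""

    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.cfg = cfg
        self.data: List[Any] = []
        self.events: List[str] = []  # trace of the workflow, for illustration only

    # prepare_* hooks (hypothetical names): subclasses build their own components.
    def prepare_model(self) -> None:
        raise NotImplementedError

    def prepare_data(self) -> None:
        raise NotImplementedError

    # Step hooks: optional pre-step work and a required forward pass.
    def train_pre_step(self, step: int) -> None:
        pass

    def forward_step(self, batch: Any) -> float:
        raise NotImplementedError

    # Fixed workflow owned by the base class.
    def before_train(self) -> None:
        self.prepare_model()
        self.prepare_data()
        self.events.append("before_train")

    def do_train(self) -> None:
        for step, batch in enumerate(self.data):
            self.train_pre_step(step)
            loss = self.forward_step(batch)
            # A real engine would run backward and optimizer steps here.
            self.events.append(f"step {step}: loss={loss}")

    def after_train(self) -> None:
        self.events.append("after_train")

    def train(self) -> None:
        self.before_train()
        self.do_train()
        self.after_train()


class ToyEngine(Engine):
    """A recipe-side subclass overrides only the pieces that differ per experiment."""

    def prepare_model(self) -> None:
        self.weight = self.cfg.get("weight", 1.0)

    def prepare_data(self) -> None:
        self.data = [[1.0, 2.0], [3.0]]

    def forward_step(self, batch: Any) -> float:
        # Toy "loss": weighted sum of the batch values.
        return self.weight * sum(batch)


engine = ToyEngine({"weight": 2.0})
engine.train()
print(engine.events)
# ['before_train', 'step 0: loss=6.0', 'step 1: loss=6.0', 'after_train']
```

Keeping the loop in the base class means a recipe overrides only what its experiment needs. This matches the split between `mvp_engine/` and `recipes/`.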
- Keep the core engine minimal and reusable (`mvp_engine/`).
- Use `mvp_engine.kit` APIs for stable reusable capabilities.
- Place task-specific model/data/training logic in `recipes/`.
- Use a coding AI to follow the relevant skills in `skills/` when a workflow needs guidance, extension, or recipe-specific glue.
- Let the AI assemble or modify recipe code/configs for your target training objective.
- `mvp_engine/`: core orchestration logic, `Engine` base class, reusable kits, logging, distributed helpers, training utilities
- `mvp_engine/kit/`: callable API suites for common training capabilities
- `recipes/`: experiment-specific configs and custom engine/model/data definitions
- `skills/`: reusable agent skills that a coding AI uses to apply kits and implement recipe customization patterns
- `outputs/`: run outputs, logs, and checkpoints
```bash
uv venv --python=3.12
source .venv/bin/activate
uv sync
# Recipes that use `flash_attention_2` may require a FlashAttention wheel that
# matches the local Python, CUDA, PyTorch, and C++ ABI versions.
```

```bash
# Demo Training Command
torchrun --nproc_per_node=1 -m mvp_engine.launch --config ./recipes/magic_transformer/configs/train.yaml
```

```bash
uv pip install pre-commit
pre-commit install
```
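The demo command passes a recipe config through `--config`, and `mvp_engine/launch.py` merges recipe configs over the default config. The real launcher uses Hydra, so the exact merge rules are Hydra/OmegaConf's (see `OmegaConf.merge`), which also cover interpolation and structured configs. The following is only a rough, hypothetical sketch of override semantics for nested mappings: nested dicts are merged key by key, and any other recipe value replaces the default. The config keys are made up for illustration.

```python
import copy
from typing import Any, Dict


def merge_configs(default: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config: `override` applied on top of a deep copy of `default`."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge recursively so untouched defaults survive.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, or new keys: the recipe value wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, for illustration only.
default_cfg = {"trainer": {"max_steps": 1000, "log_every": 10}, "seed": 0}
recipe_cfg = {"trainer": {"max_steps": 200}, "model": {"name": "toy"}}

cfg = merge_configs(default_cfg, recipe_cfg)
print(cfg)
# {'trainer': {'max_steps': 200, 'log_every': 10}, 'seed': 0, 'model': {'name': 'toy'}}
```

Because the merge works on a copy, the default config stays untouched. This lets several recipes be layered over the same defaults without interfering with each other.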