Summary
Add first-class LoRA adapter training support to vime.
The initial version should support Megatron actor training + colocated vLLM rollout. The actor trains only LoRA adapter weights, vLLM serves the base model with the current adapter, and checkpoints save adapter weights in a PEFT-compatible format.
This RFC does not cover MLA architecture fields such as q_lora_rank or kv_lora_rank; those are model internals, not PEFT-style LoRA adapters.
Motivation
vime currently supports full-parameter Megatron training and parameter freezing, but it does not provide an adapter training loop. LoRA would reduce optimizer memory, checkpoint size, and weight-sync cost for SFT/RL runs on large models.
Scope
In scope for v1
- Megatron backend only.
- Actor LoRA only.
- Colocated rollout only.
- Dense models first.
- One adapter per run.
- PEFT-compatible adapter save/load.
- vLLM rollout using the latest actor adapter.
Out of scope for v1
- FSDP LoRA.
- MoE expert LoRA.
- Multi-LoRA serving or training.
- PD/disaggregated rollout.
- Distributed non-colocated adapter sync.
- Merging adapters into the base model.
Proposed CLI
--lora-rank 32
--lora-alpha 64
--lora-dropout 0.0
--target-modules all-linear
--exclude-modules lm_head
--lora-adapter-path /path/to/adapter
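A minimal sketch of how the proposed flags could be registered, assuming a plain argparse parser. The helper name `add_lora_arguments` and all default values are hypothetical; the RFC only fixes the flag names. A rank of 0 is used here to mean "LoRA disabled", which matches the `--lora-rank > 0` rule below.

```python
import argparse


def add_lora_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the proposed LoRA flags on an argparse parser (hypothetical helper)."""
    group = parser.add_argument_group("LoRA")
    # 0 means LoRA is disabled; any positive rank enables adapter training.
    group.add_argument("--lora-rank", type=int, default=0)
    # Placeholder default (assumption); the RFC does not fix one.
    group.add_argument("--lora-alpha", type=int, default=16)
    group.add_argument("--lora-dropout", type=float, default=0.0)
    # Either "all-linear" or a comma-separated list of module names.
    group.add_argument("--target-modules", type=str, default=None)
    # Comma-separated module names to skip, e.g. "lm_head".
    group.add_argument("--exclude-modules", type=str, default=None)
    # Directory holding an existing PEFT adapter used to initialize the actor.
    group.add_argument("--lora-adapter-path", type=str, default=None)
    return parser
```

argparse converts `--lora-rank` to the attribute `lora_rank`, so downstream code reads `args.lora_rank`, `args.target_modules`, and so on.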
Validation rules:
- --lora-rank > 0 requires --target-modules.
- v1 requires --train-backend megatron.
- v1 requires --colocate.
- v1 should require --megatron-to-hf-mode bridge if the implementation uses Megatron-Bridge PEFT hooks.
- LoRA should be mutually exclusive with --only-train-params-name-list and --freeze-params-name-list in v1.
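The validation rules above could be checked in one pass, so a user sees every problem at once. This is a sketch. The function names are hypothetical, and the attribute names assume argparse's default dash-to-underscore conversion of the existing flags. It also assumes the preferred Megatron-Bridge PEFT path, so bridge mode is always required.

```python
import argparse
from typing import List


def lora_arg_errors(args: argparse.Namespace) -> List[str]:
    """Collect every v1 LoRA validation failure (hypothetical helper)."""
    if getattr(args, "lora_rank", 0) <= 0:
        return []  # LoRA disabled: nothing to check.
    errors = []
    if not getattr(args, "target_modules", None):
        errors.append("--lora-rank > 0 requires --target-modules")
    if getattr(args, "train_backend", None) != "megatron":
        errors.append("LoRA v1 requires --train-backend megatron")
    if not getattr(args, "colocate", False):
        errors.append("LoRA v1 requires --colocate")
    # Assumes the preferred Megatron-Bridge PEFT path, which needs bridge mode.
    if getattr(args, "megatron_to_hf_mode", None) != "bridge":
        errors.append("LoRA v1 with Megatron-Bridge PEFT requires --megatron-to-hf-mode bridge")
    if getattr(args, "only_train_params_name_list", None) or getattr(args, "freeze_params_name_list", None):
        errors.append("LoRA cannot be combined with --only-train-params-name-list or --freeze-params-name-list")
    return errors


def validate_lora_args(args: argparse.Namespace) -> None:
    """Raise a single ValueError that lists all problems."""
    errors = lora_arg_errors(args)
    if errors:
        raise ValueError("Invalid LoRA configuration:\n  " + "\n  ".join(errors))
```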
Design
Training
Add a LoRA utility module under:
vime/backends/megatron_utils/lora_utils.py
Responsibilities:
- Parse target modules.
- Convert HF module names to Megatron names.
- Detect LoRA-enabled runs.
- Detect adapter parameter names.
- Freeze base parameters and leave only adapter parameters trainable.
The preferred v1 implementation is to use Megatron-Bridge PEFT hooks in the model provider path. This keeps adapter injection and PEFT checkpoint compatibility close to upstream tooling.
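Even if adapters are injected through Megatron-Bridge, lora_utils.py still has to turn `--target-modules` into Megatron module names and recognize adapter parameters. The sketch below shows one way to do both. The HF-to-Megatron table assumes Llama/Qwen-style dense layers, where Megatron-Core fuses q/k/v into `linear_qkv` and gate/up into `linear_fc1`. Other families may name things differently, which is the fused-QKV risk listed later. Because of this fusion, selecting only `q_proj` actually adapts k and v as well, and a real implementation may want to warn about or reject such partial selections. The adapter-name markers are illustrative only. Actual parameter names depend on how the adapters are injected.

```python
from typing import List, Optional

# Assumed HF -> Megatron-Core names for Llama/Qwen-style dense layers.
# Megatron fuses q/k/v into linear_qkv and gate/up into linear_fc1.
HF_TO_MEGATRON = {
    "q_proj": "linear_qkv",
    "k_proj": "linear_qkv",
    "v_proj": "linear_qkv",
    "o_proj": "linear_proj",
    "gate_proj": "linear_fc1",
    "up_proj": "linear_fc1",
    "down_proj": "linear_fc2",
}
# What "all-linear" expands to here: every linear in the transformer blocks.
# The output layer (lm_head) is deliberately not included.
ALL_LINEAR = ("linear_qkv", "linear_proj", "linear_fc1", "linear_fc2")


def _split(spec: Optional[str]) -> List[str]:
    """Split a comma-separated flag value, ignoring blanks and whitespace."""
    return [s.strip() for s in (spec or "").split(",") if s.strip()]


def parse_target_modules(spec: str, exclude: Optional[str] = None) -> List[str]:
    """Expand --target-modules into a sorted list of Megatron module names."""
    if spec.strip() == "all-linear":
        names = set(ALL_LINEAR)
    else:
        # Accept HF names (q_proj) or Megatron names (linear_qkv) interchangeably.
        names = {HF_TO_MEGATRON.get(n, n) for n in _split(spec)}
    # Excluding any member of a fused group (e.g. q_proj) drops the whole fused module.
    excluded = {HF_TO_MEGATRON.get(n, n) for n in _split(exclude)}
    return sorted(names - excluded)


# Illustrative markers (assumption); real names depend on the injection mechanism.
ADAPTER_MARKERS = ("lora_A", "lora_B", ".adapter.")


def is_adapter_param(name: str) -> bool:
    """True if a parameter name belongs to a LoRA adapter rather than the base model."""
    return any(marker in name for marker in ADAPTER_MARKERS)
```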
Rollout
When LoRA is enabled, the vLLM rollout engines should be launched with LoRA serving turned on:
--vllm-enable-lora
--vllm-max-lora-rank <rank>
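The two rollout flags could be derived from the actor's LoRA settings instead of being set by hand. This is a sketch. The helper name is hypothetical, and the set of ranks vLLM accepts for `max_lora_rank` is an assumption that has varied between vLLM releases, so it should be checked against the pinned version. The rank is rounded up to the nearest supported value, and `max_loras=1` reflects the one-adapter-per-run scope.

```python
# Ranks assumed to be accepted by vLLM's max_lora_rank. The allowed set has
# changed across vLLM releases, so verify it against the installed version.
ASSUMED_VLLM_LORA_RANKS = (8, 16, 32, 64, 128, 256)


def vllm_lora_engine_kwargs(lora_rank: int) -> dict:
    """Map the actor's LoRA rank to vLLM engine arguments (hypothetical helper)."""
    if lora_rank <= 0:
        return {}  # LoRA disabled: serve the base model only.
    # Round up to the smallest supported rank that can hold the adapter.
    max_rank = next((r for r in ASSUMED_VLLM_LORA_RANKS if r >= lora_rank), None)
    if max_rank is None:
        raise ValueError(f"lora_rank={lora_rank} exceeds the largest assumed vLLM rank")
    # v1 trains and serves exactly one adapter, so one LoRA slot is enough.
    return {"enable_lora": True, "max_lora_rank": max_rank, "max_loras": 1}
```

`enable_lora`, `max_lora_rank`, and `max_loras` are existing vLLM engine arguments, so the result could be passed as `vllm.LLM(model=..., **kwargs)`. How vime forwards them to its engines is not specified here.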
Rollout requests should use a fixed adapter name such as:
For v1, adapter sync can start with a simple file-based reload path. A later optimization can update adapter tensors directly through the colocated weight-sync path.
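A file-based reload path has one sharp edge: the rollout engine must never read a half-written adapter. One way to avoid that is to write each adapter version into a temporary directory next to its destination and rename it into place once complete. The sketch below does this. The function name and the `adapter_v{version}` layout are hypothetical. Creating the temp dir inside `root` keeps the rename on one filesystem, where it is atomic on POSIX.

```python
import os
import shutil
import tempfile
from typing import Dict


def publish_adapter(root: str, version: int, files: Dict[str, bytes]) -> str:
    """Write adapter files for `version` under `root` and return the final directory."""
    os.makedirs(root, exist_ok=True)
    final_dir = os.path.join(root, f"adapter_v{version:06d}")
    if os.path.exists(final_dir):
        raise FileExistsError(final_dir)  # each version is published exactly once
    # Same parent directory, so the rename below never crosses filesystems.
    tmp_dir = tempfile.mkdtemp(prefix=".tmp_adapter_", dir=root)
    try:
        for name, data in files.items():
            with open(os.path.join(tmp_dir, name), "wb") as f:
                f.write(data)
        os.rename(tmp_dir, final_dir)  # readers see either nothing or a complete adapter
    except BaseException:
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    return final_dir
```

With vLLM's offline API, the published directory would typically be passed as `LoRARequest(lora_name, lora_int_id, lora_path)`. Keeping the name fixed while bumping the integer id per version is an assumption about how to stop vLLM from reusing a cached older adapter, and should be verified against the vLLM version in use.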
Checkpointing
Save adapter checkpoints separately from the base model:
<save>/iter_0000100/
lora_adapter/
adapter_config.json
adapter_model.safetensors
vime_lora_metadata.json
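To keep the saved adapter loadable by PEFT and by vLLM, adapter_config.json should describe the adapter in HF module names, not Megatron ones. The sketch writes that file. The keys are standard PEFT `LoraConfig` fields. The reverse name mapping assumes Llama/Qwen-style models, and the function name is hypothetical. The weights side is not shown. There, a fused `linear_qkv` adapter would need to be split so that q/k/v share the same `lora_A` and each get their slice of `lora_B`, following the same QKV layout conversion used for the base weights.

```python
import json
import os
from typing import Iterable

# Assumed reverse mapping: each fused Megatron module expands back to the HF
# projections that PEFT (and vLLM's PEFT loader) expect in target_modules.
MEGATRON_TO_HF = {
    "linear_qkv": ("q_proj", "k_proj", "v_proj"),
    "linear_proj": ("o_proj",),
    "linear_fc1": ("gate_proj", "up_proj"),
    "linear_fc2": ("down_proj",),
}


def write_adapter_config(adapter_dir: str, *, base_model: str, rank: int, alpha: int,
                         dropout: float, megatron_targets: Iterable[str]) -> dict:
    """Write a PEFT-style adapter_config.json into adapter_dir and return it."""
    hf_targets = sorted({hf for m in megatron_targets for hf in MEGATRON_TO_HF[m]})
    config = {
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "base_model_name_or_path": base_model,
        "r": rank,
        "lora_alpha": alpha,
        "lora_dropout": dropout,
        "target_modules": hf_targets,
        "bias": "none",
        "inference_mode": True,
    }
    os.makedirs(adapter_dir, exist_ok=True)
    with open(os.path.join(adapter_dir, "adapter_config.json"), "w") as f:
        json.dump(config, f, indent=2)
    return config
```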
--lora-adapter-path should load an existing adapter as the actor initialization.
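When initializing from `--lora-adapter-path`, a rank or alpha mismatch with the CLI would either fail at weight load (rank) or silently change the update scale (alpha). The sketch below rejects both. Failing loudly instead of adopting the file's values is a design choice assumed here, not something the RFC specifies, and the function names are hypothetical.

```python
import json
import os
from typing import List


def adapter_config_mismatches(config: dict, *, rank: int, alpha: int) -> List[str]:
    """Compare a loaded adapter_config.json with the CLI settings."""
    problems = []
    if config.get("peft_type") != "LORA":
        problems.append(f"peft_type is {config.get('peft_type')!r}, expected 'LORA'")
    if config.get("r") != rank:
        problems.append(f"adapter rank {config.get('r')} != --lora-rank {rank}")
    if config.get("lora_alpha") != alpha:
        problems.append(f"adapter alpha {config.get('lora_alpha')} != --lora-alpha {alpha}")
    return problems


def load_adapter_config(adapter_path: str, *, rank: int, alpha: int) -> dict:
    """Read <adapter_path>/adapter_config.json and refuse silent rank/scale changes."""
    with open(os.path.join(adapter_path, "adapter_config.json")) as f:
        config = json.load(f)
    problems = adapter_config_mismatches(config, rank=rank, alpha=alpha)
    if problems:
        raise ValueError("Incompatible --lora-adapter-path: " + "; ".join(problems))
    return config
```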
Implementation Plan
- Add LoRA CLI args and validation.
- Add lora_utils.py and unit tests for module parsing and parameter detection.
- Inject LoRA adapters in the Megatron model provider.
- Freeze base weights and log trainable parameter counts.
- Save/load PEFT-compatible adapter checkpoints.
- Enable vLLM LoRA rollout.
- Add adapter sync from actor to colocated rollout engines.
- Add a small Qwen2.5/Qwen3 LoRA example and e2e test.
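The "freeze base weights and log trainable parameter counts" step could look roughly like the sketch below. It only relies on `(name, param)` pairs where `param` has a settable `requires_grad` and a `numel()` method, which is what `torch.nn.Module.named_parameters()` yields. `FakeParam` is a hypothetical stand-in so the sketch runs without torch. Under tensor or pipeline parallelism these counts are per rank, so a global figure would need a reduction across ranks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter so the sketch runs without torch."""
    size: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.size


def freeze_base_params(named_params: Iterable[Tuple[str, object]],
                       is_adapter: Callable[[str], bool]) -> Tuple[int, int]:
    """Leave only adapter parameters trainable; return (trainable, total) element counts."""
    trainable = total = 0
    for name, param in named_params:
        param.requires_grad = is_adapter(name)
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


def format_trainable_summary(trainable: int, total: int) -> str:
    """Human-readable log line for the trainable/total parameter ratio."""
    pct = 100.0 * trainable / total if total else 0.0
    return f"trainable params: {trainable:,} / {total:,} ({pct:.4f}%)"
```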
Tests
Required coverage:
- CLI validation.
- all-linear and comma-separated target-module parsing.
- Base parameters are frozen and adapter parameters are trainable.
- Adapter checkpoint save/load round trip.
- Short colocated LoRA training run.
- Rollout uses the updated adapter after weight sync.
Risks
- vLLM LoRA runtime update APIs may be version-sensitive.
- Megatron-Bridge PEFT APIs may change.
- Fused QKV naming can differ across model families.
- Adapter versioning must be explicit to avoid stale rollout generations.
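Explicit adapter versioning can be as simple as tagging every rollout sample with the version of the adapter that produced it and checking that tag on the training side. The sketch below is one hypothetical way to do it. The `max_lag` policy (0 means strictly on-policy) is an assumption, since the RFC does not say how much staleness is acceptable.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RolloutSample:
    prompt: str
    response: str
    adapter_version: int  # version of the adapter that produced `response`


def split_by_staleness(samples: Sequence[RolloutSample], current_version: int,
                       max_lag: int = 0) -> Tuple[List[RolloutSample], List[RolloutSample]]:
    """Separate samples whose adapter is older than the allowed lag."""
    fresh, stale = [], []
    for s in samples:
        if s.adapter_version > current_version:
            # A sample from a "future" adapter means the version bookkeeping is broken.
            raise ValueError(f"sample version {s.adapter_version} > current {current_version}")
        if current_version - s.adapter_version <= max_lag:
            fresh.append(s)
        else:
            stale.append(s)
    return fresh, stale
```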
Acceptance Criteria
v1 is complete when a user can run a short colocated LoRA training job with:
--lora-rank 8
--target-modules all-linear
--megatron-to-hf-mode bridge
--colocate
--vllm-enable-lora
and verify that:
- only adapter parameters are trainable;
- rollout uses the current adapter;
- checkpoints contain a reloadable PEFT adapter;
- resume preserves adapter weights.