[RFC]: LoRA Support in vime #206

@princepride

Summary

Add first-class LoRA adapter training support to vime.

The initial version should support Megatron actor training + colocated vLLM rollout. The actor trains only LoRA adapter weights, vLLM serves the base model with the current adapter, and checkpoints save adapter weights in a PEFT-compatible format.

This RFC does not cover MLA architecture fields such as q_lora_rank or kv_lora_rank; those are model internals, not PEFT-style LoRA adapters.

Motivation

vime currently supports full-parameter Megatron training and parameter freezing, but it does not provide an adapter training loop. With LoRA, only the adapter weights are trained, checkpointed, and synced to rollout, which would reduce optimizer memory, checkpoint size, and weight-sync cost for SFT/RL runs on large models.

Scope

In scope for v1

  • Megatron backend only.
  • Actor LoRA only.
  • Colocated rollout only.
  • Dense models first.
  • One adapter per run.
  • PEFT-compatible adapter save/load.
  • vLLM rollout using the latest actor adapter.

Out of scope for v1

  • FSDP LoRA.
  • MoE expert LoRA.
  • Multi-LoRA serving or training.
  • PD/disaggregated rollout.
  • Distributed non-colocated adapter sync.
  • Merging adapters into the base model.

Proposed CLI

--lora-rank 32
--lora-alpha 64
--lora-dropout 0.0
--target-modules all-linear
--exclude-modules lm_head
--lora-adapter-path /path/to/adapter

Validation rules:

  • --lora-rank > 0 requires --target-modules.
  • v1 requires --train-backend megatron.
  • v1 requires --colocate.
  • v1 should require --megatron-to-hf-mode bridge if the implementation uses Megatron-Bridge PEFT hooks.
  • LoRA should be mutually exclusive with --only-train-params-name-list and --freeze-params-name-list in v1.
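
The validation rules above could be enforced in one place after argument parsing. The following is a minimal sketch, not the vime implementation: `build_parser`, `validate_lora_args`, and the `uses_bridge_peft` switch are hypothetical names, the defaults are placeholders, and the non-LoRA flags are re-declared here only so the example is self-contained.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # LoRA flags proposed in this RFC (defaults here are placeholders).
    parser.add_argument("--lora-rank", type=int, default=0)
    parser.add_argument("--lora-alpha", type=int, default=0)
    parser.add_argument("--lora-dropout", type=float, default=0.0)
    parser.add_argument("--target-modules", default=None)
    parser.add_argument("--exclude-modules", default=None)
    parser.add_argument("--lora-adapter-path", default=None)
    # Stand-ins for existing flags that the rules refer to.
    parser.add_argument("--train-backend", default="megatron")
    parser.add_argument("--colocate", action="store_true")
    parser.add_argument("--megatron-to-hf-mode", default=None)
    parser.add_argument("--only-train-params-name-list", nargs="*", default=None)
    parser.add_argument("--freeze-params-name-list", nargs="*", default=None)
    return parser


def validate_lora_args(args: argparse.Namespace, uses_bridge_peft: bool = True) -> None:
    """Raise ValueError listing every violated v1 rule; no-op when LoRA is off."""
    if args.lora_rank <= 0:
        return
    errors = []
    if not args.target_modules:
        errors.append("--lora-rank > 0 requires --target-modules")
    if args.train_backend != "megatron":
        errors.append("LoRA v1 requires --train-backend megatron")
    if not args.colocate:
        errors.append("LoRA v1 requires --colocate")
    # Only enforced when the implementation relies on Megatron-Bridge PEFT hooks.
    if uses_bridge_peft and args.megatron_to_hf_mode != "bridge":
        errors.append("LoRA v1 requires --megatron-to-hf-mode bridge")
    if args.only_train_params_name_list or args.freeze_params_name_list:
        errors.append(
            "LoRA cannot be combined with --only-train-params-name-list "
            "or --freeze-params-name-list"
        )
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all violations before raising means a user with several misconfigured flags sees them in one error instead of fixing them one run at a time.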

Design

Training

Add a LoRA utility module under:

vime/backends/megatron_utils/lora_utils.py

Responsibilities:

  • Parse target modules.
  • Convert HF module names to Megatron names.
  • Detect LoRA-enabled runs.
  • Detect adapter parameter names.
  • Freeze base parameters and leave only adapter parameters trainable.
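
As a rough illustration of these responsibilities, the helpers might look like the sketch below. Everything here is an assumption made for illustration: the function names, the `HF_TO_MEGATRON` table (which reflects the usual fused layout for Llama/Qwen-style dense models and may not hold for other families), and the `lora_A`/`lora_B` markers (PEFT's naming; the names produced by the actual injection path may differ). `FakeParam` stands in for a tensor so the example runs without PyTorch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed mapping: Megatron fuses q/k/v into linear_qkv and gate/up into linear_fc1.
HF_TO_MEGATRON = {
    "q_proj": "linear_qkv",
    "k_proj": "linear_qkv",
    "v_proj": "linear_qkv",
    "o_proj": "linear_proj",
    "gate_proj": "linear_fc1",
    "up_proj": "linear_fc1",
    "down_proj": "linear_fc2",
}
# Assumed substrings that identify adapter parameters by name.
ADAPTER_MARKERS = ("lora_A", "lora_B")


def _dedupe(names: Iterable[str]) -> List[str]:
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(names))


def parse_module_list(spec: Optional[str]) -> List[str]:
    """Split a comma-separated flag value such as 'q_proj, v_proj'."""
    if not spec:
        return []
    return _dedupe(part.strip() for part in spec.split(",") if part.strip())


def hf_to_megatron_modules(hf_names: Sequence[str]) -> List[str]:
    """Translate HF module names; 'all-linear' expands to every mapped module.

    Unknown names raise KeyError so typos fail loudly.
    """
    if list(hf_names) == ["all-linear"]:
        hf_names = list(HF_TO_MEGATRON)
    return _dedupe(HF_TO_MEGATRON[name] for name in hf_names)


def is_lora_enabled(lora_rank: int) -> bool:
    return lora_rank > 0


def is_adapter_param(name: str, markers: Sequence[str] = ADAPTER_MARKERS) -> bool:
    return any(marker in name for marker in markers)


def freeze_base_parameters(
    named_params: Iterable[Tuple[str, object]],
    markers: Sequence[str] = ADAPTER_MARKERS,
) -> List[str]:
    """Set requires_grad so only adapter parameters train; return their names."""
    trainable = []
    for name, param in named_params:
        param.requires_grad = is_adapter_param(name, markers)
        if param.requires_grad:
            trainable.append(name)
    return trainable


@dataclass
class FakeParam:
    """Minimal stand-in for a tensor with a requires_grad attribute."""
    requires_grad: bool = True
```

With a real model, `freeze_base_parameters(model.named_parameters())` would take the same shape of input, and the returned list gives the trainable parameter names to log. Because several HF names collapse onto one fused Megatron module, targeting only `q_proj` in this mapping would still attach an adapter to the whole `linear_qkv` projection.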

The preferred v1 implementation is to use Megatron-Bridge PEFT hooks in the model provider path. This keeps adapter injection and PEFT checkpoint compatibility close to upstream tooling.

Rollout

When LoRA is enabled, vLLM should start with LoRA serving enabled:

--vllm-enable-lora
--vllm-max-lora-rank <rank>

Rollout requests should use a fixed adapter name such as:

vime_lora

For v1, adapter sync can start with a simple file-based reload path. A later optimization can update adapter tensors directly through the colocated weight-sync path.
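
One way to make the file-based reload path explicit about versions is to write each synced adapter into a fresh directory and hand the rollout side a (name, integer id, path) triple. The sketch below is only an illustration: `AdapterPublisher`, `AdapterVersion`, and the `v000001` directory naming are hypothetical, and no vLLM code is called. The triple is shaped so it could be passed to vLLM's `LoRARequest`, on the assumption that a new integer id per version is what triggers a reload; that behavior should be checked against the pinned vLLM version.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict

ADAPTER_NAME = "vime_lora"  # fixed adapter name used by rollout requests


@dataclass(frozen=True)
class AdapterVersion:
    name: str
    version: int  # monotonically increasing; doubles as the adapter's integer id
    path: Path


class AdapterPublisher:
    """Writes each synced adapter to its own directory and tracks the version."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self._version = 0

    def publish(self, files: Dict[str, bytes]) -> AdapterVersion:
        """Persist adapter files (filename -> bytes) under a new version directory."""
        self._version += 1
        out_dir = self.root / f"v{self._version:06d}"
        # exist_ok=False: never overwrite a directory a rollout engine may be reading.
        out_dir.mkdir(parents=True, exist_ok=False)
        for filename, data in files.items():
            (out_dir / filename).write_bytes(data)
        return AdapterVersion(ADAPTER_NAME, self._version, out_dir)


def is_stale(sample_version: int, current: AdapterVersion) -> bool:
    """True if a rollout sample was generated with an older adapter."""
    return sample_version != current.version
```

Tagging each rollout sample with the version it was generated under lets the trainer detect and drop stale generations instead of silently mixing policies.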

Checkpointing

Save adapter checkpoints separately from the base model:

<save>/iter_0000100/
  lora_adapter/
    adapter_config.json
    adapter_model.safetensors
  vime_lora_metadata.json

--lora-adapter-path should load an existing adapter as the actor initialization.
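
To make the layout concrete, the sketch below writes the two JSON files for a given iteration. It is an illustration under assumptions: the function names are hypothetical, the `vime_lora_metadata.json` schema is invented for the example, and `adapter_model.safetensors` is omitted because writing it needs the actual adapter tensors. The keys in `adapter_config.json` (`peft_type`, `r`, `lora_alpha`, `lora_dropout`, `target_modules`) are fields PEFT's LoRA config uses; a real PEFT config carries more fields than shown.

```python
import json
from pathlib import Path
from typing import List, Union


def adapter_checkpoint_dir(save_root: str, iteration: int) -> Path:
    """Return <save>/iter_XXXXXXX using the 7-digit padding shown in the layout."""
    return Path(save_root) / f"iter_{iteration:07d}"


def write_adapter_config(
    save_root: str,
    iteration: int,
    rank: int,
    alpha: int,
    dropout: float,
    target_modules: Union[str, List[str]],
) -> Path:
    """Write adapter_config.json and vime_lora_metadata.json; return the iter dir."""
    ckpt_dir = adapter_checkpoint_dir(save_root, iteration)
    adapter_dir = ckpt_dir / "lora_adapter"
    adapter_dir.mkdir(parents=True, exist_ok=True)

    # Subset of PEFT LoRA config fields; extend as needed for full compatibility.
    config = {
        "peft_type": "LORA",
        "r": rank,
        "lora_alpha": alpha,
        "lora_dropout": dropout,
        "target_modules": target_modules,
    }
    (adapter_dir / "adapter_config.json").write_text(json.dumps(config, indent=2))

    # Hypothetical metadata schema, used here only to show where the file lives.
    metadata = {"iteration": iteration, "adapter_dir": "lora_adapter"}
    (ckpt_dir / "vime_lora_metadata.json").write_text(json.dumps(metadata, indent=2))
    return ckpt_dir
```

For example, `write_adapter_config("/tmp/ckpt", 100, 32, 64, 0.0, "all-linear")` would create `/tmp/ckpt/iter_0000100/lora_adapter/adapter_config.json` next to `vime_lora_metadata.json`.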

Implementation Plan

  1. Add LoRA CLI args and validation.
  2. Add lora_utils.py and unit tests for module parsing and parameter detection.
  3. Inject LoRA adapters in the Megatron model provider.
  4. Freeze base weights and log trainable parameter counts.
  5. Save/load PEFT-compatible adapter checkpoints.
  6. Enable vLLM LoRA rollout.
  7. Add adapter sync from actor to colocated rollout engines.
  8. Add a small Qwen2.5/Qwen3 LoRA example and e2e test.

Tests

Required coverage:

  • CLI validation.
  • all-linear and comma-separated target-module parsing.
  • Base parameters are frozen and adapter parameters are trainable.
  • Adapter checkpoint save/load round trip.
  • Short colocated LoRA training run.
  • Rollout uses the updated adapter after weight sync.

Risks

  • vLLM LoRA runtime update APIs may be version-sensitive.
  • Megatron-Bridge PEFT APIs may change.
  • Fused QKV naming can differ across model families.
  • Adapter versioning must be explicit to avoid stale rollout generations.

Acceptance Criteria

v1 is complete when a user can run a short colocated LoRA training job with:

--lora-rank 8
--target-modules all-linear
--megatron-to-hf-mode bridge
--colocate
--vllm-enable-lora

and verify that:

  • only adapter parameters are trainable;
  • rollout uses the current adapter;
  • checkpoints contain a reloadable PEFT adapter;
  • resume preserves adapter weights.
