[RFC]: LoRA Support in vime #206

## Summary

Add first-class LoRA adapter training support to vime.

The initial version should support **Megatron actor training + colocated vLLM rollout**. The actor trains only LoRA adapter weights, vLLM serves the base model with the current adapter, and checkpoints save adapter weights in a PEFT-compatible format.

This RFC does not cover MLA architecture fields such as `q_lora_rank` or `kv_lora_rank`; those are model internals, not PEFT-style LoRA adapters.

## Motivation

vime currently supports full-parameter Megatron training and parameter freezing, but it does not provide an adapter training loop. LoRA would reduce optimizer memory, checkpoint size, and weight-sync cost for SFT/RL runs on large models.

## Scope

### In scope for v1

- Megatron backend only.
- Actor LoRA only.
- Colocated rollout only.
- Dense models first.
- One adapter per run.
- PEFT-compatible adapter save/load.
- vLLM rollout using the latest actor adapter.

### Out of scope for v1

- FSDP LoRA.
- MoE expert LoRA.
- Multi-LoRA serving or training.
- PD/disaggregated rollout.
- Distributed non-colocated adapter sync.
- Merging adapters into the base model.

## Proposed CLI

```bash
--lora-rank 32
--lora-alpha 64
--lora-dropout 0.0
--target-modules all-linear
--exclude-modules lm_head
--lora-adapter-path /path/to/adapter
```

Validation rules:

- Setting `--lora-rank` to a value greater than 0 enables LoRA and requires `--target-modules`.
- v1 requires `--train-backend megatron`.
- v1 requires `--colocate`.
- v1 should require `--megatron-to-hf-mode bridge` if the implementation uses Megatron-Bridge PEFT hooks.
- LoRA should be mutually exclusive with `--only-train-params-name-list` and `--freeze-params-name-list` in v1.
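
The rules above could be checked in one place after argument parsing. The following is a minimal sketch, not the proposed implementation: the function name `validate_lora_args` and the `uses_bridge_peft` switch are hypothetical, and the attribute names are assumed to follow argparse's usual dash-to-underscore mapping of the flags named in this RFC.

```python
from argparse import Namespace


def validate_lora_args(args: Namespace, uses_bridge_peft: bool = True) -> None:
    """Raise ValueError if the LoRA flags violate the v1 rules listed in this RFC.

    Hypothetical helper: attribute names mirror the CLI flags in this RFC.
    """
    if (getattr(args, "lora_rank", 0) or 0) <= 0:
        return  # LoRA is disabled, so there is nothing to validate.

    errors = []
    if not getattr(args, "target_modules", None):
        errors.append("--lora-rank > 0 requires --target-modules")
    if getattr(args, "train_backend", None) != "megatron":
        errors.append("LoRA v1 requires --train-backend megatron")
    if not getattr(args, "colocate", False):
        errors.append("LoRA v1 requires --colocate")
    # Only enforced when the implementation relies on Megatron-Bridge PEFT hooks.
    if uses_bridge_peft and getattr(args, "megatron_to_hf_mode", None) != "bridge":
        errors.append("LoRA v1 requires --megatron-to-hf-mode bridge")
    for attr in ("only_train_params_name_list", "freeze_params_name_list"):
        if getattr(args, attr, None):
            flag = "--" + attr.replace("_", "-")
            errors.append(f"LoRA is mutually exclusive with {flag}")

    if errors:
        raise ValueError("; ".join(errors))
```

Collecting every violation before raising lets a user fix a misconfigured launch script in one pass instead of one error at a time.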

## Design

### Training

Add a LoRA utility module under:

```text
vime/backends/megatron_utils/lora_utils.py
```

Responsibilities:

- Parse target modules.
- Convert HF module names to Megatron names.
- Detect LoRA-enabled runs.
- Detect adapter parameter names.
- Freeze base parameters and leave only adapter parameters trainable.

The preferred v1 implementation is to use Megatron-Bridge PEFT hooks in the model provider path. This keeps adapter injection and PEFT checkpoint compatibility close to upstream tooling.
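
The parsing, name-conversion, and freezing responsibilities listed above can be sketched without any framework dependency. Everything below is illustrative: the function names are hypothetical, the HF-to-Megatron mapping is an assumption that only covers Llama/Qwen-style dense layers, and the adapter name markers are guesses that would need to match whatever names the chosen PEFT hooks actually produce.

```python
from typing import Iterable, List, Sequence, Tuple

ALL_LINEAR = "all-linear"

# Assumed mapping for Llama/Qwen-style dense models; other families may differ.
HF_TO_MEGATRON = {
    "q_proj": "linear_qkv", "k_proj": "linear_qkv", "v_proj": "linear_qkv",
    "o_proj": "linear_proj",
    "gate_proj": "linear_fc1", "up_proj": "linear_fc1",
    "down_proj": "linear_fc2",
}

# Assumption: adapter parameter names contain one of these substrings.
ADAPTER_MARKERS = (".lora_A.", ".lora_B.", ".adapter.")


def parse_target_modules(value: str, exclude: str = "") -> Tuple[List[str], List[str]]:
    """Split the comma-separated flag values into (targets, excludes)."""
    targets = [m.strip() for m in value.split(",") if m.strip()]
    if not targets:
        raise ValueError("--target-modules must name at least one module")
    if ALL_LINEAR in targets and len(targets) > 1:
        raise ValueError("all-linear cannot be combined with explicit module names")
    excludes = [m.strip() for m in exclude.split(",") if m.strip()]
    return targets, excludes


def hf_to_megatron_targets(targets: Sequence[str], excludes: Sequence[str] = ()) -> List[str]:
    """Map HF module names to Megatron names, deduplicating fused projections."""
    hf_names = list(HF_TO_MEGATRON) if list(targets) == [ALL_LINEAR] else list(targets)
    kept = [name for name in hf_names if name not in excludes]
    # Unknown names pass through unchanged; dict.fromkeys keeps first-seen order.
    return list(dict.fromkeys(HF_TO_MEGATRON.get(name, name) for name in kept))


def is_adapter_param(name: str) -> bool:
    """Return True if the parameter name looks like a LoRA adapter weight."""
    return any(marker in name for marker in ADAPTER_MARKERS)


def freeze_base_params(named_params: Iterable[Tuple[str, object]]) -> Tuple[int, int]:
    """Leave only adapter parameters trainable; return (trainable, total) element counts.

    Works with any object exposing `requires_grad` and `numel()`,
    such as the pairs yielded by a PyTorch module's `named_parameters()`.
    """
    trainable = total = 0
    for name, param in named_params:
        param.requires_grad = is_adapter_param(name)
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
    return trainable, total
```

Because Q, K, and V are fused into one Megatron projection in this mapping, excluding only one of them on the HF side cannot be honored individually; a real implementation would need to decide whether to reject or warn in that case.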

### Rollout

When LoRA is enabled, vLLM should start with LoRA serving enabled:

```bash
--vllm-enable-lora
--vllm-max-lora-rank <rank>
```

Rollout requests should use a fixed adapter name such as:

```text
vime_lora
```

For v1, adapter sync can start with a simple file-based reload path. A later optimization can update adapter tensors directly through the colocated weight-sync path.
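
One way to make the file-based reload path safe against partial writes and stale adapters is to publish each sync into its own versioned directory. The sketch below is an assumption about how that could look, not part of the proposal: `AdapterPublisher`, `AdapterVersion`, and `is_stale` are hypothetical names, and the engine-side reload call is deliberately left out because it depends on the vLLM version in use.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

ADAPTER_NAME = "vime_lora"  # fixed adapter name from this RFC


@dataclass(frozen=True)
class AdapterVersion:
    name: str
    version: int
    path: Path


class AdapterPublisher:
    """Hypothetical helper: writes each synced adapter to a new versioned directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.version = 0

    def publish(self, write_adapter: Callable[[Path], None]) -> AdapterVersion:
        """Write into a temp directory, then rename it into place.

        The rename means a reader never observes a half-written adapter.
        """
        next_version = self.version + 1
        final_dir = self.root / f"v{next_version:06d}"
        tmp_dir = Path(tempfile.mkdtemp(dir=self.root, prefix=".tmp_"))
        try:
            write_adapter(tmp_dir)  # caller saves the adapter files here
            os.replace(tmp_dir, final_dir)
        except Exception:
            shutil.rmtree(tmp_dir, ignore_errors=True)
            raise
        self.version = next_version
        return AdapterVersion(ADAPTER_NAME, next_version, final_dir)


def is_stale(sample_version: int, current: AdapterVersion) -> bool:
    """A rollout sample is stale if it was generated with an older adapter version."""
    return sample_version < current.version
```

Tagging each rollout sample with the adapter version that produced it gives the trainer an explicit check for stale generations, which is the concern raised under Risks.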

### Checkpointing

Save adapter checkpoints separately from the base model:

```text
<save>/iter_0000100/
  lora_adapter/
    adapter_config.json
    adapter_model.safetensors
  vime_lora_metadata.json
```

`--lora-adapter-path` should load an existing adapter to initialize the actor.
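
As an illustration of the layout above, the sketch below writes the two JSON files for one iteration. The function name `write_adapter_metadata` is hypothetical. The `adapter_config.json` keys follow the field names used by PEFT's LoRA config, and the contents of `vime_lora_metadata.json` are purely an assumption, since this RFC does not define them. Writing `adapter_model.safetensors` is omitted because it needs the actual tensors.

```python
import json
from pathlib import Path
from typing import List, Union


def write_adapter_metadata(
    save_dir: Union[str, Path],
    iteration: int,
    rank: int,
    alpha: float,
    dropout: float,
    target_modules: Union[str, List[str]],
    base_model: str,
) -> Path:
    """Create <save>/iter_XXXXXXX/ with the adapter config and vime metadata files."""
    iter_dir = Path(save_dir) / f"iter_{iteration:07d}"
    adapter_dir = iter_dir / "lora_adapter"
    adapter_dir.mkdir(parents=True, exist_ok=True)

    # Keys follow PEFT's LoRA config; target_modules uses HF module names.
    adapter_config = {
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "base_model_name_or_path": base_model,
        "r": rank,
        "lora_alpha": alpha,
        "lora_dropout": dropout,
        "target_modules": target_modules,
        "bias": "none",
    }
    (adapter_dir / "adapter_config.json").write_text(json.dumps(adapter_config, indent=2))

    # Assumed contents: the RFC names this file but does not specify its schema.
    metadata = {"iteration": iteration, "adapter_name": "vime_lora"}
    (iter_dir / "vime_lora_metadata.json").write_text(json.dumps(metadata, indent=2))
    return iter_dir
```

Keeping vime-specific fields in a separate metadata file leaves `lora_adapter/` loadable by standard PEFT tooling without it having to ignore unknown keys.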

## Implementation Plan

1. Add LoRA CLI args and validation.
2. Add `lora_utils.py` and unit tests for module parsing and parameter detection.
3. Inject LoRA adapters in the Megatron model provider.
4. Freeze base weights and log trainable parameter counts.
5. Save/load PEFT-compatible adapter checkpoints.
6. Enable vLLM LoRA rollout.
7. Add adapter sync from actor to colocated rollout engines.
8. Add a small Qwen2.5/Qwen3 LoRA example and e2e test.

## Tests

Required coverage:

- CLI validation.
- `all-linear` and comma-separated target-module parsing.
- Base parameters are frozen and adapter parameters are trainable.
- Adapter checkpoint save/load round trip.
- Short colocated LoRA training run.
- Rollout uses the updated adapter after weight sync.

## Risks

- vLLM LoRA runtime update APIs may be version-sensitive.
- Megatron-Bridge PEFT APIs may change.
- Fused QKV naming can differ across model families.
- Adapter versioning must be explicit to avoid stale rollout generations.

## Acceptance Criteria

v1 is complete when a user can run a short colocated LoRA training job with:

```bash
--lora-rank 8
--target-modules all-linear
--megatron-to-hf-mode bridge
--colocate
--vllm-enable-lora
```

and verify that:

- only adapter parameters are trainable;
- rollout uses the current adapter;
- checkpoints contain a reloadable PEFT adapter;
- resume preserves adapter weights.

