LLM Post-Training — Hindi Instruction Tuning

Post-training Qwen/Qwen2.5-1.5B for Hindi instruction-following using QLoRA → SFT → DPO.

Goal

The base Qwen2.5-1.5B model produces poor Hindi output — repetitive, off-topic, or transliterated responses. This project fine-tunes it on a curated Hindi instruction dataset using parameter-efficient QLoRA, then applies DPO to align outputs with human preferences.

Target model on HuggingFace Hub: psaesha/llm-post-trained-qwen2.5-1.5B

Repo Structure

.
├── configs/
│   └── sft.yaml          # SFT hyperparameters and LoRA config
├── notebooks/
│   └── colab_runner.ipynb # end-to-end training notebook (Colab)
├── pyproject.toml         # dependencies (managed with uv)
├── .env.example           # required environment variables
└── .pre-commit-config.yaml

Setup

Prerequisites: Python 3.12, uv

git clone https://github.com/psaesha/LLM-Post-Training.git
cd LLM-Post-Training
uv sync
cp .env.example .env  # fill in the credentials

For Colab runs, open notebooks/colab_runner.ipynb — it mounts Drive, clones the repo, and calls uv sync automatically.

Training Config (SFT)

Parameter	Value
Base model	`Qwen/Qwen2.5-1.5B`
Quantization	4-bit NF4 (bfloat16 compute)
LoRA rank / alpha	16 / 32
LoRA target modules	q/k/v/o proj + gate/up/down proj
Trainable params	~1.2% (~18M / 1.56B)
Dataset	`ai4bharat/indic-align` (Hindi)
Train / val size	50,000 / 500
Max sequence length	1,024
Epochs	1
Effective batch size	16 (4 × 4 grad accumulation)
Learning rate	2e-4 (cosine, 3% warmup)
Optimizer	`paged_adamw_8bit`
Precision	bf16 + gradient checkpointing
Logging	WandB

Full config: configs/sft.yaml

Evaluation

IndicGenBench — automated benchmark for Indic language generation
Qualitative — hand-graded prompt set covering factual Q&A, creative writing, code-mixed Hindi, and instruction following

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
configs		configs
notebooks		notebooks
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Post-Training — Hindi Instruction Tuning

Goal

Repo Structure

Setup

Training Config (SFT)

Evaluation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLM Post-Training — Hindi Instruction Tuning

Goal

Repo Structure

Setup

Training Config (SFT)

Evaluation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages