Research project focused on compressing multilingual machine translation models for edge-friendly deployment without giving up most of the original quality.
This work explores German-, French-, and Italian-to-English translation using LoRA adapters, multilingual knowledge distillation, quantization, and pruning. The goal is to retain strong translation quality while making the final model substantially smaller and cheaper to run.
- Reduced trainable parameters by about 95% through LoRA-based adaptation.
- Distilled multiple language-specific teacher models into a single multilingual student model.
- Applied quantization and pruning to further optimize the final model for edge deployment.
- Targeted retaining roughly 90% of base-model translation quality at about a 60% reduction in model size.
The project starts from MarianMT and adapts it for additional language pairs using low-rank updates instead of full fine-tuning.
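The low-rank update can be illustrated without any framework. The sketch below is not the code in lora_adapter.py. It assumes a single frozen weight matrix `W` of shape d_out x d_in, adapter matrices `A` (rank x d_in) and `B` (d_out x rank), and the common LoRA scaling factor alpha / rank. All function names are hypothetical.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full fine-tuning parameters, LoRA adapter parameters) for one matrix."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / rank) * B (A x); only A and B would be trained."""
    rank = len(A)
    base = matvec(W, x)                  # frozen pretrained projection
    update = matvec(B, matvec(A, x))     # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]
```

For a hypothetical 512 x 512 projection at rank 8, `lora_param_counts(512, 512, 8)` gives 262,144 full parameters against 8,192 adapter parameters, a reduction of about 96.9% for that layer. The overall figure for a model depends on which layers are adapted and on the chosen rank. When `B` is initialized to zeros, `lora_forward` returns the same output as the frozen layer, so training starts from the pretrained behavior.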
Language-specific teachers are used to train a shared student model so one smaller model can serve multiple translation directions.
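A minimal sketch of the kind of loss this setup typically uses, written with the standard library only. It is not the implementation in distillation.py. It assumes a per-token loss that blends a temperature-softened KL term against the teacher with cross-entropy against the reference token, and it assumes each batch item already carries logits from the teacher for its own language. The names `kd_token_loss` and `per_language_kd_loss`, and the defaults `temperature=2.0` and `alpha=0.5`, are illustrative choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_token_loss(student_logits, teacher_logits, target_index,
                  temperature=2.0, alpha=0.5):
    """Blend soft-label KL(teacher || student) with hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[target_index])
    # The T^2 factor keeps the soft-label term on a comparable scale across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def per_language_kd_loss(batch, temperature=2.0, alpha=0.5):
    """Average the token loss separately for each source language in a mixed batch.

    Each item is a dict with keys: lang, student_logits, teacher_logits, target_index.
    """
    totals, counts = {}, {}
    for item in batch:
        loss = kd_token_loss(item["student_logits"], item["teacher_logits"],
                             item["target_index"], temperature, alpha)
        totals[item["lang"]] = totals.get(item["lang"], 0.0) + loss
        counts[item["lang"]] = counts.get(item["lang"], 0) + 1
    return {lang: totals[lang] / counts[lang] for lang in totals}
```

Tracking the loss per language is one way to notice when the shared student is fitting one teacher at the expense of another.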
The workflow includes ONNX export, task-aware quantization, and optional pruning to improve deployment efficiency on resource-constrained devices.
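To make the two compression steps concrete, here is a framework-free sketch of symmetric per-tensor int8 quantization and unstructured magnitude pruning on a flat list of weights. It does not reproduce quantization.py or the ONNX export path, and it does not model the task-aware part of the workflow. The function names and the per-tensor scaling scheme are assumptions made for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one scale for the whole tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned
```

Storing each weight in 8 bits instead of 32 reduces weight storage by roughly 4x before any pruning, at the cost of a rounding error of at most half the scale per weight. Pruning only shrinks the file if the zeros are stored in a sparse or compressed format.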
- train_lora.py: trains language-specific LoRA adapters
- lora_adapter.py: LoRA adapter implementation for MarianMT
- distillation.py: multilingual distillation logic
- train_student.py: student-model training with knowledge distillation
- quantization.py: quantization, pruning, and edge optimization workflow
- evaluate.py: BLEU and inference-oriented evaluation helpers
- download_opus.py and download_opus100.py: data acquisition scripts
- Experiments/: exploratory notebooks and earlier experiments
- Download and preprocess multilingual parallel data.
- Train or load LoRA adapters for each language pair.
- Distill the resulting language-specific teacher models into a shared student model.
- Quantize and optionally prune the distilled model.
- Evaluate translation quality and model size tradeoffs.
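The final step can be summarized with two ratios: how much quality is retained and how much size is saved. The helper below is a hypothetical sketch, not part of evaluate.py. It assumes quality is a BLEU score where higher is better, sizes are in megabytes, and the default thresholds mirror the stated targets of about 90% quality retention and about 60% size reduction.

```python
def compression_report(base_bleu, compressed_bleu, base_size_mb, compressed_size_mb,
                       min_retention=0.90, min_size_reduction=0.60):
    """Compare a compressed model against its base model on quality and size."""
    retention = compressed_bleu / base_bleu
    size_reduction = 1 - compressed_size_mb / base_size_mb
    return {
        "quality_retention": retention,
        "size_reduction": size_reduction,
        "meets_targets": retention >= min_retention
                         and size_reduction >= min_size_reduction,
    }


# Placeholder numbers for illustration only, not measured results.
report = compression_report(base_bleu=40.0, compressed_bleu=37.0,
                            base_size_mb=300.0, compressed_size_mb=110.0)
```

Running the same report per language pair helps show whether the shared student trades quality unevenly across German, French, and Italian.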
- Python
- PyTorch
- Hugging Face Transformers
- PEFT / LoRA
- MarianMT
- ONNX / model quantization
- sacreBLEU
Most translation systems assume server-scale compute. This project focuses on a more practical constraint: how to make multilingual translation lightweight enough for edge and low-resource environments while keeping the model useful.
This repository is research-oriented and centers on experimentation, model compression, and evaluation rather than production packaging.