mlx-rlhf

An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.

This example builds on the mlx-examples lora example by adding an RLAIF demo. Much of the code here is adapted, inspired by, or copied directly from HuggingFace's trl library and/or Apple's MLX Examples.

This repo supports PEFT with soft-prompts and with LoRA. The example works with Llama and Mistral style models available on Hugging Face, though I have only really tested on Llama style models.

There are two examples here, one for getting a TinyLlama to generate digits that conform to some sequence guidelines (such as increasing even numbers), and one for training a chatbot on your iMessage history (still a work-in-progress).

There is an accompanying PyTorch implementation of everything in this repo, inside of the pytorch_baseline directory.

Running

First, install the dependencies:

pip install -r requirements.txt

The main scripts are sft.py and ppo_training.py. See the sequential_digits.md file for a step-by-step walkthrough on supervised fine-tuning, learning a reward model, and using RL to further tune a model.

The sft.py script (or its PyTorch equivalent, pytorch_baseline/pytorch_sft.py) runs supervised fine-tuning on a given LLM with data of your choice. You can use soft-prompts or LoRAs for this fine-tuning. When using MLX, the result is an adapter file that will be saved into this directory. When using PyTorch, the result is a saved directory, as one would get from a .save_pretrained() call in the transformers library. This script can be used to do supervised fine-tuning and/or to train a reward model.

The ppo_training.py script (or pytorch_baseline/pytorch_ppo_training.py) runs RLHF with a specified LLM and reward model. In the sequential digit example, the reward model is not an LLM, but a ground-truth scoring function (which simplifies learning, removes a variable, and lowers compute requirements). In general, I find the process to be quite unstable and seed-dependent, so just a heads up on that front.
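To make the idea of a ground-truth scoring function concrete, here is a minimal sketch for the "increasing even numbers" guideline. The function name, the whitespace-separated input format, and the pairwise scoring rule are all assumptions for illustration; this is not the scoring code used in this repo.

```python
def score_increasing_evens(text: str) -> float:
    """Hypothetical reward: fraction of adjacent pairs that are both even
    and strictly increasing. Returns 0.0 for unparseable or too-short input."""
    try:
        values = [int(token) for token in text.split()]
    except ValueError:
        # Any non-integer token makes the whole sample invalid.
        return 0.0
    if len(values) < 2:
        return 0.0
    good_pairs = sum(
        1
        for prev, nxt in zip(values, values[1:])
        if prev % 2 == 0 and nxt % 2 == 0 and nxt > prev
    )
    return good_pairs / (len(values) - 1)
```

A rule like this gives a dense signal (partial credit per pair) rather than an all-or-nothing reward, which is one plausible way to keep PPO from stalling early. Whether it helps in practice likely depends on the seed.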

Files in this repo:

`sft.py`

A fine-tuning script. This loads in data from data_utils.py. Pre-trained models are loaded in with MLX-LM, but I don't suggest trying any non-Llama models (I haven't tested them here). The LoRA implementation comes from MLX-LM, and the soft-prompt implementation is my own.
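For readers new to soft-prompts, the general idea is a small set of trainable embedding vectors prepended to the embedded input sequence while the base model stays frozen. The sketch below uses plain Python lists to show only the shapes involved. The class name, the initialization scale, and the `prepend` method are hypothetical and do not mirror the implementation in this repo.

```python
import random


class SoftPrompt:
    """Hypothetical container for trainable 'virtual token' embeddings."""

    def __init__(self, num_virtual_tokens: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        # One small random vector per virtual token; these would be the
        # only trainable parameters during soft-prompt tuning.
        self.embeddings = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_virtual_tokens)
        ]

    def prepend(self, token_embeddings):
        """Return the soft-prompt vectors followed by the real token embeddings."""
        return self.embeddings + token_embeddings
```

With 4 virtual tokens and a 3-token input, the model would see a sequence of length 7, so attention masks and position handling need to account for the extra positions.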

The script spits out adapter files, which can be turned into independent/loadable models with the models/fuse.py script.
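Conceptually, fusing a LoRA adapter means folding the low-rank update back into the frozen weight so the model no longer needs the adapter at inference time. Here is a minimal pure-Python sketch of that arithmetic. The function names and the shape convention (`lora_b` is out x r, `lora_a` is r x in) are assumptions, and models/fuse.py may organize this differently.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fuse_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) as a new matrix.

    Assumed shapes: weight (out, in), lora_b (out, r), lora_a (r, in).
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(weight_row, delta_row)]
        for weight_row, delta_row in zip(weight, delta)
    ]
```

For example, `fuse_lora([[1, 0], [0, 1]], [[3, 4]], [[1], [2]], 0.5)` returns `[[2.5, 2.0], [3.0, 5.0]]`.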

`mlx_ppo_trainer.py`

This is a gutting/rewriting of the PPO_Trainer from HuggingFace's TRL library. It matches the original quite closely, and there are unused bits of code still hanging around that I hope to come back and use someday (like the use_peft flag).
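For context, the core of PPO is the clipped policy loss. The sketch below shows the standard clipped surrogate objective on plain Python floats. The function name and the default `cliprange` value are assumptions for illustration, and the trainer here also handles pieces this sketch leaves out (value loss, batching, and so on).

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, cliprange=0.2):
    """Mean clipped PPO policy loss over per-token values (lower is better)."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
        # Take the pessimistic (larger) of the unclipped and clipped losses.
        losses.append(max(-adv * ratio, -adv * clipped_ratio))
    return sum(losses) / len(losses)
```

The clipping stops a single update from moving the policy too far from the one that generated the samples, which is part of why the ratio is computed against stored log-probabilities instead of a fresh forward pass of the old model.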

`ppo_training.py`

This is the launcher script for running RLHF/RLAIF with MLX, using the mlx_ppo_trainer and a provided model.

`talk_to_model.py`

This loads in a pretrained model with MLX-LM and runs it in the terminal for you to live-chat with the model. Good for testing out how well things are working.

To-Do

There are a few areas left open for me (or you!) to patch in:
