Upgrading transformers version to 5.5.4 #191
Pull request overview
This PR upgrades the project’s Hugging Face / PyTorch stack to transformers==5.5.4 (and required dependency bumps) to enable newer model support and improved MoE efficiency, and it adds a regression check around UnlearnTrainer.prediction_step behavior under the new transformers loss-normalization semantics.
Changes:
- Bump core ML dependencies (transformers / torch / accelerate / bitsandbytes / huggingface-hub) and add flash-linear-attention.
- Update unlearning trainer `prediction_step` to pass `num_items_in_batch` through to the base `Trainer.compute_loss` for transformers 5.x behavior.
- Add a `prediction_step` regression script and update README installation guidance (flash-attn + Docker image link).
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `requirements.txt` | Pins new dependency versions for the transformers 5.5.4 upgrade. |
| `src/trainer/unlearn/base.py` | Updates `prediction_step` to call base `compute_loss` with `num_items_in_batch` (transformers 5.x normalization). |
| `src/data/utils.py` | Adapts `apply_chat_template(..., tokenize=True)` handling for transformers 5.x returning `BatchEncoding`. |
| `tests/prediction_step_regression.py` | Adds a regression check script for `prediction_step` loss/logits/labels behavior. |
| `README.md` | Updates flash-attn guidance and adds a link to a prebuilt Docker image. |
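
The `src/data/utils.py` change described in the table can be pictured with a small sketch. It assumes, as the summary states, that `apply_chat_template(..., tokenize=True)` may return either a bare list of token ids or a mapping-like `BatchEncoding` with an `input_ids` entry. The helper name `extract_input_ids` is hypothetical and is not taken from the PR; the actual change in the repository may look different.

```python
from collections.abc import Mapping


def extract_input_ids(encoded):
    """Return a plain list of token ids from a chat-template result.

    Hypothetical helper (not from the PR): accepts either a bare
    sequence of ids or a mapping-like object with an "input_ids" entry.
    """
    if isinstance(encoded, Mapping):
        # Mapping-like result: take the ids from the "input_ids" key.
        ids = encoded["input_ids"]
    else:
        # Older-style result: the ids are the return value itself.
        ids = encoded
    return list(ids)


# Both return shapes are normalized to the same list type.
old_style = extract_input_ids([1, 2, 3])
new_style = extract_input_ids({"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]})
```

Normalizing at a single call site like this keeps the rest of the data pipeline independent of which return shape the installed transformers version uses.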
    # Or to avoid building flash-attn:
    pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"
Ideally, we could remove flash-attn as proposed in #190
    # python setup_data.py --help

We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed.
I'd rather not specify the tag, in case someone forgets to update the README. The hub page already provides the recommended tag to pull.
What do you think, @molereddy @Dornavineeth? On a more general note, I have a grant to work on LLM unlearning, and I could spend some portion of it helping to develop this open-unlearning repo. I know you have limited time to review contributions, so let me know what would be most useful for me to do.
What does this PR do?
- Upgrades to `transformers==5.5.4`. (I have used this setup extensively in my unlearning experiments, without any issues.)
- Adds a regression test for `prediction_step`, to make future upgrades easier and catch issues.

Before submitting
Tests
I used the newly added `prediction_step` regression test to verify validity, and I ran `make quality`. I also ran `python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default eval=tofu_simple question_key=paraphrased_question eval.tofu.batch_size=16 trainer.args.report_to=wandb trainer=NPO task_name=transformers5.5.4` as a regression test against the runs in #175. As before, the reported loss changes (due to a different scaling convention in the new version), but the actual unlearning trajectory stays almost the same.

One thing worth flagging: in the newer transformers versions (not only this one), installing flash-attn is messier. A simple `pip install` triggers a long build process (which can also fail if the local CUDA version mismatches); there are prebuilt wheels, but only in third-party repos (I documented this in the README). I think the cleanest solution would be to remove flash-attn altogether, given that with typical unlearning datasets it actually slows training down, as discussed in #190.
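
As a rough illustration of why a reported loss can change under a different scaling convention, the sketch below compares two ways of reducing the same per-token losses across micro-batches: averaging each micro-batch on its own, versus summing everything and dividing once by a total item count (the role `num_items_in_batch` plays in the Copilot summary above). The numbers are made up, the function names are hypothetical, and whether this is exactly the convention change in transformers 5.5.4 is an assumption; it is not a reproduction of the library's code.

```python
def mean_of_microbatch_means(microbatches):
    """Average each micro-batch over its own tokens, then average those means."""
    means = [sum(losses) / len(losses) for losses in microbatches]
    return sum(means) / len(means)


def sum_over_total_items(microbatches):
    """Sum every token loss and divide once by the total token count."""
    num_items_in_batch = sum(len(losses) for losses in microbatches)
    total = sum(sum(losses) for losses in microbatches)
    return total / num_items_in_batch


# Made-up per-token losses for two micro-batches of different lengths.
microbatches = [[2.0, 2.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]

per_batch = mean_of_microbatch_means(microbatches)  # (2.0 + 1.0) / 2
global_norm = sum_over_total_items(microbatches)    # 10.0 / 8
print(per_batch, global_norm)
```

The two reductions agree only when every micro-batch has the same number of tokens, which is one reason a logged loss curve can shift between versions even when the token-level losses are identical.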