
Upgrading transformers version to 5.5.4 #191

Open
filyp wants to merge 5 commits into locuslab:main from filyp:pr-transformers-5.5.3

Conversation

filyp commented May 13, 2026

Contributor

What does this PR do?

  • Upgrades transformers to 5.5.4 to support newer models and features, in particular the much more efficient execution of MoE models added in transformers 5.
  • Bumps other dependencies to the versions required by transformers==5.5.4. (I have used this setup extensively in my unlearning experiments without any issues.)
  • Adds a regression test for prediction_step, to catch issues and make future upgrades easier.
  • Adds a Dockerfile and a README link to a prebuilt Docker image with the full environment. This makes it easier to start using open-unlearning, especially on cloud GPUs that require specifying an image with the environment, and it helps reproducibility.
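
To make the regression-test idea concrete, here is a minimal sketch of the kind of comparison such a check could perform. It is not the code in tests/prediction_step_regression.py: the helper name matches_reference, the flat-list input format, and the tolerances are all assumptions chosen for illustration.

```python
import math


def matches_reference(current, reference, rel_tol=1e-5):
    """Compare two flat lists of floats element-wise within a tolerance.

    Hypothetical helper: the real test may compare tensors directly.
    """
    if len(current) != len(reference):
        return False
    return all(
        math.isclose(c, r, rel_tol=rel_tol, abs_tol=1e-8)
        for c, r in zip(current, reference)
    )


# Made-up values standing in for stored reference logits and a fresh run.
reference_logits = [0.25, -1.5, 3.0]
current_logits = [0.25, -1.5, 3.0000001]
logits_ok = matches_reference(current_logits, reference_logits)
```

A tolerance-based comparison is used here because small floating-point differences between library versions are expected, while a changed shape or a large deviation should fail.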

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

Tests

I used the newly added prediction_step regression test to verify validity, and I ran make quality. I also ran python src/train.py --config-name=unlearn.yaml experiment=unlearn/tofu/default eval=tofu_simple question_key=paraphrased_question eval.tofu.batch_size=16 trainer.args.report_to=wandb trainer=NPO task_name=transformers5.5.4 as a regression test against the runs in #175. As before, the reported loss changes (due to a different scaling convention in the new version), but the actual unlearning trajectory stays almost the same.
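
A toy sketch of how a reported loss can change purely because of the reduction convention. It assumes (this is not taken from the PR diff) that one convention averages token losses inside each micro-batch and then averages those means, while the other sums all token losses and divides by the total token count, here called num_items_in_batch. The loss values are made up.

```python
def mean_of_micro_batch_means(micro_batches):
    """Average each micro-batch's token losses, then average those means."""
    means = [sum(mb) / len(mb) for mb in micro_batches]
    return sum(means) / len(means)


def sum_over_num_items(micro_batches):
    """Sum every token loss and divide by the total number of tokens."""
    num_items_in_batch = sum(len(mb) for mb in micro_batches)
    total = sum(sum(mb) for mb in micro_batches)
    return total / num_items_in_batch


# Two micro-batches with different token counts (illustrative values only).
token_losses = [[2.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
per_batch_mean_loss = mean_of_micro_batch_means(token_losses)  # (3.0 + 1.0) / 2
per_token_loss = sum_over_num_items(token_losses)  # 10.0 / 6
```

The same per-token losses produce two different scalars, so a shift in the logged loss curve does not by itself indicate a change in model behavior.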

One thing worth flagging: in newer transformers versions (not only this one), installing flash-attn is messier. A plain pip install triggers a long build process, which can also fail if the local CUDA version does not match. There are prebuilt wheels, but only in third-party repos (I documented this in the README). I think the cleanest solution would be to remove flash-attn altogether, given that with typical unlearning datasets it actually slows training down, as discussed in #190.
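
One way to make flash-attn optional without removing it is to fall back to another attention backend when the package is not installed. The sketch below is not part of this PR, and the helper name pick_attn_implementation is hypothetical. It only checks whether the flash_attn module is importable, which does not guarantee that the installed build matches the local CUDA and torch versions.

```python
import importlib.util


def pick_attn_implementation():
    """Prefer flash-attn when it is importable, otherwise fall back to SDPA."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_implementation = pick_attn_implementation()
```

The resulting string can be passed as the attn_implementation argument when loading a model with from_pretrained.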

Copilot AI review requested due to automatic review settings May 13, 2026 11:42

Copilot AI left a comment

Pull request overview

This PR upgrades the project’s Hugging Face / PyTorch stack to transformers==5.5.4 (and required dependency bumps) to enable newer model support and improved MoE efficiency, and it adds a regression check around UnlearnTrainer.prediction_step behavior under the new transformers loss-normalization semantics.

Changes:

  • Bump core ML dependencies (transformers / torch / accelerate / bitsandbytes / huggingface-hub) and add flash-linear-attention.
  • Update unlearning trainer prediction_step to pass num_items_in_batch through to the base Trainer.compute_loss for transformers 5.x behavior.
  • Add a prediction_step regression script and update README installation guidance (flash-attn + Docker image link).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| requirements.txt | Pins new dependency versions for the transformers 5.5.4 upgrade. |
| src/trainer/unlearn/base.py | Updates prediction_step to call base compute_loss with num_items_in_batch (transformers 5.x normalization). |
| src/data/utils.py | Adapts apply_chat_template(..., tokenize=True) handling for transformers 5.x returning BatchEncoding. |
| tests/prediction_step_regression.py | Adds a regression check script for prediction_step loss/logits/labels behavior. |
| README.md | Updates flash-attn guidance and adds a link to a prebuilt Docker image. |
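
The src/data/utils.py change can be pictured with a small sketch. It assumes, following the summary above, that apply_chat_template(..., tokenize=True) may return either a plain list of token ids or a mapping-like object with an input_ids key. The helper name extract_input_ids is hypothetical and is not the code in this PR.

```python
from collections.abc import Mapping


def extract_input_ids(encoded):
    """Return token ids whether `encoded` is a plain list or a mapping.

    Hypothetical helper: a mapping is assumed to carry an "input_ids" key.
    """
    if isinstance(encoded, Mapping):
        return list(encoded["input_ids"])
    return list(encoded)


# Plain dicts and lists stand in for the tokenizer's return values here.
ids_from_list = extract_input_ids([101, 2023, 102])
ids_from_mapping = extract_input_ids(
    {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}
)
```

Normalizing the return value in one place keeps downstream code independent of which transformers version produced it.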

Comment thread tests/prediction_step_regression.py Outdated
Comment thread tests/prediction_step_regression.py Outdated
Comment thread tests/prediction_step_regression.py
Comment thread tests/prediction_step_regression.py
Comment thread src/trainer/unlearn/base.py
Comment thread src/trainer/unlearn/base.py
Comment thread README.md
Comment on lines +119 to +120
# Or to avoid building flash-attn:
pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"

Contributor Author

Ideally, we would remove flash-attn altogether, as proposed in #190.

Comment thread README.md
# python setup_data.py --help
```

We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed.

Contributor Author

I'd rather not specify it, in case someone forgets to update the README. The hub page already provides the recommended tag to pull.

filyp commented Jun 8, 2026

Contributor Author

What do you think @molereddy @Dornavineeth ?

On a more general note, I have a grant to work on LLM unlearning, and I could spend some portion of it helping to develop this open-unlearning repo.
I'm working in a fork with many things I could merge, such as evaluation under relearning attacks, hyperparameter tuning with hydra, and more methods, models and datasets. I also plan to build a leaderboard and to keep developing the models and datasets.

I know you have limited time to review contributions, so let me know what would be the most useful for me to do.

@filyp filyp deployed to tests June 8, 2026 14:30 — with GitHub Actions Active
