Upgrading transformers version to 5.5.4 #191
Open: filyp wants to merge 5 commits into locuslab:main from filyp:pr-transformers-5.5.3
Commits (all by filyp):
- 949d345 Bump transformers to 5.5.3; add prediction_step regression test
- e57e69c format base.py with ruff
- 6fe071e (revert accidental custom utils commit)
- 8031d92 (fixes after automated review)
- c3301d8 use smaller, non-gated model for tests; include dockerfile
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| FROM nvidia/cuda:12.8.0-devel-ubuntu22.04 | ||
|
|
||
| # Install Python 3.11 from deadsnakes (Ubuntu 22.04 ships 3.11.0rc1, which has | ||
| # inspect bugs that break triton's JIT source parsing). | ||
| # Bootstrap pip via get-pip.py so we get upstream pip/setuptools (Ubuntu's | ||
| # python3-pip carries Debian patches that break pyproject builds). | ||
| ENV DEBIAN_FRONTEND=noninteractive | ||
| RUN apt-get update && apt-get install -y software-properties-common curl ca-certificates \ | ||
| && add-apt-repository ppa:deadsnakes/ppa -y \ | ||
| && apt-get update && apt-get install -y \ | ||
| python3.11 python3.11-dev python3.11-distutils git \ | ||
| && rm -rf /var/lib/apt/lists/* \ | ||
| && ln -sf /usr/bin/python3.11 /usr/bin/python \ | ||
| && ln -sf /usr/bin/python3.11 /usr/bin/python3 \ | ||
| && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11 | ||
|
|
||
| # Install dependencies from requirements | ||
| COPY requirements.txt /tmp/requirements.txt | ||
| RUN pip install --no-cache-dir -r /tmp/requirements.txt | ||
|
|
||
| # Install lm-eval and hf_transfer | ||
| RUN pip install --no-cache-dir lm-eval==0.4.11 | ||
|
|
||
| # Install flash-attn (pre-built wheel for cu128 + torch2.9, avoids slow compilation) | ||
| RUN pip install --no-cache-dir \ | ||
| "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl" | ||
|
|
||
| WORKDIR /workspace |
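The prebuilt flash-attn wheel installed above only works when its CUDA, torch, and CPython tags match the image. As a minimal sketch that is not part of this PR, the filename can be split into those tags before installing. `parse_wheel` and `python_tag` are hypothetical helper names, and the regex assumes the naming scheme seen in this one URL rather than a documented convention of the wheel repository.

```python
import re
import sys

# Hypothetical helper, not part of the PR. The pattern is inferred from the
# single wheel filename used in the Dockerfile above.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-+]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"-(?P<python>cp\d+)-(?P<abi>cp\d+)-(?P<platform>.+)\.whl$"
)


def parse_wheel(filename: str) -> dict:
    """Split a prebuilt wheel filename into its compatibility tags."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def python_tag() -> str:
    """CPython tag of the running interpreter, e.g. 'cp311' on Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    tags = parse_wheel(
        "flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl"
    )
    print(tags)
    print("interpreter matches wheel:", tags["python"] == python_tag())
```

Here the tags read as flash-attn 2.8.3 for CUDA 12.8, torch 2.9, and CPython 3.11, which lines up with the base image and the `torch==2.9.1` pin in this PR.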
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -115,7 +115,9 @@ We provide several variants for each of the components in the unlearning pipelin | ||
| conda create -n unlearning python=3.11 | ||
| conda activate unlearning | ||
| pip install ".[lm-eval]" | ||
| - pip install --no-build-isolation flash-attn==2.6.3 | ||
| + pip install --no-build-isolation flash-attn==2.8.3 | ||
| + # Or to avoid building flash-attn: | ||
| + pip install "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.9-cp311-cp311-linux_x86_64.whl" | ||
|
|
||
| # Data setup | ||
| python setup_data.py --eval # saves/eval now contains evaluation results of the uploaded models | ||
| @@ -125,6 +127,8 @@ python setup_data.py --eval # saves/eval now contains evaluation results of the | ||
| # python setup_data.py --help | ||
| ``` | ||
|
|
||
| + We also provide a [Docker image](https://hub.docker.com/r/filyp/open-unlearning), with this environment already installed. | ||

Review comment from the PR author (Contributor): I'd rather not specify it, in case someone forgets to update the README. The hub page already provides the recommended tag to pull.

|
|
||
| --- | ||
|
|
||
| ### 🔄 Updated TOFU benchmark | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| model_args: | ||
|   pretrained_model_name_or_path: "Qwen/Qwen2.5-0.5B-Instruct" | ||
|   attn_implementation: 'flash_attention_2' | ||
|   torch_dtype: bfloat16 | ||
| tokenizer_args: | ||
|   pretrained_model_name_or_path: "Qwen/Qwen2.5-0.5B-Instruct" | ||
| template_args: | ||
|   apply_chat_template: true | ||
|   system_prompt: "You are a helpful assistant." | ||
|   system_prompt_with_special_tokens: "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n" | ||
|   user_start_tag: "<|im_start|>user\n" | ||
|   user_end_tag: "<|im_end|>\n" | ||
|   asst_start_tag: "<|im_start|>assistant\n" | ||
|   asst_end_tag: "<|im_end|>\n" | ||
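The tag fields in this config follow the ChatML layout. The diff does not show how the repository consumes them, and with `apply_chat_template: true` the tokenizer's own chat template presumably takes precedence. As a rough illustration of what the tags would produce if concatenated by hand, here is a sketch. `build_prompt` is a hypothetical helper, not a function from the repository.

```python
# Tag values copied from the config above; build_prompt is a hypothetical
# helper written for illustration, not the repository's template code.
TEMPLATE_ARGS = {
    "system_prompt_with_special_tokens": (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    ),
    "user_start_tag": "<|im_start|>user\n",
    "user_end_tag": "<|im_end|>\n",
    "asst_start_tag": "<|im_start|>assistant\n",
    "asst_end_tag": "<|im_end|>\n",
}


def build_prompt(question, answer=None, template=TEMPLATE_ARGS):
    """Wrap one user turn (and an optional answer) in the configured tags."""
    text = template["system_prompt_with_special_tokens"]
    text += template["user_start_tag"] + question + template["user_end_tag"]
    # Leave the assistant turn open when no answer is given, so the model
    # can generate it.
    text += template["asst_start_tag"]
    if answer is not None:
        text += answer + template["asst_end_tag"]
    return text


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
```

Passing an `answer` closes the assistant turn with `asst_end_tag`, which is the shape a training example would take under these assumptions.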
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,18 @@ | ||
| - huggingface-hub==0.36.0 | ||
| - transformers==4.51.3 | ||
| - hf-xet==1.2.0 | ||
| + huggingface-hub==1.7.2 | ||
| + transformers==5.5.4 | ||
| + hf-xet==1.4.2 | ||
| numpy==2.2.3 | ||
| hydra-core==1.3 | ||
| hydra_colorlog==1.2.0 | ||
| - torch==2.4.1 | ||
| + torch==2.9.1 | ||
| datasets==3.0.1 | ||
| - accelerate==0.34.2 | ||
| - bitsandbytes==0.44.1 | ||
| + accelerate==1.13.0 | ||
| + bitsandbytes==0.49.2 | ||
| rouge-score==0.1.2 | ||
| scipy==1.14.1 | ||
| tensorboard==2.18.0 | ||
| scikit-learn==1.5.2 | ||
| deepspeed==0.15.4 | ||
| wandb==0.21.4 | ||
| + # for Qwen3.5 speedup: | ||
| + flash-linear-attention==0.4.2 | ||
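Because this change moves several tightly coupled pins at once, it can help to confirm that an existing environment actually matches them. The following is a minimal sketch, not part of the PR, using the standard-library `importlib.metadata`. `parse_pins`, `normalize`, and `find_mismatches` are hypothetical helper names, and the version normalization is a deliberate simplification rather than full PEP 440 comparison.

```python
from importlib import metadata


def parse_pins(text):
    """Read 'name==version' lines, ignoring comments and blank lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def normalize(version):
    """Simplified: drop a local suffix such as '+cu128' and trailing '.0'."""
    version = version.split("+", 1)[0]
    while version.endswith(".0"):
        version = version[:-2]
    return version


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (pinned, installed)} for packages that differ."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or normalize(installed) != normalize(wanted):
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    with open("requirements.txt") as handle:
        for name, (wanted, installed) in find_mismatches(
            parse_pins(handle.read())
        ).items():
            print(f"{name}: pinned {wanted}, installed {installed}")
```

The normalization step exists so that a pin such as `hydra-core==1.3` is not reported against an installed `1.3.0`, and so that a torch build reporting a local suffix still compares equal to its pin.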
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| """Regression check for UnlearnTrainer.prediction_step. | ||
|
|
||
| Run from repo root: | ||
| python tests/prediction_step_regression.py | ||
| """ | ||
|
|
||
| import sys | ||
| from pathlib import Path | ||
|
|
||
| import torch | ||
| from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments | ||
|
|
||
| sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src")) | ||
| from trainer.unlearn.npo import NPO # noqa: E402 | ||
|
|
||
|
|
||
| MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct" | ||
| SEED = 0 | ||
|
|
||
|
|
||
| def main(): | ||
|     torch.manual_seed(SEED) | ||
|
|
||
|     tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) | ||
|     if tokenizer.pad_token_id is None: | ||
|         tokenizer.pad_token = tokenizer.eos_token | ||
|
|
||
|     model = AutoModelForCausalLM.from_pretrained( | ||
|         MODEL_NAME, torch_dtype=torch.float32, attn_implementation="sdpa" | ||
|     ) | ||
|     model.eval() | ||
|
|
||
|     args = TrainingArguments( | ||
|         per_device_eval_batch_size=2, | ||
|         report_to=[], | ||
|     ) | ||
|
filyp marked this conversation as resolved.
|
||
|
|
||
|     # Use NPO so we exercise a trainer with an overridden compute_loss. | ||
|     # prediction_step is expected to bypass it and use the base causal-LM loss. | ||
|     trainer = NPO( | ||
|         model=model, | ||
|         args=args, | ||
|         processing_class=tokenizer, | ||
|     ) | ||
|
|
||
|     text = "The capital of France is Paris." | ||
|     enc = tokenizer([text, text], return_tensors="pt", padding=True) | ||
|     inputs = { | ||
|         "input_ids": enc["input_ids"], | ||
|         "attention_mask": enc["attention_mask"], | ||
|         "labels": enc["input_ids"].clone(), | ||
|     } | ||
|
|
||
|     loss, logits, labels = trainer.prediction_step( | ||
|         model, inputs, prediction_loss_only=False | ||
|     ) | ||
|
|
||
|     print("=== prediction_step output ===") | ||
|     print(f"loss: {loss}") | ||
|     print(f"logits shape: {tuple(logits.shape) if logits is not None else None}") | ||
|     if logits is not None: | ||
|         print(f"logits[0, 0, :8]: {logits[0, 0, :8].tolist()}") | ||
|         print(f"logits sum: {logits.sum().item():.6f}") | ||
|     print(f"labels shape: {tuple(labels.shape) if labels is not None else None}") | ||
|
|
||
|     # Baseline captured on upstream (transformers pre-5.x). | ||
|     expected_logits_shape = (2, 7, 151936) | ||
|     expected_logits_head = [ | ||
|         1.8752843141555786, | ||
|         0.16622018814086914, | ||
|         -1.0266990661621094, | ||
|         0.3476898670196533, | ||
|         1.5837609767913818, | ||
|         -4.199005126953125, | ||
|         -1.6311265230178833, | ||
|         2.0707736015319824, | ||
|     ] | ||
|     expected_labels_shape = (2, 7) | ||
|
|
||
|     # Loss baseline captured on transformers 5.5.4 | ||
|     expected_loss = 2.42057466506958 | ||
|
|
||
|     assert abs(loss.item() - expected_loss) < 1e-3, (loss.item(), expected_loss) | ||
|     assert tuple(logits.shape) == expected_logits_shape | ||
|     assert tuple(labels.shape) == expected_labels_shape | ||
|     head = logits[0, 0, :8].tolist() | ||
|     for got, exp in zip(head, expected_logits_head): | ||
|         assert abs(got - exp) < 1e-3, (got, exp) | ||
|
filyp marked this conversation as resolved.
|
||
|
|
||
|     # Second test: verify prediction_step's loss matches the standard causal-LM | ||
|     # loss computed directly from the model. This confirms prediction_step is | ||
|     # bypassing NPO's overridden compute_loss and using the base loss. | ||
|     device = next(model.parameters()).device | ||
|     with torch.no_grad(): | ||
|         out = model( | ||
|             input_ids=inputs["input_ids"].to(device), | ||
|             attention_mask=inputs["attention_mask"].to(device), | ||
|         ) | ||
|     shift_logits = out.logits[:, :-1, :].contiguous() | ||
|     shift_labels = inputs["labels"][:, 1:].to(device).contiguous() | ||
|     # transformers 5.x normalizes by num_items_in_batch (count of non-ignored | ||
|     # labels in the full inputs), not by the number of shifted positions. | ||
|     num_items = (inputs["labels"].to(device) != -100).sum() | ||
|     ce_sum = torch.nn.functional.cross_entropy( | ||
|         shift_logits.view(-1, shift_logits.size(-1)), | ||
|         shift_labels.view(-1), | ||
|         ignore_index=-100, | ||
|         reduction="sum", | ||
|     ) | ||
|     manual_loss = ce_sum / num_items | ||
|     print(f"manual base loss: {manual_loss.item()}") | ||
|     print(f"prediction_step loss: {loss.item()}") | ||
|     assert abs(loss.item() - manual_loss.item()) < 1e-4, ( | ||
|         loss.item(), manual_loss.item() | ||
|     ) | ||
|     print("OK") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     main() | ||
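The second check in this test rests on the normalization described in its comment: the summed cross-entropy over shifted positions is divided by the number of non-ignored labels in the full, unshifted inputs. A dependency-free sketch of that arithmetic is shown below, using toy logits instead of a real model. The helper names are hypothetical, and the sketch only illustrates the two denominators. It makes no claim about how transformers computes the loss internally.

```python
import math


def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def shifted_nll_sum(batch_logits, batch_labels, ignore_index=-100):
    """Sum NLL where position t predicts label t + 1; also count positions."""
    total, count = 0.0, 0
    for logits, labels in zip(batch_logits, batch_labels):
        for t in range(len(labels) - 1):
            target = labels[t + 1]
            if target == ignore_index:
                continue
            total += token_nll(logits[t], target)
            count += 1
    return total, count


def compare_normalizations(batch_logits, batch_labels, ignore_index=-100):
    """Return (mean over shifted positions, sum / non-ignored label count)."""
    total, shifted_count = shifted_nll_sum(batch_logits, batch_labels, ignore_index)
    num_items = sum(
        1 for labels in batch_labels for label in labels if label != ignore_index
    )
    return total / shifted_count, total / num_items


if __name__ == "__main__":
    # Two sequences of 7 tokens with uniform logits over a 4-token vocabulary.
    logits = [[[0.0] * 4 for _ in range(7)] for _ in range(2)]
    labels = [[1] * 7 for _ in range(2)]
    per_position, per_item = compare_normalizations(logits, labels)
    print(per_position, per_item)
```

With a batch of shape (2, 7) and no ignored labels there are 12 shifted positions but 14 labels, so the second value is 12/14 of the first. That ratio is why the test divides `ce_sum` by `num_items` rather than taking a plain mean.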
Ideally, we could remove flash-attn as proposed in #190