🚨 [ALM] Add base model without head by eustlb · Pull Request #45534 · huggingface/transformers

eustlb · 2026-04-20T15:09:17Z

What does this PR do?

This PR is motivated by the need to simplify the changes required for vLLM compatibility, as seen in vLLM PR #39330.

It fixes a design discrepancy between ALMs and VLMs: most ALMs don't have a base model class because they use the causal model directly as the text model, whereas the Llava style (which is more aligned with the library's philosophy) is:
XxModel: encoder + projector + language base model backbone
XxForConditionalGeneration: XxModel + lm_head
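The two-level layout above can be sketched with a dependency-free toy. This is only an illustration of the composition, not the transformers implementation: every class name is hypothetical, the attribute names `audio_tower` and `multi_modal_projector` are assumptions, and plain Python lists stand in for tensors.

```python
# Toy sketch of the Llava-style layout. All classes here are hypothetical
# stand-ins; lists of floats play the role of tensors.

class ToyAudioEncoder:
    def __call__(self, audio):
        return [x * 2.0 for x in audio]


class ToyProjector:
    def __call__(self, features):
        return [x + 1.0 for x in features]


class ToyTextBackbone:
    """Stands in for the language *base* model: hidden states, no logits."""

    def __call__(self, embeddings):
        return [x * 0.5 for x in embeddings]


class ToyLmHead:
    def __call__(self, hidden_states):
        return [h * 10.0 for h in hidden_states]


class ToyAlmModel:
    """XxModel: encoder + projector + language base model backbone."""

    def __init__(self):
        self.audio_tower = ToyAudioEncoder()
        self.multi_modal_projector = ToyProjector()
        self.language_model = ToyTextBackbone()

    def __call__(self, text_embeddings, audio):
        audio_embeddings = self.multi_modal_projector(self.audio_tower(audio))
        # Audio embeddings are simply prepended to the text embeddings here.
        return self.language_model(audio_embeddings + text_embeddings)


class ToyAlmForConditionalGeneration:
    """XxForConditionalGeneration: XxModel + lm_head."""

    def __init__(self):
        self.model = ToyAlmModel()
        self.lm_head = ToyLmHead()

    def __call__(self, text_embeddings, audio):
        return self.lm_head(self.model(text_embeddings, audio))
```

The point of the split is that the head-less class returns hidden states on its own, so a backend can reuse it without going through the generation class.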

why do we actually have this discrepancy?

qwen2_audio was the first ALM in transformers; its modeling reflects the state of llava at the time it was added.
All the ALMs merged since then replicated this design.

llava and VLMs evolved thanks to

🚨 Breaking changes

We don't fully ensure BC. The following changes are breaking:

AutoModel.from_pretrained("Qwen/Qwen2-Audio-7B") now returns Qwen2AudioModel, not Qwen2AudioForConditionalGeneration; the same holds for the other ALMs 🚨
Auto mappings: AutoModel now returns the base model (Qwen2AudioModel)
Likewise, model.language_model is still returned, but it is now the base model and not the causal one as before 🚨
model.language_model on the ForConditionalGeneration class is not accessible anymore. Use model.get_decoder() instead.
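A minimal migration sketch for downstream code, assuming the behavior described in the list above. The helper names `load_for_generation` and `text_decoder` are hypothetical; `Qwen2AudioForConditionalGeneration`, the checkpoint id and `get_decoder()` are taken from the description, and nothing here has been checked against the merged code.

```python
CHECKPOINT = "Qwen/Qwen2-Audio-7B"  # checkpoint named in the PR description


def load_for_generation(checkpoint: str = CHECKPOINT):
    """Load the head-bearing class explicitly instead of relying on AutoModel.

    Per the description, AutoModel would now resolve to the base model
    (Qwen2AudioModel), which has no lm_head.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import Qwen2AudioForConditionalGeneration

    return Qwen2AudioForConditionalGeneration.from_pretrained(checkpoint)


def text_decoder(model):
    """Return the text decoder through get_decoder(), as the description
    recommends, rather than reading model.language_model on the generation class."""
    return model.get_decoder()
```

Calling `load_for_generation()` downloads the full checkpoint, so it is left uncalled in this sketch.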

how do we ensure BC (as much as possible)

weight loading: handled by the dynamic weight loader and the conversion mapping
ensure that the model cards for the reference Hub checkpoints used in the docs and tests don't use AutoModel (since this would break)
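To illustrate the idea behind keeping old checkpoints loadable, here is a toy key-renaming pass. It is not the transformers conversion-mapping API: the rename rules, the old and new key layouts, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical rename rules: (pattern in the old layout, replacement in the new one).
# The first matching rule wins, so more specific patterns come first.
RENAME_RULES = [
    (r"^language_model\.lm_head\.", "lm_head."),
    (r"^language_model\.model\.", "model.language_model."),
    (r"^audio_tower\.", "model.audio_tower."),
    (r"^multi_modal_projector\.", "model.multi_modal_projector."),
]


def convert_key(key: str) -> str:
    """Map one old-layout checkpoint key to the assumed new layout."""
    for pattern, replacement in RENAME_RULES:
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key  # keys that match no rule are left untouched


def convert_state_dict(state_dict: dict) -> dict:
    """Rename every key of a checkpoint-like mapping; values are not modified."""
    return {convert_key(key): value for key, value in state_dict.items()}
```

With rules of this kind applied at load time, a checkpoint saved under the old layout would not need to be re-uploaded.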

models to which this applies

TODO

Make sure to remove get/set_input_embeddings once "[MultimodalLM] add language_model to the get/set_input_embeddings logic" (#46029) is merged.

# Conflicts:
#   src/transformers/models/audioflamingo3/modeling_audioflamingo3.py
#   src/transformers/models/glmasr/modeling_glmasr.py
#   src/transformers/models/granite_speech/modeling_granite_speech.py
#   src/transformers/models/granite_speech_plus/modeling_granite_speech_plus.py
#   src/transformers/models/musicflamingo/modeling_musicflamingo.py
#   src/transformers/models/musicflamingo/modular_musicflamingo.py
#   src/transformers/models/vibevoice_asr/modeling_vibevoice_asr.py
#   src/transformers/models/voxtral/modeling_voxtral.py
#   src/transformers/models/voxtral/modular_voxtral.py
#   src/transformers/models/voxtral_realtime/modeling_voxtral_realtime.py
#   tests/models/audioflamingo3/test_modeling_audioflamingo3.py
#   tests/models/glmasr/test_modeling_glmasr.py
#   tests/models/granite_speech/test_modeling_granite_speech.py
#   tests/models/musicflamingo/test_modeling_musicflamingo.py
#   tests/models/qwen2_audio/test_modeling_qwen2_audio.py
#   tests/models/vibevoice_asr/test_modeling_vibevoice_asr.py
#   tests/models/voxtral/test_modeling_voxtral.py


Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

github-actions · 2026-05-26T09:18:39Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, granite_speech, granite_speech_plus, musicflamingo, parakeet, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime

* add a base model class
* ensure BC via conversion mapping
* auto classes
* test updates
* ensure BC for accessing attributes
* simplify conversion mapping
* convert modular
* convert modular
* apply to voxtral
* convert modular
* remove test_model_base_model_prefix overwrite
* make
* make
* XXXModel class in doc
* add GraniteSpeechPlusModel
* tests now have a base_model_class
* fix qwen2audio
* class level base mappings
* fix test_reverse_loading_mapping
* fix
* fix
* nit
* fix
* deprec in 5.7 -> 5.15
* fix flaky test
* fix flaky test
* fix flaky test
* remove forward_base_model_attrs
* _supports_attention_backend in PretrainedModel
* removed redundant get/set_input_embeddings
* remove requires_grad_ handling
* make fix-repo
* fix voxtral realtime
* update vibevoice_asr test
* update test
* Update src/transformers/models/audioflamingo3/modular_audioflamingo3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/audioflamingo3/modular_audioflamingo3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/glmasr/modeling_glmasr.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fix-repo

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>


eustlb added 4 commits April 20, 2026 16:59

add a base model class

4cd44e4

ensure BC via conversion mapping

79b833b

auto classes

57436db

test updates

1bd269c

This was referenced Apr 20, 2026

feat[vLLM × v5]: Add audio support for the Transformers backend vllm-project/vllm#39330

Open

audio tester class #45391

Merged

ensure BC for accessing attributes

9010366

eustlb changed the title ~~ALM base model class~~ 🚨 [ALM] Add base model without head Apr 22, 2026

simplify conversion mapping

af429ef

eustlb added 18 commits May 11, 2026 18:48

Merge branch 'main' into alm-base-model-class

e5eb9a6

convert modular

1da57e8

convert modular

7481402

apply to voxtral

cf7c5f1

convert modular

83799ce

remove test_model_base_model_prefix overwrite

464fae5

make

744567c

make

687b693

XXXModel class in doc

ff8b9f5

add GraniteSpeechPlusModel

753b755

tests now have a base_model_class

ecce440

fix qwen2audio

1f5e199

class level base mappings

6a1166b

Merge branch 'main' into alm-base-model-class

674e64c

fix test_reverse_loading_mapping

933f8c6

Merge branch 'alm-base-model-class' of github.com:huggingface/transformers into alm-base-model-class

642bdff

fix

5383313

eustlb and others added 3 commits May 20, 2026 10:43

Update src/transformers/models/audioflamingo3/modular_audioflamingo3.py

daad5ec

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Update src/transformers/models/glmasr/modeling_glmasr.py

b09349d

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into alm-base-model-class

a59f750

eustlb enabled auto-merge May 20, 2026 09:27

eustlb added 2 commits May 20, 2026 11:40

make fix-repo

6f5be63

Merge branch 'main' into alm-base-model-class

159819c

eustlb added this pull request to the merge queue May 20, 2026

github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks May 20, 2026

eustlb added this pull request to the merge queue May 20, 2026

github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 20, 2026

Merge remote-tracking branch 'origin' into alm-base-model-class

80306c6

eustlb enabled auto-merge May 20, 2026 11:46

eustlb added 2 commits May 21, 2026 13:48

Merge branch 'main' into alm-base-model-class

157f7a2

Merge branch 'main' into alm-base-model-class

401cf5a

eustlb added this pull request to the merge queue May 26, 2026

Merged via the queue into main with commit 027d1a9 May 26, 2026
31 checks passed

eustlb deleted the alm-base-model-class branch May 26, 2026 09:47

eustlb mentioned this pull request May 26, 2026

Support Granite Speech NAR (NLE) #46031

Open

6 tasks

ebezzam mentioned this pull request Jun 4, 2026

fix[vLLM x v5]: Default untied embeddings in AudioFlamingo3 and VibeVoice #46400

Merged

6 tasks

hmellor mentioned this pull request Jun 5, 2026

Bump Transformers version to 5.10.3 vllm-project/vllm#41359

Open

6 tasks

pull Bot pushed a commit to j3din00b/transformers that referenced this pull request Jun 8, 2026

[fix] regression introduced by huggingface#45534 (huggingface#46456)

b6bad75

* fix * fix * unnecessary and misleading

khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026

[fix] regression introduced by huggingface#45534 (huggingface#46456)

a3cabf0

* fix * fix * unnecessary and misleading

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026

[fix] regression introduced by huggingface#45534 (huggingface#46456)

995d794

* fix * fix * unnecessary and misleading

louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026

[fix] regression introduced by huggingface#45534 (huggingface#46456)

94c2ebe

* fix * fix * unnecessary and misleading

ArthurZucker pushed a commit that referenced this pull request Jun 11, 2026

[fix] regression introduced by #45534 (#46456)

9caa4e4

* fix * fix * unnecessary and misleading

eustlb merged 52 commits into main from alm-base-model-class


4 participants
