🚨 [ALM] Add base model without head#45534

Merged
eustlb merged 52 commits into main from alm-base-model-class on May 26, 2026

@eustlb eustlb commented Apr 20, 2026

What does this PR do?

The motivation is to simplify the changes required for vLLM compatibility, as seen in vLLM PR #39330.

This PR fixes a design discrepancy between ALMs and VLMs. Most ALMs have no base model class because they use the causal LM directly as the text model, whereas the Llava style (which is better aligned with the library's philosophy) is:
XxModel: encoder + projector + language base model backbone
XxForConditionalGeneration: XxModel + lm_head
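
The split can be sketched with plain Python stand-ins. This is only a dependency-free illustration of the composition: every `Toy*` class and the `encoder` / `projector` attribute names are hypothetical and are not the actual transformers modules.

```python
class ToyAudioEncoder:
    """Hypothetical stand-in for the audio encoder."""

    def __call__(self, audio):
        # One scalar "feature" per audio frame.
        return [float(x) for x in audio]


class ToyProjector:
    """Hypothetical stand-in for the projector (audio features -> text hidden size 2)."""

    def __call__(self, features):
        return [[f, f] for f in features]


class ToyTextBackbone:
    """Hypothetical stand-in for the language base model: hidden states only, no head."""

    def __call__(self, embeds):
        return [[2 * v for v in row] for row in embeds]


class ToyModel:
    """XxModel: encoder + projector + language base model backbone."""

    def __init__(self):
        self.encoder = ToyAudioEncoder()
        self.projector = ToyProjector()
        self.language_model = ToyTextBackbone()

    def __call__(self, audio):
        embeds = self.projector(self.encoder(audio))
        return self.language_model(embeds)  # hidden states, not logits


class ToyForConditionalGeneration:
    """XxForConditionalGeneration: XxModel + lm_head."""

    def __init__(self):
        self.model = ToyModel()

    def lm_head(self, hidden):
        # Toy "head": collapse each hidden state to a single score.
        return [sum(row) for row in hidden]

    def get_decoder(self):
        return self.model.language_model

    def __call__(self, audio):
        return self.lm_head(self.model(audio))
```

The point of the layout is that the head-less `ToyModel` is usable on its own, and the generation class only adds the head on top of it.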

why do we actually have this discrepancy?

qwen2_audio was the first ALM in transformers, and its modeling reflects the state of llava at the time it was added.
All the ALMs merged since then replicated this design.

llava and VLMs evolved thanks to

🚨 Breaking changes

BC is not fully ensured. The following changes are breaking:

  • AutoModel.from_pretrained("Qwen/Qwen2-Audio-7B") now returns Qwen2AudioModel, not Qwen2AudioForConditionalGeneration. The same applies to the other ALMs 🚨
  • Auto mappings: AutoModel now maps to the base model (Qwen2AudioModel)
  • Likewise, model.language_model still exists, but it now returns the base model and not the causal one as before 🚨
  • model.language_model on the ForConditionalGeneration class is no longer accessible. Use model.get_decoder() instead
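
For downstream code that has to run on both sides of this change, a small accessor can hide the difference. This is a minimal sketch: `get_text_backbone` is a hypothetical helper name, not something added by this PR, and it relies only on duck typing (a `get_decoder()` method or a `language_model` attribute). Which object older versions return (base model or causal LM) is not something the sketch normalizes.

```python
def get_text_backbone(model):
    """Return the text decoder of an ALM regardless of its layout.

    Prefers `get_decoder()`, the accessor recommended above, and falls back
    to the `language_model` attribute when no such method is available.
    """
    getter = getattr(model, "get_decoder", None)
    if callable(getter):
        return getter()
    # Assumption: objects without get_decoder() expose `language_model`.
    return model.language_model
```

Callers that need the LM head should load the `ForConditionalGeneration` class explicitly rather than going through `AutoModel`, since `AutoModel` now resolves to the head-less base model.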

how do we ensure BC (as much as possible)

  • weight loading: existing checkpoints still load thanks to the dynamic weight loader and the conversion mapping
  • model cards: ensure that the model cards of the reference Hub checkpoints used in docs and tests don't use AutoModel (since that usage is now broken)
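
To give an idea of what a conversion mapping does for weight loading, here is a standalone sketch of renaming checkpoint keys from a "causal LM as text model" layout to a "base model + head" layout. The rules and key names below are illustrative assumptions, not the mapping shipped in this PR, and the real mechanism lives in the library's weight loader rather than in a helper like this.

```python
import re

# Hypothetical (old key pattern -> new key prefix) rules, for illustration only.
ILLUSTRATIVE_RULES = [
    (r"^language_model\.lm_head\.", "lm_head."),
    (r"^language_model\.model\.", "model.language_model."),
    (r"^audio_tower\.", "model.audio_tower."),
    (r"^multi_modal_projector\.", "model.multi_modal_projector."),
]


def rename_key(key):
    """Apply the first matching rule; leave unmatched keys unchanged."""
    for pattern, replacement in ILLUSTRATIVE_RULES:
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key


def convert_state_dict(state_dict):
    """Return a new dict with the same values under renamed keys."""
    return {rename_key(k): v for k, v in state_dict.items()}
```

Because only the keys change, the tensors themselves are untouched, which is why old checkpoints can keep loading after the class layout changes.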

models to which this applies

  • audioflamingo3
  • glmasr
  • granite_speech
  • granite_speech_plus
  • musicflamingo
  • qwen2_audio
  • vibevoice_asr
  • voxtral
  • voxtral_realtime

TODO

@eustlb eustlb changed the title from "ALM base model class" to "🚨 [ALM] Add base model without head" Apr 22, 2026
eustlb added 18 commits May 11, 2026 18:48
# Conflicts:
#	src/transformers/models/audioflamingo3/modeling_audioflamingo3.py
#	src/transformers/models/glmasr/modeling_glmasr.py
#	src/transformers/models/granite_speech/modeling_granite_speech.py
#	src/transformers/models/granite_speech_plus/modeling_granite_speech_plus.py
#	src/transformers/models/musicflamingo/modeling_musicflamingo.py
#	src/transformers/models/musicflamingo/modular_musicflamingo.py
#	src/transformers/models/vibevoice_asr/modeling_vibevoice_asr.py
#	src/transformers/models/voxtral/modeling_voxtral.py
#	src/transformers/models/voxtral/modular_voxtral.py
#	src/transformers/models/voxtral_realtime/modeling_voxtral_realtime.py
#	tests/models/audioflamingo3/test_modeling_audioflamingo3.py
#	tests/models/glmasr/test_modeling_glmasr.py
#	tests/models/granite_speech/test_modeling_granite_speech.py
#	tests/models/musicflamingo/test_modeling_musicflamingo.py
#	tests/models/qwen2_audio/test_modeling_qwen2_audio.py
#	tests/models/vibevoice_asr/test_modeling_vibevoice_asr.py
#	tests/models/voxtral/test_modeling_voxtral.py
eustlb and others added 3 commits May 20, 2026 10:43
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@eustlb eustlb enabled auto-merge May 20, 2026 09:27
@eustlb eustlb added this pull request to the merge queue May 20, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks May 20, 2026
@eustlb eustlb added this pull request to the merge queue May 20, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 20, 2026
@eustlb eustlb enabled auto-merge May 20, 2026 11:46
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, auto, glmasr, granite_speech, granite_speech_plus, musicflamingo, parakeet, qwen2_audio, vibevoice_asr, voxtral, voxtral_realtime

@eustlb eustlb added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit 027d1a9 May 26, 2026
31 checks passed
@eustlb eustlb deleted the alm-base-model-class branch May 26, 2026 09:47
@eustlb eustlb mentioned this pull request May 26, 2026
6 tasks
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
* add a base model class

* ensure BC via conversion mapping

* auto classes

* test updates

* ensure BC for accessing attributes

* simplify conversion mapping

* convert modular

* convert modular

* apply to voxtral

* convert modular

* remove test_model_base_model_prefix overwrite

* make

* make

* XXXModel class in doc

* add GraniteSpeechPlusModel

* tests now have a base_model_class

* fix qwen2audio

* class level base mappings

* fix test_reverse_loading_mapping

* fix

* fix

* nit

* fix

* deprec in 5.7 -> 5.15

* fix flaky test

* fix flaky test

* fix flaky test

* remove forward_base_model_attrs

* _supports_attention_backend in PretrainedModel

* removed redundant get/set_input_embeddings

* remove requires_grad_ handling

* make fix-repo

* fix voxtral realtime

* update vibevoice_asr test

* update test

* Update src/transformers/models/audioflamingo3/modular_audioflamingo3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/audioflamingo3/modular_audioflamingo3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/glmasr/modeling_glmasr.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make fix-repo

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026
pull Bot pushed a commit to j3din00b/transformers that referenced this pull request Jun 8, 2026
khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026
khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026
ArthurZucker pushed a commit that referenced this pull request Jun 11, 2026
* fix

* fix

* unnecessary and misleading

4 participants