
Feat gemma4 vision encoder local #215

Merged
guillaumejaume merged 2 commits into main from feat-gemma4-vision-encoder-local on May 27, 2026

guillaumejaume (Contributor) commented on May 27, 2026

Adds Gemma 4's vision tower as a TRIDENT patch encoder. Two variants are registered and can be selected with `--patch_encoder`: `gemma4-e4b` (167M parameters, embedding dim 768) and `gemma4-26b` (569M parameters, embedding dim 1152).
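For readers unfamiliar with how a name passed to `--patch_encoder` maps to a model, here is a minimal sketch of a name-to-spec registry. The two variant names and their sizes come from this PR; `EncoderSpec`, `PATCH_ENCODERS`, `build_parser` and `resolve_encoder` are hypothetical names for illustration and are not TRIDENT's actual code.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EncoderSpec:
    name: str
    params: str      # approximate parameter count, as stated in the PR
    embed_dim: int   # dimensionality of the per-patch feature vector


# Hypothetical registry: names and sizes are from the PR description,
# the data structure itself is illustrative only.
PATCH_ENCODERS = {
    "gemma4-e4b": EncoderSpec("gemma4-e4b", "167M", 768),
    "gemma4-26b": EncoderSpec("gemma4-26b", "569M", 1152),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Patch feature extraction (sketch)")
    # Restricting choices to registered names makes typos fail at parse time.
    parser.add_argument(
        "--patch_encoder",
        choices=sorted(PATCH_ENCODERS),
        required=True,
        help="Name of a registered patch encoder",
    )
    return parser


def resolve_encoder(argv: List[str]) -> EncoderSpec:
    """Parse CLI arguments and look up the selected encoder spec."""
    args = build_parser().parse_args(argv)
    return PATCH_ENCODERS[args.patch_encoder]


if __name__ == "__main__":
    spec = resolve_encoder(["--patch_encoder", "gemma4-26b"])
    print(spec.name, spec.embed_dim)
```

Keeping the registry as plain data means adding a variant is a one-line change, and downstream code can size feature buffers from `embed_dim` without instantiating the model.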

Only the vision tower is loaded from the multimodal checkpoint: the safetensors header is parsed and only the `model.vision_tower.*` tensors are read, by seeking to their byte ranges, so the LLM weights are never materialized (for the 26B variant, the vision tower sits inside a ~50 GB shard). Importing the `transformers` Gemma 4 classes is deferred to `_build`, so `transformers>=5.0` is required only when this encoder is actually used, and the rest of TRIDENT keeps working on 4.x.
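The partial-loading idea can be sketched with the standard library alone. A safetensors file starts with an 8-byte little-endian length N, followed by N bytes of JSON mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the end of the header), followed by the raw tensor bytes. The sketch below, assuming that layout, reads only tensors whose names start with `model.vision_tower.` and returns their raw bytes; it is not the PR's implementation, and `read_tensors_with_prefix` and `write_toy_safetensors` are hypothetical helpers. Turning the bytes into tensors (for example with `torch.frombuffer`) is left out, and the `safetensors` library's `safe_open` offers similar lazy per-tensor access.

```python
import json
import os
import struct
import tempfile
from typing import BinaryIO, Dict, List, Tuple

VISION_PREFIX = "model.vision_tower."


def read_header(f: BinaryIO) -> Tuple[dict, int]:
    """Return (header_json, data_start) for an open safetensors file."""
    f.seek(0)
    (n,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header length
    header = json.loads(f.read(n))
    return header, 8 + n                    # tensor buffer starts right after the header


def read_tensors_with_prefix(
    path: str, prefix: str = VISION_PREFIX
) -> Dict[str, Tuple[str, List[int], bytes]]:
    """Read (dtype, shape, raw bytes) only for tensors whose name starts with prefix."""
    out = {}
    with open(path, "rb") as f:
        header, data_start = read_header(f)
        for name, info in header.items():
            # Also skips the optional "__metadata__" entry, which has no offsets.
            if not name.startswith(prefix):
                continue
            begin, end = info["data_offsets"]
            f.seek(data_start + begin)      # jump straight to this tensor's bytes
            out[name] = (info["dtype"], info["shape"], f.read(end - begin))
    return out


def write_toy_safetensors(path: str, tensors: Dict[str, Tuple[str, List[int], bytes]]) -> None:
    """Write a tiny safetensors-style file for demos (no alignment padding)."""
    header, chunks, offset = {"__metadata__": {"format": "pt"}}, [], 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(data)]}
        chunks.append(data)
        offset += len(data)
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))
        f.write(blob)
        for chunk in chunks:
            f.write(chunk)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.safetensors")
    write_toy_safetensors(path, {
        "model.language_model.embed.weight": ("F32", [2], struct.pack("<2f", 1.0, 2.0)),
        "model.vision_tower.patch_embed.weight": ("F32", [3], struct.pack("<3f", 0.5, 1.5, 2.5)),
    })
    vision = read_tensors_with_prefix(path)
    dtype, shape, raw = vision["model.vision_tower.patch_embed.weight"]
    print(sorted(vision), dtype, shape, struct.unpack("<3f", raw))
```

Because each tensor is read with one `seek` plus one `read`, peak memory scales with the vision tower rather than with the shard, which is what makes a ~50 GB shard tractable.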

Credit to @jrs-orellana

jrs-orellana and others added 2 commits May 23, 2026 18:23
Rename Gemma4 26B encoder class for consistency and update the Gemma4 snapshot_download patterns to include sharded safetensors.
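`snapshot_download` in `huggingface_hub` filters repository files with glob-style `allow_patterns`, so a pattern that only names a single-file checkpoint (for example `model.safetensors`) would skip shards named like `model-00001-of-00002.safetensors`. The sketch below shows one possible pattern set and how to locate vision-tower shards from a shard index's `weight_map`. The exact patterns used in TRIDENT are not shown in this PR, so `GEMMA4_ALLOW_PATTERNS`, `matching_files`, `shards_with_prefix` and `download` are hypothetical, and no real repository id is assumed.

```python
import fnmatch
from typing import Iterable, List, Sequence, Set

# Hypothetical patterns; the PR only says they were extended to cover sharded safetensors.
GEMMA4_ALLOW_PATTERNS = (
    "config.json",
    "*.safetensors",             # single-file and sharded weights
    "*.safetensors.index.json",  # index mapping tensor names to shard files
)


def matching_files(filenames: Iterable[str],
                   patterns: Sequence[str] = GEMMA4_ALLOW_PATTERNS) -> List[str]:
    """Keep files matching any glob pattern (fnmatch-style, like allow_patterns)."""
    return [f for f in filenames if any(fnmatch.fnmatch(f, p) for p in patterns)]


def shards_with_prefix(index: dict, prefix: str = "model.vision_tower.") -> Set[str]:
    """Given a parsed *.safetensors.index.json, return shards holding tensors under prefix."""
    return {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}


def download(repo_id: str, local_dir: str) -> str:
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id,
                             allow_patterns=list(GEMMA4_ALLOW_PATTERNS),
                             local_dir=local_dir)


if __name__ == "__main__":
    files = ["config.json", "README.md", "tokenizer.json",
             "model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors",
             "model.safetensors.index.json"]
    print(matching_files(files))
    index = {"weight_map": {
        "model.language_model.layers.0.weight": "model-00001-of-00002.safetensors",
        "model.vision_tower.encoder.layer0.weight": "model-00002-of-00002.safetensors",
    }}
    print(shards_with_prefix(index))
```

Reading the index first tells the loader which shard(s) to open, so only those files need their headers parsed.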

Co-authored-by: Cursor <cursoragent@cursor.com>
guillaumejaume merged commit ba52a93 into main on May 27, 2026
2 checks passed
guillaumejaume deleted the feat-gemma4-vision-encoder-local branch on May 29, 2026 12:07

2 participants