feat: configurable patch encoder input resolution via dynamic_img_size #217
Open
HHenryD wants to merge 3 commits into
Add tests for the --patch_encoder_img_size / target_img_size feature:

- tests/test_patch_encoder_resize.py (CI-safe, no downloads): unit tests for _resolve_target_img_size validation, RESIZE_SUPPORTED registry consistency, both CLI parsers exposing the flag, and run_task gating (unsupported encoder raises, supported forwards target_img_size).
- tests/test_patch_encoders.py (integration-gated): per-branch forward tests asserting the embedding dim is invariant to target_img_size, plus non-multiple and unsupported-encoder rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
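The validation and gating behaviour that these tests describe can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name `resolve_target_img_size_sketch`, its signature, the `patch_size` argument, and the contents of the `RESIZE_SUPPORTED_SKETCH` set are hypothetical and are not taken from the PR's actual code.

```python
from typing import Optional

# Hypothetical registry: an illustrative subset of encoder names mentioned in
# the PR description. The real RESIZE_SUPPORTED registry may look different.
RESIZE_SUPPORTED_SKETCH = {"gigapath", "hoptimus0", "hoptimus1", "gpfm"}


def resolve_target_img_size_sketch(
    encoder_name: str,
    target_img_size: Optional[int],
    patch_size: int,
) -> Optional[int]:
    """Hypothetical validator: return a usable target size or raise ValueError."""
    if target_img_size is None:
        # No override requested: keep the encoder's native resolution.
        return None
    if encoder_name not in RESIZE_SUPPORTED_SKETCH:
        raise ValueError(f"{encoder_name!r} does not support a custom input size")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(target_img_size, bool) or not isinstance(target_img_size, int):
        raise ValueError("target_img_size must be an integer")
    if target_img_size <= 0:
        raise ValueError("target_img_size must be positive")
    if target_img_size % patch_size != 0:
        # A ViT splits the image into whole patches, so the side length
        # has to be a multiple of the patch size.
        raise ValueError(
            f"target_img_size={target_img_size} is not a multiple of patch_size={patch_size}"
        )
    return target_img_size
```

Under these assumptions, `resolve_target_img_size_sketch("gigapath", 448, 16)` returns the size unchanged, while a non-multiple such as 450 or an encoder outside the registry raises `ValueError`.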
Summary
Adds optional custom input resolution for ViT patch encoders via timm `dynamic_img_size` (interpolates positional embeddings at runtime instead of forcing 224px).

Changes
- `dynamic_img_size` for: `gigapath`, `hoptimus0/1`, `gpfm`, `lunit-vits8`, `h0-mini`
- `target_img_size` build-kwarg to all supported ViTs (above + `uni_v1/v2`, `virchow`/`virchow2`, `kaiko-*`)
- `virchow`/`virchow2` HF-hub path was dropping `dynamic_img_size`/`reg_tokens`/etc.; unified both load paths

Not covered
Non-ViT / fixed-token backbones (`ctranspath` Swin, `musk`, `gemma4-*`, `conch_v1`, `keep`, …) and HF-`transformers` ViTs (`phikon*`, `hibou_l`, `midnight12k`): these would need different handling and are out of scope.
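
To make the summary's point about interpolating positional embeddings concrete, here is a small pure-Python sketch of why a larger input needs a resampled embedding grid and why the embedding dimension stays the same. It is an illustration only: the helper names are hypothetical, it uses plain bilinear resampling, and timm's own routine may use a different interpolation mode and handles class/register tokens separately.

```python
def grid_side(img_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    return img_size // patch_size


def resample_pos_embed(grid, new_side: int):
    """Bilinearly resample a square grid of embedding vectors.

    `grid` is a list of rows, each row a list of vectors (lists of floats).
    Corner positions are kept aligned between the old and new grids.
    """
    old_side = len(grid)
    dim = len(grid[0][0])
    scale = (old_side - 1) / (new_side - 1) if new_side > 1 else 0.0
    out = []
    for i in range(new_side):
        y = i * scale
        y0 = int(y)
        y1 = min(y0 + 1, old_side - 1)
        wy = y - y0
        row = []
        for j in range(new_side):
            x = j * scale
            x0 = int(x)
            x1 = min(x0 + 1, old_side - 1)
            wx = x - x0
            # Blend the four neighbouring vectors, one channel at a time.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out


# With patch size 16, a 224px input gives a 14x14 grid and 448px gives 28x28,
# so the stored positional embeddings must be resampled to the new grid.
native_side = grid_side(224, 16)
target_side = grid_side(448, 16)
```

Only the number of tokens changes with resolution; each resampled vector keeps its original length, which is consistent with the integration tests asserting that the embedding dim is invariant to `target_img_size`.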