Support efficient flash attention for packed sequences using flash-attn-2.0 #350
Merged
lucidrains reviewed on Feb 7, 2026
TimS-ml pushed a commit to TimS-ml/x-transformers that referenced this pull request on Feb 14, 2026
This merge brings in the following upstream changes:
- add a feature needed for new vq lib research (7921e1a)
- Support efficient flash attention for packed sequences using flash-attn-2.0 (lucidrains#350)
- handle rotary and polar positional embeddings with caching when attention layers is not wrapped
- address an edge case with seq_start_pos and when input does not include the full sequence
- able to set input_not_include_cache behavior on init
- fix an issue needed for metacontroller
- add softmax linear unit proposed by Anthropic
- latent dropout for free transformer
- improvements to belief attention

Conflicts resolved by preferring upstream versions to ensure compatibility with the latest features and bug fixes.

https://claude.ai/code/session_011tP2xJoHnqFHLEwiBCf2bZ
Support efficient flash attention using flash-attn-2.0 by optionally providing `x` and `context` as packed sequence batches of size 1, together with kwargs that are passed to `flash_attn_varlen_func`. This supports efficient flash-attention for at least:

fixes #351

Tested that the functionality gives similar results with rotary (RoPE) embeddings; without RoPE it throws an exception.
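
For readers unfamiliar with the packed layout, here is a minimal sketch of how variable-length sequences can be concatenated into a single batch row and described by cumulative lengths. The helpers `pack` and `varlen_kwargs` are hypothetical and are not taken from this PR. The keys `cu_seqlens_q`, `cu_seqlens_k`, `max_seqlen_q` and `max_seqlen_k` follow the parameter names of `flash_attn_varlen_func`. How this PR actually forwards them through x-transformers is not shown here. Plain Python lists stand in for tensors so the sketch runs without a GPU.

```python
from itertools import accumulate


def pack(sequences):
    """Concatenate variable-length sequences into one batch row (batch size 1).

    Hypothetical helper for illustration only.
    """
    return [[token for seq in sequences for token in seq]]


def varlen_kwargs(q_lens, k_lens=None):
    """Build cumulative-length arguments describing a packed batch.

    Hypothetical helper. Pass k_lens only when the context (keys/values)
    has different per-sequence lengths than the queries.
    """
    k_lens = q_lens if k_lens is None else k_lens
    if len(q_lens) != len(k_lens):
        raise ValueError("q_lens and k_lens must describe the same number of sequences")
    return {
        # Boundaries of each sequence inside the packed row, starting at 0.
        "cu_seqlens_q": [0, *accumulate(q_lens)],
        "cu_seqlens_k": [0, *accumulate(k_lens)],
        # Longest single sequence on each side.
        "max_seqlen_q": max(q_lens),
        "max_seqlen_k": max(k_lens),
    }


sequences = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
packed = pack(sequences)
kwargs = varlen_kwargs([len(seq) for seq in sequences])

print(packed)
print(kwargs)
```

In real use the cumulative lengths would be `torch.int32` tensors on the same device as the packed inputs, since that is what `flash_attn_varlen_func` expects.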