
Support efficient flash attention for packed sequences using flash-attn-2.0 #350

Merged
lucidrains merged 3 commits into lucidrains:main from muthissar:flash-pack-seq on Feb 7, 2026

@muthissar (contributor) commented Feb 7, 2026

Adds efficient flash attention via flash-attn 2.0: `x` and `context` can optionally be passed as packed-sequence batches of size 1, together with kwargs for `flash_attn_varlen_func`. This enables efficient flash attention for at least:

fixes #351

Tested that the feature gives similar results when rotary embeddings (RoPE) are used; without RoPE it currently throws an exception.
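The packed-sequence idea can be illustrated with a minimal pure-Python sketch. It uses no flash-attn, no GPU, and not the x-transformers API; all helper names (`pack_sequences`, `segment_ids`, `position_ids`, `block_diagonal_attention`) are hypothetical. Variable-length sequences are concatenated into one batch of size 1 and described by cumulative boundaries (`cu_seqlens`) plus the longest length (`max_seqlen`), the kind of metadata `flash_attn_varlen_func` takes. The reference attention is block-diagonal, so each token only attends within its own sequence, which should match running each sequence on its own. Restarting position ids per sequence is an assumption about how rotary embeddings would be applied to packed inputs, not something this PR states.

```python
import math
from typing import List, Tuple

Vector = List[float]


def pack_sequences(seqs: List[List[Vector]]) -> Tuple[List[Vector], List[int], int]:
    """Concatenate variable-length sequences into one packed sequence (batch of size 1).

    Returns the packed tokens, cu_seqlens (cumulative boundaries starting at 0,
    length len(seqs) + 1) and max_seqlen (length of the longest sequence).
    """
    packed: List[Vector] = []
    cu_seqlens = [0]
    for seq in seqs:
        packed.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    max_seqlen = max((len(seq) for seq in seqs), default=0)
    return packed, cu_seqlens, max_seqlen


def segment_ids(cu_seqlens: List[int]) -> List[int]:
    """Index of the original sequence that each packed token came from."""
    ids: List[int] = []
    for seg, (start, end) in enumerate(zip(cu_seqlens[:-1], cu_seqlens[1:])):
        ids.extend([seg] * (end - start))
    return ids


def position_ids(cu_seqlens: List[int]) -> List[int]:
    """Positions restarting at 0 for every packed sequence (assumed use: rotary embeddings)."""
    ids: List[int] = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        ids.extend(range(end - start))
    return ids


def _softmax_average(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product softmax attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(len(values[0]))]


def attention(x: List[Vector], causal: bool = False) -> List[Vector]:
    """Single-head self-attention over one unpadded sequence (q = k = v = x, no projections)."""
    out: List[Vector] = []
    for i, q in enumerate(x):
        ctx = x[: i + 1] if causal else x  # causal: only current and earlier tokens
        out.append(_softmax_average(q, ctx, ctx))
    return out


def block_diagonal_attention(packed: List[Vector], cu_seqlens: List[int], causal: bool = False) -> List[Vector]:
    """Attention over the packed sequence where a token only sees tokens of its own segment."""
    seg = segment_ids(cu_seqlens)
    out: List[Vector] = []
    for i, q in enumerate(packed):
        visible = [j for j in range(len(packed)) if seg[j] == seg[i] and (not causal or j <= i)]
        keys = [packed[j] for j in visible]
        out.append(_softmax_average(q, keys, keys))
    return out


seqs = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.5, -0.5], [2.0, 0.0]]]
packed, cu_seqlens, max_seqlen = pack_sequences(seqs)
print(cu_seqlens, max_seqlen, position_ids(cu_seqlens))  # [0, 3, 5] 3 [0, 1, 2, 0, 1]
```

In flash-attn itself, the boundaries are passed as `cu_seqlens_q`/`cu_seqlens_k` (int32 tensors of shape `(batch_size + 1,)`) alongside `max_seqlen_q`/`max_seqlen_k`; the exact kwarg names x-transformers forwards are not shown on this page.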

@muthissar muthissar marked this pull request as draft February 7, 2026 15:41
@muthissar muthissar marked this pull request as ready for review February 7, 2026 15:45
Comment thread tests/test_x_transformers.py
@lucidrains lucidrains merged commit 1b0ac2e into lucidrains:main Feb 7, 2026
TimS-ml pushed a commit to TimS-ml/x-transformers that referenced this pull request Feb 14, 2026
This merge brings in the following upstream changes:
- add a feature needed for new vq lib research (7921e1a)
- Support efficient flash attention for packed sequences using flash-attn-2.0 (lucidrains#350)
- handle rotary and polar positional embeddings with caching when attention layers is not wrapped
- address an edge case with seq_start_pos and when input does not include the full sequence
- able to set input_not_include_cache behavior on init
- fix an issue needed for metacontroller
- add softmax linear unit proposed by Anthropic
- latent dropout for free transformer
- improvements to belief attention

Conflicts resolved by preferring upstream versions to ensure compatibility
with the latest features and bug fixes.

https://claude.ai/code/session_011tP2xJoHnqFHLEwiBCf2bZ
@muthissar muthissar deleted the flash-pack-seq branch February 15, 2026 12:06


Development

Successfully merging this pull request may close these issues.

Support efficient flash attention using packed sequences
