Support efficient flash attention for packed sequences using flash-attn-2.0 by muthissar · Pull Request #350 · lucidrains/x-transformers

muthissar · 2026-02-07T15:18:50Z

Adds support for efficient flash attention with flash-attn-2.0: x and context can optionally be passed as packed-sequence batches of size 1, together with kwargs that are forwarded to flash_attn_varlen_func. This enables efficient flash attention for at least:

block/intra-document masking
sliding window attention
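
As a rough illustration of what "packed sequences" means here, the sketch below concatenates several documents into one batch of size 1 and derives the cumulative sequence boundaries that flash_attn_varlen_func expects as cu_seqlens_q / cu_seqlens_k, along with max_seqlen_q / max_seqlen_k. It is pure Python so it runs without a GPU; in practice the boundaries would be int32 torch tensors. The helper names (cu_seqlens_from_lengths, block_diagonal_mask) are hypothetical and not part of x-transformers or flash-attn, and how this PR names the forwarded kwargs is not shown here. The dense mask is only a reference for the block/intra-document pattern that the varlen kernel computes without materializing a mask.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Cumulative boundaries of packed documents: [0, l0, l0 + l1, ...]."""
    return [0, *accumulate(lengths)]


def block_diagonal_mask(lengths, window=None):
    """Dense reference mask for packed documents (hypothetical helper).

    Token i may attend to token j only if both belong to the same document.
    If `window` is given, |i - j| must also be <= window, which mimics a
    symmetric sliding window.
    """
    # Document index of every token in the packed sequence.
    doc_ids = [doc for doc, n in enumerate(lengths) for _ in range(n)]
    total = len(doc_ids)
    return [
        [
            doc_ids[i] == doc_ids[j] and (window is None or abs(i - j) <= window)
            for j in range(total)
        ]
        for i in range(total)
    ]


# Three documents of lengths 3, 2 and 4 packed into one sequence of 9 tokens.
lengths = [3, 2, 4]
cu_seqlens = cu_seqlens_from_lengths(lengths)  # boundaries between documents
max_seqlen = max(lengths)                      # longest single document
mask = block_diagonal_mask(lengths)
```

With these values, the packed input would have shape (1, 9, dim), and cu_seqlens and max_seqlen describe where each document starts and ends. For sliding window attention, flash_attn_varlen_func takes a window_size=(left, right) tuple rather than a mask.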

fixes #351

Tested that the functionality gives similar results with rope embeddings; without rope it throws an exception.

This merge brings in the following upstream changes:
- add a feature needed for new vq lib research (7921e1a)
- Support efficient flash attention for packed sequences using flash-attn-2.0 (lucidrains#350)
- handle rotary and polar positional embeddings with caching when attention layers is not wrapped
- address an edge case with seq_start_pos and when input does not include the full sequence
- able to set input_not_include_cache behavior on init
- fix an issue needed for metacontroller
- add softmax linear unit proposed by Anthropic
- latent dropout for free transformer
- improvements to belief attention

Conflicts resolved by preferring upstream versions to ensure compatibility with the latest features and bug fixes.

https://claude.ai/code/session_011tP2xJoHnqFHLEwiBCf2bZ

Added efficient flash attention for packed sequences using flash-attn-2.0

b56a72a

muthissar marked this pull request as draft February 7, 2026 15:41

muthissar marked this pull request as ready for review February 7, 2026 15:45

lucidrains reviewed Feb 7, 2026

Comment thread tests/test_x_transformers.py

muthissar and others added 2 commits February 7, 2026 19:20

Skip test_flash_pack_seq if torch is not available.

0b6e059

Merge branch 'main' into flash-pack-seq

69708ca

lucidrains merged commit 1b0ac2e into lucidrains:main Feb 7, 2026

muthissar deleted the flash-pack-seq branch February 15, 2026 12:06

lucidrains merged 3 commits into lucidrains:main from muthissar:flash-pack-seq
