
Optimize temporal dropout using vectorized masked_fill #51

Open

dearabhin wants to merge 1 commit into facebookresearch:main from dearabhin:optimize-temporal-dropout

Conversation

@dearabhin

This PR optimizes the temporal dropout implementation in tribev2/model.py to remove a cross-device memory transfer and CPU bottleneck.

Currently, the dropout mask is generated in a Python for loop over the batch dimension, with torch.rand defaulting to the CPU. This forces a host-to-device copy and a synchronization point for every item in the batch.
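
For illustration, a minimal sketch of that pattern. The function name and tensor shapes here are assumptions for the example, not taken from tribev2/model.py:

```python
import torch

# Hypothetical pre-patch shape: x is (batch, time, dim) on the GPU,
# p is the per-timestep drop probability.
def temporal_dropout_loop(x: torch.Tensor, p: float) -> torch.Tensor:
    for b in range(x.size(0)):
        # torch.rand defaults to the CPU, so every iteration allocates a
        # CPU tensor and pays a host-to-device copy before the fill runs.
        drop = torch.rand(x.size(1)) < p
        x[b] = x[b].masked_fill(drop.to(x.device).unsqueeze(-1), 0.0)
    return x
```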

This patch vectorizes the operation directly on the GPU using masked_fill_, which improves training speed and memory efficiency.
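
A hedged sketch of the vectorized form, again with assumed names and shapes rather than the actual diff:

```python
import torch

def temporal_dropout_vectorized(x: torch.Tensor, p: float) -> torch.Tensor:
    # Draw the full (batch, time) mask in one kernel, directly on x's
    # device, then zero the masked timesteps in place with masked_fill_.
    drop = torch.rand(x.shape[:2], device=x.device) < p
    return x.masked_fill_(drop.unsqueeze(-1), 0.0)
```

A single torch.rand call launches one kernel regardless of batch size, so the cost no longer scales with the Python loop count.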

Resolves #50

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 22, 2026

Labels

CLA Signed


Development

Successfully merging this pull request may close these issues.

Optimize temporal dropout to remove CPU/GPU bottleneck

2 participants