
Optimize temporal dropout using vectorized masked_fill #51

Open

dearabhin wants to merge 1 commit into facebookresearch:main from dearabhin:optimize-temporal-dropout

Conversation

@dearabhin

This PR optimizes the temporal dropout implementation in tribev2/model.py to remove a cross-device memory transfer and CPU bottleneck.

Currently, the dropout mask is generated in a Python for loop over the batch dimension, with torch.rand defaulting to the CPU. This forces a host-to-device copy and a synchronization point for every item in the batch.
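
For illustration, a minimal sketch of that pattern. The function name and tensor shapes here are assumptions for the example, not taken from tribev2/model.py:

```python
import torch

# Hypothetical pre-patch shape: x is (batch, time, dim) on the GPU,
# p is the per-timestep drop probability.
def temporal_dropout_loop(x: torch.Tensor, p: float) -> torch.Tensor:
    for b in range(x.size(0)):
        # torch.rand defaults to the CPU, so every iteration allocates a
        # CPU tensor and pays a host-to-device copy before the fill runs.
        drop = torch.rand(x.size(1)) < p
        x[b] = x[b].masked_fill(drop.to(x.device).unsqueeze(-1), 0.0)
    return x
```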

This patch vectorizes the operation directly on the GPU using masked_fill_, which improves training speed and memory efficiency.
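
A hedged sketch of the vectorized form, again with assumed names and shapes rather than the actual diff:

```python
import torch

def temporal_dropout_vectorized(x: torch.Tensor, p: float) -> torch.Tensor:
    # Draw the full (batch, time) mask in one kernel, directly on x's
    # device, then zero the masked timesteps in place with masked_fill_.
    drop = torch.rand(x.shape[:2], device=x.device) < p
    return x.masked_fill_(drop.unsqueeze(-1), 0.0)
```

A single torch.rand call launches one kernel regardless of batch size, so the cost no longer scales with the Python loop count.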

Resolves #50

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 22, 2026

Labels

CLA Signed


Development

Successfully merging this pull request may close these issues.

Optimize temporal dropout to remove CPU/GPU bottleneck

2 participants