Remove legacy get_max_thread_blocks helper from cuda_utilities by q10 · Pull Request #5897 · pytorch/FBGEMM

q10 · 2026-06-14T00:54:07Z

Summary:
Now that the TBE backward template (the last caller) has migrated to
cap_grid_dim_x, remove the legacy
fbgemm_gpu::utils::cuda::get_max_thread_blocks(stream) helper from
include/fbgemm_gpu/utils/cuda_utilities.cuh and inline its sole
remaining use inside cap_grid_dim_x.

Behavior-preserving: the inlined body computes
MAX_THREAD_BLOCKS_FACTOR * #SMs exactly as before.
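
For illustration, the inlined computation can be sketched roughly as below. This is a host-only approximation, not the actual FBGEMM code: the name `cap_grid_dim_x_sketch`, its signature, and the value given to `MAX_THREAD_BLOCKS_FACTOR` are assumptions, and the SM count is passed in as a parameter rather than queried from the device (in real CUDA code it would typically come from `cudaDeviceProp::multiProcessorCount`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed placeholder value; the real constant is defined in FBGEMM and may
// differ.
constexpr int32_t MAX_THREAD_BLOCKS_FACTOR = 32;

// Hypothetical stand-in for cap_grid_dim_x. The SM count is a parameter here
// so the sketch runs without a GPU.
inline int32_t cap_grid_dim_x_sketch(int32_t blocks_uncapped, int32_t sm_count) {
  assert(sm_count > 0);
  // Inlined body of the removed helper: MAX_THREAD_BLOCKS_FACTOR * #SMs.
  const int32_t max_blocks = MAX_THREAD_BLOCKS_FACTOR * sm_count;
  // Cap the requested grid x-dimension at that limit.
  return std::min(blocks_uncapped, max_blocks);
}
```

Because the limit is computed in place, no separate `get_max_thread_blocks` helper is needed once `cap_grid_dim_x` is the only consumer.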

Reviewed By: spcyppt

Differential Revision: D107317501

…max_thread_blocks helpers (pytorch#5853)

Summary:
X-link: facebookresearch/FBGEMM#2775

Final diff in the threshold-guard helper introduction stack. Migrates the two host-side cap sites in `codegen/training/backward/embedding_backward_split_template.cu` from the legacy `std::min(blocks_uncapped, get_max_thread_blocks_(...))` form to the new threshold-guarded helper `fbgemm_gpu::utils::cuda::determine_grid_blocks_from_blocks(..., BlockCapPolicy::Always)`.

With this migration the last legacy callers are gone, so this diff also cleans up:
- Removes `fbgemm_gpu::utils::cuda::get_max_thread_blocks(stream)` from `include/fbgemm_gpu/utils/cuda_utilities.cuh`.
- Removes the file-local `get_max_thread_blocks_()` and `MAX_THREAD_BLOCKS_FACTOR` from `include/fbgemm_gpu/embedding_backward_template_helpers.cuh`.
- Adds `#include "fbgemm_gpu/utils/cuda_utilities.cuh"` to `embedding_backward_template_helpers.cuh` for the new helper.

Behavior-preserving on the TBE backward variants: the policy `Always` matches the prior unconditional ROCm cap exactly.

Reviewed By: spcyppt

Differential Revision: D106453408
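
The behavior-preservation claim in the commit message above can be sketched as follows. This is a simplified model under stated assumptions, not the real helper: `legacy_cap` and `determine_grid_blocks_sketch` are hypothetical names, the signatures are invented for illustration (the PR elides the actual arguments), the block limit is passed in directly, and only the `Always` policy named in the PR is modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Only the policy named in the PR is modeled; any other enumerators of the
// real BlockCapPolicy are not described there and are omitted.
enum class BlockCapPolicy { Always };

// Legacy form: an unconditional std::min against the block limit.
constexpr int32_t legacy_cap(int32_t blocks_uncapped, int32_t max_blocks) {
  return std::min(blocks_uncapped, max_blocks);
}

// Hypothetical stand-in for the policy-taking helper.
constexpr int32_t determine_grid_blocks_sketch(
    int32_t blocks_uncapped, int32_t max_blocks, BlockCapPolicy policy) {
  switch (policy) {
    case BlockCapPolicy::Always:
      // Always cap, which is exactly what the legacy form did.
      return std::min(blocks_uncapped, max_blocks);
  }
  return blocks_uncapped;
}

// Under Always, both forms agree below and above the limit.
static_assert(
    determine_grid_blocks_sketch(10, 64, BlockCapPolicy::Always) ==
    legacy_cap(10, 64), "below the limit");
static_assert(
    determine_grid_blocks_sketch(500, 64, BlockCapPolicy::Always) ==
    legacy_cap(500, 64), "above the limit");
```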

meta-codesync · 2026-06-14T00:54:18Z

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D107317501.

meta-codesync · 2026-06-14T19:52:41Z

This pull request has been merged in 57f3a8b.

q10 added 2 commits June 13, 2026 17:53

meta-cla Bot added the cla signed label Jun 14, 2026

meta-codesync Bot added the meta-exported label Jun 14, 2026

meta-codesync Bot closed this in 57f3a8b Jun 14, 2026

meta-codesync Bot added the Merged label Jun 14, 2026

q10 wants to merge 2 commits into pytorch:main from q10:export-D107317501
