dense_to_jagged_forward: realize total_L SymInt before empty by haoyuz · Pull Request #5873 · pytorch/FBGEMM

haoyuz · 2026-06-11T02:48:45Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2793

CONTEXT: On AMD MI350X (HIP), MAST job fire-fandw06-f1096341099 (Stories LSR
train_eval) crashed inside fbgemm::dense_to_jagged during the forward pass
of UhmEventTokenizer.get_position_encoding. Two related symptoms appeared
across ranks:

- RuntimeError: ...RegisterCUDA_0.cpp:7563: SymIntArrayRef expected to contain only concrete integers (asIntArrayRefSlow check fired)
- RuntimeError: Trying to create tensor with negative dimension -1409625905161306112: [-1409625905161306112, 8] (heap SymNode pointer reinterpreted as int64 in at::detail::empty_generic)

dense_to_jagged_forward.cu and the CPU variant pass an
std::optional<at::SymInt> total_L straight into
at::empty_symint({total_L, D}, ...). The aten empty.memory_format CUDA/HIP
wrapper in RegisterCUDA_0.cpp calls C10_AS_INTARRAYREF_SLOW on the size
array, which TORCH_CHECKs that no SymInt in the array is heap-allocated
(is_heap_allocated()). Any heap-allocated SymInt arriving here (e.g. an
unbacked SymInt produced inside a torch.compile region in production)
either trips that check or, depending on how the dispatcher walked the
array, leaks the SymNode pointer through as a raw int64_t dimension.
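To illustrate the failure mode, here is a minimal Python model of the pre-fix path. It is a sketch, not FBGEMM or ATen code. `SymIntLike`, `as_int_array_ref_slow`, and `output_shape_pre_fix` are hypothetical stand-ins for c10::SymInt, the C10_AS_INTARRAYREF_SLOW conversion, and the at::empty_symint call site. The model only covers the TORCH_CHECK branch, not the pointer-leak branch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SymIntLike:
    """Hypothetical stand-in for c10::SymInt (not the real class).

    `value` is set for a concrete integer; `symbol` names a heap-allocated
    symbolic node (e.g. an unbacked "u0") instead.
    """
    value: Optional[int] = None
    symbol: Optional[str] = None

    def is_heap_allocated(self) -> bool:
        return self.symbol is not None


def as_int_array_ref_slow(sizes: List[SymIntLike]) -> List[int]:
    """Models the int64-only size conversion: every entry must be concrete."""
    out: List[int] = []
    for s in sizes:
        if s.is_heap_allocated():
            raise RuntimeError(
                "SymIntArrayRef expected to contain only concrete integers")
        out.append(s.value)
    return out


def output_shape_pre_fix(total_L: SymIntLike, D: int) -> List[int]:
    """Pre-fix pattern (hypothetical): forward total_L unchanged into a
    SymInt-shaped size array, then let the int64-only wrapper convert it."""
    return as_int_array_ref_slow([total_L, SymIntLike(value=D)])


print(output_shape_pre_fix(SymIntLike(value=5), 8))  # [5, 8]
try:
    output_shape_pre_fix(SymIntLike(symbol="u0"), 8)
except RuntimeError as e:
    print(e)  # SymIntArrayRef expected to contain only concrete integers
```

A concrete total_L passes through unchanged. A symbolic one only fails once it reaches the allocator's conversion, far from the kernel that introduced it.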

WHAT: Realize total_L to a concrete int64_t via guard_int(__FILE__, __LINE__)
before constructing the output tensor, and switch the allocation
from at::empty_symint / at::zeros_symint (SymInt shape) to at::empty /
at::zeros (int64_t shape). For heap SymInts that have a hint or a runtime
guard, guard_int resolves cleanly to the concrete value; for truly unbacked
SymInts with no value, the kernel now raises a clean
"Could not extract specialized integer from data-dependent expression"
error instead of the low-level memory crash.
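The post-fix pattern can be sketched the same way. This is again a hypothetical model, not the kernel code. `SymIntLike.guard_int` only imitates the three outcomes described above: a concrete value, a backed symbol with a hint, and an unbacked symbol. `DataDependentError` stands in for the real guard failure. Returning a plain list of ints stands in for calling at::empty with an int64_t shape.

```python
from dataclasses import dataclass
from typing import List, Optional


class DataDependentError(RuntimeError):
    """Hypothetical stand-in for the guard failure on an unbacked SymInt."""


@dataclass(frozen=True)
class SymIntLike:
    """Hypothetical SymInt model: concrete, backed (has a hint), or unbacked."""
    value: Optional[int] = None   # set for concrete integers
    symbol: Optional[str] = None  # set for heap-allocated symbolic nodes
    hint: Optional[int] = None    # runtime value known for backed symbols

    def guard_int(self, file: str, line: int) -> int:
        """Resolve to a concrete int, imitating the role of SymInt::guard_int."""
        if self.symbol is None:
            return self.value
        if self.hint is not None:
            return self.hint
        raise DataDependentError(
            "Could not extract specialized integer from data-dependent "
            f"expression {self.symbol} ({file}:{line})")


def output_shape_post_fix(total_L: SymIntLike, D: int) -> List[int]:
    # Realize total_L first; ("sketch", 0) are placeholders for __FILE__/__LINE__.
    total_L_int = total_L.guard_int("sketch", 0)
    # Only plain ints reach the allocator (the at::empty analogue).
    return [total_L_int, D]


print(output_shape_post_fix(SymIntLike(value=5), 8))              # [5, 8]
print(output_shape_post_fix(SymIntLike(symbol="s0", hint=7), 8))  # [7, 8]
```

The design point is the ordering. The SymInt is resolved before any size array reaches the allocator, so an unresolvable size surfaces as a guard error naming the symbol rather than as a check deep inside the dispatcher wrapper.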

Same fix applied to both the CUDA/HIP kernel
(src/jagged_tensor_ops/dense_to_jagged_forward.cu) and the CPU kernel
(src/jagged_tensor_ops/jagged_tensor_ops_cpu.cpp).

Adds a regression test, test_dense_to_jagged_heap_symint_total_L, that
constructs an unbacked, heap-allocated SymInt via
ShapeEnv.create_unbacked_symint() and calls
torch.ops.fbgemm.dense_to_jagged directly. Before the fix, the test fails with
the SymIntArrayRef crash; after the fix, it passes by asserting the clean
guard_int error path.

Differential Revision: D108236923


meta-codesync · 2026-06-11T02:48:54Z

@haoyuz has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108236923.

meta-cla Bot added the cla signed label Jun 11, 2026

meta-codesync Bot added the meta-exported label Jun 11, 2026

haoyuz wants to merge 1 commit into pytorch:main from haoyuz:export-D108236923
