feat: GPU-accelerated FrameView local poses + a new xform_space_writer API by pv-nvidia · Pull Request #5677 · isaac-sim/IsaacLab

pv-nvidia · 2026-05-18T13:17:13Z

Fast GPU pose/scale ops on Fabric + a new writer-scope API

What this does

Two things:

- Local-space pose and scale ops are now GPU-fast on the Fabric backend.
  They go through the same Warp-kernel path that world-space ops already used.
  The old USD fallback for local-space is ~100× slower (see the table below).
- All transform writes go through a small context manager. You open a
  scope, do your writes inside it, and the scope handles the cleanup:
  ```
  with view.xform_world_space_writer() as writer:
      writer.set_poses(positions=p, orientations=o)
      writer.set_scales(scales=s)
  ```
  Use view.xform_local_space_writer() for local-space writes.

Builds on Piotr's prototype at
bareya/pbarejko/camera-update.

Why a scope?

A FrameView keeps two copies of each prim's transform in Fabric storage:
the omni:fabric:worldMatrix and omni:fabric:localMatrix attributes.
When you write one, the other has to be recomputed so they stay
consistent.

Without a scope, that recompute has to run after every single write. With
a scope, it runs once, when the scope closes. You can call set_poses
and set_scales as many times as you want inside the scope; on exit, one
Warp kernel derives the other space and one wp.synchronize() runs.
Empty scopes cost nothing.
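
The batching behaviour described above can be illustrated with a toy scope. This is a minimal sketch in plain Python, not the FabricFrameView implementation: `ToyView`, `world_space_writer`, `set_positions`, and `derive_calls` are hypothetical names, 1-D offsets stand in for 4x4 matrices, and the derive step is a Python loop where the real backend launches a Warp kernel.

```python
from contextlib import contextmanager


class ToyView:
    """Toy stand-in for a view; 1-D offsets replace 4x4 matrices."""

    def __init__(self, parent_world):
        self.parent_world = list(parent_world)  # fixed parent positions
        self.world = list(parent_world)         # child world positions
        self.local = [0.0] * len(parent_world)  # child offsets from parent
        self.derive_calls = 0                   # opposite-space recomputes so far

    def _derive_local_from_world(self):
        # One batched pass over all prims (a Warp kernel in the real backend).
        self.local = [w - p for w, p in zip(self.world, self.parent_world)]
        self.derive_calls += 1

    @contextmanager
    def world_space_writer(self):
        wrote = False

        def set_positions(values):
            nonlocal wrote
            self.world = list(values)
            wrote = True

        try:
            yield set_positions
        finally:
            if wrote:  # an empty scope skips the derive entirely
                self._derive_local_from_world()


view = ToyView(parent_world=[1.0, 2.0])
with view.world_space_writer() as set_positions:
    set_positions([5.0, 5.0])
    set_positions([4.0, 6.0])  # second write in the same scope, no derive yet
with view.world_space_writer():
    pass  # empty scope

print(view.local, view.derive_calls)
```

Two writes and one empty scope produce a single derive pass, which is the property the scope is meant to provide.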

The scope also pauses Fabric Hierarchy's transform-change tracking
while you're writing, then restores it on exit. (Fabric itself is just
the data store; Fabric Hierarchy, exposed through USDRT's
IFabricHierarchy, is the plugin that watches writes to
omni:fabric:localMatrix / omni:fabric:worldMatrix and keeps them
mutually consistent across the hierarchy.) Its tracking is pull-based:
a per-attribute listener records writes into a changelog, and the
plugin drains and processes that changelog on the next call to
IFabricHierarchy::update_world_xforms(). Pausing the listener does
not block the writes themselves — they still land in Fabric storage —
it just keeps them out of the changelog. With the changelog empty, the
next tick has nothing to "catch up on" for these prims, so it cannot
pick one of the two spaces as canonical and derive the other from
it. See the next section for why that matters.
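
The pull-based tracking model can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the USDRT API: `ToyHierarchy`, `write`, and `update` are hypothetical names, and a real listener is per-attribute where this one is global.

```python
class ToyHierarchy:
    """Toy model of pull-based change tracking (hypothetical, not USDRT)."""

    def __init__(self):
        self.storage = {}     # attribute storage: writes always land here
        self.changelog = []   # entries recorded by the listener
        self.tracking = True  # is the listener recording?

    def write(self, prim, value):
        self.storage[prim] = value
        if self.tracking:
            self.changelog.append(prim)

    def update(self):
        """Drain the changelog and return the prims that would be reprocessed."""
        drained, self.changelog = self.changelog, []
        return drained


h = ToyHierarchy()
h.write("/World/A", 1.0)  # tracked write
h.tracking = False        # pause the listener
h.write("/World/B", 2.0)  # lands in storage, but not in the changelog
h.tracking = True         # restore
processed = h.update()

print(processed, h.storage)
```

The paused write is present in storage but absent from the drained changelog, so the update step has nothing to reprocess for that prim.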

Rules:

- Only one writer can be open on a view at a time. Opening a second one
  raises RuntimeError.
- While a writer is open, the view's own getters (get_world_poses, etc.)
  raise RuntimeError. Read through the writer (writer.get_poses()) inside
  the scope, or close the scope first.
- Do not advance the simulation or render from inside the scope.
  No sim.step(), no world.render(), no SimulationApp.update().
  The matrices are mid-write until exit, and rendering against that state
  would read torn data. Keep scopes short and close them before stepping.
- If something raises inside the scope (a user-code bug, a notebook
  cell interrupt, etc.), the scope still runs the opposite-space recompute
  on exit so the world and local matrices stay consistent prim-by-prim.
  The partial write itself is not rolled back. If you need
  all-or-nothing, snapshot the matrices yourself before entering.
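
The first two rules can be shown with a toy view. This is a sketch only, assuming a simple per-view lock: `ToyFrameView` and `_ToyWriter` are hypothetical classes, and only the method names `xform_world_space_writer`, `get_world_poses`, `set_poses`, and `get_poses` are taken from the description above.

```python
class ToyFrameView:
    def __init__(self):
        self._active_writer = None  # per-view single-writer lock
        self._poses = [0.0]

    def get_world_poses(self):
        if self._active_writer is not None:
            raise RuntimeError("a writer scope is open; read through the writer")
        return list(self._poses)

    def xform_world_space_writer(self):
        return _ToyWriter(self)


class _ToyWriter:
    def __init__(self, view):
        self._view = view

    def __enter__(self):
        if self._view._active_writer is not None:
            raise RuntimeError("another writer scope is already open on this view")
        self._view._active_writer = self
        return self

    def __exit__(self, exc_type, exc, tb):
        self._view._active_writer = None  # always release the lock
        return False                      # never swallow exceptions

    def set_poses(self, positions):
        self._view._poses = list(positions)

    def get_poses(self):
        return list(self._view._poses)


view = ToyFrameView()
errors = []
with view.xform_world_space_writer() as writer:
    writer.set_poses([1.0])
    inside = writer.get_poses()  # reading through the writer is allowed
    try:
        view.get_world_poses()   # view-level getter raises inside the scope
    except RuntimeError:
        errors.append("getter")
    try:
        with view.xform_world_space_writer():  # second writer raises
            pass
    except RuntimeError:
        errors.append("second writer")
after = view.get_world_poses()   # fine again once the scope has closed
```

Because the second writer fails in `__enter__`, it never takes the lock, so the outer scope stays valid and still releases the lock on exit.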

Why we have to coordinate with Fabric Hierarchy (renderer correctness)

A writer scope is synchronous Python code, so the renderer can't see
anything mid-scope — no tick runs until we exit. The risk is on the
next render/sim tick after the scope closes.

On that next tick, Fabric Hierarchy runs its
IFabricHierarchy::update_world_xforms() step, which can recompute
omni:fabric:worldMatrix from omni:fabric:localMatrix (or vice
versa). If it thinks one of the two was the user's most recent edit,
it derives the other from it; the derived value — not what we wrote —
is then what the Fabric Scene Delegate (FSD) hands to RTX on the
render path.

Three things in this PR keep that next-tick recompute from clobbering our
work:

- Opposite-space derive at scope exit. By the time the scope closes,
  omni:fabric:worldMatrix and omni:fabric:localMatrix are mutually
  consistent prim-by-prim, so any recompute Fabric Hierarchy does is a
  no-op.
- Fabric-Hierarchy change tracking is paused for the whole scope.
  The writer calls track_local_xform_changes(False) and
  track_world_xform_changes(False) on enter (saving the prior state)
  and restores them on exit. While tracking is paused, the listeners
  don't record our writes in their changelog, so when the next tick
  runs there's nothing for update_world_xforms() to "catch up on".
- Two persistent selections with explicit read-only / read-write
  flags. Outside the scope, the view's selection is fully read-only:
  both omni:fabric:worldMatrix and omni:fabric:localMatrix are
  flagged RO. Inside the scope, the writer flips to a fully read-write
  selection.
  ```
  _sel_ro :  worldMatrix=RO, localMatrix=RO   (steady state, between writes)
  _sel_rw :  worldMatrix=RW, localMatrix=RW   (active during a writer scope)
  ```
  Both selections are built once when the view is initialized and kept
  alive for its lifetime. The writer scope flips a single flag (_is_rw)
  on enter/exit; nothing is rebuilt. The RO steady state tells Fabric
  Hierarchy's next-tick update_world_xforms() that no attribute is
  user-authored, so it leaves both alone.
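
The enter/exit bookkeeping for the last two points can be sketched as follows. This is an assumption-laden toy rather than the writer's actual code: `ToyHierarchyFlags` and `ToyScope` are hypothetical, the two booleans stand in for the `track_local_xform_changes` / `track_world_xform_changes` toggles, and `is_rw` stands in for the `_is_rw` flag.

```python
class ToyHierarchyFlags:
    """Hypothetical stand-in for the two tracking toggles."""

    def __init__(self, local=True, world=True):
        self.local = local
        self.world = world


class ToyScope:
    def __init__(self, view_state, hierarchy):
        self.state = view_state  # dict holding the "is_rw" selection flag
        self.hier = hierarchy
        self._saved = None

    def __enter__(self):
        # Save the prior tracking state, then pause both listeners.
        self._saved = (self.hier.local, self.hier.world)
        self.hier.local = False
        self.hier.world = False
        self.state["is_rw"] = True  # flip to the read-write selection
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore exactly what was saved; never force-enable a listener.
        self.hier.local, self.hier.world = self._saved
        self.state["is_rw"] = False  # back to the read-only steady state
        return False


hier = ToyHierarchyFlags(local=False, world=True)  # caller had paused local tracking
state = {"is_rw": False}
with ToyScope(state, hier):
    inside = (hier.local, hier.world, state["is_rw"])
after = (hier.local, hier.world, state["is_rw"])

print(inside, after)
```

Restoring the saved tuple, not setting both flags to `True`, is what keeps a listener paused if the caller had paused it before opening the scope.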

Backend support

Fabric (PhysX): the fast GPU path described above.
USD, Newton, OVPhysX: the writer is a thin wrapper. Writes go straight
to the backend's existing storage. No batching, no extra recompute (there
is no second matrix to keep in sync). Same API across all four backends.

Newton has two different "scale" ideas, and the API keeps them apart:

NewtonSiteFrameView.set_scales(...) (deprecated) writes the Newton
collision-shape geometry size.
The writer's set_scales(...) writes per-site transform scale,
matching what USD FrameView does.

They operate on different state and are not merged.

API changes

New (recommended):

view.xform_world_space_writer() / view.xform_local_space_writer() —
a context manager with set_poses, set_scales, get_poses,
get_scales.

Deprecated (still works, warns once per class on first use):

set_world_poses / set_local_poses — use the writer scope.
get_scales / set_scales — use get_local_scales /
get_world_scales, or the writer's set_scales.
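
A deprecation shim with once-per-class warnings might look like the sketch below. It is illustrative only: `ToyBaseView`, `_Writer`, and the `_warned` set are hypothetical, and the real shim's bookkeeping may differ.

```python
import warnings


class _Writer:
    """Minimal pass-through writer used by the shim below."""

    def __init__(self, view):
        self._view = view

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def set_poses(self, positions):
        self._view.poses = list(positions)


class ToyBaseView:
    _warned = set()  # class names that have already emitted the warning

    def __init__(self):
        self.poses = None

    def xform_world_space_writer(self):
        return _Writer(self)

    def set_world_poses(self, positions):
        """Deprecated shim: warn once per class, then route through the writer."""
        cls_name = type(self).__name__
        if cls_name not in ToyBaseView._warned:
            ToyBaseView._warned.add(cls_name)
            warnings.warn(
                f"{cls_name}.set_world_poses is deprecated; use the writer scope",
                DeprecationWarning,
                stacklevel=2,
            )
        with self.xform_world_space_writer() as writer:
            writer.set_poses(positions)


view = ToyBaseView()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    view.set_world_poses([1.0])  # old call style: warns
    view.set_world_poses([2.0])  # same class again: silent
num_warnings = len(caught)

# Recommended style: open the scope explicitly.
with view.xform_world_space_writer() as writer:
    writer.set_poses([3.0])
```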

Benchmark (1024 prims, 50 iterations, Blackwell RTX PRO 5000)

Per-iteration timings (lower is better):

| Operation | USD (ms) | Fabric (ms) | Newton Site (ms) |
|---|---|---|---|
| Get World Poses | 8.6148 | 0.0375 | 0.0307 |
| Set World Poses | 19.1357 | 0.1090 | 0.0368 |
| Get Local Poses | 5.9556 | 0.0374 | 0.0497 |
| Set Local Poses | 7.9794 | 0.0979 | 0.0790 |
| Get World Scales | 13.1193 | 0.0371 | 0.0015 |
| Set World Scales | 21.4571 | 0.1014 | 0.0612 |
| Get Local Scales | 3.2544 | 0.0375 | 0.0022 |
| Set Local Scales | 3.8350 | 0.0937 | 0.0587 |
| Get Both (World+Local) | 14.7506 | 0.0738 | 0.1302 |
| Interleaved World Set→Get | 28.1046 | 0.1432 | 0.1588 |
| Per-iter total | 126.2 | 0.77 | 0.61 |

Speedup vs USD (per-iter ops; one-time view construction excluded):

| Operation | Fabric × | Newton Site × |
|---|---|---|
| Get World Poses | 229.6× | 280.2× |
| Set World Poses | 175.6× | 519.8× |
| Get Local Poses | 159.0× | 119.9× |
| Set Local Poses | 81.5× | 101.0× |
| Get World Scales | 353.9× | 8911.5× |
| Set World Scales | 211.6× | 350.4× |
| Get Local Scales | 86.8× | 1489.9× |
| Set Local Scales | 40.9× | 65.4× |
| Get Both (World+Local) | 200.0× | 113.3× |
| Interleaved World Set→Get | 196.2× | 177.0× |
| Overall (per-iter) | 164× | 207× |

One-time view construction (reported separately, not part of the per-iter total): USD 4.6 ms, Fabric 4.4 ms, Newton Site 1013 ms. The Newton number is dominated by stage population on the first call. The "Overall (per-iter)" speedup row reflects steady-state per-iteration cost only.
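
The speedup figures are the USD time divided by the backend time. The short check below recomputes a few of them from the rounded timings in the first table. `speedup` is a helper introduced here for illustration. Ratios recomputed from rounded timings can differ in the last digit from the published ones, especially for the very small Newton Site timings.

```python
# (USD ms, Fabric ms, Newton Site ms), copied from the per-iteration table.
timings_ms = {
    "Set World Poses": (19.1357, 0.1090, 0.0368),
    "Set Local Poses": (7.9794, 0.0979, 0.0790),
    "Per-iter total": (126.2, 0.77, 0.61),
}


def speedup(usd_ms, other_ms):
    """How many times faster the other backend is than USD."""
    return usd_ms / other_ms


for op, (usd, fabric, newton) in timings_ms.items():
    print(f"{op}: Fabric {speedup(usd, fabric):.1f}x, "
          f"Newton Site {speedup(usd, newton):.1f}x")

fabric_set_world = round(speedup(19.1357, 0.1090), 1)
overall_fabric = round(speedup(126.2, 0.77))
overall_newton = round(speedup(126.2, 0.61))
```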

Notes for reviewers

Getters call wp.synchronize() before returning, so the returned
ProxyArray is always immediately readable from both GPU and host
code — no caller-side sync needed. (Both the cached and the
per-indices paths sync; this used to be asymmetric.)

isaaclab-review-bot

🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00

Summary

The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:

Service locator infrastructure on SimulationContext
Service locator tests + changelog
Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
FabricStageCache as a shared hierarchy handle
(NEW) Merge commit to consolidate branches
(NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking

✅ New Additions Since Last Review

Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
Bidirectional world↔local sync:
- set_world_poses → recomputes localMatrix via _sync_local_from_world()
- set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation

🔧 Remaining Observations

[Minor] Index array dtype mismatch still present

The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:

# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)

Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.

[Minor] Undefined buffer references in get_local_poses

get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:

if use_cached:
    translations_wp = self._fabric_local_translations_buf
    orientations_wp = self._fabric_local_orientations_buf

These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.

📋 Architecture Notes

The world↔local propagation design is clean:

Write world → update local: _sync_local_from_world() runs update_indexed_local_matrix_from_world kernel immediately after world writes
Write local → lazy world update: _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read

This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.
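
The asymmetric propagation described in this review (eager local sync after a world write, lazy world sync after a local write) can be sketched with a toy view. This is a hypothetical illustration, not the reviewed code: `LazyToyView` uses scalar offsets where the real code uses matrices, and only the `_world_dirty` flag name comes from the review text.

```python
class LazyToyView:
    def __init__(self, parent):
        self.parent = parent        # parent world offset (fixed)
        self.world = parent         # child world offset
        self.local = 0.0            # child offset relative to parent
        self._world_dirty = False   # True when world is stale w.r.t. local
        self.world_recomputes = 0

    def set_world(self, value):
        self.world = value
        self.local = value - self.parent  # eager: local is synced immediately
        self._world_dirty = False

    def set_local(self, value):
        self.local = value
        self._world_dirty = True          # lazy: defer the world recompute

    def get_world(self):
        if self._world_dirty:
            self.world = self.parent + self.local
            self.world_recomputes += 1
            self._world_dirty = False
        return self.world


v = LazyToyView(parent=10.0)
v.set_local(1.0)
v.set_local(2.0)          # two local writes, no world recompute yet
world = v.get_world()     # one recompute happens here
v.set_world(15.0)         # local follows immediately
```

Several local writes before a world read cost one recompute, which is the saving the lazy path was designed for.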

📋 Verdict

LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.

Automated review by isaaclab-review-bot • Reviewed at 80838c00

isaaclab-review-bot

🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155

Summary

This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.

✅ Strengths

1. Clean Architecture

FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable

2. Robust World↔Local Consistency

Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
_sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
Per-view dirty flags prevent concurrent views from clearing each other's pending syncs

3. Topology-Adaptive Design

PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
_rebuild_trans_ro_arrays() consolidates index and indexed-array refresh

4. Excellent Test Coverage

13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs

🔧 Minor Observations

[Minor] Index array dtype mismatch

_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:

return wp.array(indices, dtype=wp.int32, device=self._device)

Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.

[Nit] Docstring transpose-convention note

The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.

[Style] Empty sentinel shape

_fabric_empty_2d_array_sentinel uses shape (0, 0):

self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)

This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.

📋 Architecture Notes

The asymmetric sync strategy is well-reasoned:

Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
Write local → lazy world sync: _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read

📋 CI Status

CI checks are currently pending. The pre-commit and changelog checks have passed.

📋 Verdict

LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.

Automated review by isaaclab-review-bot • Reviewed at 9ff31550

Update (d13ed99→7f1a012c): Reviewed large incremental batch (50+ files changed). This batch contains no changes to the core FabricFrameView implementation — the Fabric local-poses feature code is identical to the previously reviewed state.

Key changes in this batch (all outside the PR's core feature scope):

isaaclab_ppisp made fully optional: All packages (isaaclab_physx, isaaclab_newton, isaaclab_ov, isaaclab_ovphysx) removed hard isaaclab_ppisp from dependencies. Lazy try/except ModuleNotFoundError with _raise_missing_ppisp_error() helper provides actionable install hints only when isp_cfg is actually set. Clean pattern, consistently applied.
Cloner refactoring to ReplicateContext classes: physx_replicate, newton_physics_replicate, and ovphysx_replicate legacy functions now delegate to new PhysxReplicateContext, NewtonReplicateContext, and OvPhysxReplicateContext classes. Asset constructors call queue_*_replication(cfg) to register on REPLICATION_QUEUE. Well-structured — separates queueing from execution, enables batched replication.
MJWarp ls_parallel deprecated: Config field emits DeprecationWarning via __post_init__(), _build_solver() ignores it. Removed from all test configurations.
Newton manager lifecycle fix: close() now calls super().close() before cls.clear(), ensuring sensor invalidation callbacks fire while Newton state is still alive, preventing stale registrations leaking.
OvPhysX/OVRTX runtime guards: import_ovphysx() helper with actionable install messages, ovphysx==0.4.13 pin.
PhysxManager rigid body view: Now detects name collisions between rigid and non-rigid prims (e.g., JointWrenchSensor frame prim vs. rigid link with same name) and uses exact paths for ambiguous names.
Version bumps and changelog consolidation: All changelog fragment files moved into proper versioned CHANGELOG.rst entries.
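
The optional-dependency pattern mentioned in the first item above (import lazily, and raise an actionable error only when the feature is requested) can be sketched generically. The names `load_optional` and `_raise_missing_error` are hypothetical and the message text is made up for illustration. This is not the code from the batch.

```python
import importlib


def _raise_missing_error(package, feature):
    raise ModuleNotFoundError(
        f"'{feature}' requires the optional package '{package}'. "
        f"Install '{package}' to use this feature."
    )


def load_optional(package, feature, enabled):
    """Import `package` only when `feature` is actually enabled."""
    if not enabled:
        return None  # feature unused: a missing package is not an error
    try:
        return importlib.import_module(package)
    except ModuleNotFoundError:
        _raise_missing_error(package, feature)


# Feature disabled: no import is attempted, so nothing fails.
unused = load_optional("surely_missing_pkg_xyz", "isp_cfg", enabled=False)

# Feature enabled but package missing: the error names the fix.
message = None
try:
    load_optional("surely_missing_pkg_xyz", "isp_cfg", enabled=True)
except ModuleNotFoundError as err:
    message = str(err)

# Feature enabled and package present: the module is returned.
json_module = load_optional("json", "isp_cfg", enabled=True)
```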

Previous inline comments status:

⏳ P1 (Missing wp.synchronize() in indexed getter paths): Still present — no changes to fabric_frame_view.py in this batch. Remains non-blocking per previous assessment.
✅ P2 (Redundant kernel launch): Fixed in earlier commits (confirmed).

No new issues found in the incremental changes. All additions are well-implemented and follow established patterns. LGTM.

Automated incremental review by isaaclab-review-bot • Reviewed at 7f1a012c

Update (7f1a012→999cf3c6): Major architectural refactor reviewed.

Changes in this batch

The lazy dirty-flag mechanism (_DirtyFlag enum, _dirty field, _sync_*_if_dirty helpers, interleaved-write warning) has been removed entirely and replaced with a context-managed writer scope API (FrameViewSpaceWriterBase). This is a substantial and well-executed improvement:

New xform_space_writer() context manager: Opens a scoped write session ("world" or "local"). On __exit__, derives the opposite-space matrices once via a single Warp kernel and calls wp.synchronize() once. Clean, predictable, no more subtle interleaving bugs.
Single-active-writer invariant: Per-view lock prevents concurrent scopes — RuntimeError on double-entry. Well-tested.
Hierarchy listener pause/restore: _FabricWriterMixin._enter_impl() pauses track_local_xform_changes / track_world_xform_changes during writes, restoring prior state on exit. Correctly handles pre-paused listeners (no accidental re-enable).
Deprecated shims preserved: set_world_poses() / set_local_poses() remain as thin wrappers that open one-shot writer scopes internally + emit DeprecationWarning. Clean migration path.
set_world_scales / set_local_scales removed (breaking): Since these were introduced in this release cycle without stable downstream users, removal without deprecation is appropriate. Changelog documents it.
Test coverage: Comprehensive new tests for the writer contract — single-active invariant, derive-on-exit counting, empty-scope no-op, getter-raises-inside-scope, hierarchy tracking restore.

Previous comments status

✅ P2 (Redundant kernel launch): Fixed — the _sync_fabric_from_usd_initial() path now only composes localMatrix then calls _recompute_world_from_local_all(). The redundant world compose is gone.
⏳ P1 (Missing wp.synchronize() in indexed getter paths): Still present in the non-cached decompose paths. Remains non-blocking per previous assessment (callers on the same CUDA stream see correct ordering; only cross-stream or immediate .numpy() usage is affected).
✅ Minor observations (dtype, docstring, sentinel shape): These were non-blocking nits and remain unchanged. The sentinel shape (0, 0) is now less relevant since it is only used internally by the writer compose kernel.

New observations

No new issues found. The architectural shift from lazy dirty-flag tracking to eager derive-on-scope-exit is a clear improvement in correctness and readability. The _FabricWriterMixin pattern cleanly separates Fabric lifecycle management from the write logic. LGTM.

Automated incremental review by isaaclab-review-bot • Reviewed at 999cf3c6

pv-nvidia · 2026-05-22T13:11:21Z

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

…rent Implemented new tests to validate the behavior of world and local scale conversions in a hierarchy with a scaled parent. The tests ensure that setting local scales correctly computes world scales and vice versa, addressing USD-specific scale math not covered by existing tests.

Restores PR isaac-sim#5728's three-selection layout (`_trans_sel_ro`, `_world_sel_rw`, `_local_sel_rw`) with asymmetric Fabric access flags on `worldMatrix` and `localMatrix`. Those flags are what protect the user's write from being clobbered by Kit's per-tick `IFabricHierarchy.update_world_xforms`:

* On `set_world_poses` (via `_world_sel_rw`, `localMatrix=RO`), Fabric does not recompute world from local -- the user's worldMatrix write survives until the renderer reads it.
* On `set_local_poses` (via `_local_sel_rw`, `worldMatrix=RO`), Fabric recomputes world from the new local on the next tick -- the renderer reads the correct world.

A single combined `worldMatrix=RW, localMatrix=RW` selection (the recent design on this branch) removed that protection. Fabric saw both attributes as user-authored and fell back to the hierarchy's canonical direction (local -> world), recomputing world from a stale local and silently overwriting the user's world write. That was the failure mode behind the `test_output_equal_to_usdcamera` regression and any other Camera + RTX path that drives world poses through Fabric.

With the RO/RW protection back in place, the eager world<->local flushes introduced by commits "fix: flush Fabric world matrices after local writes" and the follow-up "set_world_poses eager local sync" are no longer needed and are removed. The `change_block` context manager and its companion helpers existed only to batch those eager flushes; with the flushes gone, the API has nothing to defer and is removed from both `BaseFrameView` and `FabricFrameView`.

Class docstring now spells out the load-bearing role of the RO/RW layout so a future refactor doesn't reintroduce the single-selection shape.

Tests:

* Removed `test_set_local_*_updates_renderer_facing_fabric_world_matrix` (asserted an eager-update contract that the lazy design deliberately does not hold; correctness across the next render tick is provided by the RO/RW protection, not by an extra Warp kernel).
* Removed the four `test_change_block_*` tests; the API is gone.
* Inverted `test_interleaved_set_emits_no_warning` back to `test_interleaved_set_emits_warning`, restored `_dirty == LOCAL` assertions in `test_world_scales_roundtrip` and the symmetric `WORLD` assertion in `test_local_scales_roundtrip`, and updated `test_multi_view_per_view_dirty_isolation` to expect the lazy cross-view behavior.
* Adapted `test_prepare_for_reuse_detects_topology_change` and `test_fabric_rebuild_after_topology_change` to poll/rebuild all three selections.

Verified:

* `pytest test_ray_caster_camera.py::test_output_equal_to_usdcamera` passes.
* `pytest test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set` 4/4 pass.
* `pytest test_views_xform_prim_fabric.py` 71 passed, 3 skipped (cuda:1).
* `./isaaclab.sh -f` clean.

Net diff -198 lines.

Replaces the four FrameView pose/scale setters with a single context-managed writer scope on every backend. The writer batches multiple writes (poses + scales) inside one scope, derives the opposite-space matrix once on ``__exit__``, and synchronizes once. Only one writer scope may be active per view at a time; view-level getters raise ``RuntimeError`` while a writer scope is active.

API summary:

    with view.xform_space_writer("world") as w:
        w.set_poses(positions=p, orientations=o)
        w.set_scales(scales=s)
    # Derived (local) matrices are recomputed, the scope releases.

Public classes (in ``isaaclab.sim.views``):

* ``FrameViewSpaceWriterBase`` -- abstract base
* ``FrameViewWorldSpaceWriter`` -- world-space tag class
* ``FrameViewLocalSpaceWriter`` -- local-space tag class

Behind the scenes:

* ``BaseFrameView`` gains ``xform_space_writer()``, ``_active_writer``, and the abstract factory hooks ``_make_world_space_writer`` / ``_make_local_space_writer``. Public getters become guarded wrappers around new ``_get_*_impl`` backend hooks; they raise ``RuntimeError`` when a writer scope is active.
* ``FabricFrameView`` ships ``_FabricWorldSpaceWriter`` and ``_FabricLocalSpaceWriter`` that pause ``IFabricHierarchy.track_local_xform_changes`` / ``track_world_xform_changes`` for the scope's lifetime (saving and restoring the prior state so we never re-enable a listener the caller had paused). Eager dual-write inside the scope means Kit's per-tick ``updateWorldXforms`` does not redundantly recompute matrices we just wrote. The renderer's independent ``omni:fabric:worldMatrix`` listener still observes the writes.

  The lazy-dirty mechanism is gone: the ``_DirtyFlag`` enum, ``_dirty`` field, ``_warned_interleaved_set`` field, the ``_sync_*_if_dirty`` helpers, and the one-time "interleaved set_world_poses / set_local_poses" warning are all deleted. The three-selection RO/RW layout is kept as a defensive layer and for clarity of authoring intent.
* ``UsdFrameView`` / ``NewtonSiteFrameView`` / ``OvPhysxFrameView`` ship pass-through writers (their ``set_poses`` / ``set_scales`` immediately delegate to the backend's ``_apply_*_write`` helpers; ``__exit__`` is a no-op beyond releasing the single-writer lock).

Setter deprecation / removal:

* ``set_world_poses`` and ``set_local_poses`` are kept as one-time-warning shims on ``BaseFrameView`` that route through the writer internally. Use ``view.xform_space_writer("world" | "local")`` and ``w.set_poses(...)``.
* ``set_world_scales`` and ``set_local_scales`` were introduced in this release cycle without external users and are removed outright (no deprecation). Use ``w.set_scales(...)`` inside a writer scope.
* The existing ``set_scales`` deprecation shim is kept and now opens the backend-appropriate writer scope internally (Fabric: world, USD/OvPhysx: local).

Migration: All 81 call sites across ``source/``, ``scripts/``, and the test suites are migrated to the new API in this commit so the repo's own code base raises no deprecation warnings. External callers on ``set_world_poses`` / ``set_local_poses`` / ``set_scales`` keep working (one warning per class on first call).

Verification:

* ``pytest source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py`` -- 81 passed, 3 skipped (cuda:1 multi-GPU).
* ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usdcamera`` -- the original Camera + RTX regression that motivated this arc, passes.
* ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set`` -- 4 parametrizations pass.
* ``./isaaclab.sh -f`` -- clean.

New test coverage for the writer contract (in ``test_views_xform_prim_fabric.py``):

* world / local writer derives opposite space on exit
* exactly one derive-kernel launch per scope (monkeypatched counter) regardless of how many set_poses / set_scales calls are made inside the scope
* single-active-writer invariant raises ``RuntimeError``
* exit restores prior ``track_local_xform_changes`` / ``track_world_xform_changes`` state (does not re-enable listeners the caller had paused)
* invalid space string raises ``ValueError``
* empty scope does not launch the derive kernel
* view-level getters raise inside an active scope

…er() / xform_local_space_writer() Replace the single dispatch method that took a 'world'|'local' string with two explicit methods. More discoverable, better typed (each returns its concrete writer class), and removes the invalid-space ValueError path. Update all callsites (camera, USD/Fabric/Newton/OVPhysX backends, mimic), benchmarks, tests, and changelog fragments. Drop the now-meaningless test_writer_invalid_space_raises.

Previously FabricFrameView held three persistent PrimSelection handles with asymmetric RO/RW flags (trans_sel_ro = both-RO for reads, world_sel_rw = worldMatrix-RW, local_sel_rw = localMatrix-RW). That layout was the load-bearing protection against Kit's per-tick updateWorldXforms() before the writer scope started pausing IFabricHierarchy tracking. With tracking-pause in place, the asymmetric flags are no longer load-bearing.

Reduce to two persistent selections:

    _sel_ro : worldMatrix=RO, localMatrix=RO   steady state
    _sel_rw : worldMatrix=RW, localMatrix=RW   inside writer scope

Each selection owns its own bundle of indexed-fabric arrays. The writer scope flips a single _is_rw flag on enter/exit; both bundles stay alive for the view's lifetime, so no selection is rebuilt on the flip. PrepareForReuse() topology polls run independently per bundle.

Net: -10 lines of code, -1 selection, -1 set of fabric_indices. The two whitebox tests that reached into the old triple are updated to poll the new pair. All 79 fabric-suite tests pass and the test_output_equal_to_usdcamera camera+RTX regression that motivated this PR still passes.

When a FabricFrameView writer scope unwinds via an exception (a Python error in user code, a notebook cell interrupt, etc.), the opposite-space derive + wp.synchronize() now runs anyway so worldMatrix and localMatrix remain mutually consistent prim-by-prim on whatever partial-write state Fabric currently holds. The partial write itself is not rolled back -- callers needing transactional semantics should snapshot the matrices themselves before entering the scope.

The recovery launch is itself wrapped in try/except: if it fails (e.g. the original exception came from a poisoned CUDA stream), the recovery error is logged and the original exception propagates -- masking it would hide the actual root cause. Hierarchy-tracking restore and the _is_rw flip happen in finally as before.

Add a regression test that raises inside a writer scope and verifies:

- tracking state restored
- _is_rw back to False
- world matrices reflect the partial write
- local matrices derived from that partial-write world
- the view is still usable for further writer scopes
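
The unwind behaviour in this commit message can be sketched as an `__exit__` with a guarded recovery step. This is a minimal illustration under stated assumptions, not the committed code: `RecoveringScope` and its `derive` / `restore` callables are hypothetical stand-ins for the derive kernel launch and the tracking / `_is_rw` restore.

```python
import logging

logger = logging.getLogger("toy_writer")
logger.addHandler(logging.NullHandler())  # keep the demo quiet


class RecoveringScope:
    def __init__(self, derive, restore):
        self._derive = derive    # callable: opposite-space recompute
        self._restore = restore  # callable: restore tracking state and flags

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                self._derive()  # normal exit: derive errors propagate
            else:
                try:
                    self._derive()  # recovery derive on the partial state
                except Exception:
                    # Log and move on so the original error is not masked.
                    logger.exception("recovery derive failed")
        finally:
            self._restore()  # always runs
        return False  # let the original exception propagate


events = []


def failing_derive():
    raise RuntimeError("derive failed")


try:
    with RecoveringScope(lambda: events.append("derive"),
                         lambda: events.append("restore")):
        raise ValueError("user bug")
except ValueError:
    events.append("original error propagated")

caught = None
try:
    with RecoveringScope(failing_derive, lambda: events.append("restore")):
        raise ValueError("user bug")
except Exception as err:
    caught = err  # still the user's ValueError, not the RuntimeError
```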

The previous wording suggested IFabricHierarchy.update_world_xforms() could fire interleaved with our writes (a "between our write and the renderer's read" race). That cannot happen: the scope is synchronous Python code and no simulation step or render tick runs while it is open. The risk the tracking pause and the RO steady-state selection actually defend against is the next render/sim tick after the scope exits, not the scope itself. Update FabricFrameView's class docstring to reflect this, and add an explicit contract to FrameViewSpaceWriterBase: callers must not advance the simulation or render from inside a scope, because the matrices may be mid-write until exit and rendering would read torn data.

Previous wording suggested the pause was about keeping the renderer from seeing half-written data. That is wrong: the scope is synchronous Python code, so the renderer cannot run mid-scope and cannot see torn state. The "torn data" concern only motivates the separate rule that callers must not advance the simulation from inside the scope.

What the pause actually prevents is Fabric updating the dependent xforms itself, in two ways the scope wants to keep exclusive:

1. Per-write, inside the scope -- the tracker would fire synchronously on each set_* and propagate to the opposite-space matrix (and to descendants, where applicable).
2. On the next render tick -- update_world_xforms() would replay queued tracker work and potentially derive the wrong direction (e.g. recompute world from local even though we just wrote world).

We do the dependent-xform update ourselves in one batched kernel at scope exit, so the pause stops Fabric from doing the same work redundantly during the scope and incorrectly after it.

Fabric change tracking is pull-based: a per-attribute listener records writes into a private changelog, and Kit drains and processes that changelog on the next call to IFabricHierarchy::update_world_xforms() (typically from the render path). "Tracking off" just stops the listener from recording new entries -- writes still land in Fabric storage; they are simply invisible to the next update_world_xforms() call.

Previous docstring claimed the pause prevented "per-write synchronous propagation through the tracker". There is no such synchronous path: nothing fires until the next update_world_xforms() tick. Rewrite the bullet to describe the actual mechanism, and explain why an empty changelog at scope exit is the property we need (the next tick has no entries to process for our prims and therefore can't pick a canonical direction and derive the other from it).

Source: kit/runtime/source/plugins/usdrt.hierarchy/FabricHierarchy.cpp

- trackLocalXformChanges_abi -> pauseChangeTracking/resumeChangeTracking
- updateWorldXforms_abi reads getChanges()/popChanges() on each listener

Fabric is just the flat attribute data store. The plugin that owns the changelog listeners, the track_*_xform_changes toggles, and the update_world_xforms() step is Fabric Hierarchy (usdrt::hierarchy::IFabricHierarchy), not Fabric itself. The renderer reads Fabric attributes through the Fabric Scene Delegate (FSD), not via some generic "renderer worldMatrix listener".

Rework the writer-scope's listener bullet to use these names precisely:
- the tracking pause and the changelog belong to Fabric Hierarchy
- update_world_xforms() is a method on Fabric Hierarchy, not on "Kit"
- FSD reads omni:fabric:worldMatrix from Fabric storage on the render path

The IsaacLab project-level shorthand "Fabric backend" / FabricFrameView / _use_fabric is unchanged -- those are API labels, not technical claims.

Changelog fragments are read against develop, not against an in-branch interim state. Remove every entry that describes a feature added then removed within this PR -- net effect vs develop is zero, and mentioning the round-trip only confuses reviewers.

Specifically:
- isaaclab/xform-space-writer: drop the "Removed: set_world_scales / set_local_scales" section. Those methods were added in fabric-local-poses and removed here; they never existed on develop.
- isaaclab/fabric-local-poses: drop set_world_scales / set_local_scales from the Added list for the same reason; keep only the surviving getters and route scale writes via the writer scope.
- isaaclab_physx/fabric-local-poses: drop the "lazy dirty tracking" and "interleave detection" entries; both mechanisms were added and then removed by the writer-scope migration.
- isaaclab_physx/xform-space-writer: drop the paragraph that lists the removed _DirtyFlag enum / _sync_*_if_dirty helpers. Also switch the remaining prose to the correct Omniverse terminology -- Fabric Hierarchy owns update_world_xforms() and the change tracking; FSD feeds the renderer.
- isaaclab_newton + isaaclab_ovphysx/fabric-local-poses: drop set_world_scales / set_local_scales from the Added/Deprecated lists.

Also include the prior wording fix "Will be used by" -> "Used by" in the utils docs (the kernels are now in use). While I'm in this file, drop the deprecation-replacement clauses that told users to use set_world_scales / set_local_scales -- those methods don't exist; the writer scope is the right pointer.

Plus: add a comment in scripts/benchmarks/benchmark_xform_prim_view.py explaining why each wp.clone() now goes through .warp (ProxyArray introduced by PR isaac-sim#5304 changed the FrameView getter return type).

The benchmark mixes a one-time view-construction cost (`init`) with per-iteration steady-state ops in the same dict. The Total row and Overall speedup were both summing every value in that dict, so a backend whose first call materializes the view (NewtonSiteFrameView took ~1 s of stage population in the reported run) crushed the per-iter ops in the totals and printed a misleading 0.13x overall.

Separate the two: keep Initialization in the per-op table for visibility, but compute Total and Overall over the per-iter operations only. The new label makes the scope explicit -- "Per-iter total" and "SPEEDUP vs USD (per-iter ops; one-time init excluded)".

With the user's Blackwell RTX PRO 5000 numbers, this turns the nonsensical 0.13x Newton overall into 207x and the dampened 25x Fabric overall into 164x.
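The effect of excluding the one-time cost can be illustrated with a short sketch. The timings below are made up for illustration and are not the measurements from this PR; `ONE_TIME_OPS`, `total_ms`, and `speedup` are hypothetical helpers, not functions from benchmark_xform_prim_view.py.

```python
# Operations that run once per view, not once per iteration (assumed key name).
ONE_TIME_OPS = frozenset({"init"})


def total_ms(timings, exclude=frozenset()):
    """Sum the timings in milliseconds, skipping any excluded operation names."""
    return sum(ms for op, ms in timings.items() if op not in exclude)


def speedup(baseline, candidate, exclude=frozenset()):
    """Ratio of baseline total to candidate total over the same set of operations."""
    return total_ms(baseline, exclude) / total_ms(candidate, exclude)


# Illustrative numbers only: the candidate is fast per iteration but slow to construct.
usd = {"init": 5.0, "get_world_poses": 40.0, "set_world_poses": 60.0}
fast = {"init": 1000.0, "get_world_poses": 0.25, "set_world_poses": 0.25}

naive_overall = speedup(usd, fast)                            # init dominates the sum
per_iter_overall = speedup(usd, fast, exclude=ONE_TIME_OPS)   # steady-state ops only

print(f"naive overall: {naive_overall:.2f}x")
print(f"per-iter overall: {per_iter_overall:.0f}x")
```

With these toy numbers the naive ratio falls below 1x even though every per-iteration operation is far faster, which is the same distortion the commit removes from the Total and Overall rows.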

Tests should describe and exercise the CURRENT API, not history.

Removed:
- test_set_world_scales_method_no_longer_exists (whitebox assertion that two never-shipped attributes are absent from BaseFrameView).

Renamed + docstrings rewritten (names and docstrings referenced ``set_world_scales`` / ``set_local_scales`` which do not exist; the test bodies already use the writer scope):
- frame_view_contract_utils.py:
  test_set_local_scales_roundtrip -> test_local_scales_roundtrip
  test_set_world_scales_roundtrip -> test_world_scales_roundtrip
- test_views_xform_prim.py:
  test_set_local_scales_then_get_world_scales -> test_world_scale_composes_with_parent_scale
  test_set_world_scales_then_get_local_scales -> test_local_scale_inverts_parent_when_writing_world_scale

Stale docstring references to the removed dirty-tracking and the deprecated set_*_scales API are also corrected in the following fabric tests: test_local_scales_roundtrip, test_world_scales_roundtrip, test_fabric_cuda1_scales_roundtrip, test_initial_seed_with_scaled_parent.

FabricFrameView's getters (_get_world_poses_impl, _get_local_poses_impl, and the shared _decompose_scales for both scale getters) only called wp.synchronize() on the cached path (indices is None / slice(None)). The per-indices path returned a fresh ProxyArray without syncing, so a caller that did .numpy() or wp.to_torch().cpu() on it could observe zeros from the wp.zeros initializer (the kernel hadn't completed).

Move the sync below the kernel launch and call it unconditionally. This:
- removes the asymmetry between the cached and the indexed paths,
- matches what the class docstring claims about getters being immediately readable,
- is essentially free for the cached path (was already syncing) and cheap for the indexed path (the kernel itself is the dominant cost).

Update the class docstring to describe the new behaviour: every getter launches its decompose kernel and synchronizes before returning, so the returned ProxyArray is immediately readable from GPU or host without any caller-side sync.

Reported by Greptile on PR isaac-sim#5677.

_compute_fabric_indices and _compute_parent_fabric_indices both walked selection.GetPaths(), built the same path->index dict, then iterated self.prim_paths to look up either the prim itself or its parent. The selection-walk + dict + lookup loop is exactly what _compute_fabric_indices_for(selection, paths) already does for one-off index arrays.

Have the two specialised helpers delegate to it. The parent variant keeps its stage-root precondition inline (where it reads naturally, next to the rsplit) and builds the parent path list before calling the shared primitive. The child variant is a one-line passthrough.

No behavioural change: same selection walks, same indices, same ordering. Saves ~18 lines and removes a duplicated dict construction. The shared primitive's docstring is updated to reflect its new role as the canonical lookup, no longer just a one-shot.

Two helpers were building lists with .append() inside an imperative for-loop:
- _compute_fabric_indices_for: walked paths -> looked up indices, raised on miss, appended to a list.
- _compute_parent_fabric_indices: walked self.prim_paths -> derived parent path, validated stage-root, appended to a list.

Factor the per-element step into a named local function whose body documents the intent (lookup + validation, parent derivation + validation), then drive it from a list comprehension. Same control flow, denser at the call site, the loop body's purpose is now a one-token name.
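The refactor pattern described above (a named local function per element, driven by a list comprehension, with the parent variant delegating to the shared lookup) might look roughly like the sketch below. It is a standalone approximation, not the code from this PR: the function names, the plain list of path strings standing in for selection.GetPaths(), the error messages, and the exact form of the stage-root check are all assumptions.

```python
def compute_indices_for(selection_paths, paths):
    """Map each path to its position in the selection (hypothetical shared lookup)."""
    path_to_index = {path: index for index, path in enumerate(selection_paths)}

    def lookup(path):
        # Lookup + validation: fail loudly if the prim is missing from the selection.
        try:
            return path_to_index[path]
        except KeyError:
            raise ValueError(f"prim {path!r} is not in the selection") from None

    return [lookup(path) for path in paths]


def compute_parent_indices(selection_paths, prim_paths):
    """Look up each prim's parent by delegating to the shared lookup."""

    def parent_of(path):
        # Parent derivation + validation: a prim directly under the stage root
        # has no parent path to look up (assumed form of the precondition).
        parent = path.rsplit("/", 1)[0]
        if not parent:
            raise ValueError(f"prim {path!r} sits at the stage root and has no parent")
        return parent

    return compute_indices_for(selection_paths, [parent_of(path) for path in prim_paths])


selection = ["/World", "/World/A", "/World/A/B", "/World/C"]
print(compute_indices_for(selection, ["/World/C", "/World/A"]))
print(compute_parent_indices(selection, ["/World/A/B", "/World/C"]))
```

Keeping the validation inside the named helper means the comprehension stays a single expression while still raising on the first bad element, matching the "same control flow" claim.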

The world-body fallback in NewtonSiteFrameView._resolve_source_prim resolved the reference prim directly from source_root, instead of stripping the clone-template suffix as the prior code did. For heterogeneous (multi-asset spawner) scenes this resolved the wrong reference frame, offsetting site world poses and breaking rendering-correctness (dexsuite kuka hetero kitless). Restore the split_clone_template/get_suffix-based ref_path computation (and the dropped imports). Homogeneous scenes were unaffected; hetero rgb/depth now match golden images again.

github-actions bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels May 18, 2026

isaaclab-review-bot Bot reviewed May 18, 2026

pv-nvidia force-pushed the pv/fabric-local-poses branch 4 times, most recently from 80838c0 to 9ff3155 Compare May 20, 2026 14:11

isaaclab-review-bot Bot reviewed May 20, 2026

pv-nvidia mentioned this pull request May 21, 2026

feat: Full Fabric acceleration stack — local poses, stage cache, fused compose #5728

Draft

pv-nvidia closed this May 22, 2026

pv-nvidia reopened this May 22, 2026

pv-nvidia force-pushed the pv/fabric-local-poses branch 12 times, most recently from 4685420 to 6b56971 Compare May 25, 2026 17:24

pv-nvidia marked this pull request as ready for review May 25, 2026 18:05

pv-nvidia requested review from Mayankm96, jtigue-bdai, kellyguo11, ooctipus and pascal-roth as code owners May 25, 2026 18:05

pv-nvidia and others added 28 commits June 24, 2026 10:05

test: fix FrameView scale contracts

025eea7

style: format FrameView scale contract test

7e82bd1

bench: include FrameView scale operations

fc9f12c

fix: preserve ProxyArray get_scales contract

1566c63

docs: remove ProxyArray scale return attribution

ead581b

fix: preserve Newton legacy scale semantics

57fd126

style: format Newton scale type annotation

96195d7

fix: flush Fabric world matrices after local writes

c109f6a

style: apply pre-commit formatting

2bc2c64

feat: configure Fabric change-block matrix flushes

2966561

docs: clarify Fabric getter synchronization

264baf5

docs: describe Fabric selection layout as end state in changelog

7ce0f0f

pv-nvidia force-pushed the pv/fabric-local-poses branch from b5bcd15 to 70f1400 Compare June 24, 2026 10:07

feat: GPU-accelerated FrameView local poses + a new xform_space_writer API #5677
pv-nvidia wants to merge 54 commits into isaac-sim:develop from pv-nvidia:pv/fabric-local-poses
