feat: GPU-accelerated FrameView local poses + a new xform_space_writer API #5677
pv-nvidia wants to merge 54 commits into
🔍 Code Review Update
Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00
Summary
The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:
- Service locator infrastructure on SimulationContext
- Service locator tests + changelog
- Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
- FabricStageCache as a shared hierarchy handle
- (NEW) Merge commit to consolidate branches
- (NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking
✅ New Additions Since Last Review
- Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
- Bidirectional world↔local sync:
  - set_world_poses → recomputes localMatrix via _sync_local_from_world()
  - set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
- Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
- Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
- Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
- Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation
🔧 Remaining Observations
[Minor] Index array dtype mismatch still present
The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:
    # fabric_frame_view.py
    return wp.array(indices, dtype=wp.int32, device=self._device)
Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.
[Minor] Undefined buffer references in get_local_poses
get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:
    if use_cached:
        translations_wp = self._fabric_local_translations_buf
        orientations_wp = self._fabric_local_orientations_buf
These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.
📋 Architecture Notes
The world↔local propagation design is clean:
- Write world → update local: _sync_local_from_world() runs the update_indexed_local_matrix_from_world kernel immediately after world writes
- Write local → lazy world update: the _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read
This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.
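The eager/lazy split described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: poses are 1-D offsets (world = parent + local) rather than 4x4 matrices, and `ToyView` with its methods is hypothetical, borrowing only the `_world_dirty` name from the review. It is not the FabricFrameView implementation.

```python
class ToyView:
    """Toy 1-D model: world = parent + local. Not the real FabricFrameView."""

    def __init__(self, parent_world, local):
        self.parent_world = parent_world
        self.local = local
        self.world = parent_world + local
        self._world_dirty = False   # per-instance flag, as in the review
        self.world_recomputes = 0   # counts lazy recomputes, for illustration

    def set_world(self, world):
        # Write world -> update local immediately (eager direction).
        self.world = world
        self.local = world - self.parent_world

    def set_local(self, local):
        # Write local -> only mark world as stale (lazy direction).
        self.local = local
        self._world_dirty = True

    def get_world(self):
        # World is recomputed at most once per batch of local writes.
        if self._world_dirty:
            self.world = self.parent_world + self.local
            self.world_recomputes += 1
            self._world_dirty = False
        return self.world


view = ToyView(parent_world=10.0, local=1.0)
view.set_local(2.0)
view.set_local(3.0)              # two local writes, no world recompute yet
assert view.get_world() == 13.0
assert view.world_recomputes == 1
view.set_world(15.0)
assert view.local == 5.0
```

Because the flag lives on the instance, a second `ToyView` over the same data would keep its own pending state, which is the per-view isolation the tests in this PR check for.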
📋 Verdict
LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.
Automated review by isaaclab-review-bot • Reviewed at 80838c00
80838c0 to 9ff3155
🔍 Code Review Update
Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155
Summary
This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.
✅ Strengths
1. Clean Architecture
- FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
- Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
- Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable
2. Robust World↔Local Consistency
- Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
- _sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
- Per-view dirty flags prevent concurrent views from clearing each other's pending syncs
3. Topology-Adaptive Design
- PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
- _rebuild_trans_ro_arrays() consolidates index and indexed-array refresh
4. Excellent Test Coverage
- 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
- test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
- test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs
🔧 Minor Observations
[Minor] Index array dtype mismatch
_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:
    return wp.array(indices, dtype=wp.int32, device=self._device)
Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.
[Nit] Docstring transpose-convention note
The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.
[Style] Empty sentinel shape
_fabric_empty_2d_array_sentinel uses shape (0, 0):
    self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)
This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.
📋 Architecture Notes
The asymmetric sync strategy is well-reasoned:
- Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
- Write local → lazy world sync: the _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read
📋 CI Status
CI checks are currently pending. The pre-commit and changelog checks have passed.
📋 Verdict
LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.
Automated review by isaaclab-review-bot • Reviewed at 9ff31550
Update (d13ed99→7f1a012c): Reviewed large incremental batch (50+ files changed). This batch contains no changes to the core FabricFrameView implementation — the Fabric local-poses feature code is identical to the previously reviewed state.
Key changes in this batch (all outside the PR's core feature scope):
- isaaclab_ppisp made fully optional: All packages (isaaclab_physx, isaaclab_newton, isaaclab_ov, isaaclab_ovphysx) removed hard isaaclab_ppisp from dependencies. Lazy try/except ModuleNotFoundError with _raise_missing_ppisp_error() helper provides actionable install hints only when isp_cfg is actually set. Clean pattern, consistently applied.
- Cloner refactoring to ReplicateContext classes: physx_replicate, newton_physics_replicate, and ovphysx_replicate legacy functions now delegate to new PhysxReplicateContext, NewtonReplicateContext, and OvPhysxReplicateContext classes. Asset constructors call queue_*_replication(cfg) to register on REPLICATION_QUEUE. Well-structured: separates queueing from execution, enables batched replication.
- MJWarp ls_parallel deprecated: Config field emits DeprecationWarning via __post_init__(), _build_solver() ignores it. Removed from all test configurations.
- Newton manager lifecycle fix: close() now calls super().close() before cls.clear(), ensuring sensor invalidation callbacks fire while Newton state is still alive, preventing stale registrations leaking.
- OvPhysX/OVRTX runtime guards: import_ovphysx() helper with actionable install messages, ovphysx==0.4.13 pin.
- PhysxManager rigid body view: Now detects name collisions between rigid and non-rigid prims (e.g., JointWrenchSensor frame prim vs. rigid link with same name) and uses exact paths for ambiguous names.
- Version bumps and changelog consolidation: All changelog fragment files moved into proper versioned CHANGELOG.rst entries.
Previous inline comments status:
- ⏳ P1 (Missing wp.synchronize() in indexed getter paths): Still present; no changes to fabric_frame_view.py in this batch. Remains non-blocking per previous assessment.
- ✅ P2 (Redundant kernel launch): Fixed in earlier commits (confirmed).
No new issues found in the incremental changes. All additions are well-implemented and follow established patterns. LGTM.
Automated incremental review by isaaclab-review-bot • Reviewed at 7f1a012c
Update (7f1a012→999cf3c6): Major architectural refactor reviewed.
Changes in this batch
The lazy dirty-flag mechanism (_DirtyFlag enum, _dirty field, _sync_*_if_dirty helpers, interleaved-write warning) has been removed entirely and replaced with a context-managed writer scope API (FrameViewSpaceWriterBase). This is a substantial and well-executed improvement:
- New xform_space_writer() context manager: Opens a scoped write session ("world" or "local"). On __exit__, derives the opposite-space matrices once via a single Warp kernel and calls wp.synchronize() once. Clean, predictable, no more subtle interleaving bugs.
- Single-active-writer invariant: Per-view lock prevents concurrent scopes; RuntimeError on double-entry. Well-tested.
- Hierarchy listener pause/restore: _FabricWriterMixin._enter_impl() pauses track_local_xform_changes / track_world_xform_changes during writes, restoring prior state on exit. Correctly handles pre-paused listeners (no accidental re-enable).
- Deprecated shims preserved: set_world_poses() / set_local_poses() remain as thin wrappers that open one-shot writer scopes internally + emit DeprecationWarning. Clean migration path.
- set_world_scales / set_local_scales removed (breaking): Since these were introduced in this release cycle without stable downstream users, removal without deprecation is appropriate. Changelog documents it.
- Test coverage: Comprehensive new tests for the writer contract: single-active invariant, derive-on-exit counting, empty-scope no-op, getter-raises-inside-scope, hierarchy tracking restore.
Previous comments status
- ✅ P2 (Redundant kernel launch): Fixed. The _sync_fabric_from_usd_initial() path now only composes localMatrix then calls _recompute_world_from_local_all(). The redundant world compose is gone.
- ⏳ P1 (Missing wp.synchronize() in indexed getter paths): Still present in the non-cached decompose paths. Remains non-blocking per previous assessment (callers on the same CUDA stream see correct ordering; only cross-stream or immediate .numpy() usage is affected).
- ✅ Minor observations (dtype, docstring, sentinel shape): These were non-blocking nits and remain unchanged. The sentinel shape (0, 0) is now less relevant since it is only used internally by the writer compose kernel.
New observations
No new issues found. The architectural shift from lazy dirty-flag tracking to eager derive-on-scope-exit is a clear improvement in correctness and readability. The _FabricWriterMixin pattern cleanly separates Fabric lifecycle management from the write logic. LGTM.
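The writer contract summarized in this review (one active scope per view, a single derive on exit, no derive for an empty scope, getters refusing to run mid-scope) can be sketched as a toy. Everything here is hypothetical: `ToyFrameView`, `_ToyWorldWriter`, and `world_writer()` are illustrative names, and 1-D offsets stand in for the matrices. The real API goes through `xform_space_writer()` and Warp kernels.

```python
class _ToyWorldWriter:
    """Toy world-space writer; derives local once when the scope closes."""

    def __init__(self, view):
        self._view = view
        self._wrote = False

    def __enter__(self):
        if self._view._active_writer is not None:
            raise RuntimeError("a writer scope is already active on this view")
        self._view._active_writer = self
        return self

    def set_world(self, value):
        self._view._world = value
        self._wrote = True

    def __exit__(self, exc_type, exc, tb):
        try:
            if self._wrote:
                # Stand-in for the single derive kernel + one synchronize.
                self._view._local = self._view._world - self._view.parent_world
                self._view.derive_launches += 1
        finally:
            self._view._active_writer = None  # release the single-writer lock
        return False


class ToyFrameView:
    def __init__(self, parent_world, local):
        self.parent_world = parent_world
        self._local = local
        self._world = parent_world + local
        self._active_writer = None
        self.derive_launches = 0

    def world_writer(self):
        return _ToyWorldWriter(self)

    def get_local(self):
        if self._active_writer is not None:
            raise RuntimeError("getters are unavailable inside a writer scope")
        return self._local


view = ToyFrameView(parent_world=10.0, local=1.0)
with view.world_writer() as w:
    w.set_world(12.0)
    w.set_world(14.0)            # many writes, still one derive on exit
assert view.get_local() == 4.0
assert view.derive_launches == 1
```

Acquiring the lock in `__enter__` rather than in the constructor is a choice of this sketch; it means that merely creating a writer object without entering it does not block other scopes.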
Automated incremental review by isaaclab-review-bot • Reviewed at 999cf3c6
Superseded by the consolidated PR #5728 (pv/fabric-full-stack).
4685420 to 6b56971
…rent Implemented new tests to validate the behavior of world and local scale conversions in a hierarchy with a scaled parent. The tests ensure that setting local scales correctly computes world scales and vice versa, addressing USD-specific scale math not covered by existing tests.
Restores PR isaac-sim#5728's three-selection layout (`_trans_sel_ro`, `_world_sel_rw`, `_local_sel_rw`) with asymmetric Fabric access flags on `worldMatrix` and `localMatrix`. Those flags are what protect the user's write from being clobbered by Kit's per-tick `IFabricHierarchy.update_world_xforms`:
* On `set_world_poses` (via `_world_sel_rw`, `localMatrix=RO`), Fabric does not recompute world from local -- the user's worldMatrix write survives until the renderer reads it.
* On `set_local_poses` (via `_local_sel_rw`, `worldMatrix=RO`), Fabric recomputes world from the new local on the next tick -- the renderer reads the correct world.
A single combined `worldMatrix=RW, localMatrix=RW` selection (the recent design on this branch) removed that protection. Fabric saw both attributes as user-authored and fell back to the hierarchy's canonical direction (local -> world), recomputing world from a stale local and silently overwriting the user's world write. That was the failure mode behind the `test_output_equal_to_usdcamera` regression and any other Camera + RTX path that drives world poses through Fabric.
With the RO/RW protection back in place, the eager world<->local flushes introduced by commits "fix: flush Fabric world matrices after local writes" and the follow-up "set_world_poses eager local sync" are no longer needed and are removed.
The `change_block` context manager and its companion helpers existed only to batch those eager flushes; with the flushes gone, the API has nothing to defer and is removed from both `BaseFrameView` and `FabricFrameView`.
Class docstring now spells out the load-bearing role of the RO/RW layout so a future refactor doesn't reintroduce the single-selection shape.
Tests:
* Removed `test_set_local_*_updates_renderer_facing_fabric_world_matrix` (asserted an eager-update contract that the lazy design deliberately does not hold; correctness across the next render tick is provided by the RO/RW protection, not by an extra Warp kernel).
* Removed the four `test_change_block_*` tests; the API is gone.
* Inverted `test_interleaved_set_emits_no_warning` back to `test_interleaved_set_emits_warning`, restored `_dirty == LOCAL` assertions in `test_world_scales_roundtrip` and the symmetric `WORLD` assertion in `test_local_scales_roundtrip`, and updated `test_multi_view_per_view_dirty_isolation` to expect the lazy cross-view behavior.
* Adapted `test_prepare_for_reuse_detects_topology_change` and `test_fabric_rebuild_after_topology_change` to poll/rebuild all three selections.
Verified:
* `pytest test_ray_caster_camera.py::test_output_equal_to_usdcamera` passes.
* `pytest test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set` 4/4 pass.
* `pytest test_views_xform_prim_fabric.py` 71 passed, 3 skipped (cuda:1).
* `./isaaclab.sh -f` clean.
Net diff -198 lines.
Replaces the four FrameView pose/scale setters with a single
context-managed writer scope on every backend. The writer batches
multiple writes (poses + scales) inside one scope, derives the
opposite-space matrix once on ``__exit__``, and synchronizes once.
Only one writer scope may be active per view at a time; view-level
getters raise ``RuntimeError`` while a writer scope is active.
API summary:
    with view.xform_space_writer("world") as w:
        w.set_poses(positions=p, orientations=o)
        w.set_scales(scales=s)
    # Derived (local) matrices are recomputed, the scope releases.
Public classes (in ``isaaclab.sim.views``):
* ``FrameViewSpaceWriterBase`` -- abstract base
* ``FrameViewWorldSpaceWriter`` -- world-space tag class
* ``FrameViewLocalSpaceWriter`` -- local-space tag class
Behind the scenes:
* ``BaseFrameView`` gains ``xform_space_writer()``,
``_active_writer``, and the abstract factory hooks
``_make_world_space_writer`` / ``_make_local_space_writer``.
Public getters become guarded wrappers around new
``_get_*_impl`` backend hooks; they raise ``RuntimeError`` when a
writer scope is active.
* ``FabricFrameView`` ships ``_FabricWorldSpaceWriter`` and
``_FabricLocalSpaceWriter`` that pause
``IFabricHierarchy.track_local_xform_changes`` /
``track_world_xform_changes`` for the scope's lifetime (saving
and restoring the prior state so we never re-enable a listener
the caller had paused). Eager dual-write inside the scope means
Kit's per-tick ``updateWorldXforms`` does not redundantly
recompute matrices we just wrote. The renderer's independent
``omni:fabric:worldMatrix`` listener still observes the writes.
The lazy-dirty mechanism is gone: the ``_DirtyFlag`` enum,
``_dirty`` field, ``_warned_interleaved_set`` field, the
``_sync_*_if_dirty`` helpers, and the one-time
"interleaved set_world_poses / set_local_poses" warning are all
deleted. The three-selection RO/RW layout is kept as a
defensive layer and for clarity of authoring intent.
* ``UsdFrameView`` / ``NewtonSiteFrameView`` /
``OvPhysxFrameView`` ship pass-through writers (their
``set_poses`` / ``set_scales`` immediately delegate to the
backend's ``_apply_*_write`` helpers; ``__exit__`` is a no-op
beyond releasing the single-writer lock).
Setter deprecation / removal:
* ``set_world_poses`` and ``set_local_poses`` are kept as
one-time-warning shims on ``BaseFrameView`` that route through
the writer internally. Use ``view.xform_space_writer("world" |
"local")`` and ``w.set_poses(...)``.
* ``set_world_scales`` and ``set_local_scales`` were introduced in
this release cycle without external users and are removed
outright (no deprecation). Use ``w.set_scales(...)`` inside a
writer scope.
* The existing ``set_scales`` deprecation shim is kept and now
opens the backend-appropriate writer scope internally
(Fabric: world, USD/OvPhysx: local).
Migration:
All 81 call sites across ``source/``, ``scripts/``, and the test
suites are migrated to the new API in this commit so the repo's
own code base raises no deprecation warnings. External callers
on ``set_world_poses`` / ``set_local_poses`` / ``set_scales``
keep working (one warning per class on first call).
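A "one warning per class on first call" shim can be sketched as follows. This is a minimal illustration, not the BaseFrameView code: `ToyBaseView`, `ToyUsdView`, `_warned_classes`, and `_write_world` are hypothetical names, and the real shim routes through a writer scope rather than a direct write.

```python
import warnings


class ToyBaseView:
    _warned_classes = set()  # classes that have already emitted the warning

    def _write_world(self, positions):
        # Stand-in for "open a writer scope and call set_poses inside it".
        self.positions = positions

    def set_world_poses(self, positions):
        cls = type(self)
        if cls not in ToyBaseView._warned_classes:
            ToyBaseView._warned_classes.add(cls)
            warnings.warn(
                "set_world_poses() is deprecated; use a writer scope instead",
                DeprecationWarning,
                stacklevel=2,
            )
        self._write_world(positions)


class ToyUsdView(ToyBaseView):
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    view = ToyUsdView()
    view.set_world_poses([0.0, 0.0, 1.0])
    view.set_world_poses([0.0, 0.0, 2.0])   # second call: no new warning

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

Tracking warned classes explicitly, instead of relying on the default warning filter, keeps the behavior the same regardless of how the caller has configured `warnings`.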
Verification:
* ``pytest source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py``
-- 81 passed, 3 skipped (cuda:1 multi-GPU).
* ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usdcamera``
-- the original Camera + RTX regression that motivated this
arc, passes.
* ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set``
-- 4 parametrizations pass.
* ``./isaaclab.sh -f`` -- clean.
New test coverage for the writer contract (in
``test_views_xform_prim_fabric.py``):
* world / local writer derives opposite space on exit
* exactly one derive-kernel launch per scope (monkeypatched
counter) regardless of how many set_poses / set_scales calls
are made inside the scope
* single-active-writer invariant raises ``RuntimeError``
* exit restores prior ``track_local_xform_changes`` /
``track_world_xform_changes`` state (does not re-enable
listeners the caller had paused)
* invalid space string raises ``ValueError``
* empty scope does not launch the derive kernel
* view-level getters raise inside an active scope
…er() / xform_local_space_writer()
Replace the single dispatch method that took a 'world'|'local' string with two explicit methods. More discoverable, better typed (each returns its concrete writer class), and removes the invalid-space ValueError path. Update all callsites (camera, USD/Fabric/Newton/OVPhysX backends, mimic), benchmarks, tests, and changelog fragments. Drop the now-meaningless test_writer_invalid_space_raises.
Previously FabricFrameView held three persistent PrimSelection handles with asymmetric RO/RW flags (trans_sel_ro = both-RO for reads, world_sel_rw = worldMatrix-RW, local_sel_rw = localMatrix-RW). That layout was the load-bearing protection against Kit's per-tick updateWorldXforms() before the writer scope started pausing IFabricHierarchy tracking.
With tracking-pause in place, the asymmetric flags are no longer load-bearing. Reduce to two persistent selections:
    _sel_ro : worldMatrix=RO, localMatrix=RO    steady state
    _sel_rw : worldMatrix=RW, localMatrix=RW    inside writer scope
Each selection owns its own bundle of indexed-fabric arrays. The writer scope flips a single _is_rw flag on enter/exit; both bundles stay alive for the view's lifetime, so no selection is rebuilt on the flip. PrepareForReuse() topology polls run independently per bundle.
Net: -10 lines of code, -1 selection, -1 set of fabric_indices.
The two whitebox tests that reached into the old triple are updated to poll the new pair. All 79 fabric-suite tests pass and the test_output_equal_to_usdcamera camera+RTX regression that motivated this PR still passes.
When a FabricFrameView writer scope unwinds via an exception (a Python error in user code, a notebook cell interrupt, etc.), the opposite-space derive + wp.synchronize() now runs anyway so worldMatrix and localMatrix remain mutually consistent prim-by-prim on whatever partial-write state Fabric currently holds. The partial write itself is not rolled back -- callers needing transactional semantics should snapshot the matrices themselves before entering the scope.
The recovery launch is itself wrapped in try/except: if it fails (e.g. the original exception came from a poisoned CUDA stream), the recovery error is logged and the original exception propagates -- masking it would hide the actual root cause. Hierarchy-tracking restore and the _is_rw flip happen in finally as before.
Add a regression test that raises inside a writer scope and verifies:
- tracking state restored
- _is_rw back to False
- world matrices reflect the partial write
- local matrices derived from that partial-write world
- the view is still usable for further writer scopes
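The unwind behavior described in this commit message can be sketched with a toy scope. The names `ToyScope`, `derive_local`, and the `state` dict are hypothetical; `derive` stands in for the derive kernel plus `wp.synchronize()`, and 1-D offsets replace the matrices. It illustrates the control flow only, not the actual `__exit__` of the Fabric writers.

```python
import logging

logger = logging.getLogger("toy_writer")


class ToyScope:
    """Toy writer scope: derive on exit, even when the body raised."""

    def __init__(self, state, derive):
        self.state = state      # dict with "parent", "world", "local", "is_rw"
        self.derive = derive

    def __enter__(self):
        self.state["is_rw"] = True
        return self

    def set_world(self, value):
        self.state["world"] = value

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                self.derive(self.state)
            else:
                try:
                    # Recovery derive on whatever partial write landed.
                    self.derive(self.state)
                except Exception:
                    # Log it; the original exception must stay visible.
                    logger.exception("recovery derive failed")
        finally:
            self.state["is_rw"] = False   # always restore the flag
        return False                       # never swallow the caller's error


def derive_local(state):
    state["local"] = state["world"] - state["parent"]


state = {"parent": 10.0, "world": 11.0, "local": 1.0, "is_rw": False}
try:
    with ToyScope(state, derive_local) as w:
        w.set_world(15.0)                  # partial write lands
        raise ValueError("user error mid-scope")
except ValueError:
    pass

assert state["local"] == 5.0               # derived from the partial write
assert state["is_rw"] is False
```

Returning `False` from `__exit__` is what lets the original exception propagate; the partial write (`world == 15.0`) is kept rather than rolled back, matching the non-transactional semantics stated above.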
The previous wording suggested IFabricHierarchy.update_world_xforms() could fire interleaved with our writes (a "between our write and the renderer's read" race). That cannot happen: the scope is synchronous Python code and no simulation step or render tick runs while it is open. The risk the tracking pause and the RO steady-state selection actually defend against is the next render/sim tick after the scope exits, not the scope itself. Update FabricFrameView's class docstring to reflect this, and add an explicit contract to FrameViewSpaceWriterBase: callers must not advance the simulation or render from inside a scope, because the matrices may be mid-write until exit and rendering would read torn data.
Previous wording suggested the pause was about keeping the renderer
from seeing half-written data. That is wrong: the scope is synchronous
Python code, so the renderer cannot run mid-scope and cannot see torn
state. The "torn data" concern only motivates the separate rule that
callers must not advance the simulation from inside the scope.
What the pause actually prevents is Fabric updating the dependent xforms
itself, in two ways the scope wants to keep exclusive:
1. Per-write, inside the scope -- the tracker would fire synchronously
on each set_* and propagate to the opposite-space matrix (and to
descendants, where applicable).
2. On the next render tick -- update_world_xforms() would replay queued
tracker work and potentially derive the wrong direction (e.g.
recompute world from local even though we just wrote world).
We do the dependent-xform update ourselves in one batched kernel at
scope exit, so the pause stops Fabric from doing the same work
redundantly during the scope and incorrectly after it.
Fabric change tracking is pull-based: a per-attribute listener records writes into a private changelog, and Kit drains and processes that changelog on the next call to IFabricHierarchy::update_world_xforms() (typically from the render path). "Tracking off" just stops the listener from recording new entries -- writes still land in Fabric storage; they are simply invisible to the next update_world_xforms() call.
Previous docstring claimed the pause prevented "per-write synchronous propagation through the tracker". There is no such synchronous path: nothing fires until the next update_world_xforms() tick. Rewrite the bullet to describe the actual mechanism, and explain why an empty changelog at scope exit is the property we need (the next tick has no entries to process for our prims and therefore can't pick a canonical direction and derive the other from it).
Source: kit/runtime/source/plugins/usdrt.hierarchy/FabricHierarchy.cpp
- trackLocalXformChanges_abi -> pauseChangeTracking/resumeChangeTracking
- updateWorldXforms_abi reads getChanges()/popChanges() on each listener
Fabric is just the flat attribute data store. The plugin that owns the changelog listeners, the track_*_xform_changes toggles, and the update_world_xforms() step is Fabric Hierarchy (usdrt::hierarchy::IFabricHierarchy), not Fabric itself. The renderer reads Fabric attributes through the Fabric Scene Delegate (FSD), not via some generic "renderer worldMatrix listener".
Rework the writer-scope's listener bullet to use these names precisely:
- the tracking pause and the changelog belong to Fabric Hierarchy
- update_world_xforms() is a method on Fabric Hierarchy, not on "Kit"
- FSD reads omni:fabric:worldMatrix from Fabric storage on the render path
The IsaacLab project-level shorthand "Fabric backend" / FabricFrameView / _use_fabric is unchanged -- those are API labels, not technical claims.
Changelog fragments are read against develop, not against an in-branch interim state. Remove every entry that describes a feature added then removed within this PR -- net effect vs develop is zero, and mentioning the round-trip only confuses reviewers.
Specifically:
- isaaclab/xform-space-writer: drop the "Removed: set_world_scales / set_local_scales" section. Those methods were added in fabric-local-poses and removed here; they never existed on develop.
- isaaclab/fabric-local-poses: drop set_world_scales / set_local_scales from the Added list for the same reason; keep only the surviving getters and route scale writes via the writer scope.
- isaaclab_physx/fabric-local-poses: drop the "lazy dirty tracking" and "interleave detection" entries; both mechanisms were added and then removed by the writer-scope migration.
- isaaclab_physx/xform-space-writer: drop the paragraph that lists the removed _DirtyFlag enum / _sync_*_if_dirty helpers. Also switch the remaining prose to the correct Omniverse terminology -- Fabric Hierarchy owns update_world_xforms() and the change tracking; FSD feeds the renderer.
- isaaclab_newton + isaaclab_ovphysx/fabric-local-poses: drop set_world_scales / set_local_scales from the Added/Deprecated lists.
Also include the prior wording fix "Will be used by" -> "Used by" in the utils docs (the kernels are now in use). While I'm in this file, drop the deprecation-replacement clauses that told users to use set_world_scales / set_local_scales -- those methods don't exist; the writer scope is the right pointer.
Plus: add a comment in scripts/benchmarks/benchmark_xform_prim_view.py explaining why each wp.clone() now goes through .warp (ProxyArray introduced by PR isaac-sim#5304 changed the FrameView getter return type).
The benchmark mixes a one-time view-construction cost (`init`) with per-iteration steady-state ops in the same dict. The Total row and Overall speedup were both summing every value in that dict, so a backend whose first call materializes the view (NewtonSiteFrameView took ~1 s of stage population in the reported run) crushed the per-iter ops in the totals and printed a misleading 0.13x overall. Separate the two: keep Initialization in the per-op table for visibility, but compute Total and Overall over the per-iter operations only. The new label makes the scope explicit -- "Per-iter total" and "SPEEDUP vs USD (per-iter ops; one-time init excluded)". With the user's Blackwell RTX PRO 5000 numbers, this turns the nonsensical 0.13x Newton overall into 207x and the dampened 25x Fabric overall into 164x.
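The effect of summing a one-time `init` cost into the totals can be shown with a few lines of arithmetic. The timings below are made up purely for illustration (they are not the reported benchmark numbers), and `per_iter_total` / `speedup_vs_baseline` are hypothetical helper names, not functions from the benchmark script.

```python
def per_iter_total(timings_ms):
    """Sum every operation except the one-time 'init' entry."""
    return sum(v for k, v in timings_ms.items() if k != "init")


def speedup_vs_baseline(baseline_ms, backend_ms):
    """Speedup over the baseline, computed on per-iteration ops only."""
    return per_iter_total(baseline_ms) / per_iter_total(backend_ms)


# Made-up numbers, chosen only to show the distortion.
usd = {"init": 5.0, "get_world_poses": 40.0, "set_world_poses": 60.0}
backend = {"init": 1000.0, "get_world_poses": 0.25, "set_world_poses": 0.25}

naive = sum(usd.values()) / sum(backend.values())   # init dominates the sum
fair = speedup_vs_baseline(usd, backend)            # init excluded

print(f"naive overall: {naive:.2f}x, per-iter: {fair:.0f}x")
assert naive < 1.0
assert fair == 200.0
```

With these toy inputs, the naive ratio drops below 1x even though every per-iteration operation is far faster, which is the same kind of distortion the commit fixes.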
Tests should describe and exercise the CURRENT API, not history.
Removed
- test_set_world_scales_method_no_longer_exists (whitebox assertion
that two never-shipped attributes are absent from BaseFrameView).
Renamed + docstrings rewritten (names and docstrings referenced
``set_world_scales`` / ``set_local_scales`` which do not exist; the test
bodies already use the writer scope):
- frame_view_contract_utils.py:
test_set_local_scales_roundtrip -> test_local_scales_roundtrip
test_set_world_scales_roundtrip -> test_world_scales_roundtrip
- test_views_xform_prim.py:
test_set_local_scales_then_get_world_scales
-> test_world_scale_composes_with_parent_scale
test_set_world_scales_then_get_local_scales
-> test_local_scale_inverts_parent_when_writing_world_scale
Stale docstring references to the removed dirty-tracking and the
deprecated set_*_scales API also corrected in four fabric tests
(test_local_scales_roundtrip, test_world_scales_roundtrip,
test_fabric_cuda1_scales_roundtrip, test_initial_seed_with_scaled_parent).
FabricFrameView's getters (_get_world_poses_impl, _get_local_poses_impl,
and the shared _decompose_scales for both scale getters) only called
wp.synchronize() on the cached path (indices is None / slice(None)).
The per-indices path returned a fresh ProxyArray without syncing, so a
caller that did .numpy() or wp.to_torch().cpu() on it could observe
zeros from the wp.zeros initializer (the kernel hadn't completed).
Move the sync below the kernel launch and call it unconditionally.
This:
- removes the asymmetry between the cached and the indexed paths,
- matches what the class docstring claims about getters being
immediately readable,
- is essentially free for the cached path (was already syncing) and
cheap for the indexed path (the kernel itself is the dominant cost).
Update the class docstring to describe the new behaviour: every getter
launches its decompose kernel and synchronizes before returning, so the
returned ProxyArray is immediately readable from GPU or host without
any caller-side sync.
Reported by Greptile on PR isaac-sim#5677.
_compute_fabric_indices and _compute_parent_fabric_indices both walked selection.GetPaths(), built the same path->index dict, then iterated self.prim_paths to look up either the prim itself or its parent. The selection-walk + dict + lookup loop is exactly what _compute_fabric_indices_for(selection, paths) already does for one-off index arrays.
Have the two specialised helpers delegate to it. The parent variant keeps its stage-root precondition inline (where it reads naturally, next to the rsplit) and builds the parent path list before calling the shared primitive. The child variant is a one-line passthrough.
No behavioural change: same selection walks, same indices, same ordering. Saves ~18 lines and removes a duplicated dict construction. The shared primitive's docstring is updated to reflect its new role as the canonical lookup, no longer just a one-shot.
Two helpers were building lists with .append() inside an imperative
for-loop:
- _compute_fabric_indices_for: walked paths -> looked up indices,
raised on miss, appended to a list.
- _compute_parent_fabric_indices: walked self.prim_paths -> derived
parent path, validated stage-root, appended to a list.
Factor the per-element step into a named local function whose body
documents the intent (lookup + validation, parent derivation +
validation), then drive it from a list comprehension. Same control
flow, denser at the call site, the loop body's purpose is now a
one-token name.
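The refactor shape described here (a named local function for the per-element step, driven by a list comprehension) might look like the sketch below. It is a stand-alone illustration: `compute_parent_indices` and `path_to_index` are hypothetical, and plain strings and a dict replace the Fabric selection and prim paths.

```python
def compute_parent_indices(prim_paths, path_to_index):
    """Map each prim path to the index of its parent (toy version)."""

    def parent_index(path):
        # Per-element step: derive the parent path, validate, then look up.
        parent = path.rsplit("/", 1)[0]
        if not parent:
            raise ValueError(f"{path!r} is at the stage root and has no parent prim")
        if parent not in path_to_index:
            raise KeyError(f"parent {parent!r} not found in selection")
        return path_to_index[parent]

    return [parent_index(p) for p in prim_paths]


index = {"/World": 0, "/World/A": 1, "/World/A/B": 2}
assert compute_parent_indices(["/World/A", "/World/A/B"], index) == [0, 1]
```

The comprehension preserves input order, so the resulting index list lines up element-by-element with `prim_paths`, as the imperative loop did.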
b5bcd15 to 70f1400
The world-body fallback in NewtonSiteFrameView._resolve_source_prim resolved the reference prim directly from source_root, instead of stripping the clone-template suffix as the prior code did. For heterogeneous (multi-asset spawner) scenes this resolved the wrong reference frame, offsetting site world poses and breaking rendering-correctness (dexsuite kuka hetero kitless). Restore the split_clone_template/get_suffix-based ref_path computation (and the dropped imports). Homogeneous scenes were unaffected; hetero rgb/depth now match golden images again.
Fast GPU pose/scale ops on Fabric + a new writer-scope API
What this does
Two things:
- Local-space pose and scale ops are now GPU-fast on the Fabric backend.
  They go through the same Warp-kernel path that world-space ops already used.
  The old USD fallback for local-space is ~100× slower (see the table below).
- All transform writes go through a small context manager. You open a
  scope, do your writes inside it, and the scope handles the cleanup:
Use view.xform_local_space_writer() for local-space writes.

Builds on Piotr's prototype at bareya/pbarejko/camera-update.

Why a scope?
A FrameView keeps two copies of each prim's transform in Fabric storage:
the omni:fabric:worldMatrix and omni:fabric:localMatrix attributes.
When you write one, the other has to be recomputed so they stay
consistent.
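The consistency requirement can be written as a pair of matrix identities. This is a sketch that assumes the USD row-vector convention (a point transforms as p' = p M) and an invertible parent world matrix. The symbols are notation introduced here, not names from the PR: W is a prim's world matrix, L its local matrix, and W_parent the world matrix of its parent.

```latex
W = L \, W_{\text{parent}}
\qquad\Longleftrightarrow\qquad
L = W \, W_{\text{parent}}^{-1}
```

Writing L means W has to be re-derived from the left identity; writing W means L has to be re-derived from the right one.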
Without a scope, that recompute has to run after every single write. With
a scope, it runs once, when the scope closes. You can call set_poses and
set_scales as many times as you want inside the scope; on exit, one
Warp kernel derives the other space and one wp.synchronize() runs.
Empty scopes cost nothing.
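The batching behaviour can be illustrated with a small mock. This is a sketch under stated assumptions, not the PR's implementation: MockFrameView and _MockWriter are hypothetical classes that only count work, the argument names of set_poses and set_scales are guesses, and the counters stand in for the Warp kernel launch and the wp.synchronize() call.

```python
from contextlib import contextmanager


class MockFrameView:
    """Hypothetical stand-in for a view; it only counts work, it moves no data."""

    def __init__(self):
        self.writes = 0
        self.derive_calls = 0  # stands in for the opposite-space Warp kernel
        self.sync_calls = 0    # stands in for wp.synchronize()

    @contextmanager
    def xform_local_space_writer(self):
        writer = _MockWriter(self)
        yield writer
        if writer.dirty:  # empty scopes skip the exit work entirely
            self.derive_calls += 1
            self.sync_calls += 1


class _MockWriter:
    def __init__(self, view):
        self._view = view
        self.dirty = False

    def set_poses(self, positions, orientations):
        # Argument names are assumptions made for this sketch.
        self._view.writes += 1
        self.dirty = True

    def set_scales(self, scales):
        self._view.writes += 1
        self.dirty = True


view = MockFrameView()
with view.xform_local_space_writer() as writer:
    for _ in range(10):
        writer.set_poses(positions=None, orientations=None)
        writer.set_scales(scales=None)

with view.xform_local_space_writer():
    pass  # nothing written, so nothing to derive or synchronize
```

Twenty writes inside the first scope trigger a single derive and a single synchronize in this mock; the second, empty scope adds neither.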
The scope also pauses Fabric Hierarchy's transform-change tracking
while you're writing, then restores it on exit. (Fabric itself is just
the data store; Fabric Hierarchy, exposed through USDRT's
IFabricHierarchy, is the plugin that watches writes to
omni:fabric:localMatrix / omni:fabric:worldMatrix and keeps them
mutually consistent across the hierarchy.) Its tracking is pull-based:
a per-attribute listener records writes into a changelog, and the
plugin drains and processes that changelog on the next call to
IFabricHierarchy::update_world_xforms(). Pausing the listener does
not block the writes themselves (they still land in Fabric storage);
it just keeps them out of the changelog. With the changelog empty, the
next tick has nothing to "catch up on" for these prims, and can't
decide one of the two spaces is canonical and derive the other from
it. See the next section for why that matters.
Rules:
- raises RuntimeError.
- get_world_poses, etc.) raise. Read through the writer (writer.get_poses()) inside the scope,
  or close the scope first.
- No sim.step(), no world.render(), no SimulationApp.update().
  The matrices are mid-write until exit, and rendering against that state
  would read torn data. Keep scopes short and close them before stepping.
- cell interrupt, etc.), the scope still runs the opposite-space recompute
  on exit so the world and local matrices stay consistent prim-by-prim.
  The partial write itself is not rolled back; if you need
  all-or-nothing, snapshot the matrices yourself before entering.
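The last rule (recompute on exit even after an interruption, no rollback) can be sketched with a toy scope. Everything here is hypothetical: MockWriterScope is not a class from the PR, each "matrix" is reduced to a single float, and the local-to-world derivation is reduced to adding a parent offset.

```python
class MockWriterScope:
    """Hypothetical scope: local values are written, world values derived on exit."""

    def __init__(self, local, world, parent_offset):
        self.local, self.world, self.parent_offset = local, world, parent_offset

    def __enter__(self):
        return self

    def set_local(self, index, value):
        self.local[index] = value

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions alike: derive world from local
        # so the two stay consistent per prim. Nothing is rolled back.
        for i, value in enumerate(self.local):
            self.world[i] = value + self.parent_offset
        return False  # do not swallow the exception


local = [0.0, 0.0, 0.0]
world = [10.0, 10.0, 10.0]
snapshot = (list(local), list(world))  # caller-side snapshot of both spaces

try:
    with MockWriterScope(local, world, parent_offset=10.0) as scope:
        scope.set_local(0, 1.0)
        raise RuntimeError("simulated failure before prims 1 and 2 are written")
except RuntimeError:
    partial_local = list(local)  # the partial write is kept
    partial_world = list(world)  # world was still re-derived from local
    # Manual all-or-nothing: restore both spaces from the snapshot.
    local[:], world[:] = snapshot
```

Restoring both spaces from the snapshot matters: restoring only one of them would reintroduce the inconsistency the exit step just repaired.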
Why we have to coordinate with Fabric Hierarchy (renderer correctness)
A writer scope is synchronous Python code, so the renderer can't see
anything mid-scope — no tick runs until we exit. The risk is on the
next render/sim tick after the scope closes.
On that next tick, Fabric Hierarchy runs its
IFabricHierarchy::update_world_xforms() step, which can recompute
omni:fabric:worldMatrix from omni:fabric:localMatrix (or vice
versa). If it thinks one of the two was the user's most recent edit,
it derives the other from it; the derived value, not what we wrote,
is then what the Fabric Scene Delegate (FSD) hands to RTX on the
render path.
Three things in this PR keep that next-tick recompute from clobbering our
work:
- Opposite-space derive at scope exit. By the time the scope closes,
  omni:fabric:worldMatrix and omni:fabric:localMatrix are mutually
  consistent prim-by-prim, so any recompute Fabric Hierarchy does is a
  no-op.
- Fabric-Hierarchy change tracking is paused for the whole scope.
  The writer calls track_local_xform_changes(False) and
  track_world_xform_changes(False) on enter (saving the prior state)
  and restores them on exit. While tracking is paused, the listeners
  don't record our writes in their changelog, so when the next tick
  runs there's nothing for update_world_xforms() to "catch up on".
- Two persistent selections with explicit read-only / read-write
  flags. Outside the scope, the view's selection is fully read-only:
  both omni:fabric:worldMatrix and omni:fabric:localMatrix are
  flagged RO. Inside the scope, the writer flips to a fully read-write
  selection.

  Both selections are built once when the view is initialized and kept
  alive for its lifetime. The writer scope flips a single flag (_is_rw)
  on enter/exit; nothing is rebuilt. The RO steady state tells Fabric
  Hierarchy's next-tick update_world_xforms() that no attribute is
  user-authored, so it leaves both alone.
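The enter/exit bookkeeping for the second and third points can be sketched as follows. This is a mock under stated assumptions: MockHierarchy and MockView are hypothetical, the two track_* method names are taken from the description above, and how the real writer reads the prior tracking state is not stated in the PR, so the mock simply keeps it in plain attributes.

```python
from contextlib import contextmanager


class MockHierarchy:
    """Hypothetical stand-in for the Fabric Hierarchy tracking switches."""

    def __init__(self):
        self.local_tracking = True
        self.world_tracking = False
        self.changelog = []

    def track_local_xform_changes(self, enabled):
        self.local_tracking = enabled

    def track_world_xform_changes(self, enabled):
        self.world_tracking = enabled

    def record_write(self, attr):
        # A listener only logs writes for attributes it is tracking.
        tracked = {"local": self.local_tracking, "world": self.world_tracking}
        if tracked[attr]:
            self.changelog.append(attr)


class MockView:
    def __init__(self, hierarchy):
        self.hierarchy = hierarchy
        self._is_rw = False  # steady state: read-only selection

    @contextmanager
    def writer_scope(self):
        h = self.hierarchy
        saved = (h.local_tracking, h.world_tracking)  # remember prior state
        h.track_local_xform_changes(False)
        h.track_world_xform_changes(False)
        self._is_rw = True  # flip to the read-write selection
        try:
            yield self
        finally:
            self._is_rw = False
            h.track_local_xform_changes(saved[0])
            h.track_world_xform_changes(saved[1])


hierarchy = MockHierarchy()
view = MockView(hierarchy)
with view.writer_scope():
    rw_inside = view._is_rw
    hierarchy.record_write("local")  # would land in storage, but is not logged
    hierarchy.record_write("world")
```

Saving and restoring the prior state, rather than unconditionally re-enabling tracking, keeps the scope from changing a configuration that some other component chose before the scope was opened.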
Backend support
to the backend's existing storage. No batching, no extra recompute (there
is no second matrix to keep in sync). Same API across all four backends.
Newton has two different "scale" ideas, and the API keeps them apart:
- NewtonSiteFrameView.set_scales(...) (deprecated) writes the Newton
  collision-shape geometry size.
- set_scales(...) writes per-site transform scale,
  matching what USD FrameView does.
They operate on different state and are not merged.
API changes
New (recommended):
- view.xform_world_space_writer() / view.xform_local_space_writer():
  a context manager with set_poses, set_scales, get_poses, get_scales.

Deprecated (still works, warns once per class on first use):
- set_world_poses / set_local_poses: use the writer scope.
- get_scales / set_scales: use get_local_scales / get_world_scales, or the writer's set_scales.

Benchmark (1024 prims, 50 iterations, Blackwell RTX PRO 5000)
Per-iteration timings (lower is better):
Speedup vs USD (per-iter ops; one-time view construction excluded):
One-time view construction (reported separately, not part of the per-iter total): USD 4.6 ms, Fabric 4.4 ms, Newton Site 1013 ms — the Newton number is dominated by stage population on the first call. Steady-state per-iteration cost is what the speedup row reflects.
Notes for reviewers
wp.synchronize() before returning, so the returned ProxyArray is always immediately readable from both GPU and host
code, with no caller-side sync needed. (Both the cached and the
per-indices paths sync; this used to be asymmetric.)