
feat: GPU-accelerated FrameView local poses + a new xform_space_writer API#5677

Open
pv-nvidia wants to merge 54 commits into isaac-sim:develop from pv-nvidia:pv/fabric-local-poses


pv-nvidia commented May 18, 2026


Fast GPU pose/scale ops on Fabric + a new writer-scope API

What this does

Two things:

  1. Local-space pose and scale ops are now GPU-accelerated on the Fabric backend.
    They go through the same Warp-kernel path that world-space ops already used.
    The old USD fallback for local-space ops is ~100× slower (see the table below).

  2. All transform writes go through a small context manager. You open a
    scope, do your writes inside it, and the scope handles the cleanup:

    with view.xform_world_space_writer() as writer:
        writer.set_poses(positions=p, orientations=o)
        writer.set_scales(scales=s)

    Use view.xform_local_space_writer() for local-space writes.

Builds on Piotr's prototype at bareya/pbarejko/camera-update.

Why a scope?

A FrameView on the Fabric backend keeps two representations of each
prim's transform in Fabric storage: the omni:fabric:worldMatrix and
omni:fabric:localMatrix attributes. When you write one, the other has to
be recomputed so the two stay consistent.

Without a scope, that recompute has to run after every single write. With
a scope, it runs once, when the scope closes. You can call set_poses
and set_scales as many times as you want inside the scope; on exit, one
Warp kernel derives the other space and one wp.synchronize() runs.
Empty scopes cost nothing: no derive kernel is launched and no sync runs.
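To make the batching concrete, here is a minimal pure-Python sketch of the contract described above. `BatchedWriterSketch` and its counters are hypothetical stand-ins, not the PR's classes: writes are recorded immediately, while the opposite-space derive and the sync stand-in run at most once, on exit, and not at all for an empty scope.

```python
class BatchedWriterSketch:
    """Hypothetical stand-in for a FrameView writer scope.

    Writes are recorded immediately; the opposite-space derive and the
    device sync are deferred to scope exit and run at most once per scope.
    """

    def __init__(self):
        self.written = {}         # (prim index, kind) -> value in the written space
        self.dirty = set()        # prim indices touched inside the scope
        self.derive_launches = 0  # counts batched "derive kernel" launches
        self.sync_calls = 0       # counts "wp.synchronize()" stand-ins

    def _record(self, kind, indices, values):
        for i, v in zip(indices, values):
            self.written[(i, kind)] = v
            self.dirty.add(i)

    def set_poses(self, indices, values):
        self._record("pose", indices, values)   # no derive here

    def set_scales(self, indices, values):
        self._record("scale", indices, values)  # no derive here either

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.dirty:                  # empty scope: no launch, no sync
            self.derive_launches += 1   # one launch covers every dirty prim
            self.sync_calls += 1
            self.dirty.clear()
        return False


w = BatchedWriterSketch()
with w:
    w.set_poses([0, 1], ["p0", "p1"])
    w.set_scales([1], ["s1"])
    w.set_poses([2], ["p2"])
print(w.derive_launches, w.sync_calls)  # 1 1
```

However many set_* calls happen inside the scope, the cost of keeping the two spaces consistent is paid once, which is what makes the scope cheaper than per-call recompute.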

The scope also pauses Fabric Hierarchy's transform-change tracking
while you're writing, then restores it on exit. (Fabric itself is just
the data store; Fabric Hierarchy, exposed through USDRT's
IFabricHierarchy, is the plugin that watches writes to
omni:fabric:localMatrix / omni:fabric:worldMatrix and keeps them
mutually consistent across the hierarchy.) Fabric Hierarchy's tracking is
pull-based: a per-attribute listener records writes into a changelog, and
the plugin drains and processes that changelog on the next call to
IFabricHierarchy::update_world_xforms(). Pausing the listener does not
block the writes themselves (they still land in Fabric storage); it just
keeps them out of the changelog. With the changelog empty, the next tick
has nothing to "catch up on" for these prims, so it cannot pick one of the
two spaces as canonical and re-derive the other from it. See the next
section for why that matters.
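A toy model of the pull-based tracking and the pause/restore follows, assuming a single boolean `tracking` switch in place of the real `track_local_xform_changes` / `track_world_xform_changes` calls. `ChangeTrackerSketch` and `PausedTracking` are hypothetical names; the point is that writes still reach storage while paused, the changelog stays empty, and the prior tracking state (not unconditionally `True`) is restored on exit.

```python
class ChangeTrackerSketch:
    """Hypothetical model of a pull-based transform-change tracker.

    Writes always reach `storage`; the listener only decides whether the
    write is also appended to the changelog that the next update drains.
    """

    def __init__(self):
        self.storage = {}
        self.changelog = []
        self.tracking = True

    def write(self, prim, value):
        self.storage[prim] = value        # storage is never blocked
        if self.tracking:
            self.changelog.append(prim)   # recorded only while tracking

    def update_world_xforms(self):
        """Drain the changelog and return the prims it would re-derive."""
        drained, self.changelog = self.changelog, []
        return drained


class PausedTracking:
    """Pause tracking for a scope and restore the *prior* state on exit."""

    def __init__(self, tracker):
        self.tracker = tracker
        self.prior = None

    def __enter__(self):
        self.prior = self.tracker.tracking
        self.tracker.tracking = False
        return self.tracker

    def __exit__(self, exc_type, exc, tb):
        self.tracker.tracking = self.prior  # never re-enable a caller's pause
        return False


t = ChangeTrackerSketch()
with PausedTracking(t):
    t.write("/World/cam", "M1")
print(t.storage["/World/cam"], t.update_world_xforms())  # M1 []
```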

Rules:

  • Only one writer can be open on a view at a time. Opening a second one
    raises RuntimeError.
  • While a writer is open, the view's own getters (get_world_poses, etc.)
    raise RuntimeError. Read through the writer (writer.get_poses()) inside
    the scope, or close the scope first.
  • Do not advance the simulation or render from inside the scope.
    No sim.step(), no world.render(), no SimulationApp.update().
    The matrices are mid-write until exit, and rendering against that state
    would read torn data. Keep scopes short and close them before stepping.
  • If something raises inside the scope (a user-code bug, a notebook
    cell interrupt, etc.), the scope still runs the opposite-space recompute
    on exit so the world and local matrices stay consistent prim-by-prim.
    The partial write itself is not rolled back — if you need
    all-or-nothing, snapshot the matrices yourself before entering.
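The first two rules can be sketched as a per-view lock plus a guard in the getters. This is an illustrative model only: `ViewSketch` and `_WriterSketch` are hypothetical, and poses are plain floats rather than ProxyArray results.

```python
class ViewSketch:
    """Hypothetical view that enforces the single-writer and getter rules."""

    def __init__(self, num_prims):
        self._poses = [0.0] * num_prims
        self._active_writer = None

    def get_world_poses(self):
        if self._active_writer is not None:
            raise RuntimeError(
                "get_world_poses() called while a writer scope is open; "
                "read through writer.get_poses() or close the scope first"
            )
        return list(self._poses)

    def xform_world_space_writer(self):
        return _WriterSketch(self)


class _WriterSketch:
    def __init__(self, view):
        self._view = view

    def __enter__(self):
        if self._view._active_writer is not None:
            # A failing __enter__ never reaches __exit__, so the first
            # writer's claim on the view is left untouched.
            raise RuntimeError("a writer scope is already open on this view")
        self._view._active_writer = self
        return self

    def __exit__(self, exc_type, exc, tb):
        self._view._active_writer = None  # release even if the body raised
        return False

    def set_poses(self, indices, values):
        for i, v in zip(indices, values):
            self._view._poses[i] = v

    def get_poses(self):
        return list(self._view._poses)  # reads go through the writer instead


view = ViewSketch(2)
with view.xform_world_space_writer() as writer:
    writer.set_poses([1], [3.0])
    print(writer.get_poses())  # [0.0, 3.0]
print(view.get_world_poses())  # [0.0, 3.0]
```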

Why we have to coordinate with Fabric Hierarchy (renderer correctness)

A writer scope is synchronous Python code, so the renderer can't see
anything mid-scope — no tick runs until we exit. The risk is on the
next render/sim tick after the scope closes.

On that next tick, Fabric Hierarchy runs its
IFabricHierarchy::update_world_xforms() step, which can recompute
omni:fabric:worldMatrix from omni:fabric:localMatrix (or vice
versa). If it thinks one of the two was the user's most recent edit,
it derives the other from it; the derived value — not what we wrote —
is then what the Fabric Scene Delegate (FSD) hands to RTX on the
render path.

Three things in this PR keep that next-tick recompute from clobbering our
work:

  1. Opposite-space derive at scope exit. By the time the scope closes,
    omni:fabric:worldMatrix and omni:fabric:localMatrix are mutually
    consistent prim-by-prim, so any recompute Fabric Hierarchy does is a
    no-op.

  2. Fabric-Hierarchy change tracking is paused for the whole scope.
    The writer calls track_local_xform_changes(False) and
    track_world_xform_changes(False) on enter (saving the prior state)
    and restores them on exit. While tracking is paused, the listeners
    don't record our writes in their changelog, so when the next tick
    runs there's nothing for update_world_xforms() to "catch up on".

  3. Two persistent selections with explicit read-only / read-write
    flags.
    Outside the scope, the view's selection is fully read-only —
    both omni:fabric:worldMatrix and omni:fabric:localMatrix are
    flagged RO. Inside the scope, the writer flips to a fully read-write
    selection.

    _sel_ro :  worldMatrix=RO, localMatrix=RO   (steady state, between writes)
    _sel_rw :  worldMatrix=RW, localMatrix=RW   (active during a writer scope)
    

    Both selections are built once when the view is initialized and kept
    alive for its lifetime. The writer scope flips a single flag (_is_rw)
    on enter/exit; nothing is rebuilt. The RO steady state tells Fabric
    Hierarchy's next-tick update_world_xforms() that no attribute is
    user-authored, so it leaves both alone.
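The opposite-space derive in item 1 is the usual parent/child identity. The sketch below uses NumPy and the column-vector convention (world = parent @ local); that convention is an assumption for readability, since USD-style row-major matrices compose as local * parent, which is the transpose of the same identity. It shows why a later hierarchy recompute is a no-op once local has been derived from the written world.

```python
import numpy as np


def make_xform(rot_z, translation, scale=1.0):
    """4x4 homogeneous transform, column-vector convention (p' = M @ p)."""
    c, s = np.cos(rot_z), np.sin(rot_z)
    m = np.eye(4)
    m[:3, :3] = scale * np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    m[:3, 3] = translation
    return m


def local_from_world(parent_world, world):
    """After a world-space write: local = parent_world^-1 @ world."""
    return np.linalg.solve(parent_world, world)


def world_from_local(parent_world, local):
    """After a local-space write: world = parent_world @ local."""
    return parent_world @ local


parent = make_xform(np.pi / 4, [1.0, 2.0, 0.0], scale=2.0)  # rotated, scaled parent
world_written = make_xform(0.3, [5.0, -1.0, 3.0])          # what the user wrote

local_derived = local_from_world(parent, world_written)     # derive at scope exit
# Recomputing world from the derived local reproduces the written world,
# so a hierarchy update after the scope has nothing to change.
print(np.allclose(world_from_local(parent, local_derived), world_written))  # True
```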

Backend support

  • Fabric (PhysX): the fast GPU path described above.
  • USD, Newton, OVPhysX: the writer is a thin wrapper. Writes go straight
    to the backend's existing storage. No batching, no extra recompute (there
    is no second matrix to keep in sync). Same API across all four backends.

Newton has two different "scale" ideas, and the API keeps them apart:

  • NewtonSiteFrameView.set_scales(...) (deprecated) writes the Newton
    collision-shape geometry size.
  • The writer's set_scales(...) writes per-site transform scale,
    matching what USD FrameView does.

They operate on different state and are not merged.

API changes

New (recommended):

  • view.xform_world_space_writer() / view.xform_local_space_writer():
    context managers exposing set_poses, set_scales, get_poses, and
    get_scales.

Deprecated (still works, warns once per class on first use):

  • set_world_poses / set_local_poses — use the writer scope.
  • get_scales / set_scales — use get_local_scales /
    get_world_scales, or the writer's set_scales.
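The "warns once per class on first use" behavior can be sketched with the standard `warnings` module. `_WarnOncePerClass`, `ViewWithShim`, and `_Writer` are hypothetical; the shim forwards to a one-shot writer scope, mirroring how the deprecated setters are described as routing through the writer.

```python
import warnings


class _WarnOncePerClass:
    """Hypothetical mixin: one DeprecationWarning per (class, method) pair."""

    _warned = set()  # shared registry of (class, method) pairs already warned

    def _warn_deprecated(self, method, replacement):
        key = (type(self), method)
        if key in _WarnOncePerClass._warned:
            return
        _WarnOncePerClass._warned.add(key)
        warnings.warn(
            f"{type(self).__name__}.{method}() is deprecated; use {replacement}.",
            DeprecationWarning,
            stacklevel=3,  # point at the user's call site, not the shim
        )


class _Writer:
    """Minimal one-shot writer used by the shim below (hypothetical)."""

    def __init__(self, view):
        self.view = view

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def set_poses(self, positions, orientations):
        self.view.poses = (positions, orientations)


class ViewWithShim(_WarnOncePerClass):
    def __init__(self):
        self.poses = None

    def xform_world_space_writer(self):
        return _Writer(self)

    def set_world_poses(self, positions, orientations):
        """Deprecated: routes through a one-shot writer scope."""
        self._warn_deprecated("set_world_poses", "xform_world_space_writer()")
        with self.xform_world_space_writer() as w:
            w.set_poses(positions=positions, orientations=orientations)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    a, b = ViewWithShim(), ViewWithShim()
    a.set_world_poses([0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0])
    b.set_world_poses([2.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0])
print(len(caught), b.poses[0])  # 1 [2.0, 0.0, 1.0]
```

Keying the registry on the class rather than the instance means many views of the same type produce a single warning, while the writes themselves still go through.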

Benchmark (1024 prims, 50 iterations, Blackwell RTX PRO 5000)

Per-iteration timings (lower is better):

Operation                    USD (ms)  Fabric (ms)  Newton Site (ms)
Get World Poses                8.6148       0.0375            0.0307
Set World Poses               19.1357       0.1090            0.0368
Get Local Poses                5.9556       0.0374            0.0497
Set Local Poses                7.9794       0.0979            0.0790
Get World Scales              13.1193       0.0371            0.0015
Set World Scales              21.4571       0.1014            0.0612
Get Local Scales               3.2544       0.0375            0.0022
Set Local Scales               3.8350       0.0937            0.0587
Get Both (World+Local)        14.7506       0.0738            0.1302
Interleaved World Set→Get     28.1046       0.1432            0.1588
Per-iter total                  126.2         0.77              0.61

Speedup vs USD (per-iter ops; one-time view construction excluded):

Operation                    Fabric (×)  Newton Site (×)
Get World Poses                  229.6×           280.2×
Set World Poses                  175.6×           519.8×
Get Local Poses                  159.0×           119.9×
Set Local Poses                   81.5×           101.0×
Get World Scales                 353.9×          8911.5×
Set World Scales                 211.6×           350.4×
Get Local Scales                  86.8×          1489.9×
Set Local Scales                  40.9×            65.4×
Get Both (World+Local)           200.0×           113.3×
Interleaved World Set→Get        196.2×           177.0×
Overall (per-iter)                 164×             207×

One-time view construction (reported separately, not part of the per-iter total): USD 4.6 ms, Fabric 4.4 ms, Newton Site 1013 ms — the Newton number is dominated by stage population on the first call. Steady-state per-iteration cost is what the speedup row reflects.
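As a sanity check on the tables above, the per-iteration totals and the overall speedups can be recomputed from the per-operation timings (values copied from the table; nothing new is measured here).

```python
# Per-iteration timings in ms, in the table's row order.
usd = [8.6148, 19.1357, 5.9556, 7.9794, 13.1193, 21.4571, 3.2544, 3.8350, 14.7506, 28.1046]
fabric = [0.0375, 0.1090, 0.0374, 0.0979, 0.0371, 0.1014, 0.0375, 0.0937, 0.0738, 0.1432]
newton = [0.0307, 0.0368, 0.0497, 0.0790, 0.0015, 0.0612, 0.0022, 0.0587, 0.1302, 0.1588]

totals = {"usd": sum(usd), "fabric": sum(fabric), "newton": sum(newton)}
overall_fabric = totals["usd"] / totals["fabric"]
overall_newton = totals["usd"] / totals["newton"]

print(f"totals: {totals['usd']:.1f} {totals['fabric']:.2f} {totals['newton']:.2f}")  # totals: 126.2 0.77 0.61
print(f"overall: {overall_fabric:.0f}x {overall_newton:.0f}x")  # overall: 164x 207x
```

Per-row ratios recomputed this way drift slightly from the speedup table for the smallest timings (for example 13.1193 / 0.0015 ≈ 8746 versus the listed 8911.5×), presumably because the millisecond values are shown rounded to four decimals.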

Notes for reviewers

  • Getters call wp.synchronize() before returning, so the returned
    ProxyArray is always immediately readable from both GPU and host
    code — no caller-side sync needed. (Both the cached and the
    per-indices paths sync; this used to be asymmetric.)

github-actions added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels May 18, 2026

isaaclab-review-bot left a comment


🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00

Summary

The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:

  1. Service locator infrastructure on SimulationContext
  2. Service locator tests + changelog
  3. Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
  4. FabricStageCache as a shared hierarchy handle
  5. (NEW) Merge commit to consolidate branches
  6. (NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking

✅ New Additions Since Last Review

  • Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
  • Bidirectional world↔local sync:
    • set_world_poses → recomputes localMatrix via _sync_local_from_world()
    • set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
  • Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
  • Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
  • Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
  • Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation

🔧 Remaining Observations

[Minor] Index array dtype mismatch still present

The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:

# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)

Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.

[Minor] Undefined buffer references in get_local_poses

get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:

if use_cached:
    translations_wp = self._fabric_local_translations_buf
    orientations_wp = self._fabric_local_orientations_buf

These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.

📋 Architecture Notes

The world↔local propagation design is clean:

  • Write world → update local: _sync_local_from_world() runs update_indexed_local_matrix_from_world kernel immediately after world writes
  • Write local → lazy world update: _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read

This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.
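For reference, the eager-local / lazy-world scheme described in this review (which later commits replace with the writer scope) can be modeled in a few lines. The 1-D parent offset stands in for a real parent world matrix, and every name other than `_world_dirty`, `set_world_poses`, `set_local_poses`, `get_world_poses`, and `_sync_local_from_world` is hypothetical.

```python
PARENT_OFFSET = 10.0  # hypothetical 1-D parent transform: world = parent + local


class DirtyFlagViewSketch:
    """Eager local sync on world writes, lazy world sync on local writes."""

    def __init__(self):
        self.world, self.local = {}, {}
        self._world_dirty = False  # per instance, so views never clear each other's flag
        self.world_recomputes = 0

    def set_world_poses(self, prim, value):
        self.world[prim] = value
        self._sync_local_from_world(prim)  # eager: local is valid right away

    def set_local_poses(self, prim, value):
        self.local[prim] = value
        self._world_dirty = True           # lazy: defer world = parent * local

    def get_world_poses(self, prim):
        if self._world_dirty:
            self._recompute_world_from_local()
        return self.world[prim]

    def _sync_local_from_world(self, prim):
        self.local[prim] = self.world[prim] - PARENT_OFFSET

    def _recompute_world_from_local(self):
        self.world_recomputes += 1         # one pass covers every pending local write
        for prim, value in self.local.items():
            self.world[prim] = PARENT_OFFSET + value
        self._world_dirty = False


v = DirtyFlagViewSketch()
v.set_local_poses("a", 1.0)
v.set_local_poses("b", 2.0)
print(v.get_world_poses("a"), v.get_world_poses("b"), v.world_recomputes)  # 11.0 12.0 1
```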

📋 Verdict

LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.


Automated review by isaaclab-review-bot • Reviewed at 80838c00

pv-nvidia force-pushed the pv/fabric-local-poses branch 4 times, most recently from 80838c0 to 9ff3155, May 20, 2026 14:11

isaaclab-review-bot left a comment


🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155

Summary

This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.

✅ Strengths

1. Clean Architecture

  • FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
  • Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
  • Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable

2. Robust World↔Local Consistency

  • Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
  • _sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
  • Per-view dirty flags prevent concurrent views from clearing each other's pending syncs

3. Topology-Adaptive Design

  • PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
  • _rebuild_trans_ro_arrays() consolidates index and indexed-array refresh

4. Excellent Test Coverage

  • 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
  • test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
  • test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs

🔧 Minor Observations

[Minor] Index array dtype mismatch

_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:

return wp.array(indices, dtype=wp.int32, device=self._device)

Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.

[Nit] Docstring transpose-convention note

The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.
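The transpose identity behind that nit can be checked numerically. Assuming row-major storage and USD's row-vector convention (world = local @ parent), reading the same 16 floats in column-major order yields the transposes, and the product then reads worldᵀ = parentᵀ @ localᵀ. The helper `reinterpret_col_major` is a hypothetical name for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
local_row = rng.normal(size=(4, 4))    # row-vector convention: p' = p @ M
parent_row = rng.normal(size=(4, 4))
world_row = local_row @ parent_row     # USD-style composition: world = local * parent


def reinterpret_col_major(m):
    """Read the 16 row-major floats of `m` as a column-major 4x4 (= m.T)."""
    return np.ascontiguousarray(m).ravel(order="C").reshape(4, 4, order="F")


# Viewed column-major, the same identity becomes world^T = parent^T @ local^T.
lhs = reinterpret_col_major(world_row)
rhs = reinterpret_col_major(parent_row) @ reinterpret_col_major(local_row)
print(np.allclose(lhs, rhs), np.array_equal(lhs, world_row.T))  # True True
```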

[Style] Empty sentinel shape

_fabric_empty_2d_array_sentinel uses shape (0, 0):

self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)

This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.

📋 Architecture Notes

The asymmetric sync strategy is well-reasoned:

  • Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
  • Write local → lazy world sync: _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read

📋 CI Status

CI checks are currently pending. The pre-commit and changelog checks have passed.

📋 Verdict

LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.


Automated review by isaaclab-review-bot • Reviewed at 9ff31550


Update (d13ed99→7f1a012c): Reviewed large incremental batch (50+ files changed). This batch contains no changes to the core FabricFrameView implementation — the Fabric local-poses feature code is identical to the previously reviewed state.

Key changes in this batch (all outside the PR's core feature scope):

  1. isaaclab_ppisp made fully optional: All packages (isaaclab_physx, isaaclab_newton, isaaclab_ov, isaaclab_ovphysx) removed hard isaaclab_ppisp from dependencies. Lazy try/except ModuleNotFoundError with _raise_missing_ppisp_error() helper provides actionable install hints only when isp_cfg is actually set. Clean pattern, consistently applied.

  2. Cloner refactoring to ReplicateContext classes: physx_replicate, newton_physics_replicate, and ovphysx_replicate legacy functions now delegate to new PhysxReplicateContext, NewtonReplicateContext, and OvPhysxReplicateContext classes. Asset constructors call queue_*_replication(cfg) to register on REPLICATION_QUEUE. Well-structured — separates queueing from execution, enables batched replication.

  3. MJWarp ls_parallel deprecated: Config field emits DeprecationWarning via __post_init__(), _build_solver() ignores it. Removed from all test configurations.

  4. Newton manager lifecycle fix: close() now calls super().close() before cls.clear(), ensuring sensor invalidation callbacks fire while Newton state is still alive, preventing stale registrations leaking.

  5. OvPhysX/OVRTX runtime guards: import_ovphysx() helper with actionable install messages, ovphysx==0.4.13 pin.

  6. PhysxManager rigid body view: Now detects name collisions between rigid and non-rigid prims (e.g., JointWrenchSensor frame prim vs. rigid link with same name) and uses exact paths for ambiguous names.

  7. Version bumps and changelog consolidation: All changelog fragment files moved into proper versioned CHANGELOG.rst entries.
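The lazy-import pattern in item 1 can be sketched as follows. `load_isp_module` and `_raise_missing_optional_error` are hypothetical (only `_raise_missing_ppisp_error()` and `isp_cfg` appear in the review), and the install hint is a placeholder rather than the project's actual instructions.

```python
import importlib


def _raise_missing_optional_error(package):
    """Hypothetical helper: raise with an actionable install hint."""
    raise ModuleNotFoundError(
        f"'{package}' is needed because the config enables it. "
        f"Install it (for example: pip install {package}) or unset the option."
    )


def load_isp_module(isp_cfg, package="isaaclab_ppisp"):
    """Import the optional package only when the feature is configured."""
    if isp_cfg is None:
        return None  # feature off: the optional dependency is never imported
    try:
        return importlib.import_module(package)
    except ModuleNotFoundError:
        _raise_missing_optional_error(package)


print(load_isp_module(None))  # None
```

Users who never set the option never pay for, or fail on, the missing dependency; the error only surfaces when the feature is actually requested.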

Previous inline comments status:

  • P1 (Missing wp.synchronize() in indexed getter paths): Still present — no changes to fabric_frame_view.py in this batch. Remains non-blocking per previous assessment.
  • P2 (Redundant kernel launch): Fixed in earlier commits (confirmed).

No new issues found in the incremental changes. All additions are well-implemented and follow established patterns. LGTM.


Automated incremental review by isaaclab-review-bot • Reviewed at 7f1a012c


Update (7f1a012→999cf3c6): Major architectural refactor reviewed.

Changes in this batch

The lazy dirty-flag mechanism (_DirtyFlag enum, _dirty field, _sync_*_if_dirty helpers, interleaved-write warning) has been removed entirely and replaced with a context-managed writer scope API (FrameViewSpaceWriterBase). This is a substantial and well-executed improvement:

  1. New xform_space_writer() context manager: Opens a scoped write session ("world" or "local"). On __exit__, derives the opposite-space matrices once via a single Warp kernel and calls wp.synchronize() once. Clean, predictable, no more subtle interleaving bugs.

  2. Single-active-writer invariant: Per-view lock prevents concurrent scopes — RuntimeError on double-entry. Well-tested.

  3. Hierarchy listener pause/restore: _FabricWriterMixin._enter_impl() pauses track_local_xform_changes / track_world_xform_changes during writes, restoring prior state on exit. Correctly handles pre-paused listeners (no accidental re-enable).

  4. Deprecated shims preserved: set_world_poses() / set_local_poses() remain as thin wrappers that open one-shot writer scopes internally + emit DeprecationWarning. Clean migration path.

  5. set_world_scales / set_local_scales removed (breaking): Since these were introduced in this release cycle without stable downstream users, removal without deprecation is appropriate. Changelog documents it.

  6. Test coverage: Comprehensive new tests for the writer contract — single-active invariant, derive-on-exit counting, empty-scope no-op, getter-raises-inside-scope, hierarchy tracking restore.

Previous comments status

  • P2 (Redundant kernel launch): Fixed — the _sync_fabric_from_usd_initial() path now only composes localMatrix then calls _recompute_world_from_local_all(). The redundant world compose is gone.
  • P1 (Missing wp.synchronize() in indexed getter paths): Still present in the non-cached decompose paths. Remains non-blocking per previous assessment (callers on the same CUDA stream see correct ordering; only cross-stream or immediate .numpy() usage is affected).
  • Minor observations (dtype, docstring, sentinel shape): These were non-blocking nits and remain unchanged. The sentinel shape (0, 0) is now less relevant since it is only used internally by the writer compose kernel.

New observations

No new issues found. The architectural shift from lazy dirty-flag tracking to eager derive-on-scope-exit is a clear improvement in correctness and readability. The _FabricWriterMixin pattern cleanly separates Fabric lifecycle management from the write logic. LGTM.


Automated incremental review by isaaclab-review-bot • Reviewed at 999cf3c6

pv-nvidia


Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

pv-nvidia closed this May 22, 2026
pv-nvidia reopened this May 22, 2026
pv-nvidia force-pushed the pv/fabric-local-poses branch 12 times, most recently from 4685420 to 6b56971, May 25, 2026 17:24
pv-nvidia marked this pull request as ready for review May 25, 2026 18:05
pv-nvidia and others added 28 commits June 24, 2026 10:05
…rent

Implemented new tests to validate the behavior of world and local scale conversions in a hierarchy with a scaled parent. The tests ensure that setting local scales correctly computes world scales and vice versa, addressing USD-specific scale math not covered by existing tests.

Restores PR isaac-sim#5728's three-selection layout (`_trans_sel_ro`,
`_world_sel_rw`, `_local_sel_rw`) with asymmetric Fabric access flags
on `worldMatrix` and `localMatrix`.  Those flags are what protect the
user's write from being clobbered by Kit's per-tick
`IFabricHierarchy.update_world_xforms`:

* On `set_world_poses` (via `_world_sel_rw`, `localMatrix=RO`), Fabric
  does not recompute world from local -- the user's worldMatrix write
  survives until the renderer reads it.
* On `set_local_poses` (via `_local_sel_rw`, `worldMatrix=RO`), Fabric
  recomputes world from the new local on the next tick -- the renderer
  reads the correct world.

A single combined `worldMatrix=RW, localMatrix=RW` selection (the
recent design on this branch) removed that protection.  Fabric saw
both attributes as user-authored and fell back to the hierarchy's
canonical direction (local -> world), recomputing world from a stale
local and silently overwriting the user's world write.  That was the
failure mode behind the
`test_output_equal_to_usdcamera` regression and any other Camera + RTX
path that drives world poses through Fabric.

With the RO/RW protection back in place, the eager world<->local
flushes introduced by commits "fix: flush Fabric world matrices after
local writes" and the follow-up "set_world_poses eager local sync" are
no longer needed and are removed.  The `change_block` context manager
and its companion helpers existed only to batch those eager flushes;
with the flushes gone, the API has nothing to defer and is removed
from both `BaseFrameView` and `FabricFrameView`.

Class docstring now spells out the load-bearing role of the RO/RW
layout so a future refactor doesn't reintroduce the single-selection
shape.

Tests:

* Removed `test_set_local_*_updates_renderer_facing_fabric_world_matrix`
  (asserted an eager-update contract that the lazy design deliberately
  does not hold; correctness across the next render tick is provided
  by the RO/RW protection, not by an extra Warp kernel).
* Removed the four `test_change_block_*` tests; the API is gone.
* Inverted `test_interleaved_set_emits_no_warning` back to
  `test_interleaved_set_emits_warning`, restored `_dirty == LOCAL`
  assertions in `test_world_scales_roundtrip` and the symmetric
  `WORLD` assertion in `test_local_scales_roundtrip`, and updated
  `test_multi_view_per_view_dirty_isolation` to expect the lazy
  cross-view behavior.
* Adapted `test_prepare_for_reuse_detects_topology_change` and
  `test_fabric_rebuild_after_topology_change` to poll/rebuild all
  three selections.

Verified:

* `pytest test_ray_caster_camera.py::test_output_equal_to_usdcamera` passes.
* `pytest test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set` 4/4 pass.
* `pytest test_views_xform_prim_fabric.py` 71 passed, 3 skipped (cuda:1).
* `./isaaclab.sh -f` clean.

Net diff -198 lines.

Replaces the four FrameView pose/scale setters with a single
context-managed writer scope on every backend.  The writer batches
multiple writes (poses + scales) inside one scope, derives the
opposite-space matrix once on ``__exit__``, and synchronizes once.
Only one writer scope may be active per view at a time; view-level
getters raise ``RuntimeError`` while a writer scope is active.

API summary:

  with view.xform_space_writer("world") as w:
      w.set_poses(positions=p, orientations=o)
      w.set_scales(scales=s)
  # Derived (local) matrices are recomputed, the scope releases.

Public classes (in ``isaaclab.sim.views``):

  * ``FrameViewSpaceWriterBase``  -- abstract base
  * ``FrameViewWorldSpaceWriter`` -- world-space tag class
  * ``FrameViewLocalSpaceWriter`` -- local-space tag class

Behind the scenes:

  * ``BaseFrameView`` gains ``xform_space_writer()``,
    ``_active_writer``, and the abstract factory hooks
    ``_make_world_space_writer`` / ``_make_local_space_writer``.
    Public getters become guarded wrappers around new
    ``_get_*_impl`` backend hooks; they raise ``RuntimeError`` when a
    writer scope is active.
  * ``FabricFrameView`` ships ``_FabricWorldSpaceWriter`` and
    ``_FabricLocalSpaceWriter`` that pause
    ``IFabricHierarchy.track_local_xform_changes`` /
    ``track_world_xform_changes`` for the scope's lifetime (saving
    and restoring the prior state so we never re-enable a listener
    the caller had paused).  Eager dual-write inside the scope means
    Kit's per-tick ``updateWorldXforms`` does not redundantly
    recompute matrices we just wrote.  The renderer's independent
    ``omni:fabric:worldMatrix`` listener still observes the writes.
    The lazy-dirty mechanism is gone: the ``_DirtyFlag`` enum,
    ``_dirty`` field, ``_warned_interleaved_set`` field, the
    ``_sync_*_if_dirty`` helpers, and the one-time
    "interleaved set_world_poses / set_local_poses" warning are all
    deleted.  The three-selection RO/RW layout is kept as a
    defensive layer and for clarity of authoring intent.
  * ``UsdFrameView`` / ``NewtonSiteFrameView`` /
    ``OvPhysxFrameView`` ship pass-through writers (their
    ``set_poses`` / ``set_scales`` immediately delegate to the
    backend's ``_apply_*_write`` helpers; ``__exit__`` is a no-op
    beyond releasing the single-writer lock).

Setter deprecation / removal:

  * ``set_world_poses`` and ``set_local_poses`` are kept as
    one-time-warning shims on ``BaseFrameView`` that route through
    the writer internally.  Use ``view.xform_space_writer("world" |
    "local")`` and ``w.set_poses(...)``.
  * ``set_world_scales`` and ``set_local_scales`` were introduced in
    this release cycle without external users and are removed
    outright (no deprecation).  Use ``w.set_scales(...)`` inside a
    writer scope.
  * The existing ``set_scales`` deprecation shim is kept and now
    opens the backend-appropriate writer scope internally
    (Fabric: world, USD/OvPhysx: local).

Migration:

  All 81 call sites across ``source/``, ``scripts/``, and the test
  suites are migrated to the new API in this commit so the repo's
  own code base raises no deprecation warnings.  External callers
  on ``set_world_poses`` / ``set_local_poses`` / ``set_scales``
  keep working (one warning per class on first call).

Verification:

  * ``pytest source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py``
    -- 81 passed, 3 skipped (cuda:1 multi-GPU).
  * ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usdcamera``
    -- the original Camera + RTX regression that motivated this
    arc, passes.
  * ``pytest source/isaaclab/test/sensors/test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set``
    -- 4 parametrizations pass.
  * ``./isaaclab.sh -f`` -- clean.

New test coverage for the writer contract (in
``test_views_xform_prim_fabric.py``):

  * world / local writer derives opposite space on exit
  * exactly one derive-kernel launch per scope (monkeypatched
    counter) regardless of how many set_poses / set_scales calls
    are made inside the scope
  * single-active-writer invariant raises ``RuntimeError``
  * exit restores prior ``track_local_xform_changes`` /
    ``track_world_xform_changes`` state (does not re-enable
    listeners the caller had paused)
  * invalid space string raises ``ValueError``
  * empty scope does not launch the derive kernel
  * view-level getters raise inside an active scope

…er() / xform_local_space_writer()

Replace the single dispatch method that took a 'world'|'local' string with
two explicit methods. More discoverable, better typed (each returns its
concrete writer class), and removes the invalid-space ValueError path.

Update all callsites (camera, USD/Fabric/Newton/OVPhysX backends, mimic),
benchmarks, tests, and changelog fragments. Drop the now-meaningless
test_writer_invalid_space_raises.

Previously FabricFrameView held three persistent PrimSelection handles
with asymmetric RO/RW flags (trans_sel_ro = both-RO for reads, world_sel_rw
= worldMatrix-RW, local_sel_rw = localMatrix-RW).  That layout was the
load-bearing protection against Kit's per-tick updateWorldXforms() before
the writer scope started pausing IFabricHierarchy tracking.

With tracking-pause in place, the asymmetric flags are no longer
load-bearing.  Reduce to two persistent selections:

  _sel_ro :  worldMatrix=RO, localMatrix=RO   steady state
  _sel_rw :  worldMatrix=RW, localMatrix=RW   inside writer scope

Each selection owns its own bundle of indexed-fabric arrays.  The writer
scope flips a single _is_rw flag on enter/exit; both bundles stay alive
for the view's lifetime, so no selection is rebuilt on the flip.
PrepareForReuse() topology polls run independently per bundle.

Net: -10 lines of code, -1 selection, -1 set of fabric_indices.  The
two whitebox tests that reached into the old triple are updated to
poll the new pair.  All 79 fabric-suite tests pass and the
test_output_equal_to_usdcamera camera+RTX regression that motivated
this PR still passes.

When a FabricFrameView writer scope unwinds via an exception (a Python
error in user code, a notebook cell interrupt, etc.), the opposite-space
derive + wp.synchronize() now runs anyway so worldMatrix and localMatrix
remain mutually consistent prim-by-prim on whatever partial-write state
Fabric currently holds.  The partial write itself is not rolled back --
callers needing transactional semantics should snapshot the matrices
themselves before entering the scope.

The recovery launch is itself wrapped in try/except: if it fails (e.g.
the original exception came from a poisoned CUDA stream), the recovery
error is logged and the original exception propagates -- masking it
would hide the actual root cause.  Hierarchy-tracking restore and the
_is_rw flip happen in finally as before.

Add a regression test that raises inside a writer scope and verifies:
  - tracking state restored
  - _is_rw back to False
  - world matrices reflect the partial write
  - local matrices derived from that partial-write world
  - the view is still usable for further writer scopes

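The exit path this commit describes (derive even on exception, log a failing recovery, let the original exception propagate, and always restore tracking and the `_is_rw` flip in `finally`) can be sketched as below. `RecoveringWriterSketch` and its callbacks are hypothetical stand-ins for the Warp derive plus `wp.synchronize()` and for the hierarchy-tracking restore.

```python
import logging

logger = logging.getLogger("writer_sketch")


class RecoveringWriterSketch:
    """Hypothetical writer scope that keeps matrices consistent on unwind."""

    def __init__(self, derive, restore_tracking):
        self._derive = derive                      # stands in for derive kernel + sync
        self._restore_tracking = restore_tracking  # stands in for listener restore
        self.is_rw = False

    def __enter__(self):
        self.is_rw = True
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                self._derive()  # normal exit: a failure here propagates
            else:
                try:
                    self._derive()  # make world/local agree on the partial write
                except Exception:
                    # Log the recovery failure; the original exception still propagates.
                    logger.exception("recovery derive failed while unwinding")
        finally:
            self._restore_tracking()
            self.is_rw = False
        return False  # never swallow the caller's exception


calls = []
w = RecoveringWriterSketch(lambda: calls.append("derive"), lambda: calls.append("restore"))
try:
    with w:
        raise ValueError("bug in user code")
except ValueError:
    pass
print(calls, w.is_rw)  # ['derive', 'restore'] False
```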
The previous wording suggested IFabricHierarchy.update_world_xforms()
could fire interleaved with our writes (a "between our write and the
renderer's read" race).  That cannot happen: the scope is synchronous
Python code and no simulation step or render tick runs while it is
open.  The risk the tracking pause and the RO steady-state selection
actually defend against is the next render/sim tick after the scope
exits, not the scope itself.

Update FabricFrameView's class docstring to reflect this, and add an
explicit contract to FrameViewSpaceWriterBase: callers must not advance
the simulation or render from inside a scope, because the matrices may
be mid-write until exit and rendering would read torn data.

Previous wording suggested the pause was about keeping the renderer
from seeing half-written data.  That is wrong: the scope is synchronous
Python code, so the renderer cannot run mid-scope and cannot see torn
state.  The "torn data" concern only motivates the separate rule that
callers must not advance the simulation from inside the scope.

What the pause actually prevents is Fabric updating the dependent xforms
itself, in two ways the scope wants to keep exclusive:

  1. Per-write, inside the scope -- the tracker would fire synchronously
     on each set_* and propagate to the opposite-space matrix (and to
     descendants, where applicable).
  2. On the next render tick -- update_world_xforms() would replay queued
     tracker work and potentially derive the wrong direction (e.g.
     recompute world from local even though we just wrote world).

We do the dependent-xform update ourselves in one batched kernel at
scope exit, so the pause stops Fabric from doing the same work
redundantly during the scope and incorrectly after it.
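The batched scope-exit derive is, per prim, a single matrix product with the inverse of the parent's world matrix. The NumPy sketch below stands in for the Warp kernel. The function names are hypothetical, and the row-vector convention `world = local @ parent_world` is assumed here on the basis of USD's matrix convention.

```python
import numpy as np


def derive_local_from_world(world: np.ndarray, parent_world: np.ndarray) -> np.ndarray:
    """Batched local = world @ inv(parent_world) for (N, 4, 4) matrices."""
    # np.linalg.inv and @ both broadcast over the leading batch axis,
    # so all N prims are handled in one vectorized call.
    return world @ np.linalg.inv(parent_world)


def derive_world_from_local(local: np.ndarray, parent_world: np.ndarray) -> np.ndarray:
    """Batched world = local @ parent_world (the opposite direction)."""
    return local @ parent_world
```

Only one direction runs per scope: the space that was written is treated as canonical, and the other space is derived from it.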
Fabric change tracking is pull-based: a per-attribute listener records
writes into a private changelog, and Kit drains and processes that
changelog on the next call to IFabricHierarchy::update_world_xforms()
(typically from the render path).  "Tracking off" just stops the
listener from recording new entries -- writes still land in Fabric
storage; they are simply invisible to the next update_world_xforms()
call.

Previous docstring claimed the pause prevented "per-write synchronous
propagation through the tracker".  There is no such synchronous path:
nothing fires until the next update_world_xforms() tick.  Rewrite the
bullet to describe the actual mechanism, and explain why an empty
changelog at scope exit is the property we need (the next tick has
no entries to process for our prims and therefore can't pick a
canonical direction and derive the other from it).
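The following toy Python model illustrates the pull-based mechanism. `ToyHierarchy`, `write` and `tracking` are illustrative names only and are not the IFabricHierarchy API; `update_world_xforms` merely borrows the name of the tick. The point it shows is that a write made while tracking is paused still lands in storage but never enters the changelog that the next tick drains.

```python
class ToyHierarchy:
    """Toy model of a pull-based change tracker (names are hypothetical)."""

    def __init__(self) -> None:
        self.storage: dict[str, float] = {}
        self.changelog: list[str] = []
        self.tracking = True

    def write(self, prim: str, value: float) -> None:
        # The write always lands in storage ...
        self.storage[prim] = value
        # ... but is recorded only while the listener is active.
        if self.tracking:
            self.changelog.append(prim)

    def update_world_xforms(self) -> list[str]:
        # Nothing runs at write time; queued work is pulled here, on the
        # next tick, and the changelog is emptied.
        drained, self.changelog = self.changelog, []
        return drained
```

A scope that pauses tracking, writes, and then resumes leaves the changelog empty for its prims, so the next `update_world_xforms()` call has nothing from which to pick a derive direction.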

Source: kit/runtime/source/plugins/usdrt.hierarchy/FabricHierarchy.cpp
  - trackLocalXformChanges_abi -> pauseChangeTracking/resumeChangeTracking
  - updateWorldXforms_abi reads getChanges()/popChanges() on each listener

Fabric is just the flat attribute data store.  The plugin that owns the
changelog listeners, the track_*_xform_changes toggles, and the
update_world_xforms() step is Fabric Hierarchy
(usdrt::hierarchy::IFabricHierarchy), not Fabric itself.  The renderer reads Fabric
attributes through the Fabric Scene Delegate (FSD), not via some
generic "renderer worldMatrix listener".

Rework the writer-scope's listener bullet to use these names precisely:

- the tracking pause and the changelog belong to Fabric Hierarchy
- update_world_xforms() is a method on Fabric Hierarchy, not on "Kit"
- FSD reads omni:fabric:worldMatrix from Fabric storage on the render path

The IsaacLab project-level shorthand "Fabric backend" / FabricFrameView /
_use_fabric is unchanged -- those are API labels, not technical claims.

Changelog fragments are read against develop, not against an in-branch
interim state.  Remove every entry that describes a feature added then
removed within this PR -- net effect vs develop is zero, and mentioning
the round-trip only confuses reviewers.

Specifically:
- isaaclab/xform-space-writer: drop the "Removed: set_world_scales /
  set_local_scales" section.  Those methods were added in
  fabric-local-poses and removed here; they never existed on develop.
- isaaclab/fabric-local-poses: drop set_world_scales / set_local_scales
  from the Added list for the same reason; keep only the surviving
  getters and route scale writes via the writer scope.
- isaaclab_physx/fabric-local-poses: drop the "lazy dirty tracking" and
  "interleave detection" entries; both mechanisms were added and then
  removed by the writer-scope migration.
- isaaclab_physx/xform-space-writer: drop the paragraph that lists the
  removed _DirtyFlag enum / _sync_*_if_dirty helpers.  Also switch the
  remaining prose to the correct Omniverse terminology -- Fabric
  Hierarchy owns update_world_xforms() and the change tracking; FSD
  feeds the renderer.
- isaaclab_newton + isaaclab_ovphysx/fabric-local-poses: drop
  set_world_scales / set_local_scales from the Added/Deprecated lists.

Also include the prior wording fix "Will be used by" -> "Used by" in
the utils docs (the kernels are now in use).

While I'm in this file, drop the deprecation-replacement clauses that
told users to use set_world_scales / set_local_scales -- those methods
don't exist; the writer scope is the right pointer.

Plus: add a comment in scripts/benchmarks/benchmark_xform_prim_view.py
explaining why each wp.clone() now goes through .warp (ProxyArray
introduced by PR isaac-sim#5304 changed the FrameView getter return type).

The benchmark mixes a one-time view-construction cost (`init`) with
per-iteration steady-state ops in the same dict.  The Total row and
Overall speedup were both summing every value in that dict, so for a
backend whose first call materializes the view (NewtonSiteFrameView
took ~1 s of stage population in the reported run), that one-time cost
dominated the per-iter ops in the totals and produced a misleading
0.13x overall.

Separate the two: keep Initialization in the per-op table for
visibility, but compute Total and Overall over the per-iter operations
only.  The new label makes the scope explicit -- "Per-iter total" and
"SPEEDUP vs USD (per-iter ops; one-time init excluded)".
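A minimal sketch of the corrected aggregation. Only the `init` key comes from the commit message. The other op names and every number in the usage below are made up for illustration and are not benchmark results.

```python
ONE_TIME_OPS = frozenset({"init"})


def per_iter_total(timings: dict[str, float]) -> float:
    """Sum only per-iteration ops; one-time costs such as `init` are excluded."""
    return sum(t for op, t in timings.items() if op not in ONE_TIME_OPS)


def overall_speedup(baseline: dict[str, float], candidate: dict[str, float]) -> float:
    """Per-iter total of the baseline (e.g. USD) divided by the candidate's."""
    return per_iter_total(baseline) / per_iter_total(candidate)


# Illustrative timings in ms, not measurements.
usd = {"init": 0.01, "get_world_poses": 2.0, "set_world_poses": 4.0}
fast = {"init": 1.0, "get_world_poses": 0.01, "set_world_poses": 0.02}
print(round(overall_speedup(usd, fast)))  # 200; summing `init` too would give ~6
```

`init` still appears as a row in the per-op table. It is only excluded from the aggregate rows.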

With the reported Blackwell RTX PRO 5000 numbers, this turns the
misleading 0.13x Newton overall into 207x and the dampened 25x Fabric
overall into 164x.

Tests should describe and exercise the CURRENT API, not history.

Removed
  - test_set_world_scales_method_no_longer_exists (whitebox assertion
    that two never-shipped attributes are absent from BaseFrameView).

Renamed + docstrings rewritten (names and docstrings referenced
``set_world_scales`` / ``set_local_scales`` which do not exist; the test
bodies already use the writer scope):
  - frame_view_contract_utils.py:
      test_set_local_scales_roundtrip -> test_local_scales_roundtrip
      test_set_world_scales_roundtrip -> test_world_scales_roundtrip
  - test_views_xform_prim.py:
      test_set_local_scales_then_get_world_scales
        -> test_world_scale_composes_with_parent_scale
      test_set_world_scales_then_get_local_scales
        -> test_local_scale_inverts_parent_when_writing_world_scale

Stale docstring references to the removed dirty-tracking and the
deprecated set_*_scales API were also corrected in four fabric tests
(test_local_scales_roundtrip, test_world_scales_roundtrip,
test_fabric_cuda1_scales_roundtrip, test_initial_seed_with_scaled_parent).

FabricFrameView's getters (_get_world_poses_impl, _get_local_poses_impl,
and the shared _decompose_scales for both scale getters) only called
wp.synchronize() on the cached path (indices is None / slice(None)).
The per-indices path returned a fresh ProxyArray without syncing, so a
caller that did .numpy() or wp.to_torch().cpu() on it could observe
zeros from the wp.zeros initializer (the kernel hadn't completed).

Move the sync below the kernel launch and call it unconditionally.
This:
  - removes the asymmetry between the cached and the indexed paths,
  - matches what the class docstring claims about getters being
    immediately readable,
  - is essentially free for the cached path (was already syncing) and
    cheap for the indexed path (the kernel itself is the dominant cost).

Update the class docstring to describe the new behaviour: every getter
launches its decompose kernel and synchronizes before returning, so the
returned ProxyArray is immediately readable from GPU or host without
any caller-side sync.
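The bug and the fix can be modeled without a GPU. This is a pure-Python sketch: `FakeStream` is a hypothetical stand-in for a CUDA stream whose launches are queued until `synchronize()` is called, which takes the place of `wp.launch` and `wp.synchronize()`. The zero-filled output mimics the `wp.zeros` initializer.

```python
class FakeStream:
    """Hypothetical stand-in for a CUDA stream: launches are queued, not run."""

    def __init__(self) -> None:
        self._pending = []

    def launch(self, fn) -> None:
        self._pending.append(fn)

    def synchronize(self) -> None:
        while self._pending:
            self._pending.pop(0)()


def _launch_decompose(stream, source, indices):
    idx = list(range(len(source))) if indices is None else list(indices)
    out = [0.0] * len(idx)  # like the wp.zeros initializer

    def kernel() -> None:
        for k, i in enumerate(idx):
            out[k] = source[i]

    stream.launch(kernel)
    return out


def get_positions_before(stream, source, indices=None):
    out = _launch_decompose(stream, source, indices)
    if indices is None:  # old behaviour: only the cached path synced
        stream.synchronize()
    return out


def get_positions_after(stream, source, indices=None):
    out = _launch_decompose(stream, source, indices)
    stream.synchronize()  # new behaviour: every path syncs before returning
    return out
```

With the old getter, an indexed call returns the still-zero buffer. With the new getter, both paths return filled data.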

Reported by Greptile on PR isaac-sim#5677.

_compute_fabric_indices and _compute_parent_fabric_indices both walked
selection.GetPaths(), built the same path->index dict, then iterated
self.prim_paths to look up either the prim itself or its parent.  The
selection-walk + dict + lookup loop is exactly what
_compute_fabric_indices_for(selection, paths) already does for one-off
index arrays.

Have the two specialised helpers delegate to it.  The parent variant
keeps its stage-root precondition inline (where it reads naturally,
next to the rsplit) and builds the parent path list before calling the
shared primitive.  The child variant is a one-line passthrough.

No behavioural change: same selection walks, same indices, same
ordering.  Saves ~18 lines and removes a duplicated dict construction.
The shared primitive's docstring is updated to reflect its new role as
the canonical lookup, no longer just a one-shot.

Two helpers were building lists with .append() inside an imperative
for-loop:

  - _compute_fabric_indices_for: walked paths -> looked up indices,
    raised on miss, appended to a list.
  - _compute_parent_fabric_indices: walked self.prim_paths -> derived
    parent path, validated stage-root, appended to a list.

Factor the per-element step into a named local function whose body
documents the intent (lookup + validation, parent derivation +
validation), then drive it from a list comprehension.  Same control
flow, denser at the call site, the loop body's purpose is now a
one-token name.
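A simplified sketch of the resulting shape, covering both the delegation and the named-local-function style. The names `fabric_indices_for` and `parent_fabric_indices` are hypothetical and are only modeled on `_compute_fabric_indices_for` and `_compute_parent_fabric_indices`. A plain list of paths stands in for walking `selection.GetPaths()`.

```python
def fabric_indices_for(selection_paths, paths):
    """Map each path to its position in the selection; raise on a miss."""
    index_of = {p: i for i, p in enumerate(selection_paths)}

    def lookup(path: str) -> int:
        # Lookup + validation: every requested prim must be in the selection.
        try:
            return index_of[path]
        except KeyError:
            raise KeyError(f"{path!r} is not in the Fabric selection") from None

    return [lookup(p) for p in paths]


def parent_fabric_indices(selection_paths, prim_paths):
    """Derive each prim's parent path, then delegate to the shared lookup."""

    def parent_of(path: str) -> str:
        # Parent derivation + validation: a direct child of the stage root
        # ("/World") has no parent prim to look up.
        parent = path.rsplit("/", 1)[0]
        if not parent:
            raise ValueError(f"{path!r} is a direct child of the stage root")
        return parent

    return fabric_indices_for(selection_paths, [parent_of(p) for p in prim_paths])
```

The path-to-index dict is now built in one place, and each helper's per-element logic has a name.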
@pv-nvidia force-pushed the pv/fabric-local-poses branch from b5bcd15 to 70f1400 on June 24, 2026 10:07

The world-body fallback in NewtonSiteFrameView._resolve_source_prim resolved
the reference prim directly from source_root, instead of stripping the
clone-template suffix as the prior code did. For heterogeneous (multi-asset
spawner) scenes this resolved the wrong reference frame, offsetting site world
poses and breaking rendering-correctness (dexsuite kuka hetero kitless).

Restore the split_clone_template/get_suffix-based ref_path computation (and the
dropped imports). Homogeneous scenes were unaffected; hetero rgb/depth now
match golden images again.

Labels: documentation, isaac-lab, isaac-mimic
