perf: optimize oriented BV traversal (precompute transform, early exits) #842

Closed
facontidavide wants to merge 1 commit into coal-library:devel from facontidavide:perf/oriented-bv-traversal

@facontidavide

Summary

Three thematically-related micro-optimisations in the BVH-traversal hot path for oriented bounding volumes (OBB, RSS, kIOS, OBBRSS):

  1. Precompute the inverse relative transform once at the top of MeshCollisionTraversalNode (RT._RTranspose / RT._InvT) and reuse it via new overlapPrecomputedRTranspose() overloads on each oriented BV. This replaces the R^T * (a - T) computation, previously repeated in every BV overlap test, with a direct evaluation from the cached values.
  2. Skip the redundant oriented-mesh seed pass that computed an initial triangle-pair distance bound before the recursion. The recursion's first leaf already establishes a usable bound, so the seed duplicated that work and added ~30 lines of code.
  3. Skip the leaf sqrt() and result update entirely when the squared distance already exceeds the current min. The hardened guard added in the SIMD-support commit handles the negative / Scalar::max edge cases.

Implementation notes

  • include/coal/BV/BV.h, OBB.h, OBBRSS.h, RSS.h, kIOS.h: each gains an overlapPrecomputedRTranspose(const Matrix3s& Rt, const Vec3s& InvT, const BV& other) const overload (free for OBBRSS — internal compose; one-line wrapper for the others).
  • src/BV/{OBB,RSS,kIOS}.cpp: implement the new overloads; reuse the same SAT bodies as the existing overlap() but read from the precomputed Rt/InvT arguments.
  • include/coal/internal/traversal.h gains a RelativeTransformation::precomputeInverse() helper.
  • include/coal/internal/traversal_node_bvhs.h: caches the inverse at node setup; the traversal's per-node-pair BVDisjoints now calls overlapPrecomputedRTranspose instead of re-computing from scratch. Drops the seed-pass code path. The leaf sqrt-skip uses the same guard logic that landed with the SIMD-support commit.
  • include/coal/internal/traversal_node_setup.h: shaves 4 lines that initialised the now-unused seed scratch state.
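
For readers unfamiliar with the leaf sqrt-skip, the idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the guard from the SIMD-support commit: `LeafBound` and `updateFromLeaf` are hypothetical names, clamping negative squared distances to zero is an assumed policy, and the squared bound is stored separately so that the `max()` sentinel is never squared.

```cpp
#include <cmath>
#include <limits>

// Hypothetical running bound for a distance query (not Coal's DistanceResult).
struct LeafBound {
  // Current best distance; starts at the sentinel, so no seed pass is needed.
  double min_distance = std::numeric_limits<double>::max();
  // Squared form of the bound, kept separately so the sentinel is never squared.
  double min_sq = std::numeric_limits<double>::max();
  // Counts how often sqrt was actually evaluated (for illustration only).
  int sqrt_calls = 0;
};

// Processes one leaf's squared triangle-pair distance.
// Returns true if the bound was tightened.
bool updateFromLeaf(LeafBound& bound, double sq_dist) {
  // Assumed guard: tiny negative values from rounding are treated as zero.
  if (sq_dist < 0.0) sq_dist = 0.0;

  // Early exit: the result would be discarded, so skip sqrt and the update.
  if (sq_dist >= bound.min_sq) return false;

  bound.min_sq = sq_dist;
  bound.min_distance = std::sqrt(sq_dist);
  ++bound.sqrt_calls;
  return true;
}
```

Because the bound starts at the sentinel, the first leaf visited always tightens it, which is the sense in which the recursion can replace a separate seed pass in this sketch.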

Performance impact

Methodology

  • Hardware: x86-64 desktop, P-core pinned (taskset -c 4), turbo locked.
  • Build: configured with -DCMAKE_BUILD_TYPE=Release and built with cmake --build build -j12 in isolated worktrees; separate libcoal.so per variant. Both base and variant compiled with stock upstream flags.
  • Runs: N = 15 interleaved (base, variant, base, variant, …); median reported.
  • Workload: coal-test-benchmark (test/benchmark.cpp) — exercises all 4 oriented BV types for both collision and distance queries.

| Workload | Before (µs, median) | After (µs, median) | Δ | N | stdev (µs) |
| --- | --- | --- | --- | --- | --- |
| coal-test-benchmark total | 62474.8 | 50350.1 | -19.41% | 15 | base 170.1 / variant 210.1 |

Correctness gate: full ctest suite passes on the variant build (excluding python/nanobind tests).

Tests

  • No new test added — relies on the existing coal-collision, coal-distance, and coal-distance_lower_bound suites which already exercise all 4 oriented BV types (OBB, RSS, kIOS, OBBRSS) with strict numerical tolerances. The "skip seed pass" change is implicitly covered by the distance-lower-bound assertions in those suites — if the seed were load-bearing, those bounds would shift detectably. The coal-distance test in particular runs end-to-end mesh-distance over env.obj vs rob.obj for each BV type and asserts results match within a tight epsilon.
  • An explicit golden-output regression test was considered but rejected: the existing tests already pin the numerics, and a duplicate test would need to hardcode magic numbers that bitrot the moment any inner-loop value shifts within tolerance.

Risk & regression analysis

  • Numerical equivalence: the three changes preserve the algorithm's mathematical contract. Precomputing R^T and the inverse translation rests on the identity R^T * (a - T) = R^T * a - R^T * T, so results can differ only by floating-point rounding; the seed pass was redundant (the recursion's first leaf produces an equally tight bound); the sqrt-skip only fires when the result would be discarded anyway.
  • Edge cases on sqrt skip: with the hardened squared-distance guard from earlier work in this branch, negative or Scalar::max sentinel values are caught upstream — the skip never sees them.
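  • overlapPrecomputedRTranspose API surface: new public method on each oriented BV, declared in the headers and implemented in src/BV/{OBB,RSS,kIOS}.cpp. The change is purely additive, so existing symbols are untouched and there is no ABI break for current callers.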
  • No public-API behaviour change: MeshCollisionTraversalNode<BV> exposes the same BVDisjoints and leafCollides semantics; only the implementation changes.

Three thematically-related micro-optimizations to the BVH traversal hot
path for oriented bounding volumes (OBB, RSS, kIOS, OBBRSS):

- Precompute inverse relative transform (RT._RTranspose / RT._InvT)
  once at the top of MeshCollisionTraversalNode and reuse it via the
  new overlapPrecomputedRTranspose() overloads. Replaces the per-leaf
  R^T * (a - T) computation with a direct R'_a + T'_a evaluation.

- Skip the redundant oriented-mesh "seed" pass that computed an initial
  triangle-pair distance bound before the recursion. The recursion's
  first leaf already establishes a usable bound at no extra cost; the
  seed was duplicating work and adding ~30 lines of code.

- Skip the leaf sqrt() and result update entirely when the squared
  distance already exceeds the current min. The hardened guard added
  in the previous SIMD commit handles the negative / Scalar::max edge
  cases.

These three changes together cut a measurable fraction of per-pair work
in the polso/gen4 mesh-vs-mesh distance benchmark.
@lmontaut

Contributor

See #858

@lmontaut lmontaut closed this May 13, 2026

Labels

pr status wip To not review in weekly meeting
