BVHAccelerator.compute_multipath broadcasts thread-0 result across all sats when n_sat > 1 #49

Description

@rsasaki0109

Summary

BVHAccelerator.compute_multipath(rx_ecef, sat_ecef) (CUDA kernel multipath_bvh_kernel in src/raytrace/bvh.cu) returns thread-0's result for every satellite when n_sat > 1. Calling it once per satellite (n_sat=1) gives the correct, distinct per-satellite values. The new compute_multipath_batch added in #48 is not affected.

Reproduction

import numpy as np
from gnss_gpu.raytrace import BuildingModel
from gnss_gpu.bvh import BVHAccelerator

building = BuildingModel.create_box(center=[100.0, 0.0, 25.0],
                                     width=20.0, depth=20.0, height=50.0)
bvh = BVHAccelerator.from_building_model(building)

rx = np.array([22.49339292, -7.97963954, 4.7373692])

# Each sat called ALONE (correct):
for s in [[200, -10, 25], [210, -10, 26], [220, -10, 27], [230, -10, 28]]:
    d, _ = bvh.compute_multipath(rx, np.array([s]))
    print(f"alone {s}: delay = {d[0]:.4f}")
# alone [200, -10, 25]: delay = 3.9809
# alone [210, -10, 26]: delay = 0.0000
# alone [220, -10, 27]: delay = 0.0000
# alone [230, -10, 28]: delay = 0.0000

# All four sats together (BUG):
sats = np.array([[200,-10,25],[210,-10,26],[220,-10,27],[230,-10,28]], dtype=float)
d, _ = bvh.compute_multipath(rx, sats)
print(f"together: delays = {d}")
# together: delays = [3.9809 3.9809 3.9809 3.9809]   ← all thread-0's answer

Reversing the satellite order gives the opposite broadcast:

sats_rev = sats[::-1].copy()
d, _ = bvh.compute_multipath(rx, sats_rev)
# delays = [0. 0. 0. 0.]  (because sat 0 in this order has no reflection)

So thread 0's best_delay becomes the value seen by every other thread in the launch.
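The comparison above can be turned into a small regression check. This is a minimal sketch, not code from the repository: `per_satellite_delays` and `assert_batch_matches_single` are hypothetical helper names, and `compute` stands for any callable with the same `(rx, sats) -> (delays, _)` shape that `bvh.compute_multipath` has in the reproduction.

```python
import numpy as np


def per_satellite_delays(compute, rx, sats):
    """Reference path: call `compute` once per satellite (n_sat = 1)."""
    sats = np.asarray(sats, dtype=float)
    # sat[None, :] keeps the (1, 3) shape used in the "alone" calls above.
    return np.array([compute(rx, sat[None, :])[0][0] for sat in sats])


def assert_batch_matches_single(compute, rx, sats, atol=1e-6):
    """Raise AssertionError if one multi-satellite call disagrees
    with the per-satellite reference calls."""
    sats = np.asarray(sats, dtype=float)
    together = np.asarray(compute(rx, sats)[0], dtype=float)
    alone = per_satellite_delays(compute, rx, sats)
    np.testing.assert_allclose(together, alone, atol=atol)
```

With a GPU build, `assert_batch_matches_single(bvh.compute_multipath, rx, sats)` would be expected to raise on the affected version. Until a fix lands, `per_satellite_delays` can also serve as a workaround, at the cost of one call per satellite.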

Geometry sanity check (why sat 0 is the only one with a reflection)

Box: x∈[90,110], y∈[-10,10], z∈[0,50]. Sats are at x=200..230, y=-10, near elevation 0. Mirroring across the y=+10 wall puts the reflection point for sat 0 at (≈106.5, 10, ≈14.3), inside the wall extent (x<110), with excess delay ≈3.98 m. For sat 1 the reflection point lands at x≈111.3, outside the wall, so there is no valid reflection. The same logic excludes sats 2 and 3, whose reflection points lie even farther out in x. Calling compute_multipath per satellite confirms this.
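The geometry can be checked independently of the GPU code with a few lines of NumPy. This is a sketch under stated assumptions: a single specular bounce off the y=+10 wall only, the standard image (mirror) method, and no occlusion test on either leg of the path. The kernel's actual validity test is not shown in this issue, so the variable names and the bounds check here are illustrative.

```python
import numpy as np

rx = np.array([22.49339292, -7.97963954, 4.7373692])
sats = np.array([[200, -10, 25], [210, -10, 26],
                 [220, -10, 27], [230, -10, 28]], dtype=float)

WALL_Y = 10.0                                   # plane of the y=+10 wall
X_RANGE, Z_RANGE = (90.0, 110.0), (0.0, 50.0)   # extent of that wall

# Image method: mirror each satellite across the wall plane.
mirrored = sats.copy()
mirrored[:, 1] = 2.0 * WALL_Y - sats[:, 1]

# Reflection point: where the segment rx -> mirrored sat crosses the plane.
t = (WALL_Y - rx[1]) / (mirrored[:, 1] - rx[1])
hit = rx + t[:, None] * (mirrored - rx)

# The reflection only counts if the point lies within the wall rectangle.
on_wall = ((hit[:, 0] >= X_RANGE[0]) & (hit[:, 0] <= X_RANGE[1]) &
           (hit[:, 2] >= Z_RANGE[0]) & (hit[:, 2] <= Z_RANGE[1]))

# Reflected path length equals the distance from rx to the mirrored satellite.
excess = (np.linalg.norm(mirrored - rx, axis=1)
          - np.linalg.norm(sats - rx, axis=1))
excess_if_valid = np.where(on_wall, excess, 0.0)

for s, h, ok, e in zip(sats, hit, on_wall, excess_if_valid):
    print(f"sat {s}: hit x={h[0]:.1f} z={h[2]:.1f} on_wall={ok} excess={e:.4f}")
```

Only the first satellite passes the bounds check, which is consistent with the per-satellite results in the reproduction.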

Diagnosis pointer

BVHAccelerator.check_los (LOS kernel) does not show this behaviour, so the issue seems specific to multipath_bvh_kernel. Likely candidates worth checking:

  • Register pressure with int stack[64] + the inner mirror/intersect math forcing local-memory spills that alias across threads
  • Some compiler optimization that hoists the read of sat_ecef[sid * 3 + …] out of per-thread context

Suggested fix

Replace the single-rx compute_multipath with a thin wrapper around the batched kernel from #48 (raytrace_multipath_bvh_batch with n_epoch=1). The batched kernel uses one thread per (epoch, sat) and was verified bug-free against per-satellite reference calls in tests added in #48.
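On the Python side, such a wrapper might look roughly like the sketch below. The signature of the batched entry point is an assumption, since it is not shown in this issue: `batch_fn` is taken to accept `rx` of shape `(n_epoch, 3)` and `sats` of shape `(n_epoch, n_sat, 3)` and to return two arrays of shape `(n_epoch, n_sat)`. `compute_multipath_via_batch` is a hypothetical name.

```python
import numpy as np


def compute_multipath_via_batch(batch_fn, rx_ecef, sat_ecef):
    """Single-receiver call expressed as a batch with n_epoch = 1.

    `batch_fn` is ASSUMED to map (rx[n_epoch, 3], sats[n_epoch, n_sat, 3])
    to a pair of arrays shaped (n_epoch, n_sat); adjust to the real API.
    """
    rx = np.asarray(rx_ecef, dtype=np.float64).reshape(1, 3)
    sats = np.asarray(sat_ecef, dtype=np.float64).reshape(1, -1, 3)
    delays, second = batch_fn(rx, sats)
    # Drop the epoch axis so callers see the same shapes as before.
    return np.asarray(delays)[0], np.asarray(second)[0]
```

Keeping the public single-receiver signature unchanged means existing callers would not need to change.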

Impact

Anyone calling BVHAccelerator.compute_multipath with n_sat > 1 is silently getting wrong per-satellite delays. Users of the linear-scan BuildingModel.compute_multipath are unaffected (different kernel in src/raytrace/raytrace.cu that uses one thread per (sat, triangle) pair).
