Summary
BVHAccelerator.compute_multipath(rx_ecef, sat_ecef) (CUDA kernel multipath_bvh_kernel in src/raytrace/bvh.cu) returns thread-0's result for every satellite when n_sat > 1. Calling it once per satellite (n_sat=1) gives the correct, distinct per-satellite values. The new compute_multipath_batch added in #48 is not affected.
Reproduction
import numpy as np
from gnss_gpu.raytrace import BuildingModel
from gnss_gpu.bvh import BVHAccelerator
building = BuildingModel.create_box(center=[100.0, 0.0, 25.0],
                                    width=20.0, depth=20.0, height=50.0)
bvh = BVHAccelerator.from_building_model(building)
rx = np.array([22.49339292, -7.97963954, 4.7373692])
# Each sat called ALONE (correct):
for s in [[200, -10, 25], [210, -10, 26], [220, -10, 27], [230, -10, 28]]:
    d, _ = bvh.compute_multipath(rx, np.array([s]))
    print(f"alone {s}: delay = {d[0]:.4f}")
# alone [200, -10, 25]: delay = 3.9809
# alone [210, -10, 26]: delay = 0.0000
# alone [220, -10, 27]: delay = 0.0000
# alone [230, -10, 28]: delay = 0.0000
# All four sats together (BUG):
sats = np.array([[200,-10,25],[210,-10,26],[220,-10,27],[230,-10,28]], dtype=float)
d, _ = bvh.compute_multipath(rx, sats)
print(f"together: delays = {d}")
# together: delays = [3.9809 3.9809 3.9809 3.9809] ← all thread-0's answer
Reversing the satellite order gives the opposite broadcast:
sats_rev = sats[::-1].copy()
d, _ = bvh.compute_multipath(rx, sats_rev)
# delays = [0. 0. 0. 0.] (because sat 0 in this order has no reflection)
So thread 0's best_delay becomes the value seen by every other thread in the launch.
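The comparison above can be automated so it runs against any build. The following is a minimal sketch, assuming only what the reproduction shows: the call takes (rx, sats) and returns a tuple whose first element holds one delay per satellite. The names find_broadcast_mismatches and compute are hypothetical, and compute is an adapter you supply around the real call.

```python
def find_broadcast_mismatches(compute, rx, sats, tol=1e-6):
    """Compare one multi-satellite call against one call per satellite.

    `compute(rx, sats)` is a caller-supplied adapter (hypothetical name) that
    returns a tuple whose first element holds one delay per satellite, as
    compute_multipath does in the reproduction.
    Returns a list of (index, together_delay, alone_delay) for each satellite
    where the two results disagree by more than `tol`.
    """
    together = compute(rx, sats)[0]
    mismatches = []
    for i, sat in enumerate(sats):
        # Single-satellite call: the path the issue reports as correct.
        alone = compute(rx, [sat])[0][0]
        if abs(together[i] - alone) > tol:
            mismatches.append((i, float(together[i]), float(alone)))
    return mismatches
```

With the real library the adapter would presumably be something like lambda rx, sats: bvh.compute_multipath(rx, np.asarray(sats, dtype=float)). Given the outputs listed in the reproduction, an affected build should report indices 1, 2 and 3 for the original satellite order, and an empty list once the kernel is fixed.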
Geometry sanity check (why sat 0 is the only one with a reflection)
Box: x∈[90,110], y∈[-10,10], z∈[0,50]. Sats are at x=200..230, y=-10, near elevation 0. Mirroring sat 0 across the y=+10 wall puts the reflection point at (≈106.5, 10, ≈14.3), inside the wall's extent (x<110), with excess delay ≈3.98 m. For sat 1 the reflection point lands at x≈111.3, beyond the wall's edge at x=110, so there is no valid reflection. The same logic excludes sats 2 and 3, whose reflection points lie even farther out in x. Calling compute_multipath per satellite confirms this.
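The hand calculation can be re-run with the image method in a few lines of standard-library Python. This is a sketch of the sanity check only: the function name reflect_off_y_wall is hypothetical, it tests the single y=+10 wall against the box extents given above, and it ignores occlusion, so it is not a description of what multipath_bvh_kernel does internally.

```python
import math

def reflect_off_y_wall(rx, sat, wall_y=10.0,
                       x_range=(90.0, 110.0), z_range=(0.0, 50.0)):
    """Image-method reflection off the plane y = wall_y.

    Returns (hit_point, excess_delay_m, inside_wall). Only this one wall is
    checked and occlusion is ignored (assumption of this sketch).
    """
    # Mirror the satellite across the wall plane.
    image = (sat[0], 2.0 * wall_y - sat[1], sat[2])
    # Fraction along rx -> image at which the segment crosses the plane.
    t = (wall_y - rx[1]) / (image[1] - rx[1])
    hit = tuple(r + t * (m - r) for r, m in zip(rx, image))
    # The reflected path length equals the straight distance to the image.
    excess = math.dist(rx, image) - math.dist(rx, sat)
    inside = (x_range[0] <= hit[0] <= x_range[1]
              and z_range[0] <= hit[2] <= z_range[1])
    return hit, excess, inside

rx = (22.49339292, -7.97963954, 4.7373692)
for sat in [(200, -10, 25), (210, -10, 26), (220, -10, 27), (230, -10, 28)]:
    hit, excess, inside = reflect_off_y_wall(rx, sat)
    print(f"sat {sat}: hit x = {hit[0]:.1f}, z = {hit[2]:.1f}, "
          f"excess = {excess:.4f} m, inside wall = {inside}")
```

Only sat 0 yields a reflection point inside the wall, and its excess delay agrees with the ≈3.98 m returned by the single-satellite call.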
Diagnosis pointer
BVHAccelerator.check_los (LOS kernel) does not show this behaviour, so the issue seems specific to multipath_bvh_kernel. Likely candidates worth checking:
- Register pressure with int stack[64] + the inner mirror/intersect math forcing local-memory spills that alias across threads
- Some compiler optimization that hoists the read of sat_ecef[sid * 3 + …] out of per-thread context
Suggested fix
Replace the single-rx compute_multipath with a thin wrapper around the batched kernel from #48 (raytrace_multipath_bvh_batch with n_epoch=1). The batched kernel uses one thread per (epoch, sat) and was verified bug-free against per-satellite reference calls in tests added in #48.
Impact
Anyone calling BVHAccelerator.compute_multipath with n_sat > 1 is silently getting wrong per-satellite delays. Users of the linear-scan BuildingModel.compute_multipath are unaffected (different kernel in src/raytrace/raytrace.cu that uses one thread per (sat, triangle) pair).
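Until a fix lands, affected callers can fall back on the single-satellite path that the reproduction shows to be correct. A minimal sketch of that workaround follows; compute_multipath_per_sat and compute are hypothetical names, and compute is a caller-supplied adapter around bvh.compute_multipath. Only the delays are returned, since this issue does not describe the second return value.

```python
def compute_multipath_per_sat(compute, rx, sats):
    """Workaround sketch: one single-satellite call per satellite.

    `compute(rx, sats)` is a caller-supplied adapter (hypothetical name)
    returning a tuple whose first element holds the per-satellite delays.
    Each call passes exactly one satellite, so the n_sat > 1 path that
    shows the bug is never taken.
    """
    return [float(compute(rx, [sat])[0][0]) for sat in sats]
```

This trades one kernel launch for n_sat launches, so it is a stopgap for correctness rather than a replacement for the batched kernel.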