fix(resample): NaN handling in adaptive sampling + L1 validity on same-size replacement by jeandet · Pull Request #35 · SciQLop/NeoQCP

jeandet · 2026-06-12T12:22:32Z

Stacked on #34 (← #33 ← #32) — merge those first; this PR's diff then collapses to its own commit (575f666).

optimizedLineData derived an interval's closing point as intervalFirst + intervalCount - 1, but NaN samples are skipped WITHOUT counting: with interior NaNs that index lands on a NaN, injecting a spurious (NaN, NaN) line break — phantom gaps in dense gappy data (the norm in space physics). The interval now tracks the index of the last accepted sample explicitly.
The multi-column variant had two related NaN holes: an interval reset landing on a NaN sample seeded minVal/maxVal with NaN, poisoning std::min/std::max for the rest of the interval (envelope silently dropped); and a single-sample NaN interval appended an explicit break point where the adaptive contract (matching the single-column variant) is to skip NaN values — key gaps are detected separately.

Reproducer: adaptiveSamplingInteriorNaNNoSpuriousBreaks (gap-free keys, every other value NaN → output contained NaN break points before).
buildL1Cache reused an existing L1 when only sourceSize matched; the previously dead cachedKeyRange member now participates in the validity check (keyRange is O(1) on sorted sources), so replacing the data with an equally-sized batch can never reuse a stale binning.

Part of the 2026-06-09 audit (H4/M7). Last of the stack: #32 → #33 → #34 → this.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR tightens correctness and safety around resampling/pipeline caching, with a focus on NaN-handling in adaptive sampling and preventing stale cache reuse when data is replaced (even at the same size). It also adds/updates regression tests that exercise the previously failing edge cases (NaN holes, async teardown, scheduler shutdown, histogram key restrictions).

Changes:

Fix adaptive line sampling to skip NaNs without introducing phantom line breaks; harden multi-column min/max tracking against NaN-seeded intervals.
Strengthen L1 cache validity checks (key-range participates) and improve sorted-key range performance (keyRangeSorted).
Improve async pipeline / scheduler teardown safety and add regression tests for deletion-with-jobs-queued and destructor queue dropping.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/auto/test-waterfall/test-waterfall.h	Adds new waterfall regression test declarations for snapshot semantics and concurrency safety.
tests/auto/test-waterfall/test-waterfall.cpp	Adds tests for parameter changes applying immediately and for concurrent parameter/data changes during resampling.
tests/auto/test-scatter-rhi/test-scatter-rhi.cpp	Fixes a dangling-span test setup by moving owned vectors into setData.
tests/auto/test-pipeline/test-pipeline.h	Adds new pipeline regression test declarations (scheduler dtor, colormap2 teardown, histogram2d cache invalidation).
tests/auto/test-pipeline/test-pipeline.cpp	Adds regression tests for scheduler teardown behavior, colormap2 queued jobs after delete, and histogram2d restricted-range behavior + cache invalidation.
tests/auto/test-multi-datasource/test-multi-datasource.h	Adds adaptive-sampling NaN regression test declaration.
tests/auto/test-multi-datasource/test-multi-datasource.cpp	Adds regression test ensuring no spurious NaN breakpoints are emitted for interior NaNs.
tests/auto/test-datasource2d/test-datasource2d.cpp	Drains the async pipeline to avoid stack-use-after-return in a test that uses non-owning views.
tests/auto/test-datasource/test-datasource.h	Adds windowed valueRange equivalence test declaration.
tests/auto/test-datasource/test-datasource.cpp	Adds brute-force equivalence test for key-restricted valueRange vs binary-search windowed implementation.
src/plottables/plottable-waterfall.h	Refactors waterfall adapter into an immutable snapshot; routes parameter/data changes through adapter rebuild.
src/plottables/plottable-waterfall.cpp	Implements immutable adapter + rebuild flow and removes draw-time mutation.
src/plottables/plottable-histogram2d.h	Adds memoization for finite bounds (selectTest/keyRange) and cache invalidation hooks.
src/plottables/plottable-histogram2d.cpp	Implements bounds caching and adds linear restricted-range scan for unsorted histogram keys.
src/plottables/plottable-colormap2.h	Changes gap threshold API to rebake the transform and avoid capturing object memory in background jobs.
src/plottables/plottable-colormap2.cpp	Moves transform installation into a helper and rebakes it on threshold changes; ensures re-resample when needed.
src/datasource/soa-multi-datasource.h	Uses keyRangeSorted (sorted-key contract) for O(1)/O(log n) key range.
src/datasource/row-major-multi-datasource.h	Uses keyRangeSorted (sorted-key contract) for O(1)/O(log n) key range.
src/datasource/pipeline-scheduler.h	Adds shutdown flag to prevent queued work from running during destruction.
src/datasource/pipeline-scheduler.cpp	Drops queued work on destruction and rejects submissions during shutdown before joining the pool.
src/datasource/graph-resampler.h	Strengthens L1 cache reuse checks by incorporating cached key range (not just size).
src/datasource/async-pipeline.h	Replaces atomic destroyed-flag with a mutex-guarded destroy guard shared with jobs to close TOCTOU.
src/datasource/async-pipeline.cpp	Implements guarded destroyed-check + removes redundant destroyed-check in deliverResult.
src/datasource/algorithms.h	Adds binary-search windowing to valueRange with key restriction; fixes adaptive sampling closing-point selection with NaN skipping; hardens multi-column NaN min/max handling.
src/datasource/algorithms-2d.h	Uses keyRangeSorted for 2D xRange to avoid O(N) scans under the sorted contract.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jeandet · 2026-06-12T19:59:37Z

 {
    mOriginalSource = std::move(source);
-    mAdapter->setSource(mOriginalSource);
    mNormDirty = true;
-    QCPMultiGraph::setDataSource(mAdapter);
+    rebuildAdapter();
 }


Already addressed — same finding as on #33, fixed by cfa000b ("clearing the source uninstalls the adapter too", reproducer clearingSourceEmptiesTheGraph), now merged on main. This PR is rebased on top, so the flagged code is no longer part of its diff.

Three hot paths scanned entire datasets despite the sorted-keys contract (docs/backlog-2026-06-10.md M1/M2/M4 in SciQLopPlots): - keyRange: the multi-column and 2D sources used the linear-scan qcp::algo::keyRange while keyRangeSorted (O(1)/O(log n)) existed and was already used by the single-column source. QCPColorMap2's pipeline calls xRange on every viewport change (each pan step) and selectTest per hit-test. Switched QCPSoAMultiDataSource, QCPRowMajorMultiDataSource and qcp::algo2d::xRange to keyRangeSorted (2D yRange stays linear: variable-Y layouts are not globally sorted). - valueRange with a key restriction iterated all N points testing each key against the window: a zoomed-in autoscale on a 10M x 4 multigraph read 40M elements on the GUI thread per data_changed. The scan is now bounded by findBegin/findEnd. Histogram keys are scattered (sorted contract does not hold), so QCPHistogram2D::getValueRange now does its own linear scan for restricted ranges instead of inheriting the windowed source path. - QCPHistogram2D::selectTest ran finiteKeyValueBounds — a full O(N) scan through virtual accessors — on every mouse press. Bounds are now memoized per (keyPositiveOnly, valuePositiveOnly), invalidated by setDataSource/dataChanged; getKeyRange shares the cache. Measured on the SciQLopPlots side (10M x 3 multigraph, 0.1% visible, together with the companion windowed-percentile change there): min/max rescale ~17 ms -> ~0.04 ms, percentile rescale ~7 ms -> ~0.5 ms per call. Tests: algoValueRangeRestrictionMatchesBruteForce (windowed == brute force across boundary-exact/outside/degenerate windows and sign domains), histogram2dValueRangeRestrictedUnsortedKeys (guards the scattered-keys path against the sorted-window optimization), histogram2dSelectTestReflectsNewData (cache invalidation). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…e-size replacement - optimizedLineData derived an interval's closing point as intervalFirst + intervalCount - 1, but NaN samples are skipped WITHOUT counting: with interior NaNs that index lands on a NaN, injecting a spurious (NaN, NaN) line break — phantom gaps in dense gappy data (the norm in space physics). The interval now tracks the index of the last accepted sample explicitly. - The multi-column variant had two related NaN holes: an interval reset landing on a NaN sample seeded minVal/maxVal with NaN, poisoning std::min/max for the rest of the interval (envelope silently dropped); and a single-sample NaN interval appended an explicit break point where the adaptive contract (matching the single-column variant) is to skip NaN values — key gaps are detected separately. Reproducer: adaptiveSamplingInteriorNaNNoSpuriousBreaks (gap-free keys, every other value NaN -> output contained NaN break points before). - buildL1Cache reused an existing L1 when only sourceSize matched; the previously dead cachedKeyRange member now participates in the validity check (keyRange is O(1) on sorted sources), so replacing the data with an equally-sized batch can never reuse a stale binning. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jeandet force-pushed the fix/small-sweep branch from 575f666 to 094bcdd Compare June 12, 2026 16:51

jeandet requested a review from Copilot June 12, 2026 18:13

Copilot started reviewing on behalf of jeandet June 12, 2026 18:13 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

jeandet force-pushed the fix/small-sweep branch 2 times, most recently from 9c9cc72 to 60e7a1b Compare June 12, 2026 19:55

jeandet and others added 2 commits June 12, 2026 21:58

jeandet force-pushed the fix/small-sweep branch from 60e7a1b to efd5d9a Compare June 12, 2026 19:59

jeandet merged commit b308450 into SciQLop:main Jun 12, 2026
5 checks passed

jeandet deleted the fix/small-sweep branch June 12, 2026 20:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(resample): NaN handling in adaptive sampling + L1 validity on same-size replacement#35

fix(resample): NaN handling in adaptive sampling + L1 validity on same-size replacement#35
jeandet merged 2 commits into
SciQLop:mainfrom
jeandet:fix/small-sweep

jeandet commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

jeandet Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jeandet commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

jeandet Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants