Skip to content

fix(resample): NaN handling in adaptive sampling + L1 validity on same-size replacement#35

Merged
jeandet merged 2 commits into
SciQLop:mainfrom
jeandet:fix/small-sweep
Jun 12, 2026
Merged

fix(resample): NaN handling in adaptive sampling + L1 validity on same-size replacement#35
jeandet merged 2 commits into
SciQLop:mainfrom
jeandet:fix/small-sweep

Conversation

@jeandet

@jeandet jeandet commented Jun 12, 2026

Copy link
Copy Markdown
Member

Stacked on #34 (← #33#32) — merge those first; this PR's diff then collapses to its own commit (575f666).

  • optimizedLineData derived an interval's closing point as intervalFirst + intervalCount - 1, but NaN samples are skipped WITHOUT counting: with interior NaNs that index lands on a NaN, injecting a spurious (NaN, NaN) line break — phantom gaps in dense gappy data (the norm in space physics). The interval now tracks the index of the last accepted sample explicitly.

  • The multi-column variant had two related NaN holes: an interval reset landing on a NaN sample seeded minVal/maxVal with NaN, poisoning std::min/std::max for the rest of the interval (envelope silently dropped); and a single-sample NaN interval appended an explicit break point where the adaptive contract (matching the single-column variant) is to skip NaN values — key gaps are detected separately.

    Reproducer: adaptiveSamplingInteriorNaNNoSpuriousBreaks (gap-free keys, every other value NaN → output contained NaN break points before).

  • buildL1Cache reused an existing L1 when only sourceSize matched; the previously dead cachedKeyRange member now participates in the validity check (keyRange is O(1) on sorted sources), so replacing the data with an equally-sized batch can never reuse a stale binning.

Part of the 2026-06-09 audit (H4/M7). Last of the stack: #32#33#34this.

🤖 Generated with Claude Code

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens correctness and safety around resampling/pipeline caching, with a focus on NaN-handling in adaptive sampling and preventing stale cache reuse when data is replaced (even at the same size). It also adds/updates regression tests that exercise the previously failing edge cases (NaN holes, async teardown, scheduler shutdown, histogram key restrictions).

Changes:

  • Fix adaptive line sampling to skip NaNs without introducing phantom line breaks; harden multi-column min/max tracking against NaN-seeded intervals.
  • Strengthen L1 cache validity checks (key-range participates) and improve sorted-key range performance (keyRangeSorted).
  • Improve async pipeline / scheduler teardown safety and add regression tests for deletion-with-jobs-queued and destructor queue dropping.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/auto/test-waterfall/test-waterfall.h Adds new waterfall regression test declarations for snapshot semantics and concurrency safety.
tests/auto/test-waterfall/test-waterfall.cpp Adds tests for parameter changes applying immediately and for concurrent parameter/data changes during resampling.
tests/auto/test-scatter-rhi/test-scatter-rhi.cpp Fixes a dangling-span test setup by moving owned vectors into setData.
tests/auto/test-pipeline/test-pipeline.h Adds new pipeline regression test declarations (scheduler dtor, colormap2 teardown, histogram2d cache invalidation).
tests/auto/test-pipeline/test-pipeline.cpp Adds regression tests for scheduler teardown behavior, colormap2 queued jobs after delete, and histogram2d restricted-range behavior + cache invalidation.
tests/auto/test-multi-datasource/test-multi-datasource.h Adds adaptive-sampling NaN regression test declaration.
tests/auto/test-multi-datasource/test-multi-datasource.cpp Adds regression test ensuring no spurious NaN breakpoints are emitted for interior NaNs.
tests/auto/test-datasource2d/test-datasource2d.cpp Drains the async pipeline to avoid stack-use-after-return in a test that uses non-owning views.
tests/auto/test-datasource/test-datasource.h Adds windowed valueRange equivalence test declaration.
tests/auto/test-datasource/test-datasource.cpp Adds brute-force equivalence test for key-restricted valueRange vs binary-search windowed implementation.
src/plottables/plottable-waterfall.h Refactors waterfall adapter into an immutable snapshot; routes parameter/data changes through adapter rebuild.
src/plottables/plottable-waterfall.cpp Implements immutable adapter + rebuild flow and removes draw-time mutation.
src/plottables/plottable-histogram2d.h Adds memoization for finite bounds (selectTest/keyRange) and cache invalidation hooks.
src/plottables/plottable-histogram2d.cpp Implements bounds caching and adds linear restricted-range scan for unsorted histogram keys.
src/plottables/plottable-colormap2.h Changes gap threshold API to rebake the transform and avoid capturing object memory in background jobs.
src/plottables/plottable-colormap2.cpp Moves transform installation into a helper and rebakes it on threshold changes; ensures re-resample when needed.
src/datasource/soa-multi-datasource.h Uses keyRangeSorted (sorted-key contract) for O(1)/O(log n) key range.
src/datasource/row-major-multi-datasource.h Uses keyRangeSorted (sorted-key contract) for O(1)/O(log n) key range.
src/datasource/pipeline-scheduler.h Adds shutdown flag to prevent queued work from running during destruction.
src/datasource/pipeline-scheduler.cpp Drops queued work on destruction and rejects submissions during shutdown before joining the pool.
src/datasource/graph-resampler.h Strengthens L1 cache reuse checks by incorporating cached key range (not just size).
src/datasource/async-pipeline.h Replaces atomic destroyed-flag with a mutex-guarded destroy guard shared with jobs to close TOCTOU.
src/datasource/async-pipeline.cpp Implements guarded destroyed-check + removes redundant destroyed-check in deliverResult.
src/datasource/algorithms.h Adds binary-search windowing to valueRange with key restriction; fixes adaptive sampling closing-point selection with NaN skipping; hardens multi-column NaN min/max handling.
src/datasource/algorithms-2d.h Uses keyRangeSorted for 2D xRange to avoid O(N) scans under the sorted contract.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 92 to 96
{
mOriginalSource = std::move(source);
mAdapter->setSource(mOriginalSource);
mNormDirty = true;
QCPMultiGraph::setDataSource(mAdapter);
rebuildAdapter();
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — same finding as on #33, fixed by cfa000b ("clearing the source uninstalls the adapter too", reproducer clearingSourceEmptiesTheGraph), now merged on main. This PR is rebased on top, so the flagged code is no longer part of its diff.

@jeandet jeandet force-pushed the fix/small-sweep branch 2 times, most recently from 9c9cc72 to 60e7a1b Compare June 12, 2026 19:55
jeandet and others added 2 commits June 12, 2026 21:58
Three hot paths scanned entire datasets despite the sorted-keys contract
(docs/backlog-2026-06-10.md M1/M2/M4 in SciQLopPlots):

- keyRange: the multi-column and 2D sources used the linear-scan
  qcp::algo::keyRange while keyRangeSorted (O(1)/O(log n)) existed and was
  already used by the single-column source. QCPColorMap2's pipeline calls
  xRange on every viewport change (each pan step) and selectTest per
  hit-test. Switched QCPSoAMultiDataSource, QCPRowMajorMultiDataSource and
  qcp::algo2d::xRange to keyRangeSorted (2D yRange stays linear:
  variable-Y layouts are not globally sorted).

- valueRange with a key restriction iterated all N points testing each key
  against the window: a zoomed-in autoscale on a 10M x 4 multigraph read
  40M elements on the GUI thread per data_changed. The scan is now bounded
  by findBegin/findEnd. Histogram keys are scattered (sorted contract does
  not hold), so QCPHistogram2D::getValueRange now does its own linear scan
  for restricted ranges instead of inheriting the windowed source path.

- QCPHistogram2D::selectTest ran finiteKeyValueBounds — a full O(N) scan
  through virtual accessors — on every mouse press. Bounds are now
  memoized per (keyPositiveOnly, valuePositiveOnly), invalidated by
  setDataSource/dataChanged; getKeyRange shares the cache.

Measured on the SciQLopPlots side (10M x 3 multigraph, 0.1% visible,
together with the companion windowed-percentile change there): min/max
rescale ~17 ms -> ~0.04 ms, percentile rescale ~7 ms -> ~0.5 ms per call.

Tests: algoValueRangeRestrictionMatchesBruteForce (windowed == brute force
across boundary-exact/outside/degenerate windows and sign domains),
histogram2dValueRangeRestrictedUnsortedKeys (guards the scattered-keys
path against the sorted-window optimization),
histogram2dSelectTestReflectsNewData (cache invalidation).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e-size replacement

- optimizedLineData derived an interval's closing point as
  intervalFirst + intervalCount - 1, but NaN samples are skipped WITHOUT
  counting: with interior NaNs that index lands on a NaN, injecting a
  spurious (NaN, NaN) line break — phantom gaps in dense gappy data (the
  norm in space physics). The interval now tracks the index of the last
  accepted sample explicitly.

- The multi-column variant had two related NaN holes: an interval reset
  landing on a NaN sample seeded minVal/maxVal with NaN, poisoning
  std::min/max for the rest of the interval (envelope silently dropped);
  and a single-sample NaN interval appended an explicit break point where
  the adaptive contract (matching the single-column variant) is to skip
  NaN values — key gaps are detected separately.

  Reproducer: adaptiveSamplingInteriorNaNNoSpuriousBreaks (gap-free keys,
  every other value NaN -> output contained NaN break points before).

- buildL1Cache reused an existing L1 when only sourceSize matched; the
  previously dead cachedKeyRange member now participates in the validity
  check (keyRange is O(1) on sorted sources), so replacing the data with
  an equally-sized batch can never reuse a stale binning.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jeandet jeandet merged commit b308450 into SciQLop:main Jun 12, 2026
5 checks passed
@jeandet jeandet deleted the fix/small-sweep branch June 12, 2026 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants