Skip to content

fix(pipeline): make async jobs safe against plottable destruction#32

Merged
jeandet merged 2 commits into
SciQLop:mainfrom
jeandet:fix/async-pipeline-job-safety
Jun 12, 2026
Merged

fix(pipeline): make async jobs safe against plottable destruction#32
jeandet merged 2 commits into
SciQLop:mainfrom
jeandet:fix/async-pipeline-job-safety

Conversation

@jeandet

@jeandet jeandet commented Jun 12, 2026

Copy link
Copy Markdown
Member

Three holes let resample jobs touch freed memory when a plottable (or the whole QCustomPlot) died with work in flight — the typical "random crash when closing a plot during a spectrogram resample":

  • QCPColorMap2's transform captured the member mGapThreshold by reference; a job dequeued after the plottable was deleted read freed memory. The transform is now installed by installResampleTransform() with the threshold captured by value and re-baked by setGapThreshold() (mirroring QCPHistogram2D::installTransform). Bonus: a threshold change now triggers a re-resample instead of silently waiting for the next pan.

  • TOCTOU in the job epilogue: if (destroyed->load()) return; invokeMethod(self, ...) — the pipeline could be deleted between the load and the invoke. The shared atomic is replaced by a mutex-guarded flag held across the check-and-invoke; ~QCPAsyncPipelineBase flips it under the same mutex. Once posted, Qt purges pending metacalls of a deleted receiver, so the queued delivery is safe.

  • ~QCPPipelineScheduler let waitForDone() start still-queued jobs (a finishing task's epilogue drains the queue) — jobs belonging to plottables already destroyed by ~QCustomPlot. The destructor now drops queued work and refuses new submissions before joining.

Reproducers (fail before the fix):

  • TestPipeline::colormap2QueuedJobAfterDeleteDoesNotTouchFreedMemory — ASan heap-use-after-free in the gapThreshold load; the pre-existing TestDataSource2D::colormap2AxisRescale also tripped it via plain ~QCustomPlot teardown under ASan.
  • TestPipeline::schedulerDtorDropsQueuedJobs — queued job ran during destruction (deterministic, no sanitizer needed).

Also fixes two test-side dangling-view bugs found by the ASan run (colormap2ViewData returning while the job reads stack spans; test-scatter-rhi spans over helper-local vectors). Full suite is ASan-clean after this commit.

Part of the 2026-06-09 audit (C2/C3 in SciQLopPlots docs/backlog-2026-06-10.md). First of a 4-PR stack: this → waterfall snapshot → range scans → small sweep.

🤖 Generated with Claude Code

Three holes let resample jobs touch freed memory when a plottable (or the
whole QCustomPlot) died with work in flight — the typical "random crash
when closing a plot during a spectrogram resample":

- QCPColorMap2's transform captured the member mGapThreshold by
  reference; a job dequeued after the plottable was deleted read freed
  memory. The transform is now installed by installResampleTransform()
  with the threshold captured BY VALUE and re-baked by setGapThreshold()
  (mirroring QCPHistogram2D::installTransform). Bonus: a threshold change
  now triggers a re-resample instead of silently waiting for the next pan.

- TOCTOU in the job epilogue: `if (destroyed->load()) return;
  invokeMethod(self, ...)` — the pipeline could be deleted between the
  load and the invoke. The shared atomic is replaced by a mutex-guarded
  flag held across the check-and-invoke; ~QCPAsyncPipelineBase flips it
  under the same mutex. Once posted, Qt purges pending metacalls of a
  deleted receiver, so the queued delivery is safe.

- ~QCPPipelineScheduler let waitForDone() start still-queued jobs (a
  finishing task's epilogue drains the queue) — jobs belonging to
  plottables already destroyed by ~QCustomPlot. The destructor now drops
  queued work and refuses new submissions before joining.

Reproducers (fail before the fix):
- TestPipeline::colormap2QueuedJobAfterDeleteDoesNotTouchFreedMemory —
  ASan heap-use-after-free in the gapThreshold load; the pre-existing
  TestDataSource2D::colormap2AxisRescale also tripped it via plain
  ~QCustomPlot teardown under ASan.
- TestPipeline::schedulerDtorDropsQueuedJobs — queued job ran during
  destruction (deterministic, no sanitizer needed).

Also fixes two test-side dangling-view bugs found by the ASan run
(colormap2ViewData returning while the job reads stack spans;
test-scatter-rhi spans over helper-local vectors): both now respect the
non-owning-view contract. Full suite is ASan-clean after this commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the async resampling pipeline against use-after-free when plottables (or the owning QCustomPlot) are destroyed while background work is still queued/in-flight, and adds/adjusts tests to reproduce and prevent regressions.

Changes:

  • Make QCPColorMap2’s resample transform self-contained by capturing gapThreshold by value and rebaking the transform on threshold changes.
  • Prevent QCPAsyncPipelineBase jobs from racing with pipeline destruction by replacing the TOCTOU atomic<bool> check with a mutex-guarded shared “destroyed” flag held across check-and-invokeMethod.
  • Update QCPPipelineScheduler teardown to drop queued work and refuse submissions during shutdown; add ASan-driven regression tests and fix test-side dangling span/view lifetimes.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/auto/test-scatter-rhi/test-scatter-rhi.cpp Avoid dangling span views in test helper by using owning setData with moved vectors.
tests/auto/test-pipeline/test-pipeline.h Add new pipeline race/UAF reproducer test declarations.
tests/auto/test-pipeline/test-pipeline.cpp Add deterministic reproducers for scheduler shutdown behavior and colormap queued-job-after-delete.
tests/auto/test-datasource2d/test-datasource2d.cpp Drain pipeline to avoid stack-use-after-return when using non-owning viewData in tests.
src/plottables/plottable-colormap2.h Make gapThreshold a plain value and introduce installResampleTransform() API.
src/plottables/plottable-colormap2.cpp Rework transform installation/capture semantics; rebake on threshold change; add destructor.
src/datasource/pipeline-scheduler.h Track shutdown state to refuse submissions and stop scheduling during destruction.
src/datasource/pipeline-scheduler.cpp Clear queued work and gate scheduling/submission during teardown before joining the pool.
src/datasource/async-pipeline.h Replace atomic<bool> destroyed flag with shared mutex+flag guard; hold across check-and-invoke.
src/datasource/async-pipeline.cpp Flip destroyed flag under guard mutex in destructor; remove redundant destroyed-check in deliverResult.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +112 to 119
void QCPColorMap2::setGapThreshold(double threshold)
{
mRenderer.releaseRhiLayer();
if (mGapThreshold == threshold)
return;
mGapThreshold = threshold;
installResampleTransform();
mPipeline.onDataChanged(); // re-resample so the new threshold shows now, not on the next pan
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified and fixed in c40e396onDataChanged() bumps mGeneration before it knows whether makeJob can produce a job, so with no source isBusy() stayed true until some future delivery. setGapThreshold now only triggers the re-resample when mDataSource is set. Reproducer: colormap2GapThresholdBeforeDataDoesNotStickBusy (failed before the guard).

Comment on lines 42 to 47
void QCPPipelineScheduler::scheduleNext()
{
QMutexLocker lock(&mMutex);
if (mShuttingDown)
return;
while (mRunning < mPool.maxThreadCount())

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not taking this one — the atomic-before-mutex variant narrows the window but can't close it: a worker can always pop and start a job a moment before the destructor's store, so "no queued job ever starts during shutdown" is unachievable by any flag protocol. More importantly, the safety invariant doesn't rest on the destructor drop at all. Plottables die before the scheduler does (~QCustomPlot order), so between plottable destruction and scheduler-dtor entry, epilogues can already legally start queued jobs of dead plottables — which is exactly why this PR makes jobs self-contained (source held by shared_ptr, transform captured by value, delivery gated by the DestroyGuard). A job that slips in concurrently with dtor entry is indistinguishable from one that started a microsecond earlier, and both are safe. The dtor drop is cleanup that avoids pointless work and guarantees nothing new starts after its critical section — which the current mutex-only version already provides.

…e without a source

setGapThreshold() triggered a re-resample unconditionally; before any data
source is set, onDataChanged() increments the pipeline generation but cannot
schedule a job, so isBusy() reported true until some future delivery.
Setting the threshold during plot configuration (before data arrives) is a
normal flow downstream.

Reproducer: colormap2GapThresholdBeforeDataDoesNotStickBusy (review
follow-up on PR SciQLop#32).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jeandet jeandet merged commit 1d294b9 into SciQLop:main Jun 12, 2026
5 checks passed
@jeandet jeandet deleted the fix/async-pipeline-job-safety branch June 12, 2026 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants