fix(pipeline): make async jobs safe against plottable destruction#32
Conversation
Three holes let resample jobs touch freed memory when a plottable (or the whole QCustomPlot) died with work in flight — the typical "random crash when closing a plot during a spectrogram resample": - QCPColorMap2's transform captured the member mGapThreshold by reference; a job dequeued after the plottable was deleted read freed memory. The transform is now installed by installResampleTransform() with the threshold captured BY VALUE and re-baked by setGapThreshold() (mirroring QCPHistogram2D::installTransform). Bonus: a threshold change now triggers a re-resample instead of silently waiting for the next pan. - TOCTOU in the job epilogue: `if (destroyed->load()) return; invokeMethod(self, ...)` — the pipeline could be deleted between the load and the invoke. The shared atomic is replaced by a mutex-guarded flag held across the check-and-invoke; ~QCPAsyncPipelineBase flips it under the same mutex. Once posted, Qt purges pending metacalls of a deleted receiver, so the queued delivery is safe. - ~QCPPipelineScheduler let waitForDone() start still-queued jobs (a finishing task's epilogue drains the queue) — jobs belonging to plottables already destroyed by ~QCustomPlot. The destructor now drops queued work and refuses new submissions before joining. Reproducers (fail before the fix): - TestPipeline::colormap2QueuedJobAfterDeleteDoesNotTouchFreedMemory — ASan heap-use-after-free in the gapThreshold load; the pre-existing TestDataSource2D::colormap2AxisRescale also tripped it via plain ~QCustomPlot teardown under ASan. - TestPipeline::schedulerDtorDropsQueuedJobs — queued job ran during destruction (deterministic, no sanitizer needed). Also fixes two test-side dangling-view bugs found by the ASan run (colormap2ViewData returning while the job reads stack spans; test-scatter-rhi spans over helper-local vectors): both now respect the non-owning-view contract. Full suite is ASan-clean after this commit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the async resampling pipeline against use-after-free when plottables (or the owning QCustomPlot) are destroyed while background work is still queued/in-flight, and adds/adjusts tests to reproduce and prevent regressions.
Changes:
- Make
QCPColorMap2’s resample transform self-contained by capturinggapThresholdby value and rebaking the transform on threshold changes. - Prevent
QCPAsyncPipelineBasejobs from racing with pipeline destruction by replacing the TOCTOUatomic<bool>check with a mutex-guarded shared “destroyed” flag held across check-and-invokeMethod. - Update
QCPPipelineSchedulerteardown to drop queued work and refuse submissions during shutdown; add ASan-driven regression tests and fix test-side dangling span/view lifetimes.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/auto/test-scatter-rhi/test-scatter-rhi.cpp | Avoid dangling span views in test helper by using owning setData with moved vectors. |
| tests/auto/test-pipeline/test-pipeline.h | Add new pipeline race/UAF reproducer test declarations. |
| tests/auto/test-pipeline/test-pipeline.cpp | Add deterministic reproducers for scheduler shutdown behavior and colormap queued-job-after-delete. |
| tests/auto/test-datasource2d/test-datasource2d.cpp | Drain pipeline to avoid stack-use-after-return when using non-owning viewData in tests. |
| src/plottables/plottable-colormap2.h | Make gapThreshold a plain value and introduce installResampleTransform() API. |
| src/plottables/plottable-colormap2.cpp | Rework transform installation/capture semantics; rebake on threshold change; add destructor. |
| src/datasource/pipeline-scheduler.h | Track shutdown state to refuse submissions and stop scheduling during destruction. |
| src/datasource/pipeline-scheduler.cpp | Clear queued work and gate scheduling/submission during teardown before joining the pool. |
| src/datasource/async-pipeline.h | Replace atomic<bool> destroyed flag with shared mutex+flag guard; hold across check-and-invoke. |
| src/datasource/async-pipeline.cpp | Flip destroyed flag under guard mutex in destructor; remove redundant destroyed-check in deliverResult. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| void QCPColorMap2::setGapThreshold(double threshold) | ||
| { | ||
| mRenderer.releaseRhiLayer(); | ||
| if (mGapThreshold == threshold) | ||
| return; | ||
| mGapThreshold = threshold; | ||
| installResampleTransform(); | ||
| mPipeline.onDataChanged(); // re-resample so the new threshold shows now, not on the next pan | ||
| } |
There was a problem hiding this comment.
Verified and fixed in c40e396 — onDataChanged() bumps mGeneration before it knows whether makeJob can produce a job, so with no source isBusy() stayed true until some future delivery. setGapThreshold now only triggers the re-resample when mDataSource is set. Reproducer: colormap2GapThresholdBeforeDataDoesNotStickBusy (failed before the guard).
| void QCPPipelineScheduler::scheduleNext() | ||
| { | ||
| QMutexLocker lock(&mMutex); | ||
| if (mShuttingDown) | ||
| return; | ||
| while (mRunning < mPool.maxThreadCount()) |
There was a problem hiding this comment.
Not taking this one — the atomic-before-mutex variant narrows the window but can't close it: a worker can always pop and start a job a moment before the destructor's store, so "no queued job ever starts during shutdown" is unachievable by any flag protocol. More importantly, the safety invariant doesn't rest on the destructor drop at all. Plottables die before the scheduler does (~QCustomPlot order), so between plottable destruction and scheduler-dtor entry, epilogues can already legally start queued jobs of dead plottables — which is exactly why this PR makes jobs self-contained (source held by shared_ptr, transform captured by value, delivery gated by the DestroyGuard). A job that slips in concurrently with dtor entry is indistinguishable from one that started a microsecond earlier, and both are safe. The dtor drop is cleanup that avoids pointless work and guarantees nothing new starts after its critical section — which the current mutex-only version already provides.
…e without a source setGapThreshold() triggered a re-resample unconditionally; before any data source is set, onDataChanged() increments the pipeline generation but cannot schedule a job, so isBusy() reported true until some future delivery. Setting the threshold during plot configuration (before data arrives) is a normal flow downstream. Reproducer: colormap2GapThresholdBeforeDataDoesNotStickBusy (review follow-up on PR SciQLop#32). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three holes let resample jobs touch freed memory when a plottable (or the whole QCustomPlot) died with work in flight — the typical "random crash when closing a plot during a spectrogram resample":
QCPColorMap2's transform captured the member
mGapThresholdby reference; a job dequeued after the plottable was deleted read freed memory. The transform is now installed byinstallResampleTransform()with the threshold captured by value and re-baked bysetGapThreshold()(mirroringQCPHistogram2D::installTransform). Bonus: a threshold change now triggers a re-resample instead of silently waiting for the next pan.TOCTOU in the job epilogue:
if (destroyed->load()) return; invokeMethod(self, ...)— the pipeline could be deleted between the load and the invoke. The shared atomic is replaced by a mutex-guarded flag held across the check-and-invoke;~QCPAsyncPipelineBaseflips it under the same mutex. Once posted, Qt purges pending metacalls of a deleted receiver, so the queued delivery is safe.~QCPPipelineSchedulerletwaitForDone()start still-queued jobs (a finishing task's epilogue drains the queue) — jobs belonging to plottables already destroyed by~QCustomPlot. The destructor now drops queued work and refuses new submissions before joining.Reproducers (fail before the fix):
TestPipeline::colormap2QueuedJobAfterDeleteDoesNotTouchFreedMemory— ASan heap-use-after-free in the gapThreshold load; the pre-existingTestDataSource2D::colormap2AxisRescalealso tripped it via plain~QCustomPlotteardown under ASan.TestPipeline::schedulerDtorDropsQueuedJobs— queued job ran during destruction (deterministic, no sanitizer needed).Also fixes two test-side dangling-view bugs found by the ASan run (
colormap2ViewDatareturning while the job reads stack spans; test-scatter-rhi spans over helper-local vectors). Full suite is ASan-clean after this commit.Part of the 2026-06-09 audit (C2/C3 in SciQLopPlots
docs/backlog-2026-06-10.md). First of a 4-PR stack: this → waterfall snapshot → range scans → small sweep.🤖 Generated with Claude Code