feat(capture): Add --cleanup-after-upload flag for automatic resource cleanup #2367

Merged: mereta merged 4 commits into main from carlota-capture-cleanup-remote-storage on May 21, 2026

@carlotaarvela carlotaarvela commented May 20, 2026

Contributor

Description

Adds a --cleanup-after-upload flag to kubectl retina capture create that enables automatic cleanup of capture jobs, secrets, and host-path files after successful upload to remote storage (blob, S3, or PVC).

Key behaviors:

  • TTL on jobs (no-wait mode): When --cleanup-after-upload is combined with --no-wait=true and a remote destination, jobs get a TTLSecondsAfterFinished (5 min) so Kubernetes garbage-collects them automatically.
  • Secret ownerReferences (always): Secrets created for blob/S3 uploads always get ownerReferences pointing to the capture job, ensuring they are cleaned up when the job is deleted—whether by TTL, manual delete, or the controller.
  • Controller auto-delete: The operator's CaptureReconciler now deletes the Capture resource once all jobs succeed and CleanUpAfterUpload is true with remote storage configured (blob, S3, or PVC).
  • ActiveDeadlineSeconds: Jobs now get a hard deadline (duration + 30min) to prevent indefinite hangs.
  • Wait timeout fix: waitUntilJobsComplete deadline is now duration + 5min (floored at 5min), fixing premature timeouts for short captures.
  • NotFound tolerance: deleteSecret and capture delete now tolerate NotFound errors caused by ownerRef garbage collection racing with explicit deletion.

Related Issue

N/A — feature request for safer automated capture workflows.

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...).
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Unit Tests

All tests pass in the affected packages. The new logic is 100% covered by the newly added CLI and operator controller tests.

Manual Testing

| # | Scenario | Command | Expected | Result |
|---|----------|---------|----------|--------|
| 1 | --cleanup-after-upload without remote | ./bin/kubectl-retina capture create --name test1 --node-names aks-nodepool1-... --host-path /tmp/captures --cleanup-after-upload --duration 5s | Error: requires remote storage | PASS: error returned immediately |
| 2 | No-wait + blob + cleanup | ./bin/kubectl-retina capture create --name test2 --node-names aks-nodepool1-... --blob-upload "<SAS URL>" --cleanup-after-upload --no-wait=true --duration 10s | CLI exits immediately, job has TTL, secret has ownerRef | PASS: verified via kubectl get job -o yaml (TTL=300), kubectl get secret -o yaml (ownerReferences present), job auto-deleted after ~5 min |
| 3 | Wait + blob + cleanup | ./bin/kubectl-retina capture create --name test3 --node-names aks-nodepool1-... --blob-upload "<SAS URL>" --cleanup-after-upload --no-wait=false --duration 10s | CLI waits, uploads blob, then deletes jobs/secrets | PASS: CLI printed completion, kubectl get jobs showed no jobs remaining, blob appeared in storage account, secret deleted |
| 4 | Wait + blob WITHOUT cleanup | ./bin/kubectl-retina capture create --name test4 --node-names aks-nodepool1-... --blob-upload "<SAS URL>" --no-wait=false --duration 10s | CLI waits, uploads, jobs/secrets remain | PASS: jobs and secrets preserved after CLI exit |
| 5 | Host-path only + no-wait (no cleanup flag) | ./bin/kubectl-retina capture create --name test5 --node-names aks-nodepool1-... --host-path /tmp/captures --no-wait=true --duration 5s | No TTL set, jobs remain | PASS: kubectl get job -o yaml showed no TTLSecondsAfterFinished |

Wait timeout fix verified: Test 3 used --duration 10s and the CLI completed within ~75s (10s capture + upload time), confirming that the new duration + 5min deadline works correctly for short captures. Previously the wait would time out at 2 × duration = 20s, before the upload finished.

NotFound tolerance verified: In test 2, running kubectl retina capture delete --name test2 after the TTL had already garbage-collected the job and secret returned success (no "secret not found" error).

Additional Notes

  • The CleanUpAfterUpload field is added to the CRD spec as optional with +kubebuilder:default=false. Existing Captures are unaffected.
  • JobActiveDeadlineBufferSeconds (30 min) is generous to accommodate large captures uploading over slow links or Windows nodes with slow image pulls.
  • The operator controller cleanup path only triggers when ALL jobs succeed — any failure preserves everything for debugging.
  • The controller's remote storage check includes all three types: BlobUpload, S3Upload, and PersistentVolumeClaim.
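
The controller's decision described in these notes can be sketched as a single predicate. The types below (`outputConfig`, `jobState`) and the function `shouldDeleteCapture` are hypothetical simplifications, not the CRD types or reconciler code. Treating an empty job list as "do not delete" is also an assumption made here.

```go
package main

import "fmt"

// outputConfig is a simplified, hypothetical view of the capture outputs.
// An empty string means the destination is not configured.
type outputConfig struct {
	HostPath              string
	BlobUpload            string
	S3Upload              string
	PersistentVolumeClaim string
}

type jobState int

const (
	jobRunning jobState = iota
	jobSucceeded
	jobFailed
)

// hasRemoteStorage reports whether any of the three remote types is set.
func hasRemoteStorage(o outputConfig) bool {
	return o.BlobUpload != "" || o.S3Upload != "" || o.PersistentVolumeClaim != ""
}

// shouldDeleteCapture is true only when cleanup is enabled, remote storage
// is configured, and every job has succeeded. Any failed or still-running
// job preserves the resources for debugging.
func shouldDeleteCapture(cleanUpAfterUpload bool, o outputConfig, jobs []jobState) bool {
	if !cleanUpAfterUpload || !hasRemoteStorage(o) || len(jobs) == 0 {
		return false
	}
	for _, j := range jobs {
		if j != jobSucceeded {
			return false
		}
	}
	return true
}

func main() {
	remote := outputConfig{BlobUpload: "<SAS URL>"}
	fmt.Println(shouldDeleteCapture(true, remote, []jobState{jobSucceeded, jobSucceeded})) // prints "true"
	fmt.Println(shouldDeleteCapture(true, remote, []jobState{jobSucceeded, jobFailed}))    // prints "false"
	hostOnly := outputConfig{HostPath: "/tmp/captures"}
	fmt.Println(shouldDeleteCapture(true, hostOnly, []jobState{jobSucceeded})) // prints "false"
}
```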

@carlotaarvela carlotaarvela requested a review from a team as a code owner May 20, 2026 07:29
github-actions Bot commented May 20, 2026


Retina Code Coverage Report

Total coverage increased from 35.4% to 36.3%

Increased diff

| Impacted Files | Coverage |
|----------------|----------|
| pkg/controllers/daemon/namespace/namespace_controller.go | 76.24% ... 78.46% (2.22%) ⬆️ |
| cli/cmd/capture/create.go | 35.99% ... 73.86% (37.87%) ⬆️ |
| pkg/controllers/operator/capture/controller.go | 0.0% ... 13.38% (13.38%) ⬆️ |
| cli/cmd/capture/delete.go | 75.0% ... 82.8% (7.8%) ⬆️ |

@carlotaarvela carlotaarvela force-pushed the carlota-capture-cleanup-remote-storage branch from f91cdb9 to 62c9897 Compare May 20, 2026 07:45
Comment thread cli/cmd/capture/create.go Outdated
Comment thread cli/cmd/capture/create.go Outdated
Comment thread pkg/controllers/operator/capture/controller.go Outdated
Comment thread cli/cmd/capture/create.go Outdated
Comment thread cli/cmd/capture/create.go Outdated
@carlotaarvela carlotaarvela requested a review from SRodi May 20, 2026 09:36
@carlotaarvela carlotaarvela force-pushed the carlota-capture-cleanup-remote-storage branch from 583136c to 99882d7 Compare May 21, 2026 14:19
@carlotaarvela carlotaarvela force-pushed the carlota-capture-cleanup-remote-storage branch from a6bcfaf to 31d7504 Compare May 21, 2026 14:52
Comment thread crd/api/v1alpha1/capture_types.go
Comment thread cli/cmd/capture/create.go Outdated
Comment thread pkg/controllers/operator/capture/cleanup_after_upload_test.go
@carlotaarvela carlotaarvela requested a review from mereta May 21, 2026 15:43

@mereta mereta left a comment

Contributor

LGTM 👌🏼

@SRodi SRodi left a comment

Member

LGTM, thanks @carlotaarvela!

@carlotaarvela carlotaarvela added this pull request to the merge queue May 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 21, 2026
@mereta mereta added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 3b50857 May 21, 2026
34 checks passed
@mereta mereta deleted the carlota-capture-cleanup-remote-storage branch May 21, 2026 17:59
3 participants