
HYPERFLEET-1263 - fix: replace gcloud exec with Go pubsub SDK for subscription cleanup #137

Open
kuudori wants to merge 1 commit into openshift-hyperfleet:main from kuudori:HYPERFLEET-1263-pubsub-sdk

Conversation

@kuudori kuudori commented Jun 26, 2026

Contributor

Summary

  • Replace gcloud pubsub subscriptions delete shell-out in DeletePubSubSubscription with cloud.google.com/go/pubsub/v2 Go SDK using Application Default Credentials
  • Fix silent cleanup failure: gcloud binary is not in the test container image, making every Pub/Sub cleanup call a no-op (exec: "gcloud": executable file not found in $PATH)
  • PurgeAdapterQueue (googlepubsub path) now actually works — it dispatches to the fixed DeletePubSubSubscription

What changed

| File | Change |
| --- | --- |
| pkg/helper/adapter.go | Replace exec.CommandContext("gcloud"...) with SubscriptionAdminClient.DeleteSubscription. Use gRPC codes.NotFound for not-found detection (was string matching). Surface PermissionDenied and all other errors as hard failures. Package-level newPubSubDeleteFunc var for test injection. |
| pkg/helper/adapter_test.go | 6 new table-driven subtests: success, NotFound→no-op, PermissionDenied→hard error, generic error propagation, factory error, default project fallback. Cleanup invocation assertion on all non-factory-error paths. |
| go.mod / go.sum | Add cloud.google.com/go/pubsub/v2, upgrade google.golang.org/grpc 1.59→1.80 |

Design decisions

  • PermissionDenied = hard error, not no-op. The old silent-failure behavior is the bug this ticket kills. NotFound→no-op because the desired end-state (subscription gone) is achieved. PermissionDenied means it still exists.
  • Per-call client lifecycle — at most 7 calls per test run, connection overhead is negligible
  • Package-level var (not Helper struct field) — avoids touching Helper struct (4 exported fields, no precedent for function-typed fields) or suite.go
  • v2 SDK (not deprecated v1) — staticcheck flags v1 import

Test plan

  • make check passes (generate → fmt-check → vet → lint 0 issues → all tests with -race)
  • make build produces binary
  • 6 unit tests cover all error paths including PermissionDenied
  • Cleanup function invocation verified via cleanupCalled bool
  • E2E: gated in CI, not run locally

Related

  • HYPERFLEET-1262 — sibling bug from same prow run (gRPC reconnect), no code overlap
  • HYPERFLEET-1110 — complementary (expands cleanup coverage), blocked by HYPERFLEET-1108, no PR yet. Benefits from this landing first.

…scription cleanup

DeletePubSubSubscription shelled out to gcloud which isn't in the test
container image, making every cleanup call a silent no-op. Replace with
cloud.google.com/go/pubsub/v2 using Application Default Credentials.

- Replace exec.CommandContext("gcloud"...) with SubscriptionAdminClient.DeleteSubscription
- Use gRPC codes.NotFound for not-found detection instead of string matching
- Surface PermissionDenied and all other errors as hard failures
- Add package-level var newPubSubDeleteFunc for unit test injection
- Add 6 table-driven tests covering success, NotFound, PermissionDenied,
  generic error, factory error, and default project fallback
- Verify cleanup function invocation via defer in all non-factory-error paths

openshift-ci Bot commented Jun 26, 2026


Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci Bot commented Jun 26, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign aredenba-rh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot commented Jun 26, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved Pub/Sub subscription deletion reliability by using a more robust backend operation and clearer error handling.
    • Missing Pub/Sub subscriptions are now handled gracefully instead of being treated as failures.
    • When no project ID is set, the default project is now used consistently.
  • Chores

    • Updated several Go module dependencies to newer versions.

Walkthrough

go.mod adds cloud.google.com/go/pubsub/v2, google.golang.org/grpc, and updated indirect module pins. DeletePubSubSubscription now uses newPubSubDeleteFunc to create a Pub/Sub admin client, deletes subscriptions through the Go client, defers client cleanup, and treats gRPC NotFound as a no-op. The new test covers success, NotFound, permission errors, factory errors, default project selection, and cleanup execution.

Sequence Diagram(s)

sequenceDiagram
  participant Helper as DeletePubSubSubscription
  participant Factory as newPubSubDeleteFunc
  participant Client as Pub/Sub admin client

  Helper->>Factory: create deleteFn and cleanup
  Factory->>Client: create client
  Helper->>Client: delete subscription
  Client-->>Helper: nil or gRPC error
  alt NotFound
    Helper-->>Helper: return nil
  else other error
    Helper-->>Helper: wrap error with subscription/project context
  end
  Helper->>Helper: defer cleanup()

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (10 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | It clearly states the main change: replacing gcloud exec with the Go Pub/Sub SDK for subscription cleanup. |
| Description check | ✅ Passed | It directly describes the Pub/Sub cleanup fix, tests, and dependency updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Sec-02: Secrets In Log Output | ✅ Passed | No CWE-532 issue: changed non-test code logs only subscription/project/error; no token/password/credential/secret fields or interpolations found. |
| No Hardcoded Secrets | ✅ Passed | No API keys/tokens/passwords/private keys, embedded creds, or long base64 blobs appear in the changed files; only dependency bumps and Pub/Sub code. |
| No Weak Cryptography | ✅ Passed | Touched files and repo scan show no crypto/md5, crypto/des, crypto/rc4, SHA1-for-security, ECB, or secret comparisons. No CWE-327 finding. |
| No Injection Vectors | ✅ Passed | No CWE-78/89/79/502 sink was added: Pub/Sub delete now uses a typed gRPC client, not exec; touched files have no untrusted query/template/yaml use. |
| No Privileged Containers | ✅ Passed | No CWE-250/CWE-266 privileged-container change: PR touches only go.mod and pkg/helper Go files, not manifests, Helm templates, or Dockerfiles. |
| No Pii Or Sensitive Data In Logs | ✅ Passed | No CWE-532 issue: changed logs only emit project/subscription IDs and errors; no PII, secrets, raw bodies, or credential-bearing hosts. |

@kuudori kuudori marked this pull request as ready for review June 26, 2026 19:47
@openshift-ci openshift-ci Bot requested review from aredenba-rh and sherine-k June 26, 2026 19:47

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/helper/adapter_test.go`:
- Around line 260-290: The DeletePubSubSubscription test only verifies the
default project path and ignores the subscription argument, so regressions in
forwarding the configured project or subscription can slip through. Update the
test around Helper.DeletePubSubSubscription and newPubSubDeleteFunc to capture
both projectID and subscription, then assert the configured GCPProjectID is
passed when set, the defaultGCPProjectID is used only when expected, and the
subscription ID matches the value supplied to DeletePubSubSubscription.

In `@pkg/helper/adapter.go`:
- Around line 561-564: The cleanup callback in the Pub/Sub admin client close
path is logging a real close failure at Info level, which downplays an error;
update the defer/cleanup function around client.Close() to use an error-level
log (via logger.Error) or return the close error to the caller if it should be
propagated. Keep the fix localized to the deleteFn cleanup logic so the log
severity matches the failure severity.
- Around line 580-595: The Pub/Sub delete path in the adapter currently returns
errors without project context, which breaks the error model for cleanup
failures. Update the newPubSubDeleteFunc setup in adapter.go so the factory
error is wrapped with subscriptionID and projectID context instead of returned
bare, and also include projectID in the wrapped error returned from the
deleteFn(ctx) failure path. Keep the existing logger calls in sync with the
returned error context so callers of the delete helper can diagnose ADC/client
and permission issues from the error alone.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: af704f4e-5394-494e-a528-2b26c14a5c3a

📥 Commits

Reviewing files that changed from the base of the PR and between 3ff44b4 and e514823.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum, !**/go.sum
📒 Files selected for processing (3)
  • go.mod
  • pkg/helper/adapter.go
  • pkg/helper/adapter_test.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • openshift-hyperfleet/architecture (manual)
  • openshift-hyperfleet/hyperfleet-api (manual)
  • openshift-hyperfleet/hyperfleet-sentinel (manual)
  • openshift-hyperfleet/hyperfleet-adapter (manual)
  • openshift-hyperfleet/hyperfleet-broker (manual)

Comment on lines +260 to +290
var capturedProjectID string
var cleanupCalled bool

original := newPubSubDeleteFunc
t.Cleanup(func() { newPubSubDeleteFunc = original })

newPubSubDeleteFunc = func(_ context.Context, projectID, _ string) (func(context.Context) error, func(), error) {
	capturedProjectID = projectID
	if tt.factoryErr != nil {
		return nil, nil, tt.factoryErr
	}
	return func(context.Context) error { return tt.deleteErr }, func() { cleanupCalled = true }, nil
}

h := &Helper{Cfg: &config.Config{GCPProjectID: tt.projectID}}
err := h.DeletePubSubSubscription(context.Background(), "test-sub")

if tt.wantErr {
	if err == nil {
		t.Fatal("expected error, got nil")
	}
	if tt.wantErrMsg != "" && !strings.Contains(err.Error(), tt.wantErrMsg) {
		t.Errorf("error %q does not contain %q", err.Error(), tt.wantErrMsg)
	}
} else if err != nil {
	t.Fatalf("unexpected error: %v", err)
}

if tt.wantDefaultProject && capturedProjectID != defaultGCPProjectID {
	t.Errorf("expected default project %q, got %q", defaultGCPProjectID, capturedProjectID)
}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the configured project and subscription are forwarded.

The stub ignores the subscription ID, and capturedProjectID is only checked for the default case. A regression that always uses the default project or passes the wrong subscription can still pass.

Proposed change
 			var capturedProjectID string
+			var capturedSubscriptionID string
 			var cleanupCalled bool
@@
-			newPubSubDeleteFunc = func(_ context.Context, projectID, _ string) (func(context.Context) error, func(), error) {
+			newPubSubDeleteFunc = func(_ context.Context, projectID, subscriptionID string) (func(context.Context) error, func(), error) {
 				capturedProjectID = projectID
+				capturedSubscriptionID = subscriptionID
 				if tt.factoryErr != nil {
 					return nil, nil, tt.factoryErr
 				}
@@
-			if tt.wantDefaultProject && capturedProjectID != defaultGCPProjectID {
-				t.Errorf("expected default project %q, got %q", defaultGCPProjectID, capturedProjectID)
+			wantProjectID := tt.projectID
+			if tt.wantDefaultProject {
+				wantProjectID = defaultGCPProjectID
+			}
+			if capturedProjectID != wantProjectID {
+				t.Errorf("expected project %q, got %q", wantProjectID, capturedProjectID)
+			}
+			if capturedSubscriptionID != "test-sub" {
+				t.Errorf("expected subscription %q, got %q", "test-sub", capturedSubscriptionID)
 			}

As per path instructions, “New exported functions and critical logic paths SHOULD have tests.”
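
The effect of asserting forwarded arguments on every row can be shown with a small self-contained sketch. It is not the repository's test: `deleteSub`, `testCase`, `runCases`, `defaultProject`, and `errNotFound` are hypothetical stand-ins, and the cases run as a plain loop rather than through `t.Run` so the example works outside a `_test.go` file.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-ins, not the PR's values.
const defaultProject = "default-project"

var errNotFound = errors.New("not found")

// deleteSub is a hypothetical function under test. It resolves the project,
// calls the injected del function, and treats errNotFound as a no-op.
func deleteSub(configuredProject, sub string, del func(project, sub string) error) error {
	project := configuredProject
	if project == "" {
		project = defaultProject
	}
	if err := del(project, sub); err != nil && !errors.Is(err, errNotFound) {
		return fmt.Errorf("delete %s in %s: %w", sub, project, err)
	}
	return nil
}

type testCase struct {
	name        string
	projectID   string
	deleteErr   error
	wantErr     bool
	wantProject string
}

// runCases executes every row against deleteSub and returns one message per
// failed expectation; an empty slice means all rows passed.
func runCases(cases []testCase) []string {
	var failures []string
	for _, tc := range cases {
		var gotProject, gotSub string
		err := deleteSub(tc.projectID, "test-sub", func(project, sub string) error {
			gotProject, gotSub = project, sub // capture both forwarded arguments
			return tc.deleteErr
		})
		if (err != nil) != tc.wantErr {
			failures = append(failures, fmt.Sprintf("%s: err = %v, wantErr = %v", tc.name, err, tc.wantErr))
		}
		if gotProject != tc.wantProject {
			failures = append(failures, fmt.Sprintf("%s: project = %q, want %q", tc.name, gotProject, tc.wantProject))
		}
		if gotSub != "test-sub" {
			failures = append(failures, fmt.Sprintf("%s: subscription = %q, want %q", tc.name, gotSub, "test-sub"))
		}
	}
	return failures
}

func main() {
	cases := []testCase{
		{name: "success", projectID: "p1", wantProject: "p1"},
		{name: "not found is a no-op", projectID: "p1", deleteErr: errNotFound, wantProject: "p1"},
		{name: "other error propagates", projectID: "p1", deleteErr: errors.New("boom"), wantErr: true, wantProject: "p1"},
		{name: "default project fallback", wantProject: defaultProject},
	}
	fmt.Println(len(runCases(cases))) // prints 0 when every expectation holds
}
```

Because the project and subscription are checked unconditionally, a regression that always used the default project would fail the first three rows instead of passing unnoticed.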

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/helper/adapter_test.go` around lines 260 - 290, The
DeletePubSubSubscription test only verifies the default project path and ignores
the subscription argument, so regressions in forwarding the configured project
or subscription can slip through. Update the test around
Helper.DeletePubSubSubscription and newPubSubDeleteFunc to capture both
projectID and subscription, then assert the configured GCPProjectID is passed
when set, the defaultGCPProjectID is used only when expected, and the
subscription ID matches the value supplied to DeletePubSubSubscription.

Source: Path instructions

Comment thread pkg/helper/adapter.go
Comment on lines +561 to +564
return deleteFn, func() {
	if err := client.Close(); err != nil {
		logger.Info("failed to close Pub/Sub admin client", "error", err)
	}


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Do not log close failures at Info.

Line 563 records an actual cleanup error at Info, which hides resource-lifecycle failures. Use an error-level log, or make the cleanup callback return an error if callers should fail on close.

Proposed change
 	return deleteFn, func() {
 		if err := client.Close(); err != nil {
-			logger.Info("failed to close Pub/Sub admin client", "error", err)
+			logger.Error("failed to close Pub/Sub admin client", "error", err)
 		}
 	}, nil

As per path instructions, “Log levels must match severity (no slog.Error for non-errors, no slog.Info for errors).”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/helper/adapter.go` around lines 561 - 564, The cleanup callback in the
Pub/Sub admin client close path is logging a real close failure at Info level,
which downplays an error; update the defer/cleanup function around
client.Close() to use an error-level log (via logger.Error) or return the close
error to the caller if it should be propagated. Keep the fix localized to the
deleteFn cleanup logic so the log severity matches the failure severity.

Source: Path instructions

Comment thread pkg/helper/adapter.go
Comment on lines +580 to +595
 deleteFn, cleanup, err := newPubSubDeleteFunc(ctx, projectID, subscriptionID)
 if err != nil {
-	outputStr := string(output)
-	if strings.Contains(outputStr, "NOT_FOUND") || strings.Contains(outputStr, "not found") {
 	return err
 }
 defer cleanup()

 if err := deleteFn(ctx); err != nil {
 	if status.Code(err) == codes.NotFound {
 		logger.Info("Pub/Sub subscription not found, skipping deletion", "subscription", subscriptionID)
 		return nil
 	}
-	return fmt.Errorf("failed to delete Pub/Sub subscription %s: %w (output: %s)", subscriptionID, err, outputStr)
 	logger.Error("failed to delete Pub/Sub subscription",
 		"subscription", subscriptionID,
 		"project", projectID,
 		"error", err)
 	return fmt.Errorf("failed to delete Pub/Sub subscription %s: %w", subscriptionID, err)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Wrap all Pub/Sub delete failures with project context.

Line 582 returns the factory error bare, and Line 595 omits projectID from the returned delete error. Callers only log the returned error, so ADC/client and permission failures lose the project needed to diagnose cleanup failures.

Proposed change
 	deleteFn, cleanup, err := newPubSubDeleteFunc(ctx, projectID, subscriptionID)
 	if err != nil {
-		return err
+		return fmt.Errorf("failed to initialize Pub/Sub deletion for subscription %s in project %s: %w", subscriptionID, projectID, err)
 	}
 	defer cleanup()
@@
-		return fmt.Errorf("failed to delete Pub/Sub subscription %s: %w", subscriptionID, err)
+		return fmt.Errorf("failed to delete Pub/Sub subscription %s in project %s: %w", subscriptionID, projectID, err)
 	}

As per path instructions, “Wrap errors per Error Model Standard — no bare return err.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/helper/adapter.go` around lines 580 - 595, The Pub/Sub delete path in the
adapter currently returns errors without project context, which breaks the error
model for cleanup failures. Update the newPubSubDeleteFunc setup in adapter.go
so the factory error is wrapped with subscriptionID and projectID context
instead of returned bare, and also include projectID in the wrapped error
returned from the deleteFn(ctx) failure path. Keep the existing logger calls in
sync with the returned error context so callers of the delete helper can
diagnose ADC/client and permission issues from the error alone.

Source: Path instructions
