HYPERFLEET-1229 - feat: automate GCP developer cluster lifecycle enforcement #58
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: … Needs approval from an approver in each of these files: … Approvers can indicate their approval by writing …
📝 Walkthrough

Adds lifecycle enforcement documentation, Terraform inputs and module wiring, a Go Cloud Function, decision logic and tests, and a TTL labeling script. The new flow evaluates …

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CloudSchedulerJob
    participant EnforceLifecycle
    participant ContainerAPI
    participant ComputeAPI
    CloudSchedulerJob->>EnforceLifecycle: HTTP POST with OIDC
    EnforceLifecycle->>ContainerAPI: list clusters
    EnforceLifecycle->>ComputeAPI: list managed instances and creation timestamps
    EnforceLifecycle->>EnforceLifecycle: EvaluateCluster(clusterInfo, now)
    EnforceLifecycle->>ContainerAPI: update labels, scale node pools, or delete cluster
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 10 passed | ❌ 1 failed (1 warning)
3867ff6 to f7c3d46
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@functions/lifecycle-enforcer/decision_test.go`:
- Around line 311-324: The test case names in decision_test.go are misleading
because they describe the wrong scenario or implementation details rather than
the asserted behavior. Rename the affected cases in TestEvaluateCluster and any
related shutdown test blocks so the names match the actual expected
action/reason (for example, describe empty labels leading to missing owner and
shutdown), and ensure the "shutdown sets shutdown-date label" case reflects what
it actually verifies instead of implying SetLabels is asserted when it is not.
In `@functions/lifecycle-enforcer/decision.go`:
- Around line 88-95: The TTL-based delete path in decision.go is still allowing
deletion when owner metadata is missing, which bypasses the intended 7-day owner
grace period. Update the Decision logic in the shutdown-date block so the TTL
delete branch is only considered when hasOwner is true, keeping the
missing-owner delete path exclusive to its own grace period. Use the existing
Decision, hasOwner, hasTTL, and ttlExpired checks to make the TTL branch
dependent on owner presence.
In `@functions/lifecycle-enforcer/function.go`:
- Around line 178-225: `EvaluateCluster()` is deriving `NodeCount` from stale
`NodePool.InitialNodeCount` and overwriting it per MIG instead of using live
membership across all `InstanceGroupUrls`. Update the `lifecycle-enforcer` loop
to aggregate `managed.ManagedInstances` from every MIG into `npInfo.NodeCount`,
and if any `computeSvc.InstanceGroupManagers.ListManagedInstances` or
`computeSvc.Instances.Get` call fails, return an error rather than continuing
with partial/stale data. Keep the change localized around the `NodePoolInfo`
population logic so the pool is classified from current MIG state.
- Around line 236-279: Wait for the GKE long-running operations returned by
Cluster.SetResourceLabels, Clusters.NodePools.SetSize, and Clusters.Delete
before treating the work as complete. In lifecycle-enforcer’s shutdown/delete
flow, poll each returned Operation to DONE before logging success, and make sure
shutdown-date is only written after every node pool has reached zero. Update the
ActionShutdown and ActionDelete handling to use the operation result instead of
assuming the request finished immediately.
In `@functions/lifecycle-enforcer/go.mod`:
- Around line 32-39: Bump the vulnerable indirect module entries in go.mod for
the lifecycle-enforcer function: update golang.org/x/crypto,
golang.org/x/oauth2, and google.golang.org/grpc to versions at or above the
fixed releases called out in the review. Locate the existing dependency lines in
the go.mod block and raise them to the minimum safe versions, then make sure the
module graph still resolves cleanly after the version changes.
In `@Makefile`:
- Line 251: The recipe shells in the Makefile are vulnerable because
$(LIFECYCLE_DIR) is interpolated unquoted in the commands. Update the affected
targets that use $(LIFECYCLE_DIR) (including the `go test` command and the other
commands that follow the same pattern) to wrap the variable in quotes so the shell treats it as a
single path value. Use the existing recipe commands as the place to apply the
fix, keeping the behavior unchanged aside from preventing shell metacharacter
expansion.
In `@scripts/add-ttl-labels.sh`:
- Around line 5-8: `add-ttl-labels.sh` currently treats `DRY_RUN` as false for
any non-literal true value and uses `TTL_DAYS` without validation, so tighten
the CLI boundary before any label-write path. In the script’s startup logic
around `TTL_DAYS`, `DRY_RUN`, and `TTL_DATE`, validate that `TTL_DAYS` is a
positive integer within an expected range, and normalize `DRY_RUN` to an
explicit boolean with only accepted values. Ensure the downstream label
application flow only runs when the sanitized values are valid, and keep the
same checks consistent in the write path referenced by the script’s main action
block.
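The script itself is shell. The two validation rules this comment describes are sketched here in Go, the PR's function language, purely as an illustration. `parseTTLDays`, `parseDryRun`, the accepted spellings, the 30-day upper bound, and treating an empty `DRY_RUN` as dry-run are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxTTLDays is an assumed upper bound; the review only asks for "an expected range".
const maxTTLDays = 30

// parseTTLDays accepts only a positive integer no larger than maxTTLDays.
func parseTTLDays(s string) (int, error) {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil || n < 1 || n > maxTTLDays {
		return 0, fmt.Errorf("TTL_DAYS must be an integer between 1 and %d, got %q", maxTTLDays, s)
	}
	return n, nil
}

// parseDryRun maps an explicit allow-list of spellings to a boolean and
// rejects everything else. On error it still reports dry-run (fail closed),
// so a typo such as "flase" never enables label writes.
func parseDryRun(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "", "true", "1", "yes":
		return true, nil
	case "false", "0", "no":
		return false, nil
	}
	return true, fmt.Errorf("DRY_RUN must be true or false, got %q", s)
}
```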
In `@terraform/main.tf`:
- Around line 5-12: The ttl label is being recomputed from plantimestamp()
inside the local common_labels setup, which causes every plan/apply to push the
cutoff forward and create drift. Update the ttl handling in the locals that
build common_labels so it comes from a creation-time value persisted in state or
is only stamped once, and avoid continuously managing ttl through this shared
label map. Keep the change localized around the ttl_date/common_labels symbols
so the lifecycle-enforcer logic continues to read a stable shutdown/delete
cutoff.
In `@terraform/modules/lifecycle/main.tf`:
- Around line 1-4: The lifecycle module is creating project-global GCP resources
with fixed names, which will collide across multiple Terraform states and make
one state able to destroy shared infrastructure. Update the naming in lifecycle
module symbols like local.function_name, local.bucket_name, the service account
definitions, and the scheduler job resource to include a unique namespace such
as the project or developer identifier, or otherwise restrict this module to a
single shared state. Ensure all references to these names stay consistent
throughout the module so each deployment gets isolated resource names.
- Around line 9-17: The google_storage_bucket resource for function_source is
missing public access prevention, so add the bucket-level setting to enforce
blocking public IAM exposure. Update the function_source bucket definition to
set public_access_prevention to "enforced" alongside the existing
uniform_bucket_level_access and other attributes, using the
google_storage_bucket "function_source" block as the place to make the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 998c6f8d-f89f-4e76-a917-cb86565f11be
⛔ Files ignored due to path filters (1)
`functions/lifecycle-enforcer/go.sum` is excluded by `!**/*.sum`, `!**/go.sum`
📒 Files selected for processing (18)
- AGENTS.md
- Makefile
- README.md
- functions/lifecycle-enforcer/README.md
- functions/lifecycle-enforcer/decision.go
- functions/lifecycle-enforcer/decision_test.go
- functions/lifecycle-enforcer/function.go
- functions/lifecycle-enforcer/go.mod
- scripts/add-ttl-labels.sh
- terraform/README.md
- terraform/envs/gke/dev-prow.tfvars
- terraform/envs/gke/dev.tfvars.example
- terraform/main.tf
- terraform/modules/lifecycle/main.tf
- terraform/modules/lifecycle/outputs.tf
- terraform/modules/lifecycle/variables.tf
- terraform/variables.tf
- terraform/versions.tf
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
- openshift-hyperfleet/architecture (manual)
- openshift-hyperfleet/hyperfleet-api (manual)
- openshift-hyperfleet/hyperfleet-sentinel (manual)
- openshift-hyperfleet/hyperfleet-adapter (manual)
- openshift-hyperfleet/hyperfleet-broker (manual)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 251-276: The lifecycle Make targets currently treat a missing go
binary as a non-fatal warning under CI, which lets ci-validate pass without
running the checks. Update the go-missing branches in the lifecycle targets (the
test/build/lint recipes in Makefile, plus the ci-validate dependency chain) so
CI fails fast instead of skipping, while keeping the existing behavior only for
local non-CI usage if needed. Use the lifecycle target names and ci-validate
wiring to ensure the canonical pre-merge path always runs or errors when go is
unavailable.
In `@terraform/modules/lifecycle/main.tf`:
- Around line 80-82: Update the lifecycle function to stop using the deprecated
Go 1.23 runtime: in build_config for EnforceLifecycle, replace the go123 runtime
with a supported version such as go124 or newer, and make the matching Go
version change in functions/lifecycle-enforcer/go.mod so both places stay
aligned. Verify any lifecycle-enforcer build settings or related module
declarations reference the new Go version consistently.
✅ Files skipped from review due to trivial changes (4)
- functions/lifecycle-enforcer/README.md
- terraform/modules/lifecycle/variables.tf
- terraform/README.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (11)
- terraform/versions.tf
- terraform/envs/gke/dev.tfvars.example
- terraform/modules/lifecycle/outputs.tf
- terraform/variables.tf
- terraform/envs/gke/dev-prow.tfvars
- AGENTS.md
- functions/lifecycle-enforcer/decision_test.go
- scripts/add-ttl-labels.sh
- functions/lifecycle-enforcer/function.go
- terraform/main.tf
- functions/lifecycle-enforcer/decision.go
f7c3d46 to 51bfcaf
| } | ||
|
|
||
| env := cluster.Labels[LabelEnvironment] | ||
| if env == EnvCICD { |
For `cicd` clusters like these: https://console.cloud.google.com/kubernetes/list/overview?project=hcm-hyperfleet&pageState=(%22cluster_list_graphql_table%22:(%22f%22:%22%255B%257B_22k_22_3A_22Labels_22_2C_22t_22_3A13_2C_22v_22_3A_22_5C_22environment%2520_3A%2520cicd_~*environment%2520_3A%2520cicd_5C_22_22_2C_22s_22_3Atrue_2C_22i_22_3A_22labels_22%257D%255D%22))
Don't we still want to get rid of the ci-infra clusters, i.e. the clusters with the `owner: ci-infra-*` label?
The management of ci-infra-* clusters is the responsibility of the Test Platform team. I don't think we should interfere with that in the scope of this ticket — the lifecycle enforcer intentionally skips them to avoid conflicts with Prow's own cleanup mechanism.
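Below is a minimal sketch of the exemption check described in this reply and in the PR's enforcement rules. `isExempt` is hypothetical. It assumes the `hyperfleet-dev-ci-infra-*` pattern is matched against the cluster name rather than a label.

```go
package main

import "strings"

// isExempt reports whether the enforcer should skip a cluster entirely:
// either it is labeled environment=cicd, or its name matches the
// (assumed name-based) hyperfleet-dev-ci-infra-* pattern.
func isExempt(name string, labels map[string]string) bool {
	if labels["environment"] == "cicd" {
		return true
	}
	return strings.HasPrefix(name, "hyperfleet-dev-ci-infra-")
}
```

Reading from a nil `labels` map is safe in Go and simply yields the empty string, so clusters without labels fall through to the name check.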
| logger := slog.Default() | ||
| now := time.Now().UTC() | ||
|
|
||
| projectID := os.Getenv("GCP_PROJECT_ID") |
Should we standardize on `GCP_PROJECT_ID` or `PROJECT_ID`? I think it's set to `PROJECT_ID` here:
https://github.com/openshift-hyperfleet/hyperfleet-infra/blob/main/env.gcp#L2
https://github.com/openshift-hyperfleet/hyperfleet-infra/blob/main/terraform/bootstrap/setup-backend.sh#L22
but in the e2e tests it's set as `GCP_PROJECT_ID`.
Fixed — standardized to PROJECT_ID across the Cloud Function code, Terraform module env vars, shell script, and README to match the existing convention in env.gcp and setup-backend.sh.
```hcl
variable "schedule" {
  description = "Cron schedule for the enforcement job (Cloud Scheduler format)"
  type        = string
  default     = "0 * * * *"
}
```
Jira says to run at 19:00 UTC, but this will run hourly. From HF-1229:

> runs daily at 19:00 UTC via Cloud Scheduler, iterates all GKE clusters in hcm-hyperfleet, and enforces lifecycle rules:

Maybe update Jira to match?
Good catch — updated the JIRA ticket with a comment explaining the change from nightly to hourly. Hourly is better because it catches idle clusters throughout the day and enforces TTL/owner checks more frequently. The schedule is configurable via lifecycle_enforcer_schedule (default: 0 * * * *).
```go
instanceName := lastSegment(mi.Instance)
inst, err := computeSvc.Instances.Get(projectID, igZone, instanceName).Context(ctx).Do()
if err != nil {
	slog.Warn("failed to get instance details",
```
If we just warn and continue, won't that cause silent failures? Should we instead collect the failures and return an error?
Fixed — added a retry mechanism (3 attempts with linear backoff: 500ms, 1s, 1.5s) for the Instances.Get call. If all retries fail, it still logs a warning and continues (the instance count from ListManagedInstances remains accurate, only the creation timestamp is lost for that specific instance).
```go
	functions.HTTP("EnforceLifecycle", handleEnforceLifecycle)
}

func handleEnforceLifecycle(w http.ResponseWriter, r *http.Request) {
```
Should we validate that the request method is `http.MethodPost`, since that's what is defined here? https://github.com/openshift-hyperfleet/hyperfleet-infra/pull/58/changes#diff-e16d384f48e95c49a8aa88db12e2e8582bf119542fe3101748eb9269950d3c80R119
```go
func handleEnforceLifecycle(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
```
Fixed — added POST method validation at the top of the handler.
51bfcaf to 3564e2b
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@functions/lifecycle-enforcer/function.go`:
- Around line 246-277: The lifecycle-enforcer logic in the shutdown path applies
the `shutdown-date` label before the node pool scale-to-zero work completes, so
move the label update out of the early `SetResourceLabels` block and only apply
it after the `ActionShutdown` loop in `enforceLifecycle` succeeds for every node
pool. Keep the existing cluster label merge behavior, but ensure
`decision.SetLabels` is written only after all
`svc.Projects.Locations.Clusters.NodePools.SetSize` calls have completed without
error so a failed shutdown does not advance the delete window.
- Around line 78-158: The handler currently logs and continues on cluster-level
failures in the main loop, which leaves the overall request returning success
even when enforcement partially fails. In the lifecycle-enforcer function, add
an explicit failure-tracking flag around the cluster processing path that is set
when buildClusterInfo or executeDecision fails, then after encoding the JSON
response use that flag to return a non-2xx status. Keep the per-cluster results
and logging in place, but make the degraded behavior intentional with a clear
comment near the continue paths and the final response handling.
📒 Files selected for processing (19)
- AGENTS.md
- Makefile
- README.md
- functions/lifecycle-enforcer/README.md
- functions/lifecycle-enforcer/decision.go
- functions/lifecycle-enforcer/decision_test.go
- functions/lifecycle-enforcer/function.go
- functions/lifecycle-enforcer/go.mod
- scripts/add-ttl-labels.sh
- terraform/README.md
- terraform/envs/gke/dev-prow.tfvars
- terraform/envs/gke/dev.tfvars.example
- terraform/main.tf
- terraform/modules/lifecycle/.gitignore
- terraform/modules/lifecycle/main.tf
- terraform/modules/lifecycle/outputs.tf
- terraform/modules/lifecycle/variables.tf
- terraform/variables.tf
- terraform/versions.tf
✅ Files skipped from review due to trivial changes (6)
- terraform/modules/lifecycle/.gitignore
- terraform/modules/lifecycle/variables.tf
- terraform/README.md
- terraform/envs/gke/dev.tfvars.example
- AGENTS.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (11)
- terraform/envs/gke/dev-prow.tfvars
- terraform/versions.tf
- functions/lifecycle-enforcer/go.mod
- functions/lifecycle-enforcer/decision_test.go
- terraform/variables.tf
- terraform/modules/lifecycle/outputs.tf
- functions/lifecycle-enforcer/decision.go
- terraform/modules/lifecycle/main.tf
- scripts/add-ttl-labels.sh
- terraform/main.tf
- Makefile
3564e2b to 86e8910
Actionable comments posted: 1
♻️ Duplicate comments (1)
functions/lifecycle-enforcer/function.go (1)
223-235: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Do not drop nodes with unknown creation time.
Lines 223-235 turn an instance inventory failure into partial age data, but the idle rule is only safe when you know the age of every running node. If one fresh node is skipped here, `EvaluateCluster()` can misclassify the pool as “all nodes >12h” and scale down an active cluster. CWE-703.

Suggested fix

```diff
 if lastErr != nil {
-	slog.Warn("failed to get instance details after retries",
-		"instance", instanceName,
-		"attempts", 3,
-		"error", lastErr,
-	)
-	continue
+	return ClusterInfo{}, fmt.Errorf("getting instance details for %s/%s after retries: %w", igZone, instanceName, lastErr)
 }
 creationTime, err := time.Parse(time.RFC3339, inst.CreationTimestamp)
 if err != nil {
-	continue
+	return ClusterInfo{}, fmt.Errorf("parsing creation timestamp for %s: %w", instanceName, err)
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@functions/lifecycle-enforcer/function.go` around lines 223 - 235, The instance age check in EvaluateCluster must not silently skip nodes when instance details or CreationTimestamp parsing fails. In the loop around lastErr and time.Parse, treat unknown creation time as an evaluation failure instead of continuing, so the cluster is not misclassified as fully idle; update the logic in the instance inventory handling path to return or mark the cluster unsafe-to-scale when any node’s age cannot be determined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@functions/lifecycle-enforcer/README.md`:
- Around line 90-101: The README wording for the idle shutdown flow is
misleading because the enforcer is based on node age, not workload or traffic.
Update the description in the lifecycle enforcer README to refer to node
age/time-to-live eligibility rather than “inactivity,” and keep the mitigation
guidance aligned with that behavior so operators understand that actively used
but old nodes can still be shut down.
---
Duplicate comments:
In `@functions/lifecycle-enforcer/function.go`:
- Around line 223-235: The instance age check in EvaluateCluster must not
silently skip nodes when instance details or CreationTimestamp parsing fails. In
the loop around lastErr and time.Parse, treat unknown creation time as an
evaluation failure instead of continuing, so the cluster is not misclassified as
fully idle; update the logic in the instance inventory handling path to return
or mark the cluster unsafe-to-scale when any node’s age cannot be determined.
✅ Files skipped from review due to trivial changes (4)
- terraform/modules/lifecycle/.gitignore
- terraform/README.md
- AGENTS.md
- README.md
🚧 Files skipped from review as they are similar to previous changes (13)
- terraform/modules/lifecycle/variables.tf
- terraform/envs/gke/dev-prow.tfvars
- functions/lifecycle-enforcer/go.mod
- terraform/envs/gke/dev.tfvars.example
- terraform/modules/lifecycle/outputs.tf
- terraform/versions.tf
- terraform/main.tf
- terraform/modules/lifecycle/main.tf
- scripts/add-ttl-labels.sh
- Makefile
- terraform/variables.tf
- functions/lifecycle-enforcer/decision.go
- functions/lifecycle-enforcer/decision_test.go
…rcement Cloud Function (Go) that runs hourly via Cloud Scheduler to enforce idle shutdown (>12h), TTL expiration (48h grace), and missing owner (7 day grace) rules on developer GKE clusters. Includes Terraform module, Makefile targets, migration script, and dry-run mode.
86e8910 to ea5643e
Summary
- … `hcm-hyperfleet`
- Terraform module (`terraform/modules/lifecycle/`) for Cloud Function Gen2, Cloud Scheduler, IAM, and GCS bucket
- Add `ttl` label (current date + 5 days via `plantimestamp()`) and `environment` variable to Terraform `common_labels`
- … `dev-prow.tfvars` with `environment = "cicd"` (exempt from enforcement)
- Makefile targets: `test-lifecycle-function`, `build-lifecycle-function`, `lint-lifecycle-function`, `add-ttl-labels`
- Migration script (`scripts/add-ttl-labels.sh`) for adding TTL labels to existing clusters

Enforcement rules
- … `shutdown-date`
- … `shutdown-date`
- Exempt: `environment: cicd` or `hyperfleet-dev-ci-infra-*`

Rollout
- … `lifecycle_enforcer_dry_run = true` (default)
- Run `DRY_RUN=false make add-ttl-labels` to label existing clusters
- Set `lifecycle_enforcer_dry_run = false` and re-apply

Test plan
- `make validate-terraform` passes
- `make test-lifecycle-function` passes
- `make lint-lifecycle-function` passes
- … `enable_lifecycle_enforcer = true` + `lifecycle_enforcer_dry_run = true` and validate logs
- … `add-ttl-labels.sh` in dry-run mode against `hcm-hyperfleet`

Ref: https://redhat.atlassian.net/browse/HYPERFLEET-1229