Use AWS Secrets Manager for privileged CI config by Kravalg · Pull Request #57 · VilnaCRM-Org/bootstrap-infrastructure

Kravalg · 2026-05-24T16:04:47Z

Pull Request

Description

Replaces the Pulumi Cloud/ESC CI configuration path with an AWS-only path:

GitHub Actions -> GitHub OIDC -> AWS Secrets Manager -> Pulumi CLI with S3 backend + AWS KMS secrets provider

The old .github/actions/load-esc-ci-env action and .github/ci/pulumi-esc.json are removed. Privileged workflows now call .github/actions/load-aws-ci-env, which assumes a GitHub OIDC config-read role, reads one fixed AWS Secrets Manager JSON secret, validates required keys without printing values, and exports the role/backend/stack values needed by later AWS OIDC steps.

Pulumi now creates the AWS-side config foundation directly: Secrets Manager secret containers and one GitHubCiConfigRead-* role per fixed CI suffix (test-pr, test, prod-preview, prod). Production preview and production apply use separate config-read role variables so the protected prod apply secret is not readable through the prod-preview path.

Deployment apply paths are saved-plan-only. The previous test-account direct make pulumi-up fallback after a saved-plan KMS decrypt failure has been removed, scripts/run_pulumi_command.py rejects direct pulumi up whenever GITHUB_ACTIONS=true, and same-repo PR jobs now fail fast if AWS_TEST_PR_CI_CONFIG_ROLE_ARN is missing instead of falling back to the main AWS_TEST_CI_CONFIG_ROLE_ARN.

Related Issues

Fixes #20.
Refs #49, #50, #52, #53, #54, #55, #56, #58.

The operations-alert issues are legacy duplicate candidates. They should stay open until the AWS-backed triage creates or updates a canonical fingerprinted issue and SRE supplies the required confirmation reference for the protected reconcile workflow.

Manual Secure Follow-Up

Follow docs/aws-secrets-manager-ci-cutover.md for the AWS-only setup.

Required non-secret GitHub repository variables:

AWS_TEST_REGION
AWS_TEST_PR_CI_CONFIG_ROLE_ARN
AWS_TEST_CI_CONFIG_ROLE_ARN
AWS_PROD_REGION
AWS_PROD_PREVIEW_CI_CONFIG_ROLE_ARN
AWS_PROD_CI_CONFIG_ROLE_ARN

Required AWS Secrets Manager JSON secret IDs:

/bootstrap-infrastructure/ci/test-pr
/bootstrap-infrastructure/ci/test
/bootstrap-infrastructure/ci/prod-preview
/bootstrap-infrastructure/ci/prod

Do not paste secret JSON values into GitHub, chat, logs, docs, or Pulumi config. Use put-secret-value from a private local file and verify with describe-secret, not get-secret-value.

Current External Setup State

The six non-secret GitHub repository variables for the AWS config-read roles are populated.
Privileged PR jobs reach GitHub OIDC and fail at AWS authorization for the test role: Could not assume role with OIDC: Not authorized to perform sts:AssumeRoleWithWebIdentity for arn:aws:iam::891377212104:role/GitHubCiConfigRead-bootstrap-infrastructure-test-pr.
The local AWS CLI currently has only the default profile, and after unsetting inherited AWS env vars it resolves to production account 933245420672 as arn:aws:iam::933245420672:user/codex; a separate test-account profile for account 891377212104 is still needed before fixing test-account resources from this workstation.
The current shell process still has inherited AWS credential env vars; values were not recorded. Use unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_PROFILE before profile-based verification.
AWS MCP production identity succeeds for account 933245420672 as arn:aws:iam::933245420672:user/codex.
AWS MCP production metadata currently shows the new prod AWS-only resources are not present before stack apply: /bootstrap-infrastructure/ci/prod-preview and /bootstrap-infrastructure/ci/prod return ResourceNotFoundException; the corresponding GitHubCiConfigRead-* roles are not present yet.

How Has This Been Tested?

Latest local validation on this branch:

make test-unit (389 passed, 100% coverage)
uv run pytest tests/pulumi/test_project_structure.py tests/pulumi/test_delivery_contracts.py -q (58 passed)
uv run pytest tests/pulumi/test_delivery_contracts.py::test_pr_comment_workflows_gate_prod_after_successful_test_apply tests/pulumi/test_delivery_contracts.py::test_prod_workflow_requires_successful_test_deploy_for_same_sha -q (2 passed)
uv run pytest tests/unit/test_components.py::test_ci_configuration_requires_github_oidc_provider tests/unit/test_mutation_targets.py::test_mutation_target_ci_config_validation_and_lookup_helpers -q (2 passed)
uv run pytest tests/pulumi/test_project_structure.py::test_docs_cover_current_testing_and_guardrail_guidance tests/pulumi/test_project_structure.py::test_issue20_cutover_manual_is_secret_safe_and_actionable tests/pulumi/test_project_structure.py::test_issue20_closeout_evidence_tracks_external_manual_steps -q (3 passed)
uv run pytest tests/pulumi/test_delivery_contracts.py::test_aws_ci_loader_reads_secrets_manager_without_pulumi_cloud tests/pulumi/test_delivery_contracts.py::test_multi_account_workflows_use_fixed_aws_ci_config_contracts tests/pulumi/test_delivery_contracts.py::test_multi_account_environment_docs_are_explicit tests/unit/test_script_entrypoints.py::test_run_up_plan_stack_rejects_plan_decrypt_without_direct_apply -q (4 passed)
uv run pytest tests/unit/test_script_entrypoints.py::test_run_up_stack_does_not_retry_after_lock tests/unit/test_script_entrypoints.py::test_run_up_stack_rejects_direct_apply_in_github_actions tests/unit/test_script_entrypoints.py::test_run_pulumi_command_unhandled_apply_failures_return_status tests/unit/test_script_entrypoints.py::test_run_pulumi_command_dispatch_propagates_apply_failures -q (4 passed)
uv run pytest tests/pulumi/test_delivery_contracts.py::test_makefile_keeps_pulumi_guardrails_secret_safe tests/pulumi/test_project_structure.py::test_issue20_cutover_manual_is_secret_safe_and_actionable -q (2 passed)
uv run pytest tests/pulumi/test_ci_guardrails.py::test_preview_guardrail_workflow_requires_preview_diff_and_iam_jobs tests/pulumi/test_ci_guardrails.py::test_well_architected_evidence_workflow_uploads_enforced_reports tests/pulumi/test_delivery_contracts.py::test_multi_account_workflows_use_fixed_aws_ci_config_contracts -q (3 passed)
uv run pytest tests/pulumi/test_project_structure.py::test_issue20_closeout_evidence_tracks_external_manual_steps -q (1 passed)
uv run ruff check scripts/run_pulumi_command.py tests/unit/test_script_entrypoints.py tests/pulumi/test_delivery_contracts.py tests/pulumi/test_project_structure.py tests/pulumi/test_ci_guardrails.py
make test-actionlint
make test-yaml
make test-secrets (latest commit scanned; no leaks found)
qlty check
git diff --check

bmalph workflow status:

bmalph doctor ran and reported the project is not initialized for bmalph/BMAD artifacts.
bmalph init --dry-run --platform codex ran to inspect the generated artifact set without adding unrelated workflow scaffolding to this PR.

Checklist

My code follows the code style of this project.
I have performed a self-review of my code.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
Local tests passed; external AWS/GitHub setup blockers are called out above.
You have only one commit. Repository rules allow squash merge into main; this branch is intentionally not force-squashed while review/check state is active.

qodo-code-review · 2026-05-24T16:04:51Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-24T16:04:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a composite GitHub Action that resolves/authenticates to Pulumi Cloud, loads and validates Pulumi ESC environmentVariables, exports AWS/Pulumi outputs for OIDC assume-role steps, implements deterministic operations-alert fingerprinting and reconcile tooling, rewires workflows to consume ESC outputs, updates IAM trust-policy generation, tests, and documentation.

Changes

Pulumi ESC-Driven CI Configuration

Layer / File(s)	Summary
ESC loader action and outputs `.github/actions/load-esc-ci-env/action.yml`	Composite action resolves Pulumi ESC target (from `.github/ci/pulumi-esc.json` or override), authenticates to Pulumi Cloud, loads ESC environmentVariables, validates required keys/patterns, and emits AWS/Pulumi outputs (`steps.esc.outputs.*`).
Committed ESC target config `.github/ci/pulumi-esc.json`	Adds committed Pulumi CI target configuration (`organization: vilnacrm-org`, `project: bootstrap-infrastructure`) used by the ESC loader action.
CI environment validator & unit tests `scripts/validate_ci_environment.py`, `tests/unit/test_validate_ci_environment.py`	CLI validator parses `--required-keys`, enforces per-key regex/shape validators (account ID, region, role ARN, backend/secrets provider, stack lists, SNS topic, metadata-only names), writes AWS_DEFAULT_REGION to GITHUB_ENV when applicable, rejects newline injection, and is thoroughly tested.
Operations alert fingerprinting & triage script `scripts/operations_alert_triage.py`, `tests/unit/test_operations_alert_triage.py`	Sanitizes nested SNS/SQS/EventBridge payloads, computes deterministic SHA-256 batch fingerprints from stable metadata, renders sanitized issue bodies, and writes body+fingerprint files for workflow consumption and dedupe.
Operations alert workflow integration `.github/workflows/operations-alert-triage.yml`	Workflow loads triage ESC config via ESC loader, assumes triage role via OIDC using ESC outputs, delegates rendering/fingerprinting to the script, searches issues by fingerprint, comments or creates accordingly, and deletes SQS messages only after success.
Legacy reconcile & cleanup workflows `.github/workflows/operations-alert-reconcile.yml`, `.github/workflows/github-environment-legacy-cleanup.yml`	Manual reconcile workflow to close legacy pre-fingerprint issues as duplicates of a canonical fingerprinted issue; manual cleanup workflow to list/delete allowlisted legacy GitHub Environment variables with dry-run and confirmation gating.
Preview / test / drift workflow rewires `.github/workflows/pulumi-pr-guardrails.yml`, `.github/workflows/pulumi-test-deploy.yml`, `.github/workflows/nightly-guardrails.yml`, `.github/workflows/well-architected-evidence.yml`, `tests/pulumi/test_ci_guardrails.py`	PR guardrails, test deploy, nightly drift, and evidence jobs now load fixed ESC environments per-purpose via the ESC loader, grant id-token permissions as needed, consume `steps.esc.outputs.*` for OIDC credential configuration, remove privileged job-level env wiring, and emit ESC-derived evidence.
Production workflows & command-runner `.github/workflows/pulumi-prod.yml`, `.github/workflows/pulumi-pr-command-runner.yml`	Prod preview/IAM validation/apply/post-apply drift jobs load `prod-preview`/`prod` ESC environments with purpose-specific required-keys, use ESC outputs for OIDC role assumption/account/region allowlists, and keep production apply gated by protected `prod` GitHub Environment approval.
Pulumi automation IAM trust policy `pulumi/infra/automation.py`, `tests/unit/test_components.py`	Trust-policy builders accept branch_name and produce branch/workflow-scoped token.subjects and explicit job_workflow_ref allowlists; operations-alert triage role is bound to a specific workflow ref; production apply retains `repo:...:environment:prod` subject for approval gating.
Structural and unit tests updated `tests/pulumi/`, `tests/unit/`	Structural tests updated to discover ESC loader steps and required-keys assertions, ensure AWS credential inputs come from `steps.esc.outputs.*`, skip SHA pin checks for local `./` actions, and add unit tests covering validator, alert triage, and IAM subject behaviors.
Documentation and specs `README.md`, `docs/`, `specs/issue-20-pulumi-esc-ci-config/`, `.github/github-actions-secrets.md`	Extensive docs/specs updated to require fixed Pulumi ESC environments (`test-pr`, `test`, `prod-preview`, `prod`), list required ESC environmentVariables, describe OIDC-first flows, migration checklist, operations-alert fingerprinting/dedupe semantics, and readiness artifacts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Configure prod-preview and prod GitHub Environment variables for PR-comment Pulumi QA #48 — ESC loader and workflow rewires supersede prior repo-environment reliance and map to that issue’s objectives.
Use AWS Secrets Manager and minimize GitHub Environment configuration #20 — This PR implements the Issue 20 Pulumi ESC CI configuration and associated tooling described there.

Possibly related PRs

VilnaCRM-Org/bootstrap-infrastructure#43 — Related operations-alert triage workflow changes; this PR centralizes ESC loading and adds fingerprint-based triage logic.
VilnaCRM-Org/bootstrap-infrastructure#19 — Overlapping CI workflow rewires that also refactor credential/OIDC handling.
VilnaCRM-Org/bootstrap-infrastructure#2 — Related prior Pulumi OIDC/trust scaffolding that these automation changes build upon.

Suggested labels

ci, pulumi, aws, documentation, needs-admin, external-control

Suggested reviewers

dmytrocraft
cubic-dev-ai

"🐰 I hopped through ESC fields with glee,
I moved secrets where they ought to be.
Fingerprints hush duplicate alarm,
Branch-scoped trust keeps production calm.
Cheers — a carrot for the CI tree!"

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/issue20-pulumi-esc

qltysh · 2026-05-24T16:06:57Z

All good ✅

cubic-dev-ai

5 issues found across 31 files

Confidence score: 2/5

High merge risk due to a concrete security concern in pulumi/infra/automation.py: broadened prod role trust to branch/PR subjects could bypass environment-based production approval controls.
scripts/validate_ci_environment.py has two user-impacting validation gaps: newline handling when writing $GITHUB_ENV (environment-file injection risk) and allowing an empty --required-keys set to pass without actually validating CI config.
Given the high severities (9/10 and 7/10) and strong confidence signals, this does not look safe to merge until these are addressed; the docs issues are lower risk but still worth fixing for operator correctness.
Pay close attention to pulumi/infra/automation.py, scripts/validate_ci_environment.py, docs/ci-guardrails.md, docs/github-actions-secrets.md - security trust boundaries, CI validation hardening, and preventing misleading/broken guidance.

_{Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Re-trigger cubic}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pulumi/infra/automation.py (1)

592-645: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Require environment:prod as the only prod subject.

Line 601 still whitelists repo:...:ref:refs/heads/{branch_name} for the production role. Combined with the workflow refs on Lines 623-645, any allowed main-branch workflow can satisfy the trust policy without ever entering the protected GitHub prod environment, which turns the manual approval gate into an optional path instead of an enforcement point.

🔐 Suggested tightening

 def _automation_assume_role_policy(
     oidc_provider_arn: str,
     org: str,
     repo_name: str,
     production_environment: str,
     branch_name: str,
 ) -> str:
     """Build the GitHub OIDC trust policy for fixed workflow automation."""
     workflow_prefix = f"{org}/{repo_name}/.github/workflows"
-    subjects = [f"repo:{org}/{repo_name}:ref:refs/heads/{branch_name}"]
     if production_environment == "prod":
-        subjects.append(f"repo:{org}/{repo_name}:environment:{production_environment}")
+        subjects = [
+            f"repo:{org}/{repo_name}:environment:{production_environment}"
+        ]
     else:
-        subjects.append(f"repo:{org}/{repo_name}:pull_request")
+        subjects = [
+            f"repo:{org}/{repo_name}:ref:refs/heads/{branch_name}",
+            f"repo:{org}/{repo_name}:pull_request",
+        ]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pulumi/infra/automation.py` around lines 592 - 645, The trust policy in
_automation_assume_role_policy currently always includes the branch ref subject
(repo:...:ref:refs/heads/{branch_name}), which allows workflows on the main
branch to assume the prod role without entering the protected GitHub
environment; change the subject construction so that when production_environment
== "prod" the subjects list contains only the environment subject
(repo:{org}/{repo_name}:environment:prod) and does not include the branch ref,
otherwise keep the existing non-prod subjects (ref and pull_request); update the
subjects variable used in the "token.actions.githubusercontent.com:sub"
condition accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/operations-alert-reconcile.yml:
- Around line 33-89: The workflow calls gh commands (functions: gh issue view
and gh issue close) without an explicit repository context so they can fail;
update every gh invocation in the script (the gh issue view calls that set
canonical_body, state, title, body and the gh issue close call) to include
--repo "${GH_REPO}" (or alternately add an actions/checkout step and export
GH_REPO before use) so the GitHub CLI has an explicit repository context at
runtime.

In `@scripts/operations_alert_triage.py`:
- Around line 138-142: main() assumes the file at args.alerts_json parses to a
dict and currently lets json errors or non-object top-level JSON raise; wrap the
JSON load in a try/except to catch json.JSONDecodeError and TypeError, validate
that the parsed alerts is a dict (isinstance(alerts, dict)), and if either the
parse fails or the result is not a dict, log a clear error mentioning
args.alerts_json and return 1 instead of raising; keep using
alerts_fingerprint(alerts) only after validation.

---

Outside diff comments:
In `@pulumi/infra/automation.py`:
- Around line 592-645: The trust policy in _automation_assume_role_policy
currently always includes the branch ref subject
(repo:...:ref:refs/heads/{branch_name}), which allows workflows on the main
branch to assume the prod role without entering the protected GitHub
environment; change the subject construction so that when production_environment
== "prod" the subjects list contains only the environment subject
(repo:{org}/{repo_name}:environment:prod) and does not include the branch ref,
otherwise keep the existing non-prod subjects (ref and pull_request); update the
subjects variable used in the "token.actions.githubusercontent.com:sub"
condition accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 73e5b8ff-0988-4aad-b921-7cfb82aabba5

📥 Commits

Reviewing files that changed from the base of the PR and between 84bf011 and 760a838.

📒 Files selected for processing (34)

.github/actions/load-esc-ci-env/action.yml
.github/ci/pulumi-esc.json
.github/github-actions-secrets.md
.github/workflows/nightly-guardrails.yml
.github/workflows/operations-alert-reconcile.yml
.github/workflows/operations-alert-triage.yml
.github/workflows/pulumi-pr-command-runner.yml
.github/workflows/pulumi-pr-guardrails.yml
.github/workflows/pulumi-prod.yml
.github/workflows/pulumi-test-deploy.yml
.github/workflows/well-architected-evidence.yml
README.md
docs/README.md
docs/alert-routing-evidence.md
docs/ci-architecture.md
docs/ci-guardrails.md
docs/github-actions-secrets.md
docs/security-operating-evidence.md
docs/sre-operations.md
pulumi/infra/automation.py
scripts/operations_alert_triage.py
scripts/validate_ci_environment.py
specs/issue-18-multi-account-test-prod-pulumi-environments/epics.md
specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
specs/issue-20-pulumi-esc-ci-config/architecture.md
specs/issue-20-pulumi-esc-ci-config/epics.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
specs/issue-20-pulumi-esc-ci-config/prd.md
tests/pulumi/test_ci_guardrails.py
tests/pulumi/test_delivery_contracts.py
tests/pulumi/test_project_structure.py
tests/unit/test_components.py
tests/unit/test_operations_alert_triage.py
tests/unit/test_validate_ci_environment.py

✅ Files skipped from review due to trivial changes (4)

README.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
docs/sre-operations.md
specs/issue-20-pulumi-esc-ci-config/epics.md

cubic-dev-ai

0 issues found across 10 files (changes from recent commits).

_{Requires human review: Auto-approval blocked by 2 unresolved issues from previous reviews.

Re-trigger cubic}

cubic-dev-ai

0 issues found across 5 files (changes from recent commits).

_{Re-trigger cubic}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/ci-guardrails.md (1)

516-527: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Complete the post-cutover checklist.

The rollout steps stop short of the actions that actually finish this migration: applying the Pulumi stacks to stamp the tightened IAM trust, running the legacy environment-variable cleanup, and deleting the temporary GH_ENVIRONMENT_ADMIN_TOKEN afterward. As written, an operator can follow this section and still leave stale AWS trust or stale privileged GitHub config behind.

📝 Suggested checklist update

 1. create the GitHub OIDC IAM roles in AWS
 2. create and populate the four Pulumi ESC environments listed above
 3. configure Pulumi Cloud OIDC for this repository and organization
-4. create only the protected `prod` GitHub Environment for production approval
-5. enable required reviewers and branch restrictions on `prod`
-6. mark the required PR checks in GitHub branch protection
-7. decide whether production repositories want stricter stack lists or narrower
+4. apply the Pulumi test/prod stacks so the updated IAM trust policies converge in AWS
+5. run the legacy GitHub Environment variable cleanup workflow (dry-run, then confirmed apply)
+6. delete the temporary `GH_ENVIRONMENT_ADMIN_TOKEN` secret after cleanup
+7. create only the protected `prod` GitHub Environment for production approval
+8. enable required reviewers and branch restrictions on `prod`
+9. mark the required PR checks in GitHub branch protection
+10. decide whether production repositories want stricter stack lists or narrower
    IAM role scopes than the template defaults

As per coding guidelines, docs/**: Update docs/ whenever the developer workflow, CI surface, or credential contract changes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ci-guardrails.md` around lines 516 - 527, Update the "Manual maintainer
follow-up" section to include the missing post-cutover steps: explicitly
instruct maintainers to apply the Pulumi stacks to enforce the new IAM trust
(reference "Pulumi ESC environments" and "apply the Pulumi stacks"), run the
legacy environment-variable cleanup, and delete the temporary
GH_ENVIRONMENT_ADMIN_TOKEN; also add a short confirmation step to verify no
stale AWS trust or privileged GitHub config remains (referencing the protected
`prod` GitHub Environment and branch protection items) so operators have a
complete checklist to finish the migration.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/operations-alert-reconcile.yml:
- Around line 59-91: The loop over LEGACY_ISSUES should first parse, trim and
deduplicate tokens and error if no valid issue IDs were produced; change the
parsing around normalized/raw_issue/issue to skip empty/whitespace-only tokens,
validate and add each numeric issue to a deduplicated collection (e.g. an
associative array or temporary list keyed by issue) and after parsing fail with
a non-zero exit if the deduped collection is empty, then iterate over the
deduped issue list (instead of the raw tokens) for the existing validation and
gh issue close logic (keeping checks that issue != canonical, state is OPEN,
title matches, and no fingerprint marker).

---

Outside diff comments:
In `@docs/ci-guardrails.md`:
- Around line 516-527: Update the "Manual maintainer follow-up" section to
include the missing post-cutover steps: explicitly instruct maintainers to apply
the Pulumi stacks to enforce the new IAM trust (reference "Pulumi ESC
environments" and "apply the Pulumi stacks"), run the legacy
environment-variable cleanup, and delete the temporary
GH_ENVIRONMENT_ADMIN_TOKEN; also add a short confirmation step to verify no
stale AWS trust or privileged GitHub config remains (referencing the protected
`prod` GitHub Environment and branch protection items) so operators have a
complete checklist to finish the migration.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e368df4c-8673-4013-b117-26c62db7549a

📥 Commits

Reviewing files that changed from the base of the PR and between 760a838 and 378c21e.

📒 Files selected for processing (35)

.github/actions/load-esc-ci-env/action.yml
.github/ci/pulumi-esc.json
.github/github-actions-secrets.md
.github/workflows/github-environment-legacy-cleanup.yml
.github/workflows/nightly-guardrails.yml
.github/workflows/operations-alert-reconcile.yml
.github/workflows/operations-alert-triage.yml
.github/workflows/pulumi-pr-command-runner.yml
.github/workflows/pulumi-pr-guardrails.yml
.github/workflows/pulumi-prod.yml
.github/workflows/pulumi-test-deploy.yml
.github/workflows/well-architected-evidence.yml
README.md
docs/README.md
docs/alert-routing-evidence.md
docs/ci-architecture.md
docs/ci-guardrails.md
docs/github-actions-secrets.md
docs/security-operating-evidence.md
docs/sre-operations.md
pulumi/infra/automation.py
scripts/operations_alert_triage.py
scripts/validate_ci_environment.py
specs/issue-18-multi-account-test-prod-pulumi-environments/epics.md
specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
specs/issue-20-pulumi-esc-ci-config/architecture.md
specs/issue-20-pulumi-esc-ci-config/epics.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
specs/issue-20-pulumi-esc-ci-config/prd.md
tests/pulumi/test_ci_guardrails.py
tests/pulumi/test_delivery_contracts.py
tests/pulumi/test_project_structure.py
tests/unit/test_components.py
tests/unit/test_operations_alert_triage.py
tests/unit/test_validate_ci_environment.py

✅ Files skipped from review due to trivial changes (6)

.github/ci/pulumi-esc.json
specs/issue-20-pulumi-esc-ci-config/epics.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
README.md
docs/sre-operations.md
docs/alert-routing-evidence.md

Kravalg · 2026-05-24T18:52:24Z

All currently visible review threads have been addressed in 647c160768b118e710207577d5603e488c15875f.

Specifically:

operations-alert-reconcile.yml now parses, deduplicates, and rejects empty legacy issue input before closing anything.
docs/ci-guardrails.md now includes the post-cutover Pulumi stack apply, legacy GitHub Environment cleanup, temporary token deletion, and stale-trust/config verification steps.
Focused validation plus project-structure, Ruff, YAML, actionlint, and diff checks passed locally.

@coderabbitai please re-review the latest head.

coderabbitai · 2026-05-24T18:52:38Z

@Kravalg Thanks for the detailed summary of the addressed threads. I'll kick off a full re-review against the latest head now.

✅ Actions performed

Full review triggered.

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/operations-alert-triage.yml (1)

123-129: ⚡ Quick win

Use uv run for the triage Python CLI invocation.

Switching this call to uv run keeps interpreter/dependency resolution consistent with the repo’s Python command contract.

♻️ Proposed change

-          python3 scripts/operations_alert_triage.py \
+          uv run python scripts/operations_alert_triage.py \
             --alerts-json "${alerts_json}" \
             --queue-name "${OPERATIONS_ALERT_QUEUE_NAME}" \
             --account-id "${AWS_ACCOUNT_ID}" \
             --region "${AWS_REGION}" \
             --body-file "${body_file}" \
             --fingerprint-file "${fingerprint_file}"

As per coding guidelines, "Use uv run ... for Python CLI commands instead of invoking tools directly from the global environment."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/operations-alert-triage.yml around lines 123 - 129,
Replace the direct python3 invocation of the triage CLI with the repository
Python runner; specifically change the call that runs operations_alert_triage.py
(the block invoking "python3 scripts/operations_alert_triage.py ...") to use "uv
run" so the command becomes "uv run scripts/operations_alert_triage.py ..."
ensuring interpreter and dependency resolution follow the repo's Python command
contract.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/operations-alert-triage.yml:
- Around line 123-129: Replace the direct python3 invocation of the triage CLI
with the repository Python runner; specifically change the call that runs
operations_alert_triage.py (the block invoking "python3
scripts/operations_alert_triage.py ...") to use "uv run" so the command becomes
"uv run scripts/operations_alert_triage.py ..." ensuring interpreter and
dependency resolution follow the repo's Python command contract.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb1ae087-5e62-43d9-846c-f922c467a76e

📥 Commits

Reviewing files that changed from the base of the PR and between d301569 and 647c160.

📒 Files selected for processing (35)

.github/actions/load-esc-ci-env/action.yml
.github/ci/pulumi-esc.json
.github/github-actions-secrets.md
.github/workflows/github-environment-legacy-cleanup.yml
.github/workflows/nightly-guardrails.yml
.github/workflows/operations-alert-reconcile.yml
.github/workflows/operations-alert-triage.yml
.github/workflows/pulumi-pr-command-runner.yml
.github/workflows/pulumi-pr-guardrails.yml
.github/workflows/pulumi-prod.yml
.github/workflows/pulumi-test-deploy.yml
.github/workflows/well-architected-evidence.yml
README.md
docs/README.md
docs/alert-routing-evidence.md
docs/ci-architecture.md
docs/ci-guardrails.md
docs/github-actions-secrets.md
docs/security-operating-evidence.md
docs/sre-operations.md
pulumi/infra/automation.py
scripts/operations_alert_triage.py
scripts/validate_ci_environment.py
specs/issue-18-multi-account-test-prod-pulumi-environments/epics.md
specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
specs/issue-20-pulumi-esc-ci-config/architecture.md
specs/issue-20-pulumi-esc-ci-config/epics.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
specs/issue-20-pulumi-esc-ci-config/prd.md
tests/pulumi/test_ci_guardrails.py
tests/pulumi/test_delivery_contracts.py
tests/pulumi/test_project_structure.py
tests/unit/test_components.py
tests/unit/test_operations_alert_triage.py
tests/unit/test_validate_ci_environment.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.github/actions/load-esc-ci-env/action.yml (1)

95-97: 💤 Low value

Consider using uv run for the validation script.

The coding guidelines specify using uv run ... for Python CLI commands. The validation script is invoked directly with python3.

♻️ Suggested change

-        python3 scripts/validate_ci_environment.py \
+        uv run python scripts/validate_ci_environment.py \
           --purpose "${{ inputs.purpose }}" \
           --required-keys "${{ inputs.required-keys }}"

As per coding guidelines: **/*.{py,sh}: Use uv run ... for Python CLI commands instead of invoking tools directly from the global environment.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/load-esc-ci-env/action.yml around lines 95 - 97, Replace the
direct python3 invocation of the validation script with the project's task
runner by calling "uv run" for the CLI command: change the step that runs
"python3 scripts/validate_ci_environment.py --purpose \"${{ inputs.purpose }}\"
--required-keys \"${{ inputs.required-keys }}\"" to use "uv run" so the script
is executed via the project's uv task runner; update the command that references
scripts/validate_ci_environment.py and the inputs purpose/required-keys
accordingly.

tests/pulumi/test_project_structure.py (1)

316-366: ⚡ Quick win

Normalize trigger access instead of indexing workflow[True] directly.

Line 316 and Line 358 assume PyYAML always converts on to True. If a workflow uses quoted "on", these tests fail for the wrong reason. Use the same normalized fallback used elsewhere in this suite.

Proposed fix

-    assert "workflow_dispatch" in reconcile_workflow[True]  # nosec B101
+    reconcile_triggers = reconcile_workflow.get("on", reconcile_workflow.get(True, {}))
+    assert "workflow_dispatch" in reconcile_triggers  # nosec B101
@@
-    assert "workflow_dispatch" in cleanup_workflow[True]  # nosec B101
+    cleanup_triggers = cleanup_workflow.get("on", cleanup_workflow.get(True, {}))
+    assert "workflow_dispatch" in cleanup_triggers  # nosec B101
@@
-    assert cleanup_workflow[True]["workflow_dispatch"]["inputs"]["dry_run"][  # nosec B101
+    assert cleanup_triggers["workflow_dispatch"]["inputs"]["dry_run"][  # nosec B101
         "default"
     ]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/pulumi/test_project_structure.py` around lines 316 - 366, The tests
index the parsed YAML with True (reconcile_workflow[True],
cleanup_workflow[True]) which breaks when PyYAML preserves the "on" key as the
string "on"; update those assertions to use the same normalized fallback used
elsewhere: read the trigger via reconcile_workflow.get("on",
reconcile_workflow.get(True)) and cleanup_workflow.get("on",
cleanup_workflow.get(True)) (use that value for checks like workflow_dispatch
and inputs), replacing direct [True] indexing on reconcile_workflow and
cleanup_workflow.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/operations_alert_triage.py`:
- Around line 58-66: The code assumes incoming "Messages" is a list of dicts and
directly accesses message.get(...) which will crash for non-list or non-dict
payloads; update the message-shape handling so you first validate that Messages
is a list and iterate only over items that are dicts (skip or coerce others),
and ensure local variables like message, attributes, sns, and event are always
dicts (default to {} when missing) before calling .get; apply this defensive
pattern to the blocks building the return dict (references: variables message,
attributes, sns, event and the code paths around lines 58-66 and the regions
noted 90-97 and 105-129) so malformed payloads are skipped or safely handled
instead of raising.

---

Nitpick comments:
In @.github/actions/load-esc-ci-env/action.yml:
- Around line 95-97: Replace the direct python3 invocation of the validation
script with the project's task runner by calling "uv run" for the CLI command:
change the step that runs "python3 scripts/validate_ci_environment.py --purpose
\"${{ inputs.purpose }}\" --required-keys \"${{ inputs.required-keys }}\"" to
use "uv run" so the script is executed via the project's uv task runner; update
the command that references scripts/validate_ci_environment.py and the inputs
purpose/required-keys accordingly.

In `@tests/pulumi/test_project_structure.py`:
- Around line 316-366: The tests index the parsed YAML with True
(reconcile_workflow[True], cleanup_workflow[True]) which breaks when PyYAML
preserves the "on" key as the string "on"; update those assertions to use the
same normalized fallback used elsewhere: read the trigger via
reconcile_workflow.get("on", reconcile_workflow.get(True)) and
cleanup_workflow.get("on", cleanup_workflow.get(True)) (use that value for
checks like workflow_dispatch and inputs), replacing direct [True] indexing on
reconcile_workflow and cleanup_workflow.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9cfc1992-ca5f-4e88-9b7d-9838e3472f10

📥 Commits

Reviewing files that changed from the base of the PR and between 647c160 and a95ab85.

📒 Files selected for processing (35)

.github/actions/load-esc-ci-env/action.yml
.github/ci/pulumi-esc.json
.github/github-actions-secrets.md
.github/workflows/github-environment-legacy-cleanup.yml
.github/workflows/nightly-guardrails.yml
.github/workflows/operations-alert-reconcile.yml
.github/workflows/operations-alert-triage.yml
.github/workflows/pulumi-pr-command-runner.yml
.github/workflows/pulumi-pr-guardrails.yml
.github/workflows/pulumi-prod.yml
.github/workflows/pulumi-test-deploy.yml
.github/workflows/well-architected-evidence.yml
README.md
docs/README.md
docs/alert-routing-evidence.md
docs/ci-architecture.md
docs/ci-guardrails.md
docs/github-actions-secrets.md
docs/security-operating-evidence.md
docs/sre-operations.md
pulumi/infra/automation.py
scripts/operations_alert_triage.py
scripts/validate_ci_environment.py
specs/issue-18-multi-account-test-prod-pulumi-environments/epics.md
specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
specs/issue-20-pulumi-esc-ci-config/architecture.md
specs/issue-20-pulumi-esc-ci-config/epics.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
specs/issue-20-pulumi-esc-ci-config/prd.md
tests/pulumi/test_ci_guardrails.py
tests/pulumi/test_delivery_contracts.py
tests/pulumi/test_project_structure.py
tests/unit/test_components.py
tests/unit/test_operations_alert_triage.py
tests/unit/test_validate_ci_environment.py

✅ Files skipped from review due to trivial changes (7)

specs/issue-20-pulumi-esc-ci-config/epics.md
README.md
specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
docs/sre-operations.md
specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
specs/issue-20-pulumi-esc-ci-config/architecture.md
docs/alert-routing-evidence.md

cubic-dev-ai · 2026-05-24T21:57:06Z

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

qltysh Bot reviewed May 24, 2026

View reviewed changes

Comment thread scripts/operations_alert_triage.py Outdated

Comment thread scripts/validate_ci_environment.py Outdated

Comment thread scripts/validate_ci_environment.py Outdated

Comment thread scripts/validate_ci_environment.py Outdated

Comment thread scripts/validate_ci_environment.py Outdated

cubic-dev-ai Bot reviewed May 24, 2026

View reviewed changes

Comment thread pulumi/infra/automation.py Outdated

Comment thread scripts/validate_ci_environment.py

Comment thread scripts/validate_ci_environment.py

Comment thread docs/ci-guardrails.md

Comment thread docs/github-actions-secrets.md

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch from 5389d3e to 84bf011 Compare May 24, 2026 16:12

coderabbitai Bot approved these changes May 24, 2026

View reviewed changes

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch from 84bf011 to 90d2f56 Compare May 24, 2026 16:48

Kravalg mentioned this pull request May 24, 2026

Configure prod-preview and prod GitHub Environment variables for PR-comment Pulumi QA #48

Closed

11 tasks

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch from 90d2f56 to 760a838 Compare May 24, 2026 17:19

coderabbitai Bot requested changes May 24, 2026

View reviewed changes

Comment thread .github/workflows/operations-alert-reconcile.yml

Comment thread scripts/operations_alert_triage.py

cubic-dev-ai Bot reviewed May 24, 2026

View reviewed changes

coderabbitai Bot approved these changes May 24, 2026

View reviewed changes

cubic-dev-ai Bot approved these changes May 24, 2026

View reviewed changes

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch 3 times, most recently from 33b2fea to 378c21e Compare May 24, 2026 18:33

coderabbitai Bot requested changes May 24, 2026

View reviewed changes

Comment thread .github/workflows/operations-alert-reconcile.yml

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch from 378c21e to 647c160 Compare May 24, 2026 18:48

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

coderabbitai Bot approved these changes May 24, 2026

View reviewed changes

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch 2 times, most recently from b638d1e to a95ab85 Compare May 24, 2026 19:57

coderabbitai Bot requested changes May 24, 2026

View reviewed changes

Comment thread scripts/operations_alert_triage.py Outdated

dmytrocraft force-pushed the codex/issue20-pulumi-esc branch 3 times, most recently from defa592 to a810515 Compare May 24, 2026 20:51

qltysh Bot reviewed May 24, 2026

View reviewed changes

Comment thread pulumi/infra/bootstrap_infrastructure.py

Manage alert reconcile environment controls

5eb202e

dmytrocraft added 8 commits May 25, 2026 01:08

Refresh ESC source boundary evidence

0321fa7

Clarify audited closeout head

6dcf615

Clarify ESC uses AWS Secrets Manager source

fd313a4

Refresh closeout evidence and alert reconcile guard

fa627d8

Avoid self-referential closeout check status

5302cd3

Require SRE evidence for alert reconcile

1c2df7c

Document ESC Secrets Manager cutover

47fc724

Replace Pulumi ESC CI config with AWS Secrets Manager

38dd8a7

github-advanced-security AI found potential problems May 25, 2026

View reviewed changes

Comment thread scripts/validate_ci_environment.py Fixed

dmytrocraft added 4 commits May 25, 2026 17:00

Cover AWS CI config validation branches

e7f12cf

Remove direct apply fallback from test deploys

7b6fd47

Document explicit AWS CI role requirements

c20e7ef

Remove stale ESC and apply fallback docs

72bd9eb

Kravalg changed the title ~~Move privileged CI config to Pulumi ESC~~ Use AWS Secrets Manager for privileged CI config May 25, 2026

dmytrocraft added 4 commits May 25, 2026 17:20

Add operations alert canonical backfill

42228c7

Fix evidence workflow CI config trust

93c1d14

Stop logging CI secret identifiers

e0242f0

Refresh AWS CI closeout evidence

ce74d6e

qltysh Bot reviewed May 25, 2026

View reviewed changes

Comment thread .github/workflows/operations-alert-backfill.yml

dmytrocraft added 8 commits May 25, 2026 17:41

Limit operations alert backfill inputs

d406514

Remove Pulumi Cloud and direct apply fallbacks

622e9aa

Refresh AWS cutover evidence and docs

6f51606

Forbid direct Pulumi up in GitHub Actions

95bcb9e

Refresh AWS cutover closeout evidence

9ca40ab

Fail fast on missing PR AWS config role

1470d28

Refresh evidence for PR config role guard

b5d005b

Format AWS config contract test

7f89bb4

Kravalg mentioned this pull request May 25, 2026

Add GitHub CI AWS bootstrap stack #60

Open

12 tasks

Conversation

Kravalg commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Description

Related Issues

Manual Secure Follow-Up

Current External Setup State

How Has This Been Tested?

Checklist

Uh oh!

qodo-code-review Bot commented May 24, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

qltysh Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Kravalg commented May 24, 2026

Uh oh!

coderabbitai Bot commented May 24, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai Bot commented May 24, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Kravalg commented May 24, 2026 •

edited

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading

qltysh Bot commented May 24, 2026 •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading