Use AWS Secrets Manager for privileged CI config#57
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a composite GitHub Action that resolves/authenticates to Pulumi Cloud, loads and validates Pulumi ESC environmentVariables, exports AWS/Pulumi outputs for OIDC assume-role steps, implements deterministic operations-alert fingerprinting and reconcile tooling, rewires workflows to consume ESC outputs, updates IAM trust-policy generation, tests, and documentation. ChangesPulumi ESC-Driven CI Configuration
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
5 issues found across 31 files
Confidence score: 2/5
- High merge risk due to a concrete security concern in
pulumi/infra/automation.py: broadened prod role trust to branch/PR subjects could bypass environment-based production approval controls. scripts/validate_ci_environment.pyhas two user-impacting validation gaps: newline handling when writing$GITHUB_ENV(environment-file injection risk) and allowing an empty--required-keysset to pass without actually validating CI config.- Given the high severities (9/10 and 7/10) and strong confidence signals, this does not look safe to merge until these are addressed; the docs issues are lower risk but still worth fixing for operator correctness.
- Pay close attention to
pulumi/infra/automation.py,scripts/validate_ci_environment.py,docs/ci-guardrails.md,docs/github-actions-secrets.md- security trust boundaries, CI validation hardening, and preventing misleading/broken guidance.
Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.
Re-trigger cubic
5389d3e to
84bf011
Compare
84bf011 to
90d2f56
Compare
90d2f56 to
760a838
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pulumi/infra/automation.py (1)
592-645:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRequire
environment:prodas the only prod subject.Line 601 still whitelists
repo:...:ref:refs/heads/{branch_name}for the production role. Combined with the workflow refs on Lines 623-645, any allowed main-branch workflow can satisfy the trust policy without ever entering the protected GitHubprodenvironment, which turns the manual approval gate into an optional path instead of an enforcement point.🔐 Suggested tightening
def _automation_assume_role_policy( oidc_provider_arn: str, org: str, repo_name: str, production_environment: str, branch_name: str, ) -> str: """Build the GitHub OIDC trust policy for fixed workflow automation.""" workflow_prefix = f"{org}/{repo_name}/.github/workflows" - subjects = [f"repo:{org}/{repo_name}:ref:refs/heads/{branch_name}"] if production_environment == "prod": - subjects.append(f"repo:{org}/{repo_name}:environment:{production_environment}") + subjects = [ + f"repo:{org}/{repo_name}:environment:{production_environment}" + ] else: - subjects.append(f"repo:{org}/{repo_name}:pull_request") + subjects = [ + f"repo:{org}/{repo_name}:ref:refs/heads/{branch_name}", + f"repo:{org}/{repo_name}:pull_request", + ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pulumi/infra/automation.py` around lines 592 - 645, The trust policy in _automation_assume_role_policy currently always includes the branch ref subject (repo:...:ref:refs/heads/{branch_name}), which allows workflows on the main branch to assume the prod role without entering the protected GitHub environment; change the subject construction so that when production_environment == "prod" the subjects list contains only the environment subject (repo:{org}/{repo_name}:environment:prod) and does not include the branch ref, otherwise keep the existing non-prod subjects (ref and pull_request); update the subjects variable used in the "token.actions.githubusercontent.com:sub" condition accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/operations-alert-reconcile.yml:
- Around line 33-89: The workflow calls gh commands (functions: gh issue view
and gh issue close) without an explicit repository context so they can fail;
update every gh invocation in the script (the gh issue view calls that set
canonical_body, state, title, body and the gh issue close call) to include
--repo "${GH_REPO}" (or alternately add an actions/checkout step and export
GH_REPO before use) so the GitHub CLI has an explicit repository context at
runtime.
In `@scripts/operations_alert_triage.py`:
- Around line 138-142: main() assumes the file at args.alerts_json parses to a
dict and currently lets json errors or non-object top-level JSON raise; wrap the
JSON load in a try/except to catch json.JSONDecodeError and TypeError, validate
that the parsed alerts is a dict (isinstance(alerts, dict)), and if either the
parse fails or the result is not a dict, log a clear error mentioning
args.alerts_json and return 1 instead of raising; keep using
alerts_fingerprint(alerts) only after validation.
---
Outside diff comments:
In `@pulumi/infra/automation.py`:
- Around line 592-645: The trust policy in _automation_assume_role_policy
currently always includes the branch ref subject
(repo:...:ref:refs/heads/{branch_name}), which allows workflows on the main
branch to assume the prod role without entering the protected GitHub
environment; change the subject construction so that when production_environment
== "prod" the subjects list contains only the environment subject
(repo:{org}/{repo_name}:environment:prod) and does not include the branch ref,
otherwise keep the existing non-prod subjects (ref and pull_request); update the
subjects variable used in the "token.actions.githubusercontent.com:sub"
condition accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 73e5b8ff-0988-4aad-b921-7cfb82aabba5
📒 Files selected for processing (34)
.github/actions/load-esc-ci-env/action.yml.github/ci/pulumi-esc.json.github/github-actions-secrets.md.github/workflows/nightly-guardrails.yml.github/workflows/operations-alert-reconcile.yml.github/workflows/operations-alert-triage.yml.github/workflows/pulumi-pr-command-runner.yml.github/workflows/pulumi-pr-guardrails.yml.github/workflows/pulumi-prod.yml.github/workflows/pulumi-test-deploy.yml.github/workflows/well-architected-evidence.ymlREADME.mddocs/README.mddocs/alert-routing-evidence.mddocs/ci-architecture.mddocs/ci-guardrails.mddocs/github-actions-secrets.mddocs/security-operating-evidence.mddocs/sre-operations.mdpulumi/infra/automation.pyscripts/operations_alert_triage.pyscripts/validate_ci_environment.pyspecs/issue-18-multi-account-test-prod-pulumi-environments/epics.mdspecs/issue-18-multi-account-test-prod-pulumi-environments/prd.mdspecs/issue-20-pulumi-esc-ci-config/architecture.mdspecs/issue-20-pulumi-esc-ci-config/epics.mdspecs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.mdspecs/issue-20-pulumi-esc-ci-config/prd.mdtests/pulumi/test_ci_guardrails.pytests/pulumi/test_delivery_contracts.pytests/pulumi/test_project_structure.pytests/unit/test_components.pytests/unit/test_operations_alert_triage.pytests/unit/test_validate_ci_environment.py
✅ Files skipped from review due to trivial changes (4)
- README.md
- specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
- docs/sre-operations.md
- specs/issue-20-pulumi-esc-ci-config/epics.md
There was a problem hiding this comment.
0 issues found across 10 files (changes from recent commits).
Requires human review: Auto-approval blocked by 2 unresolved issues from previous reviews.
Re-trigger cubic
33b2fea to
378c21e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/ci-guardrails.md (1)
516-527:⚠️ Potential issue | 🟠 Major | ⚡ Quick winComplete the post-cutover checklist.
The rollout steps stop short of the actions that actually finish this migration: applying the Pulumi stacks to stamp the tightened IAM trust, running the legacy environment-variable cleanup, and deleting the temporary
GH_ENVIRONMENT_ADMIN_TOKENafterward. As written, an operator can follow this section and still leave stale AWS trust or stale privileged GitHub config behind.📝 Suggested checklist update
1. create the GitHub OIDC IAM roles in AWS 2. create and populate the four Pulumi ESC environments listed above 3. configure Pulumi Cloud OIDC for this repository and organization -4. create only the protected `prod` GitHub Environment for production approval -5. enable required reviewers and branch restrictions on `prod` -6. mark the required PR checks in GitHub branch protection -7. decide whether production repositories want stricter stack lists or narrower +4. apply the Pulumi test/prod stacks so the updated IAM trust policies converge in AWS +5. run the legacy GitHub Environment variable cleanup workflow (dry-run, then confirmed apply) +6. delete the temporary `GH_ENVIRONMENT_ADMIN_TOKEN` secret after cleanup +7. create only the protected `prod` GitHub Environment for production approval +8. enable required reviewers and branch restrictions on `prod` +9. mark the required PR checks in GitHub branch protection +10. decide whether production repositories want stricter stack lists or narrower IAM role scopes than the template defaultsAs per coding guidelines,
docs/**: Updatedocs/whenever the developer workflow, CI surface, or credential contract changes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ci-guardrails.md` around lines 516 - 527, Update the "Manual maintainer follow-up" section to include the missing post-cutover steps: explicitly instruct maintainers to apply the Pulumi stacks to enforce the new IAM trust (reference "Pulumi ESC environments" and "apply the Pulumi stacks"), run the legacy environment-variable cleanup, and delete the temporary GH_ENVIRONMENT_ADMIN_TOKEN; also add a short confirmation step to verify no stale AWS trust or privileged GitHub config remains (referencing the protected `prod` GitHub Environment and branch protection items) so operators have a complete checklist to finish the migration.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/operations-alert-reconcile.yml:
- Around line 59-91: The loop over LEGACY_ISSUES should first parse, trim and
deduplicate tokens and error if no valid issue IDs were produced; change the
parsing around normalized/raw_issue/issue to skip empty/whitespace-only tokens,
validate and add each numeric issue to a deduplicated collection (e.g. an
associative array or temporary list keyed by issue) and after parsing fail with
a non-zero exit if the deduped collection is empty, then iterate over the
deduped issue list (instead of the raw tokens) for the existing validation and
gh issue close logic (keeping checks that issue != canonical, state is OPEN,
title matches, and no fingerprint marker).
---
Outside diff comments:
In `@docs/ci-guardrails.md`:
- Around line 516-527: Update the "Manual maintainer follow-up" section to
include the missing post-cutover steps: explicitly instruct maintainers to apply
the Pulumi stacks to enforce the new IAM trust (reference "Pulumi ESC
environments" and "apply the Pulumi stacks"), run the legacy
environment-variable cleanup, and delete the temporary
GH_ENVIRONMENT_ADMIN_TOKEN; also add a short confirmation step to verify no
stale AWS trust or privileged GitHub config remains (referencing the protected
`prod` GitHub Environment and branch protection items) so operators have a
complete checklist to finish the migration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e368df4c-8673-4013-b117-26c62db7549a
📒 Files selected for processing (35)
.github/actions/load-esc-ci-env/action.yml.github/ci/pulumi-esc.json.github/github-actions-secrets.md.github/workflows/github-environment-legacy-cleanup.yml.github/workflows/nightly-guardrails.yml.github/workflows/operations-alert-reconcile.yml.github/workflows/operations-alert-triage.yml.github/workflows/pulumi-pr-command-runner.yml.github/workflows/pulumi-pr-guardrails.yml.github/workflows/pulumi-prod.yml.github/workflows/pulumi-test-deploy.yml.github/workflows/well-architected-evidence.ymlREADME.mddocs/README.mddocs/alert-routing-evidence.mddocs/ci-architecture.mddocs/ci-guardrails.mddocs/github-actions-secrets.mddocs/security-operating-evidence.mddocs/sre-operations.mdpulumi/infra/automation.pyscripts/operations_alert_triage.pyscripts/validate_ci_environment.pyspecs/issue-18-multi-account-test-prod-pulumi-environments/epics.mdspecs/issue-18-multi-account-test-prod-pulumi-environments/prd.mdspecs/issue-20-pulumi-esc-ci-config/architecture.mdspecs/issue-20-pulumi-esc-ci-config/epics.mdspecs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.mdspecs/issue-20-pulumi-esc-ci-config/prd.mdtests/pulumi/test_ci_guardrails.pytests/pulumi/test_delivery_contracts.pytests/pulumi/test_project_structure.pytests/unit/test_components.pytests/unit/test_operations_alert_triage.pytests/unit/test_validate_ci_environment.py
✅ Files skipped from review due to trivial changes (6)
- .github/ci/pulumi-esc.json
- specs/issue-20-pulumi-esc-ci-config/epics.md
- specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
- README.md
- docs/sre-operations.md
- docs/alert-routing-evidence.md
378c21e to
647c160
Compare
|
All currently visible review threads have been addressed in Specifically:
@coderabbitai please re-review the latest head. |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/operations-alert-triage.yml (1)
123-129: ⚡ Quick winUse
uv runfor the triage Python CLI invocation.Switching this call to
uv runkeeps interpreter/dependency resolution consistent with the repo’s Python command contract.♻️ Proposed change
- python3 scripts/operations_alert_triage.py \ + uv run python scripts/operations_alert_triage.py \ --alerts-json "${alerts_json}" \ --queue-name "${OPERATIONS_ALERT_QUEUE_NAME}" \ --account-id "${AWS_ACCOUNT_ID}" \ --region "${AWS_REGION}" \ --body-file "${body_file}" \ --fingerprint-file "${fingerprint_file}"As per coding guidelines, "Use
uv run ...for Python CLI commands instead of invoking tools directly from the global environment."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/operations-alert-triage.yml around lines 123 - 129, Replace the direct python3 invocation of the triage CLI with the repository Python runner; specifically change the call that runs operations_alert_triage.py (the block invoking "python3 scripts/operations_alert_triage.py ...") to use "uv run" so the command becomes "uv run scripts/operations_alert_triage.py ..." ensuring interpreter and dependency resolution follow the repo's Python command contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/operations-alert-triage.yml:
- Around line 123-129: Replace the direct python3 invocation of the triage CLI
with the repository Python runner; specifically change the call that runs
operations_alert_triage.py (the block invoking "python3
scripts/operations_alert_triage.py ...") to use "uv run" so the command becomes
"uv run scripts/operations_alert_triage.py ..." ensuring interpreter and
dependency resolution follow the repo's Python command contract.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb1ae087-5e62-43d9-846c-f922c467a76e
📒 Files selected for processing (35)
.github/actions/load-esc-ci-env/action.yml.github/ci/pulumi-esc.json.github/github-actions-secrets.md.github/workflows/github-environment-legacy-cleanup.yml.github/workflows/nightly-guardrails.yml.github/workflows/operations-alert-reconcile.yml.github/workflows/operations-alert-triage.yml.github/workflows/pulumi-pr-command-runner.yml.github/workflows/pulumi-pr-guardrails.yml.github/workflows/pulumi-prod.yml.github/workflows/pulumi-test-deploy.yml.github/workflows/well-architected-evidence.ymlREADME.mddocs/README.mddocs/alert-routing-evidence.mddocs/ci-architecture.mddocs/ci-guardrails.mddocs/github-actions-secrets.mddocs/security-operating-evidence.mddocs/sre-operations.mdpulumi/infra/automation.pyscripts/operations_alert_triage.pyscripts/validate_ci_environment.pyspecs/issue-18-multi-account-test-prod-pulumi-environments/epics.mdspecs/issue-18-multi-account-test-prod-pulumi-environments/prd.mdspecs/issue-20-pulumi-esc-ci-config/architecture.mdspecs/issue-20-pulumi-esc-ci-config/epics.mdspecs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.mdspecs/issue-20-pulumi-esc-ci-config/prd.mdtests/pulumi/test_ci_guardrails.pytests/pulumi/test_delivery_contracts.pytests/pulumi/test_project_structure.pytests/unit/test_components.pytests/unit/test_operations_alert_triage.pytests/unit/test_validate_ci_environment.py
b638d1e to
a95ab85
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/actions/load-esc-ci-env/action.yml (1)
95-97: 💤 Low valueConsider using
uv runfor the validation script.The coding guidelines specify using
uv run ...for Python CLI commands. The validation script is invoked directly withpython3.♻️ Suggested change
- python3 scripts/validate_ci_environment.py \ + uv run python scripts/validate_ci_environment.py \ --purpose "${{ inputs.purpose }}" \ --required-keys "${{ inputs.required-keys }}"As per coding guidelines:
**/*.{py,sh}: Useuv run ...for Python CLI commands instead of invoking tools directly from the global environment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/actions/load-esc-ci-env/action.yml around lines 95 - 97, Replace the direct python3 invocation of the validation script with the project's task runner by calling "uv run" for the CLI command: change the step that runs "python3 scripts/validate_ci_environment.py --purpose \"${{ inputs.purpose }}\" --required-keys \"${{ inputs.required-keys }}\"" to use "uv run" so the script is executed via the project's uv task runner; update the command that references scripts/validate_ci_environment.py and the inputs purpose/required-keys accordingly.tests/pulumi/test_project_structure.py (1)
316-366: ⚡ Quick winNormalize trigger access instead of indexing
workflow[True]directly.Line 316 and Line 358 assume PyYAML always converts
ontoTrue. If a workflow uses quoted"on", these tests fail for the wrong reason. Use the same normalized fallback used elsewhere in this suite.Proposed fix
- assert "workflow_dispatch" in reconcile_workflow[True] # nosec B101 + reconcile_triggers = reconcile_workflow.get("on", reconcile_workflow.get(True, {})) + assert "workflow_dispatch" in reconcile_triggers # nosec B101 @@ - assert "workflow_dispatch" in cleanup_workflow[True] # nosec B101 + cleanup_triggers = cleanup_workflow.get("on", cleanup_workflow.get(True, {})) + assert "workflow_dispatch" in cleanup_triggers # nosec B101 @@ - assert cleanup_workflow[True]["workflow_dispatch"]["inputs"]["dry_run"][ # nosec B101 + assert cleanup_triggers["workflow_dispatch"]["inputs"]["dry_run"][ # nosec B101 "default" ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/pulumi/test_project_structure.py` around lines 316 - 366, The tests index the parsed YAML with True (reconcile_workflow[True], cleanup_workflow[True]) which breaks when PyYAML preserves the "on" key as the string "on"; update those assertions to use the same normalized fallback used elsewhere: read the trigger via reconcile_workflow.get("on", reconcile_workflow.get(True)) and cleanup_workflow.get("on", cleanup_workflow.get(True)) (use that value for checks like workflow_dispatch and inputs), replacing direct [True] indexing on reconcile_workflow and cleanup_workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/operations_alert_triage.py`:
- Around line 58-66: The code assumes incoming "Messages" is a list of dicts and
directly accesses message.get(...) which will crash for non-list or non-dict
payloads; update the message-shape handling so you first validate that Messages
is a list and iterate only over items that are dicts (skip or coerce others),
and ensure local variables like message, attributes, sns, and event are always
dicts (default to {} when missing) before calling .get; apply this defensive
pattern to the blocks building the return dict (references: variables message,
attributes, sns, event and the code paths around lines 58-66 and the regions
noted 90-97 and 105-129) so malformed payloads are skipped or safely handled
instead of raising.
---
Nitpick comments:
In @.github/actions/load-esc-ci-env/action.yml:
- Around line 95-97: Replace the direct python3 invocation of the validation
script with the project's task runner by calling "uv run" for the CLI command:
change the step that runs "python3 scripts/validate_ci_environment.py --purpose
\"${{ inputs.purpose }}\" --required-keys \"${{ inputs.required-keys }}\"" to
use "uv run" so the script is executed via the project's uv task runner; update
the command that references scripts/validate_ci_environment.py and the inputs
purpose/required-keys accordingly.
In `@tests/pulumi/test_project_structure.py`:
- Around line 316-366: The tests index the parsed YAML with True
(reconcile_workflow[True], cleanup_workflow[True]) which breaks when PyYAML
preserves the "on" key as the string "on"; update those assertions to use the
same normalized fallback used elsewhere: read the trigger via
reconcile_workflow.get("on", reconcile_workflow.get(True)) and
cleanup_workflow.get("on", cleanup_workflow.get(True)) (use that value for
checks like workflow_dispatch and inputs), replacing direct [True] indexing on
reconcile_workflow and cleanup_workflow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9cfc1992-ca5f-4e88-9b7d-9838e3472f10
📒 Files selected for processing (35)
.github/actions/load-esc-ci-env/action.yml.github/ci/pulumi-esc.json.github/github-actions-secrets.md.github/workflows/github-environment-legacy-cleanup.yml.github/workflows/nightly-guardrails.yml.github/workflows/operations-alert-reconcile.yml.github/workflows/operations-alert-triage.yml.github/workflows/pulumi-pr-command-runner.yml.github/workflows/pulumi-pr-guardrails.yml.github/workflows/pulumi-prod.yml.github/workflows/pulumi-test-deploy.yml.github/workflows/well-architected-evidence.ymlREADME.mddocs/README.mddocs/alert-routing-evidence.mddocs/ci-architecture.mddocs/ci-guardrails.mddocs/github-actions-secrets.mddocs/security-operating-evidence.mddocs/sre-operations.mdpulumi/infra/automation.pyscripts/operations_alert_triage.pyscripts/validate_ci_environment.pyspecs/issue-18-multi-account-test-prod-pulumi-environments/epics.mdspecs/issue-18-multi-account-test-prod-pulumi-environments/prd.mdspecs/issue-20-pulumi-esc-ci-config/architecture.mdspecs/issue-20-pulumi-esc-ci-config/epics.mdspecs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.mdspecs/issue-20-pulumi-esc-ci-config/prd.mdtests/pulumi/test_ci_guardrails.pytests/pulumi/test_delivery_contracts.pytests/pulumi/test_project_structure.pytests/unit/test_components.pytests/unit/test_operations_alert_triage.pytests/unit/test_validate_ci_environment.py
✅ Files skipped from review due to trivial changes (7)
- specs/issue-20-pulumi-esc-ci-config/epics.md
- README.md
- specs/issue-18-multi-account-test-prod-pulumi-environments/prd.md
- docs/sre-operations.md
- specs/issue-20-pulumi-esc-ci-config/implementation-readiness-report.md
- specs/issue-20-pulumi-esc-ci-config/architecture.md
- docs/alert-routing-evidence.md
defa592 to
a810515
Compare
|
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
Pull Request
Description
Replaces the Pulumi Cloud/ESC CI configuration path with an AWS-only path:
GitHub Actions -> GitHub OIDC -> AWS Secrets Manager -> Pulumi CLI with S3 backend + AWS KMS secrets providerThe old
.github/actions/load-esc-ci-envaction and.github/ci/pulumi-esc.jsonare removed. Privileged workflows now call.github/actions/load-aws-ci-env, which assumes a GitHub OIDC config-read role, reads one fixed AWS Secrets Manager JSON secret, validates required keys without printing values, and exports the role/backend/stack values needed by later AWS OIDC steps.Pulumi now creates the AWS-side config foundation directly: Secrets Manager secret containers and one
GitHubCiConfigRead-*role per fixed CI suffix (test-pr,test,prod-preview,prod). Production preview and production apply use separate config-read role variables so the protectedprodapply secret is not readable through theprod-previewpath.Deployment apply paths are saved-plan-only. The previous test-account direct
make pulumi-upfallback after a saved-plan KMS decrypt failure has been removed,scripts/run_pulumi_command.pyrejects directpulumi upwheneverGITHUB_ACTIONS=true, and same-repo PR jobs now fail fast ifAWS_TEST_PR_CI_CONFIG_ROLE_ARNis missing instead of falling back to the mainAWS_TEST_CI_CONFIG_ROLE_ARN.Related Issues
Fixes #20.
Refs #49, #50, #52, #53, #54, #55, #56, #58.
The operations-alert issues are legacy duplicate candidates. They should stay open until the AWS-backed triage creates or updates a canonical fingerprinted issue and SRE supplies the required confirmation reference for the protected reconcile workflow.
Manual Secure Follow-Up
Follow
docs/aws-secrets-manager-ci-cutover.mdfor the AWS-only setup.Required non-secret GitHub repository variables:
AWS_TEST_REGIONAWS_TEST_PR_CI_CONFIG_ROLE_ARNAWS_TEST_CI_CONFIG_ROLE_ARNAWS_PROD_REGIONAWS_PROD_PREVIEW_CI_CONFIG_ROLE_ARNAWS_PROD_CI_CONFIG_ROLE_ARNRequired AWS Secrets Manager JSON secret IDs:
/bootstrap-infrastructure/ci/test-pr/bootstrap-infrastructure/ci/test/bootstrap-infrastructure/ci/prod-preview/bootstrap-infrastructure/ci/prodDo not paste secret JSON values into GitHub, chat, logs, docs, or Pulumi config. Use
put-secret-valuefrom a private local file and verify withdescribe-secret, notget-secret-value.Current External Setup State
Could not assume role with OIDC: Not authorized to perform sts:AssumeRoleWithWebIdentityforarn:aws:iam::891377212104:role/GitHubCiConfigRead-bootstrap-infrastructure-test-pr.defaultprofile, and after unsetting inherited AWS env vars it resolves to production account933245420672asarn:aws:iam::933245420672:user/codex; a separate test-account profile for account891377212104is still needed before fixing test-account resources from this workstation.unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_PROFILEbefore profile-based verification.933245420672asarn:aws:iam::933245420672:user/codex./bootstrap-infrastructure/ci/prod-previewand/bootstrap-infrastructure/ci/prodreturnResourceNotFoundException; the correspondingGitHubCiConfigRead-*roles are not present yet.How Has This Been Tested?
Latest local validation on this branch:
make test-unit(389 passed, 100% coverage)uv run pytest tests/pulumi/test_project_structure.py tests/pulumi/test_delivery_contracts.py -q(58 passed)uv run pytest tests/pulumi/test_delivery_contracts.py::test_pr_comment_workflows_gate_prod_after_successful_test_apply tests/pulumi/test_delivery_contracts.py::test_prod_workflow_requires_successful_test_deploy_for_same_sha -q(2 passed)uv run pytest tests/unit/test_components.py::test_ci_configuration_requires_github_oidc_provider tests/unit/test_mutation_targets.py::test_mutation_target_ci_config_validation_and_lookup_helpers -q(2 passed)uv run pytest tests/pulumi/test_project_structure.py::test_docs_cover_current_testing_and_guardrail_guidance tests/pulumi/test_project_structure.py::test_issue20_cutover_manual_is_secret_safe_and_actionable tests/pulumi/test_project_structure.py::test_issue20_closeout_evidence_tracks_external_manual_steps -q(3 passed)uv run pytest tests/pulumi/test_delivery_contracts.py::test_aws_ci_loader_reads_secrets_manager_without_pulumi_cloud tests/pulumi/test_delivery_contracts.py::test_multi_account_workflows_use_fixed_aws_ci_config_contracts tests/pulumi/test_delivery_contracts.py::test_multi_account_environment_docs_are_explicit tests/unit/test_script_entrypoints.py::test_run_up_plan_stack_rejects_plan_decrypt_without_direct_apply -q(4 passed)uv run pytest tests/unit/test_script_entrypoints.py::test_run_up_stack_does_not_retry_after_lock tests/unit/test_script_entrypoints.py::test_run_up_stack_rejects_direct_apply_in_github_actions tests/unit/test_script_entrypoints.py::test_run_pulumi_command_unhandled_apply_failures_return_status tests/unit/test_script_entrypoints.py::test_run_pulumi_command_dispatch_propagates_apply_failures -q(4 passed)uv run pytest tests/pulumi/test_delivery_contracts.py::test_makefile_keeps_pulumi_guardrails_secret_safe tests/pulumi/test_project_structure.py::test_issue20_cutover_manual_is_secret_safe_and_actionable -q(2 passed)uv run pytest tests/pulumi/test_ci_guardrails.py::test_preview_guardrail_workflow_requires_preview_diff_and_iam_jobs tests/pulumi/test_ci_guardrails.py::test_well_architected_evidence_workflow_uploads_enforced_reports tests/pulumi/test_delivery_contracts.py::test_multi_account_workflows_use_fixed_aws_ci_config_contracts -q(3 passed)uv run pytest tests/pulumi/test_project_structure.py::test_issue20_closeout_evidence_tracks_external_manual_steps -q(1 passed)uv run ruff check scripts/run_pulumi_command.py tests/unit/test_script_entrypoints.py tests/pulumi/test_delivery_contracts.py tests/pulumi/test_project_structure.py tests/pulumi/test_ci_guardrails.pymake test-actionlintmake test-yamlmake test-secrets(latest commit scanned; no leaks found)qlty checkgit diff --checkbmalph workflow status:
bmalph doctorran and reported the project is not initialized for bmalph/BMAD artifacts.bmalph init --dry-run --platform codexran to inspect the generated artifact set without adding unrelated workflow scaffolding to this PR.Checklist
main; this branch is intentionally not force-squashed while review/check state is active.