Closed
35 commits
3637789 Merge pull request #9 from CGC-2026/development (NickSavino, Jan 28, 2026)
e00c235 Merge pull request #11 from CGC-2026/development (NickSavino, Jan 28, 2026)
9099b8a Merge pull request #13 from CGC-2026/development (NickSavino, Jan 28, 2026)
d80637b Merge pull request #15 from CGC-2026/development (NickSavino, Jan 28, 2026)
5611c18 Merge pull request #17 from CGC-2026/development (NickSavino, Jan 28, 2026)
e75a13a Merge pull request #19 from CGC-2026/development (NickSavino, Jan 28, 2026)
5cbce9d Merge pull request #21 from CGC-2026/development (NickSavino, Jan 30, 2026)
32c7a65 Merge pull request #23 from CGC-2026/development (NickSavino, Jan 30, 2026)
bec36e7 Merge pull request #25 from CGC-2026/development (NickSavino, Jan 30, 2026)
3815722 Merge pull request #27 from CGC-2026/development (NickSavino, Jan 30, 2026)
b9ab596 Merge pull request #29 from CGC-2026/development (NickSavino, Jan 30, 2026)
95487f5 Merge pull request #31 from CGC-2026/development (NickSavino, Jan 30, 2026)
08c584b yaml changes (NickSavinoArcurve, Jan 30, 2026)
8a2231f Merge pull request #32 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
beb3693 yaml changes (NickSavinoArcurve, Jan 30, 2026)
04b6634 added debugging statement to deploy.yaml (NickSavinoArcurve, Jan 30, 2026)
bfb32b8 Merge pull request #33 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
eb08c37 yaml changes (NickSavinoArcurve, Jan 30, 2026)
e64b34c Merge pull request #34 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
a1dcb8a yml changes (NickSavinoArcurve, Jan 30, 2026)
a808759 Merge pull request #35 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
449d1b5 deploy.yml changes (NickSavinoArcurve, Jan 30, 2026)
2cdf98e Merge pull request #36 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
6c88809 yml changes (NickSavinoArcurve, Jan 30, 2026)
fd4a413 Merge pull request #37 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
29220a1 yml changes (NickSavinoArcurve, Jan 30, 2026)
7394607 Merge pull request #38 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
2c55645 yml changes (NickSavinoArcurve, Jan 30, 2026)
37f6fba Merge pull request #39 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
dd3edae yml changes (NickSavinoArcurve, Jan 30, 2026)
4ade494 Merge pull request #40 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
dc98e6f yml changes (NickSavinoArcurve, Jan 30, 2026)
595e0fc Merge pull request #41 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
ec24cbf yml changes (NickSavinoArcurve, Jan 30, 2026)
fd9f3e9 Merge pull request #42 from CGC-2026/hotfix/cloud-infra (NickSavino, Jan 30, 2026)
93 changes: 49 additions & 44 deletions .github/workflows/deploy.yml
@@ -18,9 +18,6 @@ jobs:
       id-token: write
       contents: read

-    outputs:
-      image_uri: ${{ steps.build.outputs.image_uri }}
-
     steps:
       - name: Checkout
         uses: actions/checkout@v4
@@ -35,7 +32,6 @@ jobs:
         uses: aws-actions/amazon-ecr-login@v2

       - name: Build & push image
-        id: build
         env:
           IMAGE_TAG: ${{ github.sha }}
         run: |
@@ -45,8 +41,6 @@ jobs:
           docker build -f Dockerfile.prod -t "${IMAGE_URI}" .
           docker push "${IMAGE_URI}"

-          echo "IMAGE_URI=${IMAGE_URI}" >> $GITHUB_OUTPUT
-
   deploy:
     runs-on: ubuntu-latest
     needs: build
@@ -63,55 +57,66 @@ jobs:
           aws-region: ${{ env.AWS_REGION }}

       - name: Deploy to EC2 via SSM
-        env:
-          IMAGE_URI: ${{ needs.build.outputs.image_uri }}
         run: |
           set -euo pipefail

+          ECR_REGISTRY="$(aws sts get-caller-identity --query Account --output text).dkr.ecr.${AWS_REGION}.amazonaws.com"
+          IMAGE_URI="${ECR_REGISTRY}/${ECR_REPO}:${GITHUB_SHA}"
           echo "Deploying image ${IMAGE_URI} to instance"
           test -n "${IMAGE_URI}"

+          COMMANDS=$(cat <<JSON
+          [
+            "set -euo pipefail",
+            "AWS_REGION=${AWS_REGION}",
+            "IMAGE_URI=${IMAGE_URI}",
+            "CONTAINER_NAME=${CONTAINER_NAME}",
+            "echo IMAGE_URI=$IMAGE_URI",
+            "if ! command -v docker >/dev/null 2>&1; then sudo dnf -y install docker; sudo systemctl enable --now docker; fi",
+            "sudo usermod -aG docker ec2-user || true",
+            "ECR_REGISTRY=\${IMAGE_URI%%/*}",
+            "aws ecr get-login-password --region $AWS_REGION | sudo docker login --username AWS --password-stdin \$ECR_REGISTRY",
+            "DBURL=\$(aws ssm get-parameter --region $AWS_REGION --with-decryption --name /cgc-2026-prod/api/database_url --query Parameter.Value --output text)",
+            "CLERK_SECRET=\$(aws ssm get-parameter --region $AWS_REGION --with-decryption --name /cgc-2026-prod/api/clerk_secret_key --query Parameter.Value --output text)",
+            "sudo docker pull $IMAGE_URI",
+            "sudo docker rm -f $CONTAINER_NAME || true",
+            "sudo docker run -d --restart unless-stopped --name $CONTAINER_NAME -p 8080:8080 -e PORT=8080 -e DATABASE_URL=\"\$DBURL\" -e CLERK_SECRET_KEY=\"\$CLERK_SECRET\" $IMAGE_URI",
+            "sleep 2",
+            "curl -fsS http://localhost:8080/health"
+          ]
+          JSON
+          )
+
           COMMAND_ID=$(aws ssm send-command \
             --region "${AWS_REGION}" \
             --instance-ids "${EC2_INSTANCE_ID}" \
             --document-name "AWS-RunShellScript" \
-            --parameters commands="[
-              "set -e",
-              "AWS_REGION='${AWS_REGION}'",
-              "IMAGE_URI='${IMAGE_URI}'",
-              "CONTAINER_NAME='${CONTAINER_NAME}'",
-              "",
-              "if ! command -v docker >/dev/null 2>&1; then sudo dnf -y install docker; sudo systemctl enable --now docker; fi",
-              "",
-              "sudo usermod -aG docker ec2-user || true",
-              "",
-              "# Login to ECR",
-              "aws ecr get-login-password --region ${AWS_REGION} | sudo docker login --username AWS --password-stdin $(echo ${IMAGE_URI} | cut -d/ -f1)",
-              "",
-              "# Fetch runtime secrets from SSM",
-              "DBURL=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/database_url --query Parameter.Value --output text)",
-              "CLERK_SECRET=$(aws ssm get-parameter --region ${AWS_REGION} --with-decryption --name /cgc-2026-prod/api/clerk_secret_key --query Parameter.Value --output text)",
-              "",
-              "# Pull new image",
-              "",
-              "sudo docker pull ${IMAGE_URI}",
-              "# Stop existing container (if any)",
-              "sudo docker rm -f ${CONTAINER_NAME} || true",
-              "",
-              "# Run container",
-              "sudo docker run -d --restart unless-stopped --name \"${CONTAINER_NAME}\" -p 8080:8080 -e PORT=8080 -e DATABASE_URL=\"${DBURL}\" -e CLERK_SECRET_KEY=\"${CLERK_SECRET}\" \"${IMAGE_URI}\"",
-              "",
-              "# Smoke check",
-              "sleep 2",
-              "curl -fsS http://localhost:8080/health"
-            ]"" \
+            --parameters "commands=${COMMANDS}" \
             --query 'Command.CommandId' \
             --output text)
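The new `COMMANDS` heredoc uses an unquoted delimiter (`<<JSON`). As a result, plain `$VAR` and `${VAR}` references are expanded on the GitHub runner before the JSON is sent to SSM. Escaped `\$` sequences survive as a literal `$` for the instance's shell. This is why `ECR_REGISTRY`, `DBURL` and `CLERK_SECRET` are written with `\$`: they are meant to be evaluated on the EC2 host, not on the runner.

Below is a minimal, self-contained sketch of that split. The region and image URI are made-up example values, not the project's real ones.

```shell
#!/usr/bin/env bash
# Example values only; the workflow derives these from env and the AWS account.
AWS_REGION="us-east-1"
IMAGE_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-repo:abc123"

# Unquoted delimiter (<<JSON): ${VAR} expands here on the runner, while \$
# is emitted as a literal $ for the instance's shell to expand later.
COMMANDS=$(cat <<JSON
[
  "IMAGE_URI=${IMAGE_URI}",
  "ECR_REGISTRY=\${IMAGE_URI%%/*}",
  "echo region=${AWS_REGION} registry=\$ECR_REGISTRY"
]
JSON
)
printf '%s\n' "${COMMANDS}"

# What the instance later computes for ECR_REGISTRY: %%/* removes the
# longest suffix starting at a "/", leaving only the registry host.
echo "${IMAGE_URI%%/*}"
```

Running it prints the JSON with the region and image URI already substituted, while `${IMAGE_URI%%/*}` and `$ECR_REGISTRY` are left untouched. It then prints `123456789012.dkr.ecr.us-east-1.amazonaws.com`, which is the registry host the instance would log in to.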
Comment on lines 90 to 96

⚠️ Potential issue | 🔴 Critical

SSM command is sent but never awaited, so deployment success or failure is unknown.

The step captures COMMAND_ID but never uses it. The workflow moves on immediately, without waiting for the SSM command to finish or checking its exit status. A failed deployment on the instance is therefore never reported by this step.

Add a wait loop to poll for command completion:

🛠️ Proposed fix to wait for SSM command completion
          COMMAND_ID=$(aws ssm send-command \
            --region "${AWS_REGION}" \
            --instance-ids "${EC2_INSTANCE_ID}" \
            --document-name "AWS-RunShellScript" \
            --parameters "commands=${COMMANDS}" \
            --query 'Command.CommandId' \
            --output text)
+
+          echo "Waiting for SSM command ${COMMAND_ID} to complete..."
+          for i in {1..60}; do
+            STATUS=$(aws ssm get-command-invocation \
+              --region "${AWS_REGION}" \
+              --command-id "${COMMAND_ID}" \
+              --instance-id "${EC2_INSTANCE_ID}" \
+              --query 'Status' \
+              --output text 2>/dev/null || echo "Pending")
+            
+            case "${STATUS}" in
+              Success)
+                echo "SSM command succeeded"
+                break
+                ;;
+              Failed|Cancelled|TimedOut)
+                echo "SSM command failed with status: ${STATUS}"
+                aws ssm get-command-invocation \
+                  --region "${AWS_REGION}" \
+                  --command-id "${COMMAND_ID}" \
+                  --instance-id "${EC2_INSTANCE_ID}" \
+                  --query 'StandardErrorContent' \
+                  --output text
+                exit 1
+                ;;
+              *)
+                echo "Status: ${STATUS} (attempt $i/60)"
+                sleep 5
+                ;;
+            esac
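+          done
+
+          if [ "${STATUS}" != "Success" ]; then
+            echo "Timed out waiting for SSM command ${COMMAND_ID} (last status: ${STATUS})"
+            exit 1
+          fi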
🤖 Prompt for AI Agents
In @.github/workflows/deploy.yml around lines 90 - 96: the workflow captures COMMAND_ID from aws ssm send-command but never waits for it to finish. Update the deploy step to poll SSM for completion using the COMMAND_ID in a loop. For example, call aws ssm list-command-invocations or aws ssm get-command-invocation with --command-id "${COMMAND_ID}" and --instance-id "${EC2_INSTANCE_ID}". Check the invocation Status/StatusDetails until it reaches a terminal state (Success, Failed, TimedOut, Cancelled), and break when it does. Fail the job (exit non-zero) if the final status is not Success, so the subsequent health check doesn't run on a failed deploy. Ensure the polling includes a sleep/backoff and a reasonable timeout to avoid infinite loops.
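The same polling logic can be packaged as a function that returns non-zero on any outcome other than Success, including running out of attempts. This is a minimal sketch, assuming the AWS CLI is installed and authenticated on the runner. `wait_for_ssm_command` and its argument order are hypothetical names, not part of this workflow. Any CLI error, such as the short window right after `send-command` when the invocation may not exist yet, is treated as still pending.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll one SSM Run Command invocation until it reaches a
# terminal state. Returns 0 on Success, 1 on Failed/Cancelled/TimedOut, and 1
# when max_attempts polls pass without any terminal status.
wait_for_ssm_command() {
  local command_id="$1" instance_id="$2" region="$3"
  local max_attempts="${4:-60}" interval="${5:-5}"
  local status="Pending" attempt

  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Right after send-command the invocation may not exist yet, so any CLI
    # error is treated as "still pending" rather than as a failure.
    status=$(aws ssm get-command-invocation \
      --region "${region}" \
      --command-id "${command_id}" \
      --instance-id "${instance_id}" \
      --query 'Status' \
      --output text 2>/dev/null) || status="Pending"

    case "${status}" in
      Success)
        echo "SSM command ${command_id} succeeded"
        return 0
        ;;
      Failed|Cancelled|TimedOut)
        echo "SSM command ${command_id} ended with status ${status}" >&2
        # Surface the remote script's stderr to help debug the failure.
        aws ssm get-command-invocation \
          --region "${region}" \
          --command-id "${command_id}" \
          --instance-id "${instance_id}" \
          --query 'StandardErrorContent' \
          --output text >&2 || true
        return 1
        ;;
      *)
        # Pending, InProgress, Delayed, Cancelling, ...: keep waiting.
        echo "Status: ${status} (attempt ${attempt}/${max_attempts})"
        sleep "${interval}"
        ;;
    esac
  done

  echo "Timed out waiting for SSM command ${command_id} (last status: ${status})" >&2
  return 1
}
```

In the deploy step it would be called as `wait_for_ssm_command "${COMMAND_ID}" "${EC2_INSTANCE_ID}" "${AWS_REGION}"`. Because the step runs under `set -euo pipefail`, a non-zero return aborts the step.

The AWS CLI also ships a built-in waiter, `aws ssm wait command-executed`, which polls `get-command-invocation` itself. Its polling budget is fixed, so confirm it is long enough for an image pull before relying on it.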


-          sleep 2
-          aws ssm get-command-invocation \
+      - name: Verify health
+        run: |
+          set -euo pipefail
+
+          PUBLIC_IP=$(aws ec2 describe-instances \
             --region "${AWS_REGION}" \
-            --command-id "${COMMAND_ID}" \
-            --instance-id "${EC2_INSTANCE_ID}" \
-            --query 'StandardErrorContent' \
-            --output text
+            --instance-ids "${EC2_INSTANCE_ID}" \
+            --query 'Reservations[0].Instances[0].PublicIpAddress' \
+            --output text)
+
+          test "${PUBLIC_IP}" != "None"
+
+          URL="http://${PUBLIC_IP}:8080/health"
+          echo "Checking ${URL}"
+
+          for i in {1..20}; do
+            if curl -fsS "${URL}" >/dev/null; then
+              echo "Health check passed"
+              exit 0
+            fi
+            echo "Health not ready yet (attempt $i/20) — retrying..."
+            sleep 3
+          done
+
+          echo "Health check failed"
Comment on lines +118 to +122

⚠️ Potential issue | 🔴 Critical

Missing exit 1: the workflow succeeds even when the health check fails.

When the loop exhausts all 20 retries, the step prints "Health check failed" but exits with status 0, the exit status of that final echo. The workflow therefore incorrectly reports success.

🐛 Proposed fix to exit non-zero on failure
          echo "Health check failed"
+          exit 1
🤖 Prompt for AI Agents
In @.github/workflows/deploy.yml around lines 118 - 122: the health-check loop prints "Health check failed" but does not return a non-zero status, so the job can still succeed. Update the failing branch after the loop (the block containing the echo "Health check failed") to exit with a non-zero code, e.g. add an exit 1 immediately after the echo. The workflow step will then fail when the health check exhausts all retries.
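The same fail-loudly pattern can be written as a small function whose return status the step inherits. This is a sketch only. `wait_for_health` is a hypothetical helper whose defaults mirror the workflow's 20 attempts, 3 seconds apart. The `--max-time 5` per-request cap is an added assumption, so that a hung connection cannot stall the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry an HTTP health endpoint and return non-zero if it
# never answers successfully.
wait_for_health() {
  local url="$1" attempts="${2:-20}" interval="${3:-3}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP >= 400; -sS: no progress meter, but still show errors.
    if curl -fsS --max-time 5 "${url}" >/dev/null; then
      echo "Health check passed"
      return 0
    fi
    echo "Health not ready yet (attempt ${i}/${attempts}), retrying..."
    # No point sleeping after the final attempt.
    if (( i < attempts )); then
      sleep "${interval}"
    fi
  done
  echo "Health check failed after ${attempts} attempts" >&2
  return 1
}
```

Make `wait_for_health "${URL}"` the last command of the "Verify health" step. A failed check returns 1, which becomes the step's exit status, so the job fails without a separate `exit 1`.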

55 changes: 44 additions & 11 deletions infra/env/prod/.terraform.lock.hcl

Some generated files are not rendered by default.