Implement MRAP cleanup in account-nuke.sh#75
Conversation
Add multi-region access point cleanup to account-nuke script.
There was a problem hiding this comment.
Pull request overview
This PR adds Multi-Region Access Point (MRAP) cleanup functionality to the account-nuke script to handle the deletion of MRAPs and their associated S3 buckets before running aws-nuke.
Changes:
- Adds MRAP discovery and cleanup logic that runs before aws-nuke
- Implements bucket object/version deletion for MRAP-associated buckets
- Adds safety check for jq dependency
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| echo "$REGIONAL_DATA" | jq -c '.[]' | while read -r row; do | ||
| bucket_name=$(echo "$row" | jq -r '.Bucket') | ||
| region_id=$(echo "$row" | jq -r '.Region') | ||
|
|
||
| echo "Targeting bucket: $bucket_name in $region_id" | ||
|
|
||
| # 4. Loop to delete ALL Versions (Handles >1000 items pagination) | ||
| while true; do | ||
| # Fetch batch of versions (suppress errors if bucket is already gone) | ||
| versions=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | ||
|
|
||
| # Check if empty (jq returns null or empty list) | ||
| count=$(echo "$versions" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$versions" == "" ] || [ "$count" == "null" ]; then | ||
| break | ||
| fi | ||
|
|
||
| echo " - Deleting batch of $count versions..." | ||
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$versions" >/dev/null 2>&1 || true | ||
| done | ||
|
|
||
| # 5. Loop to delete ALL Delete Markers (Handles >1000 items pagination) | ||
| while true; do | ||
| markers=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | ||
|
|
||
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | ||
| break | ||
| fi | ||
|
|
||
| echo " - Deleting batch of $count markers..." | ||
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$markers" >/dev/null 2>&1 || true | ||
| done | ||
|
|
||
| # 6. Delete the now-empty Bucket | ||
| # We use || true because aws-nuke might have already deleted it | ||
| aws s3 rb "s3://$bucket_name" --force --region "$region_id" 2>/dev/null || echo " Bucket $bucket_name already deleted or not found." | ||
| done |
There was a problem hiding this comment.
The while loop that processes regional data reads from a pipe, which creates a subshell. This means any variables set or modified within the loop (like failures or counters) won't be visible outside the loop. More critically, if the bucket deletion fails, the MRAP deletion at line 159 will still proceed, potentially leaving the MRAP in an inconsistent state.
Consider restructuring this code to avoid piping into while read, or use process substitution with a file descriptor redirect instead. For example, you could use 'while read -r row; do ... done < <(echo "$REGIONAL_DATA" | jq -c '.[]')' to avoid the subshell issue.
|
|
||
| # 7. Delete the MRAP Global Routing | ||
| echo "Deleting MRAP Global Endpoint: $mrap_name" | ||
| aws s3control delete-multi-region-access-point --account-id "$SUB_ACCOUNT_ID" --region us-west-2 --details "Name=$mrap_name" || echo "Warning: Failed to delete MRAP $mrap_name (may already be gone)" |
There was a problem hiding this comment.
The delete-multi-region-access-point command syntax appears incorrect. According to AWS CLI documentation, the --details parameter expects a JSON structure, not a simple key=value pair. The correct format should be '--details "{"Name":"$mrap_name"}"' or use a proper JSON string.
Additionally, MRAP deletion is an asynchronous operation that returns a request token. The deletion doesn't happen immediately - you need to poll the operation status using describe-multi-region-access-point-operation with the returned request token to confirm completion. Without this, the script may exit before the MRAP is actually deleted.
| aws s3control delete-multi-region-access-point --account-id "$SUB_ACCOUNT_ID" --region us-west-2 --details "Name=$mrap_name" || echo "Warning: Failed to delete MRAP $mrap_name (may already be gone)" | |
| # The delete operation is asynchronous; capture the request token and poll for completion | |
| if ! REQUEST_TOKEN_ARN=$(aws s3control delete-multi-region-access-point \ | |
| --account-id "$SUB_ACCOUNT_ID" \ | |
| --region us-west-2 \ | |
| --details "{\"Name\":\"$mrap_name\"}" \ | |
| --query 'RequestTokenARN' \ | |
| --output text 2>/dev/null); then | |
| echo "Warning: Failed to initiate deletion for MRAP $mrap_name (may already be gone)" | |
| else | |
| echo "Deletion initiated for MRAP $mrap_name, tracking operation..." | |
| MAX_RETRIES=30 | |
| RETRY_DELAY=10 | |
| RETRIES=0 | |
| while [ "$RETRIES" -lt "$MAX_RETRIES" ]; do | |
| STATUS=$(aws s3control describe-multi-region-access-point-operation \ | |
| --account-id "$SUB_ACCOUNT_ID" \ | |
| --region us-west-2 \ | |
| --request-token-arn "$REQUEST_TOKEN_ARN" \ | |
| --query 'AsyncOperation.Status' \ | |
| --output text 2>/dev/null || echo "UNKNOWN") | |
| if [ "$STATUS" = "COMPLETED" ]; then | |
| echo "MRAP $mrap_name deletion completed." | |
| break | |
| elif [ "$STATUS" = "FAILED" ]; then | |
| echo "Warning: MRAP $mrap_name deletion reported FAILED." | |
| break | |
| elif [ "$STATUS" = "UNKNOWN" ]; then | |
| echo "Warning: Unable to determine status for MRAP $mrap_name deletion operation." | |
| break | |
| else | |
| echo "MRAP $mrap_name deletion status: $STATUS (retry $((RETRIES + 1))/$MAX_RETRIES)..." | |
| RETRIES=$((RETRIES + 1)) | |
| sleep "$RETRY_DELAY" | |
| fi | |
| done | |
| if [ "$RETRIES" -ge "$MAX_RETRIES" ]; then | |
| echo "Warning: Timed out waiting for MRAP $mrap_name deletion to complete." | |
| fi | |
| fi |
| while true; do | ||
| markers=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | ||
|
|
||
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | ||
| break | ||
| fi | ||
|
|
||
| echo " - Deleting batch of $count markers..." | ||
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$markers" >/dev/null 2>&1 || true | ||
| done |
There was a problem hiding this comment.
The pagination logic using --max-items doesn't properly handle AWS CLI pagination. The --max-items parameter is meant to limit the total number of items returned across all pages, not to specify a batch size. When there are more items, AWS CLI returns a NextToken, but this code doesn't capture or use it.
For proper pagination, you should either:
- Remove --max-items and use --page-size with --starting-token to iterate through pages, OR
- Use --max-items with --starting-token to continue from where the previous call left off
Without proper pagination handling, if there are more than 1000 delete markers, the subsequent iterations of the while loop will keep returning the same first 1000 items, creating an infinite loop.
|
|
||
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then |
There was a problem hiding this comment.
The string comparison uses '==' which works in bash but is not POSIX-compliant. For better portability and consistency with shell scripting best practices, use '=' for string comparisons. Additionally, the check for 'null' as a string is correct since jq would output the literal string "null" for null values.
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | |
| if [ "$count" = "0" ] || [ "$markers" = "" ] || [ "$count" = "null" ]; then |
|
|
||
| # 6. Delete the now-empty Bucket | ||
| # We use || true because aws-nuke might have already deleted it | ||
| aws s3 rb "s3://$bucket_name" --force --region "$region_id" 2>/dev/null || echo " Bucket $bucket_name already deleted or not found." |
There was a problem hiding this comment.
The aws s3 rb command uses the --force flag which will attempt to delete all objects in the bucket before removing it. However, this step is redundant since the previous loops already deleted all versions and delete markers. More importantly, if the bucket is versioned and the previous loops failed to delete all objects (due to the pagination bug), this command will hang or fail.
Consider removing the --force flag since the bucket should already be empty at this point, or add validation to ensure the bucket is actually empty before attempting deletion.
| aws s3 rb "s3://$bucket_name" --force --region "$region_id" 2>/dev/null || echo " Bucket $bucket_name already deleted or not found." | |
| aws s3 rb "s3://$bucket_name" --region "$region_id" 2>/dev/null || echo " Bucket $bucket_name already deleted or not found." |
|
|
||
| # 0. Safety Check: Ensure jq is installed | ||
| if ! command -v jq &> /dev/null; then | ||
| echo -e "\e[31mError: 'jq' is not installed. MRAP cleanup requires jq. Skipping.\e[0m" |
There was a problem hiding this comment.
The error message could be more informative. Instead of just "Skipping", it would be helpful to indicate what will be skipped and what the consequences are. Consider revising to something like "Error: 'jq' is not installed. MRAP cleanup requires jq for JSON parsing. Skipping MRAP cleanup - MRAPs may need manual deletion." This provides clearer context about the impact of the skip.
| echo -e "\e[31mError: 'jq' is not installed. MRAP cleanup requires jq. Skipping.\e[0m" | |
| echo -e "\e[31mError: 'jq' is not installed. MRAP cleanup requires jq for JSON parsing. Skipping Multi-Region Access Point cleanup - MRAPs may need manual deletion.\e[0m" |
| echo -e "\e[32m[DONE] Multi-Region cleanup phase complete.\e[0m" | ||
|
|
||
| # AWS Nuke | ||
|
|
||
| ./aws-nuke nuke -c nuke.yaml --max-wait-retries 200 --no-dry-run --force --log-level debug --log-full-timestamp true --log-caller true || true |
There was a problem hiding this comment.
The MRAP cleanup phase runs before aws-nuke, which is the correct approach since MRAPs can block the deletion of their associated S3 buckets. However, there's a race condition concern: if aws-nuke runs immediately after the MRAP deletion is initiated (line 159), it might try to delete buckets while the MRAP deletion is still in progress. Since MRAP deletion is asynchronous and can take several minutes, this could lead to failures or inconsistent state.
Consider adding a wait loop after line 159 that polls the MRAP deletion status before proceeding to aws-nuke, or at minimum add a sleep delay to allow the operation to complete.
| while true; do | ||
| # Fetch batch of versions (suppress errors if bucket is already gone) | ||
| versions=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | ||
|
|
||
| # Check if empty (jq returns null or empty list) | ||
| count=$(echo "$versions" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$versions" == "" ] || [ "$count" == "null" ]; then | ||
| break | ||
| fi | ||
|
|
||
| echo " - Deleting batch of $count versions..." | ||
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$versions" >/dev/null 2>&1 || true | ||
| done | ||
|
|
||
| # 5. Loop to delete ALL Delete Markers (Handles >1000 items pagination) | ||
| while true; do | ||
| markers=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | ||
|
|
||
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | ||
| break | ||
| fi | ||
|
|
||
| echo " - Deleting batch of $count markers..." | ||
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$markers" >/dev/null 2>&1 || true |
There was a problem hiding this comment.
The pagination logic using --max-items doesn't properly handle AWS CLI pagination. The --max-items parameter is meant to limit the total number of items returned across all pages, not to specify a batch size. When there are more items, AWS CLI returns a NextToken, but this code doesn't capture or use it.
For proper pagination, you should either:
- Remove --max-items and use --page-size with --starting-token to iterate through pages, OR
- Use --max-items with --starting-token to continue from where the previous call left off
Without proper pagination handling, if there are more than 1000 versions, the subsequent iterations of the while loop will keep returning the same first 1000 items, creating an infinite loop.
| while true; do | |
| # Fetch batch of versions (suppress errors if bucket is already gone) | |
| versions=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | |
| # Check if empty (jq returns null or empty list) | |
| count=$(echo "$versions" | jq '.Objects | length' 2>/dev/null || echo "0") | |
| if [ "$count" == "0" ] || [ "$versions" == "" ] || [ "$count" == "null" ]; then | |
| break | |
| fi | |
| echo " - Deleting batch of $count versions..." | |
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$versions" >/dev/null 2>&1 || true | |
| done | |
| # 5. Loop to delete ALL Delete Markers (Handles >1000 items pagination) | |
| while true; do | |
| markers=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --query '{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}' --output json 2>/dev/null || echo "") | |
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | |
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | |
| break | |
| fi | |
| echo " - Deleting batch of $count markers..." | |
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$markers" >/dev/null 2>&1 || true | |
| next_token="" | |
| while true; do | |
| # Fetch a page of versions (suppress errors if bucket is already gone) | |
| if [ -n "$next_token" ]; then | |
| page=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --starting-token "$next_token" --output json 2>/dev/null || echo "") | |
| else | |
| page=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --output json 2>/dev/null || echo "") | |
| fi | |
| # Build delete payload and extract NextToken | |
| versions=$(echo "$page" | jq '{Objects: [(.Versions // [])[] | {Key:.Key,VersionId:.VersionId}]}' 2>/dev/null || echo "") | |
| count=$(echo "$versions" | jq '.Objects | length' 2>/dev/null || echo "0") | |
| next_token=$(echo "$page" | jq -r '.NextToken // empty' 2>/dev/null || echo "") | |
| if [ "$count" == "0" ] || [ "$versions" == "" ] || [ "$count" == "null" ]; then | |
| break | |
| fi | |
| echo " - Deleting batch of $count versions..." | |
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$versions" >/dev/null 2>&1 || true | |
| # If there is no next page, we're done | |
| if [ -z "$next_token" ]; then | |
| break | |
| fi | |
| done | |
| # 5. Loop to delete ALL Delete Markers (Handles >1000 items pagination) | |
| next_token="" | |
| while true; do | |
| if [ -n "$next_token" ]; then | |
| page=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --starting-token "$next_token" --output json 2>/dev/null || echo "") | |
| else | |
| page=$(aws s3api list-object-versions --bucket "$bucket_name" --region "$region_id" --max-items 1000 --output json 2>/dev/null || echo "") | |
| fi | |
| markers=$(echo "$page" | jq '{Objects: [(.DeleteMarkers // [])[] | {Key:.Key,VersionId:.VersionId}]}' 2>/dev/null || echo "") | |
| count=$(echo "$markers" | jq '.Objects | length' 2>/dev/null || echo "0") | |
| next_token=$(echo "$page" | jq -r '.NextToken // empty' 2>/dev/null || echo "") | |
| if [ "$count" == "0" ] || [ "$markers" == "" ] || [ "$count" == "null" ]; then | |
| break | |
| fi | |
| echo " - Deleting batch of $count markers..." | |
| aws s3api delete-objects --bucket "$bucket_name" --region "$region_id" --delete "$markers" >/dev/null 2>&1 || true | |
| if [ -z "$next_token" ]; then | |
| break | |
| fi |
| # Check if empty (jq returns null or empty list) | ||
| count=$(echo "$versions" | jq '.Objects | length' 2>/dev/null || echo "0") | ||
|
|
||
| if [ "$count" == "0" ] || [ "$versions" == "" ] || [ "$count" == "null" ]; then |
There was a problem hiding this comment.
The string comparison uses '==' which works in bash but is not POSIX-compliant. For better portability and consistency with shell scripting best practices, use '=' for string comparisons. Additionally, the check for 'null' as a string is correct since jq would output the literal string "null" for null values.
| # 1. List MRAPs | ||
| # We use '|| true' to ensure the script doesn't die if the API call flakes or returns 404 | ||
| # We route this specifically to us-west-2 to ensure we hit a valid control plane endpoint | ||
| MRAP_NAMES=$(aws s3control list-multi-region-access-points --account-id "$SUB_ACCOUNT_ID" --region us-west-2 --query 'AccessPoints[].Name' --output text || true) |
There was a problem hiding this comment.
The query parameter extracts only the 'Name' field, but the delete operation on line 159 requires the full MRAP name. While this might work, it would be more robust to store the complete MRAP ARN or alias which uniquely identifies the resource. AWS Multi-Region Access Points have both a name and an alias, and using just the name in the delete operation might be ambiguous in certain scenarios.
Consider capturing and using the full MRAP details or at least the alias to ensure unambiguous resource identification.
Add multi-region access point cleanup to account-nuke script.