This demo showcases AWS DevOps Agent's capabilities in identifying root causes of system issues and accelerating incident response through a realistic Unicorn Rentals microservices architecture.
Unicorn Rentals is a customer booking system with rental processing logic and analytics reporting:
```mermaid
graph TB
    subgraph app["Unicorn Rentals Application"]
        Customer["Customer Requests"]
        APIGW["API Gateway<br/><i>unicorn-rentals-api</i>"]
        Lambda["Lambda<br/><i>rental-processor</i><br/>"]
        DDB["DynamoDB<br/><i>unicorn-rentals</i><br/>"]
        Customer -->|"POST /process"| APIGW
        APIGW -->|"Invoke"| Lambda
        Lambda -->|"Read/Write"| DDB
    end
    subgraph monitoring["Monitoring & Alerting"]
        A1["Error Alarm<br/>Errors > 3/min"]
        A2["Duration Alarm<br/>Avg > 10s"]
        A3["Throttle Alarm<br/>Throttles ≥ 1"]
        SNS["SNS Topic"]
        Forwarder["Webhook Forwarder<br/><i>Lambda</i>"]
        A1 & A2 & A3 -->|"ALARM state"| SNS
        SNS --> Forwarder
    end
    subgraph devops["AWS DevOps Agent"]
        Agent["Investigation<br/>Root Cause Analysis<br/>Mitigation Plans"]
        Slack["Slack<br/>Real-time findings"]
        Agent -->|"Posts updates"| Slack
    end
    Lambda -.->|"Metrics & Logs"| A1 & A2
    DDB -.->|"Throttle Events"| A3
    Forwarder -->|"HMAC Webhook"| Agent
    style app fill:#1a1a2e,stroke:#e94560,color:#fff
    style monitoring fill:#1a1a2e,stroke:#f5a623,color:#fff
    style devops fill:#1a1a2e,stroke:#00d2ff,color:#fff
```
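To make the alarm thresholds in the diagram concrete, here is a minimal Python sketch that checks a one-minute metric sample against the three rules (errors > 3 per minute, average duration > 10 s, throttles ≥ 1). It is an illustration only: it assumes a single one-minute evaluation period and ignores CloudWatch features such as multi-period evaluation and missing-data handling. The `MinuteSample` type and `breached_alarms` function are hypothetical; the real alarms are defined in `cloudformation-template.yaml`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinuteSample:
    """Hypothetical one-minute metric snapshot (not a CloudWatch API type)."""
    errors: int             # Lambda Errors, summed over the minute
    avg_duration_ms: float  # Lambda Duration (milliseconds), averaged over the minute
    throttles: int          # DynamoDB throttle events in the minute


def breached_alarms(sample: MinuteSample) -> List[str]:
    """Return the names of the alarms whose thresholds this sample breaches."""
    breached = []
    if sample.errors > 3:                # Error Alarm: Errors > 3/min
        breached.append("error")
    if sample.avg_duration_ms > 10_000:  # Duration Alarm: Avg > 10s
        breached.append("duration")
    if sample.throttles >= 1:            # Throttle Alarm: Throttles >= 1
        breached.append("throttle")
    return breached


print(breached_alarms(MinuteSample(errors=5, avg_duration_ms=12_500, throttles=0)))
# prints ['error', 'duration']
```

Each breached rule corresponds to an alarm entering the ALARM state, which is what publishes to the SNS topic in the diagram.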
The architecture includes intentional failure points that demonstrate DevOps Agent's diagnostic capabilities across multiple AWS services.
Rental Analytics Overload
- Problem: Rental processor runs out of memory (127/128 MB usage)
- Cause: Loading large analytics datasets during processing
- Impact: 30% booking failures with 10+ second delays
- Symptoms: Runtime.OutOfMemory errors, processing timeouts
Peak Demand Capacity Issues
- Problem: DynamoDB capacity exceeded during high traffic
- Cause: Provisioned throughput limits hit during batch operations
- Impact: 5+ second booking delays, intermittent failures
- Symptoms: `ProvisionedThroughputExceededException` errors
Cross-Service Error Propagation
- Problem: Multi-service error chain reaction
- Cause: Database throttling → Lambda timeouts → API Gateway 5XX errors
- Impact: Complete system outage affecting all customers
- Symptoms: Service dependency failures across the stack
Prerequisites:
- AWS CLI configured with appropriate permissions
- Python 3.7+ with the `requests` library
- CloudFormation deployment permissions
```bash
# Make deployment script executable
chmod +x deploy.sh
# Deploy the complete stack
./deploy.sh
```

The deployment creates:
- API Gateway: `demo-unicorn-rentals-api`
- Lambda Function: `demo-unicorn-rental-processor` (128MB memory, 30% error rate)
- DynamoDB Table: `demo-unicorn-rentals` (low provisioned capacity)
- CloudWatch Alarms: Error rate, duration, and throttling monitors
```bash
# Source the generated environment file
source demo-environment.env
# Verify deployment
echo "API URL: $API_URL"
echo "Lambda: $LAMBDA_NAME"
echo "Table: $TABLE_NAME"
```

Install Python dependencies:
```bash
pip install requests
```

Start continuous background load:
```bash
# Light continuous load (5 RPS baseline)
python continuous-load-generator.py --api-url "$API_URL" --rps 5 --duration 10
# Higher load for faster error generation
python continuous-load-generator.py --api-url "$API_URL" --rps 15 --duration 10
```

The load generator creates realistic traffic patterns with business-hours peaks and occasional spikes.
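The internals of `continuous-load-generator.py` are not shown here, but one simple way to model "business-hours peaks and occasional spikes" is to scale the base `--rps` by a time-of-day factor and, occasionally, a random burst multiplier. The sketch below is a hypothetical illustration, not the script's actual logic; the 09:00 to 16:59 window, the 2x peak factor, the 5% spike probability, and the 3x spike factor are all assumptions.

```python
import random


def target_rps(base_rps: float, hour: int, rng: random.Random,
               peak_factor: float = 2.0, spike_prob: float = 0.05,
               spike_factor: float = 3.0) -> float:
    """Return the request rate to aim for during a given hour of the day.

    Hypothetical model: traffic is multiplied by peak_factor during business
    hours (09:00-16:59) and, with probability spike_prob, by spike_factor on top.
    """
    rps = base_rps
    if 9 <= hour < 17:
        rps *= peak_factor   # business-hours peak
    if rng.random() < spike_prob:
        rps *= spike_factor  # occasional burst
    return rps
```

A driver loop could recompute `target_rps` periodically and sleep roughly `1 / rps` seconds between requests. Passing a seeded `random.Random` keeps runs reproducible.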
1. Create DevOps Agent Space
   - Navigate to AWS Console → DevOps Agent
   - Create a new Space for the demo
   - Include resources with tag: `Application` = `unicorn_rentals`
   - This will automatically discover and include:
     - API Gateway: `demo-unicorn-rentals-api`
     - Lambda Function: `demo-unicorn-rental-processor`
     - DynamoDB Table: `demo-unicorn-rentals`
     - CloudWatch Alarms and Logs
2. Open WebApp and Start Investigation
   - Launch the DevOps Agent WebApp from your Space
   - Begin investigation using the prompts below
   - Agent will analyze logs, metrics, and service relationships
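Because the Space is scoped by the `Application` tag, it can help to preview which resources actually carry that tag before creating the Space. A minimal sketch using the Resource Groups Tagging API's `get_resources` paginator from boto3. It assumes boto3 is installed and your AWS CLI credentials and region are in effect; the helper name `tagged_resource_arns` is hypothetical, and the Space's own discovery logic may include or exclude resources differently.

```python
from typing import List


def tagged_resource_arns(client, key: str = "Application",
                         value: str = "unicorn_rentals") -> List[str]:
    """Return ARNs of resources tagged key=value, following pagination.

    `client` is expected to be boto3.client("resourcegroupstaggingapi").
    """
    paginator = client.get_paginator("get_resources")
    arns = []
    for page in paginator.paginate(TagFilters=[{"Key": key, "Values": [value]}]):
        for mapping in page.get("ResourceTagMappingList", []):
            arns.append(mapping["ResourceARN"])
    return arns
```

Usage: `import boto3`, then `print(tagged_resource_arns(boto3.client("resourcegroupstaggingapi")))`. The client is passed in rather than created inside the function so the helper can be exercised with a stub.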
Use these prompts with AWS DevOps Agent to analyze the system:
The performance of my unicorn_rentals application has degraded significantly. Customer booking response times have increased and I'm seeing more rental processing errors. Can you analyze the system behavior over the last hour?
My unicorn rental processor is experiencing high latency spikes during booking confirmations. The duration metrics show some rental processing taking much longer than others. What's causing this inconsistent booking performance?
Investigation details: Why is my unicorn rentals API Gateway showing increased latency and 5XX errors? Customer bookings were working fine earlier today.
Once you've added your Slack Workspace as a capability provider, follow these steps to complete the integration:
1. Associate Slack Channel with your Agent Space
   - In your Agent Space, go to Capabilities → Communications → Slack
   - Select Add Slack and enter the Channel ID of your target channel
   - Choose Create to complete the association
   - For private channels, invite the DevOps Agent bot user to the channel before it can post
2. How Slack Works During Investigations
   - When an investigation starts (manually from the WebApp, via webhook, or from a ticketing integration), DevOps Agent automatically posts updates to the configured Slack channel
   - The channel receives key findings, root cause analyses, and mitigation plans as the investigation progresses
   - Team members can follow along in real time without needing console access
3. Starting Investigations for This Demo
   - Investigations are started from the DevOps Agent WebApp (Incident Response tab), not directly from Slack
   - Use the prompts from the Investigation Prompts section above, or choose a pre-configured starting point like "Latest alarm" or "Error rate spike"
   - Once started, all findings stream into your Slack channel automatically
   - You can also trigger investigations via webhooks from PagerDuty, Grafana, or custom alerting systems
Note: Slack serves as a notification and collaboration channel. The investigation itself is driven from the WebApp, ticketing integrations, or webhooks. Avoid uninstalling the Slack app during the public preview, as reinstallation may not work.
You can configure CloudWatch Alarms to automatically trigger DevOps Agent investigations when errors occur. The stack includes an optional webhook integration pipeline:
CloudWatch Alarm → SNS Topic → Forwarder Lambda → DevOps Agent Webhook
To enable this:
1. Generate a webhook in DevOps Agent
   - In your Agent Space, go to Capabilities → Webhook → Configure
   - Click Generate webhook to create an HMAC key pair
   - Save the webhook URL and secret securely (you won't see the secret again)
2. Deploy with webhook parameters

   ```bash
   export DEVOPS_AGENT_WEBHOOK_URL="https://event-ai.us-east-1.api.aws/webhook/generic/YOUR_ID"
   export DEVOPS_AGENT_WEBHOOK_SECRET="your-hmac-secret"
   ./deploy.sh
   ```

3. Trigger the pipeline
   - Run the load generator to produce errors
   - CloudWatch Alarms fire when thresholds are breached
   - The forwarder Lambda sends HMAC-signed webhook requests to DevOps Agent
   - An investigation starts automatically, with findings posted to Slack
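This README does not spell out the exact signing scheme the forwarder uses. As a general illustration of HMAC request signing, the sketch below computes an HMAC-SHA256 over a timestamp and the JSON body with Python's standard `hmac` module, plus the matching constant-time check a receiver would perform. The `x-timestamp` and `x-signature` header names and the `timestamp.body` message layout are assumptions, not DevOps Agent's documented contract; use the webhook instructions shown in your Agent Space for the real scheme.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, Optional, Tuple


def sign_payload(payload: dict, secret: str,
                 timestamp: Optional[int] = None) -> Tuple[bytes, Dict[str, str]]:
    """Serialize payload and return (body, headers) with an HMAC-SHA256 signature.

    Hypothetical scheme: signature = hex(HMAC_SHA256(secret, f"{timestamp}.{body}")).
    """
    ts = int(time.time()) if timestamp is None else timestamp
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    message = str(ts).encode("ascii") + b"." + body
    signature = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "x-timestamp": str(ts),    # hypothetical header name
        "x-signature": signature,  # hypothetical header name
    }
    return body, headers


def verify(body: bytes, headers: Dict[str, str], secret: str) -> bool:
    """Receiver-side check of the hypothetical scheme, in constant time."""
    message = headers["x-timestamp"].encode("ascii") + b"." + body
    expected = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers["x-signature"])
```

Signing the exact bytes that are sent (rather than re-serializing on the receiving side) is what lets the receiver recompute the same digest; including a timestamp lets it reject stale, replayed requests.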
The forwarder only triggers investigations on ALARM state transitions (not OK recoveries), and each alarm produces a unique incident ID to avoid deduplication.
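The filtering rule above can be sketched as follows. When SNS invokes a Lambda function, each record carries the CloudWatch alarm notification as a JSON string in `Records[i].Sns.Message`, and that notification includes `AlarmName`, `NewStateValue`, `NewStateReason`, and `StateChangeTime`. This sketch keeps only transitions into `ALARM` and derives an incident ID from the alarm name plus the state-change time, so each separate transition gets a distinct ID. The ID format and the `incidentId`/`title`/`description` field names are hypothetical, not the forwarder's actual payload.

```python
import json
import re
from typing import List, Optional


def incident_from_record(record: dict) -> Optional[dict]:
    """Return a minimal incident dict for ALARM transitions, else None."""
    message = json.loads(record["Sns"]["Message"])
    if message.get("NewStateValue") != "ALARM":
        return None  # ignore OK and INSUFFICIENT_DATA transitions
    name = message["AlarmName"]
    changed = message.get("StateChangeTime", "")
    # Hypothetical ID: alarm name + state-change time, reduced to [A-Za-z0-9-]
    slug = re.sub(r"[^A-Za-z0-9]+", "-", f"{name}-{changed}").strip("-")
    return {
        "incidentId": slug,                                # hypothetical field
        "title": f"CloudWatch alarm {name} is in ALARM",  # hypothetical field
        "description": message.get("NewStateReason", ""),  # hypothetical field
    }


def incidents_from_event(event: dict) -> List[dict]:
    """Extract incidents from every SNS record in a Lambda event."""
    incidents = []
    for record in event.get("Records", []):
        incident = incident_from_record(record)
        if incident is not None:
            incidents.append(incident)
    return incidents
```

Each resulting incident would then be serialized, signed, and POSTed to the webhook URL.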
Connecting MCP servers to DevOps Agent gives it additional context and tools beyond what's available through native AWS integrations. In this case, the Billing & Cost Management MCP server provides access to pricing data, cost anomaly detection, and optimization recommendations. This enables the agent to make cost-aware recommendations during incident investigations (for example, estimating the cost impact of increasing Lambda memory or switching DynamoDB to on-demand mode).
The MCP server runs as a Lambda function behind API Gateway, wrapping the awslabs.billing-cost-management-mcp-server package using the MCP Streamable HTTP transport.
```bash
cd mcp-server-hosting
bash deploy.sh
```

The script will output the endpoint URL and API key. No CDK or Docker is required; you only need the AWS CLI, Python 3.10+, and pip.
- In your Agent Space, go to Capabilities → MCP Servers → Add MCP Server
- Enter the endpoint URL from the deploy output (e.g., `https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/mcp`)
- Select API Key as the authorization flow
- Configure:
  - API Key Name: `bcm-mcp-key`
  - API Key Header: `x-api-key`
  - API Key Value: the key from the deploy output
- Leave "Dynamic Client Registration" and "Private connection" unchecked
- Choose AWS owned key for encryption
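Before adding the server in the console, you may want to confirm that the endpoint and API key work. MCP messages are JSON-RPC 2.0; with the Streamable HTTP transport the client POSTs each message to the endpoint and advertises that it accepts both `application/json` and `text/event-stream`. The sketch below builds an `initialize` request with the standard library's `urllib.request`. The `protocolVersion` value and the client name are assumptions (adjust them to what the hosted server supports), and `build_initialize_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_initialize_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize POST with API-key auth."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: adjust to the server's revision
            "capabilities": {},
            "clientInfo": {"name": "endpoint-check", "version": "0.1"},  # hypothetical
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "x-api-key": api_key,  # header name from the configuration above
        },
    )
```

To send it: `urllib.request.urlopen(build_initialize_request(url, key), timeout=30)`. A successful response suggests the endpoint and key are good; API Gateway typically answers a missing or invalid API key with 403 Forbidden.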
The skill instructs DevOps Agent when and how to use the billing MCP tools during investigations.
- In your Agent Space, go to Skills → Add Skill → Upload Skill
- Upload `devops-agent-skill-billing-mcp.zip`
- Select Generic agent type (applies to all investigation types)
Once connected, the agent will automatically use billing tools to check for cost anomalies correlated with incidents, estimate the cost of proposed mitigations, and include a cost impact summary in its findings.
Application: `unicorn_rentals`
- API Gateway: `${Environment}-unicorn-rentals-api` - Customer-facing REST API
- Lambda: `${Environment}-unicorn-rental-processor` - Business logic with intentional constraints
- DynamoDB: `${Environment}-unicorn-rentals` - Rental data storage with throttling limits
- CloudWatch: Comprehensive monitoring and alerting
Intentional Constraints:
- Lambda: 128MB memory limit, 30-second timeout
- DynamoDB: 2 RCU/WCU provisioned capacity
- Error injection: 30% failure rate with realistic scenarios
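The 2 RCU/WCU limit is what makes the throttling scenarios reproducible. As a rough worked example using DynamoDB's provisioned-capacity units (one WCU covers one standard write per second for an item up to 1 KB; one RCU covers one strongly consistent read per second of up to 4 KB, or two eventually consistent reads), the sketch below estimates demand at the load generator's rates. It assumes one read and one write per request with items of about 1 KB, which is an assumption about the processor's access pattern, not something this README documents.

```python
import math


def capacity_units(rps: float, item_kb: float = 1.0, reads_per_req: int = 1,
                   writes_per_req: int = 1, strongly_consistent: bool = True):
    """Estimate (RCU, WCU) consumed per second at a steady request rate."""
    read_units_per_item = math.ceil(item_kb / 4)  # 4 KB per strongly consistent read unit
    if not strongly_consistent:
        read_units_per_item /= 2                  # eventually consistent reads cost half
    write_units_per_item = math.ceil(item_kb)     # 1 KB per write unit
    rcu = rps * reads_per_req * read_units_per_item
    wcu = rps * writes_per_req * write_units_per_item
    return rcu, wcu


PROVISIONED = 2  # RCU and WCU from the stack's intentional constraints

for rps in (5, 15):
    rcu, wcu = capacity_units(rps)
    print(f"{rps} RPS -> {rcu} RCU, {wcu} WCU needed vs {PROVISIONED} provisioned")
# prints:
# 5 RPS -> 5 RCU, 5 WCU needed vs 2 provisioned
# 15 RPS -> 15 RCU, 15 WCU needed vs 2 provisioned
```

Under these assumptions even the 5 RPS baseline needs more than twice the provisioned throughput, so throttling should appear once DynamoDB's burst capacity is exhausted.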
Remove all resources when finished:
```bash
aws cloudformation delete-stack --stack-name unicorn-rentals
aws cloudformation delete-stack --stack-name bcm-mcp-server
```

- DevOps Agent Documentation
- Interactive Demo
- re:Invent Session
- AWS Service Terms
- AI Services Opt-out Policies
- `cloudformation-template.yaml` - Complete infrastructure definition
- `deploy.sh` - Automated deployment with error handling
- `continuous-load-generator.py` - Realistic traffic simulation
- `reset-alarms.sh` - Reset CloudWatch alarms to OK state
- `mcp-server-hosting/` - Billing & Cost Management MCP server (Lambda + API Gateway)
- `devops-agent-skill/` - DevOps Agent skill for cost-aware investigations
- `devops-agent-skill-billing-mcp.zip` - Ready-to-upload skill package