AWS DevOps Agent Demo

This demo showcases AWS DevOps Agent's capabilities in identifying root causes of system issues and accelerating incident response through a realistic Unicorn Rentals microservices architecture.

Application Overview

Unicorn Rentals is a customer booking system with rental processing logic and analytics reporting:

graph TB
    subgraph app["🦄 Unicorn Rentals Application"]
        Customer["👤 Customer Requests"]
        APIGW["🌐 API Gateway<br/><i>unicorn-rentals-api</i>"]
        Lambda["⚡ Lambda<br/><i>rental-processor</i><br/>"]
        DDB["🗄️ DynamoDB<br/><i>unicorn-rentals</i><br/>"]
        
        Customer -->|"POST /process"| APIGW
        APIGW -->|"Invoke"| Lambda
        Lambda -->|"Read/Write"| DDB
    end

    subgraph monitoring["📊 Monitoring & Alerting"]
        A1["🔴 Error Alarm<br/>Errors > 3/min"]
        A2["🟡 Duration Alarm<br/>Avg > 10s"]
        A3["🟠 Throttle Alarm<br/>Throttles ≥ 1"]
        SNS["📨 SNS Topic"]
        Forwarder["⚙️ Webhook Forwarder<br/><i>Lambda</i>"]
        
        A1 & A2 & A3 -->|"ALARM state"| SNS
        SNS --> Forwarder
    end

    subgraph devops["🤖 AWS DevOps Agent"]
        Agent["🔍 Investigation<br/>Root Cause Analysis<br/>Mitigation Plans"]
        Slack["💬 Slack<br/>Real-time findings"]
        
        Agent -->|"Posts updates"| Slack
    end

    Lambda -.->|"Metrics & Logs"| A1 & A2
    DDB -.->|"Throttle Events"| A3
    Forwarder -->|"HMAC Webhook"| Agent

    style app fill:#1a1a2e,stroke:#e94560,color:#fff
    style monitoring fill:#1a1a2e,stroke:#f5a623,color:#fff
    style devops fill:#1a1a2e,stroke:#00d2ff,color:#fff

The architecture includes intentional failure points that demonstrate DevOps Agent's diagnostic capabilities across multiple AWS services.
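
The three alarm conditions shown in the diagram can be read as plain threshold checks. The sketch below models them on per-minute aggregates. The `MinuteStats` structure and the alarm labels are hypothetical, and the real alarm periods and evaluation settings are defined in cloudformation-template.yaml and may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinuteStats:
    """One minute of aggregated metrics (hypothetical structure for illustration)."""
    errors: int              # Lambda errors in the minute
    avg_duration_s: float    # average Lambda duration, in seconds
    throttles: int           # DynamoDB throttle events in the minute


def breached_alarms(stats: MinuteStats) -> List[str]:
    """Return labels for the alarms whose diagram thresholds are breached."""
    breached = []
    if stats.errors > 3:            # Error Alarm: Errors > 3/min
        breached.append("error")
    if stats.avg_duration_s > 10:   # Duration Alarm: Avg > 10s
        breached.append("duration")
    if stats.throttles >= 1:        # Throttle Alarm: Throttles >= 1
        breached.append("throttle")
    return breached
```

Note the asymmetry: the error and duration checks are strict, while a single throttle event is enough to trip the throttle alarm.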

Demo Scenarios

🧠 Scenario 1: Memory Exhaustion

Rental Analytics Overload

Problem: Rental processor runs out of memory (127/128 MB usage)
Cause: Loading large analytics datasets during processing
Impact: 30% booking failures with 10+ second delays
Symptoms: Runtime.OutOfMemory errors, processing timeouts
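
The memory figures for this scenario appear in the `REPORT` line that Lambda writes to CloudWatch Logs at the end of each invocation. A minimal sketch for pulling them out of a log line follows. The helper names and the 95% threshold are illustrative choices, not part of the demo code.

```python
import re
from typing import Optional, Tuple

_MEMORY_SIZE = re.compile(r"Memory Size: (\d+) MB")
_MAX_USED = re.compile(r"Max Memory Used: (\d+) MB")


def memory_usage(report_line: str) -> Optional[Tuple[int, int]]:
    """Return (max_used_mb, configured_mb) from a REPORT line, or None."""
    size = _MEMORY_SIZE.search(report_line)
    used = _MAX_USED.search(report_line)
    if not (size and used):
        return None  # not a REPORT line, or an unexpected format
    return int(used.group(1)), int(size.group(1))


def near_limit(report_line: str, threshold: float = 0.95) -> bool:
    """True when max memory used is at or above the given share of the limit."""
    usage = memory_usage(report_line)
    if usage is None:
        return False
    used, size = usage
    return used / size >= threshold
```

An invocation reporting `Memory Size: 128 MB` and `Max Memory Used: 127 MB` is flagged by `near_limit`, which matches the 127/128 MB symptom described above.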

🗄️ Scenario 2: Database Throttling

Peak Demand Capacity Issues

Problem: DynamoDB capacity exceeded during high traffic
Cause: Provisioned throughput limits hit during batch operations
Impact: 5+ second booking delays, intermittent failures
Symptoms: ProvisionedThroughputExceededException errors

🔗 Scenario 3: Cascade Failures

Cross-Service Error Propagation

Problem: Multi-service error chain reaction
Cause: Database throttling → Lambda timeouts → API Gateway 5XX errors
Impact: Complete system outage affecting all customers
Symptoms: Service dependency failures across the stack

Getting Started

Prerequisites

AWS CLI configured with appropriate permissions
Python 3.7+ with the requests library (the optional MCP server deployment requires Python 3.10+)
CloudFormation deployment permissions

1. Deploy the Infrastructure

# Make deployment script executable
chmod +x deploy.sh

# Deploy the complete stack
./deploy.sh

The deployment creates:

API Gateway: demo-unicorn-rentals-api
Lambda Function: demo-unicorn-rental-processor (128 MB memory, 30% error rate)
DynamoDB Table: demo-unicorn-rentals (low provisioned capacity)
CloudWatch Alarms: Error rate, duration, and throttling monitors

2. Load Environment Variables

# Source the generated environment file
source demo-environment.env

# Verify deployment
echo "API URL: $API_URL"
echo "Lambda: $LAMBDA_NAME"
echo "Table: $TABLE_NAME"

3. Generate Realistic Load

Install Python dependencies:

pip install requests

Start continuous background load:

# Light continuous load (5 RPS baseline)
python continuous-load-generator.py --api-url "$API_URL" --rps 5 --duration 10

# Higher load for faster error generation
python continuous-load-generator.py --api-url "$API_URL" --rps 15 --duration 10

The load generator creates realistic traffic patterns with business hours peaks and occasional spikes.
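
To illustrate how a traffic pattern like this can be produced, the sketch below scales a baseline rate by time of day and an occasional random spike. It is not the logic of continuous-load-generator.py: the business-hours window, the multipliers, and the spike probability are all assumed values.

```python
import random


def target_rps(base_rps: float, hour: int, rng: random.Random,
               spike_probability: float = 0.05) -> float:
    """Scale a baseline request rate by time of day and an occasional spike."""
    # Assumed business-hours window (09:00-17:00) with a 2x peak.
    multiplier = 2.0 if 9 <= hour < 17 else 1.0
    # Assumed occasional spike: a further 3x with a small probability.
    if rng.random() < spike_probability:
        multiplier *= 3.0
    return base_rps * multiplier


def request_interval_s(rps: float) -> float:
    """Delay between requests needed to hold the given rate."""
    return 1.0 / rps
```

Passing the random generator in explicitly keeps the behavior reproducible: a seeded `random.Random` yields the same spike sequence on every run.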

DevOps Agent Analysis

Set Up DevOps Agent Space

Create DevOps Agent Space
- Navigate to AWS Console → DevOps Agent
- Create a new Space for the demo
- Include resources with tag: Application = unicorn_rentals
- This will automatically discover and include:
  - API Gateway: demo-unicorn-rentals-api
  - Lambda Function: demo-unicorn-rental-processor
  - DynamoDB Table: demo-unicorn-rentals
  - CloudWatch Alarms and Logs

Open WebApp and Start Investigation
- Launch the DevOps Agent WebApp from your Space
- Begin investigation using the prompts below
- Agent will analyze logs, metrics, and service relationships
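
Before creating the Space, it can help to confirm that the deployed resources actually carry the `Application = unicorn_rentals` tag. The sketch below lists matching ARNs through the Resource Groups Tagging API. It is a verification aid only and says nothing about how DevOps Agent performs its own discovery. The function name is hypothetical, and `client` is assumed to behave like `boto3.client("resourcegroupstaggingapi")`.

```python
from typing import Any, Dict, List


def find_tagged_arns(client: Any, key: str = "Application",
                     value: str = "unicorn_rentals") -> List[str]:
    """List ARNs of resources that carry the given tag key and value."""
    arns: List[str] = []
    kwargs: Dict[str, Any] = {"TagFilters": [{"Key": key, "Values": [value]}]}
    while True:
        page = client.get_resources(**kwargs)
        arns.extend(m["ResourceARN"] for m in page.get("ResourceTagMappingList", []))
        # An empty or missing token means there are no further pages.
        token = page.get("PaginationToken")
        if not token:
            return arns
        kwargs["PaginationToken"] = token
```

With boto3 installed and credentials configured, `find_tagged_arns(boto3.client("resourcegroupstaggingapi"))` should return the ARNs of the tagged demo resources. Taking the client as a parameter also makes the function easy to exercise with a stub.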

Investigation Prompts

Use these prompts with AWS DevOps Agent to analyze the system:

Memory Issues Analysis

The performance of my unicorn_rentals application has degraded significantly. Customer booking response times have increased and I'm seeing more rental processing errors. Can you analyze the system behavior over the last hour?

Latency Investigation

My unicorn rental processor is experiencing high latency spikes during booking confirmations. The duration metrics show some rental processing taking much longer than others. What's causing this inconsistent booking performance?

Root Cause Analysis

Investigation details: Why is my unicorn rentals API Gateway showing increased latency and 5XX errors? Customer bookings were working fine earlier today.

Slack Integration

Once you've added your Slack Workspace as a capability provider, follow these steps to complete the integration:

Associate Slack Channel with your Agent Space
- In your Agent Space, go to Capabilities → Communications → Slack
- Select Add Slack and enter the Channel ID of your target channel
- Choose Create to complete the association
- For private channels, invite the DevOps Agent bot user to the channel before it can post

How Slack Works During Investigations
- When an investigation starts (manually from the WebApp, via webhook, or from a ticketing integration), DevOps Agent automatically posts updates to the configured Slack channel
- The channel receives key findings, root cause analyses, and mitigation plans as the investigation progresses
- Team members can follow along in real-time without needing console access

Starting Investigations for This Demo
- Investigations are started from the DevOps Agent WebApp (Incident Response tab), not directly from Slack
- Use the prompts from the Investigation Prompts section above, or choose a pre-configured starting point like "Latest alarm" or "Error rate spike"
- Once started, all findings stream into your Slack channel automatically
- You can also trigger investigations via webhooks from PagerDuty, Grafana, or custom alerting systems

Note: Slack serves as a notification and collaboration channel. The investigation itself is driven from the WebApp, ticketing integrations, or webhooks. Avoid uninstalling the Slack app during the public preview, as reinstallation may not work.

Automatic Investigation via Webhook (Optional)

You can configure CloudWatch Alarms to automatically trigger DevOps Agent investigations when errors occur. The stack includes an optional webhook integration pipeline:

CloudWatch Alarm → SNS Topic → Forwarder Lambda → DevOps Agent Webhook

To enable this:

Generate a webhook in DevOps Agent
- In your Agent Space, go to Capabilities → Webhook → Configure
- Click Generate webhook to create an HMAC key pair
- Save the webhook URL and secret securely (you won't see the secret again)

Deploy with webhook parameters

export DEVOPS_AGENT_WEBHOOK_URL="https://event-ai.us-east-1.api.aws/webhook/generic/YOUR_ID"
export DEVOPS_AGENT_WEBHOOK_SECRET="your-hmac-secret"
./deploy.sh

Trigger the pipeline
- Run the load generator to produce errors
- CloudWatch Alarms fire when thresholds are breached
- The forwarder Lambda sends HMAC-signed webhook requests to DevOps Agent
- An investigation starts automatically, with findings posted to Slack

The forwarder triggers investigations only on transitions into the ALARM state (not on OK recoveries), and each alarm firing gets a unique incident ID so that it is not deduplicated against an earlier incident.
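
The forwarder behavior described above can be sketched as two small functions: one that filters the CloudWatch alarm notification delivered through SNS, and one that signs the outgoing request body. This is not the forwarder shipped in the stack. The payload field names and the exact string that is signed (`<timestamp>:<body>`, base64-encoded HMAC-SHA256) are assumptions, so check the DevOps Agent webhook documentation and the forwarder code in cloudformation-template.yaml for the real contract.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def build_incident(sns_message: str) -> Optional[dict]:
    """Turn a CloudWatch alarm notification into a webhook payload.

    Returns None for anything other than a transition into ALARM.
    """
    alarm = json.loads(sns_message)
    if alarm.get("NewStateValue") != "ALARM":
        return None  # ignore OK and INSUFFICIENT_DATA transitions
    # Alarm name plus transition time gives each firing its own ID.
    incident_id = f"{alarm['AlarmName']}-{alarm['StateChangeTime']}"
    # Payload field names below are hypothetical.
    return {
        "incidentId": incident_id,
        "title": alarm["AlarmName"],
        "description": alarm.get("NewStateReason", ""),
    }


def sign(secret: str, timestamp: str, body: str) -> str:
    """HMAC-SHA256 over "<timestamp>:<body>", base64-encoded (assumed scheme)."""
    message = f"{timestamp}:{body}".encode()
    digest = hmac.new(secret.encode(), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()
```

Including a timestamp in the signed string is a common way to let the receiver reject replayed requests. Whether DevOps Agent enforces this is not stated here.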

Billing & Cost Management MCP Server (Optional)

Connecting MCP servers to DevOps Agent gives it additional context and tools beyond what's available through native AWS integrations. In this case, the Billing & Cost Management MCP server provides access to pricing data, cost anomaly detection, and optimization recommendations — enabling the agent to make cost-aware recommendations during incident investigations (e.g., estimating the cost impact of increasing Lambda memory or switching DynamoDB to on-demand mode).
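
As a rough illustration of the kind of estimate involved, the sketch below computes the duration-based Lambda charge for a given memory size. It is a back-of-the-envelope calculation, not what the MCP server does internally. The function name is hypothetical, the unit price must be supplied by the caller from current pricing data, and per-request charges and the free tier are ignored.

```python
def lambda_compute_cost(invocations: int, avg_duration_s: float,
                        memory_mb: int, price_per_gb_second: float) -> float:
    """Duration-based Lambda charge: GB-seconds multiplied by the unit price."""
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second
```

At a fixed average duration, doubling memory from 128 MB to 256 MB doubles this figure. In practice, more memory often shortens execution time, so the real difference is usually smaller.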

Deploy the MCP Server

The MCP server runs as a Lambda function behind API Gateway, wrapping the awslabs.billing-cost-management-mcp-server package using the MCP Streamable HTTP transport.

cd mcp-server-hosting
bash deploy.sh

The script will output the endpoint URL and API key. No CDK or Docker required — just AWS CLI, Python 3.10+, and pip.

Connect MCP to DevOps Agent

1. In your Agent Space, go to Capabilities → MCP Servers → Add MCP Server
2. Enter the endpoint URL from the deploy output (e.g., https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/mcp)
3. Select API Key as the authorization flow
4. Configure:
   - API Key Name: bcm-mcp-key
   - API Key Header: x-api-key
   - API Key Value: the key from the deploy output
5. Leave "Dynamic Client Registration" and "Private connection" unchecked
6. Choose AWS owned key for encryption

Upload the Billing Skill

The skill instructs DevOps Agent when and how to use the billing MCP tools during investigations.

1. In your Agent Space, go to Skills → Add Skill → Upload Skill
2. Upload devops-agent-skill-billing-mcp.zip
3. Select Generic agent type (applies to all investigation types)

Once connected, the agent will automatically use billing tools to check for cost anomalies correlated with incidents, estimate the cost of proposed mitigations, and include a cost impact summary in its findings.

Architecture Details

Application: unicorn_rentals

API Gateway: ${Environment}-unicorn-rentals-api - Customer-facing REST API
Lambda: ${Environment}-unicorn-rental-processor - Business logic with intentional constraints
DynamoDB: ${Environment}-unicorn-rentals - Rental data storage with throttling limits
CloudWatch: Comprehensive monitoring and alerting

Intentional Constraints:

Lambda: 128 MB memory limit, 30-second timeout
DynamoDB: 2 RCU/WCU provisioned capacity
Error injection: 30% failure rate with realistic scenarios
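
The DynamoDB constraint is easiest to appreciate with the capacity arithmetic spelled out. For standard (non-transactional) operations, a write consumes one WCU per 1 KB of item size, and a strongly consistent read consumes one RCU per 4 KB, with eventually consistent reads costing half. The sketch below applies those rules. The item sizes and the assumption of one write per request are illustrative, since the demo's actual access pattern is defined in the Lambda code.

```python
import math


def write_units(item_bytes: int) -> int:
    """WCUs consumed by one standard write: 1 per 1 KB, rounded up."""
    return max(1, math.ceil(item_bytes / 1024))


def read_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs consumed by one read: 1 per 4 KB, halved if eventually consistent."""
    units = max(1, math.ceil(item_bytes / 4096))
    return units if strongly_consistent else units / 2


def sustained_demand(ops_per_second: float, units_per_op: float) -> float:
    """Capacity units per second needed to sustain the given operation rate."""
    return ops_per_second * units_per_op
```

Under these assumptions, 15 requests per second that each write one item of up to 1 KB need 15 WCU, well above the 2 WCU provisioned. DynamoDB burst capacity can absorb short peaks, but sustained load at that rate is throttled.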

Cleanup

Remove all resources when finished:

aws cloudformation delete-stack --stack-name unicorn-rentals
aws cloudformation delete-stack --stack-name bcm-mcp-server

Additional Resources

Files

cloudformation-template.yaml - Complete infrastructure definition
deploy.sh - Automated deployment with error handling
continuous-load-generator.py - Realistic traffic simulation
reset-alarms.sh - Reset CloudWatch alarms to OK state
mcp-server-hosting/ - Billing & Cost Management MCP server (Lambda + API Gateway)
devops-agent-skill/ - DevOps Agent skill for cost-aware investigations
devops-agent-skill-billing-mcp.zip - Ready-to-upload skill package
