Skip to content

suvomoybiswas/ai-global-dns-platform

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🏗️ AWS NATIVE DNS + AI OPERATIONS ARCHITECTURE

Multi-Region Hybrid DNS with Intelligent Threat Detection & Auto-Remediation


EXECUTIVE SUMMARY

Enterprise-grade architecture combining AWS Route 53 advanced features, ECS Fargate containerization, AI-powered threat detection via Amazon Bedrock, and automated remediation through Lambda. Designed for high availability, security, and intelligent operations.

Key Features:

  • ✅ Multi-region active-passive failover (Route 53)
  • ✅ Hybrid DNS (on-premises + AWS integration)
  • ✅ DNS-level threat protection (DNS Firewall)
  • ✅ Real-time log aggregation (CloudWatch)
  • ✅ Long-term data lake (S3 + lifecycle management)
  • ✅ AI-powered threat detection (Amazon Bedrock)
  • ✅ Automated security response (Lambda remediation)
  • ✅ Complete audit trail (DynamoDB + CloudWatch)

ARCHITECTURE DIAGRAM (ASCII)

┌──────────────────────────────────────────────────────────────────────────────┐
│                              INTERNET USERS                                  │
│                         (Global Customer Traffic)                            │
└────────────────────────────────────┬─────────────────────────────────────────┘
                                     │
                    ╔════════════════╩════════════════╗
                    ║                                 ║
            ┌───────▼────────┐          ┌────────────▼──────────┐
            │  Route 53      │          │   Route 53            │
            │  Global DNS    │          │   Global DNS          │
            │  (Anycast)     │          │   (Anycast)           │
            └────────┬───────┘          └────────┬───────────────┘
                     │                           │
        ┌────────────┴────────────┐  ┌──────────┴────────────┐
        │                         │  │                       │
    ┌───▼──────────────┐  ┌────────▼──────────────────────┐
    │  US-EAST-1       │  │  US-WEST-2                   │
    │  (PRIMARY)       │  │  (DR/STANDBY)                │
    └───┬──────────────┘  └────┬──────────────────────────┘
        │                      │
    ┌───┴──────────────────────┴──────────────────────────┐
    │     HYBRID DNS LAYER (Route 53 Advanced)            │
    ├────────────────────────────────────────────────────┤
    │ ✓ Route 53 Profiles (Geolocation routing)          │
    │ ✓ Route 53 Resolver (On-prem hybrid DNS)           │
    │ ✓ Private Hosted Zones (Internal DNS)              │
    │ ✓ DNS Firewall (Threat blocking)                   │
    │ ✓ Query Logging (CloudWatch integration)           │
    └───┬──────────────────────────────────────────────────┘
        │
        ├──────────────────────┬──────────────────────┐
        │                      │                      │
    ┌───▼────────┐        ┌────▼────────┐        ┌───▼─────────┐
    │  ALB       │        │  ALB        │        │ Route 53    │
    │ (Public)   │        │  (Public)   │        │ Resolver    │
    └───┬────────┘        └────┬────────┘        └─────────────┘
        │                      │
    ┌───┴──────────────────────┴────────────────────────┐
    │    ECS FARGATE CLUSTER (Container Workloads)      │
    ├────────────────────────────────────────────────────┤
    │ • API Service (Node.js)                            │
    │ • Web Service (Nginx)                              │
    │ • Monitoring Agent (Log forwarding)                │
    │ • CloudWatch Agent (Metrics collection)            │
    └───┬──────────────────────────────────────────────────┘
        │
        │ (Logs → CloudWatch)
        │
    ┌───▼────────────────────────────────────────────────┐
    │  CloudWatch Logs (Real-time, 30-day retention)    │
    ├────────────────────────────────────────────────────┤
    │ • Route 53 Query Logs                              │
    │ • DNS Firewall Blocks                              │
    │ • ECS Container Logs                               │
    │ • Application Logs (JSON structured)               │
    └───┬──────────────────────────────────────────────────┘
        │
        │ (Daily export)
        │
    ┌───▼────────────────────────────────────────────────┐
    │  S3 Data Lake (Long-term archival)                │
    ├────────────────────────────────────────────────────┤
    │ • Route 53 query logs (partitioned by date)        │
    │ • DNS Firewall logs (partitioned by date)          │
    │ • Application logs (compressed, archived)          │
    │ • Lifecycle: Standard → Glacier → Deep Archive    │
    └───┬──────────────────────────────────────────────────┘
        │
        │ (Orchestrated analysis)
        │
    ┌───▼────────────────────────────────────────────────┐
    │  AWS Lambda: threat-detection-orchestrator        │
    ├────────────────────────────────────────────────────┤
    │ • Extract data from S3                             │
    │ • Prepare threat analysis prompt                   │
    │ • Invoke Bedrock AI models                         │
    │ • Process threat detections                        │
    │ • Determine remediation actions                    │
    └───┬──────────────────────────────────────────────────┘
        │
        │ (Send threat analysis requests)
        │
    ┌───▼────────────────────────────────────────────────┐
    │  Amazon Bedrock: Claude 3 Sonnet                   │
    ├────────────────────────────────────────────────────┤
    │ AI-Powered Threat Detection:                       │
    │ • DGA (Domain Generation Algorithm) detection      │
    │ • Ransomware C&C identification                    │
    │ • DNS exfiltration pattern recognition             │
    │ • Anomaly detection in traffic patterns            │
    │ • Predictive threat assessment                     │
    │ • Root cause analysis                              │
    │ • Remediation recommendations                      │
    └───┬──────────────────────────────────────────────────┘
        │
        │ (Threat detection results)
        │
        ├──────────────┬──────────────┬──────────────┐
        │              │              │              │
    ┌───▼──────┐  ┌────▼────┐  ┌─────▼───┐  ┌───────▼──────┐
    │DynamoDB  │  │  SNS    │  │Dashboard│  │  S3 Reports  │
    │Tracking  │  │ Topics  │  │ Update  │  │  (Archive)   │
    └──────────┘  └────┬────┘  └─────────┘  └──────────────┘
                       │
                       │ (Alert security team)
                       │
              ┌────────▼──────────┐
              │  Email / Slack    │
              │  Security Team    │
              └────────┬──────────┘
                       │
                       │ (High severity threats)
                       │
            ┌──────────▼────────────┐
            │  SNS → Lambda Trigger │
            └──────────┬────────────┘
                       │
    ┌──────────────────▼──────────────────┐
    │ AWS Lambda: Auto-Remediation        │
    ├─────────────────────────────────────┤
    │ 1. Update DNS Firewall rules        │
    │ 2. Modify Route 53 traffic policies │
    │ 3. Update security group rules      │
    │ 4. Create CloudWatch alarms         │
    │ 5. Log all actions (audit trail)    │
    │ 6. Notify security team             │
    │ 7. Create JIRA incident ticket      │
    └──────────────────┬──────────────────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
    ┌───▼─────┐  ┌─────▼────┐  ┌────▼──────┐
    │DNS Rule │  │Route53   │  │Security   │
    │Updated  │  │Policy    │  │Group      │
    │         │  │Adjusted  │  │Updated    │
    └─────────┘  └──────────┘  └───────────┘
                       │
                       │ (Changes propagate)
                       │
                    THREAT MITIGATED ✅

COMPONENT DETAILS

1. ROUTE 53 - DNS LAYER

Public Hosted Zone (example.com)

  • Handles global DNS queries from internet users
  • Uses anycast routing (queries routed to nearest edge)
  • TTL: 60 seconds (fast failover)
  • Health checks on ALBs (automatic failover)
  • Weighted routing: 70% us-east-1, 30% us-west-2

Private Hosted Zones (internal.example.com)

  • Only resolvable from within VPCs
  • Associated with both regions
  • Records: api.internal, db.internal, cache.internal, etc.
  • TTL: 300 seconds (less frequent changes)

Route 53 Resolver (Hybrid DNS)

  • Inbound endpoint: Accepts DNS queries from on-premises
  • Outbound endpoint: Forwards VPC queries to on-premises DNS
  • Enables seamless hybrid cloud environment

DNS Firewall

  • Rule Groups: Malware domains, Ransomware, Cryptomining, Suspicious patterns
  • Actions: BLOCK (drop query), ALERT (log + block), ALLOW
  • Logging: All blocks logged to CloudWatch → S3
  • Updates: Lambda can add/remove rules dynamically

Route 53 Query Logging

  • Logs every DNS query (source IP, domain, response code)
  • Destination: CloudWatch Logs
  • Volume: Can handle 1M+ queries per second
  • Retention: 30 days in CloudWatch, then archived to S3

2. ECS FARGATE - COMPUTE LAYER

Cluster Configuration

  • Two clusters (one per region)
  • Fargate launch type (serverless containers)
  • VPC integration with private subnets
  • Auto Scaling Groups (min 2-3, max 10-20 tasks per service)

Services

  1. API Service

    • Container: Node.js 18 app
    • Desired count: 3-10 (auto-scales)
    • Health check: /health endpoint (30s interval)
    • Logs: CloudWatch (/ecs/api-service)
  2. Web Service

    • Container: Nginx reverse proxy
    • Desired count: 2-5 (auto-scales)
    • Health check: HTTP GET / (200 OK expected)
    • Logs: CloudWatch (/ecs/web-service)
  3. Monitoring Agent

    • Lightweight container
    • Collects and forwards logs
    • Runs 1 per cluster (no scaling)

3. CloudWatch - MONITORING & LOGGING

Log Groups

  • /aws/route53/queries - All DNS queries
  • /aws/route53/firewall - DNS Firewall blocks
  • /ecs/api-service - API application logs
  • /ecs/web-service - Web server logs
  • /lambda/threat-detection - Threat detection execution
  • /lambda/auto-remediation - Remediation actions

Dashboards

  • Real-time traffic metrics
  • DNS query rates and error rates
  • Firewall block statistics
  • Application performance (latency, errors)
  • Security alerts summary

Alarms

  • High error rate (>1% of requests)
  • High latency (p99 > 500ms)
  • DNS firewall blocks (>100 per minute)
  • Lambda errors (any errors = alert)

4. S3 DATA LAKE - LONG-TERM STORAGE

Bucket Structure

s3://dns-operations-datalake/
├── logs/
│   ├── route53/queries/year=2026/month=05/day=31/...
│   ├── route53/firewall/year=2026/month=05/day=31/...
│   ├── ecs/api/year=2026/month=05/day=31/...
│   └── ecs/web/year=2026/month=05/day=31/...
├── models/
│   ├── threat-patterns/
│   └── ml-artifacts/
├── reports/
│   ├── threat-analysis/2026-05-31-daily-report.md
│   └── incidents/INC-001-2026-05-31.json
└── archive/

Lifecycle Policies

  • Standard (0-90 days): Immediate retrieval needed
  • Glacier (90 days - 1 year): Infrequent access
  • Deep Archive (1+ year): Compliance/legal hold

Export to S3

  • Daily scheduled Lambda exports CloudWatch logs to S3
  • Partitioned by date/hour for queryability
  • Compressed (.gz) to save space

5. AMAZON BEDROCK - AI THREAT DETECTION

Model: Claude 3 Sonnet

  • Optimized for security analysis
  • Context: 200K tokens (can analyze massive log files)
  • Temperature: 0.3 (deterministic, consistent results)

Input to Bedrock

  • DNS query summary (top domains, volume, patterns)
  • DNS firewall blocks (what was blocked)
  • Application metrics (errors, latency)
  • Traffic anomalies detected
  • Historical patterns

Threat Detection Capabilities

  • DGA detection (Domain Generation Algorithm)
  • Ransomware C&C identification
  • DNS exfiltration (data smuggling)
  • Crypto-mining infrastructure
  • APT activity indicators
  • Anomalies in traffic patterns

Output from Bedrock

  • Threat severity (Critical/High/Medium/Low)
  • Confidence score (0-100%)
  • Root cause analysis
  • Recommended remediation actions
  • False positive risk assessment

6. LAMBDA - AUTOMATION

Function 1: threat-detection-orchestrator

  • Trigger: CloudWatch Events (hourly)
  • Timeout: 15 minutes
  • Memory: 3008 MB (for processing large datasets)

Execution Steps

  1. Query S3 data lake (last hour of logs)
  2. Decompress and parse JSON logs
  3. Aggregate metrics (query counts, patterns, etc.)
  4. Format prompt for Bedrock
  5. Invoke Claude 3 Sonnet
  6. Parse response (detect threats)
  7. Store in DynamoDB (tracking)
  8. Publish to SNS (if threats detected)

Function 2: auto-remediation

  • Trigger: SNS topic (threat-alerts)
  • Timeout: 60 seconds
  • Memory: 1024 MB

Remediation Actions

  1. Update DNS Firewall (add block rules)
  2. Modify Route 53 traffic policies
  3. Update ALB security groups
  4. Create CloudWatch alarms
  5. Log to DynamoDB (audit trail)
  6. Send SNS notifications (team alerts)
  7. Create JIRA ticket (incident tracking)

7. DynamoDB - INCIDENT TRACKING

Table: security-incidents

  • Partition Key: incident_id
  • Sort Key: timestamp
  • TTL: 365 days (auto-delete old incidents)

Attributes

  • incident_id (unique)
  • threat_id (link to threat detection)
  • severity (CRITICAL, HIGH, MEDIUM, LOW)
  • threat_type (DGA, ransomware, etc.)
  • affected_domains (list)
  • affected_ips (list)
  • remediation_actions (list)
  • status (pending, in-progress, resolved)
  • assigned_to (on-call engineer)
  • created_timestamp
  • resolved_timestamp
  • jira_ticket_url
  • bedrock_analysis (JSON)

SECURITY & BEST PRACTICES

Defense in Depth

  1. DNS Firewall (DNS layer threat protection)
  2. ALB + WAF (HTTP layer protection)
  3. Security Groups (network layer)
  4. IAM policies (least privilege access)

Automation & Speed

  • Threats detected and blocked in seconds
  • No manual intervention for obvious threats
  • Escalation to team for investigation

Audit Trail

  • Every DNS query logged
  • Every remediation action logged
  • Immutable logs in S3 (compliance ready)
  • CloudTrail for AWS API calls

Redundancy

  • Multi-region failover (Route 53)
  • Multi-AZ deployment (within regions)
  • Auto Scaling (handle traffic spikes)
  • DynamoDB on-demand (no capacity management)

DEPLOYMENT ESTIMATE

Component Time Difficulty
Route 53 setup 2 hours Medium
VPC & Networking 4 hours Medium
ECS Fargate 6 hours High
CloudWatch setup 3 hours Medium
S3 Data Lake 2 hours Low
Lambda functions 8 hours High
Bedrock integration 4 hours Medium
Testing & validation 8 hours High
TOTAL ~37 hours Medium

COST ESTIMATION (Monthly)

Service Usage Cost
Route 53 1M queries/month $50
ECS Fargate 4 vCPU, 8GB RAM $800
ALB 2 regional $300
CloudWatch 10GB logs $150
S3 500GB standard $20
Lambda 100 invocations/day $50
Bedrock 1000 invocations/month $500
DynamoDB On-demand, light $50
Data Transfer 100GB/month $100
TOTAL ~$2,020/month

NEXT STEPS

  1. Create Route 53 zones
  2. Deploy ECS clusters
  3. Set up CloudWatch logging
  4. Create S3 data lake
  5. Deploy Lambda functions
  6. Configure Bedrock integration
  7. Test threat detection pipeline
  8. Implement auto-remediation
  9. Set up alerting (email/Slack/PagerDuty)
  10. Document runbooks for manual intervention

This architecture provides enterprise-grade DNS operations with intelligent threat detection and automated response.

About

This project is an enterprise-grade cloud DNS security and governance system built on AWS using Terraform. It demonstrates how modern infrastructure can combine networking, security, and AI automation to detect, analyze, and automatically respond to DNS-based threats in real time.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages