feat(ecs-fargate): EventBridge lifecycle forwarder + OTel collector improvements #154
prathamesh-sonpatki wants to merge 8 commits into
Add session management (15m inactivity / 4h max, file persistence), automatic UIKit view tracking (swizzle), SwiftUI .trackView(name:) modifier, and user identification API — aligned with browser SDK and Datadog iOS patterns. New files: SessionManager, SessionStore, SessionSpanProcessor, ViewManager, UserInfo Linear: FDE-104 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HangDetector: background thread pings main thread every 1s, captures Mach thread stack trace if main thread blocks >2s, emits hang start/end log events with duration. WatchdogTerminationDetector: persists app state to disk, detects abnormal termination (no clean shutdown + no crash handler) on next launch, emits FATAL log event linking to previous session. Linear: FDE-104 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rash handling App launch: cold/warm start time (process uptime → first viewDidAppear). Performance: per-view CPU/memory (mach_task_basic_info) + frame rate monitoring (CADisplayLink with slow >25ms and frozen >700ms frame detection). Interactions: tap tracking via UIApplication.sendEvent swizzle with target info. Network timing: DNS/TLS/TTFB breakdown from URLSessionTaskMetrics. Signal crashes: POSIX signal handlers (SIGSEGV, SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGTRAP) with pre-allocated crash markers and handler chaining. Closes competitive gaps with Datadog and Sentry iOS SDKs (except session replay). Linear: FDE-104 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the 12 internal SDK source files from the examples directory. The instrumentation now lives in the published last9-ios-swift-sdk package (v0.1.0). The example retains only AppDelegate.swift and ExampleUsage.swift.
- Package.swift now depends on last9/last9-ios-swift-sdk instead of opentelemetry-swift directly
- All Last9OTel.* references updated to Last9RUM.*
- README updated with SPM installation instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Picks up macOS build fixes (OtlpConfiguration import, LoggerProviderSdk lifecycle, platform declaration). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove Last9OTel.swift (manual OTel setup, replaced by the SDK)
- Update Package.swift to depend on last9/last9-ios-swift-sdk v0.1.2
- Add macOS(.v12) platform to satisfy SDK's platform requirements
- Update README with DHH-style setup guide and correct version
SDK handles sessions, views, crashes, ANR, user identification, and all network tracing automatically via one initialize call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ctor improvements
- Add eventbridge-task-lifecycle/ with two CloudFormation templates:
  - cloudformation-no-lambda.yaml: EventBridge API Destination → Last9 /json/v2
    (no Lambda; input transformer wraps ECS detail in JSON array)
  - cloudformation.yaml: EventBridge → Lambda → Last9 OTLP /v1/logs
    (richer OTLP attributes for structured correlation)
- Fix otel-config-tcp.yaml: collection_interval 60s → 10s (captures task state
  during 30s shutdown window), add memory_limiter, add traces pipeline,
  add otlp receiver to logs pipeline, batch timeout 10s → 5s
- Fix task-definition-with-otel.json: pin collector to 0.149.0, add
  stopTimeout: 30, sync embedded OTEL_CONFIG env var with same changes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
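The reasoning behind the collection_interval change can be checked with a little arithmetic. The following is a minimal sketch, assuming the collector scrapes at a fixed interval with an unknown phase and that the task stays observable for the full 30-second shutdown window; the function name is illustrative and does not appear in this PR.

```python
def scrapes_in_window(window_s: int, interval_s: int) -> tuple[int, int]:
    """Return the (fewest, most) scrapes that can land inside a half-open
    window of window_s seconds when a scrape fires every interval_s seconds
    and the phase of the scraper relative to the window is unknown."""
    fewest = window_s // interval_s
    most = -(-window_s // interval_s)  # ceiling division
    return fewest, most


# 30-second shutdown window, old vs. new collection interval.
old = scrapes_in_window(30, 60)
new = scrapes_in_window(30, 10)
print(old, new)
```

Under these assumptions a 60s interval can miss the shutdown window entirely, while a 10s interval always lands several scrapes inside it.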
Type: AWS::Events::ApiDestination
Properties:
  Name: !Sub 'last9-ecs-lifecycle-${AWS::StackName}'
  Description: >
The YAML block scalar > here produces a trailing newline in the resolved string, which violates the AWS API Destination description regex constraint (Member must satisfy regular expression pattern: .*). CloudFormation fails with CREATE_FAILED on this resource.
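The mismatch between a trailing newline and the pattern .* can be reproduced in isolation. This is a sketch only: it assumes the service applies the pattern to the whole string and that "." does not match a newline, which is how Python's re.fullmatch behaves by default. The description text is copied from this review; the helper name is hypothetical.

```python
import re

# Assumed validation pattern, taken from the error message quoted in the comment.
DESCRIPTION_PATTERN = re.compile(r".*")


def is_valid_description(text: str) -> bool:
    # fullmatch requires the entire string to match; by default "." does not
    # match "\n", so any string containing a newline is rejected.
    return DESCRIPTION_PATTERN.fullmatch(text) is not None


# What a folded block scalar (>) resolves to: the text plus one trailing newline.
folded = "Posts ECS task stop events to Last9 HTTP log ingestion endpoint.\n"
# What a plain single-line scalar resolves to.
single_line = folded.rstrip("\n")

print(is_valid_description(folded), is_valid_description(single_line))
```

In YAML, the strip chomping indicator (>-) also drops the final newline, so it should be an alternative to the single-line form if the folded layout is preferred.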
Fix: use a single-line string instead:
Description: Posts ECS task stop events to Last9 HTTP log ingestion endpoint.

lastStatus:
  - STOPPED
resources: !If
  - FilterByCluster
The resources prefix filter here uses the cluster ARN format (arn:aws:ecs:REGION:ACCOUNT:cluster/CLUSTER-NAME), but the resources array in ECS Task State Change events contains task ARNs (arn:aws:ecs:REGION:ACCOUNT:task/CLUSTER-NAME/TASK-ID).
Since task/ ≠ cluster/, this prefix will never match and the rule silently drops all events when ECSClusterArn is provided.
Verified by deploying this template with a real ECS cluster — TriggeredRules stayed at 0 until the resources filter was removed.
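The mismatch is visible with plain string comparison. A minimal sketch with a made-up account ID, cluster name, and task ID; it models prefix matching with str.startswith and is not EventBridge's actual matching engine.

```python
def any_has_prefix(values: list[str], prefix: str) -> bool:
    """Model a prefix filter: true if any value starts with the prefix."""
    return any(value.startswith(prefix) for value in values)


# Hypothetical identifiers, used only for illustration.
cluster_arn = "arn:aws:ecs:us-east-1:123456789012:cluster/demo-cluster"
event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    # The resources array carries the task ARN, not the cluster ARN.
    "resources": [
        "arn:aws:ecs:us-east-1:123456789012:task/demo-cluster/0123456789abcdef"
    ],
    "detail": {"clusterArn": cluster_arn, "lastStatus": "STOPPED"},
}

# Original filter: cluster ARN used as a prefix on resources.
old_filter_hits = any_has_prefix(event["resources"], cluster_arn)
# Proposed filter: exact match on detail.clusterArn.
new_filter_hits = event["detail"]["clusterArn"] == cluster_arn

print(old_filter_hits, new_filter_hits)
```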
Fix: filter on detail.clusterArn instead, which contains the actual cluster ARN:
EventPattern:
  source:
    - aws.ecs
  detail-type:
    - ECS Task State Change
  detail:
    lastStatus:
      - STOPPED
    clusterArn: !If
      - FilterByCluster
      - - !Ref ECSClusterArn
      - !Ref "AWS::NoValue"

InputTransformer:
  InputPathsMap:
    detail: '$.detail'
  InputTemplate: '[<detail>]'
The InputTransformer with template [<detail>] produces invalid JSON for Last9's /json/v2 endpoint. EventBridge serializes the <detail> variable as a JSON string (double-encoded), resulting in ["{\"taskArn\":...}"] instead of [{"taskArn":...}]. Every invocation fails silently.
Verified by deploying two rules targeting the same API Destination — one with InputTransformer and one without:
- With InputTransformer: TriggeredRules: 7, FailedInvocations: 7
- Without InputTransformer: TriggeredRules: 7, FailedInvocations: 0
Fix: remove the InputTransformer entirely. The raw EventBridge envelope (which includes source, detail-type, time, region, and detail) is a valid JSON object that /json/v2 accepts as-is. This actually provides richer data in Last9 since it includes event metadata alongside the ECS detail.
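The two payload shapes described in this comment can be reproduced with the standard json module. This sketch only models the reported behaviour (the detail object ending up as a string inside the array); it does not call EventBridge, and the task ARN is a placeholder.

```python
import json

# Placeholder ECS detail object; real events carry many more fields.
detail = {
    "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/demo/abc",
    "lastStatus": "STOPPED",
}
compact = {"separators": (",", ":")}

# Reported result of the [<detail>] template: the object is serialized to a
# string first, then placed in the array, so it is encoded twice.
double_encoded = json.dumps([json.dumps(detail, **compact)], **compact)

# Shape a JSON-array-of-objects endpoint would expect.
plain = json.dumps([detail], **compact)

print(double_encoded)
print(plain)

# After parsing, the first payload holds a string, the second an object.
print(type(json.loads(double_encoded)[0]).__name__)
print(type(json.loads(plain)[0]).__name__)
```

A receiver that expects an array of objects would see an array containing one string in the first case, which is consistent with the failed invocations reported above.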
…cycle events

Three bugs fixed (verified by deploying to real ECS cluster):
- ApiDestination Description: YAML block scalar newline caused CREATE_FAILED
- Cluster filter: resources prefix used cluster ARN but events have task ARNs; moved to detail.clusterArn with conditional EventPattern to avoid empty detail
- InputTransformer: [<detail>] double-encoded JSON as string, causing 100% FailedInvocations; removed InputTransformer, raw EventBridge envelope works

Expanded from STOPPED-only to full ECS lifecycle:
- ECS Task State Change (PROVISIONING → RUNNING → STOPPED)
- ECS Service Action (deployments, scaling, steady-state)
- ECS Deployment State Change (IN_PROGRESS → COMPLETED/FAILED)

Each event type has its own toggleable EventBridge rule. Lambda handler updated to v0.2.0: generic event handling with dynamic OTLP attributes, EVENT_NAMES mapping, and sample events for all 3 types. Added README.md, .env.example, .gitignore per repo conventions.
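For readers unfamiliar with the Lambda variant, the following sketch shows one way an EventBridge event could be wrapped as an OTLP/HTTP JSON logs payload with dynamic attributes. It is not the handler from this PR: the contents of EVENT_NAMES, the attribute keys (event.name, aws.ecs.*), the service.name value, and the helper names are all assumptions made for illustration. Only the three detail-type strings come from the commit message above.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from EventBridge detail-type to a short event name.
EVENT_NAMES = {
    "ECS Task State Change": "ecs.task.state_change",
    "ECS Service Action": "ecs.service.action",
    "ECS Deployment State Change": "ecs.deployment.state_change",
}


def to_unix_nano(iso_time: str) -> str:
    """Convert an EventBridge timestamp (e.g. 2024-01-01T00:00:00Z) to
    nanoseconds since the epoch, as a string (OTLP JSON encodes 64-bit
    integers as strings)."""
    parsed = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ")
    seconds = int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return str(seconds * 1_000_000_000)


def attr(key: str, value) -> dict:
    """Build an OTLP key/value attribute; non-strings are JSON-encoded."""
    text = value if isinstance(value, str) else json.dumps(value)
    return {"key": key, "value": {"stringValue": text}}


def to_otlp_logs(event: dict) -> dict:
    """Wrap one EventBridge event in an OTLP logs payload."""
    detail = event.get("detail", {})
    name = EVENT_NAMES.get(event.get("detail-type", ""), "ecs.unknown")
    attributes = [attr("event.name", name)]
    # Dynamic attributes: one per scalar field in detail (nested values
    # remain available in the log body).
    for key, value in sorted(detail.items()):
        if isinstance(value, (str, int, float, bool)):
            attributes.append(attr(f"aws.ecs.{key}", value))
    record = {
        "timeUnixNano": to_unix_nano(event["time"]),
        "severityText": "INFO",
        "body": {"stringValue": json.dumps(detail, sort_keys=True)},
        "attributes": attributes,
    }
    return {
        "resourceLogs": [{
            "resource": {"attributes": [attr("service.name", "ecs-lifecycle")]},
            "scopeLogs": [{"logRecords": [record]}],
        }]
    }


# Trimmed, made-up sample event.
sample = {
    "detail-type": "ECS Task State Change",
    "time": "2024-01-01T00:00:00Z",
    "detail": {
        "taskArn": "arn:aws:ecs:us-east-1:123456789012:task/demo/abc",
        "lastStatus": "STOPPED",
        "containers": [{"name": "app", "exitCode": 137}],
    },
}
payload = to_otlp_logs(sample)
print(json.dumps(payload, indent=2))
```

Keeping the full detail object in the log body while promoting only scalar fields to attributes is one possible trade-off: attributes stay flat and searchable, and nothing from the original event is lost.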
Summary
- eventbridge-task-lifecycle/: captures killed/stopped ECS task events (stop reason, exit codes) and forwards them to Last9 as searchable logs via EventBridge → no Lambda required
- nodejs-express-large-logs/: OTel collector config improvements for better visibility of stopped tasks

EventBridge lifecycle forwarder
When ECS kills a task (OOM, health check failure, scale-in), the OTel metrics receiver stops emitting immediately. The stop reason and exit codes are never captured. This forwarder bridges that gap.

How it works:

Two CloudFormation templates:
- cloudformation-no-lambda.yaml ← recommended: EventBridge API Destination posts directly to Last9 /json/v2 using an input transformer [<detail>]. Zero code.
- cloudformation.yaml: Lambda variant that wraps events in OTLP format (useful if structured OTLP attributes are needed for metric correlation)

Correlation key: taskArn in log body ↔ aws_ecs_task_arn label in ECS metrics

OTel config fixes (nodejs-express-large-logs)
- collection_interval: 60s → 10s
- Added memory_limiter (256 MiB)
- Added traces pipeline
- Added otlp to logs pipeline
- batch.timeout: 10s → 5s
- Pinned collector to 0.149.0
- Added stopTimeout: 30

Validation
- [<detail>] validated as valid JSON array
- go vet

🤖 Generated with Claude Code