Skip to content

Mission-critical audit remediation: OTLP, TraceKit, and OpenTelemetry edge-case hardening#12

Closed
christopherkarani wants to merge 2 commits into
mainfrom
codex/mission-critical-audit-20260226
Closed

Mission-critical audit remediation: OTLP, TraceKit, and OpenTelemetry edge-case hardening#12
christopherkarani wants to merge 2 commits into
mainfrom
codex/mission-critical-audit-20260226

Conversation

@christopherkarani

Copy link
Copy Markdown
Owner

Summary

This PR remediates mission-critical correctness and safety issues identified in the Swift audit across Terra core runtime, TraceKit ingestion, and HTTP instrumentation.

Fixes implemented

  • Normalized OpenTelemetry configuration before install idempotency checks:
    • non-finite traceSamplingRatio (NaN, ±Inf) now canonicalizes to nil
    • finite values are clamped to [0, 1] for deterministic behavior
  • Hardened OTLP decompression path:
    • fails on zero-progress zlib decode loops (prevents truncated-payload hang)
    • requires terminal COMPRESSION_STATUS_END
  • Fixed trace file discovery mismatch:
    • TraceFileLocator now accepts numeric-prefix filenames via TraceFileNameParser (consistent with Trace parsing)
  • Removed trace file read TOCTOU window:
    • TraceFileReader now performs bounded streaming reads with in-flight max-size enforcement
  • Improved OTLP HTTP server state safety:
    • queue-confined access for start, stop, deinit, and port
  • Strengthened privacy key fallback:
    • replaced UUID-derived fallback anonymization key with cryptographic random bytes from SystemRandomNumberGenerator
  • Resolved API-default mismatch:
    • HTTPAIInstrumentation.install now uses defaultOpenClawGatewayHosts as the default argument

Tests added/updated

  • TraceKitTests
    • added coverage for composite numeric-prefix trace filenames
  • OTLPRequestDecoderTests
    • added truncated-deflate regression test (ensures fail-fast instead of hang)
  • TerraRedactionPolicyTests
    • added .drop redaction behavior test
  • TerraOpenTelemetryInstallConcurrencyTests
    • tightened same-config concurrent install assertion (== 2 successes)
    • added non-finite sampling ratio idempotency test

Validation

  • swift build
  • swift test
    • XCTest suite: 49 tests passed
    • Swift Testing suite: 87 tests passed

Notes

  • Existing third-party plugin/tooling deprecation warnings remain unchanged and out of scope for this remediation.

- Normalize OpenTelemetry install configuration before idempotency checks so non-finite sampling ratios do not break same-config installs.
- Sanitize and clamp sampling ratios consistently during install and sampler construction.
- Replace UUID-based anonymization fallback key generation with SystemRandomNumberGenerator bytes.
- Make HTTPAIInstrumentation default OpenClaw gateway hosts use the documented constant.
- Make TraceFileLocator accept numeric-prefix filenames via TraceFileNameParser for parser/locator consistency.
- Convert TraceFileReader to bounded streaming reads to enforce max file size without TOCTOU race.
- Harden OTLP zlib decompression to fail on zero-progress/incomplete payloads and require terminal stream status.
- Confine OTLPHTTPServer public state access (start/stop/deinit/port) to the internal serial queue.
- Add regression tests for:
  * composite trace filename discovery,
  * truncated deflate payload rejection,
  * drop redaction behavior,
  * strict same-config concurrent OpenTelemetry install,
  * non-finite sampling ratio idempotency.
- Validation performed: swift build, full swift test (XCTest + Swift Testing suites).
@christopherkarani

Copy link
Copy Markdown
Owner Author

Superseded by #15, which consolidates the still-valid changes from the current open PR set onto main and intentionally skips stale or conflicting churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant