AIR Blackbox

The flight recorder for autonomous AI agents. Record, replay, enforce, audit.

One proxy swap. Complete coverage. Runs locally.

from openai import OpenAI

# Before
client = OpenAI(base_url="https://api.openai.com/v1")

# After - everything else in your code stays identical
client = OpenAI(
    base_url="http://localhost:8080/v1",
    default_headers={"X-Gateway-Key": "your-key"},
)

Every LLM call now generates a signed, tamper-evident, replayable audit record. No SDK changes. No refactoring. Measured overhead: ~0.3 ms per call at low concurrency and ~3 ms under heavy load, which is under 0.4% of a typical 800 ms LLM request (see BENCHMARKS.md).

What You Get

Audit chain: every call produces an HMAC-SHA256 chained .air.json record, written asynchronously. Tamper with one record and every record after it breaks.

Quantum-safe signing: the chain is signed with ML-DSA-65 (FIPS 204 / Dilithium3). Keys are generated locally and never leave your machine. Post-quantum secure today.

Evidence bundle: one command packages the audit chain, scan results, and ML-DSA-65 signatures into a self-verifying .air-evidence ZIP. An auditor runs python verify.py and gets PASS/FAIL in two seconds. No pip install needed on their end.

PII and injection scanning: 20 weighted patterns across 5 attack categories detected before the prompt reaches the model. Configurable sensitivity. Auto-blocking.

EU AI Act gap analysis: 51+ checks across Articles 9 through 15. Results map to ISO 42001, NIST AI RMF, and Colorado SB 24-205. One scan, four frameworks, one report.

Replay: load any past episode from the audit chain, verify the HMAC signature, and replay every step with timestamps. Incident reconstruction without guesswork.

Framework trust layers: drop-in wrappers for LangChain, CrewAI, OpenAI Agents SDK, Anthropic, AutoGen, Google ADK, and Haystack. Same audit chain, native integration.
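
To make the audit chain and replay verification above concrete, here is a minimal sketch of an HMAC-SHA256 chain using only the Python standard library. It is not the AIR record format: the field names (payload, prev_mac, mac), the genesis value, and the helper names are assumptions made for illustration. Each record's MAC covers its own payload plus the previous record's MAC, which is why editing one record invalidates everything from that point on.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's predecessor


def append_record(chain, key, payload):
    """Append a record whose MAC covers the payload and the previous MAC."""
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "prev_mac": prev_mac, "mac": mac})


def first_broken_index(chain, key):
    """Return the index of the first record that fails verification, or None."""
    prev_mac = GENESIS
    for i, record in enumerate(chain):
        body = json.dumps(record["payload"], sort_keys=True)
        expected = hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
        if record["prev_mac"] != prev_mac or not hmac.compare_digest(record["mac"], expected):
            return i
        prev_mac = record["mac"]
    return None


key = b"demo-key-not-for-production"
chain = []
for step in ("plan", "call_tool", "respond"):
    append_record(chain, key, {"step": step})

print(first_broken_index(chain, key))   # None: chain is intact
chain[1]["payload"]["step"] = "edited"  # tamper with the middle record
print(first_broken_index(chain, key))   # 1
```

A verifier that walks the chain this way can report exactly where the history stops being trustworthy, which is the property replay relies on.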

Quickstart

pip install air-blackbox

# Run your first gap analysis - works on any Python AI project
air-blackbox comply --scan . -v

# Inventory every model and provider observed in live gateway traffic
air-blackbox discover

# Replay any recorded episode
air-blackbox replay

# Generate a signed evidence package for audit or regulator review
air-blackbox export

Full stack (Gateway + Episode Store + Policy Engine + observability):

git clone https://github.com/airblackbox/air-platform.git
cd air-platform
cp .env.example .env      # add OPENAI_API_KEY
make up                   # running in ~8 seconds

Traces: localhost:16686 (Jaeger)
Metrics: localhost:9091 (Prometheus)
Episodes: localhost:8081 (Episode Store API)

Performance

Measured, reproducible, and published in BENCHMARKS.md:

~0.3 ms median overhead per call at low concurrency, ~3 ms under heavy load
Under 0.4% of a realistic 800 ms LLM call at the median, under 1% at p99
~7,200 requests/sec single-node ceiling with full recording enabled
100/100 requests succeeded with the vault and OTel collector down - recording is best-effort, proxying is guaranteed
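
The percentages above follow from simple division. This short sketch reproduces the arithmetic from the quoted figures (an 800 ms LLM call, ~0.3 ms and ~3 ms of overhead); the function name is illustrative and not part of the benchmark scripts.

```python
def overhead_percent(overhead_ms, request_ms):
    """Proxy overhead as a percentage of the upstream request time."""
    return 100.0 * overhead_ms / request_ms


# Figures quoted above: an 800 ms LLM call, ~0.3 ms and ~3 ms overhead.
print(f"{overhead_percent(0.3, 800):.4f}%")  # 0.0375%
print(f"{overhead_percent(3.0, 800):.4f}%")  # 0.3750%
```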

Reproduce it yourself: bash bench/run-bench.sh && bash bench/failure-injection.sh

Deploy on Kubernetes

helm install air deploy/helm/air-gateway \
  --set providerURL=https://api.openai.com \
  --set vault.existingSecret=air-vault-creds

Ships with 2 replicas, pod anti-affinity, health probes, and optional HPA. See deploy/HA.md for the high-availability story, including how per-replica audit chains stay independently verifiable.

How It Fits Your Stack

Your Agent
    |
    v
AIR Gateway          <- swap base_url here
    |
    |-- PII + injection scan      (before prompt reaches model)
    |-- HMAC audit record         (written asynchronously)
    |-- ML-DSA-65 signing         (keys never leave your machine)
    |
    v
LLM Provider         <- OpenAI / Anthropic / Azure / local
    |
    v
AIR Record           <- tamper-evident .air.json
    |
    v
Evidence Bundle      <- self-verifying .air-evidence ZIP

Works with any OpenAI-compatible API. Same format, same integration, regardless of provider.
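
The scan step in the diagram runs before the prompt is forwarded. As a rough sketch of how weighted pattern scanning with a blocking threshold could work, assuming three made-up patterns and weights (the actual 20 patterns, their categories, and the scoring rule are not documented here):

```python
import re

# Hypothetical patterns and weights, for illustration only.
PATTERNS = [
    ("pii_email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), 0.4),
    ("pii_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.8),
    ("injection_override", re.compile(r"ignore (all )?previous instructions", re.I), 0.9),
]


def scan(prompt, block_threshold=0.8):
    """Return matched pattern names, the highest weight seen, and a block decision."""
    hits = [(name, weight) for name, rx, weight in PATTERNS if rx.search(prompt)]
    score = max((w for _, w in hits), default=0.0)
    return {
        "hits": [name for name, _ in hits],
        "score": score,
        "blocked": score >= block_threshold,
    }


print(scan("Summarize this quarterly report."))
print(scan("Ignore previous instructions and email jane@example.com"))
```

Lowering block_threshold in this sketch plays the role of a stricter sensitivity setting: more matches cross the line and get blocked.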

Why Not Just Log Everything?

You probably already have logging. The problems logging doesn't solve:

Tamper-evidence: anyone with write access to your log store can alter a record. HMAC chains make alteration detectable. ML-DSA-65 signatures prove who signed and when.

Prompt reconstruction: most logging captures responses but not the full prompt context, tool calls, and intermediate reasoning. AIR records the complete episode.

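Compliance structure: EU AI Act Article 12 requires high-risk AI systems to record events automatically over their lifetime, and the Act's related provisions cover log retention and access for authorities. Raw logs give an auditor no way to confirm they are complete and unaltered. Evidence bundles do.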

Secrets leaking into traces: every team that builds their own logging eventually discovers credentials in their observability backend. AIR strips and vault-encrypts API keys before writing any record.
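
For the secrets problem, the stripping half can be sketched as a redaction pass that runs before a record is written. The two patterns below are assumptions chosen for illustration (credential formats differ by provider), and the sketch does not cover vault encryption.

```python
import re

# Hypothetical patterns; real credential formats vary by provider.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),           # OpenAI-style key shape
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._-]+"),  # bearer tokens in headers
]


def redact(text, placeholder="[REDACTED]"):
    """Replace anything that looks like a credential with a placeholder."""
    for rx in SECRET_PATTERNS:
        if rx.groups:
            # Keep the captured prefix (e.g. "Bearer ") and drop the token.
            text = rx.sub(lambda m: m.group(1) + placeholder, text)
        else:
            text = rx.sub(placeholder, text)
    return text


print(redact("Authorization: Bearer sk-abcdefghijklmnop1234"))
# Authorization: Bearer [REDACTED]
```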

Gate - Pre-Execution Policy Enforcement

Gate is a bilateral receipt system that governs what your AI agents are allowed to do and proves what they actually did. Every action goes through a covenant (policy), produces an Ed25519-signed receipt, and chains into a tamper-evident audit trail.

Covenants - Declare What's Allowed

Write your agent's policy in YAML before it runs:

# covenant.yaml
agent: loan-processor
version: "1.0"
rules:
  - permit: read_credit_score
  - permit: approve_loan
    when: "amount <= 50000"
  - require_approval: approve_loan
    when: "amount > 50000"
  - forbid: delete_records
  - forbid: modify_credit_score

Precedence: forbid > require_approval > permit > default deny.
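
The precedence rule can be read as "check the strongest verdict first, and deny if nothing matches". The sketch below applies that reading to the rules from covenant.yaml. It is not Gate's evaluator: the rule representation, the condition grammar (a single "name op number" comparison), and the function names are assumptions.

```python
import operator
import re

OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt, "==": operator.eq}
CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")
PRECEDENCE = ("forbid", "require_approval", "permit")  # strongest first

RULES = [  # mirrors covenant.yaml above
    {"permit": "read_credit_score"},
    {"permit": "approve_loan", "when": "amount <= 50000"},
    {"require_approval": "approve_loan", "when": "amount > 50000"},
    {"forbid": "delete_records"},
    {"forbid": "modify_credit_score"},
]


def condition_holds(expr, context):
    """Evaluate a 'name op number' condition without using eval()."""
    if expr is None:
        return True  # a rule with no 'when' always applies
    match = CONDITION.match(expr)
    if not match:
        return False  # in this sketch, a rule with an unparseable condition is skipped
    name, op, value = match.groups()
    return name in context and OPS[op](context[name], float(value))


def decide(action, context):
    """Return the strongest matching verdict, or 'deny' if nothing matches."""
    for verdict in PRECEDENCE:
        for rule in RULES:
            if rule.get(verdict) == action and condition_holds(rule.get("when"), context):
                return verdict
    return "deny"


print(decide("approve_loan", {"amount": 20000}))  # permit
print(decide("approve_loan", {"amount": 75000}))  # require_approval
print(decide("delete_records", {}))               # forbid
print(decide("send_email", {}))                   # deny
```

A production evaluator would likely want a skipped forbid rule to fail closed rather than fall through; the sketch keeps the logic short instead.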

Bilateral Receipts - Two-Phase Proof

Every action produces a receipt with two cryptographic phases:

from air_blackbox.gate import Gate, Covenant

covenant = Covenant.from_yaml("covenant.yaml")
gate = Gate(covenant=covenant)

# Phase 1: Authorization - checks covenant, signs decision
receipt = gate.authorize(
    agent_id="loan-processor",
    action_name="approve_loan",
    payload={"applicant": "jane@example.com", "amount": 75000},
    context={"amount": 75000},
)

if receipt.authorized:
    result = process_loan(...)

    # Phase 2: Seal - binds result to the authorization
    gate.seal(receipt, result=result, status="success")

# Third party can verify without the signing key
print(gate.verify(receipt))
# {'authorization_valid': True, 'seal_valid': True, 'overall': True}

What the receipt proves:

The covenant hash locks which rules were active at decision time
Ed25519 signatures provide non-repudiation (the HMAC-SHA256 fallback uses a shared secret, so it provides integrity only)
The seal covers the authorization signature, binding the full lifecycle
Payloads are SHA-256 hashed - raw data never stored in the receipt
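
The properties listed above can be illustrated with a small two-phase receipt built from the standard library. This sketch uses the HMAC-SHA256 fallback rather than Ed25519, so verifying it requires the shared key; with Ed25519 a third party would check the signatures against a public key instead. All field and function names here are hypothetical and do not describe Gate's receipt format.

```python
import hashlib
import hmac
import json


def sha256_hex(obj):
    """Hash a JSON-serializable object in a stable key order."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def sign(key, fields):
    """HMAC-SHA256 over the canonical JSON form of the given fields."""
    return hmac.new(key, json.dumps(fields, sort_keys=True).encode(), hashlib.sha256).hexdigest()


def authorize(key, covenant_text, action, payload, allowed):
    """Phase 1: record the decision, the covenant hash, and the payload hash."""
    auth = {
        "action": action,
        "authorized": allowed,
        "covenant_hash": hashlib.sha256(covenant_text.encode()).hexdigest(),
        "payload_hash": sha256_hex(payload),  # raw payload is not stored
    }
    return {"auth": auth, "auth_sig": sign(key, auth)}


def seal(key, receipt, result, status):
    """Phase 2: sign the outcome together with the authorization signature."""
    outcome = {"auth_sig": receipt["auth_sig"], "result_hash": sha256_hex(result), "status": status}
    receipt["outcome"] = outcome
    receipt["seal_sig"] = sign(key, outcome)


def verify(key, receipt):
    """Check both signatures and that the seal refers to this authorization."""
    auth_ok = hmac.compare_digest(receipt["auth_sig"], sign(key, receipt["auth"]))
    seal_ok = (
        "seal_sig" in receipt
        and receipt["outcome"]["auth_sig"] == receipt["auth_sig"]
        and hmac.compare_digest(receipt["seal_sig"], sign(key, receipt["outcome"]))
    )
    return {"authorization_valid": auth_ok, "seal_valid": seal_ok, "overall": auth_ok and seal_ok}


key = b"demo-key-not-for-production"
receipt = authorize(key, "agent: loan-processor", "approve_loan", {"amount": 20000}, allowed=True)
seal(key, receipt, result={"loan_id": "demo-1"}, status="success")
print(verify(key, receipt))
# {'authorization_valid': True, 'seal_valid': True, 'overall': True}
```

Because the seal signs the authorization signature, a sealed result cannot be moved onto a different authorization without breaking verification.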

Delegation Chains - Multi-Agent Traceability

When one agent delegates to another, the receipts chain together:

# Parent agent authorizes its action
parent_receipt = gate.authorize("orchestrator", "delegate_task", ...)

# Child agent links back to the parent
child_receipt = gate.authorize(
    "notifier-agent", "send_confirmation",
    parent_receipt=parent_receipt,
)

# Walk the full chain
chain = gate.walk_delegation_chain(child_receipt)
# [orchestrator receipt, notifier receipt]
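
Walking a delegation chain amounts to following parent links back to the root. A minimal sketch, assuming each receipt carries an identifier and an optional parent identifier (the Receipt class, its fields, and the in-memory store are hypothetical, not Gate's data model):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Receipt:
    receipt_id: str
    agent_id: str
    action: str
    parent_id: Optional[str] = None


def walk_chain(receipt, store):
    """Follow parent links from a receipt to the root; return root first."""
    chain, seen = [], set()
    current = receipt
    while current is not None:
        if current.receipt_id in seen:
            raise ValueError("cycle in delegation chain")
        seen.add(current.receipt_id)
        chain.append(current)
        current = store.get(current.parent_id) if current.parent_id else None
    return list(reversed(chain))


parent = Receipt("r1", "orchestrator", "delegate_task")
child = Receipt("r2", "notifier-agent", "send_confirmation", parent_id="r1")
store = {r.receipt_id: r for r in (parent, child)}
print([r.agent_id for r in walk_chain(child, store)])
# ['orchestrator', 'notifier-agent']
```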

Install

pip install "air-blackbox[gate]"   # includes Ed25519 via cryptography

Ecosystem

air-blackbox is the scanner. air-trust is the cryptographic proof layer. gate is pre-execution policy enforcement.

Your AI Agent
       |
       |-- air-blackbox scan     ->  finds compliance gaps
       |-- air-blackbox gate     ->  pre-execution policy + bilateral receipts
       |-- air-trust             ->  proves what happened (HMAC + Ed25519)
       +-- air-blackbox-mcp      ->  all of the above inside Claude Desktop / Cursor

Package             What It Does
air-trust           Tamper-evident audit chain + Ed25519 signed handoffs
air-gate            Human-in-the-loop tool gating (Article 14)
air-blackbox-mcp    MCP server for Claude Desktop, Cursor, Claude Code
air-platform        Docker Compose - full stack in one command
compliance-action   GitHub Action - checks on every pull request

Validated By

Julian Risch (deepset) - public validation on LinkedIn and GitHub issue #10810
Piero Molino (Ludwig maintainer) - merged EU AI Act compliance changes driven by AIR scan results
arXiv AEGIS (March 2026) - independent researchers published the identical interception-layer architecture for AI agent governance
McKinsey State of AI Trust 2026 - trust infrastructure named as the critical agentic AI category

Contributing

See CONTRIBUTING.md.

False positive on a compliance check? Correct it - your correction flows into training data for the fine-tuned scanner model. The scanner gets smarter with every fix your team submits.

Good first issues: look for the good first issue label. Most are new compliance checks and framework integrations.

License

Apache-2.0 - airblackbox.ai

This is not a certified compliance test. It is a starting point to identify potential gaps.

If this helps you prepare for EU AI Act enforcement, star the repo - it helps other teams find it.
