Agent Context Compression Protocol (ACCP)

Vision & Concept Document

Version 1.0


1. Executive Summary

The Agent Context Compression Protocol (ACCP) is a proposed open, lightweight communication protocol designed to drastically reduce token consumption and API costs in multi-agent AI systems. While existing protocols like MCP (agent-to-tool), A2A (agent-to-agent), and ACP (agent messaging) solve interoperability and routing, none are optimized for token efficiency — the single largest cost driver in production agentic AI.

ACCP introduces a semantic encoding layer that replaces verbose natural language and bloated JSON payloads with compact, structured, machine-interpretable messages. It is designed to be harness-agnostic and to complement MCP and A2A rather than replace them.

Target outcome: 60–90% reduction in token consumption for inter-agent communication within agentic harnesses, while preserving full semantic fidelity.


2. Problem Statement

2.1 The Cost Explosion

Modern agentic harnesses (CrewAI, LangGraph, AutoGen, Semantic Kernel, custom orchestrators) suffer from compounding token costs:

  • Context re-transmission: Every reasoning step re-sends the full accumulated context (system prompt + tool definitions + conversation history + RAG docs + agent state).
  • Verbose inter-agent messaging: Agents communicate in natural language, consuming 5–50x more tokens than the semantic payload requires.
  • Redundant tool definitions: Tool schemas are re-injected into every call, even when the agent has already demonstrated competence with that tool.
  • State serialization bloat: Agent state is captured as pretty-printed JSON or raw text, wasting tokens on whitespace, redundant keys, and structural overhead.

A single agent performing 10 reasoning steps can easily consume 50K–100K tokens per task. A multi-agent system with 4–6 agents on a complex workflow can burn 500K–1M+ tokens per execution.
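The compounding effect of context re-transmission can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a hypothetical 4,000-token base context that grows by 500 tokens per reasoning step; both figures are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(base_context: int, growth_per_step: int, steps: int) -> int:
    """Total input tokens sent when every step re-sends the full context.

    Step i (0-indexed) sends base_context + i * growth_per_step tokens.
    """
    return sum(base_context + i * growth_per_step for i in range(steps))


# Illustrative figures only (assumptions, not measurements).
single_agent = cumulative_input_tokens(base_context=4_000, growth_per_step=500, steps=10)
print(single_agent)  # 62500
```

Under these assumptions a 10-step task lands inside the 50K–100K range quoted above. Because each step re-sends everything accumulated so far, the total grows quadratically with the number of steps.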

2.2 What Existing Protocols Don't Solve

| Protocol | Purpose | Token-Aware? | Compression? |
|----------|---------|--------------|--------------|
| MCP | Agent ↔ Tool connectivity | No | No (JSON-RPC) |
| A2A | Agent ↔ Agent coordination | No | No (JSON over HTTPS) |
| ACP | Agent messaging (RESTful) | No | No (MIME multipart) |
| ANP | Agent network discovery | No | No |
| AG-UI | Agent ↔ User interface | Partial | No |

None of these protocols treat token efficiency as a first-class design concern.

2.3 The Opportunity

If agents could communicate through a shared, compact semantic encoding — where a 500-token natural language message compresses to 30–50 tokens of structured intent — the economics of agentic AI shift dramatically:

  • 10x cost reduction on inter-agent messaging
  • Deeper reasoning chains within the same context budget
  • More agents per workflow without hitting context ceilings
  • Faster inference due to smaller payloads

3. Core Concept

3.1 What Is ACCP?

ACCP is a semantic encoding and context compression protocol for AI agent communication. It defines:

  1. A compact message format — structured, schema-driven, and optimized for minimal token footprint when injected into LLM context.
  2. A shared ontology of intents and operations — standardized codes that agents understand without verbose natural language descriptions.
  3. A context management strategy — rules for state accumulation, compression checkpoints, and memory hierarchy.
  4. A codec layer — encode/decode functions that sit between the harness and the LLM API, transparently compressing and decompressing communication.

3.2 The Layered Architecture

┌─────────────────────────────────────────────┐
│           Agentic Harness                   │
│   (CrewAI / LangGraph / AutoGen / Custom)   │
├─────────────────────────────────────────────┤
│              ACCP Codec Layer               │  ◄── THIS IS WHAT WE BUILD
│  ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│  │ Encoder │ │ State Mgr│ │ Intent Index │ │
│  └─────────┘ └──────────┘ └──────────────┘ │
├─────────────────────────────────────────────┤
│         Transport (MCP / A2A / HTTP)         │
├─────────────────────────────────────────────┤
│              LLM API Layer                   │
│        (OpenAI / Anthropic / etc.)           │
└─────────────────────────────────────────────┘

3.3 Design Principles

  1. Token-first design — Every format decision is benchmarked against token count (tiktoken/BPE).
  2. Semantic fidelity — Compression must be lossless in meaning, even if lossy in verbosity.
  3. LLM-readable — The compressed format must be natively parseable by modern LLMs without special fine-tuning. Human readability is secondary but desirable.
  4. Harness-agnostic — Works with any orchestration framework via a thin adapter.
  5. Protocol-complementary — Sits alongside MCP/A2A, not in competition.
  6. Progressive adoption — Teams can adopt incrementally: start with message compression, add state management later.
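Principle 1 implies a small measurement helper that every format decision can be run through. The following is a minimal sketch, not part of the ACCP specification: `naive_token_count` is a crude stand-in that counts word runs and punctuation marks, and `token_reduction` is a hypothetical helper name. A real benchmark would pass a BPE-based counter as `count` instead.

```python
import re
from typing import Callable


def naive_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts word runs and single punctuation marks.

    This is NOT a BPE tokenizer; real token counts will differ.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def token_reduction(before: str, after: str,
                    count: Callable[[str], int] = naive_token_count) -> float:
    """Fraction of tokens saved by replacing `before` with `after`."""
    original = count(before)
    if original == 0:
        raise ValueError("`before` must contain at least one token")
    return 1.0 - count(after) / original


# With tiktoken installed, a BPE counter could be supplied instead, e.g.:
#   enc = tiktoken.get_encoding("cl100k_base")
#   token_reduction(before, after, count=lambda s: len(enc.encode(s)))
```

Keeping the counter pluggable matters because different models tokenize the same string differently, so the same format may need to be benchmarked once per tokenizer.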

4. Key Innovation Areas

4.1 Compact Message Format (ACCP-M)

Replace verbose agent messages with structured semantic encodings:

Before (natural language, ~85 tokens; exact count depends on the tokenizer):

The research agent has completed its analysis of the quarterly 
sales data. Key findings include: revenue is down 12% 
quarter-over-quarter, the enterprise segment is the primary 
driver of the decline, and customer churn increased by 3.2%. 
The research agent recommends escalating to the strategy agent 
for remediation planning.

After (ACCP-M, ~22 tokens; exact count depends on the tokenizer):

@RSA>done:analyze{d:q_sales|f:[rev:-12%QoQ,seg:enterprise=primary_decline,
churn:+3.2%]|nx:@STA:plan_remediation}
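To make the encoding concrete, here is a minimal encode/decode sketch in Python. The grammar is an assumption inferred from the single example above (`@sender>intent:operation{d:…|f:[k:v,…]|nx:@agent:operation}`); the `AccpMessage` class, its field names, and the regular expression are hypothetical and not taken from a published ACCP specification.

```python
import re
from dataclasses import dataclass


@dataclass
class AccpMessage:
    sender: str               # e.g. "RSA"
    intent: str               # e.g. "done"
    operation: str            # e.g. "analyze"
    data: str                 # the `d:` field
    findings: dict[str, str]  # the `f:` field, as key/value pairs
    next_agent: str           # agent named in the `nx:` field
    next_operation: str       # operation requested of that agent


# Hypothetical grammar inferred from the one example message above.
_PATTERN = re.compile(
    r"@(?P<sender>\w+)>(?P<intent>\w+):(?P<operation>\w+)"
    r"\{d:(?P<data>[^|]*)\|f:\[(?P<findings>[^\]]*)\]"
    r"\|nx:@(?P<next_agent>\w+):(?P<next_operation>\w+)\}"
)


def encode(msg: AccpMessage) -> str:
    """Serialize a message into the compact single-line form."""
    findings = ",".join(f"{k}:{v}" for k, v in msg.findings.items())
    return (f"@{msg.sender}>{msg.intent}:{msg.operation}"
            f"{{d:{msg.data}|f:[{findings}]|nx:@{msg.next_agent}:{msg.next_operation}}}")


def decode(text: str) -> AccpMessage:
    """Parse the compact form back into a structured message."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid ACCP-M message: {text!r}")
    fields = match.groupdict()
    raw = fields.pop("findings")
    # Split each "key:value" pair on the first colon only.
    findings = dict(item.split(":", 1) for item in raw.split(",")) if raw else {}
    return AccpMessage(findings=findings, **fields)


wire = ("@RSA>done:analyze{d:q_sales|f:[rev:-12%QoQ,seg:enterprise=primary_decline,"
        "churn:+3.2%]|nx:@STA:plan_remediation}")
message = decode(wire)
assert encode(message) == wire  # the example round-trips unchanged
```

This sketch does not escape delimiter characters (`|`, `,`, `]`) inside values; a real codec would need an escaping rule or a stricter value grammar.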

4.2 Intent Ontology (ACCP-I)

A shared vocabulary of agent intents, operations, and status codes:

INTENTS:
  REQ   — Request action
  DONE  — Task completed
  FAIL  — Task failed
  WAIT  — Awaiting input
  ESC   — Escalate to another agent
  COMP  — Context compression checkpoint
  SYNC  — State synchronization
  QRY   — Query for information
  ACK   — Acknowledgment
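In code, the intent list above maps naturally onto an enumeration. The sketch below is one possible representation; the `Intent` class and `parse_intent` helper are hypothetical names, and the case-insensitive lookup is an assumption made because the ACCP-M example in 4.1 writes the intent in lowercase.

```python
from enum import Enum


class Intent(str, Enum):
    REQ = "REQ"    # Request action
    DONE = "DONE"  # Task completed
    FAIL = "FAIL"  # Task failed
    WAIT = "WAIT"  # Awaiting input
    ESC = "ESC"    # Escalate to another agent
    COMP = "COMP"  # Context compression checkpoint
    SYNC = "SYNC"  # State synchronization
    QRY = "QRY"    # Query for information
    ACK = "ACK"    # Acknowledgment


def parse_intent(code: str) -> Intent:
    """Look up an intent code, ignoring case; raises ValueError if unknown."""
    return Intent(code.upper())
```

Rejecting unknown codes at the codec boundary keeps a misspelled or unsupported intent from silently reaching another agent.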

4.3 State Compression Engine (ACCP-S)

Rules for when and how to compress accumulated context:

  • Checkpoint triggers: After N tool calls, at phase boundaries, before handoffs.
  • Compression tiers: Full → Summary → Key-facts → Hashes.
  • Shared state references: Agents reference a shared state store by key rather than re-transmitting full context.
  • Delta encoding: Only transmit what changed since last checkpoint.
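The checkpoint triggers and delta encoding described above can be sketched as follows. This is an illustrative sketch only: it assumes agent state is a flat dictionary, and the function names and the `_REMOVED` tombstone marker are hypothetical choices rather than part of ACCP.

```python
from typing import Any

_REMOVED = "__removed__"  # hypothetical tombstone marking a deleted key


def should_checkpoint(tool_calls_since_checkpoint: int, max_tool_calls: int,
                      at_phase_boundary: bool = False,
                      before_handoff: bool = False) -> bool:
    """True when any of the three triggers fires: N tool calls, phase boundary, handoff."""
    return (at_phase_boundary or before_handoff
            or tool_calls_since_checkpoint >= max_tool_calls)


def state_delta(previous: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Return only the keys added, changed, or removed since `previous`."""
    delta = {k: v for k, v in current.items()
             if k not in previous or previous[k] != v}
    delta.update({k: _REMOVED for k in previous if k not in current})
    return delta


def apply_delta(previous: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the current state from the last checkpoint plus a delta."""
    merged = {**previous, **delta}
    return {k: v for k, v in merged.items() if v != _REMOVED}


checkpoint = {"phase": "research", "rows": 1200, "draft": "v1"}
now = {"phase": "analysis", "rows": 1200, "churn": "+3.2%"}
delta = state_delta(checkpoint, now)
assert apply_delta(checkpoint, delta) == now  # receiver reconstructs the full state
```

Only the changed `phase`, the new `churn`, and a tombstone for `draft` are transmitted; the unchanged `rows` entry is not. Nested state and values that happen to equal the tombstone string would need extra handling.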

4.4 Schema Registry (ACCP-R)

Pre-defined schemas for common agent operations, indexed by compact codes rather than re-described in each prompt.
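A registry of this kind could be as simple as a lookup table from short codes to full schemas, shared by all agents in a workflow. The sketch below is a hypothetical in-memory version; the `SchemaRegistry` class, the code `T1`, and the `web_search` schema are illustrative assumptions.

```python
from typing import Any


class SchemaRegistry:
    """Maps compact codes to full schemas so prompts can carry only the code."""

    def __init__(self) -> None:
        self._schemas: dict[str, dict[str, Any]] = {}

    def register(self, code: str, schema: dict[str, Any]) -> None:
        if code in self._schemas:
            raise ValueError(f"schema code already registered: {code}")
        self._schemas[code] = schema

    def resolve(self, code: str) -> dict[str, Any]:
        """Return the full schema for a code; raises KeyError if unknown."""
        return self._schemas[code]


registry = SchemaRegistry()
# Hypothetical tool schema, registered once and referenced afterwards as "T1".
registry.register("T1", {
    "name": "web_search",
    "parameters": {"query": "string", "max_results": "integer"},
})
schema = registry.resolve("T1")
```

A message can then refer to `T1` instead of repeating the full schema, and the harness expands the code only where the complete definition is actually required.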


5. Target Personas & Use Cases

5.1 Primary Users

  • AI/ML Engineers building multi-agent systems in production
  • Platform teams managing agentic infrastructure at scale
  • Indie developers running agents on personal budgets
  • Enterprise architects designing cost-efficient AI workflows

5.2 Key Use Cases

  1. Multi-agent research pipelines — 5+ agents collaborating on analysis tasks
  2. Coding agent harnesses — Agents with deep tool integration and long sessions
  3. Customer service agent networks — High-volume, cost-sensitive deployments
  4. Personal assistant agent chains — Budget-constrained local/API hybrid setups

6. Success Metrics

| Metric | Target |
|--------|--------|
| Token reduction on inter-agent messages | ≥ 70% |
| Token reduction on full pipeline execution | ≥ 50% |
| Semantic fidelity (meaning preservation) | ≥ 95% |
| Adoption friction (integration time) | < 1 hour for supported harnesses |
| Latency overhead of encode/decode | < 5ms per message |
| LLM comprehension accuracy of ACCP format | ≥ 90% without fine-tuning |

7. Competitive Landscape

| Project | Relationship to ACCP |
|---------|----------------------|
| JSPLIT (Janea Systems) | Solves MCP tool pruning, not message compression |
| TOON (Token-Oriented Object Notation) | Overlapping: data serialization format for LLMs; ACCP can incorporate TOON-like ideas for data payloads |
| Claw Compactor | Workspace-level compression, rule-based; ACCP is protocol-level with semantic awareness |
| LLMLingua | Prompt compression via smaller LM; could be used as one ACCP compression backend |
| MCP | Complementary: ACCP compresses what MCP transmits |
| A2A | Complementary: ACCP compresses what A2A coordinates |

7.1 ACCP's Unique Position

Among the projects surveyed above, ACCP is the only proposed protocol that treats token efficiency as the primary design constraint for agent-to-agent communication. It does not replace transport or routing protocols; it makes them cheaper to use.


8. Open Questions & Risks

  1. LLM comprehension ceiling: Can LLMs reliably parse heavily compressed formats without fine-tuning? (Requires empirical validation.)
  2. Format standardization: Who governs the intent ontology? Open governance model needed.
  3. Versioning: How do agents negotiate ACCP versions?
  4. Security: Compact formats may be harder to audit/inspect.
  5. Adoption: Requires harness-level integration — chicken-and-egg problem.
  6. Model-specificity: Different LLMs may tokenize the same format differently, affecting compression ratios.

9. Recommended Next Steps

  1. Release v1.0 of the formal protocol specification (Completed).
  2. Finalize codec implementation in C# (.NET 8+) and proceed to production hardening.
  3. Expand harness adapters to include LangChain and AutoGen alongside Semantic Kernel.
  4. Develop a comprehensive threat model and best-practice guides for protocol implementers.
  5. Publish benchmarks and open-source the protocol repository.

Document Owner: Project Lead
Status: APPROVED
Created: April 2026