Sardor-M/agent_mux

agent_mux

agent_mux runs your MCP servers in production — with auth, budgets, rate limits, lifecycle, and observability — so you don't have to.

Most teams adopting MCP end up rebuilding the same operational scaffolding: spawn the server, watch it crash, restart with backoff, route JSON-RPC, gate with auth, enforce per-tenant quotas, log every call, kill on budget breach. agent_mux is that scaffolding, shipped — a single OpenResty/LuaJIT process that supervises a fleet of MCP servers and exposes them through one streaming agent endpoint.

The agent loop, the policy plane, and the MCP transport all live in the same worker, so latency stays in the millisecond range and the operational surface stays small. No queue, no scheduler, no microservices.

What you get

MCP supervision

  • Subprocess lifecycle for stdio MCP servers — spawn, monitor, respawn with exponential backoff on crash
  • JSON-RPC 2.0 framing over the stdio pipe
  • Tools from MCP servers join the same registry as inline Lua tools and HTTP tools, behind one dispatcher
  • The same auth / per-tool rate limit / per-org concurrency / token budget apply to MCP tool calls as to any other tool — no MCP-shaped policy holes
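As a rough illustration of the first two bullets, here is a minimal Python sketch of newline-delimited JSON-RPC 2.0 framing (the convention the MCP stdio transport uses) and a capped exponential backoff schedule. It is not agent_mux's Lua implementation: the function names, the 0.5 s base delay, and the 30 s cap are assumptions chosen for the example.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids (illustrative)


def frame_request(method: str, params: dict) -> bytes:
    """Encode one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    # json.dumps escapes embedded newlines, so the only raw "\n" in the
    # output is the frame delimiter appended at the end.
    return json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n"


def parse_response(line: bytes) -> dict:
    """Decode one response line; raise if the server returned an error object."""
    msg = json.loads(line)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in msg:
        raise RuntimeError(f"{msg['error']['code']}: {msg['error']['message']}")
    return msg["result"]


def backoff_delay(restarts: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before respawn number `restarts` (0-based), capped at `cap`."""
    return min(cap, base * (2 ** restarts))


if __name__ == "__main__":
    print(frame_request("tools/call", {"name": "echo", "arguments": {"text": "hi"}}))
    print([backoff_delay(n) for n in range(8)])
```

A supervisor would typically reset the restart counter once a respawned server has stayed healthy for a while; whether agent_mux does so is not stated here.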

Agent runtime around it

  • Multi-turn agent loop with a wall-clock timeout per session
  • Streaming Anthropic upstream (+ a mock upstream for local dev) returning Server-Sent Events
  • Concurrent fan-out for tool calls inside a single assistant turn
  • Sessions in Redis with cancellation (DELETE /v1/sessions/:id), per-session token budgets, per-org concurrency slots, schema migrations
  • Hooks runtime (pre/post LLM, pre/post tool) with hot reload: drop a Lua file in AGENT_MUX_HOOKS_DIR and it is picked up on the next request
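The concurrent fan-out bullet can be pictured with a short asyncio sketch. This is an analogy in Python, not the OpenResty code: `call_tool` is a hypothetical stand-in for the dispatcher, and the result shape (`tool`, `ok`, `error`) is invented for the example. The point it shows is that all tool calls from one assistant turn run together, results keep request order, and one slow or failing tool does not cancel its siblings.

```python
import asyncio


async def call_tool(name: str, args: dict) -> dict:
    """Hypothetical tool handler; a real one would dispatch to MCP, HTTP, or inline tools."""
    await asyncio.sleep(args.get("delay", 0))
    if name == "fail":
        raise RuntimeError("tool crashed")
    return {"tool": name, "ok": True}


async def fan_out(tool_calls: list[dict], timeout: float) -> list[dict]:
    """Run every tool call of one assistant turn concurrently.

    Results come back in request order. A failure or timeout in one call
    becomes an error result instead of cancelling the other calls.
    """

    async def guarded(call: dict) -> dict:
        try:
            return await asyncio.wait_for(call_tool(call["name"], call["args"]), timeout)
        except asyncio.TimeoutError:
            return {"tool": call["name"], "ok": False, "error": "timeout"}
        except Exception as exc:
            return {"tool": call["name"], "ok": False, "error": str(exc)}

    return await asyncio.gather(*(guarded(c) for c in tool_calls))


if __name__ == "__main__":
    calls = [{"name": "search", "args": {}}, {"name": "fail", "args": {}}]
    print(asyncio.run(fan_out(calls, timeout=1.0)))
```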

Operations

  • API-key auth, per-IP rate limit, per-tool rate limit, tool authorisation
  • Prometheus metrics, OpenTelemetry spans, structured request logs
  • Graceful shutdown that drains in-flight sessions and emits a final done event before the worker exits
  • Request body size pre-check, redis health gauge, fail-open audit counter
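To make the per-tool rate limit concrete, here is a small in-memory sketch in Python. The algorithm agent_mux uses is not described in this README (the layout only says the limits run as atomic Redis Lua scripts), so the fixed-window scheme, the class name, and the key format below are all assumptions for illustration.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.counts = defaultdict(int)  # (key, window index) -> calls seen

    def allow(self, key: str) -> bool:
        # Every call in the same window maps to the same bucket.
        bucket = (key, int(self.clock() // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=2, window=1.0)
    # Hypothetical key format: one counter per org and tool.
    print([limiter.allow("org1:search") for _ in range(3)])
```

In a shared store such as Redis the check and the increment have to happen atomically, which is the usual reason for putting this logic in a server-side script. This sketch also never evicts old buckets, which a real limiter would handle with key expiry.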

Quick start

make check-deps                                    # openresty + redis on PATH
export AGENT_MUX_API_KEYS=test-key
export AGENT_MUX_MCP_FILE=examples/tools/mcp_servers.json
make demo                                          # boots redis + OpenResty

This boots redis on :6390, OpenResty on :8080, and supervises the demo MCP server (a Python stdio server in examples/tools/mcp_demo/).

In a second terminal:

curl localhost:8080/healthz                        # → ok
curl localhost:8080/metrics | grep agent_mux       # Prometheus exposition

curl -N -X POST localhost:8080/v1/agents \
     -H 'X-API-Key: test-key' \
     -H 'Content-Type: application/json' \
     --data @examples/agent_request.json           # streams SSE
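
For callers that prefer code over curl, the stream can be consumed with a small Server-Sent Events parser. The sketch below is Python standard library only and is not part of agent_mux. The helper names are hypothetical, and the event names and payload shapes depend on what the server actually emits, so treat the `delta` event in the test data as a placeholder.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):  # comment or keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def stream_agent(body: dict, api_key: str,
                 url: str = "http://localhost:8080/v1/agents") -> Iterator[tuple[str, str]]:
    """POST a request body to the agent endpoint and yield SSE events as they arrive."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

With the demo running, `for event, data in stream_agent(json.load(open("examples/agent_request.json")), "test-key"): print(event, data)` should print events as they stream in.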

HTTP surface

Route                       Purpose
POST /v1/agents             Start an agent run; streams SSE deltas + tool events
DELETE /v1/sessions/:id     Cancel an in-flight session gracefully
GET /healthz                Liveness check + redis health probe
GET /metrics                Prometheus exposition
POST /mock/v1/messages      Local mock upstream for tests and demos
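
As an example of the cancellation route, the following Python sketch builds the DELETE request with the standard library. It assumes the route takes the same X-API-Key header as /v1/agents and that the caller already knows the session id. How the id is obtained and what the response looks like are not covered here, and `cancel_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # quick-start default


def cancel_request(session_id: str, api_key: str) -> urllib.request.Request:
    """Build the DELETE /v1/sessions/:id request without sending it."""
    return urllib.request.Request(
        # Percent-encode the id so it cannot alter the URL path.
        f"{BASE_URL}/v1/sessions/{quote(session_id, safe='')}",
        method="DELETE",
        headers={"X-API-Key": api_key},  # assumption: same auth as /v1/agents
    )


def cancel_session(session_id: str, api_key: str) -> int:
    """Send the cancellation and return the HTTP status code."""
    with urllib.request.urlopen(cancel_request(session_id, api_key)) as resp:
        return resp.status
```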

Layout

conf/nginx.conf                OpenResty config — env, locations, phases
lua/agent_mux/
  ├─ tools/                    registry, dispatcher, inline / HTTP / MCP handlers
  │   └─ mcp.lua               stdio MCP client + subprocess respawn
  ├─ transport/
  │   ├─ jsonrpc.lua           JSON-RPC 2.0 framing for MCP stdio
  │   └─ sse.lua               SSE encoder for client responses
  ├─ agent_loop.lua            the multi-turn loop
  ├─ server.lua                access / content / log phases, graceful shutdown
  ├─ session/                  store, messages, budget, concurrency, migrations
  ├─ hooks/                    pre/post LLM and pre/post tool runtime + loader
  ├─ observability/            log, Prometheus metrics, OpenTelemetry spans
  ├─ policy/                   API-key auth, IP rate limit, tool rate limit + authz
  ├─ upstream/                 LLM clients (Anthropic streaming, SSE chunk parser)
  ├─ scripts/                  atomic Redis Lua scripts (budget, concurrency, RL)
  ├─ redis_client.lua          pooled cosocket client + script registry
  └─ errors.lua                shared error taxonomy
examples/
  ├─ tools/mcp_demo/           stdio MCP server in Python — supervised by agent_mux
  ├─ tools/mcp_servers.json    MCP manifest the supervisor reads
  ├─ tools/inline_calculator.lua, http_search/, http_tools.json
  └─ hooks/audit_log.lua       reference audit hook
tests/                         busted unit + integration suite (67 tests)
bench/                         wrk harness + baseline output

Prerequisites

  • OpenResty 1.25+: brew install openresty/brew/openresty
  • Redis 7+: brew install redis
  • busted for tests — luarocks install busted
  • lua-resty-http — not bundled with OpenResty; install via opm get ledgetech/lua-resty-http
  • Python 3.10+ if you want to run the demo MCP server
  • Optional: wrk for benchmarks, stylua for formatting

Useful targets

make help               # list everything
make demo               # boot redis + OpenResty for an end-to-end run
make dev                # OpenResty in foreground (you bring redis)
make test               # busted unit + integration suite
make bench              # wrk against /healthz, /metrics, /v1/agents
make stop               # stop a backgrounded OpenResty
make clean              # clear logs/ and run/

What's next

Honest list of the gaps between what's shipped and full "MCP fleet manager" parity. None of these block the use cases above; they're the work that turns agent_mux into something an operations team can manage as a first-class service:

  • GET /v1/mcp/servers — list supervised servers with restart count, in-flight calls, last-call latency, status. Right now you read logs/error.log to see what MCP is doing.
  • Hot-reload of AGENT_MUX_MCP_FILE — add or replace an MCP server without make stop && make dev.
  • Per-MCP-server resource caps — memory ceiling, max concurrent calls, idle-timeout-then-respawn. Crash recovery (feat(mcp): subprocess respawn with exponential backoff) is in; proactive bounds aren't.
  • Live MCP traffic tail: GET /v1/mcp/servers/<name>/calls (SSE), so operators can watch a server in real time.
  • Dockerfile + docker-compose.yaml — one image, redis sidecar, configurable MCP manifest. Today, deployment is "install OpenResty + Redis, copy the repo, set env vars, run."
  • Thin client libraries — Python and TypeScript wrappers around /v1/agents that yield SSE events as objects. Hand-rolling SSE in every integration is friction we should absorb.
  • More upstream providers — only Anthropic today. OpenAI, Bedrock, Gemini adapters are mechanical work.

License

MIT — see LICENSE.

About

Low-latency LLM agent harness on OpenResty/LuaJIT - multi-turn loop, MCP/HTTP/inline tools, Redis-backed sessions, streaming SSE.
