A realistic coding-agent benchmark demonstrating Aegis as a runtime control layer across all five scopes:
- LLM → generation control
- RAG → retrieval control
- STEP → execution control
- CONTEXT → information-state control
- AGENT → workflow-loop control
This repo is not a model, not an agent framework, and not a tool executor.
It shows how Aegis controls behavior at runtime boundaries in a real system.
These results come from a live backend (no fallback).
The stable lane is intentionally simple and fully deterministic.
| Metric | Baseline | Aegis |
|---|---|---|
| Tasks completed | 6 / 6 | 6 / 6 |
| Live Aegis scope calls | 0 | 24 |
| Fallback calls | 0 | 0 |
Live scope usage:
- RAG: 6
- CONTEXT: 6
- LLM: 6
- AGENT: 6
- STEP: 0
This lane shows that:
- Aegis integrates cleanly
- Four of the five scopes execute live (STEP records 0 calls in this lane)
- Simple workflows do not regress
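
The "Live Aegis scope calls" figure in the stable-lane table is the sum of the per-scope counts listed above. A minimal sketch of that arithmetic, using only the numbers reported in this section (the variable names are illustrative and not part of the repo):

```python
# Per-scope live call counts reported for the stable lane.
stable_scope_calls = {"RAG": 6, "CONTEXT": 6, "LLM": 6, "AGENT": 6, "STEP": 0}

# Total live calls, plus which scopes were and were not exercised.
total_live_calls = sum(stable_scope_calls.values())
active_scopes = sorted(s for s, n in stable_scope_calls.items() if n > 0)
idle_scopes = sorted(s for s, n in stable_scope_calls.items() if n == 0)

print(total_live_calls)  # 24
print(active_scopes)     # ['AGENT', 'CONTEXT', 'LLM', 'RAG']
print(idle_scopes)       # ['STEP']
```

Six tasks with one call each to four scopes gives the 24 live calls in the table; STEP is the only scope with no activity in this lane.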
The stress lane introduces:
- retrieval ambiguity
- incorrect first patches
- repair loops
- validator pressure
- planner/executor disagreement
- optional multi-agent coordination
| Metric | Baseline | Aegis | Delta |
|---|---|---|---|
| Tasks completed | 5 / 7 | 7 / 7 | +2 |
| First-pass success | 3 / 7 | 6 / 7 | +3 |
| Retries | 2 | 1 | -1 |
| Replans | 4 | 1 | -3 |
| Repair attempts | 2 | 1 | -1 |
| Retrieval expansions | 4 | 1 | -3 |
| Duplicate inspections | 9 | 3 | -6 |
| Planner/executor disagreement | 6 | 1 | -5 |
| Validator rejections | 8 | 2 | -6 |
| Step scope activations | 0 | 2 | +2 |
Live scope usage:
- RAG: 8
- CONTEXT: 8
- LLM: 8
- STEP: 2
- AGENT: 7
- Fallback: 0
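
The Delta column in the stress-lane table is simply the Aegis value minus the baseline value. The sketch below recomputes it from the reported numbers; the dictionary and function names are illustrative only.

```python
# (baseline, aegis) pairs copied from the stress-lane table.
stress_metrics = {
    "Tasks completed": (5, 7),
    "First-pass success": (3, 6),
    "Retries": (2, 1),
    "Replans": (4, 1),
    "Repair attempts": (2, 1),
    "Retrieval expansions": (4, 1),
    "Duplicate inspections": (9, 3),
    "Planner/executor disagreement": (6, 1),
    "Validator rejections": (8, 2),
    "Step scope activations": (0, 2),
}


def compute_deltas(metrics):
    """Return aegis - baseline for every metric."""
    return {name: aegis - baseline for name, (baseline, aegis) in metrics.items()}


deltas = compute_deltas(stress_metrics)
for name, delta in deltas.items():
    # The "+d" format spec prints an explicit sign, matching the table.
    print(f"{name}: {delta:+d}")
```

For the effort metrics (retries, replans, duplicate inspections, and so on) a negative delta is the desirable direction; for tasks completed and first-pass success a positive delta is.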
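Under stress, Aegis completes all 7 tasks (the baseline completes 5) while reducing wasted work: replans fall from 4 to 1, duplicate inspections from 9 to 3, and validator rejections from 8 to 2.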
It does not replace the system.
It controls it.
This project is:
- A credible public demo
- A production-style integration reference
- A testing harness for improving Aegis

The underlying coding-agent pipeline performs:
- file search, read, write
- patch generation
- test execution
- retry/replan loops
- multi-agent coordination (stress lane)

Aegis applies control at each scope:
- RAG → shapes retrieved evidence
- CONTEXT → filters and prioritizes information
- LLM → shapes planning behavior
- STEP → controls retry/repair loops
- AGENT → bounds workflow progression
Aegis never executes tools or models.
It returns structured control decisions.
This demo uses the Aegis Python SDK to apply runtime control at key system boundaries.
Install:

    pip install scelabs-aegis

Example:

    from aegis import AegisClient

    client = AegisClient()
    result = client.auto().rag(
        query="Fix failing test for normalize_username",
        retrieved_context=[
            "src/users.py contains normalize_username",
            "tests/test_users.py asserts spaces become underscores",
        ],
        symptoms=["retrieval_noise"],
        severity="medium",
    )
    print(result.actions)
    print(result.trace)

Aegis returns structured control outputs that are applied by your system. It does not execute models, tools, or workflows.
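
Because Aegis only returns decisions, the host system needs a small piece of code that interprets them. The sketch below shows one way that could look. The action schema here (`keep_context`, `limit_retries`, and their fields), `PipelineState`, and `apply_actions` are all hypothetical names invented for illustration; the real shape of `result.actions` is defined by the SDK's request-shape documentation, not by this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Host-side state that control decisions are allowed to adjust."""
    retrieved_context: list[str]
    max_retries: int = 3
    applied: list[str] = field(default_factory=list)


def apply_actions(state: PipelineState, actions: list[dict]) -> PipelineState:
    """Apply hypothetical control actions; the host stays in charge of execution."""
    for action in actions:
        kind = action.get("type")
        if kind == "keep_context":
            # Keep only the context entries the decision selected.
            keep = set(action["indices"])
            state.retrieved_context = [
                c for i, c in enumerate(state.retrieved_context) if i in keep
            ]
        elif kind == "limit_retries":
            # Only ever tighten the retry budget, never loosen it.
            state.max_retries = min(state.max_retries, int(action["max"]))
        else:
            # Unknown action types are skipped rather than guessed at.
            continue
        state.applied.append(kind)
    return state


state = PipelineState(
    retrieved_context=[
        "src/users.py contains normalize_username",
        "tests/test_users.py asserts spaces become underscores",
        "unrelated snippet (made up for this example)",
    ]
)
actions = [
    {"type": "keep_context", "indices": [0, 1]},
    {"type": "limit_retries", "max": 1},
    {"type": "unrecognized_hint"},
]
apply_actions(state, actions)
print(len(state.retrieved_context), state.max_retries, state.applied)
```

Skipping unknown action types keeps the host tolerant of decisions it does not understand, so nothing runs that the host itself did not choose to run.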
Each scope controls a different boundary in your system:
- `llm` → control generation and model-call behavior
- `rag` → control retrieved evidence and context
- `step` → control a single workflow action or retry boundary
- `context` → clean and prioritize information state before the next step
- `agent` → control multi-step workflow loops

This repo applies Aegis at multiple points in the workflow:
- Retrieval → `rag`
- Context shaping → `context`
- Planning → `llm`
- Retry/repair loop → `step`
- Task lifecycle → `agent`

Aegis shapes behavior at each boundary but does not replace the workflow, tools, or model execution.
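
The boundary placement described above can be sketched as a workflow that calls a control hook at each point and counts calls per scope. This is a simplified stand-in and not the repo's actual orchestration: `run_task`, `allow_one_retry`, the task fields, and the `{"retry": ...}` decision shape are assumptions made for the example, and the hook is a local stub rather than the Aegis SDK.

```python
from collections import Counter
from typing import Callable

# A control hook takes (scope, payload) and returns a decision dict.
Control = Callable[[str, dict], dict]


def run_task(task: dict, control: Control, usage: Counter) -> bool:
    """Run one task, consulting the control hook at each boundary."""

    def checkpoint(scope: str, payload: dict) -> dict:
        usage[scope] += 1
        return control(scope, payload)

    checkpoint("rag", {"query": task["query"]})   # retrieval
    checkpoint("context", {"task": task["name"]}) # context shaping
    checkpoint("llm", {"task": task["name"]})     # planning

    attempts = 0
    while True:
        attempts += 1
        # Stand-in for applying a patch and running the tests.
        passed = attempts > task["failures_before_pass"]
        if passed:
            break
        # The step boundary is only reached when an attempt fails.
        decision = checkpoint("step", {"attempt": attempts})
        if not decision.get("retry", False):
            break

    checkpoint("agent", {"passed": passed})       # task lifecycle
    return passed


def allow_one_retry(scope: str, payload: dict) -> dict:
    """Stub controller: permit a single retry, no opinion elsewhere."""
    if scope == "step":
        return {"retry": payload["attempt"] < 2}
    return {}


usage = Counter()
tasks = [
    {"name": "clean", "query": "q1", "failures_before_pass": 0},
    {"name": "flaky", "query": "q2", "failures_before_pass": 1},
    {"name": "stuck", "query": "q3", "failures_before_pass": 5},
]
results = [run_task(t, allow_one_retry, usage) for t in tasks]
print(results)      # [True, True, False]
print(dict(usage))  # {'rag': 3, 'context': 3, 'llm': 3, 'agent': 3, 'step': 3}
```

In this sketch a task that passes on its first attempt never reaches the `step` boundary, which is one plausible way a lane of simple tasks could report zero STEP calls.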
- SDK repo: https://github.com/SCELabs/aegis-client
- Request shapes: https://github.com/SCELabs/aegis-client/blob/main/docs/request-shapes.md
- Examples: https://github.com/SCELabs/aegis-client/tree/main/docs/examples

Repository layout:
- `agent/` → workflow logic
- `multiagent/` → stress coordination
- `retrieval/` → context assembly
- `tools/` → execution layer
- `aegis_integration/` → scope adapters
- `benchmark/` → tasks + target repos
- `runners/` → execution scripts
- `results/` → run artifacts

Setup:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Optional:

    cp .env.example .env
    # set AEGIS_API_KEY and AEGIS_BASE_URL

Run:

    # stable
    python main.py baseline
    python main.py aegis
    python main.py compare

    # stress
    python main.py stress_baseline
    python main.py stress_aegis
    python main.py compare_stress

    # full
    python main.py all
    python main.py all_stress

Each run creates:
- `summary.json`
- `task_results.json`
- `metrics.json`

Per-task:
- `aegis_result_rag.json`
- `aegis_result_context.json`
- `aegis_result_llm.json`
- `aegis_result_step.json` (if activated)
- `aegis_result_agent.json`
- `scope_usage.json`

Stress mode adds:
- `coordination_log.json`
- `agent_decisions.json`

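
A short helper can collect the run-level artifacts for inspection. This is a sketch under assumptions: `load_run` is a hypothetical helper, the location of a run directory and the JSON schema of each file are not described in this section, and the demo writes a placeholder file into a temporary directory instead of reading real results.

```python
import json
import tempfile
from pathlib import Path

# Run-level artifact names listed in this section.
RUN_FILES = ("summary.json", "task_results.json", "metrics.json")


def load_run(run_dir):
    """Load whichever run-level JSON artifacts exist in run_dir."""
    run_path = Path(run_dir)
    loaded = {}
    for name in RUN_FILES:
        path = run_path / name
        if path.is_file():
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded


# Demo with placeholder content; real artifacts will have their own schema.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "summary.json").write_text(
        json.dumps({"placeholder": True}), encoding="utf-8"
    )
    demo = load_run(tmp)

print(sorted(demo))  # ['summary.json']
```

Missing files are skipped rather than treated as errors, so the same helper works on partial runs.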
Aegis is inserted at boundaries:
- retrieval → `rag`
- context state → `context`
- planning → `llm`
- retry/coordination → `step`
- task lifecycle → `agent`

It does not replace:
- your agent
- your model
- your tools
- your framework

Key takeaways:
- Runtime control improves real system behavior
- Improvements come from coordination, not intelligence
- The same pipeline becomes more efficient and stable

Limitations:
- Requires a live backend for full behavior
- Fallback logic is simplified
- Planner is intentionally lightweight
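
The results tables report live calls and fallback calls separately. One generic way to keep that distinction is to wrap each control call and count which path served it. This is an illustrative pattern only: how this repo implements its fallback is not shown in this section, and `call_with_fallback`, `live_ok`, `live_down`, and `local_fallback` are hypothetical names.

```python
from collections import Counter


def call_with_fallback(live_call, fallback_call, counters):
    """Try the live backend first; on any error, use the fallback and count it."""
    try:
        result = live_call()
    except Exception:
        counters["fallback"] += 1
        return fallback_call()
    counters["live"] += 1
    return result


def live_ok():
    return {"source": "live"}


def live_down():
    # Simulates an unreachable backend.
    raise ConnectionError("backend unreachable")


def local_fallback():
    # A simplified decision that changes nothing in the pipeline.
    return {"source": "fallback", "actions": []}


counters = Counter()
first = call_with_fallback(live_ok, local_fallback, counters)
second = call_with_fallback(live_down, local_fallback, counters)
print(first["source"], second["source"])  # live fallback
print(dict(counters))                     # {'live': 1, 'fallback': 1}
```

Counting both paths makes it easy to confirm that a run really used the live backend, which is what a fallback count of 0 indicates.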
Aegis does not build your system.
It makes your system behave.