Aegis Full-Scope Demo

A realistic coding-agent benchmark demonstrating Aegis as a runtime control layer across all five scopes:

LLM → generation control
RAG → retrieval control
STEP → execution control
CONTEXT → information-state control
AGENT → workflow-loop control

This repo is not a model, not an agent framework, and not a tool executor.

It shows how Aegis controls behavior at runtime boundaries in a real system.

Benchmark Results (Live Aegis, Five-Scope Run)

These results come from a live backend (no fallback).

Stable lane

The stable lane is intentionally simple and fully deterministic.

Metric	Baseline	Aegis
Tasks completed	6 / 6	6 / 6
Live Aegis scope calls	0	24
Fallback calls	0	0

Live scope usage:

RAG: 6
CONTEXT: 6
LLM: 6
AGENT: 6
STEP: 0
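
As a quick consistency check, the per-scope counts above should add up to the 24 live scope calls reported in the table. A minimal sketch; the variable names are illustrative and not part of the demo code:

```python
# Per-scope live call counts for the stable lane, copied from the list above.
stable_scope_calls = {"RAG": 6, "CONTEXT": 6, "LLM": 6, "AGENT": 6, "STEP": 0}

# Total live calls across all scopes.
total_calls = sum(stable_scope_calls.values())

# Scopes that were never invoked in this lane.
inactive_scopes = [scope for scope, n in stable_scope_calls.items() if n == 0]

print(total_calls)      # 24
print(inactive_scopes)  # ['STEP']
```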

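This lane shows that:

Aegis integrates cleanly
Four of the five scopes (RAG, CONTEXT, LLM, AGENT) execute live; STEP, which governs retry/repair loops, records no activations in this lane
Simple workflows do not regress (6 / 6 tasks completed in both runs)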

Stress lane

The stress lane introduces:

retrieval ambiguity
incorrect first patches
repair loops
validator pressure
planner/executor disagreement
optional multi-agent coordination

Metric	Baseline	Aegis	Delta
Tasks completed	5 / 7	7 / 7	+2
First-pass success	3 / 7	6 / 7	+3
Retries	2	1	-1
Replans	4	1	-3
Repair attempts	2	1	-1
Retrieval expansions	4	1	-3
Duplicate inspections	9	3	-6
Planner/executor disagreement	6	1	-5
Validator rejections	8	2	-6
Step scope activations	0	2	+2
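
The Delta column is simply the Aegis value minus the baseline value. The sketch below recomputes it from the table so the numbers can be checked or regenerated after a new run; `stress_metrics` is an illustrative name, not a structure from the demo code:

```python
# (baseline, aegis) pairs copied from the stress-lane table above.
stress_metrics = {
    "Tasks completed": (5, 7),
    "First-pass success": (3, 6),
    "Retries": (2, 1),
    "Replans": (4, 1),
    "Repair attempts": (2, 1),
    "Retrieval expansions": (4, 1),
    "Duplicate inspections": (9, 3),
    "Planner/executor disagreement": (6, 1),
    "Validator rejections": (8, 2),
    "Step scope activations": (0, 2),
}

# Delta = Aegis - Baseline, matching the sign convention of the table.
deltas = {
    name: aegis - baseline
    for name, (baseline, aegis) in stress_metrics.items()
}

for name, value in deltas.items():
    print(f"{name}: {value:+d}")
```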

Live scope usage:

RAG: 8
CONTEXT: 8
LLM: 8
STEP: 2
AGENT: 7
Fallback: 0

What this shows

Under stress, the Aegis run completes more tasks (7 / 7 vs 5 / 7) while doing less wasted work: fewer replans, duplicate inspections, and validator rejections.

It does not replace the system.

It controls it.

Why this demo exists

This project is:

A credible public demo
A production-style integration reference
A testing harness for improving Aegis

Architecture: control vs execution

Execution layer (this repo)

file search, read, write
patch generation
test execution
retry/replan loops
multi-agent coordination (stress lane)

Aegis control layer

RAG → shapes retrieved evidence
CONTEXT → filters and prioritizes information
LLM → shapes planning behavior
STEP → controls retry/repair loops
AGENT → bounds workflow progression

Aegis never executes tools or models.

It returns structured control decisions.

Using the Aegis SDK

This demo uses the Aegis Python SDK to apply runtime control at key system boundaries.

Install:

pip install scelabs-aegis

Basic usage

from aegis import AegisClient

client = AegisClient()

result = client.auto().rag(
    query="Fix failing test for normalize_username",
    retrieved_context=[
        "src/users.py contains normalize_username",
        "tests/test_users.py asserts spaces become underscores",
    ],
    symptoms=["retrieval_noise"],
    severity="medium",
)

print(result.actions)
print(result.trace)

Aegis returns structured control outputs that are applied by your system. It does not execute models, tools, or workflows.
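
To make "applied by your system" concrete, here is a minimal host-side sketch. It does not use the SDK: `ControlDecision`, `apply_rag_decision`, and the action strings `dedupe` and `limit:N` are all invented for illustration, and the real shape of `result.actions` is described in the SDK's request-shapes documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ControlDecision:
    """Hypothetical decision shape; the real Aegis response schema may differ."""
    scope: str
    actions: list = field(default_factory=list)


def apply_rag_decision(decision, retrieved):
    """Apply control actions to retrieved context.

    The host mutates its own state; the controller only describes what to do.
    """
    kept = list(retrieved)
    for action in decision.actions:
        if action == "dedupe":
            # dict.fromkeys drops duplicates while preserving order.
            kept = list(dict.fromkeys(kept))
        elif action.startswith("limit:"):
            kept = kept[: int(action.split(":", 1)[1])]
        # Unknown actions are ignored so new controller outputs
        # cannot break the host.
    return kept


decision = ControlDecision(scope="rag", actions=["dedupe", "limit:2"])
chunks = ["src/users.py", "src/users.py", "tests/test_users.py", "README.md"]
shaped = apply_rag_decision(decision, chunks)
print(shaped)  # ['src/users.py', 'tests/test_users.py']
```

Keeping the handler on the host side preserves the split described above: the decision is data, and only your code touches files, models, or tools.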

Using different scopes

Each scope controls a different boundary in your system:

llm → control generation and model-call behavior
rag → control retrieved evidence and context
step → control a single workflow action or retry boundary
context → clean and prioritize information state before the next step
agent → control multi-step workflow loops

How this demo uses Aegis

This repo applies Aegis at multiple points in the workflow:

Retrieval → rag
Context shaping → context
Planning → llm
Retry/repair loop → step
Task lifecycle → agent

Aegis shapes behavior at each boundary but does not replace the workflow, tools, or model execution.
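
The boundary mapping above can be sketched as a single task loop with one hook per scope. This is an assumption-laden illustration, not the demo's actual code: `RecordingController`, `consult`, and `run_task` are hypothetical names, and the controller here only counts calls instead of contacting Aegis. It also shows one plausible reason a lane can report zero `step` calls: in this sketch, that scope is consulted only when an attempt fails.

```python
from collections import Counter


class RecordingController:
    """Stand-in for an Aegis client; it only counts which scopes were consulted."""

    def __init__(self):
        self.calls = Counter()

    def consult(self, scope, **payload):
        self.calls[scope] += 1
        # Hypothetical empty decision; a real client would return control actions.
        return {"scope": scope, "actions": []}


def run_task(controller, task, attempt_results):
    """Run one task.

    attempt_results holds one boolean per patch attempt (True = tests pass).
    """
    controller.consult("rag", query=task)      # retrieval boundary
    controller.consult("context", task=task)   # context-shaping boundary
    controller.consult("llm", task=task)       # planning boundary

    succeeded = False
    for passed in attempt_results:
        if passed:
            succeeded = True
            break
        # The retry/repair boundary is reached only after a failed attempt.
        controller.consult("step", task=task)

    controller.consult("agent", succeeded=succeeded)  # task lifecycle boundary
    return succeeded


clean = RecordingController()
clean_ok = run_task(clean, "fix normalize_username", [True])

flaky = RecordingController()
flaky_ok = run_task(flaky, "fix normalize_username", [False, True])

print(dict(clean.calls))
print(dict(flaky.calls))
```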

Learn more

SDK repo: https://github.com/SCELabs/aegis-client
Request shapes: https://github.com/SCELabs/aegis-client/blob/main/docs/request-shapes.md
Examples: https://github.com/SCELabs/aegis-client/tree/main/docs/examples

Repo layout

agent/ → workflow logic
multiagent/ → stress coordination
retrieval/ → context assembly
tools/ → execution layer
aegis_integration/ → scope adapters
benchmark/ → tasks + target repos
runners/ → execution scripts
results/ → run artifacts

Getting started

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Optional:

cp .env.example .env
# set AEGIS_API_KEY and AEGIS_BASE_URL
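
A host application typically needs to know whether these two variables are set before deciding between live and fallback behavior. The sketch below shows one way to read them; `AegisSettings`, `load_settings`, and the rule that both values must be present for a live run are assumptions, not the demo's actual logic. It reads the process environment only, so a `.env` file must already have been loaded by your shell or tooling.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AegisSettings:
    """Hypothetical settings holder for the two variables named above."""
    api_key: Optional[str]
    base_url: Optional[str]

    @property
    def live(self) -> bool:
        # Assumption: a live run needs both values; otherwise use fallback.
        return bool(self.api_key and self.base_url)


def load_settings(env: Mapping[str, str] = os.environ) -> AegisSettings:
    """Read AEGIS_API_KEY and AEGIS_BASE_URL, treating empty strings as unset."""
    return AegisSettings(
        api_key=env.get("AEGIS_API_KEY") or None,
        base_url=env.get("AEGIS_BASE_URL") or None,
    )


settings = load_settings()
print("live backend configured:", settings.live)
```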

Run

# stable
python main.py baseline
python main.py aegis
python main.py compare

# stress
python main.py stress_baseline
python main.py stress_aegis
python main.py compare_stress

# full
python main.py all
python main.py all_stress

Outputs

Each run creates:

summary.json
task_results.json
metrics.json

Per-task:

aegis_result_rag.json
aegis_result_context.json
aegis_result_llm.json
aegis_result_step.json (if activated)
aegis_result_agent.json
scope_usage.json

Stress mode adds:

coordination_log.json
agent_decisions.json
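
To inspect a run, the per-task `scope_usage.json` files can be summed into lane-level totals like the "Live scope usage" lists above. The sketch below assumes each file is a flat JSON object mapping scope name to call count and that per-task files live somewhere under the run directory; neither the schema nor the directory layout is documented here, so adjust both to match your artifacts.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def total_scope_usage(run_dir):
    """Sum every scope_usage.json found under run_dir into one dict."""
    totals = Counter()
    for path in sorted(Path(run_dir).rglob("scope_usage.json")):
        usage = json.loads(path.read_text(encoding="utf-8"))
        # Assumed schema: {"rag": 1, "step": 0, ...}
        totals.update({scope: int(count) for scope, count in usage.items()})
    return dict(totals)


# Self-contained demonstration on a throwaway directory with made-up counts.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "task_01": {"rag": 1, "context": 1, "llm": 1, "agent": 1},
        "task_02": {"rag": 2, "context": 1, "llm": 1, "step": 1, "agent": 1},
    }
    for task, usage in sample.items():
        task_dir = Path(tmp) / task
        task_dir.mkdir()
        (task_dir / "scope_usage.json").write_text(
            json.dumps(usage), encoding="utf-8"
        )
    demo_totals = total_scope_usage(tmp)

print(demo_totals)
```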

Where Aegis sits

Aegis is inserted at boundaries:

retrieval → rag
context state → context
planning → llm
retry/coordination → step
task lifecycle → agent

It does not replace:

your agent
your model
your tools
your framework

What this proves

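In this benchmark, runtime control measurably improves system behavior (stress lane: 7 / 7 vs 5 / 7 tasks completed)
The gains come from better coordination (fewer replans, duplicate inspections, and validator rejections), not from added intelligence
The same pipeline runs more efficiently and more stably with Aegis at its boundaries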

Limitations

Full behavior requires a live Aegis backend
Fallback logic is simplified
Planner is intentionally lightweight

Summary

Aegis does not build your system.

It makes your system behave.
