A reference implementation of an AI agent that doesn't hallucinate absence.
Standard RAG searches a pre-indexed knowledge base for semantic similarity. DeRAG gives agents a lawful way to inhabit partial access. Powered silos afford coupling. Unpowered silos show signage but the gate won't open. The agent can name the gap.
This is inference, not occupation.
Among all lawful interactions, which ones are worth Taking?
This question is not asked. This question is not answered. This question stands.
Standard chatbots are text generators. They always produce. When they don't know, they interpolate — that's hallucination.
A DeRAG agent is an adjudicator of action in an affordance landscape:
- The field discloses what's available. Powered silos have their gates open. The agent can enter. Unpowered silos are visible from the street — the agent can read signage and floors, but cannot enter.
- Access is a first-class condition. Not hidden middleware. The agent knows what it can and cannot see.
- Yield is a valid response. If the field isn't offering, the answer is not to invent. The answer is to name the gap.
- The Ecological Interface Law: A primitive may only return what the current field can witness. A function that reaches beyond its arguments is hallucinating.
The model isn't doing the work alone. The field is doing half the work.
A model sitting in a DeRAG field with the right material in front of it can do work that's outside its usual weight class. A model without that field has to guess from training data familiarity — and guesses from training data are exactly where hallucinations live.
Put a weaker model in the field and give it direct access to specific source material: it yields honestly on what it can't see, cites specific identifiers that only exist in this corpus, and produces merge-ready edits at exact locations. It doesn't do those things because it's elite. It does them because the architecture lets it read the actual file and gives it permission to say "that silo is dark — I can't answer from here."
Put a stronger model in the same field and the floor rises: deeper reasoning, better synthesis, real refactors instead of surgical one-liners. Same architecture, same silos, different brain — the field lifts whichever one you bring.
This is the practical claim: pick the brain that matches the work. Use a cheap fast model for navigation, honest yielding, and small targeted edits. Use a heavier model for architectural reasoning, creative translation, and work that requires holding many files in mind at once. Either way, the architecture situates the model in the specific material rather than leaving it to hallucinate from its training prior.
DeRAG isn't a brain. It's a field that brains can be situated in. The situation is what produces work that neither the brain nor the field could produce alone.
Requires: Python 3.10+ and a brain (see Brains below).
# Install
pip install fastapi uvicorn websockets
# Terminal 1 — start the server
python3 derag_server.py
# Terminal 2 — connect a brain
python3 bridge.py # claude CLI (default)
DERAG_BRAIN=gemini-api python3 bridge.py # Gemini API (faster, with caching)
# Browser
open http://localhost:8090By default, the agent selects its own silos — it reads the skyline and picks which buildings to power for each query (two-pass). This is the Initiative effectivity: the agent structures the field.
To turn this off and let keyword scoring pick silos (one-pass, original behavior):
DERAG_AGENT_SELECTS=false python3 derag_server.pyWith agent selection off, the field controls access. The agent sees what the keywords scored, not what it chose. Yield is still lawful — the agent can still name dark silos. It just can't choose which gates open.
The bridge is brain-agnostic. Two reference adapters ship with the distro;
you can write your own — see BRAINS = {...} near the bottom of bridge.py.
Spawns the claude CLI subprocess for each query. Easiest setup — no API
key needed if you already have claude installed and authenticated. Cold
subprocess startup adds 1-2s per query and there's no prompt caching, so
this gets slow on big corpora.
python3 bridge.py
# Override the CLI command:
CLAUDE_CMD=/usr/local/bin/claude python3 bridge.pyNote on caching: The CLI adapter sends the full corpus on every call.
There is no prompt caching. For big corpora this means high latency and
token cost per turn. If you're doing serious work on a large corpus, use
gemini-api instead — or write an anthropic-api adapter that hits the
Anthropic API directly with prompt caching enabled. The adapter interface
is simple (see "Writing your own adapter" below).
Direct HTTPS to Google's Generative Language API with explicit context caching (1-hour TTL). The corpus becomes a CachedContent on Gemini's side; subsequent queries reference it by ID and pay only the dynamic input + output cost. ~80% token reduction after the first query. Much faster on big corpora.
# Get a key at https://aistudio.google.com/app/apikey
echo "YOUR_KEY" > /tmp/.gemini_key
chmod 600 /tmp/.gemini_key
DERAG_BRAIN=gemini-api python3 bridge.py
# Or via env var:
GEMINI_API_KEY=YOUR_KEY DERAG_BRAIN=gemini-api python3 bridge.py
# Override the model (default: gemini-2.5-flash):
GEMINI_MODEL=gemini-2.5-pro DERAG_BRAIN=gemini-api python3 bridge.pyThe bridge prints cache status on each turn so you can see whether you
got a cache hit and how many tokens were charged vs. cached. Watch for
[gemini cache created] on the first query and [gemini cache hit] on
subsequent queries within the hour.
Any function (corpus: str, query: str) -> str (sync or async) is a brain.
Add it to the BRAINS dict in bridge.py:
async def call_my_brain(corpus, query):
# call ollama, llama.cpp, OpenAI, whatever
return response_text
BRAINS = {
"claude-cli": call_claude_cli,
"gemini-api": call_gemini_api,
"my-brain": call_my_brain,
}Then run DERAG_BRAIN=my-brain python3 bridge.py.
Drop markdown files in patches/. Each patch is a silo in the city.
Format:
---
keywords: ["your", "keyword", "list"]
description: "One-line description"
---
[SILO: patch-name]
SIGNAGE: What's visible from the street — name, description, purpose.
FLOORS:
Topic Area One
Topic Area Two
[FOYER: patch-name]
Optional handling instructions for the agent. The agent reads this before
entering the interior — it tells the brain how this particular silo wants
to be worked. Quote or paraphrase? Preserve voice or distill? Surface
protocols with or without ground? Foyers are binding, not advice. The user
never sees them. Remove the block entirely if you don't need it.
[/FOYER]
[INTERIOR: patch-name]
## Topic Area One
Your content here. This is only accessible when the silo is powered.
## Topic Area Two
More content.
[/INTERIOR]A template is included at patches/_template.md — copy it and fill in.
Reload patches by restarting the server, or refreshing /patches in the UI
(the server re-reads on each disclosure).
┌─────────┐ WebSocket ┌────────────────┐
│ User │ ◄──────────────────── │ DeRAG Server │
│ Browser │ │ (Python) │
└─────────┘ └────┬───────────┘
│
│ brain_call
│ (corpus + query)
▼
┌──────────┐
│ Bridge │
│ (Python)│
└────┬─────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ claude │ │ Gemini │ │ your │
│ CLI │ │ API │ │ adapter │
└──────────┘ └──────────┘ └──────────┘
(default) (with cache)
The server holds the field. The bridge routes the corpus through a brain
adapter you choose at startup. No API keys in the server. Your patches stay
on your machine. Pick the brain that fits the conversation — claude-cli
for offline / no-key setups, gemini-api for big corpora and snappy turns.
This is not a retrieval system. It does not use embeddings. It does not do semantic search. Keyword scoring is intentional and minimal — the point is not to find the "right" patch. The point is to give the agent a lawful way to inhabit what the field offers and name what it doesn't.
This is not a production system. It is a reference implementation. A cleaner substrate. A way to show that agents can have relationships with knowledge, not just search through it.
DeRAG is the open-source interface layer. The deeper architecture runs under it, at PLAYi.io.
DeRAG ships with four parallel workspaces. Each is a complete installation
with identical code (derag_server.py, bridge.py, static/index.html)
and different corpora in patches/.
| Workspace | Corpus | Silos |
|---|---|---|
DeRAG/ |
Reid + Hume + Descartes + Hobbes | 45 |
DeRAG-reid-james/ |
Reid + William James Principles of Psychology | 27 |
DeRAG-verne/ |
Jules Verne — 6 novels, act-split | 19 |
DeRAG-hermes/ |
Nous Research Hermes-Agent source code (MIT) | 28 |
To switch workspaces, stop the running server and start from a different directory:
cd ~/DeRAG-hermes # or whichever workspace
python3 derag_server.py # serves that workspace's patchesDeRAG-{name}/
├── derag_server.py # server (identical across workspaces)
├── bridge.py # bridge (identical across workspaces)
├── static/index.html # UI (identical across workspaces)
├── patches/ # THIS IS WHAT CHANGES per workspace
│ ├── _template.md # starter template for new patches
│ ├── MANIFEST.md # silo inventory (skipped by loader)
│ └── *.md # your silos — one file per silo
├── thread.json # conversation state (per workspace, gitignored)
├── partner_curations.txt # coupling sediment (per workspace, gitignored)
├── README.md
├── LICENSE
└── requirements.txt
The code lives in DeRAG-core/ — each workspace symlinks to it. Edit the
core once, all workspaces see the change. No manual sync needed.
DeRAG-core/derag_server.py ← single source of truth
DeRAG-hermes/derag_server.py → ../DeRAG-core/derag_server.py (symlink)
DeRAG-verne/derag_server.py → ../DeRAG-core/derag_server.py (symlink)
The patches are independent — each workspace is its own corpus. thread.json
and partner_curations.txt are per-workspace session state and should never
be deleted.
cp -a DeRAG-reid-james DeRAG-myproject # copy any workspace for the code
cd DeRAG-myproject
mkdir -p patches.old && mv patches/*.md patches.old/ # preserve, don't delete
cp patches.old/_template.md patches/ # keep the template
# Now add your own .md silos to patches/
python3 derag_server.pyThe DeRAG/ workspace ships with Thomas Reid's Inquiry Into the Human Mind
on the Principles of Common Sense (1764) plus selections from Hume, Descartes,
and Hobbes. Reid is the philosophical ground DeRAG sits on: Scottish Common
Sense Realism — direct perception, no representational middleman, the world
as it is rather than as we compute it.
An agent running DeRAG against Reid is engaging with the philosophy that justifies its own architecture. The loop is clean.
Run it. Ask it about the distinction between sensation and perception. Ask it why hardness is a primary quality and colour is not. Ask it what "natural signs" means. Then ask it something unpowered and watch it name the gap honestly.
MIT. The code is yours. The Reid corpus is public domain. Build whatever.
Ronald D Watson — ron@PLAYi.io — PLAYi.io
Built in an RV in Dade City, FL.