An experimental framework for measuring whether persistent memory, reflection, self-modeling, and cross-session continuity change the behavior of an LLM agent in observable, falsifiable ways.
BDM asks one testable question:
Does giving an LLM agent persistent memory, reflection, a self-model, and cross-session continuity produce measurably different behavior than a stateless or long-context-only baseline using the same underlying model?
The framework (bdm-core) builds the memory, reflection, self-model, and continuity layers. The evaluation harness (bdm-eval) runs an agent on multi-session benchmark tasks and reports numbers comparing a BDM-augmented agent to a same-model baseline. Every claim made by the project is anchored to a measurement, or it is not made.
BDM does not claim to create consciousness. It does not make medical claims. It is software research, evaluated by experiment.
- Not a medical product or clinical tool
- Not a consciousness engine or proof of machine sentience
- Not a brain simulation or connectome model
- Not a mind-upload or mind-transfer technology
- Not a general-purpose AI assistant
- Not production software (early research phase)
- Build memory, reflection, self-model, and continuity layers as composable Python modules (bdm-core)
- Build an evaluation harness that compares BDM-augmented agents against same-model baselines (bdm-eval)
- Produce reproducible results tables tied to a single stated hypothesis at a time
- Publish all findings — including negative results — openly
- Falsify hypotheses where the data falsifies them; revise where the data revises them
| Module | Description |
|---|---|
| Memory Layer | Stores episodic, semantic, and working memory structures |
| Attention Layer | Selects relevant context from memory and input |
| Reflection Layer | Reviews prior outputs for consistency |
| Learning Loop | Updates internal state based on feedback and new experience |
| Self-Model Layer | Maintains a lightweight record of the system's own state |
| Context Continuity | Preserves the thread of context across sessions |
| Interface Layer | Connects layers to external inputs and LLMs |
| Evaluation Harness | Runs agents on multi-session benchmark tasks and reports metrics |
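
To make the table more concrete, here is a minimal sketch of how a memory layer and an attention step could fit together. Everything in it is an assumption made for illustration: `SketchMemory`, `select_context`, and the word-overlap scoring are hypothetical and are not the bdm-core API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SketchMemory:
    """Hypothetical stand-in for a memory layer; not the bdm-core API."""

    working_size: int = 3
    episodic: list = field(default_factory=list)   # ordered record of events
    semantic: dict = field(default_factory=dict)   # stable key -> fact
    working: deque = field(init=False)             # small rolling buffer

    def __post_init__(self) -> None:
        # The deque drops the oldest item once working_size is exceeded.
        self.working = deque(maxlen=self.working_size)

    def observe(self, event: str) -> None:
        """Record an event in episodic memory and in the working buffer."""
        self.episodic.append(event)
        self.working.append(event)

    def learn_fact(self, key: str, value: str) -> None:
        """Store a fact that should outlive individual events."""
        self.semantic[key] = value


def select_context(memory: SketchMemory, query: str, limit: int = 2) -> list:
    """Toy attention step: rank episodic events by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(event.lower().split())), index, event)
        for index, event in enumerate(memory.episodic)
    ]
    relevant = [item for item in scored if item[0] > 0]
    # Highest overlap first; ties keep the original chronological order.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [event for _, _, event in relevant[:limit]]
```

A real attention layer would likely use embeddings or a learned scorer rather than word overlap; the sketch only shows where selection sits between stored memory and the prompt.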
Milestone 1 — Memory Core — feature complete. SQLite persistence for LongTermStore shipped (#3).
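
As an illustration of what SQLite-backed persistence can look like, the sketch below stores key-value memories with Python's standard `sqlite3` module. The class name `SketchLongTermStore`, its methods, and the table schema are assumptions for this example; the actual `LongTermStore` in bdm-core may be structured quite differently.

```python
import sqlite3
from typing import Optional


class SketchLongTermStore:
    """Hypothetical SQLite-backed store; not the bdm-core LongTermStore."""

    def __init__(self, path: str = ":memory:") -> None:
        # A file path makes the data survive process restarts;
        # ":memory:" keeps it in RAM for quick experiments.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: str, value: str) -> None:
        """Insert a memory, overwriting any previous value for the key."""
        self._conn.execute(
            "INSERT OR REPLACE INTO memories (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is unknown."""
        row = self._conn.execute(
            "SELECT value FROM memories WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self._conn.close()
```

Opening a second `SketchLongTermStore` on the same file path after closing the first returns the previously stored values, which is the cross-session property the milestone is about.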
Milestone Eval — First End-to-End Evaluation Slice — in progress. Wires bdm-core into an agent loop, runs a minimal multi-session benchmark against a same-model baseline, and produces the first results table.
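
The shape of such a comparison can be sketched in a few lines. The example below is a toy under stated assumptions: `SESSIONS`, `mock_llm`, and `run_agent` are hypothetical names, the task is constructed so that the outcome is known in advance, and the numbers it produces say nothing about BDM itself.

```python
# Two sessions: the first states a fact, the second asks about it.
SESSIONS = [
    {"message": "FACT: favorite color is green", "expected": None},
    {"message": "QUESTION: favorite color?", "expected": "green"},
]

PREFIX = "FACT: favorite color is "


def mock_llm(prompt: str) -> str:
    """Deterministic stand-in for a model: answers only if the fact is in the prompt."""
    for line in prompt.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):]
    return "unknown"


def run_agent(use_memory: bool) -> float:
    """Run all sessions with the same mock model and return accuracy on scored sessions."""
    notes = []  # carried across sessions only when use_memory is True
    correct = 0
    scored = 0
    for session in SESSIONS:
        context = notes if use_memory else []
        prompt = "\n".join(context + [session["message"]])
        answer = mock_llm(prompt)
        if session["expected"] is not None:
            scored += 1
            correct += int(answer == session["expected"])
        if use_memory:
            notes.append(session["message"])
    return correct / scored


if __name__ == "__main__":
    print("baseline :", run_agent(use_memory=False))
    print("augmented:", run_agent(use_memory=True))
```

Both agents call the same `mock_llm`, so any difference in score comes from what is carried between sessions. That is the same-model control the evaluation relies on.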
```shell
# Install both packages in development mode
pip install -e "packages/bdm-core[dev]"
pip install -e "packages/bdm-eval[dev]"

# Run the test suite
pytest packages/bdm-core/tests packages/bdm-eval/tests

# Produce the first results table (uses a deterministic mock LLM, so no API key is needed)
python -m bdm_eval.runners.run
```

The runner writes packages/bdm-eval/src/bdm_eval/results/<UTC-timestamp>.{json,md} and prints the Markdown table to stdout.
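
For readers who want a feel for that output step, here is a minimal sketch of writing a paired JSON and Markdown results file under a UTC timestamp. The function names, the column layout, and the timestamp format are assumptions for illustration; the real runner's format may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def to_markdown(rows: list) -> str:
    """Render result rows as a two-column Markdown table."""
    header = "| Agent | Accuracy |\n|---|---|"
    body = "\n".join(
        f"| {row['agent']} | {row['accuracy']:.2f} |" for row in rows
    )
    return f"{header}\n{body}"


def write_results(rows: list, out_dir: Path) -> tuple:
    """Write <timestamp>.json and <timestamp>.md, print the table, return both paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed timestamp format; chosen only because it sorts chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    json_path = out_dir / f"{stamp}.json"
    md_path = out_dir / f"{stamp}.md"
    table = to_markdown(rows)
    json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    md_path.write_text(table + "\n", encoding="utf-8")
    print(table)
    return json_path, md_path
```

Writing both files from the same `rows` keeps the machine-readable and human-readable outputs in sync, so the table can always be regenerated from the JSON.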
```text
bdm/
├── README.md
├── CLAUDE.md — Claude Code session context
├── CONTRIBUTING.md — contribution rules
├── LICENSE.md — source-available, all rights reserved
│
├── .ai/ — AI assistant context directory
│ ├── README.md
│ ├── project.md — project overview and goals
│ ├── behavior.md — rules for AI assistants
│ ├── architecture.md — architectural decisions
│ ├── milestones.md — milestone status tracker
│ ├── current.md — active task and focus
│ ├── git-conventions.md — branch, issue, PR naming
│ └── specs/ — per-module specifications
│
├── docs/ — long-form documentation (EN + PL)
├── concepts/ — concept files per cognitive layer (EN + PL)
├── research/ — hypotheses, experiments, reading list (EN + PL)
│
├── packages/
│ ├── bdm-core/ — memory, reflection, self-model, continuity layers
│ └── bdm-eval/ — agents, benchmarks, metrics, runner
│
├── .github/
└── .gitignore
```
Beautiful Deep Mind is source-available, but it is not open source.
All rights are reserved by Boring Code. You may read and evaluate the repository, but you may not copy, redistribute, commercialize, or create derivative works without written permission.
Contributions are welcome under the rules described in CONTRIBUTING.md. By submitting a contribution, you agree to the contribution terms described in LICENSE.md.
BDM does not make medical claims. It does not claim to create, simulate, or replicate consciousness. It does not claim that software can copy, upload, transfer, or preserve a human mind. This project is software research inspired by cognitive science concepts. All claims are framed as hypotheses to be tested by experiment, not established results.