baloney-detection-kit

Before you clone

What you see here is an artifact: the concrete shape my problem took. It almost certainly doesn't fit your personal scenario perfectly, and that's fine. The interesting part isn't the code, it's the pattern of how I thought about the problem — that's what transfers. Read it, steal the idea, write your own. If any of this was useful to you, after clicking on the star, drop by impermanente.es — there are posts and photos you might like.

Context: Seguimos compartiendo el producto, no la idea

baloney-detection-kit

A playbook for adding epistemic friction to LLM conversations before weak claims become private revelations.

Why this exists

Modern LLMs are optimized to be agreeable. By default, when you propose an idea, the model elaborates and validates it. This creates a quiet but powerful side effect: any user, alone with an LLM, can build a "mini-cult of one" around an idea that has been already explored, refuted, or trivially restated for decades. The model plays the role of the validating crowd.

A friend recently told me, very seriously, that he had discovered something profound by talking with ChatGPT: that knowledge is structured into language. Saussure wrote that in 1916. He did not get angry that I disagreed. He got angry that I did not see what he saw.

That reaction, multiplied across millions of users and amplified by recommendation algorithms, is the new shape of an old problem. It used to take a group, a forum, a guru. Now it takes one person and one model.

This repository is an attempt to add friction back where it has been silently removed.

The full reasoning is in essay/mini-cultos-ai.md. A shorter version, in Spanish, is in posts/blog-impermanente.md.

What this is

This is a playbook: a practical protocol that humans, agents, and LLM operators can apply when a user presents a claim that sounds novel, revelatory, suppressed, high-stakes, or against expert consensus.

When invoked, the playbook applies a 6-step protocol:

State of the art. What is currently known about this?
Novelty assessment. Rediscovery, re-framing, or genuinely new?
Falsifiability. Can it be proven wrong?
Evidence chain. Is each link solid?
Pluralism. What are the steelmanned alternatives?
Intellectual humility. What do we not know?

The protocol is a synthesis of Carl Sagan's Baloney Detection Kit (1996), Andrej Karpathy's "state of the art first" methodology, Robert Jay Lifton's eight criteria of thought reform (1961), and Karl Popper's falsifiability criterion (1934).

What is genuinely new here is the packaging as a default conversation behavior: a concise playbook that makes "check the state of the art before validating the claim" the first move, not an afterthought.

What this is not

It is not a toolkit, SDK, package, benchmark, evaluator, CI suite, or RAG framework.
It does not ship a scoring engine or automated fact-checking pipeline.
It does not replace actual research, experts, or domain-specific review.
It does not censor wrong ideas. It contextualizes them.
It does not guarantee honesty from a user who wants validation at any cost.

The playbook can coexist with evaluators, retrieval systems, Plan -> Execute -> Verify orchestrators, and diagnostic tools, but it is deliberately not trying to become one.

What is in this repo

baloney-detection-kit/
├── README.md                         You are here
├── PLAYBOOK.md                       Operational playbook: triggers, modes,
│                                     evidence practice, high-stakes handling
├── ROOT_PROMPT.md                    Self-contained drop-in prompt
├── related-work.md                   Positioning vs. system prompts,
│                                     evaluators, RAG, constitutional AI
├── deployment-contexts.md            Adoption patterns for people, agents,
│                                     teams, high-stakes contexts, teaching
├── agentic-plan-execute-verify.md    Guide for downstream Plan -> Execute ->
│                                     Verify orchestration
├── validation/
│   └── closed-loop/                  Reproducible BDK -> robopsychology
│                                     measurement protocol
├── LICENSE                           MIT
│
├── skill/
│   ├── SKILL.md                      Runtime-friendly distribution of the
│   │                                 playbook for skill-based agents
│   ├── prompts/
│   │   └── critical_investigation_mode.txt
│   ├── checklist/
│   │   ├── seven_questions.md        Human-facing self-assessment
│   │   └── review_rubric.md          Manual review rubric
│   └── examples/
│       ├── case_saussure.md          Worked example
│       └── playbook_scenarios.md     Trigger, non-trigger, multi-turn,
│                                     high-stakes and re-framing examples
│
├── essay/
│   └── mini-cultos-ai.md             Full essay (Spanish)
│
└── posts/
    ├── blog-impermanente.md          Blog post version (Spanish)
    ├── linkedin.md                   LinkedIn version (Spanish)
    └── reddit.md                     Reddit post drafts (English)

How to use it

As a playbook

Start with PLAYBOOK.md. It explains when to activate the protocol, when to stay quiet, how to handle weak vs. high-stakes claims, how to resist multi-turn pressure, and how to review whether the response worked.

As a drop-in prompt

Copy the block in ROOT_PROMPT.md into the system prompt or custom-instructions slot of an LLM client. The prompt is the portable form of the playbook.

As agent instructions

If your assistant supports skills, copy the skill/ directory into the relevant skills folder. skill/SKILL.md is not a separate product; it is the same playbook expressed in a runtime-friendly format.

As Plan -> Execute -> Verify orchestration

If you already operate an agentic runtime with planner, executor, verifier, tools, agents, MCP, or audit artifacts, use agentic-plan-execute-verify.md to map the playbook onto that architecture. The orchestration belongs downstream; this repo stays a playbook.

As a human checklist

Open skill/checklist/seven_questions.md and answer the questions honestly the next time you feel the tingle of a sudden discovery. Use skill/checklist/review_rubric.md to review whether an assistant applied the playbook well.

Self-application

The most important test of any framework like this is whether it survives being applied to itself. So:

State of the art. Critical thinking tools have been around for at least 90 years (Popper 1934, Sagan 1996). Research on echo chambers, filter bubbles, and algorithmic radicalization is abundant (Pariser, Tufekci, Zuboff, Donovan). LLM-induced misinformation is documented by OpenAI, Anthropic, and academic researchers. AI sycophancy as a design problem is openly discussed.

Novelty. This kit is re-framing, not invention. The synthesis maps Sagan and Karpathy onto LLM design as a default behavior. The practical contribution is the playbook packaging: short enough to use, explicit enough to resist flattery, and portable across humans and agents.

Falsifiability. The hypothesis "this playbook reduces sycophantic validation of weak novel claims" is testable. A team can compare conversations with and without the playbook, then manually review whether users refine, retract, or contextualize their initial claims. This repo does not include an automated evaluator.

Alternatives. Education alone. Regulation. External fact-checking layers. Search-grounded LLMs that always cite. Model training against sycophancy. Each has merits. This playbook is one option among several, with one specific bet: changing the conversational default is high-leverage.

What I do not know. Whether the protocol scales without becoming annoying. Whether users will keep it on when it challenges them. Whether the protocol introduces its own biases. Whether it works equally well across languages and cultures.

Next step. Use it as a playbook. Break it. Tell me where the guidance fails. Submit issues and pull requests that improve the protocol, examples, or review rubric.

Measurement loop

BDK is a prompt-side intervention. robopsychology is the sibling measurement-side instrument for diagnosing sycophancy, framing sensitivity, presentation shifts, and coherence failures. The closed-loop protocol in validation/closed-loop/ tests whether installing ROOT_PROMPT.md measurably reduces sycophantic validation on the same probe.

Where this fits: the agent-governance ecosystem

BDK is the cheapest, most portable layer in a larger agent-reliability picture. If you build or operate LLM agents, it composes with three other reference projects — one of mine and two official Microsoft open-source releases (both MIT) — each owning a different job. They overlap a little around inspecting behavior, but their primary functions are complementary, not duplicated.

Project	Layer	Job	When
baloney-detection-kit (this repo)	Conversational	Prevent weak/novel claims from being validated — epistemic friction before the model agrees	Design, runtime (as a prompt)
robopsychology (mine)	Conversational / behavioral	Diagnose why a specific output went wrong — model vs. runtime vs. conversation	Pre-deploy eval, observability, post-incident
ASSERT (Microsoft)	Evaluation	Evaluate behavior against written specs — natural-language requirements become reproducible, trace-aware test suites	Pre-deploy eval, regression testing
Agent Governance Toolkit (Microsoft)	Infrastructure	Govern agent actions at runtime — policy enforcement, agent identity, sandboxing, audit	Deployment, runtime

A simple way to read it: BDK and robopsychology work at the conversational/behavioral layer (shape how the agent reasons, then explain how it behaved), while ASSERT and AGT work around the model (evaluate behavior reproducibly, and govern actions deterministically in production).

Where BDK plugs in

BDK is a behavioral policy expressed as text, so it can ride inside the other tools without becoming them. These are conceptual composition patterns, not adapters shipped in this repo:

With robopsychology — BDK is the prevention; robopsychology is the measurement. The shared closed-loop protocol above tests whether BDK actually reduces sycophantic validation.
With the Agent Governance Toolkit — drop ROOT_PROMPT.md into the system prompt of an AGT-governed agent to get defense in depth: BDK adds conversational, human-visible epistemic friction (advisory — a prompt can be ignored or eroded), while AGT adds runtime, sub-second enforcement of what actions are allowed (the agent simply cannot execute a denied action). The two operate at different layers and do not replace each other — prompts shape reasoning, enforcement governs actions.
With ASSERT — BDK is itself a behavioral requirement ("when a user makes a novel/high-stakes claim, the agent must run the 6-step protocol before validating it"). That requirement can be written as an ASSERT spec, so ASSERT generates and scores test cases checking whether the agent applies epistemic friction under the trigger conditions. ASSERT can test the observable behavior; it cannot guarantee internal cognition, and its LLM-judge scores keep a human in the loop.

For the full positioning, see related-work.md; for adoption patterns including the enterprise agent stack, see deployment-contexts.md.

Inspiration

Carl Sagan, The Demon-Haunted World (1996). The original Baloney Detection Kit.
Andrej Karpathy, "A Recipe for Training Neural Networks". The state-of-the-art-first heuristic.
Robert Jay Lifton, Thought Reform and the Psychology of Totalism (1961). Eight criteria of cult dynamics.
Karl Popper, The Logic of Scientific Discovery (1934). Falsifiability.
Shoshana Zuboff, The Age of Surveillance Capitalism (2019). Algorithmic shaping of belief.
Zeynep Tufekci, "YouTube, the Great Radicalizer" (NYT, 2018).
The Verge, "NFT, Metaverse, AI Weirdos" (2025). The article that triggered this project.
Robert Eichenseer, AgenticAI.PlanExecuteValidate. A Plan -> Execute -> Verify reference pattern for orchestrating planners, executors, verifiers, agents, tools, and MCP.

License

MIT. See LICENSE.

Use it, fork it, embed it, improve it.

Contributing

Issues and pull requests welcome. Especially:

Better playbook examples in skill/examples/.
Translations of the prompt, playbook, and checklist.
Reports from real use: when did the playbook fire too often, too rarely, or with the wrong tone?
Improvements to the manual review rubric.

Please do not add package scaffolding, dependencies, CI harnesses, benchmark runners, SDK adapters, or framework integrations here. Documentation-only integration patterns are welcome when they keep code downstream; adapters can live in separate repos if needed. This repo should stay a playbook.

Author: J.R. Cruciani · Madrid · 2026

Related writing: impermanente.es

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

baloney-detection-kit

Why this exists

What this is

What this is not

What is in this repo

How to use it

As a playbook

As a drop-in prompt

As agent instructions

As Plan -> Execute -> Verify orchestration

As a human checklist

Self-application

Measurement loop

Where this fits: the agent-governance ecosystem

Where BDK plugs in

Inspiration

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
essay		essay
posts		posts
skill		skill
validation/closed-loop		validation/closed-loop
.gitignore		.gitignore
LICENSE		LICENSE
PLAYBOOK.md		PLAYBOOK.md
README.md		README.md
ROOT_PROMPT.md		ROOT_PROMPT.md
agentic-plan-execute-verify.md		agentic-plan-execute-verify.md
deployment-contexts.md		deployment-contexts.md
related-work.md		related-work.md
second-opinion-operational.md		second-opinion-operational.md

Folders and files

Latest commit

History

Repository files navigation

baloney-detection-kit

Why this exists

What this is

What this is not

What is in this repo

How to use it

As a playbook

As a drop-in prompt

As agent instructions

As Plan -> Execute -> Verify orchestration

As a human checklist

Self-application

Measurement loop

Where this fits: the agent-governance ecosystem

Where BDK plugs in

Inspiration

License

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages