/discovery

Turn a folder of raw customer evidence into a defensible, evidence-linked Opportunity Solution Tree — locally, with quotes that cannot be faked.

/discovery is a Claude Code skill for product managers. You drop your interview transcripts, sales calls, support tickets, churn notes, NPS verbatims, decks, and spreadsheets into a folder. It reads them, extracts the signal, and produces a structured set of opportunities — each one organized under a strategic outcome, framed as a Job To Be Done, and backed by verbatim customer quotes that are cryptographically locked so they cannot be silently reworded or invented. The result is a single self-contained HTML report you can open in a browser and take into a roadmap review.

It runs on your own machine, over your own files. Nothing is uploaded to a third-party research repository. It is free and open source.

Who this is for
The problem it solves
What one run produces
What makes it different
When not to use it
How it works (the trust model)
Install
Use
Supported input formats
Defaults and configuration
Reading the OST report
Methodology grounding
How to know if it is working for you
Roadmap
Project status and provenance
Contributing
Security
License
Acknowledgements

Who this is for

Product managers, heads of product, and research or RevOps leaders who already gather customer evidence but struggle to turn the pile of transcripts and notes into something a roadmap decision can stand on. If you have ever said "I know we heard this from customers, but I can't quickly show where," this is built for you.

You do not need to be technical to use it. You do need Claude Code and a few minutes to install it once.

The problem it solves

Discovery generates a lot of raw material and very little structure. Transcripts accumulate in a drive. Insights live in someone's head or a stale Notion page. When a roadmap decision gets questioned — by a skeptical exec, a new VP, or your own future self — the evidence is hard to find and harder to trust. Did three customers say this, or did one say it loudly? Is that quote real, or a paraphrase that drifted over five retellings?

/discovery closes that gap. It converts unstructured evidence into a derived, regenerable model of what customers are actually telling you, organized the way product strategy is actually decided: outcomes at the top, opportunities beneath them, every opportunity grounded in quotes you can trust.

What one run produces

You give it a folder containing an inputs/ directory. It produces, in that same folder:

```
inputs/                  your raw sources (read-only to the skill)
interviews/<slug>.md     one structured extraction per source: analyst notes + hash-locked verbatim quotes
themes.md                an evolving taxonomy of themes, with active/stale markers
opportunities/<id>.md    derived opportunities, each with a full JTBD and a journey-stage tag
outcomes/<id>.md         strategic outcomes that cluster the opportunities
ost.html                 a single self-contained Opportunity Solution Tree report
.discovery/              hidden working state (run logs, content hashes), gitignored
```

ost.html is the artifact you actually share. Open it in any browser; it needs no server and makes no network calls.

What makes it different

Most discovery tools are repositories: they store and tag what you put in, so you can search it later. /discovery is a synthesis tool: it derives an opinionated, structured model from your evidence and regenerates it every run as new sources arrive.

| | Research repositories (Dovetail, Gong, Notably, EnjoyHQ) | `/discovery` |
| --- | --- | --- |
| Core job | Store, tag, and retrieve research | Derive a structured opportunity model from evidence |
| Output | A searchable library of tagged clips | An Opportunity Solution Tree you can defend |
| Quote integrity | Trust the tagger | Verbatim quotes sha256-locked; tampering aborts the write |
| Structure | Folders, tags, boards | Outcome to Opportunity to evidence, MECE-checked |
| Where it runs | A SaaS you log into | Your machine, your files |
| Cost and procurement | Per-seat license, a buying decision | Free, open source, adopt it unilaterally |

Four things, specifically, that a PM should care about:

Evidence integrity. Every verbatim quote is wrapped in a hash-locked fence and its sha256 is recorded. If any step — including the LLM — tries to alter the text of a locked quote, the write is aborted and the failure is logged. Quotes cannot be silently paraphrased or hallucinated. This is what makes the artifact defensible.
Grounded metrics, not invented ones. Each opportunity carries a success metric. The synthesizer is required to ground it in a number, threshold, or comparator that appears in an actual quote. When the evidence does not support a number, it writes needs validation rather than fabricating one.
MECE discipline. Opportunities are checked for overlap, gaps, and quality by an independent judge step, looping until the set is mutually exclusive and collectively exhaustive or a cap is reached. You are not handed a redundant or contradictory list.
It works alongside what you already have. Export a transcript from Granola, Gong, Fireflies, or Otter; drop a Dovetail export, a customer-council deck, or an NPS spreadsheet into inputs/. /discovery does not replace your call recorder or your repository — it sits downstream and turns their output into a decision artifact.
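
The bounded synthesizer/judge loop behind the MECE check can be sketched in a few lines. This is an illustrative sketch, not the project's code: `run_synthesis_loop`, `Verdict`, and the stand-in `fake_synthesize` and `fake_judge` callables are hypothetical names, and the real steps are LLM subagents with their own structured contracts. The only detail taken from this README is the default cap of three rounds.

```python
from dataclasses import dataclass, field
from typing import Callable

MECE_ITERATION_CAP = 3  # documented default; the constant name is hypothetical


@dataclass
class Verdict:
    """Hypothetical shape of the judge's structured output."""
    is_mece: bool
    issues: list[str] = field(default_factory=list)


def run_synthesis_loop(
    synthesize: Callable[[list[str]], list[str]],
    judge: Callable[[list[str]], Verdict],
    cap: int = MECE_ITERATION_CAP,
) -> tuple[list[str], Verdict, int]:
    """Alternate synthesize and judge until the set passes or the cap is hit."""
    if cap < 1:
        raise ValueError("cap must be at least 1")
    feedback: list[str] = []
    for round_number in range(1, cap + 1):
        opportunities = synthesize(feedback)
        verdict = judge(opportunities)
        if verdict.is_mece:
            break
        feedback = verdict.issues  # the next round sees what the judge objected to
    return opportunities, verdict, round_number


# Stand-in callables so the sketch runs without an LLM.
def fake_synthesize(feedback: list[str]) -> list[str]:
    if feedback:  # the judge complained, so drop the duplicate
        return ["faster onboarding", "clearer pricing"]
    return ["faster onboarding", "faster onboarding", "clearer pricing"]


def fake_judge(opportunities: list[str]) -> Verdict:
    dupes = sorted({o for o in opportunities if opportunities.count(o) > 1})
    return Verdict(is_mece=not dupes, issues=[f"duplicate: {d}" for d in dupes])


result, verdict, rounds = run_synthesis_loop(fake_synthesize, fake_judge)
print(rounds, verdict.is_mece, result)
```

The control flow itself is deterministic: the loop returns whatever the last round produced along with the final verdict, so a caller can tell a set that passed from one that merely ran out of rounds.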

When not to use it

This is deliberately honest, because adopting the wrong tool wastes more time than not adopting one.

If your job is "find a quote faster while writing a brief," you want a search-first repository. Buy or keep Dovetail.
If your organization does not actually look at qualitative evidence when it decides — if the real blocker is political or decisional, not retrieval — a synthesis tool will not fix that. Address the decision ritual first.
If your product is account-driven or deal-driven rather than opportunity-driven, the Opportunity Solution Tree may be the wrong ontology for you. The OST assumes you are mapping discrete customer needs, not pipeline stages.
If you need multi-user editing, governance, and a shared system of record, this is not that. It is a single-operator tool that regenerates its output each run; it has no concept of "whose tree, edited when, by whom."
Audio and video are not ingested directly. Transcribe first (most call tools export .vtt, .srt, or text), then drop the transcript in.

How it works (the trust model)

The architecture is built so that the parts you must trust are deterministic, and the language model is used only where judgment is genuinely required.

```
inputs/  --> pre-check (Python: scan, validate, hash, diff)
                 |
   per source -> extractor --> critic --> editor --> interviews/<slug>.md
                 (LLM)        (LLM)      (LLM)        analyst notes + sha256-locked quotes
                 |
            theme aggregation (Python: cluster, mark active/stale) --> themes.md
                 |
   synthesis loop:  synthesizer  <-->  triage           --> opportunities/<id>.md
                    (writes)          (judges MECE)          JTBD + journey_stage + grounded metric
                 |
   cluster pass:    synthesizer (cluster_outcomes mode)  --> outcomes/<id>.md
                    + Python coverage validation
                 |
            OST render (Python, no LLM)                  --> ost.html
                 |
            run summary + 20-run retention (Python)
```

The deterministic core (input scanning, content hashing, the hash-lock integrity check, theme thresholds, the synthesis-loop control, the HTML render) is plain Python covered by a unit test suite. It does not improvise.
Five LLM subagents do the judgment work: extract signal from one source, critique an extraction for fidelity, apply fixes, synthesize opportunities, and judge the set for MECE quality. Each has a narrow contract and returns structured output.
The hash-lock is the spine. Verbatim quotes are fenced with HTML comments; their sha256 is computed on every write and compared to the recorded value. A mismatch aborts the write and logs an integrity failure. This is what lets you tell a skeptical stakeholder "this quote is exactly what the customer said."
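
A minimal sketch of the hash-lock idea follows. It assumes a made-up fence syntax (`<!-- quote sha256=... -->` and `<!-- /quote -->`) and hypothetical helper names (`lock_quote`, `verify_document`, `guarded_write`, `IntegrityError`); the project's actual markers, function names, and the place where it records digests are not specified in this README.

```python
import hashlib
import re
from typing import Callable

# Hypothetical fence syntax: the project's real comment markers may differ.
FENCE = re.compile(
    r"<!-- quote sha256=(?P<digest>[0-9a-f]{64}) -->\n"
    r"(?P<text>.*?)\n"
    r"<!-- /quote -->",
    re.DOTALL,
)


class IntegrityError(Exception):
    """Raised when a locked quote no longer matches its recorded digest."""


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lock_quote(text: str) -> str:
    """Wrap a verbatim quote in a fence that records its sha256."""
    return f"<!-- quote sha256={sha256_hex(text)} -->\n{text}\n<!-- /quote -->"


def verify_document(document: str) -> int:
    """Recompute every fenced digest; return how many quotes verified."""
    verified = 0
    for match in FENCE.finditer(document):
        if sha256_hex(match["text"]) != match["digest"]:
            raise IntegrityError(f"quote altered: recorded {match['digest'][:12]}")
        verified += 1
    return verified


def guarded_write(document: str, write: Callable[[str], None]) -> None:
    """Verify first, so a mismatch aborts before anything is written."""
    verify_document(document)
    write(document)


quote = "We churned because exports took a whole day."
document = f"Analyst note: export latency drove churn.\n\n{lock_quote(quote)}\n"
tampered = document.replace("a whole day", "too long")

written: list[str] = []
guarded_write(document, written.append)      # passes the check and is written
try:
    guarded_write(tampered, written.append)  # digest mismatch, write is skipped
except IntegrityError as exc:
    print(f"integrity failure: {exc}")
```

Text outside the fence (the analyst note here) can change freely; only the fenced span is locked. This sketch keeps the digest inside the fence for brevity, which would not catch someone rewriting the quote and the digest together, so a real implementation would compare against a digest recorded elsewhere.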

If a single source fails to process, the run skips it, logs why, and continues with the rest — you get a result built on the evidence that succeeded, with the skipped source flagged in the run summary.
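
The skip-and-continue behaviour can be illustrated with a small loop that isolates each source's failure. This is a sketch under assumptions: `RunSummary`, `process_all`, and `fake_process` are hypothetical names, and the real per-source step is the extractor, critic, and editor chain rather than a single function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunSummary:
    processed: list[str] = field(default_factory=list)
    skipped: dict[str, str] = field(default_factory=dict)  # source -> reason


def process_all(sources: list[str], process_one: Callable[[str], None]) -> RunSummary:
    """Process every source; a failure skips that source only."""
    summary = RunSummary()
    for source in sources:
        try:
            process_one(source)
        except Exception as exc:  # contain the failure to this one source
            summary.skipped[source] = f"{type(exc).__name__}: {exc}"
        else:
            summary.processed.append(source)
    return summary


def fake_process(source: str) -> None:
    # Stand-in for the real pipeline; one file is deliberately broken.
    if source == "churn-notes.docx":
        raise ValueError("file is empty")


summary = process_all(
    ["acme-interview.txt", "churn-notes.docx", "nps-export.xlsx"], fake_process
)
print(summary.processed)
print(summary.skipped)
```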

Install

Prerequisites:

Claude Code installed and working.
Python 3.11 or newer.
git.

Steps:

```
# 1. Clone
git clone https://github.com/klausners/discovery-agents-handoff.git
cd discovery-agents-handoff

# 2. Create a virtual environment and install the deterministic library
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"

# 3. Verify the engine
.venv/bin/pytest -q          # expect: 81 passed, 1 skipped

# 4. Install the skill so Claude Code can find it
ln -s "$(pwd)" ~/.claude/skills/discovery
```

No path editing required. SKILL.md invokes the deterministic helpers as ~/.claude/skills/discovery/.venv/bin/python, which resolves through the symlink from step 4 to the virtual environment from step 2 — on any machine, for any user. If you install the skill under a different name or location, update the four Run: command paths in SKILL.md to match.

Confirm the skill is registered by opening Claude Code and typing / — discovery should appear in the skill list.

Use

Make a working folder for the product area you are doing discovery on, and put your sources in an inputs/ subfolder:

```
my-product-discovery/
  inputs/
    acme-interview.txt
    q3-sales-calls.vtt
    churn-notes.docx
    nps-export.xlsx
    customer-council.pptx
```

In Claude Code, change into that folder and run the skill:
```
cd my-product-discovery
/discovery
```
The skill runs the pipeline and writes interviews/, themes.md, opportunities/, outcomes/, and ost.html into the folder. Open ost.html in your browser.
Re-run any time you add sources. The pre-check hashes every input and only re-processes what is new or changed; opportunities and the report are regenerated from the full evidence set each run.
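
The incremental pre-check amounts to comparing content hashes between runs. The sketch below shows one way to do that; `hash_inputs` and `diff_inputs` are hypothetical names, and how the project actually stores the previous run's hashes under `.discovery/` is not described here.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_inputs(inputs_dir: Path) -> dict[str, str]:
    """Map each file's relative path to the sha256 of its bytes."""
    return {
        path.relative_to(inputs_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(inputs_dir.rglob("*"))
        if path.is_file()
    }


def diff_inputs(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify sources by comparing this run's hashes to the last run's."""
    return {
        "new": sorted(name for name in current if name not in previous),
        "changed": sorted(
            name for name in current
            if name in previous and previous[name] != current[name]
        ),
        "removed": sorted(name for name in previous if name not in current),
    }


# Simulate two runs over a throwaway inputs/ folder.
with tempfile.TemporaryDirectory() as tmp:
    inputs = Path(tmp)
    (inputs / "acme-interview.txt").write_text("first draft", encoding="utf-8")
    before = hash_inputs(inputs)

    (inputs / "acme-interview.txt").write_text("corrected transcript", encoding="utf-8")
    (inputs / "q3-sales-calls.vtt").write_text("WEBVTT", encoding="utf-8")
    after = hash_inputs(inputs)

delta = diff_inputs(before, after)
print(delta)
```

Only the `new` and `changed` entries would need to go back through per-source extraction; everything downstream is rebuilt from the full set, as described above.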

A set of 15 synthetic sample sources lives in test-run/inputs/ if you want to see a full run before pointing it at your own data.

Supported input formats

Eight formats, grouped by how they are read:

| Kind | Extensions | Notes |
| --- | --- | --- |
| Prose | `.txt`, `.md`, `.pdf`, `.docx` | Documents and plain transcripts |
| Transcript | `.vtt`, `.srt` | Timed captions; timestamps are stripped from quotes |
| Deck | `.pptx` | Each slide is treated as a discrete piece of evidence |
| Tabular | `.xlsx` | Free-text columns (NPS comments, churn reasons) are the quote source |

Unsupported files are skipped and listed in the run summary rather than crashing the run. Audio and video must be transcribed first.
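
As a rough illustration of the skip-rather-than-crash behaviour, the table above can be expressed as a lookup from extension to kind. `KIND_BY_EXTENSION` and `classify` are hypothetical names, and the case-insensitive extension match is an assumption rather than documented behaviour.

```python
from pathlib import Path

# The eight documented extensions, grouped by how they are read.
KIND_BY_EXTENSION = {
    ".txt": "prose", ".md": "prose", ".pdf": "prose", ".docx": "prose",
    ".vtt": "transcript", ".srt": "transcript",
    ".pptx": "deck",
    ".xlsx": "tabular",
}


def classify(names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split file names into supported (name -> kind) and skipped."""
    supported: dict[str, str] = {}
    skipped: list[str] = []
    for name in names:
        # Assumption: extensions are matched case-insensitively.
        kind = KIND_BY_EXTENSION.get(Path(name).suffix.lower())
        if kind is None:
            skipped.append(name)  # reported in the summary, never fatal
        else:
            supported[name] = kind
    return supported, skipped


supported, skipped = classify(["acme-interview.txt", "Q3-Calls.VTT", "demo.mp4"])
print(supported)
print(skipped)
```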

Defaults and configuration

The opinionated defaults are deliberate and conservative. They live in the deterministic library.

| Default | Value | Meaning |
| --- | --- | --- |
| Significance threshold | 2 sources | A theme is "significant" once two distinct sources support it |
| Stale-theme threshold | 90 days | A theme with no new evidence in 90 days is marked stale (never deleted) |
| MECE iteration cap | 3 | The synthesizer/triage loop runs at most three rounds |
| Run-log retention | 20 runs | Older run logs in `.discovery/` are pruned |
| Journey stages | 8 | awareness, acquisition, activation, engagement, retention, expansion, referral, churn-prevention |
| Language | English (v1) | Multi-language is not yet supported |
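
To make the two theme thresholds concrete, here is a small sketch of how they might be applied. `Evidence`, `theme_status`, and the constant names are hypothetical, and two details are assumptions: that each piece of evidence carries a date, and that a theme turns stale only once more than 90 days have passed since its latest evidence.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SIGNIFICANCE_THRESHOLD = 2         # distinct sources (documented default)
STALE_AFTER = timedelta(days=90)   # documented default


@dataclass(frozen=True)
class Evidence:
    source: str    # file the evidence came from
    seen_on: date  # assumption: each piece of evidence is dated


def theme_status(evidence: list[Evidence], today: date) -> dict[str, bool]:
    """Apply the significance and staleness thresholds to one theme."""
    if not evidence:
        raise ValueError("a theme needs at least one piece of evidence")
    distinct_sources = {item.source for item in evidence}
    latest = max(item.seen_on for item in evidence)
    return {
        "significant": len(distinct_sources) >= SIGNIFICANCE_THRESHOLD,
        # Assumption: exactly 90 days old still counts as active.
        "stale": today - latest > STALE_AFTER,
    }


today = date(2025, 6, 1)

# Two quotes from the same file count as one source.
loud_single = theme_status(
    [Evidence("acme-interview.txt", date(2025, 1, 10)),
     Evidence("acme-interview.txt", date(2025, 2, 1))],
    today,
)

corroborated = theme_status(
    [Evidence("acme-interview.txt", date(2025, 1, 10)),
     Evidence("q3-sales-calls.vtt", date(2025, 5, 20))],
    today,
)
print(loud_single, corroborated)
```

Counting distinct sources rather than quotes is what separates "three customers said this" from "one customer said it loudly."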

Reading the OST report

ost.html is structured top-down, the way an Opportunity Solution Tree is meant to be read:

A collapsible run-stats strip: sources processed, sources skipped, opportunities, strategic outcomes.
A coverage bar segmented by journey stage; click a segment to filter the tree to that stage.
Filter chips: filter by journey stage, theme, minimum evidence (number of sources), dissenting evidence, or quality flags.
The tree itself: strategic outcomes as headers, with their opportunities branched beneath.
Each opportunity card expands to reveal its full Job To Be Done (When, I want to, So I can, and a success metric), the top verbatim quotes in the customer's own words, and a separate section for any dissenting evidence — because the honest signal includes the customer who disagreed.

Every quote links back to the interview file it came from, so the chain from a roadmap claim to a customer's exact words is one click long.

Methodology grounding

/discovery is opinionated because the methods it implements are opinionated.

Continuous Discovery and the Opportunity Solution Tree, from Teresa Torres ("Continuous Discovery Habits"). Outcomes sit above opportunities; opportunities are distinct customer needs, not solutions.
Jobs To Be Done, in the measurable framing associated with Tony Ulwick and Strategyn. Each opportunity is expressed as a situation, a motivation, a desired outcome, and a success metric.
MECE (Mutually Exclusive, Collectively Exhaustive), the standard structuring discipline, applied to the opportunity set by an independent judge step.
A lifecycle tag taxonomy (awareness through churn-prevention) so opportunities can be filtered by where in the customer journey they live.

The tool does not invent a method; it makes an existing, well-regarded method cheap to apply to a messy pile of evidence.

How to know if it is working for you

Borrowed from the project's own design review, and worth stating plainly: the test of this tool is not whether it produces a pretty report. It is whether someone opens ost.html during a real opportunity-review or roadmap conversation and it changes what gets decided. If, thirty days after your first serious run, no one has opened the artifact during an actual decision, the problem is not the tool's features — the problem is upstream, in how your team decides, and more software will not fix it.

Roadmap

Deliberately deferred, listed so contributors know the edges:

Audio and video ingestion with transcription.
Drift detection: surface which opportunities are intensifying, which "solved" problems are resurfacing in new vocabulary, and which segments are diverging from the tree across runs. This is the longitudinal version of the tool and the most interesting unbuilt piece.
Cross-run trend view inside ost.html (today the report shows only the current run).
A governance model for the tree (who edits it, when, with what review).
Multi-language support.

Project status and provenance

This project has an unusual and deliberate origin. Before any code was written, the thesis was stress-tested by a panel of advisors whose verdict was "don't build yet": the user, the decision, and the artifact were not yet pinned down, and the risk was building the wrong thing. The honest competitive and adoption risks from that critique are summarized in the "When not to use it" section. The decision to proceed anyway, with a tightly scoped first version, is reflected in the design record:

docs/superpowers/specs/ — the design specifications for v1 and v2.
docs/superpowers/plans/ — the task-by-task implementation plans.

Reading them is the fastest way to understand both what this tool is and what it is consciously not. The test suite currently stands at 81 passing tests and 1 skipped.

Contributing

Contributions are welcome. See CONTRIBUTING.md for the development setup, the test discipline, and how the deterministic library and the LLM subagent prompts are kept in sync. In short: the deterministic core is test-driven, the agent prompts are contract-checked against the JSON schemas, and changes ship as small, reviewed units.

Security

/discovery runs language-model subagents over your customer data on your machine and writes files into your working folder. The integrity model and how to report a vulnerability are documented in SECURITY.md.

License

MIT. Use it, fork it, embed it, sell services around it. No attribution required beyond the license text, though a link back is appreciated.

Acknowledgements

Teresa Torres, for the Opportunity Solution Tree and the continuous-discovery framing this tool implements.
Tony Ulwick and Strategyn, for the measurable Jobs To Be Done structure.
Built as a skill for Anthropic's Claude Code.
