Turn a folder of raw customer evidence into a defensible, evidence-linked Opportunity Solution Tree — locally, with quotes that cannot be faked.
/discovery is a Claude Code skill for product managers. You drop your interview transcripts, sales calls, support tickets, churn notes, NPS verbatims, decks, and spreadsheets into a folder. It reads them, extracts the signal, and produces a structured set of opportunities — each one organized under a strategic outcome, framed as a Job To Be Done, and backed by verbatim customer quotes that are cryptographically locked so they cannot be silently reworded or invented. The result is a single self-contained HTML report you can open in a browser and take into a roadmap review.
It runs on your own machine, over your own files. Nothing is uploaded to a third-party research repository. It is free and open source.
- Who this is for
- The problem it solves
- What one run produces
- What makes it different
- When not to use it
- How it works (the trust model)
- Install
- Use
- Supported input formats
- Defaults and configuration
- Reading the OST report
- Methodology grounding
- How to know if it is working for you
- Roadmap
- Project status and provenance
- Contributing
- Security
- License
- Acknowledgements
Product managers, heads of product, and research or RevOps leaders who already gather customer evidence but struggle to turn the pile of transcripts and notes into something a roadmap decision can stand on. If you have ever said "I know we heard this from customers, but I can't quickly show where," this is built for you.
You do not need to be technical to use it. You do need Claude Code and a few minutes to install it once.
Discovery generates a lot of raw material and very little structure. Transcripts accumulate in a drive. Insights live in someone's head or a stale Notion page. When a roadmap decision gets questioned — by a skeptical exec, a new VP, or your own future self — the evidence is hard to find and harder to trust. Did three customers say this, or did one say it loudly? Is that quote real, or a paraphrase that drifted over five retellings?
/discovery closes that gap. It converts unstructured evidence into a derived, regenerable model of what customers are actually telling you, organized the way product strategy is actually decided: outcomes at the top, opportunities beneath them, every opportunity grounded in quotes you can trust.
You give it a folder containing an inputs/ directory. It produces, in that same folder:
    inputs/                  your raw sources (read-only to the skill)
    interviews/<slug>.md     one structured extraction per source: analyst notes + hash-locked verbatim quotes
    themes.md                an evolving taxonomy of themes, with active/stale markers
    opportunities/<id>.md    derived opportunities, each with a full JTBD and a journey-stage tag
    outcomes/<id>.md         strategic outcomes that cluster the opportunities
    ost.html                 a single self-contained Opportunity Solution Tree report
    .discovery/              hidden working state (run logs, content hashes); gitignored
ost.html is the artifact you actually share. Open it in any browser; it needs no server and makes no network calls.
Most discovery tools are repositories: they store and tag what you put in, so you can search it later. /discovery is a synthesis tool: it derives an opinionated, structured model from your evidence and regenerates it every run as new sources arrive.
| | Research repositories (Dovetail, Gong, Notably, EnjoyHQ) | /discovery |
|---|---|---|
| Core job | Store, tag, and retrieve research | Derive a structured opportunity model from evidence |
| Output | A searchable library of tagged clips | An Opportunity Solution Tree you can defend |
| Quote integrity | Trust the tagger | Verbatim quotes sha256-locked; tampering aborts the write |
| Structure | Folders, tags, boards | Outcome to Opportunity to evidence, MECE-checked |
| Where it runs | A SaaS you log into | Your machine, your files |
| Cost and procurement | Per-seat license, a buying decision | Free, open source, adopt it unilaterally |
Four things, specifically, that a PM should care about:
- Evidence integrity. Every verbatim quote is wrapped in a hash-locked fence and its sha256 is recorded. If any step, including the LLM, tries to alter the text of a locked quote, the write is aborted and the failure is logged. Quotes cannot be silently paraphrased or hallucinated. This is what makes the artifact defensible.
- Grounded metrics, not invented ones. Each opportunity carries a success metric. The synthesizer is required to ground it in a number, threshold, or comparator that appears in an actual quote. When the evidence does not support a number, it writes `needs validation` rather than fabricating one.
- MECE discipline. Opportunities are checked for overlap, gaps, and quality by an independent judge step, looping until the set is mutually exclusive and collectively exhaustive or a cap is reached. You are not handed a redundant or contradictory list.
- It works alongside what you already have. Export a transcript from Granola, Gong, Fireflies, or Otter; drop a Dovetail export, a customer-council deck, or an NPS spreadsheet into `inputs/`. `/discovery` does not replace your call recorder or your repository; it sits downstream and turns their output into a decision artifact.
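The grounded-metric rule can be illustrated with a minimal sketch. It assumes the simplest possible check: every number in a proposed metric must appear verbatim in at least one quote. The function name `ground_metric` and the regular expression are hypothetical, and the real synthesizer also accepts thresholds and comparators, which this sketch does not model.

```python
import re

NEEDS_VALIDATION = "needs validation"

# Matches integers, decimals, and percentages such as "3", "2.5", or "40%".
_NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def ground_metric(proposed_metric: str, quotes: list[str]) -> str:
    """Keep the metric only if every number in it appears in some quote.

    Hypothetical helper: a metric with no number at all, or with a number
    that no quote contains, falls back to the "needs validation" marker.
    """
    metric_numbers = _NUMBER.findall(proposed_metric)
    quoted_numbers = {n for quote in quotes for n in _NUMBER.findall(quote)}
    if metric_numbers and all(n in quoted_numbers for n in metric_numbers):
        return proposed_metric
    return NEEDS_VALIDATION
```

With the single quote "It takes us 3 days to close the books every month.", the metric "Close the books in under 3 days" would be kept, while "Reduce close time by 40%" would be replaced by `needs validation`.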
This is deliberately honest, because adopting the wrong tool wastes more time than not adopting one.
- If your job is "find a quote faster while writing a brief," you want a search-first repository. Buy or keep Dovetail.
- If your organization does not actually look at qualitative evidence when it decides — if the real blocker is political or decisional, not retrieval — a synthesis tool will not fix that. Address the decision ritual first.
- If your product is account-driven or deal-driven rather than opportunity-driven, the Opportunity Solution Tree may be the wrong ontology for you. The OST assumes you are mapping discrete customer needs, not pipeline stages.
- If you need multi-user editing, governance, and a shared system of record, this is not that. It is a single-operator tool that regenerates its output each run; it has no concept of "whose tree, edited when, by whom."
- Audio and video are not ingested directly. Transcribe first (most call tools export `.vtt`, `.srt`, or text), then drop the transcript in.
The architecture is built so that the parts you must trust are deterministic, and the language model is used only where judgment is genuinely required.
    inputs/ --> pre-check (Python: scan, validate, hash, diff)
       |
    per source -> extractor --> critic --> editor --> interviews/<slug>.md
                    (LLM)       (LLM)      (LLM)      analyst notes + sha256-locked quotes
       |
    theme aggregation (Python: cluster, mark active/stale) --> themes.md
       |
    synthesis loop: synthesizer <--> triage --> opportunities/<id>.md
                     (writes)   (judges MECE)   JTBD + journey_stage + grounded metric
       |
    cluster pass: synthesizer (cluster_outcomes mode) --> outcomes/<id>.md
                  + Python coverage validation
       |
    OST render (Python, no LLM) --> ost.html
       |
    run summary + 20-run retention (Python)
- The deterministic core (input scanning, content hashing, the hash-lock integrity check, theme thresholds, the synthesis-loop control, the HTML render) is plain Python covered by a unit-test suite. It does not improvise.
- Five LLM subagents do the judgment work: extract signal from one source, critique an extraction for fidelity, apply fixes, synthesize opportunities, and judge the set for MECE quality. Each has a narrow contract and returns structured output.
- The hash-lock is the spine. Verbatim quotes are fenced with HTML comments; their sha256 is computed on every write and compared to the recorded value. A mismatch aborts the write and logs an integrity failure. This is what lets you tell a skeptical stakeholder "this quote is exactly what the customer said."
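
The hash-lock check can be pictured with a small sketch. The fence syntax (`<!-- quote:start sha256=... -->`), the function names, and the `IntegrityError` class are assumptions made for illustration; the project's actual marker format and API may differ.

```python
import hashlib
import re

# Hypothetical fence syntax: an HTML comment carrying the recorded digest,
# the verbatim quote, then a closing HTML comment.
FENCE = re.compile(
    r"<!-- quote:start sha256=([0-9a-f]{64}) -->\n(.*?)\n<!-- quote:end -->",
    re.DOTALL,
)


class IntegrityError(Exception):
    """Raised when a locked quote no longer matches its recorded hash."""


def lock_quote(text: str) -> str:
    """Wrap a verbatim quote in a fence that records its sha256."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"<!-- quote:start sha256={digest} -->\n{text}\n<!-- quote:end -->"


def verify_locked_quotes(document: str) -> int:
    """Recompute every fenced quote's hash; raise on the first mismatch.

    Returns the number of quotes verified, so a caller can proceed with
    the write only when this function returns normally.
    """
    verified = 0
    for recorded, body in FENCE.findall(document):
        actual = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if actual != recorded:
            raise IntegrityError(f"locked quote altered: expected {recorded[:12]}")
        verified += 1
    return verified
```

In this sketch a writer would call `verify_locked_quotes` on the candidate file content before touching disk; an `IntegrityError` is the point at which the write would be aborted and logged.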
If a single source fails to process, the run skips it, logs why, and continues with the rest — you get a result built on the evidence that succeeded, with the skipped source flagged in the run summary.
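
The synthesis-loop control in the deterministic core can be sketched as a bounded loop around two LLM calls. The callables `synthesize` and `triage` are hypothetical stand-ins for the synthesizer and triage subagents, and the return shape is an assumption; only the cap of three rounds comes from the documented defaults.

```python
from typing import Callable

MAX_ROUNDS = 3  # the documented MECE iteration cap


def run_synthesis_loop(
    synthesize: Callable[[list[str]], list[str]],
    triage: Callable[[list[str]], list[str]],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[list[str], int, bool]:
    """Alternate synthesis and triage until the set passes or the cap is hit.

    synthesize(feedback) returns a list of opportunities.
    triage(opportunities) returns a list of issues; an empty list means
    the set was judged MECE.
    Returns (opportunities, rounds_used, converged).
    """
    feedback: list[str] = []
    opportunities: list[str] = []
    for round_number in range(1, max_rounds + 1):
        opportunities = synthesize(feedback)
        feedback = triage(opportunities)
        if not feedback:
            return opportunities, round_number, True
    return opportunities, max_rounds, False
```

Keeping the loop itself in plain Python means the cap is enforced regardless of what the subagents return; the language model supplies judgments, not control flow.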
Prerequisites:
- Claude Code installed and working.
- Python 3.11 or newer.
- `git`.
Steps:
    # 1. Clone
    git clone https://github.com/klausners/discovery-agents-handoff.git
    cd discovery-agents-handoff

    # 2. Create a virtual environment and install the deterministic library
    python3 -m venv .venv
    .venv/bin/pip install -e ".[dev]"

    # 3. Verify the engine
    .venv/bin/pytest -q   # expect: 81 passed, 1 skipped

    # 4. Install the skill so Claude Code can find it
    ln -s "$(pwd)" ~/.claude/skills/discovery

No path editing required. SKILL.md invokes the deterministic helpers as ~/.claude/skills/discovery/.venv/bin/python, which resolves through the symlink from step 4 to the virtual environment from step 2, on any machine and for any user. If you install the skill under a different name or location, update the four Run: command paths in SKILL.md to match.
Confirm the skill is registered by opening Claude Code and typing / — discovery should appear in the skill list.
- Make a working folder for the product area you are doing discovery on, and put your sources in an `inputs/` subfolder:

      my-product-discovery/
        inputs/
          acme-interview.txt
          q3-sales-calls.vtt
          churn-notes.docx
          nps-export.xlsx
          customer-council.pptx

- In Claude Code, change into that folder and run the skill:

      cd my-product-discovery
      /discovery

- The skill runs the pipeline and writes `interviews/`, `themes.md`, `opportunities/`, `outcomes/`, and `ost.html` into the folder. Open `ost.html` in your browser.
- Re-run any time you add sources. The pre-check hashes every input and only re-processes what is new or changed; opportunities and the report are regenerated from the full evidence set each run.
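
The incremental pre-check described in the last step can be sketched as a content-hash diff. This is an illustration under assumptions: the function names are hypothetical, the previous run's hashes are passed in as a plain dictionary, and the real pre-check also validates files, which is omitted here.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the sha256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_inputs(
    inputs_dir: Path, previous: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Hash every input and list the files that are new or changed.

    previous maps file name to the hash recorded on the last run
    (hypothetical state format). Returns the current hash map and the
    names that need re-processing.
    """
    current = {
        p.name: hash_file(p) for p in sorted(inputs_dir.iterdir()) if p.is_file()
    }
    to_process = [
        name for name, digest in current.items() if previous.get(name) != digest
    ]
    return current, to_process
```

On a first run `previous` is empty, so every file is processed; on later runs only files whose bytes changed, or that did not exist before, are returned.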
A set of 15 synthetic sample sources lives in test-run/inputs/ if you want to see a full run before pointing it at your own data.
Eight formats, grouped by how they are read:
| Kind | Extensions | Notes |
|---|---|---|
| Prose | `.txt`, `.md`, `.pdf`, `.docx` | Documents and plain transcripts |
| Transcript | `.vtt`, `.srt` | Timed captions; timestamps are stripped from quotes |
| Deck | `.pptx` | Each slide is treated as a discrete piece of evidence |
| Tabular | `.xlsx` | Free-text columns (NPS comments, churn reasons) are the quote source |
Unsupported files are skipped and listed in the run summary rather than crashing the run. Audio and video must be transcribed first.
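
As a rough sketch of how a scan step might sort files into these kinds and set aside the rest, assuming classification by file extension alone: the names `KIND_BY_EXTENSION` and `scan_inputs` are hypothetical, and the case-insensitive match is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extension-to-kind mapping taken from the table of supported formats.
KIND_BY_EXTENSION = {
    ".txt": "prose",
    ".md": "prose",
    ".pdf": "prose",
    ".docx": "prose",
    ".vtt": "transcript",
    ".srt": "transcript",
    ".pptx": "deck",
    ".xlsx": "tabular",
}


def scan_inputs(filenames: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split file names into supported (name -> kind) and skipped.

    Skipped names are collected rather than raised, so one unsupported
    file never stops the run.
    """
    supported: dict[str, str] = {}
    skipped: list[str] = []
    for name in filenames:
        kind = KIND_BY_EXTENSION.get(Path(name).suffix.lower())
        if kind is None:
            skipped.append(name)
        else:
            supported[name] = kind
    return supported, skipped
```

The skipped list is what a run summary would report back to the user.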
The opinionated defaults are deliberate and conservative. They live in the deterministic library.
| Default | Value | Meaning |
|---|---|---|
| Significance threshold | 2 sources | A theme is "significant" once two distinct sources support it |
| Stale-theme threshold | 90 days | A theme with no new evidence in 90 days is marked stale (never deleted) |
| MECE iteration cap | 3 | The synthesizer/triage loop runs at most three rounds |
| Run-log retention | 20 runs | Older run logs in .discovery/ are pruned |
| Journey stages | 8 | awareness, acquisition, activation, engagement, retention, expansion, referral, churn-prevention |
| Language | English (v1) | Multi-language is not yet supported |
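
To make the two theme thresholds concrete, here is a minimal sketch of how a theme might be classified. The function name, the input shape (one evidence date per distinct source), and the choice to treat exactly 90 days as not yet stale are assumptions; only the values 2 and 90 come from the table above.

```python
from datetime import date, timedelta

SIGNIFICANCE_THRESHOLD = 2         # distinct sources
STALE_AFTER = timedelta(days=90)   # no new evidence for longer than this


def theme_status(source_dates: dict[str, date], today: date) -> dict[str, bool]:
    """Classify one theme from the dates of its supporting evidence.

    source_dates maps each distinct source to the date of its most
    recent evidence for the theme (hypothetical input shape).
    """
    significant = len(source_dates) >= SIGNIFICANCE_THRESHOLD
    latest = max(source_dates.values(), default=None)
    # A theme with no evidence at all is treated as stale in this sketch.
    stale = latest is None or (today - latest) > STALE_AFTER
    return {"significant": significant, "stale": stale}
```

A stale theme is only flagged in this model, never removed, which matches the "never deleted" rule in the defaults.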
ost.html is structured top-down, the way an Opportunity Solution Tree is meant to be read:
- A collapsible run-stats strip: sources processed, sources skipped, opportunities, strategic outcomes.
- A coverage bar segmented by journey stage; click a segment to filter the tree to that stage.
- Filter chips: filter by journey stage, theme, minimum evidence (number of sources), dissenting evidence, or quality flags.
- The tree itself: strategic outcomes as headers, with their opportunities branched beneath.
- Each opportunity card expands to reveal its full Job To Be Done (When, I want to, So I can, and a success metric), the top verbatim quotes in the customer's own words, and a separate section for any dissenting evidence — because the honest signal includes the customer who disagreed.
Every quote links back to the interview file it came from, so the chain from a roadmap claim to a customer's exact words is one click long.
/discovery is opinionated because the methods it implements are opinionated.
- Continuous Discovery and the Opportunity Solution Tree, from Teresa Torres ("Continuous Discovery Habits"). Outcomes sit above opportunities; opportunities are distinct customer needs, not solutions.
- Jobs To Be Done, in the measurable framing associated with Tony Ulwick and Strategyn. Each opportunity is expressed as a situation, a motivation, a desired outcome, and a success metric.
- MECE (Mutually Exclusive, Collectively Exhaustive), the standard structuring discipline, applied to the opportunity set by an independent judge step.
- A lifecycle tag taxonomy (awareness through churn-prevention) so opportunities can be filtered by where in the customer journey they live.
The tool does not invent a method; it makes an existing, well-regarded method cheap to apply to a messy pile of evidence.
Borrowed from the project's own design review, and worth stating plainly: the test of this tool is not whether it produces a pretty report. It is whether someone opens ost.html during a real opportunity-review or roadmap conversation and it changes what gets decided. If, thirty days after your first serious run, no one has opened the artifact during an actual decision, the problem is not the tool's features — the problem is upstream, in how your team decides, and more software will not fix it.
Deliberately deferred, listed so contributors know the edges:
- Audio and video ingestion with transcription.
- Drift detection: surface which opportunities are intensifying, which "solved" problems are resurfacing in new vocabulary, and which segments are diverging from the tree across runs. This is the longitudinal version of the tool and the most interesting unbuilt piece.
- Cross-run trend view inside `ost.html` (today the report shows only the current run).
- A governance model for the tree (who edits it, when, with what review).
- Multi-language support.
This project has an unusual and deliberate origin. Before any code was written, the thesis was stress-tested by a panel of advisors whose verdict was "don't build yet" — the user, the decision, and the artifact were not yet pinned down, and the risk was building the wrong thing. The honest competitive and adoption risks from that critique are summarized in When not to use it. The decision to proceed anyway, with a tightly scoped first version, is reflected in the design record:
- `docs/superpowers/specs/`: the design specifications for v1 and v2.
- `docs/superpowers/plans/`: the task-by-task implementation plans.
Reading them is the fastest way to understand both what this tool is and what it is consciously not. The test suite stands at 81 passing.
Contributions are welcome. See CONTRIBUTING.md for the development setup, the test discipline, and how the deterministic library and the LLM subagent prompts are kept in sync. In short: the deterministic core is test-driven, the agent prompts are contract-checked against the JSON schemas, and changes ship as small, reviewed units.
/discovery runs language-model subagents over your customer data on your machine and writes files into your working folder. The integrity model and how to report a vulnerability are documented in SECURITY.md.
MIT. Use it, fork it, embed it, sell services around it. No attribution required beyond the license text, though a link back is appreciated.
- Teresa Torres, for the Opportunity Solution Tree and the continuous-discovery framing this tool implements.
- Tony Ulwick and Strategyn, for the measurable Jobs To Be Done structure.
- Built as a skill for Anthropic's Claude Code.