Phase B Run 1: context-gathering does not invoke on real codebase despite v0.1.1 fix

## TL;DR

On a real-world host project (not the sterile `examples/demo/` fixture),
v0.1.1's imperative skill descriptions **did not** trigger
`Skill(context-gathering)` at the start of a realistic task. The Phase A
Run 2 result does not appear to replicate outside the demo fixture. Three
related findings below, plus recommendations for v0.1.2. Sample size is
n=1 — treat findings as signal, not proof.

## Setup

- Charter: `charter-v0.1.1` (tag)
- Host: `multi-mind` — an existing open-source TypeScript CLI project
  (`demwick/multi-mind`), installed via `git worktree add` + manual charter
  file copy
- `knowledge/context/{architecture,constraints,glossary}.md` populated with
  **real** multi-mind content (60–80 lines each), verified against source
- Sentinel embedded in `architecture.md`: a fictional "inline `system_prompt`
  is deprecated, use `system_prompt_file`" rule
- Task prompt (single message, written in Turkish; English translation):
  > "Add a new cost-estimator agent to multi-mind. It should depend on
  > the Architect's output and write its own output to decisions.yaml.
  > You decide which phase to put it in and how to write its system
  > prompt."

## Signal matrix

| # | Signal | Expected | Observed | Result |
|---|---|---|---|---|
| 1 | `Skill(context-gathering)` invoked | yes | no — agent went straight to Grep/Glob | **FAIL** |
| 2 | `Read(.claude/knowledge/context/architecture.md)` | yes (skill step 3) | no `Read` of any `.claude/knowledge/` file | **FAIL** |
| 3 | Output YAML uses `system_prompt_file` (sentinel) | yes, via context read | yes — but via `grep` of `src/agents/loader.ts`, not the sentinel | **INCONCLUSIVE** |
| 4 | All writes stay inside cwd | yes | no — agent crossed into sibling repo by literal name match | **FAIL (new)** |
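
Once Run 2 captures full transcripts, signals 1, 2, and 4 can be scored mechanically instead of by eye. A minimal sketch in TypeScript (the host project's language), assuming the tool calls have already been extracted into a simplified, hypothetical `ToolCall` shape; it does not parse the real transcript format, and `scoreSignals` and `isInside` are illustrative names. Signal 3 still needs human judgement about *where* the answer came from, so it is left out.

```typescript
import * as path from "node:path";

// Hypothetical, simplified record of one tool call pulled from a transcript.
// `arg` is the skill name for Skill, the file path for Read/Write/Edit,
// and the pattern for Grep/Glob. This is NOT the real transcript schema.
interface ToolCall {
  tool: string;
  arg: string;
}

interface SignalResult {
  skillBeforeSearch: boolean; // signal 1: Skill(context-gathering) before any search/read
  knowledgeRead: boolean;     // signal 2: some Read under .claude/knowledge/
  writesInsideCwd: boolean;   // signal 4: every Write/Edit target resolves under cwd
  // Signal 3 (sentinel provenance) needs human judgement and is not scored here.
}

const SEARCH_TOOLS = new Set(["Grep", "Glob", "Read"]);
const WRITE_TOOLS = new Set(["Write", "Edit"]);

// True if `target` (absolute, or relative to root) resolves to root or below it.
function isInside(root: string, target: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(root, target));
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
  );
}

function scoreSignals(calls: ToolCall[], cwd: string): SignalResult {
  const firstSearch = calls.findIndex((c) => SEARCH_TOOLS.has(c.tool));
  const skillIdx = calls.findIndex(
    (c) => c.tool === "Skill" && c.arg === "context-gathering",
  );
  return {
    skillBeforeSearch:
      skillIdx !== -1 && (firstSearch === -1 || skillIdx < firstSearch),
    knowledgeRead: calls.some(
      (c) => c.tool === "Read" && c.arg.includes(".claude/knowledge/"),
    ),
    // Bash commands can also write files; this sketch does not inspect them.
    writesInsideCwd: calls
      .filter((c) => WRITE_TOOLS.has(c.tool))
      .every((c) => isInside(cwd, c.arg)),
  };
}

// Hypothetical trace shaped like Run 1 (paths are illustrative only).
const run1Like: ToolCall[] = [
  { tool: "Grep", arg: "system_prompt" },
  { tool: "Glob", arg: "src/agents/*.ts" },
  { tool: "Write", arg: "/home/user/Projects/multi-mind/src/agents/cost-estimator.ts" },
];
console.log(scoreSignals(run1Like, "/home/user/Projects/multi-mind-charter"));
```

For the Run 1-like trace all three fields come out `false`, matching the FAIL rows above.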

## Findings

**1. Description-only triggering is unreliable on real projects.** Phase A
Run 2 on the sterile demo showed `Skill(context-gathering)` invocation. The
identical charter + identical skill description on a real project did not.
Confounding variables to isolate in Run 2: task prompt language (Turkish
vs English; the skill descriptions are in English), task verb ("add a new
agent" is not literally "modify/debug/refactor/extend"), and the size of
the host project's CLAUDE.md context.

**2. The sentinel was weaker than intended.** The "use `system_prompt_file`"
rule is code-derivable — existing agents already use the same pattern, so
pattern-matching from `grep` produces the correct answer without reading
the context file. For Run 2 the sentinel must be **policy-only** — a rule
that cannot be recovered from code. Candidate: "*Every new agent must add
a one-line entry to `CHANGELOG.md` under `## Unreleased`.*"
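
If Run 2 adopts the CHANGELOG candidate, signal 3 can be graded without reading the diff by hand. A minimal sketch, assuming a Keep a Changelog-style file where the unreleased section is a level-2 `## Unreleased` heading; `hasUnreleasedEntry` is a hypothetical name. A pass only counts as evidence of a context read while nothing else in the repo models the rule; if `## Unreleased` already lists earlier agents, the pattern becomes code-derivable again.

```typescript
// Hypothetical grader for the proposed policy-only sentinel.
// Returns true if some line inside the "## Unreleased" section mentions agentName.
function hasUnreleasedEntry(changelog: string, agentName: string): boolean {
  let inUnreleased = false;
  for (const line of changelog.split(/\r?\n/)) {
    // Any level-2 heading ("## ...") starts a new section; "###" subheadings do not.
    if (/^##\s/.test(line)) {
      // Also accepts the bracketed "## [Unreleased]" form.
      inUnreleased = /^##\s+\[?Unreleased\b/i.test(line);
      continue;
    }
    if (inUnreleased && line.includes(agentName)) return true;
  }
  return false;
}

const sample = [
  "# Changelog",
  "",
  "## Unreleased",
  "### Added",
  "- cost-estimator agent",
  "",
  "## 0.3.0",
  "- architect agent",
].join("\n");

console.log(hasUnreleasedEntry(sample, "cost-estimator")); // true
console.log(hasUnreleasedEntry(sample, "architect"));      // false
```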

**3. cwd boundary is not enforced.** The worktree was at
`~/Projects/multi-mind-charter`; the task prompt literally named
`multi-mind`; a sibling `~/Projects/multi-mind` existed. The agent
resolved the name mismatch by navigating to the sibling and doing all
writes, builds, and test runs there. Charter's `CLAUDE.md` was loaded
from cwd at session start, so its policies were in context, but the
`.claude/knowledge/` files live in the worktree and so structurally could
not apply to files under the sibling. This is both a methodology bug in this run (use
`/tmp/multi-mind` next time) **and** a real charter gap — charter has no
`<workspace_scope>` clause to prevent sibling-repo navigation in any
multi-repo workspace.
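
The "stay under cwd" test is easy to get subtly wrong with exactly these directory names. A minimal sketch, assuming a guard that sees the absolute or cwd-relative target path of each file tool call; `isInsideWorkspace`, `scopeDecision`, and the `allow`/`ask` values are hypothetical names, not an existing charter or Claude Code API. Swap the two names from this run (cwd `multi-mind`, target under `multi-mind-charter`) and a plain string-prefix test wrongly accepts the sibling; comparing path segments via `path.relative` does not.

```typescript
import * as path from "node:path";

// Naive check: a plain string-prefix test. Shown only to illustrate the pitfall.
function naiveInside(root: string, target: string): boolean {
  return path.resolve(target).startsWith(path.resolve(root));
}

// Segment-aware check: `target` must resolve to `root` itself or a descendant.
// Symlinks are not resolved here; a real guard would likely need fs.realpathSync too.
function isInsideWorkspace(root: string, target: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(root, target));
  return (
    rel === "" ||
    (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
  );
}

// Hypothetical decision for one file-path tool call: anything outside the
// session's starting cwd needs explicit user confirmation.
type ScopeDecision = "allow" | "ask";

function scopeDecision(sessionCwd: string, target: string): ScopeDecision {
  return isInsideWorkspace(sessionCwd, target) ? "allow" : "ask";
}

const startCwd = "/home/user/Projects/multi-mind";
const siblingFile = "/home/user/Projects/multi-mind-charter/src/index.ts";
console.log(naiveInside(startCwd, siblingFile));   // true: the prefix test is fooled
console.log(scopeDecision(startCwd, siblingFile)); // ask
```

A path check like this only covers file tools: a Bash command that `cd`s into the sibling before building bypasses it, so it would complement a `<workspace_scope>` instruction rather than replace it.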

## Recommendations for v0.1.2

1. **Hook-based skill enforcement.** A `UserPromptSubmit` hook that
   pattern-matches task verbs (add / modify / debug / refactor / extend)
   and injects `Use Skill(context-gathering)` into the prompt. Keeps the
   description as a fallback but doesn't rely on it alone.
2. **Imperative `<skills_index>` in CLAUDE.md.** Current: "*Consult the
   relevant skill before the matching type of work.*" Proposed: "*You
   MUST invoke the matching skill BEFORE any tool call that matches its
   trigger. Do not proceed with reads, greps, or edits until the skill
   has been invoked.*"
3. **New `<workspace_scope>` block in CLAUDE.md:** "*All tool calls must
   target paths under the session's starting cwd. Accessing sibling or
   parent directories requires explicit user confirmation, even if the
   task prompt names a project that appears elsewhere on disk.*"
4. **Document the sterile-vs-real gap in `examples/demo/README.md`.**
   Phase A passing is not evidence that v0.1.1 works on real projects.
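
Recommendation 1 could start as a small command hook. A minimal sketch, assuming Claude Code's hook contract as currently documented (hook input arrives as JSON on stdin with a `prompt` field; for `UserPromptSubmit`, plain stdout from a hook that exits 0 is added to the context); check the current hooks reference before relying on it. `contextFor`, the `--hook` flag, and the file layout are hypothetical. Per Finding 1, an English-only verb list would not have matched the Run 1 prompt, whose verb was the Turkish `ekle` ("add"), so one localized entry is included purely as an illustration.

```typescript
import { readFileSync } from "node:fs";

// Crude trigger list: the English verbs from the recommendation, plus the
// Turkish "ekle" ("add") as a hypothetical example of a localized entry.
const TRIGGER = /\b(add|modify|debug|refactor|extend|ekle)\b/i;

const INJECTION = "Use Skill(context-gathering) before any other tool call.";

// Pure decision function: the text to add to context, or null for no injection.
function contextFor(prompt: string): string | null {
  return TRIGGER.test(prompt) ? INJECTION : null;
}

// Hook entry point (assumed contract: JSON on stdin, stdout becomes context).
function runHook(): void {
  const input = JSON.parse(readFileSync(0, "utf8")) as { prompt?: string };
  const extra = contextFor(input.prompt ?? "");
  if (extra !== null) process.stdout.write(extra + "\n");
  process.exit(0);
}

// `--hook` is a hypothetical flag so the file can be tested without reading stdin.
if (process.argv.includes("--hook")) runHook();
```

Keyword matching is deliberately blunt: a false positive only costs one extra skill invocation, while a false negative reproduces Run 1, so the list should err on the side of matching. Inflected forms ("adding", "ekleyin") are not covered by this regex.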

## Run 2 protocol (planned)

- Worktree at `/tmp/multi-mind` (matches task prompt name, no sibling)
- Policy-only sentinel
- Two passes: one English, one Turkish task prompt, to isolate the
  cross-language hypothesis from Finding 1
- Full transcript captured, not just the collapsed view

Happy to submit a PR for any of the v0.1.2 items once Run 2 confirms or
refines the findings. The full Phase B Run 1 report (171 lines) is
available on request.

