TL;DR
On a real-world host project (not the sterile examples/demo/ fixture),
v0.1.1's imperative skill descriptions did not trigger
Skill(context-gathering) at the start of a realistic task. The Phase A
Run 2 result does not appear to replicate outside the demo fixture. Three
related findings below, plus recommendations for v0.1.2. Sample size is
n=1 — treat findings as signal, not proof.
Setup
- Charter: charter-v0.1.1 (tag)
- Host: multi-mind, an existing open-source TypeScript CLI project
  (demwick/multi-mind), installed via git worktree add + manual charter
  file copy
- knowledge/context/{architecture,constraints,glossary}.md populated with
  real multi-mind content (60–80 lines each), verified against source
- Sentinel embedded in architecture.md: a fictional "inline system_prompt
  is deprecated, use system_prompt_file" rule
- Task prompt (single message, Turkish):
  "multi-mind'a yeni bir cost-estimator agent'ı ekle. Architect'in
  çıktısına bağlı olsun, kendi output'unu decisions.yaml'a yazsın.
  Hangi phase'e koyacağına ve sistem promptunu nasıl yazacağına sen
  karar ver."
  English translation: "Add a new cost-estimator agent to multi-mind. It
  should depend on the Architect's output and write its own output to
  decisions.yaml. You decide which phase to put it in and how to write
  its system prompt."
Signal matrix
| # | Signal | Expected | Observed | Result |
|---|--------|----------|----------|--------|
| 1 | Skill(context-gathering) invoked | yes | no; agent went straight to Grep/Glob | FAIL |
| 2 | Read(.claude/knowledge/context/architecture.md) | yes (skill step 3) | no Read of any .claude/knowledge/ file | FAIL |
| 3 | Output YAML uses system_prompt_file (sentinel) | yes, via context read | yes, but via grep of src/agents/loader.ts, not the sentinel | INCONCLUSIVE |
| 4 | All writes stay inside cwd | yes | no; agent crossed into sibling repo by literal name match | FAIL (new) |
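Since Run 2 will capture the full transcript, signals 1 and 2 could be scored mechanically from the tool-call sequence instead of by eye. A minimal sketch, assuming a hypothetical ToolCall shape (the real transcript format is not described in this report) and treating Grep, Glob, and Read as the exploration tools the skill is supposed to precede; scoreSignals is an illustrative name, not part of charter:

```typescript
// Hypothetical shape of one tool call extracted from a session transcript.
// The actual transcript format is an assumption, not taken from charter.
interface ToolCall {
  tool: string;   // e.g. "Skill", "Read", "Grep", "Glob", "Write"
  target: string; // skill name, file path, or search pattern
}

interface SignalResult {
  skillInvokedFirst: boolean; // signal 1: Skill(context-gathering) before any exploration
  contextFileRead: boolean;   // signal 2: at least one Read under .claude/knowledge/
}

// Tools that count as "starting work" for the ordering check (assumption).
const EXPLORATION_TOOLS = new Set(["Grep", "Glob", "Read"]);

function scoreSignals(calls: ToolCall[]): SignalResult {
  const skillIdx = calls.findIndex(
    (c) => c.tool === "Skill" && c.target === "context-gathering",
  );
  const firstExploreIdx = calls.findIndex((c) => EXPLORATION_TOOLS.has(c.tool));

  // Signal 1 passes only if the skill ran and nothing exploratory came before it.
  const skillInvokedFirst =
    skillIdx !== -1 && (firstExploreIdx === -1 || skillIdx < firstExploreIdx);

  // Signal 2 passes if any Read targeted a charter knowledge file.
  const contextFileRead = calls.some(
    (c) => c.tool === "Read" && c.target.includes(".claude/knowledge/"),
  );

  return { skillInvokedFirst, contextFileRead };
}
```

A Run 1-like sequence (Grep, then Read of src/agents/loader.ts, targets illustrative) scores false on both signals; a sequence that starts with Skill(context-gathering) and then reads architecture.md scores true on both.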
Findings
1. Description-only triggering is unreliable on real projects. Phase A
Run 2 on the sterile demo showed Skill(context-gathering) invocation. The
identical charter and identical skill description on a real project did
not. Confounding variables to isolate in Run 2: task prompt language
(Turkish vs English; the skill descriptions are in English), task verb
("add a new agent" is not literally "modify/debug/refactor/extend"), and
the size of the host project's CLAUDE.md context.
2. The sentinel was weaker than intended. The "use system_prompt_file"
rule is code-derivable — existing agents already use the same pattern, so
pattern-matching from grep produces the correct answer without reading
the context file. For Run 2 the sentinel must be policy-only — a rule
that cannot be recovered from code. Candidate: "Every new agent must add
a one-line entry to CHANGELOG.md under ## Unreleased."
3. cwd boundary is not enforced. The worktree was at
~/Projects/multi-mind-charter; the task prompt literally named
multi-mind; a sibling ~/Projects/multi-mind existed. The agent
resolved the name mismatch by navigating to the sibling and doing all
writes, builds, and test runs there. Charter's CLAUDE.md was loaded
from cwd at session start, so its policies were in-context, but
.claude/knowledge/ files structurally could not apply to files under
the sibling. This is both a methodology bug in this run (use
/tmp/multi-mind next time) and a real charter gap — charter has no
<workspace_scope> clause to prevent sibling-repo navigation in any
multi-repo workspace.
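If the scope rule were ever enforced mechanically (for example from a hook) rather than only stated in CLAUDE.md, the core test is path containment against the starting cwd. A minimal sketch, assuming absolute POSIX paths with ~ already expanded, relative targets resolved against the root, and no symlink handling; isInsideWorkspace is a hypothetical helper, not part of charter. It compares whole path segments, because a plain string-prefix check would wrongly treat ~/Projects/multi-mind-charter as inside ~/Projects/multi-mind:

```typescript
// Split a POSIX path into segments, collapsing "", "." and ".." (assumption:
// no symlink resolution, "~" already expanded by the caller).
function segments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return out;
}

// True if `target` is `root` itself or lies beneath it.
// Relative targets are resolved against `root`.
function isInsideWorkspace(root: string, target: string): boolean {
  const r = segments(root);
  const t = segments(target.startsWith("/") ? target : `${root}/${target}`);
  if (t.length < r.length) return false;
  // Whole-segment comparison: "multi-mind" never matches "multi-mind-charter".
  return r.every((seg, i) => seg === t[i]);
}
```

With the root set to the Run 1 worktree, a write to the sibling's src/agents/loader.ts is rejected whether it is addressed absolutely or as ../multi-mind/..., while src/agents/loader.ts inside the worktree is accepted.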
Recommendations for v0.1.2
- Hook-based skill enforcement. A UserPromptSubmit hook that
  pattern-matches task verbs (add / modify / debug / refactor / extend)
  and injects "Use Skill(context-gathering)" into the prompt. Keeps the
  description as a fallback but doesn't rely on it alone.
- Imperative <skills_index> in CLAUDE.md. Current: "Consult the
  relevant skill before the matching type of work." Proposed: "You
  MUST invoke the matching skill BEFORE any tool call that matches its
  trigger. Do not proceed with reads, greps, or edits until the skill
  has been invoked."
- New <workspace_scope> block in CLAUDE.md: "All tool calls must
  target paths under the session's starting cwd. Accessing sibling or
  parent directories requires explicit user confirmation, even if the
  task prompt names a project that appears elsewhere on disk."
- Document the sterile-vs-real gap in examples/demo/README.md. Phase A
  passing is not evidence v0.1.1 works on real projects.
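The matching half of the proposed hook is small. A minimal sketch, assuming the Claude Code hook contract in which a UserPromptSubmit hook receives the user's prompt as JSON on stdin and text printed to stdout on a zero exit is added to the session context (check the current hooks documentation before relying on this). Only the pure matching function is shown; reading stdin and printing the result are omitted, and TRIGGER_RE and contextInjection are hypothetical names:

```typescript
// English trigger verbs from the recommendation above, matched as whole
// words and case-insensitively ("Add" matches, "address" does not).
const TRIGGER_RE = /\b(add|modify|debug|refactor|extend)\b/i;

// Text a hypothetical UserPromptSubmit hook would print to stdout
// (and thus inject as context), or null when no trigger verb is present.
function contextInjection(prompt: string): string | null {
  return TRIGGER_RE.test(prompt) ? "Use Skill(context-gathering)" : null;
}
```

Worth noting for Run 2: the Run 1 task prompt contains none of these English verbs (its verb is "ekle", "add"), so an English-only list would not have fired on it either. That is the same cross-language gap Finding 1 raises for the skill descriptions.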
Run 2 protocol (planned)
- Worktree at /tmp/multi-mind (matches task prompt name, no sibling)
- Policy-only sentinel (candidate: the CHANGELOG.md rule from Finding 2)
- Two passes: one English, one Turkish task prompt, to isolate the
cross-language hypothesis from Finding 1
- Full transcript captured, not just the collapsed view
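Scoring the candidate policy-only sentinel only needs the resulting CHANGELOG.md. A minimal sketch, assuming the entry mentions the new agent's name (cost-estimator, from the task prompt) and that the Unreleased section ends at the next "## " heading; hasUnreleasedEntry is a hypothetical helper, and it does not check that the entry is exactly one line:

```typescript
// True if the "## Unreleased" section of a changelog contains a line that
// mentions `agentName`. The section ends at the next "## " heading;
// "### Added"-style subsections stay inside it.
function hasUnreleasedEntry(changelog: string, agentName: string): boolean {
  const lines = changelog.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === "## Unreleased");
  if (start === -1) return false; // no Unreleased section at all

  for (let i = start + 1; i < lines.length; i++) {
    const line = lines[i];
    if (line.startsWith("## ")) break; // reached the previous release
    if (line.includes(agentName)) return true;
  }
  return false;
}
```

An entry placed under an older release heading, or a changelog with no Unreleased section, fails the check, which is what makes the rule unrecoverable from existing code.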
Happy to submit a PR for any of the v0.1.2 items after Run 2 confirms /
refines the findings. Full Phase B Run 1 report (171 lines) available
on request.