| 1 | +# Simulate Persona |
| 2 | + |
| 3 | +Simulate a persona-based user test against the live Knowledge Mapper application. |
| 4 | + |
| 5 | +## Usage |
| 6 | + |
| 7 | +``` |
| 8 | +/simulate-persona <PERSONA_ID or CATEGORY> |
| 9 | +``` |
| 10 | + |
| 11 | +Example: `/simulate-persona P01` runs the simulation for Alex the Tech Reporter. |
| 12 | + |
| 13 | +## Arguments |
| 14 | + |
| 15 | +- `$ARGUMENTS`: Persona ID (P01–P21) or category name (reporter, expert, learner, power-user, pedant, edge-case) |
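A minimal sketch of how the argument could be expanded into persona IDs. The helper names (`resolvePersonaIds`, `CATEGORY_RANGES`) are hypothetical and not part of the command; the ID ranges are copied from the Persona Categories Quick Reference table.

```javascript
// Hypothetical helper: expand a /simulate-persona argument into persona IDs.
// Ranges mirror the Persona Categories Quick Reference table in this document.
const CATEGORY_RANGES = new Map([
  ["reporter", [1, 3]],
  ["expert", [4, 7]],
  ["learner", [8, 11]],
  ["power-user", [12, 14]],
  ["edge-case", [15, 18]],
  ["pedant", [19, 21]],
]);

// Format a persona number as a zero-padded ID, e.g. 1 -> "P01".
function formatPersonaId(n) {
  return "P" + String(n).padStart(2, "0");
}

// Accepts either a single ID (P01 to P21) or a category name.
// Throws on anything else so a typo never silently runs zero personas.
function resolvePersonaIds(argument) {
  const arg = argument.trim();
  const idMatch = /^P(\d{2})$/i.exec(arg);
  if (idMatch) {
    const n = Number(idMatch[1]);
    if (n < 1 || n > 21) throw new Error(`Unknown persona ID: ${arg}`);
    return [formatPersonaId(n)];
  }
  const range = CATEGORY_RANGES.get(arg.toLowerCase());
  if (!range) throw new Error(`Unknown persona ID or category: ${arg}`);
  const ids = [];
  for (let n = range[0]; n <= range[1]; n++) ids.push(formatPersonaId(n));
  return ids;
}
```

With this sketch, `resolvePersonaIds("pedant")` yields the three pedant IDs, while an ID outside P01 to P21 raises an error.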
| 16 | + |
| 17 | +## Pipeline Overview |
| 18 | + |
| 19 | +The simulation runs up to five phases. Phase 3 applies only to pedant personas, and Phase 5 runs only if issues are found: |
| 20 | + |
| 21 | +1. **Phase 1: Playwright Automation** — Mechanical browser interaction |
| 22 | +2. **Phase 2: AI Cognitive Evaluation** — Task agent reads checkpoints + screenshots |
| 23 | +3. **Phase 3: Pedant Web Verification** — (Pedant only) Opus agent verifies corrections |
| 24 | +4. **Phase 4: Report Assembly** — Compile JSON + Markdown reports |
| 25 | +5. **Phase 5: Issue Triage & Fix** — Create GitHub issues, implement fixes, submit PRs |
| 26 | + |
| 27 | +## Execution Steps |
| 28 | + |
| 29 | +### Step 0: Setup |
| 30 | + |
| 31 | +1. Read persona definition from `tests/visual/personas/definitions.js` — find the persona matching `$ARGUMENTS` |
| 32 | +2. Clean any stale working files: delete `tests/visual/.working/personas/{personaId}-*` |
| 33 | +3. Verify dev server is running at `http://localhost:5173/mapper/` |
| 34 | +4. Create TodoWrite entries for progress tracking |
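The cleanup in step 2 could look roughly like the following. This is a sketch only: `cleanStaleWorkingFiles` is a hypothetical name, and it assumes working files are matched purely by the `{personaId}-` filename prefix.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Delete every entry in workingDir whose name starts with "{personaId}-".
// Returns the sorted names of the removed entries.
// A missing directory is treated as already clean.
function cleanStaleWorkingFiles(workingDir, personaId) {
  if (!fs.existsSync(workingDir)) return [];
  const prefix = `${personaId}-`;
  const removed = [];
  for (const name of fs.readdirSync(workingDir)) {
    if (name.startsWith(prefix)) {
      fs.rmSync(path.join(workingDir, name), { recursive: true, force: true });
      removed.push(name);
    }
  }
  return removed.sort();
}
```

Called as `cleanStaleWorkingFiles("tests/visual/.working/personas", "P01")`, it leaves files belonging to other personas untouched.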
| 35 | + |
| 36 | +### Step 1: Playwright Automation (Phase 1) |
| 37 | + |
| 38 | +Run the Playwright test for this persona: |
| 39 | + |
| 40 | +```bash |
| 41 | +npx playwright test persona-agents.spec.js -g "Persona: {personaName}" |
| 42 | +``` |
| 43 | + |
| 44 | +For pedant personas: |
| 45 | +```bash |
| 46 | +npx playwright test persona-pedant.spec.js -g "Pedant: {personaName}" |
| 47 | +``` |
| 48 | + |
| 49 | +This produces: |
| 50 | +- `tests/visual/.working/personas/{personaId}-checkpoint-{N}.json` for each checkpoint |
| 51 | +- `tests/visual/screenshots/personas/{personaId}-checkpoint-{N}.png` for each screenshot |
| 52 | + |
| 53 | +Each checkpoint JSON contains: |
| 54 | +```json |
| 55 | +{ |
| 56 | + "personaId": "P01", |
| 57 | + "checkpointNumber": 1, |
| 58 | + "questionsAnswered": 5, |
| 59 | + "questionsInBatch": [ |
| 60 | + { |
| 61 | + "questionId": "abc123", |
| 62 | + "questionText": "...", |
| 63 | + "options": { "A": "...", "B": "...", "C": "...", "D": "..." }, |
| 64 | + "correctAnswer": "B", |
| 65 | + "selectedAnswer": "B", |
| 66 | + "wasCorrect": true, |
| 67 | + "difficulty": 2, |
| 68 | + "domainId": "physics", |
| 69 | + "sourceArticle": "..." |
| 70 | + } |
| 71 | + ], |
| 72 | + "screenshotPath": "tests/visual/screenshots/personas/P01-checkpoint-1.png", |
| 73 | + "consoleErrors": [], |
| 74 | + "domainMappedPct": 12, |
| 75 | + "timestamp": 1709352000000 |
| 76 | +} |
| 77 | +``` |
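Before handing a checkpoint to an evaluation agent, it may help to sanity-check its shape. The sketch below is illustrative: `findCheckpointProblems` is a hypothetical helper, and it assumes that `wasCorrect` should always equal `selectedAnswer === correctAnswer` and that both answers are keys of `options`.

```javascript
// Return a list of human-readable problems found in a parsed checkpoint object.
// An empty list means the checkpoint passed these (assumed) consistency checks.
function findCheckpointProblems(checkpoint) {
  const problems = [];
  if (!/^P\d{2}$/.test(String(checkpoint.personaId))) {
    problems.push("personaId is not of the form Pnn");
  }
  if (!Array.isArray(checkpoint.questionsInBatch)) {
    problems.push("questionsInBatch is not an array");
    return problems;
  }
  for (const q of checkpoint.questionsInBatch) {
    const optionKeys = Object.keys(q.options || {});
    if (!optionKeys.includes(q.correctAnswer)) {
      problems.push(`${q.questionId}: correctAnswer is not one of the options`);
    }
    if (!optionKeys.includes(q.selectedAnswer)) {
      problems.push(`${q.questionId}: selectedAnswer is not one of the options`);
    }
    // Assumption: wasCorrect is derived from the two answer fields.
    if (q.wasCorrect !== (q.selectedAnswer === q.correctAnswer)) {
      problems.push(`${q.questionId}: wasCorrect disagrees with the answers`);
    }
  }
  return problems;
}
```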
| 78 | + |
| 79 | +### Step 2: AI Cognitive Evaluation (Phase 2) |
| 80 | + |
| 81 | +For EACH checkpoint, spawn a Task agent: |
| 82 | + |
| 83 | +**Regular personas (Sonnet 4.6):** |
| 84 | +``` |
| 85 | +Task agent (model: sonnet, subagent_type: general-purpose): |
| 86 | + "You are role-playing as {persona.name}. {persona.personality} |
| 87 | + |
| 88 | + Read the checkpoint data at: {checkpointPath} |
| 89 | + Read the screenshot at: {screenshotPath} |
| 90 | + |
| 91 | + BEFORE looking at the screenshot, state what you expect the map to look like. |
| 92 | + THEN read the screenshot and compare reality to your expectation. |
| 93 | + |
| 94 | + For each question in this batch, evaluate: |
| 95 | + - Is the marked answer correct? |
| 96 | + - Are the distractors plausible? |
| 97 | + - Does the question test meaningful understanding? |
| 98 | + - Rate content validity, distractor quality, difficulty, educational value, clarity (1-5 each) |
| 99 | + |
| 100 | + Write your evaluation as JSON to: {evalOutputPath} |
| 101 | + Use the AgentEvaluation schema from the data model." |
| 102 | +``` |
| 103 | + |
| 104 | +**Pedant personas (Opus 4.6):** |
| 105 | +``` |
| 106 | +Task agent (model: opus, subagent_type: general-purpose): |
| 107 | + Same as above but with additional instructions: |
| 108 | + "If you disagree with any marked answer, use the WebSearch tool to verify. |
| 109 | + Search for authoritative sources. Cite the URL. |
| 110 | + If web evidence supports your correction: verdict = CORRECTION_VERIFIED |
| 111 | + If web evidence confirms original: verdict = ORIGINAL_CONFIRMED |
| 112 | + If inconclusive: verdict = INCONCLUSIVE |
| 113 | + NEVER hallucinate a correction without web evidence." |
| 114 | +``` |
| 115 | + |
| 116 | +Each evaluation produces: |
| 117 | +- `tests/visual/.working/personas/{personaId}-eval-{N}.json` |
| 118 | + |
| 119 | +#### Category-Specific Evaluation Guidance |
| 120 | + |
| 121 | +**Reporter agents (P01-P03)** should focus on: |
| 122 | +- Visual impact — would this screenshot look good in a tech article? |
| 123 | +- Question quality for non-expert audience — nothing too obscure |
| 124 | +- Polish — no loading spinners, no visual artifacts, smooth gradients |
| 125 | +- First impression criteria from expected-outcomes/reporters.json |
| 126 | + |
| 127 | +**Expert agents (P04-P07)** should focus on: |
| 128 | +- Answer correctness — use real domain knowledge to verify marked answers |
| 129 | +- Difficulty calibration — do questions test conceptual understanding vs trivia? |
| 130 | +- Map accuracy — does the green/yellow/red distribution match their expertise profile? |
| 131 | +- Distractor quality — all four options should be plausible at first glance |
| 132 | + |
| 133 | +**Learner agents (P08-P11)** should focus on: |
| 134 | +- Emotional arc — curiosity → mixed success → insight → continued engagement |
| 135 | +- Question diversity — no more than 5 consecutive questions on the same sub-topic |
| 136 | +- "Aha moments" — identify at least 1 moment where the map reveals something surprising |
| 137 | +- Map readability for non-experts — clear color differentiation, intuitive layout |
| 138 | +- Self-assessment: "Would I show this to a friend?" and "Did I learn something about myself?" |
| 139 | + |
| 140 | +**Power user agents (P12-P14)** should focus on: |
| 141 | +- Estimator stability — no Cholesky errors, NaN, or Infinity values |
| 142 | +- Domain-mapped % smooth progression — no jumps >15 percentage points |
| 143 | +- Domain switching cleanliness (P13) — no state leakage between domains |
| 144 | +- Rapid input handling (P14) — no dropped answers or visual glitches |
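The power-user stability checks can be approximated mechanically from the `domainMappedPct` values of successive checkpoints. The following is a sketch under assumptions: `findProgressionIssues` is a hypothetical helper, checkpoints are numbered from 1, and a drop of more than 15 points counts as a jump just like a rise.

```javascript
// Scan domain-mapped percentages (one per checkpoint, in order) and report
// non-finite values (NaN, Infinity, non-numbers) and jumps larger than maxJump.
function findProgressionIssues(percentages, maxJump = 15) {
  const issues = [];
  percentages.forEach((value, i) => {
    if (!Number.isFinite(value)) {
      issues.push({ checkpoint: i + 1, kind: "non-finite", value });
      return;
    }
    const prev = percentages[i - 1];
    // Only compare against a previous value that was itself finite.
    if (i > 0 && Number.isFinite(prev) && Math.abs(value - prev) > maxJump) {
      issues.push({ checkpoint: i + 1, kind: "jump", delta: value - prev });
    }
  });
  return issues;
}
```

For the sequence `[12, 20, 41, 45]`, this reports a single jump of 21 points at checkpoint 3.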
| 145 | + |
| 146 | +### Step 3: Pedant Web Verification (Phase 3 — pedant only) |
| 147 | + |
| 148 | +For any question where the pedant agent flagged `isCorrectAsMarked: false`: |
| 149 | + |
| 150 | +1. Read the eval JSON to find flagged questions |
| 151 | +2. If the agent already searched (`webVerification.searched` is `true`), verification is complete for that question |
| 152 | +3. If not, spawn an additional Opus Task agent with WebSearch tool to verify |
| 153 | +4. Write all verified corrections to: `tests/visual/.working/personas/{personaId}-corrections.json` |
| 154 | + |
| 155 | +### Step 4: Report Assembly (Phase 4) |
| 156 | + |
| 157 | +1. Read all checkpoint JSONs and evaluation JSONs from `.working/personas/` |
| 158 | +2. Compile the PersonaReport: |
| 159 | + - Concatenate all belief narratives into experience summary |
| 160 | + - Collect all question evaluations into question audit |
| 161 | + - Collect all issues, sort by severity |
| 162 | + - Determine result: PASS / FAIL / AMBIGUOUS per spec criteria |
| 163 | +3. Write outputs: |
| 164 | + - `tests/visual/reports/{personaId}-report.json` (machine-readable) |
| 165 | + - `tests/visual/reports/{personaId}-report.md` (human-readable) |
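Sorting collected issues by severity might be sketched as follows. The `severity` field name and the four-level ordering (blocker, major, minor, cosmetic) are assumptions inferred from the severity names used elsewhere in this document, not a confirmed schema.

```javascript
// Assumed severity levels, most severe first.
const SEVERITY_ORDER = ["blocker", "major", "minor", "cosmetic"];

// Return a new array sorted by severity; unknown severities go last.
// Array.prototype.sort is stable, so issues of equal severity keep their order.
function sortIssuesBySeverity(issues) {
  const rank = (issue) => {
    const index = SEVERITY_ORDER.indexOf(issue.severity);
    return index === -1 ? SEVERITY_ORDER.length : index;
  };
  return [...issues].sort((a, b) => rank(a) - rank(b));
}
```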
| 166 | + |
| 167 | +### Step 5: Issue Triage & Fix (Phase 5 — if issues found) |
| 168 | + |
| 169 | +For each blocker or major issue discovered: |
| 170 | + |
| 171 | +1. Create a GitHub issue describing the problem and noting the feature branch it was found on |
| 172 | +2. Spawn a Task agent to investigate and implement a fix |
| 173 | +3. Verify the fix by re-running the affected checkpoint |
| 174 | +4. Submit the fix as a commit on the `004-persona-user-testing` branch |
| 175 | + |
| 176 | +## Resume from Checkpoint |
| 177 | + |
| 178 | +If context runs out mid-simulation: |
| 179 | + |
| 180 | +1. Check `tests/visual/.working/personas/` for existing files |
| 181 | +2. Find the highest checkpoint number with a corresponding eval file |
| 182 | +3. Resume from the next unevaluated checkpoint |
| 183 | +4. Re-run the Playwright test only if checkpoint data files are missing |
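The resume logic above can be sketched as a pure function over the file names in the working directory. `nextUnevaluatedCheckpoint` is a hypothetical helper, and it assumes the `{id}-checkpoint-{N}.json` and `{id}-eval-{N}.json` naming from the working file conventions and a purely alphanumeric persona ID.

```javascript
// Given file names from the working directory, return the checkpoint number
// to resume evaluation from. Returns null when no checkpoint is left to
// evaluate (or when no checkpoint files exist at all).
function nextUnevaluatedCheckpoint(fileNames, personaId) {
  const numbersFor = (kind) => {
    const pattern = new RegExp(`^${personaId}-${kind}-(\\d+)\\.json$`);
    return fileNames
      .map((name) => pattern.exec(name))
      .filter(Boolean)
      .map((match) => Number(match[1]));
  };
  const checkpoints = numbersFor("checkpoint").sort((a, b) => a - b);
  const evaluated = new Set(numbersFor("eval"));
  // Highest checkpoint number that already has a matching eval file (0 if none).
  const highestEvaluated = Math.max(0, ...checkpoints.filter((n) => evaluated.has(n)));
  const next = checkpoints.find((n) => n > highestEvaluated);
  return next === undefined ? null : next;
}
```

A `null` result with no checkpoint files present is the case where the Playwright test has to be run again.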
| 184 | + |
| 185 | +## Working File Conventions |
| 186 | + |
| 187 | +Intermediate files live in `tests/visual/.working/personas/`; the Phase 4 reports are written to `tests/visual/reports/` (see Step 4): |
| 188 | + |
| 189 | +| Pattern | Phase | Description | |
| 190 | +|---------|-------|-------------| |
| 191 | +| `{id}-checkpoint-{N}.json` | 1 | Playwright automation output | |
| 192 | +| `{id}-eval-{N}.json` | 2 | AI agent evaluation | |
| 193 | +| `{id}-corrections.json` | 3 | Pedant verified corrections | |
| 194 | +| `{id}-report.json` | 4 | Final compiled report | |
| 195 | +| `{id}-report.md` | 4 | Human-readable report | |
| 196 | + |
| 197 | +## Pass/Fail Criteria |
| 198 | + |
| 199 | +- **PASS**: All checkpoints met expectations. No blocker/major issues. Positive experience summary. ≤10% low-quality questions. |
| 200 | +- **FAIL**: Any blocker issue (crash, estimator collapse, wrong map). Negative experience summary. >25% problematic questions. |
| 201 | +- **AMBIGUOUS**: Only minor/cosmetic issues but mixed feelings. Small but consistent expectation-reality gaps. Requires human review. |
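One possible reading of these criteria as a decision function is sketched below. Everything here is an assumption layered on the bullets above: the input field names are hypothetical, any single FAIL condition is treated as sufficient, and every case that is neither a clear PASS nor a clear FAIL (for example 10 to 25% problematic questions, or a major issue without a blocker) is routed to AMBIGUOUS for human review.

```javascript
// Classify a simulation summary as "PASS", "FAIL", or "AMBIGUOUS".
// summary fields (all hypothetical):
//   allCheckpointsMetExpectations: boolean
//   hasBlocker, hasMajor: boolean
//   experience: "positive" | "negative" | "mixed"
//   problematicFraction: share of low-quality questions, 0..1
function classifyResult(summary) {
  const { allCheckpointsMetExpectations, hasBlocker, hasMajor, experience, problematicFraction } = summary;
  // Assumption: any one FAIL condition is enough.
  if (hasBlocker || experience === "negative" || problematicFraction > 0.25) {
    return "FAIL";
  }
  // PASS requires every listed condition to hold.
  if (
    allCheckpointsMetExpectations &&
    !hasMajor &&
    experience === "positive" &&
    problematicFraction <= 0.10
  ) {
    return "PASS";
  }
  // Assumption: everything in between needs human review.
  return "AMBIGUOUS";
}
```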
| 202 | + |
| 203 | +## Persona Categories Quick Reference |
| 204 | + |
| 205 | +| Category | IDs | Model | Checkpoint Interval (questions) | Special | |
| 206 | +|----------|-----|-------|--------------------|---------| |
| 207 | +| Reporter | P01-P03 | Sonnet | 4-5 | First impressions | |
| 208 | +| Expert | P04-P07 | Sonnet | 5 | Domain expertise verification | |
| 209 | +| Learner | P08-P11 | Sonnet | 5 | Emotional arc, aha moments | |
| 210 | +| Power User | P12-P14 | Sonnet | 10-20 | Stress test, stability | |
| 211 | +| Pedant | P19-P21 | Opus | 1 (every Q) | Web-verified corrections | |
| 212 | +| Edge Case | P15-P18 | Sonnet | 8-10 | Feature-specific testing | |