Add Steps 13-15: hooks, testing, procedure encoding by mike-remakerdigital · Pull Request #1 · mike-remakerdigital/groundtruth

mike-remakerdigital · 2026-03-23T14:34:17Z

Summary

Step 13: Defense-in-Depth Hook Architecture — Documents the three-layer enforcement model (PreToolUse real-time hooks, git pre-commit checks, SessionStart session-time hooks) with specific hook implementations, registration patterns, and custom review agents
Step 14: Automated Testing Strategy — Documents the 12-suite testing taxonomy (9,152 tests), 4 test generation patterns, live-only testing principle, two-container test host architecture, and known testing gaps
Step 15: Encoding Procedures to Reduce Orchestration Load — Documents the core insight that Claude is too error-prone for reliable multi-step orchestration from prose alone, with the encoding principle ("prompts encode intent, hooks encode enforcement, skills encode execution"), the 10-skill operational surface, and measured effectiveness (GOV-12 violations reduced from 30% to 3%)

Also updates:

6 new lessons learned (41-46) from sessions S207-S212
Session count 206 → 212 throughout
Metrics updated (9,152 tests, 10 skills, 33 open WIs)
5 new evolution timeline entries (S207-S212)
10 new Quick Start Checklist items
4 new "What the System Catches" entries in README

Test plan

All 15 steps present in document
46 lessons learned (was 40)
27 checklist items (was 17)
Markdown renders correctly (no broken tables or code blocks)
Session counts consistent throughout both files

🤖 Generated with Claude Code

…encoding

New content covering the full Agent Red setup (S207-S212):

Step 13 — Defense-in-Depth Hook Architecture:
- Three-layer enforcement model (PreToolUse → pre-commit → SessionStart)
- Credential scanner, destructive command gate, assertion ratchet, test deletion guard, architectural guards, TSX spec gate
- Custom agents (code-reviewer, security-analyzer)

Step 14 — Automated Testing Strategy:
- 12 suite taxonomy (unit through property-based, 9,152 tests)
- 4 test generation patterns (spec-driven, assertion, auto-generated, regression)
- Live-only testing principle (GOV-10) and two-container test host
- Known testing gaps documented honestly

Step 15 — Encoding Procedures to Reduce Orchestration Load:
- The orchestration problem (step omission, order violation, drift, partial execution)
- "Prompts encode intent. Hooks encode enforcement. Skills encode execution."
- 10-skill catalog as operational surface
- Measured 10x reduction in GOV-12 violations

Also: 6 new lessons learned (41-46), updated metrics (212 sessions, 9,152 tests, 10 skills), new evolution timeline entries, expanded Quick Start Checklist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist · 2026-03-23T14:34:38Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation of the Membase pattern by integrating three new foundational steps: a robust, layered hook architecture for defense-in-depth, a comprehensive strategy for automated testing, and a principle for encoding repeatable procedures to improve AI orchestration reliability. These additions formalize critical quality assurance and governance mechanisms, ensuring more consistent and secure AI-driven development by shifting from prose-based instructions to executable enforcement and structured workflows.

Highlights

Defense-in-Depth Hook Architecture (Step 13): Introduced a three-layer enforcement model (PreToolUse real-time hooks, git pre-commit checks, SessionStart session-time hooks) with specific implementations and custom review agents to catch different failure modes.
Automated Testing Strategy (Step 14): Documented a comprehensive 12-suite testing taxonomy (9,152 tests), 4 test generation patterns, the live-only testing principle, a two-container test host architecture, and acknowledged known testing gaps.
Encoding Procedures to Reduce Orchestration Load (Step 15): Explained the core insight that Claude is error-prone for multi-step orchestration from prose alone, introducing the encoding principle ('prompts encode intent, hooks encode enforcement, skills encode execution'), a 10-skill operational surface, and measured effectiveness (GOV-12 violations reduced from 30% to 3%).
Lessons Learned: Added 6 new lessons learned (41-46) derived from sessions S207-S212, covering topics like layered defense, assertion ratchets, procedure encoding, hook design, test architecture, and browser instance management.
Session Count Updates: Updated the project's session count from 206 to 212 across all relevant documentation.
Metrics Updates: Revised key metrics, including the total number of automated tests (now 9,152), skills (now 10), and open work items (now 33).
Evolution Timeline: Added 5 new entries to the evolution timeline (S207-S212) in the README, detailing recent advancements.
Quick Start Checklist: Expanded the Quick Start Checklist with 10 new items related to the newly introduced hooks, testing, and procedure encoding steps.
System Catches: Added 4 new 'What the System Catches' entries in the README, highlighting new preventative measures against credential leaks, test weakening, architectural erosion, and untraced frontend changes.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds extensive documentation for Steps 13-15, covering defense-in-depth hooks, automated testing strategies, and procedure encoding. It also updates various metrics and lessons learned across the documentation. My review focuses on ensuring consistency and clarity in the newly added content. I've found a few inconsistencies in metrics (like test counts and skill counts) across different sections and files, a typo in a table header, and a broken markdown table. Addressing these points will improve the overall quality and readability of the documentation.

gemini-code-assist · 2026-03-23T14:36:53Z

 | Operational procedures | 14 |
 | Governance principles | 20 (GOV-01 through GOV-18 + 2 architectural) |
-| Test plan phases | 18 active (incl. fuzzing + property phases) |
+| Test plan phases | 18 active (incl. fuzzing + property phases) | |


gemini-code-assist · 2026-03-23T14:36:54Z

+| Suite | Est. Tests | Speed | What It Catches | Generation Method |
+|-------|-----------|-------|----------------|-------------------|
+| **Unit** | 950 | ~2 min | Logic errors, type mismatches, pure function bugs | Claude-generated from specs |
+| **Core** (multi-tenant) | 3,700 | ~5 min | Tenant isolation, API routing, auth, middleware | Claude-generated from specs |
+| **Integration** | 270 | ~3 min | Component interaction, database queries, cache behavior | Claude-generated from specs + WIs |
+| **Agents** | 300 | ~2 min | MCP agent dispatch, tool execution, guardrails | Claude-generated from SPEC-1706..1712 |
+| **Security** | 150 | ~2 min | Auth bypass, injection, tenant leaks, OWASP patterns | Claude-generated from security specs |
+| **Regression** | 47 | ~1 min | Previously-fixed bugs | Auto-created when WI is resolved |
+| **Widget** | 60 | ~1 min | Embed behavior, postMessage, launcher, resize | Claude-generated from widget specs |
+| **Ops** | 80 | ~1 min | Deployment checks, config validation, health probes | Claude-generated from procedures |
+| **E2E Live** | 1,100 | ~15 min | Real deployment: Playwright against staging/production | Claude-generated from user stories |
+| **Load** | variable | ~10 min | Throughput, latency under load, rate limit behavior | Locust scenarios from capacity specs |
+| **Fuzzing** | 307 ops | ~10 min | API contract violations, edge cases, unexpected inputs | Schemathesis auto-generated from OpenAPI |
+| **Property-based** | 46 | ~3 min | Algebraic invariants, roundtrip properties | Hypothesis strategies from data models |


The estimated test counts in this table seem to be inconsistent with the total number of tests mentioned elsewhere. The sum of Est. Tests here is 6,703 (or 7,010 if including fuzzing ops), but other parts of the documentation (like README.md) mention a total of 9,152 automated tests. To avoid confusion, could you please update these numbers to be consistent with the total?
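The arithmetic behind this comment can be reproduced with a short script. The counts are copied from the Est. Tests column in the diff above; the variable names are illustrative, and the Load suite is left out because the table lists it as "variable".

```python
# Est. Tests column from the taxonomy table quoted above.
# "Load" is listed as "variable" and is omitted; fuzzing is counted in operations.
SUITE_COUNTS = {
    "Unit": 950,
    "Core": 3700,
    "Integration": 270,
    "Agents": 300,
    "Security": 150,
    "Regression": 47,
    "Widget": 60,
    "Ops": 80,
    "E2E Live": 1100,
    "Property-based": 46,
}
FUZZING_OPS = 307
DOCUMENTED_TOTAL = 9152  # total quoted in the README and PR summary

table_sum = sum(SUITE_COUNTS.values())
with_fuzzing = table_sum + FUZZING_OPS
gap = DOCUMENTED_TOTAL - with_fuzzing

print(table_sum, with_fuzzing, gap)  # prints: 6703 7010 2142
```

Even with the fuzzing operations included, the table accounts for 2,142 fewer tests than the documented total.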

gemini-code-assist · 2026-03-23T14:36:54Z

+```
+Batch 1: core-a (2,400 tests, parallel)  -> 30s cooldown
+Batch 2: core-b (680 tests, parallel)    -> 30s cooldown
+Batch 3: agents-chat (600 tests, parallel) -> 30s cooldown
+Batch 4: integrations (400 tests, parallel) -> 30s cooldown
+Batch 5: sequential (120 tests, serial)
+```


The test counts in this example for thermal-safe testing are inconsistent with the Testing Taxonomy table in the same document. For example:

core-a (2,400) + core-b (680) = 3,080 tests, but the taxonomy lists 3,700 for Core.

agents-chat is 600 tests here, but Agents in the taxonomy is 300.

integrations is 400 tests here, but Integration in the taxonomy is 270.

Please align these numbers to ensure consistency throughout the document.
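Setting the count mismatch aside, the batching scheme in the quoted block (run a batch, pause 30 seconds, run the next, with no pause after the last) could be sketched roughly as below. The `Batch` type, the `run_batches` function, and the injectable `run_one` and `sleep` callables are hypothetical names introduced here; the project's real runner is not shown in this PR.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Batch:
    name: str
    parallel: bool  # True for the "parallel" batches, False for "serial"


# Names and modes copied from the quoted block; counts are omitted because
# they are the numbers under review.
BATCHES = [
    Batch("core-a", True),
    Batch("core-b", True),
    Batch("agents-chat", True),
    Batch("integrations", True),
    Batch("sequential", False),
]


def run_batches(
    batches: Sequence[Batch],
    run_one: Callable[[Batch], bool],
    cooldown_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Run batches in order, pausing between them; return the names that failed."""
    failed = []
    for index, batch in enumerate(batches):
        if not run_one(batch):
            failed.append(batch.name)
        if index < len(batches) - 1:  # no cooldown after the final batch
            sleep(cooldown_s)
    return failed
```

Passing `sleep` as a parameter keeps the sketch testable without waiting two minutes: a test can supply `list.append` and check that exactly four pauses were requested for five batches.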

gemini-code-assist · 2026-03-23T14:36:54Z

+- [ ] Create git pre-commit hooks: assertion ratchet, test deletion guard, architectural guards, TSX gate, credential scan (Step 13)
+- [ ] Generate assertion baseline JSON for the ratchet hook (Step 13)
+- [ ] Create custom review agents (code-reviewer, security-analyzer) in `.claude/agents/` (Step 13)
+- [ ] Expand test taxonomy beyond unit/integration/e2e: add security, regression, fuzzing, property-based, load suites (Step 14)


This checklist item for expanding the test taxonomy could be clearer. It lists security, regression, fuzzing, property-based, load suites, but this is an incomplete list compared to the 12-suite taxonomy defined in Step 14. It's also slightly confusing because regression and fuzzing are already part of the Phase Taxonomy in Step 10.

To improve clarity, I suggest either listing all test suites that should be added or rephrasing to better guide the user on how to expand from a basic setup to the full 12-suite taxonomy.

gemini-code-assist · 2026-03-23T14:36:54Z

+| S211 | Claude silently weakens tests and removes architectural patterns | **Quality guardrails** — 5 PreToolUse/pre-commit hooks (assertion ratchet, test deletion guard, architecture guard, TSX spec gate, credential scan). Three-layer defense model. |
+| S212 | Production and staging share test infrastructure; uncontrolled regressions | **Environment isolation + production verification** — Separate test hosts per environment, skip-as-pass classification, SPEC-0058 enforcement (24 files cleaned), widget storefront presence testing |

 ### Current Database (as of Session 206)


There seems to be a typo in this section header. It says "as of Session 206", but based on the pull request and other updates in the file, it should probably be "as of Session 212".

Suggested change

-### Current Database (as of Session 206)
+### Current Database (as of Session 212)
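A leftover like this one is easy to miss by eye. As a rough illustration of how the "session counts consistent throughout" item in the test plan might be checked mechanically, the sketch below scans text for references to the old session number. The function name and the three phrasings it looks for are assumptions. Hits still need human review, since timeline entries may legitimately mention an older session.

```python
import re


def find_stale_sessions(text: str, old: int) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still mention the old session count."""
    # Assumed phrasings: "Session 206", "S206", and "206 sessions".
    pattern = re.compile(rf"\b(?:Session\s+{old}|S{old}|{old}\s+sessions)\b")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


doc = "### Current Database (as of Session 206)\naccumulated across 212 sessions\n"
print(find_stale_sessions(doc, 206))
# prints: [(1, '### Current Database (as of Session 206)')]
```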

gemini-code-assist · 2026-03-23T14:36:54Z

 The database is used exclusively by Claude and contains only what Claude needs to remember. The human observes through a lightweight read-only UI (sort, filter, search, tree-view, change history) that deliberately excludes write operations. When the human spots a discrepancy, they tell Claude, and Claude creates a corrected version.

-The current database is ~40 MB with 2,052 specifications, 10,847 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 176 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 8 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 206 sessions with zero data loss.
+The current database is ~40 MB with 2,052 specifications, 10,847 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 176 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 8 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 212 sessions with zero data loss.


This summary paragraph contains several outdated metrics that are inconsistent with the "Current Database" table updated in this same pull request. Specifically:

Test artifacts: 10,847 here vs. 10,912 in the table.

Knowledge documents: 176 here vs. 154 in the table.

Claude Code skills: 8 here vs. 10 in the table and elsewhere.

Please update this paragraph to reflect the latest metrics for consistency.

Suggested change

-The current database is ~40 MB with 2,052 specifications, 10,847 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 176 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 8 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 212 sessions with zero data loss.
+The current database is ~40 MB with 2,052 specifications, 10,912 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 154 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 10 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 212 sessions with zero data loss.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b9cd56e35


chatgpt-codex-connector · 2026-03-23T14:36:58Z

 The database is used exclusively by Claude and contains only what Claude needs to remember. The human observes through a lightweight read-only UI (sort, filter, search, tree-view, change history) that deliberately excludes write operations. When the human spots a discrepancy, they tell Claude, and Claude creates a corrected version.

-The current database is ~40 MB with 2,052 specifications, 10,847 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 176 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 8 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 206 sessions with zero data loss.
+The current database is ~40 MB with 2,052 specifications, 10,847 test artifacts, 1 test plan (18 active phases), ~1,600 work items, 14 operational procedures, 176 documents, 520 testable elements, ~2,040 specs with machine-verifiable assertions (99.5% coverage), 8 KB-aware Claude Code skills, and multi-agent coordination via prime-bridge — all accumulated across 212 sessions with zero data loss.


Update summary paragraph to match revised database metrics

The final “current database” paragraph was edited for the 212-session count but still reports old values (10,847 test artifacts, 176 documents, 8 skills) that now conflict with the updated metrics table in the same section (10,912, 154, 10). This introduces contradictory source-of-truth data in one README and can mislead readers who copy numbers from the summary text instead of the table.


mike-remakerdigital merged commit 7ed1a01 into main Mar 23, 2026
2 checks passed


mike-remakerdigital mentioned this pull request Mar 23, 2026

Add reference implementations for Steps 13-15 #2

Merged

5 tasks
