feat: accept language arg on add_pattern / update_pattern MCP tools by attila · Pull Request #63 · attila/lore

attila · 2026-05-19T19:26:47Z

Summary

Closes the MCP authoring-surface gap from PR #50: add_pattern and update_pattern now accept an optional language argument (scalar or array), write a canonical language: [...] frontmatter line, and surface unknown tokens via stderr and a new language_warnings array on the lore-metadata fence. update_pattern mirrors tags's three-way semantics (omit preserves, [] clears, non-empty replaces). append_to_pattern does not accept language by design — appends are body-only; the schema test pins this and fails fast if reverted.

Closes the inbox-branch advisory blackout in the same diff — the short-circuit now invokes parse_frontmatter_language_list directly so agent-submitted patterns get the same warning surface as locally-indexed ones.

Plan: docs/plans/2026-05-19-001-feat-mcp-language-arg-plan.md.

Test plan

just ci passes locally (fmt + clippy + 777 lib tests + 10 integration tests + deny + doc)
Real-binary UAT via lore serve over JSON-RPC from an isolated XDG environment
- Scalar language: "rust" → file written with language: [rust], fence language_warnings: []
- Array language: ["java","kotlin","groovy"] → order preserved
- Unknown token ["rust","objectiv-c"] → both surfaces fire (stderr Warning: line + fence language_warnings: ["objectiv-c"]); file still written (warn-and-proceed)
- update_pattern body-only against language: [swift, objectivec] → frontmatter preserved verbatim
- tools/list confirms language absent on append_to_pattern, present on the other two
lore status post-UAT reflects all writes in the Languages: breakdown

Establishes the per-unit breakdown, key technical decisions, and risk analysis for accepting a language argument on the pattern-authoring MCP tools. The plan is the input artefact for the four implementation units that follow.

Add the optional `language` field to both write-tool input schemas (`oneOf [string, array<string>]`) and parse it at the MCP boundary. `parse_language_arg` coerces a scalar to a one-element vec and preserves the absent / empty-array / non-empty distinction that U2 will route into `update_pattern`'s preserve / clear / replace branches. `check_language_limit` caps the serialised JSON shape at 8 KB to keep the validation cheap and bounded, mirroring `check_tags_limit`. The parsed value is intentionally unused in this commit — `_language` binding is consumed by U2 when the ingest signatures grow the parameter and `build_file_content` learns to render `language:` frontmatter. Validation still runs end-to-end, so a malformed argument fails with a structured error before any write lock is taken. Tests cover the six argument-shape cases (absent, scalar, array, duplicates, empty, non-string scalar, array with non-string entries), the size-limit guard, and three schema-shape pins on the tool catalogue: language present with the right `oneOf` shape on add_pattern and update_pattern, language absent on append_to_pattern (load-bearing regression guard for the schema-honesty decision).

Extend `build_file_content` to render a canonical `language: [...]` flow-list line after `tags:` so the on-disk frontmatter shape matches what `parse_frontmatter_language_list` already accepts. Single-token inputs render as one-element flow lists (`[rust]`) rather than scalars so the on-disk shape stays uniform regardless of how the agent passed the value through MCP. `ingest::add_pattern` gains a `language: &[&str]` parameter — empty slice writes no `language:` line, preserving the pre-feature shape byte-for-byte. `ingest::update_pattern` gains a matching `language: Option<&[&str]>` parameter with the same three-way semantics as `tags` (absent preserves the existing list via `parse_frontmatter_language_list`, `Some(&[])` clears, `Some(&[..])` replaces). The handlers in `server.rs` thread the parsed argument through. Tests pin all four U2 contracts end-to-end against the DB row's `language_json` value (the slice-shape-vs-pipeline-tests learning): flow-list rendering, multi-token order preservation, the tags-before-language frontmatter ordering, the absent case writing no frontmatter, and update_pattern's preserve / clear / replace branches with the preserve-on-absent case as the load-bearing regression guard against the de-language footgun. The two `#[allow(clippy::too_many_arguments)]` annotations on the write functions are intentional — both are the canonical MCP write paths and refactoring them into a parameter struct would obscure the call sites rather than help readers.

Extend `WriteResult` with `language_warnings: Vec<String>` and populate it from the chunking parser's `malformed_language` advisories so MCP callers can see exactly which tokens warn-and-proceeded. The metadata fence on `add_pattern`, `update_pattern`, and `append_to_pattern` now renders the array unconditionally (empty when valid, never omitted) so agents can pattern-match on the key without first checking existence. Closes the inbox-branch advisory gap (Residual 2). Before this commit the inbox-branch short-circuit skipped `index_single_file` entirely, silently swallowing unknown-language-token warnings on the dominant agent-submission path — exactly the dropped-argument failure mode this plan exists to fix, just on a different code path. Each of the three short-circuits (add, update, append) now invokes `parse_frontmatter_language_list` directly on the about-to-be-written content; the shared `collect_language_warnings` helper emits one stderr line per unique unknown token and populates `WriteResult`. Stderr and metadata-fence advisories now match across every write path. Integration tests in `tests/branch_push.rs` pin the inbox-branch behaviour against a real bare-remote setup: unknown tokens collect into the warnings vec, all-valid input returns an empty vec (present, not omitted), and update_pattern's inbox path mirrors add_pattern's. Unit tests in `src/ingest.rs` cover the normal-write equivalents including the preserve-on-absent path firing for unknown tokens already on disk. The `add_pattern` metadata-fence test in `src/server.rs` pins the end-to-end MCP-side contract for both the all-valid and warn cases.

Tighten the public-API contract for the new `language` field across the three pattern-authoring tools' top-level descriptions: - `add_pattern` mentions `language` and the warn-and-proceed semantics so agents discover the capability from the schema alone. - `update_pattern` documents the three-way `tags`-style semantics (omit preserves, `[]` clears, non-empty replaces) so callers do not have to test the boundary themselves to learn the contract. - `append_to_pattern` points at `update_pattern` for frontmatter changes — the discoverability surface for Decision 3's schema-honesty posture (no `language` field on the body-only tool). Add unit tests that pin each description-prose substring above as a drift guard. Substring matches keyed on stable nouns — `language`, `update_pattern`, the three preserve/clear/replace keywords — survive innocent wording tweaks while still failing fast if the contract documentation regresses. Move the ROADMAP entry from `## Up Next` to `## Completed` and add a single one-sentence CHANGELOG bullet under `Unreleased` › `Added`. The PR number on the CHANGELOG line is left as `(#?)` until the PR is opened; the owner can substitute the real number before review.

Two findings from the correctness review applied: Reject tokens containing `,`, `[`, `]`, newlines, or control characters at the MCP boundary. Embedded delimiters would render as `language: [a,b]` and the chunking parser would split the value into two distinct tokens on read-back, producing a silent round-trip mismatch the agent cannot easily detect. Hard-erroring in `parse_language_arg` surfaces the mistake before any write or push. Three new tests pin the comma, bracket, and newline cases. Document the intentional asymmetry between `tags` and `language` in the `update_pattern` preserve-on-`None` branch: the language parser lowercases tokens at read time, so a body-only update against an existing `language: [Rust]` file rewrites it as `language: [rust]`. The DB has stored the lowercased form since PR #50; the file now converges to match the canonical form. `tags`'s preserve path keeps casing because the tags parser is case-preserving — language tokens are validated against the canonical `LANGUAGES` table where lowercase IS the canonical form. Pinned by a new regression test so a future parser change in either direction surfaces at the call site.

Add the `pr:` frontmatter field to the plan so the in-flight mapping between plan and pull request is discoverable. Status stays `active` until merge per the existing convention. Condense the CHANGELOG entry to a single assertive sentence per the project's two-rule CHANGELOG convention; the detail lives in the plan and PR body.

Three knowledge-track solution docs derived from this PR's planning, implementation, and review cycle: - best-practices/schema-and-description-prose-are-testable-surface — treat MCP tool schemas, descriptions, and any contract artefact as testable surface. Substring and structural assertions catch a regression class nothing else covers. Triggered by the U4 plan-review exchange that reframed 'documentation deliverable' as testable contract. - best-practices/sibling-code-paths-can-reintroduce-fixed-failure-modes — when a fix establishes a contract on a public surface, every sibling code path that returns through that surface owes the contract. Audit sibling paths against the canonical failure signal and fold the fix into the current PR when cheap; otherwise hazard-pin per the composition-cascades pattern. Triggered by the R4 inbox-branch gap becoming U3 inside the same PR rather than a follow-up. - design-patterns/preserve-branch-canonicalisation-asymmetry — when two sibling fields share three-way preserve/clear/replace semantics but their parsers normalise differently, the preserve branches will diverge in observable behaviour. Pin intentional asymmetry with a regression test and an in-tree comment. Triggered by the correctness review of update_pattern's language preserve branch versus the tags precedent. All three cross-reference adjacent prior docs (slice-shape-tests, composition-cascades, round-trip-discriminator) and demarcate their distinct angle.

* doc: mark merged plans complete with PR links Four plan docs were left with `status: active` after their PRs merged. Flip to `status: complete`, add `completed:` dates matching merge dates, and link the `pr:` URL — matching the frontmatter shape used by other completed plans that carry a PR link. - coverage-check skill (#32, merged 2026-04-08) - language-detection architecture (#50, merged 2026-05-14) - track-2 observability (#59, merged 2026-05-16) - mcp language arg (#63, merged 2026-05-19) * chore(release): cut v0.4.0

attila added 7 commits May 19, 2026 19:54

doc: plan for MCP language arg on add/update/append

7f3c730

Establishes the per-unit breakdown, key technical decisions, and risk analysis for accepting a language argument on the pattern-authoring MCP tools. The plan is the input artefact for the four implementation units that follow.

doc: pin PR number in CHANGELOG

77b3b82

attila marked this pull request as ready for review May 19, 2026 19:34

attila added 2 commits May 19, 2026 20:37

attila merged commit 0011502 into main May 19, 2026
10 checks passed

attila deleted the feat/mcp-language-arg branch May 19, 2026 19:57

attila mentioned this pull request May 19, 2026

chore(release): cut v0.4.0 #64

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: accept language arg on add_pattern / update_pattern MCP tools#63

feat: accept language arg on add_pattern / update_pattern MCP tools#63
attila merged 9 commits into
mainfrom
feat/mcp-language-arg

attila commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

attila commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

attila commented May 19, 2026 •

edited

Loading