feat: accept language arg on add_pattern / update_pattern MCP tools#63
Merged
Conversation
Establishes the per-unit breakdown, key technical decisions, and risk analysis for accepting a language argument on the pattern-authoring MCP tools. The plan is the input artefact for the four implementation units that follow.
Add the optional `language` field to both write-tool input schemas (`oneOf [string, array<string>]`) and parse it at the MCP boundary. `parse_language_arg` coerces a scalar to a one-element vec and preserves the absent / empty-array / non-empty distinction that U2 will route into `update_pattern`'s preserve / clear / replace branches. `check_language_limit` caps the serialised JSON shape at 8 KB to keep the validation cheap and bounded, mirroring `check_tags_limit`. The parsed value is intentionally unused in this commit — `_language` binding is consumed by U2 when the ingest signatures grow the parameter and `build_file_content` learns to render `language:` frontmatter. Validation still runs end-to-end, so a malformed argument fails with a structured error before any write lock is taken. Tests cover the six argument-shape cases (absent, scalar, array, duplicates, empty, non-string scalar, array with non-string entries), the size-limit guard, and three schema-shape pins on the tool catalogue: language present with the right `oneOf` shape on add_pattern and update_pattern, language absent on append_to_pattern (load-bearing regression guard for the schema-honesty decision).
Extend `build_file_content` to render a canonical `language: [...]` flow-list line after `tags:` so the on-disk frontmatter shape matches what `parse_frontmatter_language_list` already accepts. Single-token inputs render as one-element flow lists (`[rust]`) rather than scalars so the on-disk shape stays uniform regardless of how the agent passed the value through MCP. `ingest::add_pattern` gains a `language: &[&str]` parameter — empty slice writes no `language:` line, preserving the pre-feature shape byte-for-byte. `ingest::update_pattern` gains a matching `language: Option<&[&str]>` parameter with the same three-way semantics as `tags` (absent preserves the existing list via `parse_frontmatter_language_list`, `Some(&[])` clears, `Some(&[..])` replaces). The handlers in `server.rs` thread the parsed argument through. Tests pin all four U2 contracts end-to-end against the DB row's `language_json` value (the slice-shape-vs-pipeline-tests learning): flow-list rendering, multi-token order preservation, the tags-before-language frontmatter ordering, the absent case writing no frontmatter, and update_pattern's preserve / clear / replace branches with the preserve-on-absent case as the load-bearing regression guard against the de-language footgun. The two `#[allow(clippy::too_many_arguments)]` annotations on the write functions are intentional — both are the canonical MCP write paths and refactoring them into a parameter struct would obscure the call sites rather than help readers.
Extend `WriteResult` with `language_warnings: Vec<String>` and populate it from the chunking parser's `malformed_language` advisories so MCP callers can see exactly which tokens warn-and-proceeded. The metadata fence on `add_pattern`, `update_pattern`, and `append_to_pattern` now renders the array unconditionally (empty when valid, never omitted) so agents can pattern-match on the key without first checking existence. Closes the inbox-branch advisory gap (Residual 2). Before this commit the inbox-branch short-circuit skipped `index_single_file` entirely, silently swallowing unknown-language-token warnings on the dominant agent-submission path — exactly the dropped-argument failure mode this plan exists to fix, just on a different code path. Each of the three short-circuits (add, update, append) now invokes `parse_frontmatter_language_list` directly on the about-to-be-written content; the shared `collect_language_warnings` helper emits one stderr line per unique unknown token and populates `WriteResult`. Stderr and metadata-fence advisories now match across every write path. Integration tests in `tests/branch_push.rs` pin the inbox-branch behaviour against a real bare-remote setup: unknown tokens collect into the warnings vec, all-valid input returns an empty vec (present, not omitted), and update_pattern's inbox path mirrors add_pattern's. Unit tests in `src/ingest.rs` cover the normal-write equivalents including the preserve-on-absent path firing for unknown tokens already on disk. The `add_pattern` metadata-fence test in `src/server.rs` pins the end-to-end MCP-side contract for both the all-valid and warn cases.
Tighten the public-API contract for the new `language` field across
the three pattern-authoring tools' top-level descriptions:
- `add_pattern` mentions `language` and the warn-and-proceed
semantics so agents discover the capability from the schema alone.
- `update_pattern` documents the three-way `tags`-style semantics
(omit preserves, `[]` clears, non-empty replaces) so callers do not
have to test the boundary themselves to learn the contract.
- `append_to_pattern` points at `update_pattern` for frontmatter
changes — the discoverability surface for Decision 3's schema-honesty
posture (no `language` field on the body-only tool).
Add unit tests that pin each description-prose substring above as a
drift guard. Substring matches keyed on stable nouns — `language`,
`update_pattern`, the three preserve/clear/replace keywords — survive
innocent wording tweaks while still failing fast if the contract
documentation regresses.
Move the ROADMAP entry from `## Up Next` to `## Completed` and add a
single one-sentence CHANGELOG bullet under `Unreleased` › `Added`. The
PR number on the CHANGELOG line is left as `(#?)` until the PR is
opened; the owner can substitute the real number before review.
Two findings from the correctness review applied: Reject tokens containing `,`, `[`, `]`, newlines, or control characters at the MCP boundary. Embedded delimiters would render as `language: [a,b]` and the chunking parser would split the value into two distinct tokens on read-back, producing a silent round-trip mismatch the agent cannot easily detect. Hard-erroring in `parse_language_arg` surfaces the mistake before any write or push. Three new tests pin the comma, bracket, and newline cases. Document the intentional asymmetry between `tags` and `language` in the `update_pattern` preserve-on-`None` branch: the language parser lowercases tokens at read time, so a body-only update against an existing `language: [Rust]` file rewrites it as `language: [rust]`. The DB has stored the lowercased form since PR #50; the file now converges to match the canonical form. `tags`'s preserve path keeps casing because the tags parser is case-preserving — language tokens are validated against the canonical `LANGUAGES` table where lowercase IS the canonical form. Pinned by a new regression test so a future parser change in either direction surfaces at the call site.
Add the `pr:` frontmatter field to the plan so the in-flight mapping between plan and pull request is discoverable. Status stays `active` until merge per the existing convention. Condense the CHANGELOG entry to a single assertive sentence per the project's two-rule CHANGELOG convention; the detail lives in the plan and PR body.
Three knowledge-track solution docs derived from this PR's planning, implementation, and review cycle: - best-practices/schema-and-description-prose-are-testable-surface — treat MCP tool schemas, descriptions, and any contract artefact as testable surface. Substring and structural assertions catch a regression class nothing else covers. Triggered by the U4 plan-review exchange that reframed 'documentation deliverable' as testable contract. - best-practices/sibling-code-paths-can-reintroduce-fixed-failure-modes — when a fix establishes a contract on a public surface, every sibling code path that returns through that surface owes the contract. Audit sibling paths against the canonical failure signal and fold the fix into the current PR when cheap; otherwise hazard-pin per the composition-cascades pattern. Triggered by the R4 inbox-branch gap becoming U3 inside the same PR rather than a follow-up. - design-patterns/preserve-branch-canonicalisation-asymmetry — when two sibling fields share three-way preserve/clear/replace semantics but their parsers normalise differently, the preserve branches will diverge in observable behaviour. Pin intentional asymmetry with a regression test and an in-tree comment. Triggered by the correctness review of update_pattern's language preserve branch versus the tags precedent. All three cross-reference adjacent prior docs (slice-shape-tests, composition-cascades, round-trip-discriminator) and demarcate their distinct angle.
attila
added a commit
that referenced
this pull request
May 19, 2026
* doc: mark merged plans complete with PR links Four plan docs were left with `status: active` after their PRs merged. Flip to `status: complete`, add `completed:` dates matching merge dates, and link the `pr:` URL — matching the frontmatter shape used by other completed plans that carry a PR link. - coverage-check skill (#32, merged 2026-04-08) - language-detection architecture (#50, merged 2026-05-14) - track-2 observability (#59, merged 2026-05-16) - mcp language arg (#63, merged 2026-05-19) * chore(release): cut v0.4.0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the MCP authoring-surface gap from PR #50:
add_patternandupdate_patternnow accept an optionallanguageargument (scalar or array), write a canonicallanguage: [...]frontmatter line, and surface unknown tokens via stderr and a newlanguage_warningsarray on thelore-metadatafence.update_patternmirrorstags's three-way semantics (omit preserves,[]clears, non-empty replaces).append_to_patterndoes not acceptlanguageby design — appends are body-only; the schema test pins this and fails fast if reverted.Closes the inbox-branch advisory blackout in the same diff — the short-circuit now invokes
parse_frontmatter_language_listdirectly so agent-submitted patterns get the same warning surface as locally-indexed ones.Plan:
docs/plans/2026-05-19-001-feat-mcp-language-arg-plan.md.Test plan
just cipasses locally (fmt + clippy + 777 lib tests + 10 integration tests + deny + doc)lore serveover JSON-RPC from an isolated XDG environmentlanguage: "rust"→ file written withlanguage: [rust], fencelanguage_warnings: []language: ["java","kotlin","groovy"]→ order preserved["rust","objectiv-c"]→ both surfaces fire (stderrWarning:line + fencelanguage_warnings: ["objectiv-c"]); file still written (warn-and-proceed)update_patternbody-only againstlanguage: [swift, objectivec]→ frontmatter preserved verbatimtools/listconfirmslanguageabsent onappend_to_pattern, present on the other twolore statuspost-UAT reflects all writes in theLanguages:breakdown