Unified mapper data model PR2a: write-side wiring + AI Assistant payload v2 + match_type removal by paynejd · Pull Request #17 · OpenConceptLab/oclmap

paynejd · 2026-05-11T10:02:05Z

⚠️ Deploy to STAGING ONLY before prod

This PR contains the bridge-recommendation bug fix on the AI Assistant payload side, but the bridge code path can't be exercised locally (the bridge module is closed-source — BridgeMatchStub.jsx returns canBridge: () => false in OSS builds). Please deploy to staging and run a full bridge-using project end-to-end before promoting to prod.

Option A is additive, so nothing user-visible breaks even if the bridge wiring has a bug: the new normalized state is dark-launched and the legacy candidates field stays in the AI payload. Staging verification is still the only way to confirm that the new dark-launched paths are correct before PR2b flips reads onto them.

Summary

PR2a of the row-scoped, canonical-URL-identified candidate/concept refactor for OCL Mapper. Spec: unified-mapper-model.md.

References OpenConceptLab/ocl_issues#2337 (does not yet close — PR2a of 4: PR1 merged as #16, PR2b and PR3 still pending).

What's in this PR

Four commits, dark-launched (UNIFIED_MODEL_ENABLED = false stays default-OFF):

1d0dfec — match_type removal (-95 LOC). Removes the legacy 5-bucket match_type enum (very_high/high/medium/low/no_match), matchTypes state, selectedMatchBucket filter UI, updateMatchTypeCounts, onMatchTypeChange, and the Auto Match Switch + count Badge. The 3-bucket score grouping (recommended/available/low_ranked) driven by candidatesScore thresholds — already used by ScoreBucketButton — is the replacement. setStateViews's auto-match path is converted from match_type === 'very_high' to search_normalized_score >= candidatesScore.recommended, matching the existing setAutoMatched threshold. __match_type__ and __Match Type__ stay in the save-format omit lists as defensive cleanup against legacy data.
0b881ef — bridge wiring. New CONCEPT_IDENTITY_BY_TYPE map in algorithms.jsx as single source of truth (covers ocl-search, ocl-semantic, ocl-bridge, ocl-ciel-bridge). getAlgoDef injects concept_identity from the map when an algo (typically API-loaded bridge variants) is missing it. buildProjectContext now includes bridge_repo derived from algo.target_repo_url (relative URL → derived canonical_url; PR2b will read explicit canonical from bridge repo metadata once ConfigurationForm carries it). fetchBulkBridgeCandidates callback adds the normalizer + mergeIntoRowMatchState wiring (gated by the flag); the per-row path was already routed through onResponse so it just needed the concept_identity injection.
b29d490 — scispacy wiring. Adds 'ocl-scispacy' to CONCEPT_IDENTITY_BY_TYPE (reference_source: 'fixed', canonical_url: 'http://loinc.org'). The single-row path was already routed through onResponse and the inline fromScispacyResultsToConcepts transform was already there; only the bulk path callback needed normalizer wiring (mirrors the bulk bridge pattern from 0b881ef).
8aafd4e — AI Assistant payload v2 (Option A: additive) [the bridge-recommendation bug fix]. New buildV2RecommendationPayload helper inline before fetchRecommendation. Iterates selectedAlgoIds, runs normalizeAlgorithmInvocation per algo against allCandidates for the row (sourced from legacy state — works regardless of feature flag), aggregates with richer-wins dedup, then projects into:
- target_repo: canonical_url + relative_url + version
- recommendable_concepts: deduped target-repo concepts with per-source evidence[] including bridge provenance via via.bridge_concept_key + map_type
- bridge_context: bridge intermediaries with target_concept_keys pointing back to recommendable entries they justify
fetchRecommendation payload spreads v2 fields alongside the legacy candidates field with payload_version: 'v2' so the prompt template can branch. AICandidatesAnalysis.jsx and the aiCandidateID export read canonical_reference.code first, fall back to legacy concept_id/id. Bridges are now structurally excluded from the recommendation pool — once the prompt template revision lands and reads recommendable_concepts instead of candidates, the bridge-recommendation bug is fixed.
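
The projection described for 8aafd4e can be pictured with a rough sketch like the one below. It is not the actual buildV2RecommendationPayload: the flat entry shape (key, type, code, display, algoId, bridgeConceptKey, mapType) and the function name sketchV2Payload are assumptions made for illustration, and "richer wins" is reduced here to "keep the first non-empty display".

```javascript
// Hypothetical input: flat normalized candidate entries for one row.
// Field names are illustrative assumptions, not the real normalizer output.
function sketchV2Payload(entries, targetRepo) {
  const concepts = new Map(); // concept key -> recommendable concept
  const bridges = new Map();  // bridge concept key -> bridge context entry

  for (const e of entries) {
    if (e.type === 'bridge') {
      // Bridge intermediaries are context only; they never enter the recommendable pool.
      if (!bridges.has(e.key)) {
        bridges.set(e.key, { key: e.key, display: e.display, target_concept_keys: [] });
      }
      continue;
    }
    const concept = concepts.get(e.key) ||
      { key: e.key, code: e.code, display: e.display, evidence: [] };
    // Simplified "richer wins": fill in a display if the stored one is empty.
    if (!concept.display && e.display) concept.display = e.display;
    const evidence = { algo_id: e.algoId, candidate_type: e.type };
    if (e.type === 'bridge_child') {
      evidence.via = { bridge_concept_key: e.bridgeConceptKey, map_type: e.mapType };
    }
    concept.evidence.push(evidence);
    concepts.set(e.key, concept);
  }

  // Second pass: point each bridge at the target concepts it justifies.
  for (const c of concepts.values()) {
    for (const ev of c.evidence) {
      const bridge = ev.via && bridges.get(ev.via.bridge_concept_key);
      if (bridge && !bridge.target_concept_keys.includes(c.key)) {
        bridge.target_concept_keys.push(c.key);
      }
    }
  }

  return {
    payload_version: 'v2',
    target_repo: targetRepo,
    recommendable_concepts: [...concepts.values()],
    bridge_context: [...bridges.values()],
  };
}
```

Linking bridges to targets in a second pass keeps the result independent of the order in which algorithms returned their candidates.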

Verification done locally

npm test → 26/26 normalizer unit tests passing (covers concept_identity resolution for all three reference_source modes, bridge cascade fan-out + intra-invocation dedup, scispacy partial → richer dedup, multi-algo convergence)
npm run eslint → clean
NODE_ENV=production npm run build → webpack compiled (only pre-existing bundle-size warnings)
Smoke test against prod via local dev server: project loads, candidates retrieve (ocl-search, ocl-semantic, scispacy), AI Recommendation fires successfully, v2 fields present in payload (payload_version: 'v2', target_repo, recommendable_concepts correctly deduped, bridge_context: [] since bridge module unavailable in OSS build), recommendable_concepts.length ≤ legacy candidates.length confirming dedup, current prompt template still works (Option A's safety net), recommendation displays normally in UI
Match-type removal smoke: Auto Match Switch + count Badge are gone; ScoreBucketButton works; Score badges render with bucket-derived color

Verification needed in staging (bridge path)

The bridge code path is dark-launched but couldn't be exercised in the OSS dev build. In staging please:

Open a CIEL → LOINC bridge project
Confirm Mapper still works as today with the flag OFF (no user-visible regression — Option A's safety means even if our bridge wiring is wrong, the legacy path is untouched)
Temporarily flip UNIFIED_MODEL_ENABLED = true in MapProject.jsx:116
Run candidates on a row that triggers the bridge algorithm
Inspect rowMatchStateRef.current (e.g. via React DevTools or a temporary console.log in mergeIntoRowMatchState):
- Bridge intermediary should appear as a ConceptRow with reference.url = the bridge repo's canonical URL (e.g. https://CIELterminology.org or the derived https://ns.openconceptlab.org/orgs/CIEL/sources/CIEL/) and a Candidate entry with type: 'bridge'
- Each cascade target should appear as a separate ConceptRow with reference.url = target repo canonical (e.g. http://loinc.org) and Candidate entries with type: 'bridge_child', bridge_concept_key, parent_candidate_id, map_type
- concept_definitions for cascade targets are lookup_status: 'pending' (will be filled by ensureLoaded in PR2b)
Run AI Recommendation on the same row and inspect the payload — bridge_context[] should now be populated with each bridge intermediary and its target_concept_keys; recommendable_concepts[i].evidence[] for cascade targets should include entries with candidate_type: 'bridge_child' and via: { bridge_concept_key, map_type }
Revert the feature flag change before doing anything else

If bridge entries don't appear in rowMatchState or look malformed, the bridge response shape doesn't match what the normalizer expects (the spec's assumption from cascade_target_concept_code/url/name); fix is to extend the cascade extraction in normalizers.js and re-test.
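
For the payload inspection step, a small throwaway checker along these lines may help when pasted into the browser console. It is only a sketch: checkV2BridgePayload is a hypothetical helper, and the key field on recommendable_concepts entries is assumed to be called key (pass a different keyField if the real payload differs).

```javascript
// Returns a list of human-readable problems; an empty list means the
// payload matches the expectations listed in the staging steps.
function checkV2BridgePayload(payload, keyField = 'key') {
  const problems = [];
  const concepts = payload.recommendable_concepts || [];
  const bridges = payload.bridge_context || [];
  const knownKeys = new Set(concepts.map((c) => c[keyField]));

  if (payload.payload_version !== 'v2') problems.push('payload_version is not v2');
  if (bridges.length === 0) problems.push('bridge_context is empty');

  // Every bridge should point back at concepts that are actually recommendable.
  for (const bridge of bridges) {
    for (const k of bridge.target_concept_keys || []) {
      if (!knownKeys.has(k)) problems.push(`bridge target ${k} not in recommendable_concepts`);
    }
  }

  // Every bridge_child evidence entry should carry its provenance.
  for (const c of concepts) {
    for (const ev of c.evidence || []) {
      const hasVia = ev.via && ev.via.bridge_concept_key && ev.via.map_type;
      if (ev.candidate_type === 'bridge_child' && !hasVia) {
        problems.push(`bridge_child evidence on ${c[keyField]} lacks via`);
      }
    }
  }
  return problems;
}
```

An empty bridge_context is expected in OSS builds, so that particular message is only meaningful in staging with the bridge module present.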

Coordination needed in ocl-ai-assistant before PR3

The new payload v2 fields (recommendable_concepts, bridge_context) are not yet covered by the server-side _to_essential field-stripping at services.py:251, which today strips candidates, bridge_candidates, etc. Add the new field names to that allow-list when revising the prompt template; otherwise the new fields skip the stripping and inflate the token count.

The prompt-template revision (also separate scope) should branch on payload_version === 'v2' to read from recommendable_concepts/bridge_context instead of candidates. That's when the bridge-recommendation bug is structurally fixed.
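
The branching logic itself is small. The sketch below expresses it in JavaScript only for consistency with this repo; the real change belongs in ocl-ai-assistant and may be structured quite differently. conceptsForPrompt is a hypothetical name.

```javascript
// Pick the concept pool a prompt should see, depending on payload version.
function conceptsForPrompt(payload) {
  if (payload.payload_version === 'v2' && Array.isArray(payload.recommendable_concepts)) {
    // v2: bridges are context only and never appear in the recommendable pool.
    return {
      concepts: payload.recommendable_concepts,
      bridges: payload.bridge_context || [],
    };
  }
  // Legacy payloads: fall back to the flat candidates field.
  return { concepts: payload.candidates || [], bridges: [] };
}
```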

Deferred to PR2b / PR3

PR2b — read-side flip (Candidates.jsx, Concept.jsx, Score.jsx, MapButton.jsx, setAutoMatched/setStateViews); ensureLoaded over $resolveReference (verified callable from existing APIService — see unified-mapper-model.md status table); MultiAlgoSelector canonical_url field for custom algos; ConfigurationForm namespace + bridge_repos[] UI; flip the feature flag ON.
PR3 — schema-v2 save format with normalizeLegacy.js; remove legacy allCandidates, lookupCandidates/lookupCode; drop the legacy candidates field from the AI payload; drop the concept_id/id fallback shims from the response handler (the request-side and response-side legacy compat get cleaned up together).

Test plan

npm install, npm test, npm run eslint, npm run build all green
Code review the four commits
Deploy to staging only
Bridge flag-on test in staging (steps in "Verification needed in staging" above)
After bridge verification + prompt-template revision lands in ai-assistant: deploy to prod
Don't merge PR2b or change the feature flag default until staging bridge verification passes

🤖 Generated with Claude Code

…es state

The legacy 5-bucket match_type enum (very_high/high/medium/low/no_match) is superseded by the 3-bucket score grouping (recommended/available/low_ranked) already driven by candidatesScore thresholds. Maintaining both invited drift and added surface area for no benefit.

Removed:
- MATCH_TYPES constant in constants.jsx (and orphan AutoMatch/MediumMatch/LowMatch/NoMatch icon imports)
- matchTypes state and selectedMatchBucket state in MapProject.jsx
- updateMatchTypeCounts() and all its call sites
- onMatchTypeChange handler and the selectedMatchBucket filter in getRows
- The Badge + Switch UI for the very_high filter
- showMatchSummary (orphan) and orphan FormControlLabel/Switch/Badge/countBy/sum imports
- match_type read in Score.jsx (color now derived from bucketColor)
- Orphan setMatchTypes call inside setAutoMatched

Refactored:
- setStateViews now derives auto-match decisions from search_meta.search_normalized_score >= candidatesScore.recommended, matching the existing setAutoMatched threshold

The __match_type__ / __Match Type__ entries in the save-format omit lists are kept as defensive cleanup against legacy data.

No behavior change visible to users beyond the removal of the very_high filter Switch (which is replaced by the existing ScoreBucketButton filtering on 'recommended').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
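
As an illustration of the 3-bucket grouping and the auto-match rule this commit settles on, a minimal sketch follows. The threshold values, the available key on candidatesScore, and the 0 to 1 score scale are assumptions; only candidatesScore.recommended and search_meta.search_normalized_score are named in the PR.

```javascript
// Assumed thresholds for illustration; the real values live in the project's candidatesScore.
const candidatesScore = { recommended: 0.9, available: 0.6 };

// Map a normalized score onto one of the three buckets.
function scoreBucket(score, thresholds = candidatesScore) {
  if (typeof score !== 'number' || Number.isNaN(score)) return 'low_ranked';
  if (score >= thresholds.recommended) return 'recommended';
  if (score >= thresholds.available) return 'available';
  return 'low_ranked';
}

// Auto-match uses the same threshold as the 'recommended' bucket,
// so the two can no longer drift apart.
function isAutoMatch(candidate, thresholds = candidatesScore) {
  const score = candidate?.search_meta?.search_normalized_score;
  return scoreBucket(score, thresholds) === 'recommended';
}
```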

…nified-model normalizer

Bridge algorithms (ocl-bridge / ocl-ciel-bridge) come from the OCL Online API and don't carry the concept_identity config the normalizer needs. Their per-row fetch path already routes through onResponse via the fetchBridgeCandidates callback chain, but the bulk fetch path bypasses onResponse with its own callback. Both paths now feed the normalizer when UNIFIED_MODEL_ENABLED is true.

algorithms.jsx:
- New CONCEPT_IDENTITY_BY_TYPE export — single source of truth for per-algo-type concept identity, covering ocl-search, ocl-semantic, ocl-bridge, ocl-ciel-bridge. Bridge entries declare reference_source: 'bridge_repo' for the intermediary plus a cascade_target block that resolves the cascade to target_repo.
- useAlgos now references the map for the inline ocl-search / ocl-semantic concept_identity (no behavior change; deduplication).

MapProject.jsx:
- getAlgoDef injects concept_identity from CONCEPT_IDENTITY_BY_TYPE when the algo (typically API-loaded bridge variants) doesn't carry it.
- buildProjectContext now includes bridge_repo when a bridge algo is selected, derived from algo.target_repo_url. PR2b will read explicit canonical from bridge repo metadata once ConfigurationForm carries it. Reordered so it sits below bridgeAlgo in the component body to satisfy TDZ for the new useCallback dep.
- fetchBulkBridgeCandidates: callback adds the normalizeAlgorithmInvocation + mergeIntoRowMatchState wiring (gated by UNIFIED_MODEL_ENABLED) so the bulk path mirrors what onResponse does on the per-row path.

Feature flag remains OFF by default; this is dark-launch scaffolding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
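
The injection step can be sketched roughly as below. The map contents are partly guessed: the 'target_repo' mode for ocl-search / ocl-semantic and the exact shape of cascade_target are assumptions, and withConceptIdentity is a hypothetical stand-in for the relevant part of getAlgoDef.

```javascript
// Illustrative identity config; only 'bridge_repo' for bridge types is stated in the commit.
const BRIDGE_IDENTITY = {
  reference_source: 'bridge_repo',
  cascade_target: { reference_source: 'target_repo' }, // assumed shape
};

const CONCEPT_IDENTITY_BY_TYPE = {
  'ocl-search': { reference_source: 'target_repo' },   // assumed mode
  'ocl-semantic': { reference_source: 'target_repo' }, // assumed mode
  'ocl-bridge': BRIDGE_IDENTITY,
  'ocl-ciel-bridge': BRIDGE_IDENTITY,
};

// Inject concept_identity only when the algo definition lacks one,
// as is typical for bridge variants loaded from the API.
function withConceptIdentity(algo) {
  if (!algo || algo.concept_identity) return algo;
  const identity = CONCEPT_IDENTITY_BY_TYPE[algo.type];
  return identity ? { ...algo, concept_identity: identity } : algo;
}
```

Returning a copy instead of mutating the API-loaded definition keeps the original object safe to reuse elsewhere in component state.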

… unified-model normalizer

The single-row scispacy path already routed through onResponse and the inline fromScispacyResultsToConcepts transform was already inside onResponse. The two missing pieces:
- concept_identity for ocl-scispacy: added to CONCEPT_IDENTITY_BY_TYPE with reference_source='fixed', canonical_url='http://loinc.org', code_field='id'. getAlgoDef injects it on lookup since the scispacy algo definition comes from the OCL Online API and doesn't carry its own concept_identity.
- Bulk path (fetchBulkScispacyCandidates) had its own callback that bypassed the normalizer. Added the normalizeAlgorithmInvocation + mergeIntoRowMatchState wiring (gated by UNIFIED_MODEL_ENABLED) to mirror what we did for the bulk bridge path in 0b881ef.

Feature flag remains OFF.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
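
The flag-gated bulk wiring used for both the bridge and scispacy bulk paths might look roughly like this. makeBulkCallback is a hypothetical helper, and normalize / merge are stand-ins for normalizeAlgorithmInvocation and mergeIntoRowMatchState, whose real signatures are not shown in this PR description.

```javascript
const UNIFIED_MODEL_ENABLED = false; // default-OFF, as in this PR

// Wrap a legacy bulk handler so the normalized state is also written when the flag is on.
function makeBulkCallback({ legacyHandler, normalize, merge, enabled = UNIFIED_MODEL_ENABLED }) {
  return (response, rowIndex, algo) => {
    // The legacy path always runs, so flag-off behavior is unchanged.
    legacyHandler(response, rowIndex, algo);
    if (!enabled) return;
    // Dark-launched path: normalize the invocation and merge it into row-scoped state.
    merge(rowIndex, normalize(response, algo));
  };
}
```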

…n A: additive)

Add the v2 payload structure alongside the legacy `candidates` field in fetchRecommendation, sourced from allCandidates via the unified-model normalizer (so it works with the feature flag OFF). The current prompt template ignores the new fields and continues working unchanged. Once the prompt template is revised to read recommendable_concepts / bridge_context, the bridge-recommendation bug is fixed structurally: bridges live in bridge_context (never recommendable), and target-repo concepts in recommendable_concepts are deduped across algorithms with per-source evidence.

MapProject.jsx:
- New buildV2RecommendationPayload(rowIndex) helper inline before fetchRecommendation. Iterates selectedAlgoIds, runs normalizeAlgorithmInvocation per algo against allCandidates for the row, aggregates with richer-wins dedup, then projects into:
  - target_repo: from buildProjectContext (canonical_url + version)
  - recommendable_concepts: deduped target-repo concepts with per-source evidence[] including bridge provenance via 'via'
  - bridge_context: bridge intermediaries with target_concept_keys pointing back to recommendable_concepts entries they justify
- fetchRecommendation payload spreads v2 fields when constructible, with payload_version: 'v2' so the prompt template can branch.
- aiCandidateID export now reads canonical_reference.code first, falls back to legacy concept_id (for the period both prompt-template versions may be in flight).

AICandidatesAnalysis.jsx:
- getAlternateIds() and the primary_candidate display read canonical_reference.code first, fall back to legacy concept_id/id.

The legacy `candidates` field, the concept_id fallback shims, and payload_version itself can all be removed in PR3 alongside the other legacy-shape cleanup once the prompt template revision is stable.

Note for ocl-ai-assistant coordination: the server-side _to_essential allow-list at core/prompts/services.py:251 should add 'recommendable_concepts' and 'bridge_context' so the new fields get the same field-stripping pass as 'candidates' / 'bridge_candidates'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
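
The response-side fallback described in this commit amounts to a short lookup chain. A minimal sketch, assuming the candidate object may carry any of the three fields (candidateCode is a hypothetical name, not the actual aiCandidateID export):

```javascript
// Prefer the canonical reference code, then fall back to the legacy identifiers.
function candidateCode(candidate) {
  return candidate?.canonical_reference?.code
    ?? candidate?.concept_id
    ?? candidate?.id
    ?? null;
}
```

Using ?? rather than || means only null or undefined triggers the fallback, so a falsy but present code is still returned as-is.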

paynejd and others added 4 commits May 9, 2026 10:57

paynejd requested a review from snyaggarwal May 11, 2026 10:02

snyaggarwal approved these changes May 11, 2026

snyaggarwal merged commit a0d0185 into main May 11, 2026
1 check passed

paynejd mentioned this pull request May 11, 2026: PR2b: read-side flip + ensureLoaded + UI config + flag ON (#2337) #18 (Open, 14 tasks)

snyaggarwal merged 4 commits into main from issues#2337-pr2a
