[MC-464-A] Consolidate export canonical lexeme selection #657

Merged
TrueNorth49 merged 1 commit into main from feat/export-concept-consolidation on Jun 15, 2026

Conversation

@TarahAssistant

Collaborator

MC Task: MC-464 — Consolidate survey-overlapping concepts on export / Lane MC-464-A

Fixes #654.

Current consolidate:true findings

  • export_complete_lingpy_dataset already accepts consolidate and forwards it to both export_lingpy_tsv and export_nexus; conceptTag also implies consolidation.
  • The low-level consolidated path groups duplicate survey concept IDs through compare.consolidated_matrix, collapses only byte-identical committed cognate partitions, and keeps differing partitions as suffixed gloss#id characters with warnings.
  • Before this PR, the consolidated LingPy TSV de-duplicated one form per (meaning, doculect) by earliest timestamp only; it did not honor manual_overrides.canonical_lexemes when both survey forms existed for the same speaker.
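The collapse-or-split rule described in the findings above can be sketched roughly as follows. This is an illustrative sketch only, not the code in compare.consolidated_matrix: the function name `consolidate`, the `(gloss, concept_id, partition)` tuple shape, and the use of sorted-key JSON to stand in for a "byte-identical" comparison are all assumptions made for the example.

```python
import json
from collections import defaultdict


def consolidate(characters):
    """Group characters by gloss; merge only when every partition is identical.

    `characters` is a list of (gloss, concept_id, partition) tuples, where
    `partition` maps doculect -> cognate-set label (hypothetical shape).
    Returns (columns, warnings); `columns` maps a character name to a partition.
    """
    by_gloss = defaultdict(list)
    for gloss, concept_id, partition in characters:
        by_gloss[gloss].append((concept_id, partition))

    columns, warnings = {}, []
    for gloss, entries in by_gloss.items():
        # Serialize with sorted keys so equal partitions compare byte-for-byte.
        serialized = {json.dumps(p, sort_keys=True) for _, p in entries}
        if len(serialized) == 1:
            # All survey copies agree: emit a single character for the gloss.
            columns[gloss] = entries[0][1]
        else:
            # Partitions disagree: keep one suffixed character per concept ID.
            for concept_id, partition in entries:
                columns[f"{gloss}#{concept_id}"] = partition
            warnings.append(
                f"{gloss}: cognate partitions differ across concept IDs; kept split"
            )
    return columns, warnings
```

With this shape, two surveys that agree on a meaning contribute one character, while a disagreeing meaning contributes one gloss#id character per survey concept plus a warning.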

Change

  • Thread manual_overrides.canonical_lexemes into consolidated LingPy wordlist generation.
  • When multiple survey forms collapse onto one (meaning, doculect), prefer the explicitly selected canonical CSV row; fall back to the existing earliest-start behavior when no selection exists.
  • Add a fixture-only round-trip test for complete export: two safe overlapping meanings collapse to N concept blocks instead of 2N duplicate gloss#id blocks, a disagreeing meaning stays split with warnings, and the canonical lexeme selection chooses the selected form.
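The selection rule in the Change list above (prefer the explicitly selected canonical row, otherwise fall back to the earliest start) might look something like the sketch below. The function name `pick_forms`, the row keys (`meaning`, `doculect`, `csv_row`, `start`, `form`), and the shape of `canonical_lexemes` as a mapping from (meaning, doculect) to a CSV row identifier are assumptions for illustration; the real structure of manual_overrides.canonical_lexemes is not shown in this PR description.

```python
def pick_forms(rows, canonical_lexemes=None):
    """Choose one form per (meaning, doculect).

    `rows` is a list of dicts with keys meaning, doculect, csv_row, start, form.
    `canonical_lexemes` maps (meaning, doculect) -> selected csv_row.
    Both shapes are hypothetical and exist only for this example.
    """
    canonical_lexemes = canonical_lexemes or {}

    # Bucket every candidate form by the cell it would collapse onto.
    grouped = {}
    for row in rows:
        grouped.setdefault((row["meaning"], row["doculect"]), []).append(row)

    chosen = {}
    for key, candidates in grouped.items():
        wanted = canonical_lexemes.get(key)
        # Prefer the explicitly selected row when it is among the candidates.
        selected = [
            r for r in candidates if wanted is not None and r["csv_row"] == wanted
        ]
        # Otherwise fall back to the earliest-start form.
        pool = selected or candidates
        chosen[key] = min(pool, key=lambda r: r["start"])
    return chosen
```

In this sketch a selection that points at a row absent from the candidates is ignored and the earliest-start fallback applies; whether the actual export behaves that way is not stated in the description.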

Validation

  • PYTHONPATH=python python3 -m pytest python/ai/tools/test_consolidated_export_wiring.py::test_complete_export_consolidate_roundtrip_collapses_n_not_2n -q → 1 passed in 0.05s
  • PYTHONPATH=python python3 -m pytest python/ai/tools/test_consolidated_export_wiring.py python/ai/test_workflow_tools.py -q → 18 passed in 0.08s
  • PYTHONPATH=python python3 -m pytest python/ -q → 1926 passed, 6 skipped, 1 warning, 3 subtests passed in 34.45s
  • uvx ruff check python/ --select E9,F63,F7,F82 → All checks passed!
  • git diff --check → clean

No live workspace, browser, parse-run, npm, or dev server used.

@TarahAssistant added the feat (Feature work) and MC-464 (Mission Control MC-464) labels on Jun 15, 2026
@TrueNorth49 merged commit cc4b5f8 into main on Jun 15, 2026
4 checks passed
@TrueNorth49 deleted the feat/export-concept-consolidation branch on June 15, 2026 13:25

Labels

feat (Feature work), MC-464 (Mission Control MC-464)

Development

Successfully merging this pull request may close these issues.

Export duplicates concepts when source surveys overlap (per-survey concept_ids → split, non-comparable cognate characters)

2 participants