[MC-464-A] Consolidate export canonical lexeme selection #657
Merged
MC Task: MC-464 — Consolidate survey-overlapping concepts on export / Lane MC-464-A
Fixes #654.
Current `consolidate:true` findings
- `export_complete_lingpy_dataset` already accepts `consolidate` and forwards it to both `export_lingpy_tsv` and `export_nexus`; `conceptTag` also implies consolidation.
- `compare.consolidated_matrix`, collapses only byte-identical committed cognate partitions, and keeps differing partitions as suffixed `gloss#id` characters with warnings.
- `(meaning, doculect)` by earliest timestamp only; it did not honor `manual_overrides.canonical_lexemes` when both survey forms existed for the same speaker.
Change
- `manual_overrides.canonical_lexemes` into consolidated LingPy wordlist generation.
- `(meaning, doculect)`, prefer the explicitly selected canonical CSV row; fall back to the existing earliest-start behavior when no selection exists.
- `gloss#id` blocks, a disagreeing meaning stays split with warnings, and the canonical lexeme selection chooses the selected form.
Validation
- `PYTHONPATH=python python3 -m pytest python/ai/tools/test_consolidated_export_wiring.py::test_complete_export_consolidate_roundtrip_collapses_n_not_2n -q` → `1 passed in 0.05s`
- `PYTHONPATH=python python3 -m pytest python/ai/tools/test_consolidated_export_wiring.py python/ai/test_workflow_tools.py -q` → `18 passed in 0.08s`
- `PYTHONPATH=python python3 -m pytest python/ -q` → `1926 passed, 6 skipped, 1 warning, 3 subtests passed in 34.45s`
- `uvx ruff check python/ --select E9,F63,F7,F82` → `All checks passed!`
- `git diff --check` → clean

No live workspace, browser, parse-run, npm, or dev server used.
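
For reviewers, the selection rule described under Change can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `select_canonical_rows`, the row fields (`row_id`, `meaning`, `doculect`, `start`), and the shape of the override mapping (keyed by `(meaning, doculect)`, valued by a row id) are all assumptions made for the example. The real layout of `manual_overrides.canonical_lexemes` may differ.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (meaning, doculect)


def select_canonical_rows(
    rows: Iterable[dict],
    canonical_lexemes: Dict[Key, str],
) -> Dict[Key, dict]:
    """Pick one row per (meaning, doculect).

    Hypothetical sketch: prefer the explicitly selected row, otherwise
    fall back to the row with the earliest start time.
    """
    grouped: Dict[Key, List[dict]] = {}
    for row in rows:
        grouped.setdefault((row["meaning"], row["doculect"]), []).append(row)

    selected: Dict[Key, dict] = {}
    for key, candidates in grouped.items():
        wanted = canonical_lexemes.get(key)
        chosen = None
        if wanted is not None:
            # Use the manually selected row if it is among the candidates.
            chosen = next((r for r in candidates if r["row_id"] == wanted), None)
        if chosen is None:
            # No selection (or, by assumption here, a selection that matches
            # no candidate): keep the earliest-start behavior.
            chosen = min(candidates, key=lambda r: r["start"])
        selected[key] = chosen
    return selected


# Illustrative data: two survey forms per meaning for the same speaker.
example_rows = [
    {"row_id": "a", "meaning": "water", "doculect": "spk1", "start": 1.0},
    {"row_id": "b", "meaning": "water", "doculect": "spk1", "start": 5.0},
    {"row_id": "c", "meaning": "fire", "doculect": "spk1", "start": 3.0},
    {"row_id": "d", "meaning": "fire", "doculect": "spk1", "start": 2.0},
]
example_overrides = {("water", "spk1"): "b"}
result = select_canonical_rows(example_rows, example_overrides)
```

In this example, `water` resolves to the manually selected row `b` even though row `a` starts earlier, while `fire` has no override and resolves to row `d` by earliest start.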