fix/encoding: Convert MEI5 files to NEON-compatible CDN format #27
Open
kyrieb-ekat wants to merge 5 commits into
ET.Element() copies tag and attributes but not .text or .tail, so all text content (including mei:l OCR lines) was silently dropped. Also fix root element losing its attributes (meiversion etc.), update output filename to match the existing mei5 naming convention, and rewrite liberbatch.py to process all three subdirectories and write output to the correct Liber Usualis - mei5/ directories. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
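The text-loss bug described in this commit can be reproduced in a few lines. The following is a minimal sketch, not the code from the PR: `copy_element` is a hypothetical helper name, and the sample `<l>` element is made up for illustration.

```python
import xml.etree.ElementTree as ET


def copy_element(src: ET.Element) -> ET.Element:
    """Recursively copy an element, keeping .text and .tail.

    Hypothetical helper for illustration; the PR's actual function may differ.
    """
    dst = ET.Element(src.tag, dict(src.attrib))
    dst.text = src.text  # text before the first child
    dst.tail = src.tail  # text after the element's closing tag
    for child in src:
        dst.append(copy_element(child))
    return dst


# Made-up sample resembling an OCR text line with an inline child.
src = ET.fromstring('<l n="1">Kyrie <syl>e</syl>leison</l>')

naive = ET.Element(src.tag, src.attrib)  # tag and attributes only
full = copy_element(src)

print(naive.text)       # None: the text content is gone
print(repr(full.text))  # 'Kyrie '
```

Copying `.tail` matters as much as `.text`: in ElementTree, text that follows a child element is stored on that child's `.tail`, not on the parent.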
First-pass script applying attribute fixes to bring MEI5 files closer to CDN/NEON conventions: namespace correction, facs hash-prefixing, staffDef notationtype/clef attrs, facsimile type, surface lrx/lry from IIIF images, pb facs from surface id, neume/nc facs reassignment, mdiv type removal. Does not restructure multi-staff layout — superseded by fixmei5.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
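Two of the attribute-level fixes listed in this commit, namespace correction and `facs` hash-prefixing, could look roughly like the sketch below. It rests on assumptions the commit message does not confirm: that the affected elements carry no namespace at all before the fix, and that a `facs` value may hold several space-separated references. The function names `fix_namespace` and `fix_facs` are hypothetical.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"
# Element names taken from the fix list in this PR.
NEUME_TAGS = {"syllable", "neume", "nc", "divLine", "episema", "syl", "liquescent"}


def fix_namespace(root: ET.Element) -> int:
    """Move bare (un-namespaced) neume elements into the MEI namespace."""
    moved = 0
    for el in root.iter():
        if el.tag in NEUME_TAGS:  # a bare local name has no "{ns}" prefix
            el.tag = f"{{{MEI_NS}}}{el.tag}"
            moved += 1
    return moved


def fix_facs(root: ET.Element) -> int:
    """Prepend '#' to every facs reference that lacks it."""
    fixed = 0
    for el in root.iter():
        facs = el.get("facs")
        if not facs:
            continue
        tokens = [t if t.startswith("#") else "#" + t for t in facs.split()]
        new_value = " ".join(tokens)
        if new_value != facs:
            el.set("facs", new_value)
            fixed += 1
    return fixed


# Made-up input: a namespaced root with un-namespaced neume content.
root = ET.Element(f"{{{MEI_NS}}}mei")
neume = ET.SubElement(root, "neume", {"facs": "zone-1"})
nc = ET.SubElement(neume, "nc", {"facs": "#zone-2"})

moved = fix_namespace(root)
fixed = fix_facs(root)
```

Both functions are idempotent, so running them again over already-converted files changes nothing.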
Complete rewrite of MEI5 files to match the CDN/DDMAL MEI_encoding.py output format accepted by NEON (mei-Neumes2.rng schema). Applies 12 fixes:

 1. Namespace: syllable/neume/nc/divLine/episema/syl/liquescent -> MEI NS
 2. facs values: prepend # where missing
 3. Root <mei>: meiversion="5.1"; xml-model PIs inserted
 4. <staffDef>: notationtype, lines, clef.shape, clef.line from first clef
 5. <facsimile>: type="transcription"
 6. <surface>: lrx/lry from IIIF JPEG via PIL; <graphic> child removed
 7. <pb>: n attribute replaced with facs="#<surface-id>"; moved into layer
 8. <neume>: type and facs attributes removed
 9. <nc>: neume's facs assigned to first nc child only
10. <mdiv>: type attribute removed
11. signifLeft (typo in liberupdatev5.py) -> signifLet (correct schema name)
12. <episema>: startid/endid pointer attrs stripped (invalid in MEI5 inline)

Key structural fix: merges multi-staff/layer layout (one <staff> per system) into a single <staff><layer> with <sb> system breaks, matching CDN format.

Confirmed working: files 0781 and 0782 load correctly in NEON with full image overlay and neume rendering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
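The structural merge described in this commit can be sketched as follows. This is an illustration under stated assumptions rather than the logic of `fixmei5.py`: it assumes an `<sb>` is emitted before every system's content (including the first) and that each original staff's `facs` is carried over to its `<sb>`. The function name `merge_staves` and the sample input are hypothetical.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"


def q(name: str) -> str:
    """Qualify a local name with the MEI namespace."""
    return f"{{{MEI_NS}}}{name}"


def merge_staves(section: ET.Element):
    """Collapse one-<staff>-per-system into a single <staff><layer>.

    Assumption: each system starts with an <sb> that inherits the old
    staff's facs attribute.
    """
    staves = section.findall(q("staff"))
    if not staves:
        return None
    merged_staff = ET.Element(q("staff"), {"n": "1"})
    merged_layer = ET.SubElement(merged_staff, q("layer"), {"n": "1"})
    for i, staff in enumerate(staves, start=1):
        sb = ET.SubElement(merged_layer, q("sb"), {"n": str(i)})
        if staff.get("facs"):
            sb.set("facs", staff.get("facs"))
        for layer in staff.findall(q("layer")):
            merged_layer.extend(list(layer))  # keep document order
    # Put the merged staff where the first original staff used to be.
    position = list(section).index(staves[0])
    for staff in staves:
        section.remove(staff)
    section.insert(position, merged_staff)
    return merged_staff


# Made-up two-system input.
section = ET.fromstring(
    f'<section xmlns="{MEI_NS}">'
    '<staff n="1" facs="#z1"><layer><syllable xml:id="s1"/></layer></staff>'
    '<staff n="2" facs="#z2"><layer><syllable xml:id="s2"/></layer></staff>'
    "</section>"
)
merged = merge_staves(section)
layer_tags = [el.tag.split("}")[1] for el in merged[0]]
print(layer_tags)  # ['sb', 'syllable', 'sb', 'syllable']
```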
Reprocess all 777 MEI5 files in the 0781-1560 subdirectory through fixmei5.py, converting them to CDN-format files accepted by NEON. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reprocess all 776 MEI5 files in the 1561-2340 subdirectory through fixmei5.py, converting them to CDN-format files accepted by NEON. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `fix_mei_encoding.py`: initial attribute-level fix script (documented intermediate step)
- `fixmei5.py`: full structural rewrite of MEI5 files to match the CDN/DDMAL `MEI_encoding.py` output format accepted by NEON
- `0781-1560` and `1561-2340` subdirectories reprocessed through `fixmei5.py`

What fixmei5.py does (12 fixes)

1. `syllable`, `neume`, `nc`, `divLine`, `episema`, `syl`, `liquescent` moved into MEI namespace
2. `#` prepended to `facs` values where missing
3. `<mei>`: `meiversion="5.1"`; `<?xml-model?>` PIs inserted
4. `<staffDef>`: `notationtype`, `lines`, `clef.shape`, `clef.line` set from first clef in file
5. `<facsimile>`: `type="transcription"` added
6. `<surface>`: `lrx`/`lry` set from IIIF JPEG dimensions via PIL; `<graphic>` child removed
7. `<pb>`: `n` attribute replaced with `facs="#<surface-id>"`; moved inside `<layer>`
8. `<neume>`: `type` and `facs` attributes removed
9. `<nc>`: neume zone facs assigned to first `<nc>` child only
10. `<mdiv>`: `type` attribute removed
11. `signifLeft` to `signifLet`: fixes typo introduced by `liberupdatev5.py` (schema uses `signifLet`, line 15141 of `mei-Neumes2.rng`)
12. `<episema>`: `startid`/`endid` pointer attributes stripped (invalid for inline MEI5 episema)

Key structural fix: Merges one-staff-per-system layout into a single `<staff><layer>` with `<sb>` system breaks, matching CDN format exactly.

Test plan

- `0001-0780` subdirectory (no music content, not reprocessed)

Generated with Claude Code