fix/encoding: Convert MEI5 files to NEON-compatible CDN format #27

Open
kyrieb-ekat wants to merge 5 commits into main from fix/encoding

@kyrieb-ekat

Summary

  • Adds fix_mei_encoding.py: initial attribute-level fix script (documented intermediate step)
  • Adds fixmei5.py: full structural rewrite of MEI5 files to match the CDN/DDMAL MEI_encoding.py output format accepted by NEON
  • Applies fixes to all 1553 MEI5 files in the 0781-1560 and 1561-2340 subdirectories

What fixmei5.py does (12 fixes)

  1. Namespace -- syllable, neume, nc, divLine, episema, syl, liquescent moved into MEI namespace
  2. facs values -- # prepended where missing
  3. Root <mei> -- meiversion="5.1"; <?xml-model?> PIs inserted
  4. <staffDef> -- notationtype, lines, clef.shape, clef.line set from first clef in file
  5. <facsimile> -- type="transcription" added
  6. <surface> -- lrx/lry set from IIIF JPEG dimensions via PIL; <graphic> child removed
  7. <pb> -- n attribute replaced with facs="#<surface-id>"; moved inside <layer>
  8. <neume> -- type and facs attributes removed
  9. <nc> -- neume zone facs assigned to first <nc> child only
  10. <mdiv> -- type attribute removed
  11. signifLeft to signifLet -- fixes typo introduced by liberupdatev5.py (schema uses signifLet, line 15141 of mei-Neumes2.rng)
  12. <episema> -- startid/endid pointer attributes stripped (invalid for inline MEI5 episema)
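As an illustration only, several of the attribute-level fixes listed above (1, 2, 8, 9 and 12) might be sketched with `xml.etree.ElementTree` as follows. This is not the code from fixmei5.py: the function names `local_name` and `apply_fixes` are hypothetical, and the choice to overwrite any existing `facs` on the first `<nc>` is an assumption.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"
# Elements that fix 1 moves into the MEI namespace.
NEUME_TAGS = {"syllable", "neume", "nc", "divLine", "episema", "syl", "liquescent"}


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def apply_fixes(root):
    """Hypothetical sketch of fixes 1, 2, 8, 9 and 12 (not the PR's code)."""
    for elem in root.iter():
        name = local_name(elem.tag)
        # Fix 1: put neume-related elements into the MEI namespace.
        if name in NEUME_TAGS:
            elem.tag = f"{{{MEI_NS}}}{name}"
        # Fix 2: prepend '#' to every facs reference that lacks it.
        facs = elem.get("facs")
        if facs:
            fixed = " ".join(t if t.startswith("#") else "#" + t for t in facs.split())
            elem.set("facs", fixed)
        # Fix 12: strip pointer attributes from <episema>.
        if name == "episema":
            elem.attrib.pop("startid", None)
            elem.attrib.pop("endid", None)

    for neume in root.iter(f"{{{MEI_NS}}}neume"):
        # Fix 8: remove type and facs from <neume>.
        zone = neume.attrib.pop("facs", None)
        neume.attrib.pop("type", None)
        # Fix 9: give the neume's zone to its first <nc> child only
        # (assumption: any existing facs on that <nc> is overwritten).
        first_nc = neume.find(f"{{{MEI_NS}}}nc")
        if zone is not None and first_nc is not None:
            first_nc.set("facs", zone)
    return root
```

The second loop runs after the first so that the zone reference handed to `<nc>` already carries its `#` prefix.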

Key structural fix: Merges one-staff-per-system layout into a single <staff><layer> with <sb> system breaks, matching CDN format exactly.
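The structural merge described above could look roughly like the sketch below. It is a minimal illustration, not the implementation in fixmei5.py: the helper names `q` and `merge_staves` are hypothetical, and placing each `<sb>` before its system's content with a running `n` value is an assumption about the target format.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"


def q(name):
    """Qualify a local name with the MEI namespace."""
    return f"{{{MEI_NS}}}{name}"


def merge_staves(section):
    """Collapse one-<staff>-per-system into a single <staff><layer>.

    Hypothetical sketch: each original staff becomes an <sb> followed by
    the children of its layer(s), in document order.
    """
    staves = section.findall(q("staff"))
    if not staves:
        return section

    # Remember where the first staff sat so the merged staff replaces it.
    insert_at = list(section).index(staves[0])

    merged_staff = ET.Element(q("staff"), {"n": "1"})
    merged_layer = ET.SubElement(merged_staff, q("layer"), {"n": "1"})

    for system_no, staff in enumerate(staves, start=1):
        # Assumption: a system break precedes each system's content.
        ET.SubElement(merged_layer, q("sb"), {"n": str(system_no)})
        for layer in staff.findall(q("layer")):
            merged_layer.extend(list(layer))
        section.remove(staff)

    section.insert(insert_at, merged_staff)
    return section
```

Building a fresh `<staff>` and moving children into it, rather than editing the first staff in place, keeps the original per-system staves untouched until they are removed.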

Test plan

  • Pages 0781 and 0782 confirmed loading in NEON with full image overlay and neume rendering
  • Spot-check additional pages from both subdirectories in NEON
  • Verify pages in 0001-0780 subdirectory (no music content -- not reprocessed)

Generated with Claude Code

kyrieb-ekat and others added 5 commits May 27, 2026 15:07
ET.Element() copies tag and attributes but not .text or .tail, so all
text content (including mei:l OCR lines) was silently dropped. Also fix
root element losing its attributes (meiversion etc.), update output
filename to match the existing mei5 naming convention, and rewrite
liberbatch.py to process all three subdirectories and write output to
the correct Liber Usualis - mei5/ directories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
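The `.text`/`.tail` pitfall described in the commit message above can be reproduced and avoided with a small sketch like this one. It is illustrative only and not the code from the commit; `clone_tree` and its `retag` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET


def clone_tree(src, retag=lambda tag: tag):
    """Rebuild an element tree, optionally renaming tags.

    ET.Element(tag, attrib) sets only the tag and attributes, so .text and
    .tail must be copied explicitly or text content is silently lost.
    """
    dst = ET.Element(retag(src.tag), dict(src.attrib))
    dst.text = src.text
    dst.tail = src.tail
    for child in src:
        dst.append(clone_tree(child, retag))
    return dst
```

Because the root goes through the same function as every other element, its attributes (such as `meiversion`) are carried over along with the rest of the tree.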
First-pass script applying attribute fixes to bring MEI5 files closer to
CDN/NEON conventions: namespace correction, facs hash-prefixing, staffDef
notationtype/clef attrs, facsimile type, surface lrx/lry from IIIF images,
pb facs from surface id, neume/nc facs reassignment, mdiv type removal.
Does not restructure multi-staff layout — superseded by fixmei5.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Complete rewrite of MEI5 files to match the CDN/DDMAL MEI_encoding.py
output format accepted by NEON (mei-Neumes2.rng schema). Applies 12 fixes:

1.  Namespace: syllable/neume/nc/divLine/episema/syl/liquescent -> MEI NS
2.  facs values: prepend # where missing
3.  Root <mei>: meiversion="5.1"; xml-model PIs inserted
4.  <staffDef>: notationtype, lines, clef.shape, clef.line from first clef
5.  <facsimile>: type="transcription"
6.  <surface>: lrx/lry from IIIF JPEG via PIL; <graphic> child removed
7.  <pb>: n attribute replaced with facs="#<surface-id>"; moved into layer
8.  <neume>: type and facs attributes removed
9.  <nc>: neume's facs assigned to first nc child only
10. <mdiv>: type attribute removed
11. signifLeft (typo in liberupdatev5.py) -> signifLet (correct schema name)
12. <episema>: startid/endid pointer attrs stripped (invalid in MEI5 inline)

Key structural fix: merges multi-staff/layer layout (one <staff> per system)
into a single <staff><layer> with <sb> system breaks, matching CDN format.

Confirmed working: files 0781 and 0782 load correctly in NEON with full
image overlay and neume rendering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reprocess all 777 MEI5 files in the 0781-1560 subdirectory through
fixmei5.py, converting them to CDN-format files accepted by NEON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reprocess all 776 MEI5 files in the 1561-2340 subdirectory through
fixmei5.py, converting them to CDN-format files accepted by NEON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>