Refactor EPUB generation: modularize code and add image asset support #27

Merged
kevincarlson merged 2 commits into master from claude/fix-epub-export-pipeline-pg5lF on Mar 23, 2026
@kevincarlson

Member

Summary

This PR refactors the EPUB generation logic in epub-logic by extracting large monolithic functions into focused modules, and adds support for embedding images decoded from data-URI src attributes.

Key Changes

  • Modularized code structure: Split lib.rs into focused modules:

    • conversion.rs: TiptapNode-to-Block conversion and data-URI image extraction
    • html.rs: Block and inline rendering to XHTML with proper XML escaping
    • css.rs: CSS generation from styles and fonts
    • opf.rs: OPF 3.0 package document generation
    • nav.rs: EPUB 3 Navigation Document generation
    • table.rs: Table rendering with header/body separation
  • Image asset support:

    • Added ImageAsset struct to represent decoded images with filename, raw bytes, and media type
    • Implemented extract_images_from_blocks() to walk the block tree and decode base64 data-URI images
    • Images are automatically embedded in the EPUB and referenced in the OPF manifest
    • File-path and URL image sources are preserved as-is (not embedded)
  • Improved XML/XHTML safety:

    • Centralized escape_xml() function for consistent escaping of text content, attributes, and URLs
    • Applied escaping to section titles, metadata fields, and all user-provided content
  • Enhanced section titles:

    • Replaced generic "Section N" titles with extract_section_title() that derives titles from heading blocks when available
  • Metadata enrichment (G11, G12):

    • Added support for creation_date, description, and subject fields in OPF metadata
  • CSS improvements (G10):

    • Extended ODF-to-CSS property mapping with additional typography and styling properties
    • Added support for style:font-name as alternate ODF font property
  • Updated exports: Modified export.rs to pass an empty image vector to from_tiptap() (callers can supply pre-loaded images instead)
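
The centralized escaping described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the signature of `escape_xml()` is assumed (the description only names the function), and `render_link` is a hypothetical helper added here to show the function applied to both an attribute value and text content.

```rust
/// Escape the five XML special characters so the result is safe in element
/// text and in quoted attribute values.
/// Assumed signature: the PR names `escape_xml()` but does not show it.
pub fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical helper (not from the PR): render a link with both the href
/// attribute and the visible text passed through `escape_xml`.
pub fn render_link(href: &str, text: &str) -> String {
    format!("<a href=\"{}\">{}</a>", escape_xml(href), escape_xml(text))
}
```

Routing every user-provided string through one function keeps the escaping rules in a single place, which is presumably the motivation for centralizing it.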

Implementation Details

  • All rendering functions now receive styles and images parameters for context-aware output
  • Table rendering places header rows (rows whose cells are all TableHeader) in <thead> and the remaining rows in <tbody>
  • Inline style attributes are built from BlockAttrs (text alignment, indentation) and applied to block elements
  • Character-level styles are rendered as CSS classes on <span> elements
  • Data-URI parsing validates base64 encoding and maps MIME types to file extensions (png, jpg, gif, webp, svg, bmp)
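
As an illustration of the data-URI handling in the last bullet, here is a standard-library-only sketch. All names (`DataUriParts`, `extension_for`, `looks_like_base64`, `parse_image_data_uri`) are hypothetical, and the exact MIME strings are assumptions based on the extensions listed above. The sketch only checks the shape of the payload; the actual decoding in the PR is done with the base64 dependency, which is not reproduced here.

```rust
/// Pieces of a base64 image data URI (hypothetical type, not from the PR).
#[derive(Debug, PartialEq)]
pub struct DataUriParts<'a> {
    pub media_type: &'a str,
    pub extension: &'static str,
    pub payload: &'a str,
}

/// Map an image MIME type to a file extension. The MIME strings are assumed;
/// the PR only lists the extensions (png, jpg, gif, webp, svg, bmp).
fn extension_for(media_type: &str) -> Option<&'static str> {
    match media_type {
        "image/png" => Some("png"),
        "image/jpeg" => Some("jpg"),
        "image/gif" => Some("gif"),
        "image/webp" => Some("webp"),
        "image/svg+xml" => Some("svg"),
        "image/bmp" => Some("bmp"),
        _ => None,
    }
}

/// Cheap shape check for standard padded base64: length is a multiple of 4,
/// at most two trailing '=' characters, and only alphabet characters before.
fn looks_like_base64(payload: &str) -> bool {
    if payload.is_empty() || payload.len() % 4 != 0 {
        return false;
    }
    let body = payload.trim_end_matches('=');
    let padding = payload.len() - body.len();
    padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}

/// Parse `data:<mime>;base64,<payload>`. Returns None for file paths, URLs,
/// unsupported MIME types, extra parameters, or malformed payloads.
pub fn parse_image_data_uri(src: &str) -> Option<DataUriParts<'_>> {
    let rest = src.strip_prefix("data:")?;
    let (header, payload) = rest.split_once(',')?;
    let media_type = header.strip_suffix(";base64")?;
    let extension = extension_for(media_type)?;
    if !looks_like_base64(payload) {
        return None;
    }
    Some(DataUriParts { media_type, extension, payload })
}
```

Returning `None` for anything that is not a recognized base64 image data URI matches the behavior described earlier, where file-path and URL sources are left as-is rather than embedded.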

https://claude.ai/code/session_01QXTY8ndVt2UpgKFqcJodX5

claude added 2 commits March 22, 2026 13:39
… audit gaps

- Split epub-logic/src/lib.rs (609 lines) into six focused submodules
  (conversion, html, css, nav, opf, table) so every file is ≤ 300 lines
- G1: Block::Image now renders <img src alt title/> and embeds data-URI
  images in OEBPS/Images/ via base64 decoding (new base64 dep)
- G2: Block::Table/TableRow/TableHeader/TableCell now render <table>,
  <thead>/<tbody>, <tr>, <th colspan rowspan>, <td colspan rowspan>
- G3: Inline text content is XML-escaped before HTML wrapping
- G4: Link href values are XML-escaped
- G5: Section <title> content is XML-escaped
- G6/G7: BlockAttrs.text_align and indent produce inline style= attributes
  on <p> and <h*> elements
- G8: TiptapMark::NamedSpanStyle wraps text in <span class="style-...">
- G9: Inline::Text.style_name wraps text in <span class="style-...">
- G10: fo:color, fo:background-color, style:font-name, fo:text-decoration,
  fo:letter-spacing, fo:font-variant and border/padding properties added to
  ODF→CSS mapping
- G11: metadata.creation_date emitted as <dc:date> in OPF
- G12: metadata.description and metadata.subject emitted in OPF
- G13: Embedded images listed in OPF manifest under OEBPS/Images/
- G14: 609-line file size violation resolved by submodule split
- Section titles extracted from first heading block instead of generic "Section N"
- EpubDocument gains images: Vec<ImageAsset> field; from_tiptap gains
  images parameter for pre-loaded assets from caller
- write_epub_zip writes OEBPS/Images/ directory with embedded image assets
- All 4 existing tests pass; cargo clippy -D warnings clean
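
The <thead>/<tbody> split from G2 can be sketched as follows. This is an assumption-laden illustration rather than the code in table.rs: `Cell`, `Row`, and `render_table` are hypothetical stand-ins for the crate's Block::Table types, colspan/rowspan are omitted, cell text is assumed to be XML-escaped already, and only leading all-header rows are treated as the header (the PR does not say how a header row in the middle of a table is handled).

```rust
/// Simplified stand-in for the crate's table cell blocks (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    Header(String),
    Data(String),
}

pub type Row = Vec<Cell>;

/// A row counts as a header row when it is non-empty and every cell is a
/// header cell.
fn is_header_row(row: &[Cell]) -> bool {
    !row.is_empty() && row.iter().all(|c| matches!(c, Cell::Header(_)))
}

/// Render one row; cell text is assumed to be escaped by the caller.
fn render_row(row: &[Cell]) -> String {
    let cells: String = row
        .iter()
        .map(|c| match c {
            Cell::Header(t) => format!("<th>{}</th>", t),
            Cell::Data(t) => format!("<td>{}</td>", t),
        })
        .collect();
    format!("<tr>{}</tr>", cells)
}

/// Put the leading run of header rows in <thead> and the rest in <tbody>.
/// Empty groups are skipped so no empty <thead> or <tbody> is emitted.
pub fn render_table(rows: &[Row]) -> String {
    let split = rows.iter().take_while(|row| is_header_row(row)).count();
    let (head, body) = rows.split_at(split);
    let mut out = String::from("<table>");
    if !head.is_empty() {
        out.push_str("<thead>");
        for row in head {
            out.push_str(&render_row(row));
        }
        out.push_str("</thead>");
    }
    if !body.is_empty() {
        out.push_str("<tbody>");
        for row in body {
            out.push_str(&render_row(row));
        }
        out.push_str("</tbody>");
    }
    out.push_str("</table>");
    out
}
```

Skipping empty groups matters for XHTML validity, since a <thead> or <tbody> element with no rows is not allowed by the content model.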

https://claude.ai/code/session_01QXTY8ndVt2UpgKFqcJodX5
Formatting-only changes to satisfy the cargo fmt --all -- --check CI gate.
No logic changes.

https://claude.ai/code/session_01QXTY8ndVt2UpgKFqcJodX5
@kevincarlson kevincarlson merged commit 15b55b5 into master Mar 23, 2026
12 checks passed
@kevincarlson kevincarlson deleted the claude/fix-epub-export-pipeline-pg5lF branch March 23, 2026 06:57
@AppThere AppThere locked as resolved and limited conversation to collaborators Mar 23, 2026

2 participants