Hyphenation detaches punctuation from words in the selection text #428

Description

@neoden

When a paragraph doesn't fit without hyphenation, hyphenate_paragraph splits every token at alphabetic/non-alphabetic boundaries: leading/trailing punctuation, apostrophes and infix hyphens become standalone paragraph boxes. cleanup_paragraph only re-merges the alphabetic segments (that's all hyph_indices covers), so the punctuation stays detached in the page's text boxes.
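To make the splitting step concrete, here is a minimal sketch of cutting a token at alphabetic/non-alphabetic boundaries. It is not the actual hyphenate_paragraph code: the function name split_at_alpha_boundaries is hypothetical, and using char::is_alphabetic as the boundary test is an assumption.

```rust
/// Hypothetical helper (not from the code base): splits a token into
/// maximal runs of alphabetic and non-alphabetic characters.
/// Assumes `char::is_alphabetic` is the boundary test.
fn split_at_alpha_boundaries(token: &str) -> Vec<&str> {
    let mut segments = Vec::new();
    let mut start = 0;
    let mut prev_alpha: Option<bool> = None;
    for (i, c) in token.char_indices() {
        let alpha = c.is_alphabetic();
        // A change of character class closes the current segment.
        if prev_alpha.is_some() && prev_alpha != Some(alpha) {
            segments.push(&token[start..i]);
            start = i;
        }
        prev_alpha = Some(alpha);
    }
    // Push the final segment, if the token was not empty.
    if start < token.len() {
        segments.push(&token[start..]);
    }
    segments
}
```

Under these assumptions a token such as can't yields three segments (can, ', t), and each of them would end up as its own box.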

Reader::text_excerpt joins boxes with a space, so everything built from a selection gets spurious spaces around punctuation:

  • highlight/annotation text (also as exported),
  • Search on a selection (the query no longer matches the book's own text),
  • Define on a selection.

For example, highlighting a sentence from a hyphenated paragraph stores:

“ No , it can ' t be right — there must be a mistake somewhere ,” he thought .

instead of:

“No, it can't be right — there must be a mistake somewhere,” he thought.
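The effect on Search can be reproduced with a small model of the joining step. This is only a sketch: join_boxes is a hypothetical stand-in for what Reader::text_excerpt does with the boxes of a selection, and the box list is written out by hand from the example above.

```rust
/// Hypothetical stand-in for the joining step described above:
/// the text of each box in the selection, separated by one space.
fn join_boxes(boxes: &[&str]) -> String {
    boxes.join(" ")
}

/// Boxes of the example sentence after punctuation has been detached
/// (hand-written for illustration, not produced by the real layout code).
fn detached_boxes() -> Vec<&'static str> {
    vec![
        "“", "No", ",", "it", "can", "'", "t", "be", "right", "—",
        "there", "must", "be", "a", "mistake", "somewhere", ",”",
        "he", "thought", ".",
    ]
}
```

Joining these boxes gives the first string above, and a plain substring search for it in the book's own sentence finds nothing.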

This affects any language once a paragraph goes through the hyphenation pass, i.e. whenever words have to be split at line ends, as with justified text in a narrow column or with a large font. In a 1200-page test layout, nearly every page with running text was affected.

I'll send a PR shortly: recording the whole token as the merge range in hyphenate_paragraph lets the existing cleanup_paragraph machinery glue the punctuation back; line breaking itself is unaffected.
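A simplified model of the idea, pending the PR. The names and the data layout here are assumptions, not the real cleanup_paragraph: boxes are plain strings, a merge range is a range of box indices, and merge_ranges is a hypothetical helper that glues each range back into one box.

```rust
use std::ops::Range;

/// Hypothetical model of the re-merge step: every range of adjacent
/// boxes becomes one box, boxes outside all ranges are kept as they are.
/// Ranges are assumed to be sorted and non-overlapping.
fn merge_ranges(boxes: &[&str], ranges: &[Range<usize>]) -> Vec<String> {
    let mut merged = Vec::new();
    let mut next = 0;
    for range in ranges {
        // Copy the untouched boxes that precede this range.
        merged.extend(boxes[next..range.start].iter().map(|b| b.to_string()));
        // Glue the boxes inside the range into a single box.
        merged.push(boxes[range.clone()].concat());
        next = range.end;
    }
    // Copy whatever follows the last range.
    merged.extend(boxes[next..].iter().map(|b| b.to_string()));
    merged
}
```

With the boxes some, where, ,” and he, a range that covers only the alphabetic segments (0..2) leaves ,” as a separate box, while a range that covers the whole token (0..3) restores somewhere,” as one box. The merge routine itself is the same in both cases, only the recorded range differs.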
