Skip to content

wcag/ChunkParser: scale Type3 glyph advances by /FontMatrix, not hardcoded 1/1000#732

Closed
augustovillar wants to merge 1 commit into
veraPDF:integrationfrom
augustovillar:fix/type3-fontmatrix-glyph-advance
Closed

wcag/ChunkParser: scale Type3 glyph advances by /FontMatrix, not hardcoded 1/1000#732
augustovillar wants to merge 1 commit into
veraPDF:integrationfrom
augustovillar:fix/type3-fontmatrix-glyph-advance

Conversation

@augustovillar

@augustovillar augustovillar commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Problem

ChunkParser#parseString computes each glyph's text-space advance as:

width = width * getTextState().getTextFontSize() / 1000 * getTextState().getHorizontalScaling();

PDFont#getWidth(code) returns the raw /Widths value in the font's glyph space. For most fonts glyph space is 1/1000 text units, so dividing by 1000 is correct — but Type3 fonts define their own glyph space via /FontMatrix. For a Type3 font whose /FontMatrix is e.g. [.0004883 0 0 -.0004883 0 0] (= 1/2048), dividing by 1000 instead of by 2048 makes every advance ~2.048× too large, corrupting TextPiece end positions and the resulting text-chunk bounding boxes.

Impact

Consumers of these metrics mis-assemble text. In opendataloader-pdf (which bundles this module), real-world Type3 PDFs drop a duplicate adjacent glyph and gain a spurious space — 1.66E-2 is extracted as 1. 6E-2, Programme as Progra me. Downstream report: opendataloader-project/opendataloader-pdf#578

This follows up #722 (which refactored this method but kept the hardcoded / 1000); same spirit as veraPDF-parser #511 (CFF CID FontMatrix).

Fix

Add a getGlyphWidthToTextSpaceScale(PDFont) helper that returns the horizontal /FontMatrix component (fontMatrix[0]) for a PDType3Font (null/length/zero-guarded), and the unchanged 1/1000 for every other font; the factor is hoisted out of the per-code loop. The TJ-array displacement path (obj.getReal() / 1000, which is text-space thousandths and font-independent) is intentionally left unchanged.

Verification

  • Compiles on the current integration branch (mvn -pl wcag-validation -am compile, BUILD SUCCESS).
  • End-to-end through opendataloader-pdf 2.4.7 (which bundles wcag-validation 1.31.65) on a public real-world Type3 PDF (EPD epd28600): before1. 6E-2, 34 spacing artifacts; after1.66E-2, 0 artifacts. Same toolchain, only this patch toggled.
  • Non-Type3 (Type1) PDFs produce byte-identical output — the instanceof PDType3Font guard leaves the default path untouched — and previously-correct values are unchanged.

Summary by CodeRabbit

  • Bug Fixes
    • Improved text glyph width calculation accuracy in PDF validation, particularly for Type3 fonts, resulting in more precise text positioning and rendering measurements.

…coded 1/1000

PDFont.getWidth(code) returns the raw /Widths value in the font's glyph space.
For Type3 fonts glyph space is defined by /FontMatrix, not 1/1000, so the
hardcoded /1000 divisor produced advances off by (1000 * FontMatrix[0]) for any
Type3 font whose FontMatrix != 1/1000 -- corrupting TextPiece end positions and
text-chunk bounding boxes (dropped duplicate glyphs / spurious spaces downstream).

Use the horizontal /FontMatrix component for a PDType3Font (null/length/zero
guarded), 1/1000 otherwise. The TJ-array displacement path (obj.getReal()/1000,
text-space thousandths and font-independent) is left unchanged.
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 051e88f5-da97-4e91-bddd-ed37aecae34a

📥 Commits

Reviewing files that changed from the base of the PR and between dc5bd35 and 6fae194.

📒 Files selected for processing (1)
  • wcag-validation/src/main/java/org/verapdf/gf/model/factory/chunks/ChunkParser.java

📝 Walkthrough

Walkthrough

ChunkParser.parseString replaces the hardcoded 1/1000 glyph-width divisor with a computed glyphSpaceToTextSpace factor. A new private helper getGlyphWidthToTextSpaceScale(PDFont font) returns a scale derived from the Type3 font matrix when applicable, falling back to 1/1000 for other font types.

Changes

Type3 Font Glyph Width Scaling

Layer / File(s) Summary
Type3 glyph width scaling helper and parseString integration
wcag-validation/src/main/java/org/verapdf/gf/model/factory/chunks/ChunkParser.java
Adds PDFont and PDType3Font imports. Introduces getGlyphWidthToTextSpaceScale(PDFont font) to return a Type3 font matrix-derived scale or 1/1000 as fallback. Updates parseString to compute glyphSpaceToTextSpace via that helper and apply it when calculating each glyph's rendered width and text-piece advance.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • veraPDF/veraPDF-validation#722: Directly related — both PRs modify glyph width calculation logic in ChunkParser.parseString, with this PR extending that work to handle Type3 font matrix scaling.

Suggested reviewers

  • MaximPlusov

Poem

🐇 A font named Type3 came knocking one day,
Its glyphs had their own special matrix to weigh.
No more fixed one-thousandth, a true scale we seek,
The rabbit derived it — peak-to-peak!
TextPieces now land where they rightfully play. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: fixing Type3 font glyph advance scaling by using /FontMatrix instead of hardcoded 1/1000.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MaximPlusov

Copy link
Copy Markdown
Contributor

@augustovillar Since the font matrix should also be used in other places, I created a separate PR that incorporates your code. Thank you for your contribution!

@augustovillar augustovillar deleted the fix/type3-fontmatrix-glyph-advance branch June 17, 2026 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants