Handle .puz files with malformed header data #75
Merged
…ying to parse. `readString()` in the .puz decoder had no bounds check: once `ibyte` walked past `bytes.length`, `bytes[ibyte]` was `undefined`, the `b !== 0` loop condition stayed true, and the loop appended `String.fromCharCode(undefined)` (a NUL char) forever. Node OOM'd at ~4 GB instead of reporting a corrupt file.

`PUZtoJSON` now rejects files where the header reports a zero-sized grid or where the file is shorter than `52 + 2*ncol*nrow` bytes (header + solution + progress), and `readString` throws a descriptive error with the grid-scan clue counts when it hits EOF mid-string. Adds a fixture (`truncated-strings.puz`: the header says 10x10 / 34 clues, but only 38 strings are present) and a regression test that asserts `puzToXD` throws instead of looping.
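A minimal TypeScript sketch of the failure mode and the fix, assuming a plain `Uint8Array` input and NUL-terminated, one-byte-per-char strings. `unguardedReadString` mirrors the pre-fix loop shape but adds an iteration cap so it terminates; `readStringChecked` and `PuzEOFError` are hypothetical names, not the vendored `puzjs.ts` API, and the PR's real error message also carries the grid-scan clue counts.

```typescript
// Hypothetical sketch; not the vendored puzjs.ts code.

// Pre-fix loop shape, with an iteration cap added so the demo terminates.
// Typing the input as ArrayLike<number | undefined> makes out-of-bounds reads explicit.
function unguardedReadString(
  bytes: ArrayLike<number | undefined>,
  start: number,
  cap: number,
): string {
  let result = "";
  let i = start;
  let b = bytes[i];
  // Past EOF, b is undefined and `undefined !== 0` is true, so only the cap stops the loop.
  while (b !== 0 && i - start < cap) {
    // String.fromCharCode(undefined) coerces NaN to code unit 0, i.e. "\u0000".
    result += String.fromCharCode(Number(b));
    b = bytes[++i];
  }
  return result;
}

class PuzEOFError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PuzEOFError";
  }
}

// Bounds-checked variant: returns the decoded string and the offset just past its NUL.
function readStringChecked(
  bytes: Uint8Array,
  start: number,
  label = "string",
): { value: string; next: number } {
  // indexOf never reads past bytes.length, so a missing terminator yields -1, not a hang.
  const end = bytes.indexOf(0, start);
  if (end === -1) {
    throw new PuzEOFError(
      `Unexpected end of file reading ${label} at offset ${start} (file is ${bytes.length} bytes)`,
    );
  }
  // One char per byte, the same decoding as a String.fromCharCode loop.
  const value = Array.from(bytes.subarray(start, end), (c) => String.fromCharCode(c)).join("");
  return { value, next: end + 1 };
}
```

For example, with `new Uint8Array([65, 66])` (no terminator) and a cap of 5, `unguardedReadString` returns `"AB"` followed by three NULs, while `readStringChecked` throws `PuzEOFError`.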
This PR adds a fixture whose header data does not match its content, which, before this PR, caused an infinite loop.
Context for LLM reviewers
Worth a close look
`packages/xd-crossword-tools/src/vendor/puzjs.ts` `readString()`: the infinite loop was here. `bytes[ibyte]` past EOF returned `undefined`, `undefined !== 0` is true, and `String.fromCharCode(undefined)` is a NUL char, so the loop appended NULs to `result` forever until Node OOM'd at ~4 GB. The new EOF check throws with the grid-scan counts, so future malformed files surface a diagnosable error instead of a hung process. The `minBytes = 52 + 2*ncol*nrow` precheck is the cheap structural guard; the `readString` guard is the backstop for files that pass the size check but still have inconsistent header vs. body data. That is exactly what the fixture does: it has the right number of bytes, but the header's grid size disagrees with the actual content.

Decisions
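A minimal sketch of the cheap structural guard, assuming the commonly documented .puz layout (52-byte header, grid width at offset 0x2C, height at 0x2D, then one solution byte and one progress byte per cell). `checkPuzSize` is a hypothetical name, and the up-front header-length check is an extra guard added here so the `DataView` reads stay in range; the PR does not describe it.

```typescript
// Hypothetical sketch of the size precheck; offsets assume the standard .puz header.
const HEADER_SIZE = 52; // 0x34
const WIDTH_OFFSET = 0x2c;
const HEIGHT_OFFSET = 0x2d;

function checkPuzSize(bytes: Uint8Array): { ncol: number; nrow: number } {
  // Extra guard (not from the PR) so the header reads below cannot go out of range.
  if (bytes.length < HEADER_SIZE) {
    throw new Error(
      `Corrupt .puz: ${bytes.length} bytes is shorter than the ${HEADER_SIZE}-byte header`,
    );
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const ncol = view.getUint8(WIDTH_OFFSET);
  const nrow = view.getUint8(HEIGHT_OFFSET);
  if (ncol === 0 || nrow === 0) {
    throw new Error(`Corrupt .puz: header reports a ${ncol}x${nrow} grid`);
  }
  // Header + solution grid + progress grid, one byte per cell each.
  const minBytes = HEADER_SIZE + 2 * ncol * nrow;
  if (bytes.length < minBytes) {
    throw new Error(
      `Corrupt .puz: a ${ncol}x${nrow} grid needs at least ${minBytes} bytes, but the file has ${bytes.length}`,
    );
  }
  return { ncol, nrow };
}
```

This check only proves the file is long enough for the grids the header claims; as the fixture shows, a file can pass it and still run out of strings, which is why the `readString` guard is kept as a backstop.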
Out of scope
`puzjs.ts` is vendored from downforacross/puzjs and has other latent issues (e.g. `getExtension` does an unbounded substring scan that can match inside binary payloads). Those are not touched here; only the specific OOM path is fixed.

Already verified
- … `main` with the fixture (Node grew to ~4 GB over ~45 s before crashing).
- `puzToXD` throws synchronously with the descriptive message.
- `yarn test puz2XD`: 3/3 pass, including the new regression test.
- The `yarn type-check` errors that surface are pre-existing in `website/` and unrelated to this change.