fix(geocoder): reject control chars in :id to prevent NUL-byte 500 #13

Merged
yorickdewid merged 1 commit into main from fix/reject-control-chars-id
Jun 5, 2026

@yorickdewid

Contributor

What

A NUL byte (%00) anywhere in the :id path parameter caused HTTP 500 instead of a clean 404 on every /v4/product/* route.

detectFormat matched the NL.IMBAG.PAND. / NL.IMBAG.NUMMERAANDUIDING. prefix via startsWith() and passed the NUL-containing id straight into the SQL bind parameter. Postgres rejects it:

PostgresError: invalid byte sequence for encoding "UTF8": 0x00

That error escaped the route handler and reached onError, which returned 500 {"message":"Internal server error"}. There is no injection risk (queries are parameterized), but a client could trivially spam 500s and flood the error log.
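A minimal sketch of the failure mode described above, not the actual source: it assumes the pre-fix classifier did nothing beyond the two `startsWith()` checks. The name `detectFormatBefore`, the `IdFormat` type, its `"pand"` / `"nummeraanduiding"` labels, and the example id are all hypothetical; only the two prefixes and the `"unknown"` classification come from this PR.

```typescript
// Hypothetical reconstruction of the pre-fix behaviour (an assumption, not the real code).
type IdFormat = "pand" | "nummeraanduiding" | "unknown";

function detectFormatBefore(id: string): IdFormat {
  // A prefix match says nothing about the bytes that follow the prefix.
  if (id.startsWith("NL.IMBAG.PAND.")) return "pand";
  if (id.startsWith("NL.IMBAG.NUMMERAANDUIDING.")) return "nummeraanduiding";
  return "unknown";
}

// Made-up id with a percent-encoded NUL appended, as it would appear in the URL path.
const rawParam = "NL.IMBAG.PAND.0000000000000000%00";
const decoded = decodeURIComponent(rawParam);

console.log(decoded.includes("\u0000")); // the decoded id contains a NUL character
console.log(detectFormatBefore(decoded)); // still classified by its prefix
```

Under that assumption the NUL-containing id is classified as a building id and handed to the query, which is where Postgres rejects it.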

Fix

detectFormat now classifies any input containing a C0 control character (0x00-0x1f) or DEL (0x7f) as "unknown", so the route returns 404. The guard sits in the shared classifier, so it covers the analysis, risk, light, and statistics routes in one place.
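The guard could look roughly like the sketch below. This is an illustration under assumptions rather than the merged code: the `IdFormat` type, the `CONTROL_CHARS` constant, and the `"pand"` / `"nummeraanduiding"` labels are hypothetical, and the real `detectFormat` may recognise more formats than the two prefixes named in this PR.

```typescript
// Hypothetical type and labels; only "unknown" and the two prefixes come from the PR text.
type IdFormat = "pand" | "nummeraanduiding" | "unknown";

// C0 control characters are U+0000..U+001F; DEL is U+007F.
// No `g` flag, so .test() keeps no state between calls.
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

function detectFormat(id: string): IdFormat {
  // Reject before any prefix matching so no control character can reach a SQL bind parameter.
  if (CONTROL_CHARS.test(id)) return "unknown";
  if (id.startsWith("NL.IMBAG.PAND.")) return "pand";
  if (id.startsWith("NL.IMBAG.NUMMERAANDUIDING.")) return "nummeraanduiding";
  return "unknown";
}

// Made-up ids covering the three classes named in the regression cases (NUL, tab, DEL).
for (const id of ["NL.IMBAG.PAND.1\u0000", "NL.IMBAG.PAND.1\t", "NL.IMBAG.PAND.1\u007f"]) {
  console.log(JSON.stringify(id), detectFormat(id));
}
```

Placing the check first means every caller of the classifier inherits it, which matches the "one place" intent of the fix.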

How it was found

Input-fuzzing the live WS — ~4.7M requests over a 4h aggressive run (real-ID corpus + targeted mutators + malformed HTTP). The NUL byte was the only 5xx class observed across the entire run; a Schemathesis property pass found no other server error.

Verification

  • bun test — 49 pass (5 new regression cases for NUL/tab/DEL).
  • bun run typecheck — clean.
  • E2E against the live service: the four NUL cases now return 404 (were 500); valid building/neighborhood ids still 200; well-formed-but-nonexistent ids still 404.

🤖 Generated with Claude Code

A NUL byte anywhere in the :id path param passed the PAND /
NUMMERAANDUIDING startsWith() branch and reached Postgres as a bind
parameter, raising `invalid byte sequence for encoding "UTF8": 0x00`.
That error escaped the route handler and surfaced as HTTP 500 instead
of a clean 404 — letting a client trivially generate 500s and pollute
the error log (no injection risk; queries are parameterized).

detectFormat now classifies any input containing a C0 control char or
DEL (0x7f) as "unknown", so every product route returns 404. Found via
input fuzzing of the live WS (~4.7M requests over 4h; the NUL byte was
the only 5xx class observed). Adds 5 regression cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yorickdewid yorickdewid merged commit f7974d3 into main Jun 5, 2026
1 check passed
@yorickdewid yorickdewid deleted the fix/reject-control-chars-id branch June 5, 2026 08:09