Chunked media uploads to bypass Vercel's 4.5 MB request limit #408

Open
carlosjdelgado wants to merge 12 commits into hunvreus:development from carlosjdelgado:feat/chunked-media-upload

@carlosjdelgado carlosjdelgado commented Jun 24, 2026

Problem

Vercel serverless functions cap request bodies at 4.5 MB. After base64 encoding and the JSON envelope, media uploads through /api/[owner]/[repo]/[branch]/files/[path] hit FUNCTION_PAYLOAD_TOO_LARGE at ~3.3 MB of real file size. Users get a hard wall on anything bigger.
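The ~3.3 MB figure follows from base64 arithmetic. A minimal sketch of the calculation, assuming a hypothetical 1 KB allowance for the JSON envelope (the real envelope size is not stated in this PR):

```typescript
// Request body limit stated in this PR.
const VERCEL_BODY_LIMIT_BYTES = 4.5 * 1024 * 1024;

// Base64 emits 4 output characters for every 3 input bytes (padded).
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Largest raw size whose base64 form plus the envelope fits in the limit.
// envelopeBytes is an assumed allowance for the other JSON fields.
function maxRawBytes(limitBytes: number, envelopeBytes: number): number {
  const budget = limitBytes - envelopeBytes;
  return Math.floor(budget / 4) * 3;
}

const maxBytes = maxRawBytes(VERCEL_BODY_LIMIT_BYTES, 1024);
console.log((maxBytes / (1024 * 1024)).toFixed(2), "MB of raw file data");
```

With these assumptions the ceiling lands a little under 3.4 MB, consistent with the ~3.3 MB observed once real envelope overhead is included.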

Solution

A single new upload path that slices the file into 4 MB binary chunks:

  • The first chunk (always exactly 4 MB for multi-chunk uploads, or the whole file for single-chunk uploads) rides inline in a multipart POST to /api/upload/finalize.
  • The remaining chunks are POSTed in parallel (in batches of 4) to /api/upload/chunk, which stages them in a new upload_chunk table.
  • /api/upload/finalize reads any staged chunks back, concatenates them with the inline chunk, and pushes the file to GitHub via the existing token flow.
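The slicing step above can be sketched as a pure planning function. This is an illustration only: planChunks, ChunkRange, and UploadPlan are hypothetical names, not identifiers from this PR.

```typescript
const MB = 1024 * 1024;
const CHUNK_BYTES = 4 * MB; // chunk size stated in this PR

// Byte range of one chunk; `end` is exclusive, matching Blob.slice().
interface ChunkRange {
  index: number;
  start: number;
  end: number;
}

// Hypothetical shape: chunk 0 rides inline, the rest are staged.
interface UploadPlan {
  inline: ChunkRange;
  staged: ChunkRange[];
}

function planChunks(fileSize: number, chunkBytes: number = CHUNK_BYTES): UploadPlan {
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    throw new RangeError("fileSize must be a positive integer");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < fileSize; start += chunkBytes, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkBytes, fileSize) });
  }
  // The first range goes with the finalize request; the others go to the chunk endpoint.
  const [inline, ...staged] = ranges;
  return { inline, staged };
}
```

In a browser, each range would map to `file.slice(range.start, range.end)` before being appended to the multipart body.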

Files ≤ 4 MB use the same code path but with zero staged chunks: the entire file rides inline and the DB is never touched. Files > 4 MB stage N-1 chunks in the DB, where N is the total chunk count. The GitHub token never leaves the server, and no new infrastructure is required: the existing Postgres database is the buffer.
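The reassembly in the finalize step amounts to ordered concatenation. A minimal sketch, assuming staged rows carry a zero-based chunk index (StagedChunk and reassemble are hypothetical names; the PR's actual row shape is not shown here):

```typescript
// Hypothetical shape of a staged row read back from upload_chunk.
interface StagedChunk {
  index: number; // 1..totalChunks-1; index 0 is the inline chunk
  data: Uint8Array;
}

function reassemble(
  inlineChunk: Uint8Array,
  staged: StagedChunk[],
  totalChunks: number
): Uint8Array {
  if (staged.length !== totalChunks - 1) {
    throw new Error(`expected ${totalChunks - 1} staged chunks, got ${staged.length}`);
  }
  // Rows may come back in any order, so sort by index and check for gaps.
  const ordered = [...staged].sort((a, b) => a.index - b.index);
  ordered.forEach((chunk, i) => {
    if (chunk.index !== i + 1) {
      throw new Error(`missing or duplicate chunk at position ${i + 1}`);
    }
  });
  const parts = [inlineChunk, ...ordered.map((chunk) => chunk.data)];
  const totalBytes = parts.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(totalBytes);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

For a single-chunk upload, `staged` is empty and the function returns a copy of the inline chunk, which matches the "DB is never touched" case.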

Why first chunk inline (not last)

In a multi-chunk upload, chunk 0 is always exactly 4 MB; the last chunk may be smaller. Sending the largest chunk inline keeps the most bytes out of the DB. For files whose size is a multiple of 4 MB it makes no difference; for the rest, DB bandwidth drops by up to (4 MB - last chunk size) × 2, because every staged byte is written once and read once.
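To make the trade-off concrete, here is a small sketch comparing the two choices. It assumes DB bandwidth is one write plus one read per staged byte, as the commit messages in this PR describe; the function names are illustrative.

```typescript
const MB = 1024 * 1024;
const CHUNK_BYTES = 4 * MB;

type InlineChoice = "first" | "last";

// Size of the final chunk: a full chunk when the file divides evenly.
function lastChunkBytes(fileBytes: number): number {
  const remainder = fileBytes % CHUNK_BYTES;
  return remainder === 0 ? CHUNK_BYTES : remainder;
}

// Bytes written to plus bytes read from the DB for one upload.
function dbBandwidthBytes(fileBytes: number, inline: InlineChoice): number {
  if (fileBytes <= CHUNK_BYTES) return 0; // single chunk, nothing staged
  const inlineBytes = inline === "first" ? CHUNK_BYTES : lastChunkBytes(fileBytes);
  return (fileBytes - inlineBytes) * 2;
}

// Bandwidth saved by sending the first chunk inline instead of the last.
function savedByFirstInline(fileBytes: number): number {
  return dbBandwidthBytes(fileBytes, "last") - dbBandwidthBytes(fileBytes, "first");
}

console.log(savedByFirstInline(5 * MB) / MB, "MB saved for a 5 MB file");
```

For a 5 MB file the last chunk is 1 MB, so the saving is (4 - 1) × 2 = 6 MB; for an 8 MB file both choices stage the same 4 MB and the saving is zero.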

Storage details

  • data column is bytea (no base64 overhead in the table).
  • Two opportunistic cleanups run via next/server's after() so the response is not blocked: /api/upload/finalize deletes the chunks of the upload it just consumed and all the stale chunks older than 10 minutes.
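The cleanup rule can be sketched as a predicate over chunk rows. This shows only the selection logic, not the after() wiring or the SQL; ChunkRow and its field names are assumptions, since the table's columns other than data are not listed in this description.

```typescript
const STALE_TTL_MS = 10 * 60 * 1000; // 10-minute TTL stated in this PR

// Hypothetical row shape; the real column names may differ.
interface ChunkRow {
  uploadId: string;
  createdAt: Date;
}

// A row is removed if it belongs to the upload just finalized,
// or if it has been sitting in the table longer than the TTL.
function shouldDelete(
  row: ChunkRow,
  consumedUploadId: string,
  now: Date,
  ttlMs: number = STALE_TTL_MS
): boolean {
  const isConsumed = row.uploadId === consumedUploadId;
  const isStale = now.getTime() - row.createdAt.getTime() > ttlMs;
  return isConsumed || isStale;
}
```

In practice the same condition would more likely be expressed as a single DELETE with an OR clause, so that no rows need to be loaded first.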

Limits

| Setting               | Value      |
|-----------------------|------------|
| Max file size         | 15 MB      |
| Max chunks per upload | 4          |
| Chunk size            | 4 MB       |
| Inline chunk max      | 4 MB       |
| Stale-chunk TTL       | 10 minutes |

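| File size   | Chunks staged in DB | DB bandwidth |
|-------------|---------------------|--------------|
| ≤ 4 MB      | 0                   | 0            |
| 5 MB        | 1 (1 MB)            | 2 MB         |
| 8 MB        | 1 (4 MB)            | 8 MB         |
| 14 MB       | 3 (4+4+2 MB)        | 20 MB        |
| 15 MB (max) | 3 (4+4+3 MB)        | 22 MB        |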
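The bandwidth figures above can be reproduced with a few lines. A sketch working in whole megabytes for readability; the helper names are illustrative, and bandwidth is taken as staged bytes × 2 (one INSERT, one SELECT).

```typescript
const CHUNK_MB = 4;

// Sizes (in MB) of the chunks that go through the DB.
// The first chunk is inline, so staging starts at offset CHUNK_MB.
function stagedChunkSizesMb(fileMb: number): number[] {
  const sizes: number[] = [];
  for (let offset = CHUNK_MB; offset < fileMb; offset += CHUNK_MB) {
    sizes.push(Math.min(CHUNK_MB, fileMb - offset));
  }
  return sizes;
}

// Each staged byte is written once and read once.
function dbBandwidthMb(fileMb: number): number {
  const stagedTotal = stagedChunkSizesMb(fileMb).reduce((sum, size) => sum + size, 0);
  return stagedTotal * 2;
}

for (const fileMb of [4, 5, 8, 14, 15]) {
  const staged = stagedChunkSizesMb(fileMb);
  console.log(`${fileMb} MB: staged [${staged.join("+")}] -> ${dbBandwidthMb(fileMb)} MB`);
}
```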

Refactor

githubSaveFile (with its rename-on-conflict logic) was extracted from app/api/[owner]/[repo]/[branch]/files/[path]/route.ts into lib/utils/github-save-file.ts so the existing files endpoint and the new finalize endpoint share the same write path.

Migration

A single new migration (0013_upload_chunks.sql) creates the upload_chunk table directly with bytea. It runs on Vercel's postbuild hook with the rest.

Test plan

  • Upload a 1 MB file → no upload_chunk rows created; file lands on the target branch
  • Upload a 3.5 MB file → no upload_chunk rows created; whole file rides inline in the finalize request
  • Upload a 5 MB file → 2 chunks (1 staged, 1 inline)
  • Upload a 14 MB file → 4 chunks (3 staged, 1 inline)
  • Try to upload a 20 MB file → blocked client-side with a clear error
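The expected chunk counts in this plan can be checked with a small helper. This is a sketch of the client-side guard under the limits listed above; checkUpload and UploadCheck are hypothetical names, and the error wording is illustrative.

```typescript
const MB = 1024 * 1024;
const CHUNK_BYTES = 4 * MB;
const MAX_FILE_BYTES = 15 * MB;

type UploadCheck =
  | { ok: true; totalChunks: number; stagedChunks: number }
  | { ok: false; error: string };

// Rejects oversized files before any request is made,
// otherwise reports how many chunks the upload would use.
function checkUpload(fileBytes: number): UploadCheck {
  if (fileBytes > MAX_FILE_BYTES) {
    const sizeMb = (fileBytes / MB).toFixed(1);
    return { ok: false, error: `File is ${sizeMb} MB; the maximum is 15 MB.` };
  }
  const totalChunks = Math.max(1, Math.ceil(fileBytes / CHUNK_BYTES));
  // One chunk always rides inline with the finalize request.
  return { ok: true, totalChunks, stagedChunks: totalChunks - 1 };
}
```

Each test-plan case then maps to one call: 1 MB and 3.5 MB give one chunk with none staged, 5 MB gives two chunks, 14 MB gives four, and 20 MB is rejected.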

@carlosjdelgado carlosjdelgado changed the title Feat/chunked media upload [WIP] Feat/chunked media upload Jun 24, 2026
@carlosjdelgado carlosjdelgado force-pushed the feat/chunked-media-upload branch from 2d16dab to 9614425 Compare June 24, 2026 10:39
@carlosjdelgado carlosjdelgado changed the title [WIP] Feat/chunked media upload [WIP] Chunked media uploads to bypass Vercel's 4.5 MB request limit Jun 24, 2026
@carlosjdelgado carlosjdelgado changed the title [WIP] Chunked media uploads to bypass Vercel's 4.5 MB request limit Chunked media uploads to bypass Vercel's 4.5 MB request limit Jun 24, 2026
@carlosjdelgado carlosjdelgado marked this pull request as ready for review June 24, 2026 11:54
@carlosjdelgado carlosjdelgado changed the base branch from main to development June 24, 2026 15:00
Vercel serverless functions cap request bodies at 4.5 MB, which after
base64 overhead limited media uploads to ~3.3 MB. The browser now slices
files into ~3 MB chunks, POSTs each to /api/upload/chunk (multipart),
and then calls /api/upload/finalize which reassembles, pushes to GitHub,
and deletes the chunks.

Chunks are staged in a new upload_chunk table (Postgres text, base64).
Stale chunks are reaped opportunistically on each insert. Max file size
is 50 MB / 50 chunks, configurable in both endpoints and the client.

githubSaveFile is extracted to lib/utils/github-save-file.ts so both the
existing files endpoint and the new finalize endpoint share
rename-on-conflict logic.

Client now uploads chunks in batches of 4 instead of sequentially.
Server moves opportunistic stale-chunk cleanup (chunk endpoint) and
per-upload chunk deletion (finalize endpoint) into next/server `after`,
so neither blocks the response.

When the file fits in Vercel's 4.5 MB request body (after base64
overhead and JSON envelope), upload directly via the existing files
endpoint instead of the chunked path. Cuts DB write/read traffic by
the share of small uploads, which is most of them in practice.

The last chunk is sent inline with the finalize metadata (multipart)
instead of going through the DB, saving one INSERT+SELECT per upload.
For a 4 MB file (2 chunks) this halves DB writes; for larger files
the ratio drops but every saved chunk still helps under Neon Free
quotas.

Eliminates the +33% base64 overhead in the upload_chunk table. The
INSERT/SELECT bandwidth per chunk drops by ~25% with no client change
and no CPU cost. Migration uses decode('base64') in USING so any chunks
in flight at upgrade time are converted instead of dropped.

For a 20 MB upload, total DB bandwidth goes from ~48 MB to ~36 MB
(2.4x -> 1.8x file size).

Multipart carries the chunk as raw binary, not base64, so 4 MB fits
in Vercel's 4.5 MB body with about 500 KB of headroom. For a 20 MB
upload this drops the chunk count from 7 to 5 (4 staged in DB, 1
inline) and total DB bandwidth from ~36 MB to ~32 MB.

The first chunk is always CHUNK_BYTES (4 MB) for any multi-chunk
upload; the last one can be smaller. Sending the first inline keeps
the largest chunk out of the DB. For files whose size is a multiple
of CHUNK_BYTES there is no change; for the rest, DB bandwidth drops
by up to (CHUNK_BYTES - lastChunkSize) * 2.

Limits per upload to 22 MB of DB bandwidth in the worst case, sized
to fit Neon Free quotas with margin for other traffic.

Merges the old 0013 (CREATE TABLE with text data) and 0014 (ALTER to
bytea) into a single 0013 that creates the table with bytea directly.
Also drops the unused chunk-assembly self-check script.

Files <=3 MB previously skipped the chunked path and posted base64+JSON
to the files endpoint. Removed because the chunked path already
short-circuits to inline-only when totalChunks=1, no DB rows are
created, and the request body is smaller (binary multipart vs base64
JSON). One code path now handles every size up to 15 MB.
@carlosjdelgado carlosjdelgado force-pushed the feat/chunked-media-upload branch from ad2b11f to 543c2d1 Compare June 24, 2026 15:06