feat(web): edge cache HTML + metadata routes at Cloudflare #12
Merged
Cloudflare doesn't cache HTML by default. After the Astro migration, HTML pages were returning cf-cache-status: DYNAMIC, hitting R2 on every request.

Two-part fix:

1. A Cloudflare zone-level Cache Rule (set outside the repo, recorded in apps/web/README.md) enables caching for '/', the four content pages, the two index pages, /types/* and /topics/*, and the three metadata files. Static assets keep Cloudflare's default behavior. Action: cache=true, edge_ttl.mode=bypass_by_default (TTL driven by origin Cache-Control).
2. deploy-site.yml sets Cache-Control: public, max-age=300, stale-while-revalidate=3600 on HTML uploads and on sitemap.xml, robots.txt, llms.txt. Non-HTML/non-metadata files keep their previous upload behavior so favicon/og-image/logo caching is unchanged.

After this deploys, two same-URL requests in a row should show cf-cache-status: MISS then HIT. /documents/* and /extracted/* are not in the Cache Rule expression, so the noindex transform on raw files is unaffected.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
caio-pizzol pushed a commit that referenced this pull request on May 19, 2026
After PR #12 deployed, live Cache-Control returned max-age=14400 from Cloudflare's zone-level Browser Cache TTL default, overriding origin max-age=300. Patched the existing rule (id b85abcfdb1fa4d07a160e88f7cc4fafd) via API to add browser_ttl.mode: respect_origin. Live now serves origin max-age on HTML/metadata paths while keeping the 4-hour zone default for static assets and content-addressed raw files where it's actually desired. This commit keeps the README accurate; the Cloudflare rule itself was updated out-of-band via API.
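For orientation, the rule's action after that patch can be pictured as the payload below. This is a sketch, not a copy of the live rule: the three settings are the ones named in this PR and the follow-up commit, `set_cache_settings` with `action_parameters` is the shape Cloudflare's Rulesets API uses for Cache Rules, and the function name `cache_rule_action` is made up for illustration. The rule expression and the API call itself are deliberately left out.

```bash
# Hypothetical helper: print the action part of the Cache Rule as JSON.
# Only the cache / edge_ttl / browser_ttl values come from this PR.
cache_rule_action() {
  cat <<'JSON'
{
  "action": "set_cache_settings",
  "action_parameters": {
    "cache": true,
    "edge_ttl": { "mode": "bypass_by_default" },
    "browser_ttl": { "mode": "respect_origin" }
  }
}
JSON
}

cache_rule_action
```

With `browser_ttl.mode` set to `respect_origin`, the origin's `max-age=300` reaches the browser on matched paths instead of the zone's 4-hour default.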
Stops paying for an R2 GetObject operation on every page view for HTML and the three metadata files. Two-part change, only one part of which lives in this repo.
Out-of-repo (already live, recorded in apps/web/README.md): Zone-level Cloudflare Cache Rule that enables caching for the homepage, the four content pages, the type/topic indexes + per-type/topic pages, and `/sitemap.xml`, `/robots.txt`, `/llms.txt`. Action is `cache: true` with `edge_ttl.mode: bypass_by_default` so origin Cache-Control drives the TTL.
In-repo: `deploy-site.yml` now adds `Cache-Control: public, max-age=300, stale-while-revalidate=3600` to HTML uploads and to the three metadata files. Non-HTML / non-metadata files (favicons, og-image, logo) keep their previous upload behavior so default static-asset caching is unchanged.
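The in-repo half boils down to a per-file decision at upload time. A minimal sketch of that decision, assuming paths are relative to the build output directory: the header value and the file list are taken from the description above, while the function name `cache_control_for` and the loop are hypothetical and not copied from `deploy-site.yml`.

```bash
# Hypothetical helper: print the Cache-Control value for a file about to be
# uploaded, or nothing if the upload should stay as it was before this PR.
cache_control_for() {
  case "$1" in
    *.html|sitemap.xml|robots.txt|llms.txt)
      # HTML pages and the three metadata files get the short TTL.
      echo "public, max-age=300, stale-while-revalidate=3600"
      ;;
    *)
      # Favicons, og-image, logo, etc.: no header is added here.
      ;;
  esac
}

# Example run over a few illustrative paths.
for f in index.html dataset/index.html sitemap.xml favicon.ico; do
  cc="$(cache_control_for "$f")"
  printf '%s -> %s\n' "$f" "${cc:-previous upload behavior}"
done
```

Returning an empty string for everything else keeps the "unchanged" case explicit: the upload step would only pass a Cache-Control header when the helper prints one.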
Verified before merge:
Smoke test after merge auto-deploys:
```bash
URL="https://docxcorp.us/dataset"
curl -sI "$URL" | grep -iE "cf-cache-status|cache-control" # MISS, header from origin
curl -sI "$URL" | grep -i "cf-cache-status" # HIT (within 5 min)
curl -sI "https://docxcorp.us/documents/.docx" | grep -i "x-robots-tag" # noindex preserved
```
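If the smoke test needs to run unattended, the MISS-then-HIT check can be scripted rather than eyeballed. The sketch below is an assumption-laden illustration: `cf_cache_status` is a hypothetical helper, and the `printf` lines are canned headers standing in for two consecutive `curl -sI "$URL"` calls so the logic can be read without touching the network.

```bash
# Hypothetical helper: read response headers on stdin and print the
# cf-cache-status value (prints nothing if the header is absent).
cf_cache_status() {
  # Strip CRs from the HTTP line endings, then match the header name
  # case-insensitively.
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-cache-status" { print $2 }'
}

# Canned headers standing in for two back-to-back requests to the same URL.
first=$(printf 'HTTP/2 200\r\ncache-control: public, max-age=300, stale-while-revalidate=3600\r\ncf-cache-status: MISS\r\n\r\n' | cf_cache_status)
second=$(printf 'HTTP/2 200\r\ncf-cache-status: HIT\r\n\r\n' | cf_cache_status)

if [ "$first" = "MISS" ] && [ "$second" = "HIT" ]; then
  echo "edge cache OK: $first then $second"
else
  echo "unexpected sequence: '$first' then '$second'" >&2
fi
```

In real use, each `printf ... |` would be replaced by `curl -sI "$URL" |`, with both requests made inside the 5-minute `max-age` window.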
Rollback if needed: revert this PR (origin headers go away) and disable the Cache Rule in the Cloudflare dashboard.