Problem
The site has more than 100 external links to primary sources (journal articles, YouTube, GitHub repos, DOIs). Links rot. Currently, broken links are only caught by manual checking, so a PR that fixes one broken link can introduce another without anyone noticing.
Suggested approach
Add a GitHub Actions workflow at .github/workflows/link-check.yml that runs on every PR targeting main:
name: Link check
on:
  pull_request:
    branches: [main]
jobs:
  linkcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check links in source files
        uses: lycheeverse/lychee-action@v2
        with:
          args: >
            --exclude 'localhost'
            --exclude 'youtube-nocookie.com'
            --exclude 'fonts.googleapis.com'
            --exclude 'shields.io'
            --timeout 10
            --max-retries 2
            './understanding-ai/src/**/*.jsx'
            './understanding-ai/src/**/*.js'
          fail: true
lychee is a fast Rust-based link checker with good false-positive handling. YouTube and font CDN URLs can be excluded since they require browser rendering.
Considerations
- DOI links (doi.org) are reliable but occasionally slow; either set a longer timeout, or exclude them and check them separately
- Rate limiting from academic publishers (Springer, Cambridge, Oxford) may cause flaky failures; --max-retries 2 helps
- Consider running on a schedule (on: schedule: cron: '0 6 * * 1') in addition to PRs, to catch rot on existing links
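The scheduled run from the last consideration could share the workflow with the PR trigger. A minimal sketch, assuming the rest of link-check.yml stays as proposed and only its on: block changes; the cron expression is the one given in the list above:

```yaml
# Sketch only: would replace the `on:` block of the proposed link-check.yml.
on:
  pull_request:
    branches: [main]
  schedule:
    # 06:00 UTC every Monday. GitHub evaluates cron in UTC and runs
    # scheduled workflows against the default branch.
    - cron: '0 6 * * 1'
```

A scheduled failure is not attached to any PR, so someone would still need to watch the Actions tab or workflow notifications for it.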
Acceptance criteria
- Workflow runs on PRs targeting main
- Build fails if any external link returns a non-2xx/3xx response after retries
- Known-noisy URLs (YouTube nocookie, Google Fonts) are excluded