GitHub Action: automated broken link checker on PRs #31

Description

@bster

Problem

The site has more than 100 external links to primary sources (journal articles, YouTube, GitHub repos, DOIs). Links rot. Currently the only way to catch broken links is to check them by hand. A PR that fixes one broken link can introduce another without anyone noticing.

Suggested approach

Add a GitHub Actions workflow at .github/workflows/link-check.yml that runs on every PR targeting main:

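name: Link check
on:
  # Run on every PR that targets main
  pull_request:
    branches: [main]

jobs:
  linkcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check links in source files
        uses: lycheeverse/lychee-action@v2
        with:
          # Excludes first, then request limits, then the file globs to scan.
          # Globs are quoted so lychee expands them rather than the shell.
          args: >
            --exclude 'localhost'
            --exclude 'youtube-nocookie.com'
            --exclude 'fonts.googleapis.com'
            --exclude 'shields.io'
            --timeout 10
            --max-retries 2
            './understanding-ai/src/**/*.jsx'
            './understanding-ai/src/**/*.js'
          # Fail the job when lychee reports broken links
          fail: true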

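lychee is a fast Rust-based link checker with good false-positive handling. The YouTube (youtube-nocookie.com) and font CDN (fonts.googleapis.com) URLs can be excluded: they are meant to be loaded by a browser and tend to produce false positives when requested directly.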

Considerations

  • DOI links (doi.org) are reliable but occasionally slow: set a timeout longer than the 10 seconds used above, or exclude doi.org and check those links separately
  • Rate limiting from academic publishers (Springer, Cambridge, Oxford) may cause flaky failures — --max-retries 2 helps
  • Consider running on a schedule as well (a schedule trigger with cron: '0 6 * * 1', i.e. Mondays at 06:00 UTC) in addition to PRs, to catch rot in existing links
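
The considerations above could be folded into the workflow roughly as follows. This is a sketch under assumptions, not a tested configuration: the 30-second timeout and 5-second retry wait are illustrative guesses rather than measured values, and --retry-wait-time is a lychee option that the original proposal does not use. Scheduled workflows run against the default branch, so the weekly job checks whatever is currently on main.

```yaml
# Sketch only: the PR trigger from the proposal plus a weekly scheduled run.
name: Link check
on:
  pull_request:
    branches: [main]
  schedule:
    # 06:00 UTC every Monday
    - cron: '0 6 * * 1'

jobs:
  linkcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check links in source files
        uses: lycheeverse/lychee-action@v2
        with:
          # Assumed values: a longer timeout for slow DOI redirects and a
          # pause between retries for rate-limited publishers.
          args: >
            --exclude 'localhost'
            --exclude 'youtube-nocookie.com'
            --exclude 'fonts.googleapis.com'
            --exclude 'shields.io'
            --timeout 30
            --max-retries 2
            --retry-wait-time 5
            './understanding-ai/src/**/*.jsx'
            './understanding-ai/src/**/*.js'
          fail: true
```

If publisher rate limiting still causes flaky failures on PRs, one option is to keep the stricter settings for the scheduled run only and exclude the noisiest hosts from the PR check.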

Acceptance criteria

  • Workflow runs on PRs targeting main
  • Build fails if any external link returns a non-2xx/3xx response after retries
  • Known-noisy URLs (YouTube nocookie, Google Fonts) are excluded
