Merged
10 changes: 10 additions & 0 deletions .env.example
@@ -0,0 +1,10 @@
# Copy to .env (gitignored) and fill in. The CLI does not auto-load this
# file; these are documented for your own shell setup, e.g.:
# export $(grep -v '^#' .env | xargs)

# Your Substack site, e.g. https://example.substack.com
SUBSTACK_BASE_URL=

# Substack session cookie (substack.sid). Treat like a password.
# Get it from your browser DevTools -> Application -> Cookies after logging in.
SUBSTACK_COOKIE=
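
Because the CLI does not auto-load `.env`, a wrapper script could read the file itself instead of relying on the `export $(...)` shell idiom. The following is a minimal sketch, not part of the project: `load_env_file` is a hypothetical helper, and it assumes simple `KEY=VALUE` lines with optional surrounding quotes.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Hypothetical helper: copy KEY=VALUE pairs from a dotenv-style file into os.environ.

    Blank lines, comment lines, and lines without '=' are skipped. Variables
    already set in the environment are kept unless override is True.
    Returns every pair that was parsed from the file.
    """
    parsed: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and any leading/trailing quote characters.
        value = value.strip().strip("'\"")
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Keeping already-exported variables by default means a value set deliberately in the shell is not silently replaced by the file.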
51 changes: 51 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
name: Bug report
description: Report a problem with the link checker
title: "[Bug]: "
labels: [bug]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to file a bug! **Do not include your
        Substack session cookie** anywhere in this report. Redact it from
        any commands or logs you paste.
  - type: textarea
    id: what-happened
    attributes:
      label: What happened?
      description: A clear description of the bug and what you expected instead.
    validations:
      required: true
  - type: textarea
    id: command
    attributes:
      label: Command you ran
      description: Paste the command (with cookie value redacted as `REDACTED`).
      render: bash
    validations:
      required: true
  - type: textarea
    id: output
    attributes:
      label: Output / error
      description: Full output or stack trace. Redact any cookie values.
      render: shell
  - type: input
    id: python
    attributes:
      label: Python version
      placeholder: "e.g. 3.11.5"
    validations:
      required: true
  - type: input
    id: os
    attributes:
      label: Operating system
      placeholder: "e.g. macOS 14.4, Ubuntu 22.04, Windows 11"
    validations:
      required: true
  - type: input
    id: version
    attributes:
      label: Tool version / commit
      placeholder: "e.g. 0.1.0 or commit hash"
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
  - name: Security vulnerability
    url: https://github.com/jcddc83/substack-broken-link-checker/security/advisories/new
    about: Please report security issues privately, not in public issues. See SECURITY.md.
23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
name: Feature request
description: Suggest a new feature or improvement
title: "[Feature]: "
labels: [enhancement]
body:
  - type: textarea
    id: problem
    attributes:
      label: Problem
      description: What problem would this solve? What's the use case?
    validations:
      required: true
  - type: textarea
    id: proposal
    attributes:
      label: Proposed solution
      description: How might this work? CLI flags, behavior, output format, etc.
    validations:
      required: true
  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives considered
20 changes: 20 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
## Summary

<!-- What does this PR change and why? -->

## Type of change

- [ ] Bug fix
- [ ] New feature
- [ ] Refactor / cleanup
- [ ] Docs only
- [ ] CI / tooling

## Checklist

- [ ] I ran `ruff check .` and `ruff format .`
- [ ] I ran `pytest` (if applicable)
- [ ] I verified `python substack_link_checker.py --help` still works
- [ ] I updated `README.md` / `USAGE.md` for any CLI changes
- [ ] I updated `CHANGELOG.md`
- [ ] No secrets / session cookies are included in this PR
17 changes: 17 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly
    open-pull-requests-limit: 5
    labels:
      - dependencies

  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: weekly
    labels:
      - dependencies
      - github-actions
63 changes: 63 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install ruff
        run: pip install ruff
      - name: Lint
        run: ruff check .
      - name: Format check
        run: ruff format --check .

  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        python-version: ["3.8", "3.10", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip
      - name: Install
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Import smoke test
        run: |
          python -c "import substack_link_checker; print('import OK')"
          python substack_link_checker.py --help
      - name: Run tests
        run: pytest -q

  build:
    runs-on: ubuntu-latest
    needs: [lint, test]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Build distributions
        run: |
          pip install build
          python -m build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
63 changes: 63 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- `pyproject.toml` making the project pip-installable with a
`substack-link-checker` console entry point.
- GitHub Actions CI workflow (`.github/workflows/ci.yml`) running ruff
lint, multi-version Python smoke tests, a real `pytest` suite, and a
build step.
- Initial `pytest` test suite (`tests/`) covering domain filtering,
CSV report generation, history persistence, the
`load_domains_from_file` helper, and cookie-handling guarantees.
- `SECURITY.md` documenting vulnerability reporting and safe handling of
Substack session cookies.
- `CONTRIBUTING.md` and `CODE_OF_CONDUCT.md`.
- Issue and pull request templates under `.github/`.
- Dependabot configuration for weekly dependency and Actions updates.
- `.env.example` documenting the supported environment variables.

### Security
- `SUBSTACK_COOKIE` environment variable is now supported as a safer
alternative to the `--cookie` CLI flag (which exposes the cookie in
shell history and in process listings such as `ps aux`). README and
`SECURITY.md` updated to recommend the env-var path.
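
One way the flag and the environment variable could be combined is sketched below. This is an illustration only: `resolve_cookie` is a hypothetical helper, and the precedence shown (an explicit `--cookie` wins, otherwise `SUBSTACK_COOKIE` is used) is an assumption rather than the tool's documented behavior.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_cookie(
    argv: Sequence[str], environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Hypothetical helper: pick the session cookie from the flag or the environment.

    Assumed precedence: --cookie on the command line first, then the
    SUBSTACK_COOKIE environment variable, else None.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--cookie", default=None)
    # parse_known_args ignores any other flags the real CLI may define.
    args, _unknown = parser.parse_known_args(list(argv))
    if args.cookie:
        return args.cookie
    # Treat an empty variable (as in an unfilled .env) the same as unset.
    return environ.get("SUBSTACK_COOKIE") or None
```

With the variable exported in the shell, the cookie value never appears in the command line, which is the point of the env-var path.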

### Fixed
- Corrected the clone URL in `README.md` (was `substack-link-checker`,
now `substack-broken-link-checker`).

## [1.0.0] - 2026-01-01

Major rewrite of the Substack broken link checker with significant
performance improvements and new features. See the
[GitHub Release](https://github.com/jcddc83/substack-broken-link-checker/releases/tag/v1.0.0)
for the full announcement.

### Added
- Async concurrent link checking with `aiohttp` (10-20x faster than
sequential).
- Smart link caching — the same URL across multiple posts is checked once.
- Retry logic with exponential backoff for transient failures.
- Incremental scanning: `--history-file` to track checked posts and
`--only-new` to skip ones already covered.
- `import_checked_posts.py` to import previous results from Excel/CSV.
- Domain filtering: `--skip-domains` / `--skip-domains-file` to assume OK
for bot-blocking sites; `--broken-domains` / `--broken-domains-file` to
auto-flag known broken domains.
- `--cookie` flag for Substack session cookie authentication (works with
paywalled / bot-protected content).
- Helper scripts: `compare_posts.py` to find unchecked posts,
`fetch_archive_urls.py` as an archive-page fallback, and
`run_link_checker.ps1` for Windows Task Scheduler automation.
- Complete `README.md` / `USAGE.md` rewrite with security considerations
and expanded troubleshooting.
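
The concurrency, caching, and retry entries above can be illustrated with a small standard-library sketch. It is not the project's implementation: `check_with_retry`, `check_all`, and the injected `fetch` coroutine are hypothetical names, and `fetch` stands in for whatever `aiohttp` request the tool actually makes. The exception types treated as transient are likewise an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable

Fetch = Callable[[str], Awaitable[Any]]


async def check_with_retry(
    url: str, fetch: Fetch, retries: int = 3, base_delay: float = 0.5
) -> Any:
    """Call fetch(url), retrying transient errors with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == retries:
                raise
            # Delay doubles each attempt: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2**attempt)


async def check_all(
    urls: Iterable[str], fetch: Fetch, concurrency: int = 10, **retry_kwargs: Any
) -> Dict[str, Any]:
    """Check each distinct URL once, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    cache: Dict[str, "asyncio.Future[Any]"] = {}

    async def bounded(url: str) -> Any:
        async with semaphore:
            return await check_with_retry(url, fetch, **retry_kwargs)

    for url in urls:
        # A URL that appears in several posts is scheduled only once.
        if url not in cache:
            cache[url] = asyncio.ensure_future(bounded(url))

    # return_exceptions=True records a final failure as the URL's result
    # instead of aborting the whole batch.
    results = await asyncio.gather(*cache.values(), return_exceptions=True)
    return dict(zip(cache.keys(), results))
```

Injecting `fetch` keeps the sketch testable without network access; a real checker would pass a coroutine that issues the HTTP request and returns a status.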

[Unreleased]: https://github.com/jcddc83/substack-broken-link-checker/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/jcddc83/substack-broken-link-checker/releases/tag/v1.0.0
46 changes: 46 additions & 0 deletions CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment:

- Demonstrating empathy and kindness toward others
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by mistakes
- Focusing on what is best for the overall community

Examples of unacceptable behavior:

- The use of sexualized language or imagery, and sexual attention or advances
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the project maintainers through GitHub. All complaints will be
reviewed and investigated promptly and fairly.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
https://www.contributor-covenant.org/version/2/1/code_of_conduct.html.

[homepage]: https://www.contributor-covenant.org
42 changes: 42 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# Contributing

Thanks for your interest in improving the Substack Broken Link Checker.

## Development setup

```bash
git clone https://github.com/jcddc83/substack-broken-link-checker.git
cd substack-broken-link-checker
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

## Before opening a PR

- Run the linter: `ruff check .`
- Run the formatter: `ruff format .`
- Run the tests: `pytest`
- Verify the CLI still launches: `python substack_link_checker.py --help`

## Filing issues

Please use the issue templates under `.github/ISSUE_TEMPLATE/`. For bugs,
include your Python version, OS, the command you ran (with the cookie
value redacted), and the full error output.

## Reporting security issues

See [SECURITY.md](SECURITY.md). Do **not** file public issues for security
vulnerabilities.

## Pull requests

- Keep PRs focused — one logical change per PR.
- Update `README.md` and `USAGE.md` if you change CLI behavior.
- Add tests for new behavior where practical.
- Match the style of the surrounding code.

## Code of Conduct

This project follows the [Contributor Covenant](CODE_OF_CONDUCT.md).