Builder Jobs — Scraper

Hourly pipeline that scrapes engineering jobs from company career pages, classifies each role with Claude, and publishes a curated index to zachproffitt/builder-jobs.

_{Last updated June 7, 2026 at 01:27 UTC}

Pipeline

fetch_jobs.py              fetch current listings from all companies
fetch_job_descriptions.py  fetch full description text for new jobs
classify_companies.py      generate company summaries via Claude
classify_jobs.py           classify roles, summarize, extract skills and comp
render_job_indexes.py      regenerate README.md, REMOTE.md, COMPANIES.md in builder-jobs

Actions

Workflow	Schedule	What it does
Jobs	Hourly	Fetch listings → classify → publish index to builder-jobs
Companies	Sundays	Discover new companies from YC, VC portfolios, and industry curation → detect ATS

Both workflows write a step summary visible in the Actions dashboard. Logs are committed with each run (data/jobs.log, data/companies.log).

Runs hourly via GitHub Actions. Commits to both repos automatically.

Monitoring

Pipeline metrics are pushed to Grafana Cloud via Prometheus. Dashboard →

Classification

Each job is sent to Claude with a structured prompt that extracts:

BUILDER — does this role primarily write code?
SUMMARY — 1–2 sentence description
SKILLS — up to 8 specific technologies, languages, or tools
LEVEL — intern / junior / mid / senior / staff / principal / manager
COMP — base salary range with original currency symbol
HYBRID / CONTRACT — work arrangement flags
REGION — us / canada / international / unclear

Non-engineering, contract, and international (outside US/Canada) roles are filtered out.

Supported ATS

ATS	Companies	Scraper
Ashby	574	`scrapers/ats_ashby.py`
Greenhouse	567	`scrapers/ats_greenhouse.py`
Lever	184	`scrapers/ats_lever.py`
Workday	169	`scrapers/ats_workday.py`
BambooHR	74	`scrapers/ats_bamboo.py`
Breezy	51	`scrapers/ats_breezy.py`
Workable	72	`scrapers/ats_workable.py`
SmartRecruiters	12	`scrapers/ats_smartrecruiters.py`
Eightfold	7	`scrapers/ats_eightfold.py`
Total	1710

Company sources

~5,700 companies in data/companies.json, sourced from:

Y Combinator — all active batches via Algolia (pipeline/fetch_yc_companies.py)
VC portfolios — Founders Fund, Khosla Ventures, Greylock, Sequoia (pipeline/fetch_vc_companies.py)
Industry curation — Claude-enumerated top companies across 20+ sectors (pipeline/fetch_industry_companies.py)

ATS detection (pipeline/fetch_companies.py) resolves any entry with "status": "new" and updates it in place.

To add a company manually: add a stub to data/companies.json and trigger the Companies workflow (or run fetch_companies.py locally):

{"name": "Acme Corp", "website": "https://acme.com", "status": "new"}

To remove a company: set "status": "inactive" in data/companies.json. It will stop being fetched immediately and won't be re-added by discovery.

More ATS scrapers and company sources are actively being added.

Rolling window

Jobs older than 14 days are dropped and their .md files deleted. seen_jobs.json is a permanent ID registry — prevents re-surfacing long-running postings that age out and reappear.

New companies are archived on first fetch so their existing backlog doesn't flood the board. Only jobs that appear on subsequent runs are treated as new.

Setup

Requires Python 3.11+. Clone both repos as siblings:

git clone https://github.com/zachproffitt/builder-jobs-scraper
git clone https://github.com/zachproffitt/builder-jobs

pip install -r requirements.txt
export ANTHROPIC_API_KEY=...

Run the full pipeline:

PYTHONPATH=. python pipeline/fetch_jobs.py
PYTHONPATH=. python pipeline/fetch_job_descriptions.py
PYTHONPATH=. python pipeline/classify_companies.py
PYTHONPATH=. python pipeline/classify_jobs.py
PYTHONPATH=. python pipeline/render_job_indexes.py ../jobs

Data files

File	Description
`data/companies.json`	All companies: name, website, ATS, slug, status
`data/jobs_seen.json`	Permanent ID registry: `{job_id: first_seen_timestamp}`
`data/companies_seen.json`	First-fetch registry per company
`data/jobs_raw.json`	Rolling 14-day window of listings with descriptions (not committed)
`data/jobs_classified.json`	Claude inference results per job ID
`data/companies_classified.json`	Company summaries used in rendered output
`data/jobs.log`	Log for the Jobs workflow (committed each run)
`data/companies.log`	Log for the Companies workflow (committed each run)

Name		Name	Last commit message	Last commit date
Latest commit History 1,208 Commits
.github/workflows		.github/workflows
data		data
pipeline		pipeline
scrapers		scrapers
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
pipeline.sh		pipeline.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Builder Jobs — Scraper

Pipeline

Actions

Monitoring

Classification

Supported ATS

Company sources

Rolling window

Setup

Data files

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Builder Jobs — Scraper

Pipeline

Actions

Monitoring

Classification

Supported ATS

Company sources

Rolling window

Setup

Data files

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages