Job-search bot for Claude Code. Two-stage pipeline: deterministic Node filter → in-context Claude scoring. README.md targets human users; this file targets me.
- Node 20+ — two npm deps:
js-yaml(parse YAML),ajv(JSON Schema validation) - No API keys, no Anthropic SDK — review runs in the user's existing Claude Code session at whatever model is configured
- No MCP browser dependency for v0.1. SPA-rendered and auth-walled boards
are shipped disabled in
sources.yamluntil browser-MCP support lands.
| Question | Authoritative source |
|---|---|
| "What's the contract for X?" | schemas/<X>.schema.yaml |
| "What does populated X look like?" | templates/<X>.yaml |
| "How does X get filled or used?" | .claude/commands/argopia-<verb>.md |
| "What does the user see?" | README.md |
| "What's the rubric?" | .claude/commands/argopia-review.md (inlined) |
| "What conventions apply?" | The Conventions: block at the top of each schema |
| Stage | Who | Does | Optimizes for | Cost |
|---|---|---|---|---|
| 1 — Survey | Node + WebFetch | source pre-filter, discover URLs, dedup vs reviews.jsonl, fetch JD postings (cached by URL hash), apply keyword filter, write openings |
recall | listing fetches + posting fetches |
| 2 — Review | Claude | read each opening's cached posting, apply rubric, append one JSON line to reviews.jsonl |
precision | rubric tokens |
Claude reasons. Node enforces fixed logic. When unsure where something belongs:
- Pure regex / substring / arithmetic / file ops → Node script
- Reading PDF, extracting structure, judging fit, narrating → Claude (slash command body)
.claude/commands/ user-facing slash commands (entry surface)
.claude/agents/ reusable subagent prompts (profile-extractor, criteria-deriver, source-surveyor, posting-fetcher)
schemas/ 3 validation contracts (profile, criteria, sources)
templates/ starter scaffolds copied to working/ during onboarding
scripts/ single-file Node helpers (.mjs)
working/ 3 user-editable files after /argopia-onboard (gitignored)
data/ runtime state — reviews.jsonl, listings/, postings/, openings/
reports/ dashboard.html + advice/
| File | Role |
|---|---|
profile.yaml |
Identity — who I am, what I've built, what I know |
criteria.yaml |
Preferences — what I want / won't accept; survey-stage keyword rules |
sources.yaml |
Where to look — one entry per board, dispatched by type (api / html / browser) |
The scoring rubric is inlined in
.claude/commands/argopia-review.md — not a separate YAML.
/argopia-onboard <cv-path>— parse CV, copy template toworking/, deriveprofile.yaml+ parts ofcriteria.yamlfrom CV- (user manually reviews
working/*.yaml) /argopia-survey [<url> ...]— discover URLs (per enabled source, pre-filtered via URL params), dedup againstdata/reviews.jsonl, fetch JD postings intodata/postings/(cached by URL hash), apply keyword filter, writedata/openings/<TS>.jsonl. With URL args: skip discovery, run the same posting-fetch + filter pipeline on those URLs (Mode B)./argopia-review [--limit N]— for each queued opening: read posting from cache, apply the inlined rubric, append one JSON line todata/reviews.jsonl./argopia-advise— on demand; aggregatereviews.jsonl→ positioning rewrites, market gaps, pipeline health, criteria signals.
Environment setup runs automatically on npm install via
scripts/install.mjs (npm postinstall); no slash command for it.
| Script | Reads | Writes | Used by |
|---|---|---|---|
fetch.mjs |
working/sources.yaml entry, source URL (JSON for type=api) |
data/listings/<ts>-<name>.jsonl |
survey |
survey.mjs prepare |
stdin: raw JSONL; data/reviews.jsonl; data/postings/ |
stdout: unseen JSONL with posting_path; stderr: cache-miss report (JSON) | survey |
survey.mjs inject |
stdin: JSONL with posting_path; data/postings/<sha>.md |
stdout: same JSONL with description = first 1.5K chars of cached body | survey |
survey.mjs finalize |
stdin: filter survivors JSONL | stdout: openings JSONL — {url, posting_path} only (metadata lives in posting front-matter) |
survey |
filter.mjs |
working/criteria.yaml + stdin JSONL |
stdout filtered JSONL | survey |
onboard.mjs |
templates/ |
working/ |
onboard |
dashboard.mjs |
data/reviews.jsonl, data/postings/<sha>.md |
reports/dashboard.html — single self-contained HTML (triage state lives in browser localStorage) |
npm run dashboard |
install.mjs |
(none) | runtime dirs (working/, data/, reports/); env check. Auto-runs on npm install via postinstall. |
npm postinstall |
lib/schema.mjs |
(library) | YAML shape validator | survey/onboard |
Applied by filter.mjs after the posting body is injected. All gates
read from working/criteria.yaml:
| Gate | Source | Behavior |
|---|---|---|
| Sub-target seniority | auto-derived from target.level |
drops {junior, intern, trainee, entry-level, ...} matches against title |
keywords.negative |
criteria.yaml | vocabulary-collision drops (full haystack — title + body) |
| Positive required | criteria.yaml keywords.positive |
at least one of: job_titles, title_token_sets (AND of tokens), technical, tools matches |
excluded_companies |
criteria.yaml target |
runs against company field if present in body |
max_listing_age_days |
criteria.yaml target |
runs against posted-date in body |
| Region lock | criteria.yaml target.{preferred,acceptable}_timezones |
runs against location in body |
Survey writes nothing to reviews.jsonl — that's review's job.
Posting cache (data/postings/<sha1>.md) + canonical review ledger
(data/reviews.jsonl) together mean both commands are incrementally
re-runnable: re-running survey with tweaked criteria re-evaluates
filter rejects without re-fetching postings; re-running review skips
URLs already scored.
- Section order in
profile.yaml= scoring weight, descending.tech_stacksits right afterexperience(hightech_overlapsignal); training and work-output follow; recognition (awards / certs / talks) later; community (volunteering) last. current_employeris NOT a field. Source of truth isexperience[0].employer. Avoid duplicate-data drift.experience[].locationis NOT a field. Employer location doesn't affect CV ↔ JD matching; the candidate'slocationdoes.experience[].toolsandprojects[].techare NOT fields. Tools surface throughachievements(narrative) and top-leveltech_stack(aggregate). Three sources of truth was a bug magnet.work_authorized_inis a single list, not a 3-list dict. Citizenship + held visas combined; sponsorship-required is implied as the inverse.tech_stackkeys use abbreviations for category labels (asr,tts,llm,mlops) and plural nouns for enumerable lists (voice_agents,programming_languages). Both conventions are correct — domain abbreviations don't pluralize cleanly.- Profile uses
programming_languages, notlanguages.candidate.languagesis human languages (Vietnamese, English). kindnottypefor publication/talk classification — avoids visual collision with the schema'stype:directive.achievementsnotbullets— names the meaning, not the render shape.- Title-vs-body matching is operational.
positive.job_titlesmatches title only;positive.technical/toolsmatch title + body. The bucket distinction is real, not decorative. - Sub-target seniority is auto-derived, not hand-listed. Don't add
internorjuniortokeywords.negative—filter.mjsderives the sub-target list fromtarget.levelautomatically. - Negative keywords stay specific to vocabulary collisions. Domain-collision phrases (voice actor, speech therapy, VoIP, sound design). Don't add things review should handle (location, comp, spam quality).
- Comments stay file-scope. Profile schema comments don't reference rubric / scoring / consumption. Filters schema documents OR / AND-NOT matching semantics (write-time guidance) but not what stage runs the matcher. The principle: comments tell users what to write, not how data is later used.
- Don't duplicate fields. No
current_employerwhileexperience[0].employerexists; no per-roletoolswhiletech_stackexists; nonarrativeblob while structured fields exist. - Don't reference review behavior in profile / sources schema
comments. File-scope only. The rubric / scoring / "model normalizes
X" lives in
argopia-review.md, not in profile schema. - Don't add per-site adapter scripts. Source-specific behavior lives
in
sources.yamlconfig (selectors, patterns, field_map), dispatched bytype(api / html / browser). Adding a.mjsper board would duplicate config that's already declarative. - Don't break the 3-file
working/contract. New runtime config goes inschemas/(validation) ortemplates/(defaults). - Don't fabricate CV facts. Missing field →
null. Empty list-type section →[]. Don't inferemployment_type: full-timeunless the CV explicitly says so. - Don't add comments that restate the field name or duplicate the schema's Conventions block. A field gets an inline comment only if the field name + type is insufficient (enum values, format quirks, null semantics, when-to-fill guidance, disambiguation).