Argopia — Claude Context

Job-search bot for Claude Code. Two-stage pipeline: deterministic Node filter → in-context Claude scoring. README.md targets human users; this file targets me.

Stack

Node 20+ — two npm deps: js-yaml (parse YAML), ajv (JSON Schema validation)
No API keys, no Anthropic SDK — review runs in the user's existing Claude Code session at whatever model is configured
No MCP browser dependency for v0.1. SPA-rendered and auth-walled boards are shipped disabled in sources.yaml until browser-MCP support lands.

Where to read first

Question	Authoritative source
"What's the contract for X?"	`schemas/<X>.schema.yaml`
"What does populated X look like?"	`templates/<X>.yaml`
"How does X get filled or used?"	`.claude/commands/argopia-<verb>.md`
"What does the user see?"	`README.md`
"What's the rubric?"	`.claude/commands/argopia-review.md` (inlined)
"What conventions apply?"	The `Conventions:` block at the top of each schema

The two-stage model — load-bearing contract

Stage	Who	Does	Optimizes for	Cost
1 — Survey	Node + WebFetch	source pre-filter, discover URLs, dedup vs `reviews.jsonl`, fetch JD postings (cached by URL hash), apply keyword filter, write openings	recall	listing fetches + posting fetches
2 — Review	Claude	read each opening's cached posting, apply rubric, append one JSON line to `reviews.jsonl`	precision	rubric tokens

Claude reasons. Node enforces fixed logic. When unsure where something belongs:

Pure regex / substring / arithmetic / file ops → Node script
Reading PDF, extracting structure, judging fit, narrating → Claude (slash command body)

File layout

.claude/commands/    user-facing slash commands (entry surface)
.claude/agents/      reusable subagent prompts (profile-extractor, criteria-deriver, source-surveyor, posting-fetcher)
schemas/             3 validation contracts (profile, criteria, sources)
templates/           starter scaffolds copied to working/ during onboarding
scripts/             single-file Node helpers (.mjs)
working/             3 user-editable files after /argopia-onboard (gitignored)
data/                runtime state — reviews.jsonl, listings/, postings/, openings/
reports/             dashboard.html + advice/

`working/` contract — exactly 3 files

File	Role
`profile.yaml`	Identity — who I am, what I've built, what I know
`criteria.yaml`	Preferences — what I want / won't accept; survey-stage keyword rules
`sources.yaml`	Where to look — one entry per board, dispatched by `type` (api / html / browser)

The scoring rubric is inlined in .claude/commands/argopia-review.md — not a separate YAML.

Slash commands (workflow order)

/argopia-onboard <cv-path> — parse CV, copy template to working/, derive profile.yaml + parts of criteria.yaml from CV
(user manually reviews working/*.yaml)
/argopia-survey [<url> ...] — discover URLs (per enabled source, pre-filtered via URL params), dedup against data/reviews.jsonl, fetch JD postings into data/postings/ (cached by URL hash), apply keyword filter, write data/openings/<TS>.jsonl. With URL args: skip discovery, run the same posting-fetch + filter pipeline on those URLs (Mode B).
/argopia-review [--limit N] — for each queued opening: read posting from cache, apply the inlined rubric, append one JSON line to data/reviews.jsonl.
/argopia-advise — on demand; aggregate reviews.jsonl → positioning rewrites, market gaps, pipeline health, criteria signals.

Environment setup runs automatically on npm install via scripts/install.mjs (npm postinstall); no slash command for it.

Scripts (Node, deterministic)

Script	Reads	Writes	Used by
`fetch.mjs`	`working/sources.yaml` entry, source URL (JSON for type=api)	`data/listings/<ts>-<name>.jsonl`	survey
`survey.mjs prepare`	stdin: raw JSONL; `data/reviews.jsonl`; `data/postings/`	stdout: unseen JSONL with posting_path; stderr: cache-miss report (JSON)	survey
`survey.mjs inject`	stdin: JSONL with posting_path; `data/postings/<sha>.md`	stdout: same JSONL with description = first 1.5K chars of cached body	survey
`survey.mjs finalize`	stdin: filter survivors JSONL	stdout: openings JSONL — `{url, posting_path}` only (metadata lives in posting front-matter)	survey
`filter.mjs`	`working/criteria.yaml` + stdin JSONL	stdout filtered JSONL	survey
`onboard.mjs`	`templates/`	`working/`	onboard
`dashboard.mjs`	`data/reviews.jsonl`, `data/postings/<sha>.md`	`reports/dashboard.html` — single self-contained HTML (triage state lives in browser localStorage)	`npm run dashboard`
`install.mjs`	(none)	runtime dirs (`working/`, `data/`, `reports/`); env check. Auto-runs on `npm install` via `postinstall`.	npm postinstall
`lib/schema.mjs`	(library)	YAML shape validator	survey/onboard

Filter gates (survey, body-aware)

Applied by filter.mjs after the posting body is injected. All gates read from working/criteria.yaml:

Gate	Source	Behavior
Sub-target seniority	auto-derived from `target.level`	drops {junior, intern, trainee, entry-level, ...} matches against title
`keywords.negative`	criteria.yaml	vocabulary-collision drops (full haystack — title + body)
Positive required	criteria.yaml `keywords.positive`	at least one of: `job_titles`, `title_token_sets` (AND of tokens), `technical`, `tools` matches
`excluded_companies`	criteria.yaml `target`	runs against company field if present in body
`max_listing_age_days`	criteria.yaml `target`	runs against posted-date in body
Region lock	criteria.yaml `target.{preferred,acceptable}_timezones`	runs against location in body

Idempotency

Survey writes nothing to reviews.jsonl — that's review's job. Posting cache (data/postings/<sha1>.md) + canonical review ledger (data/reviews.jsonl) together mean both commands are incrementally re-runnable: re-running survey with tweaked criteria re-evaluates filter rejects without re-fetching postings; re-running review skips URLs already scored.

Non-obvious decisions (tribal knowledge)

Section order in profile.yaml = scoring weight, descending. tech_stack sits right after experience (high tech_overlap signal); training and work-output follow; recognition (awards / certs / talks) later; community (volunteering) last.
current_employer is NOT a field. Source of truth is experience[0].employer. Avoid duplicate-data drift.
experience[].location is NOT a field. Employer location doesn't affect CV ↔ JD matching; the candidate's location does.
experience[].tools and projects[].tech are NOT fields. Tools surface through achievements (narrative) and top-level tech_stack (aggregate). Three sources of truth was a bug magnet.
work_authorized_in is a single list, not a 3-list dict. Citizenship + held visas combined; sponsorship-required is implied as the inverse.
tech_stack keys use abbreviations for category labels (asr, tts, llm, mlops) and plural nouns for enumerable lists (voice_agents, programming_languages). Both conventions are correct — domain abbreviations don't pluralize cleanly.
Profile uses programming_languages, not languages. candidate.languages is human languages (Vietnamese, English).
kind not type for publication/talk classification — avoids visual collision with the schema's type: directive.
achievements not bullets — names the meaning, not the render shape.
Title-vs-body matching is operational. positive.job_titles matches title only; positive.technical / tools match title + body. The bucket distinction is real, not decorative.
Sub-target seniority is auto-derived, not hand-listed. Don't add intern or junior to keywords.negative — filter.mjs derives the sub-target list from target.level automatically.
Negative keywords stay specific to vocabulary collisions. Domain-collision phrases (voice actor, speech therapy, VoIP, sound design). Don't add things review should handle (location, comp, spam quality).
Comments stay file-scope. Profile schema comments don't reference rubric / scoring / consumption. Filters schema documents OR / AND-NOT matching semantics (write-time guidance) but not what stage runs the matcher. The principle: comments tell users what to write, not how data is later used.

What NOT to do

Don't duplicate fields. No current_employer while experience[0].employer exists; no per-role tools while tech_stack exists; no narrative blob while structured fields exist.
Don't reference review behavior in profile / sources schema comments. File-scope only. The rubric / scoring / "model normalizes X" lives in argopia-review.md, not in profile schema.
Don't add per-site adapter scripts. Source-specific behavior lives in sources.yaml config (selectors, patterns, field_map), dispatched by type (api / html / browser). Adding a .mjs per board would duplicate config that's already declarative.
Don't break the 3-file working/ contract. New runtime config goes in schemas/ (validation) or templates/ (defaults).
Don't fabricate CV facts. Missing field → null. Empty list-type section → []. Don't infer employment_type: full-time unless the CV explicitly says so.
Don't add comments that restate the field name or duplicate the schema's Conventions block. A field gets an inline comment only if the field name + type is insufficient (enum values, format quirks, null semantics, when-to-fill guidance, disambiguation).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Argopia — Claude Context

Stack

Where to read first

The two-stage model — load-bearing contract

File layout

`working/` contract — exactly 3 files

Slash commands (workflow order)

Scripts (Node, deterministic)

Filter gates (survey, body-aware)

Idempotency

Non-obvious decisions (tribal knowledge)

What NOT to do

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

Argopia — Claude Context

Stack

Where to read first

The two-stage model — load-bearing contract

File layout

working/ contract — exactly 3 files

Slash commands (workflow order)

Scripts (Node, deterministic)

Filter gates (survey, body-aware)

Idempotency

Non-obvious decisions (tribal knowledge)

What NOT to do

`working/` contract — exactly 3 files