undergroundpost/obsidian-auto-tagger: Use AI to automatically generate and add relevant tags to your Obsidian notes.

Tags Obsidian notes automatically with a local Ollama LLM, reuses your existing tag vocabulary instead of creating near-duplicates, and includes a cleanup tool to consolidate the tag pool over time.

What it does

Tags new and modified notes automatically. Each note gets a fresh set of frontmatter tags drawn from its content.
Respects your existing tag vocabulary. When a note mentions something you've tagged before, the script picks the tag you already use instead of creating a near-duplicate.
Disambiguates with context. When two existing tags look alike — julie vs julieandrews — the script reads the note and decides which one fits.
Cleans up after itself. A separate cleanup_tags.py tool surfaces duplicates, typos, and junk tags for review. Nothing writes until you approve.
Designed to run on its own. Point a nightly cron at your vault and forget it.

Quick start

You need:

Ollama running locally with gemma3:12b and nomic-embed-text pulled.
Python 3.11+ on Linux, or 3.13 specifically on macOS (3.14+ has a Local Network privacy quirk — see Design notes).

git clone https://github.com/undergroundpost/obsidian-auto-tagger.git
cd obsidian-auto-tagger
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
cp config.yaml.example config.yaml
# edit config.yaml — at minimum, set INPUT_FOLDER to your vault path

Sanity-check on five files without writing anything:

.venv/bin/python generate_tags.py --dry-run --limit 5

Drop --dry-run once you're happy with the output.

Daily run

Pick the path that matches your setup.

Local vault

For a vault that lives on the same machine the cron runs on:

0 1 * * * /path/to/obsidian-auto-tagger/.venv/bin/python /path/to/obsidian-auto-tagger/generate_tags.py

The script writes its own dated log to logs/generate_tags_YYYY-MM-DD.log.

Headless server with Obsidian Sync

If your vault lives in Obsidian Sync and you want this script to run on an always-on server (instead of relying on a laptop being awake at 1am), use Obsidian's official headless client to pull and push around each run.

One-time setup on the server:

npm install -g obsidian-headless
ob sync-setup    # interactive — pair to your Obsidian Sync vault

Then schedule the bundled wrapper, which does ob sync → tag → ob sync:

0 1 * * * /path/to/obsidian-auto-tagger/run-daily.sh

The wrapper logs to logs/run-daily_YYYY-MM-DD.log; the Python script logs to logs/generate_tags_YYYY-MM-DD.log.

Manual runs

.venv/bin/python generate_tags.py                       # default: only files modified since their last processed: stamp
.venv/bin/python generate_tags.py --dry-run --limit 5   # preview without writing
.venv/bin/python generate_tags.py --force --limit 10    # re-tag, ignoring the processed timestamp
.venv/bin/python generate_tags.py --untagged-only       # backfill files with no tags

Other flags: --debug, --input <folder>, --exclude <folder> (repeatable), --model <name>, --provider {ollama,openai}, --api-key <key>.
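
The default mode can be pictured as a comparison between a file's modification time and the processed: stamp in its frontmatter. The sketch below is illustrative only: the function name needs_tagging, the ISO 8601 stamp format, and the treatment of a missing stamp are assumptions, not the script's actual code.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_tagging(mtime: float, processed: Optional[str], force: bool = False) -> bool:
    """Return True when a note should be (re)tagged.

    mtime     -- file modification time as a POSIX timestamp
    processed -- value of the `processed:` frontmatter key (assumed ISO 8601)
    force     -- mirrors the --force flag: ignore the stamp entirely
    """
    if force or processed is None:
        # Assumption: a note with no stamp has never been processed.
        return True
    stamp = datetime.fromisoformat(processed)
    if stamp.tzinfo is None:
        # Assumption: naive stamps are interpreted as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return modified > stamp
```

In a real run, mtime would come from something like Path.stat().st_mtime and processed from the parsed frontmatter.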

Cleanup tool

Vault tag pools drift over time — typos, near-duplicates, one-off junk. cleanup_tags.py consolidates yours through a review-and-apply workflow:

.venv/bin/python cleanup_tags.py scan      # writes cleanup_proposals.json
# open cleanup_proposals.json and edit any actions you want to change
.venv/bin/python cleanup_tags.py apply     # rewrites frontmatter + inline #tag references

Proposals are grouped into:

| Section | Default | What it catches |
| --- | --- | --- |
| `format_consolidations` | apply | hyphen / underscore / accent variants |
| `semantic_consolidations_auto` | apply | high-similarity clusters (cosine ≥ 0.90) |
| `semantic_consolidations_review` | review | borderline clusters (0.80–0.90); approve each manually |
| `suspected_junk` | delete | concatenation blobs, garbage strings |
| `one_offs` | review | tags used once; merged if a close neighbor exists |

By default the tool also runs an LLM judge over each cluster to verify cohesion and pick the canonical form (e.g. claud + claudai → claude). Pass --no-llm to skip the judge and use heuristics only.
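
To make the first three sections concrete, here is a minimal sketch of how format variants could be grouped and how a cluster's cosine score could map to a section. The helper names (format_key, format_variants, semantic_section) are hypothetical, and the assumption that each cluster carries a single cosine score is mine; cleanup_tags.py may work differently.

```python
import unicodedata
from collections import defaultdict
from typing import Optional


def format_key(tag: str) -> str:
    """Collapse hyphen, underscore, and accent differences into one key."""
    decomposed = unicodedata.normalize("NFKD", tag)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.replace("-", "").replace("_", "")


def format_variants(tags: list[str]) -> list[list[str]]:
    """Return groups of two or more tags that share the same format key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[format_key(tag)].append(tag)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def semantic_section(cosine: float) -> Optional[str]:
    """Map a cluster's cosine score to a proposal section (thresholds from the table)."""
    if cosine >= 0.90:
        return "semantic_consolidations_auto"
    if cosine >= 0.80:
        return "semantic_consolidations_review"
    return None  # Assumption: weaker clusters produce no semantic proposal.
```

For example, format_variants(["machine-learning", "machine_learning", "python"]) yields one group containing the two machine learning spellings.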

Configuration

| Key | Default | Notes |
| --- | --- | --- |
| `INPUT_FOLDER` | `~/Documents/Notes` | Vault root |
| `EXCLUDE_FOLDERS` | `[]` | Skipped during scan and tagging |
| `LLM_PROVIDER` | `ollama` | `ollama` or `openai` |
| `OLLAMA_MODEL` | `gemma3:12b` | Tag-extraction model (also used by cleanup LLM passes) |
| `OLLAMA_SERVER_ADDRESS` | `http://localhost:11434` | Ollama endpoint |
| `OLLAMA_CONTEXT_WINDOW` | `32000` | `num_ctx` for tag extraction |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model for the matcher |
| `EMBEDDING_BATCH_SIZE` | `100` | Tags per `/api/embed` call |
| `SIMILARITY_THRESHOLD_HIGH` | `0.95` | Cosine sim ≥ this → auto-consolidate |
| `SIMILARITY_THRESHOLD_LOW` | `0.70` | Cosine sim < this → new tag (no judge call) |
| `LLM_JUDGE_ENABLED` | `true` | Gray-zone candidates use the LLM judge with note context |
| `MAX_SPECIFIC_TAG_WORDS` | `3` | `specific_tags` with more words are dropped (anti-SKU) |
| `MAX_NORMALIZED_TAG_LEN` | `30` | Tags whose normalized form exceeds this length are dropped |
| `REQUIRE_SPECIFIC_TAG_IN_BODY` | `true` | Drop `specific_tags` not present in the note body (anti-leakage) |
| `EMPTY_NOTE_BODY_MIN_CHARS` | `1` | Skip notes whose body is shorter than this |
| `OPT_OUT_FRONTMATTER_KEY` | `"auto_tag"` | Written as `<key>: false` on hard LLM failure; remove to retry |
| `OPENAI_API_KEY` | `""` | Required if provider is `openai` |
| `OPENAI_MODEL` | `gpt-3.5-turbo` | |
| `OPENAI_MAX_TOKENS` | `4000` | |
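
The two similarity thresholds split candidate matches into three zones. The following is a rough sketch of that routing, not the matcher's real code: the function names and return labels are made up for illustration, and the fallback used when LLM_JUDGE_ENABLED is false is an assumption.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD_HIGH = 0.95  # defaults from the table above
SIMILARITY_THRESHOLD_LOW = 0.70


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(sim: float, judge_enabled: bool = True) -> str:
    """Decide what happens to a candidate tag given its best-match similarity."""
    if sim >= SIMILARITY_THRESHOLD_HIGH:
        return "reuse_existing"  # auto-consolidate into the existing tag
    if sim < SIMILARITY_THRESHOLD_LOW:
        return "new_tag"         # no judge call
    # Gray zone. Assumption: with the judge disabled, fall back to a new tag.
    return "ask_judge" if judge_enabled else "new_tag"
```

Raising SIMILARITY_THRESHOLD_LOW shrinks the gray zone and so reduces judge calls, at the cost of more new tags.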

Model performance

Tag quality is sensitive to model size. From measured runs on a mixed personal vault:

gemma3:12b (recommended). Reliably follows the granularity rule — emits brand-level tags like Hoka, Darn Tough instead of SKU strings. Doesn't hallucinate acronym expansions. ~6–10s per file on a 3060.
qwen2.5:7b (works, with caveats). Functions, but ignores the granularity rule on list-heavy notes (emits "Darn Tough T4050 Heavyweight Tactical Full Cushion" verbatim). Hallucinated "Reformer Autoencoder Generator" as an expansion of RAG in one test. ~1–4s per file.
Models below ~7B (not tested). The pattern above suggests they will follow the granularity rule less reliably.

The post-processing filters (MAX_SPECIFIC_TAG_WORDS, MAX_NORMALIZED_TAG_LEN) catch the worst-case SKU bloat regardless of model, but they're a safety net, not a substitute for picking a capable model. If you change models, re-evaluate MAX_SPECIFIC_TAG_WORDS — different models concatenate differently.
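
As an illustration of how these filters, together with REQUIRE_SPECIFIC_TAG_IN_BODY, might act on a model's specific_tags, consider the sketch below. The function names are hypothetical, and the normalization rule (lowercase, letters and digits only) and the case-insensitive substring check are assumptions about details the README does not spell out.

```python
import re


def normalize_tag(tag: str) -> str:
    """Assumed normalization: lowercase, keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def filter_specific_tags(
    tags: list[str],
    body: str,
    max_words: int = 3,            # MAX_SPECIFIC_TAG_WORDS
    max_len: int = 30,             # MAX_NORMALIZED_TAG_LEN
    require_in_body: bool = True,  # REQUIRE_SPECIFIC_TAG_IN_BODY
) -> list[str]:
    """Drop SKU-like, overlong, or ungrounded specific tags."""
    body_lower = body.lower()
    kept = []
    for tag in tags:
        if len(tag.split()) > max_words:
            continue  # too many words (anti-SKU)
        if len(normalize_tag(tag)) > max_len:
            continue  # normalized form too long
        if require_in_body and tag.lower() not in body_lower:
            continue  # not grounded in the note body (anti-leakage)
        kept.append(tag)
    return kept
```

With a body that mentions only Darn Tough socks, a candidate list of Hoka, the full SKU string, and Darn Tough would come back as just Darn Tough.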

Design notes

Two-stage matcher with LLM judge, fresh per note. Pure cosine-similarity matching conflates lexical proximity with referent identity: it cannot tell "your coworker Julie" from "Julie Andrews", because both embed similarly. The matcher therefore uses cosine only as a coarse filter. A candidate at or above SIMILARITY_THRESHOLD_HIGH is auto-consolidated into the existing tag (typos, trivial morphology); a candidate below SIMILARITY_THRESHOLD_LOW is rejected as a match and kept as a new tag; the gray zone in between goes to the LLM judge along with the note context, since the judge has the world knowledge needed to disambiguate. Judge verdicts are not cached, because a verdict made in one note's context is not globally true (a later fan note that uses "Julie" to mean Julie Andrews deserves the opposite verdict from a coworker note). Every judge call is appended to tag_decisions.log (JSONL) along with the file path, body excerpt, similarity score, verdict, and reason, which provides a full audit trail without freezing decisions.
Grounding check defeats in-prompt example leakage. LLMs given specific in-context examples can regurgitate those examples on weakly-grounded notes — e.g. emitting Hoka on a software-UX note because clothing brands appeared in the prompt's worked examples. The substring-against-body check (REQUIRE_SPECIFIC_TAG_IN_BODY) is the deterministic backstop: any specific_tag that doesn't actually appear in the note is dropped before reaching the matcher. general_tags are exempt because they're conceptual.
Schema-enforced structured output is non-negotiable. Single-array JSON ({"tags": [...]}) lets the model paraphrase or ignore output rules. The dual-array schema (specific_tags + general_tags) forces dual-level tagging via grammar-constrained decoding — the model cannot return without filling both fields.
The matcher is the source of consolidation truth, not the prompt. The model has no knowledge of which existing tags to reuse; it generates freely. Cosine similarity over nomic-embed-text embeddings handles the consolidation deterministically. This avoids prompt-bloat and keeps tag reuse consistent across runs.
Junk detection uses vault tags plus /usr/share/dict/words. Vault-vocabulary-only decomposition produces too many false positives because the established vocab is small relative to natural English. Combining the two gives clean results — vault tags handle niche technical terms, the dict handles normal English.
Cleanup judge: classify-then-decide, not free-form reason-then-decide. The cluster-cohesion LLM call returns a structured relationship field with an enum value (typo, morphological, synonym, related_distinct, named_vs_common) that deterministically maps to same_entity. This was rebuilt from a free-form "are these the same?" prompt that consistently failed in one direction: the model used "they're related" as evidence of sameness. Forcing a relationship label first, with explicit anti-patterns (broad-vs-narrow, device-vs-OS, adjective-vs-noun, sibling-concepts), eliminated that failure mode. The model writes a counterexample only when the relationship is related_distinct or named_vs_common; gating the counterexample to ambiguous cases prevents it from over-firing on trivial singular/plural pairs.
Cluster extension uses lexical similarity, not embedding similarity. Misspelled non-words are essentially random in nomic-embed-text space: sim("claud", "claude") = 0.54, well below any useful cluster threshold. So the cleanup tool's external-canonical lookup uses Levenshtein ratio (≥ 0.80) to find vault tags that are lexically close to cluster members, then drops candidates whose embedding similarity to the cluster centroid is below 0.50 (this filters out coincidental lookalikes such as cloud for claud+claudai). The lookup is typo-gated: it only runs when at least one cluster member is neither in /usr/share/dict/words nor an established vault tag, because for clusters of real-word pairs (e.g. prophetic+prophecy) the external candidates introduce noise that shifts the cohesion verdict.
macOS Local Network privacy is per-binary. Each Homebrew Python minor version is treated as a separate binary by the macOS permission system. New venvs that need LAN access (e.g. talking to Ollama on a different host) must be created with python@3.13 if that's the binary that has the Local Network grant. A new minor version will silently fail with EHOSTUNREACH and no prompt. This quirk only applies on macOS; on Linux any Python 3.11+ works.
The processed: frontmatter timestamp is the only source of truth for incremental work. No external state file. Touching the note in any way (including a tag rewrite from the cleanup tool) intentionally does not update processed: — tag-level edits aren't semantic changes.
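
The cluster-extension note above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the ratio here is (longest length minus edit distance) divided by longest length, which may differ from the Levenshtein ratio the tool actually uses; the centroid similarities are passed in as a plain dict rather than computed from embeddings; and all function and parameter names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lev_ratio(a: str, b: str) -> float:
    """Assumed ratio definition: share of the longer string left unedited."""
    longest = max(len(a), len(b))
    return (longest - levenshtein(a, b)) / longest if longest else 1.0


def external_candidates(cluster, vault_tags, centroid_sim, dictionary, established,
                        min_ratio=0.80, min_centroid_sim=0.50):
    """Find vault tags that may be the canonical form for a typo cluster."""
    # Typo gate: skip clusters made up entirely of known words or tags.
    if all(m in dictionary or m in established for m in cluster):
        return []
    found = []
    for tag in vault_tags:
        if tag in cluster:
            continue
        lex_close = any(lev_ratio(tag, m) >= min_ratio for m in cluster)
        # Drop coincidental lookalikes that sit far from the cluster centroid.
        if lex_close and centroid_sim.get(tag, 0.0) >= min_centroid_sim:
            found.append(tag)
    return found
```

With the cluster claud+claudai, both claude and cloud pass the lexical check, so it is the centroid filter that decides which one survives. Any centroid similarity values fed to this sketch are inputs you supply, not measurements from the tool.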

Repo layout

generate_tags.py        # daily tagger (entry point for cron)
generate_tags.md        # prompt template (read at runtime by generate_tags.py)
cleanup_tags.py         # vault-wide tag cleanup with scan/apply workflow
config.yaml             # all configuration
requirements.txt        # Python deps
run-daily.sh            # cron wrapper: ob sync → python → ob sync
.venv/                  # repo-local virtualenv (created during setup)
tag_embeddings.json     # embedding cache, regenerated when missing or model changes
tag_decisions.log       # append-only JSONL audit log of matcher judge calls
cleanup_decisions.json  # cleanup tool's LLM-verdict cache (cluster cohesion + junk-judge)
cleanup_proposals.json  # output of `cleanup_tags.py scan`; reviewed before apply
logs/                   # per-day dated logs
harness/                # replay harness for iterating on the cleanup judge prompt
