fix: anchor note GUIDs to Puzzle ID to prevent duplicates on re-import by SKOHscripts · Pull Request #19 · SKOHscripts/Optimized-Chess-Puzzles

SKOHscripts · 2026-05-28T08:02:14Z

Summary

- Add PuzzleNote(genanki.Note) subclass that overrides guid to hash only fields[0] (the Puzzle ID), instead of all fields
- Update _row_to_note() to return PuzzleNote instead of genanki.Note
- Add tests/build_apkg_test.py with 12 tests covering GUID stability, Model ID constancy, Deck ID determinism, and integration

Problem

When a Lichess puzzle's Rating or Popularity changes between builds, genanki's default GUID (which hashes all fields) changes too. Anki then treats the re-imported note as new, duplicating every updated puzzle in the user's collection on each delivery.
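
To make the failure mode concrete, here is a minimal sketch of why hashing every field breaks re-import while hashing only the Puzzle ID does not. The guid_sketch function is a simplified stand-in written for illustration, not genanki's actual encoding, and the field layout and sample values are assumptions.

```python
import hashlib


def guid_sketch(*values):
    """Simplified stand-in for a field-hashing GUID (not genanki's exact encoding)."""
    joined = "__".join(str(value) for value in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


# Hypothetical field layout: Puzzle ID first, then mutable metadata.
build_1 = ["00sHx", "1048", "84"]
build_2 = ["00sHx", "1071", "86"]  # same puzzle, updated Rating/Popularity

# Hashing all fields: the GUID changes as soon as Rating or Popularity moves.
all_fields_changed = guid_sketch(*build_1) != guid_sketch(*build_2)

# Hashing only fields[0]: the GUID stays the same across builds.
id_only_stable = guid_sketch(build_1[0]) == guid_sketch(build_2[0])
```

In the PR itself the same idea is expressed by overriding guid on the note subclass so that only the first field feeds the hash.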

Root cause

Three identifiers must be stable across builds for Anki to update rather than duplicate:

- Note GUID: was hashing all fields (bug fixed here)
- Model ID: already hardcoded (1757360269638), no change needed
- Deck IDs: already derived from SHA1 of deck name, no change needed
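
For reference, a deck ID derived from the SHA1 of the deck name could look roughly like the sketch below. The function name, the truncation to 31 bits, and the sub-deck names are assumptions for illustration; the PR only states that the IDs come from a SHA1 of the name.

```python
import hashlib


def deck_id_sketch(deck_name: str) -> int:
    """Derive a reproducible integer ID from a deck name (illustrative only)."""
    digest = hashlib.sha1(deck_name.encode("utf-8")).hexdigest()
    # Assumption: fold the 160-bit digest into a 31-bit positive integer.
    return int(digest, 16) % (1 << 31)


# Hypothetical sub-deck names; the "::" separator is Anki's sub-deck convention.
first = deck_id_sketch("♟️ Optimized Chess Puzzles::1000-1100")
second = deck_id_sketch("♟️ Optimized Chess Puzzles::1100-1200")
```

Because the ID depends only on the name, renaming a deck would change its ID, which is worth keeping in mind before renaming sub-decks.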

Migration note

This is a one-time breaking change for users with an existing collection. Their notes carry the old multi-field GUID, so the first import of the new package will not match those GUIDs and will create duplicates once. To avoid this, users must:

1. Delete their existing ♟️ Optimized Chess Puzzles decks
2. Reimport the new .apkg

All subsequent deliveries will update cleanly.

Test plan

- All 12 tests in tests/build_apkg_test.py pass (pytest tests/build_apkg_test.py -v)
- test_guid_is_stable_across_builds: same PuzzleID with different Rating/Popularity → same GUID
- test_different_puzzle_ids_give_different_guids: distinct Puzzle IDs → distinct GUIDs
- test_model_id_is_hardcoded: MODEL_ID == 1757360269638
- test_all_subdeck_ids_are_stable and test_all_subdeck_ids_are_distinct: deck IDs reproducible and unique
- test_apkg_contains_expected_entries: output zip contains collection.anki2, collection.anki21, meta
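
As an illustration of the last check, an .apkg is a zip archive, so the expected entries can be verified with the standard library alone. This is a sketch, not the test from the PR: missing_apkg_entries and make_fake_archive are hypothetical helpers, and the archive here is an empty placeholder rather than a real package.

```python
import io
import zipfile

EXPECTED_ENTRIES = {"collection.anki2", "collection.anki21", "meta"}


def missing_apkg_entries(apkg_bytes: bytes) -> set:
    """Return the expected entries that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(apkg_bytes)) as archive:
        return EXPECTED_ENTRIES - set(archive.namelist())


def make_fake_archive(names) -> bytes:
    """Build an in-memory zip with empty members, standing in for a real .apkg."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


complete = missing_apkg_entries(make_fake_archive(EXPECTED_ENTRIES))
incomplete = missing_apkg_entries(make_fake_archive(["meta"]))
```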

https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL

Generated by Claude Code

Successive Anki imports of updated .apkg files were creating duplicate cards instead of updating existing ones. The root cause was that genanki's default GUID hashes all note fields together, so any change to Rating or Popularity generated a new GUID and Anki treated the note as new.

Add PuzzleNote, a genanki.Note subclass that overrides guid to call genanki.guid_for(self.fields[0]), keying solely on the Puzzle ID. Model ID (1757360269638) and deck IDs (SHA1 of deck name) were already stable across builds and required no changes.

Add tests/build_apkg_test.py covering:
- GUID stability when mutable fields change
- GUID uniqueness across different Puzzle IDs
- Model ID constant value
- Deck ID determinism and distinctness per sub-deck
- Integration: note GUIDs are consistent across two builds from the same SAMPLE_CARDS

Migration note: this is a one-time breaking change for existing collections. Users must delete their old puzzle decks and reimport; subsequent deliveries will update correctly.

https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL

The new build_apkg_test.py imports build_apkg which depends on genanki. genanki is declared in requirements-build.txt (which already extends requirements.txt via -r), so switching the install target makes genanki available to the full test suite without duplicating the dependency. https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL

Each sub-deck and the parent deck now carry a plain-text description computed at build time from the puzzle rows, mirroring what lichess_optimized_puzzles_datasets reports at generation time:

847 puzzles
Rating: 1000–1100, average 1048
Popularity: average 84%
23 themes: fork (89) · pin (76) · mateIn1 (65) · deflection (52) · ...

Implementation:
- _build_description(rows) aggregates count, ELO range/average, popularity average, and top-15 themes sorted by frequency.
- build_from_csvs now reads each CSV into a list first so stats can be computed before the genanki.Deck object is created. All rows are also accumulated for the parent deck's aggregate description.
- An explicit parent deck (♟️ Optimized Chess Puzzles) is created with the aggregate description across all sub-decks; genanki.Deck accepts a description= keyword argument.
- build_sample follows the same pattern with SAMPLE_CARDS.
- 12 new tests cover description content and edge cases (missing fields, theme sorting, singular/plural grammar).

https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL
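
The aggregation described above can be sketched as follows. This is not the PR's _build_description: the function name, the row keys (Rating, Popularity, Themes as a space-separated string), the tie-breaking by theme name, and the handling of missing fields are all assumptions made for illustration.

```python
from collections import Counter


def build_description_sketch(rows, top_n=15):
    """Summarise puzzle rows as plain text (illustrative, assumed row keys)."""
    count = len(rows)
    lines = [f"{count} {'puzzle' if count == 1 else 'puzzles'}"]

    # Skip rows whose Rating is missing or empty.
    ratings = [int(row["Rating"]) for row in rows if row.get("Rating")]
    if ratings:
        average = round(sum(ratings) / len(ratings))
        lines.append(f"Rating: {min(ratings)}–{max(ratings)}, average {average}")

    popularity = [int(row["Popularity"]) for row in rows if row.get("Popularity")]
    if popularity:
        lines.append(f"Popularity: average {round(sum(popularity) / len(popularity))}%")

    # Assumption: themes are stored as one space-separated string per row.
    themes = Counter(t for row in rows for t in (row.get("Themes") or "").split())
    if themes:
        noun = "theme" if len(themes) == 1 else "themes"
        # Most frequent first; ties broken alphabetically for a stable order.
        ranked = sorted(themes.items(), key=lambda item: (-item[1], item[0]))[:top_n]
        listed = " · ".join(f"{name} ({n})" for name, n in ranked)
        lines.append(f"{len(themes)} {noun}: {listed}")

    return "\n".join(lines)


sample_rows = [
    {"Rating": "1000", "Popularity": "80", "Themes": "fork pin"},
    {"Rating": "1100", "Popularity": "90", "Themes": "fork"},
]
description = build_description_sketch(sample_rows)
```

The resulting string can then be passed to the deck through the description= keyword mentioned in the commit message.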

build_from_csvs had 21 local variables (R0914) after adding n_subdecks. Replaced it with len(decks) - 1 inline in the final print statement. https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL

Coverage = unique themes in sampled deck / unique themes in full ELO tranche before filtering, computed at CSV generation time.

lichess_optimized_puzzles_datasets.py:
- report_theme_coverage() now returns a stats dict {selected, unique_themes_sample, unique_themes_tranche, coverage_pct} in addition to printing the existing report.
- extract_tranches() collects those dicts and writes puzzles_stats.json alongside the puzzle CSVs so build_apkg.py can read them.

build_apkg.py:
- Add _load_deck_stats(csv_dir) helper to read puzzles_stats.json.
- _build_description() accepts an optional coverage float and appends "(74.3% of tranche themes)" inline with the theme list when present.
- build_from_csvs() loads stats once and passes coverage_pct per deck.
- Move media path to module constant MEDIA_PATH to stay within pylint's 20-local-variable limit (was 22 after new locals were added).

5 new tests cover coverage display, absence when None, and JSON loading.

https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL
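
A minimal sketch of the coverage ratio defined above, returning a dict with the same four keys as the stats dict in the commit message. The function name, the space-separated Themes key, the rounding to one decimal, and the 0.0 fallback for an empty tranche are assumptions, not details taken from report_theme_coverage().

```python
def theme_coverage_sketch(sample_rows, tranche_rows):
    """Compute theme coverage of a sample relative to its full tranche."""

    def unique_themes(rows):
        # Assumption: themes are stored as one space-separated string per row.
        return {theme for row in rows for theme in row.get("Themes", "").split()}

    sample = unique_themes(sample_rows)
    tranche = unique_themes(tranche_rows)
    # Guard against an empty tranche to avoid dividing by zero.
    coverage = round(100 * len(sample) / len(tranche), 1) if tranche else 0.0
    return {
        "selected": len(sample_rows),
        "unique_themes_sample": len(sample),
        "unique_themes_tranche": len(tranche),
        "coverage_pct": coverage,
    }


tranche_rows = [
    {"Themes": "fork pin"},
    {"Themes": "mateIn1"},
    {"Themes": "deflection fork"},
]
stats = theme_coverage_sketch(tranche_rows[:2], tranche_rows)
suffix = f"({stats['coverage_pct']}% of tranche themes)"
```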

Instead of writing puzzles_stats.json and reading it back in a second run, build_full() passes coverage stats directly from extract_tranches() to build_from_csvs() in memory.

lichess_optimized_puzzles_datasets.py:
- extract_tranches() return type is now Dict[str, Dict] (was None).
- return all_stats added at the end (json.dump is kept for standalone use).

build_apkg.py:
- build_from_csvs() gains an optional deck_stats parameter; when provided it is used directly, skipping _load_deck_stats() and the JSON file.
- build_full(csv_dir, output) chains download_puzzle_db(), decompress_zst(), extract_tranches(), and build_from_csvs() in one process. lichess module is imported lazily so pandas/chess are not loaded in normal mode.
- --full CLI flag triggers the full pipeline.

The JSON sidecar (puzzles_stats.json) is still written by extract_tranches for backward-compat with separate invocations; the full pipeline simply bypasses it.

2 new tests: apkg produced correctly with injected stats, and _load_deck_stats is not called when deck_stats is provided.

https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL
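
The "use injected stats, otherwise fall back to the JSON sidecar" behaviour can be sketched like this. Both function names are hypothetical, the loader parameter exists only to make the fallback observable, and returning an empty dict when the file is missing is an assumption about how the real helper behaves.

```python
import json
from pathlib import Path


def load_deck_stats_sketch(csv_dir):
    """Read puzzles_stats.json from csv_dir; assumed to yield {} if absent."""
    path = Path(csv_dir) / "puzzles_stats.json"
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def resolve_deck_stats(csv_dir, deck_stats=None, loader=load_deck_stats_sketch):
    """Prefer stats passed in memory; only touch the sidecar when none are given."""
    if deck_stats is not None:
        return deck_stats
    return loader(csv_dir)
```

Checking "is not None" rather than truthiness means an explicitly passed empty dict is still treated as injected stats and never triggers a file read.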

claude added 6 commits May 28, 2026 08:01

fix: inline n_subdecks to stay within pylint's 20-local-variable limit

cf4da24


SKOHscripts marked this pull request as ready for review May 28, 2026 08:37

SKOHscripts merged commit fe4f194 into main May 28, 2026
14 checks passed

SKOHscripts deleted the claude/anki-guid-puzzle-id-4pdvG branch May 28, 2026 08:37
