refactor: rework puzzle selection with quality score, denylist, and true motif caps#21
Merged
Merged
Conversation
…rue motif caps Replace the iterrows+dict sampling with a vectorized pipeline that addresses six structural weaknesses in the original algorithm: - Bayesian quality score (Popularity + NbPlays confidence-shrunk) so a noisy 100%/3-plays puzzle no longer outranks a well-evidenced 92%/5000-plays puzzle; RD used as a soft sort-key preference without blocking any motif. - THEME_DENYLIST excludes non-tactical metadata tags (mateIn1..5, crushing, master, oneMove, etc.) from the diversity objective and coverage metric. - Vectorized fast-pass: explode+sort+groupby.head instead of iterrows over millions of rows; fully deterministic via PuzzleId tiebreak. - Theme-aware complement: motifs present only in low-popularity puzzles are guaranteed ≥1 representative (no popularity gate in complement phase). - True per-motif cap: motif_count tracks all co-occurring motifs of selected puzzles, preventing overrepresentation via the quality top-up phase. - target_deck_size (default 1200) gives explicit volume control per tranche. - The unbounded ≥1800 tail is split into 1800-1900, 1900-2000, 2000-2200, 2200+ for homogeneous Woodpecker decks; ALL_DECKS updated accordingly. - report_theme_coverage adds motif-based keys (unique_motifs_sample/tranche, coverage_pct_all) while keeping all backward-compat keys. - 10 new tests covering quality monotonicity, denylist, complement, cap, and determinism under row shuffle. https://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Complete overhaul of the puzzle-selection algorithm in
lichess_optimized_puzzles_datasets.py, addressing six structural weaknesses identified in the original design.Problems fixed
quality = (NbPlays×p + 30×0.5) / (NbPlays + 30)— a noisy 100%/3-plays puzzle no longer beats a solid 92%/5000-plays puzzlePopularitywas the only quality signal;NbPlaysandRatingDeviationignoredNbPlaysread (tolerantly — absent columns fall back gracefully);RatingDeviationused as a soft sort-key preference (never blocks a motif)motif_counttracks ALL motifs of every selected puzzle; quality top-up respects the true capThemesmixed tactical motifs with metadata tags, inflating coverage metricTHEME_DENYLISTexcludesmateIn1..5,oneMove,short/long/veryLong,crushing/advantage/equality,master/masterVsMaster/superGM,opening/middlegame/endgameiterrows()over millions of rowsexplode+sort_values+groupby().head()fast-passNew capabilities
target_deck_size(default 1200) gives explicit control over cards per deckUPPER_TRANCHE_EDGES: the unbounded≥1800tail is split into 1800–1900, 1900–2000, 2000–2200, 2200+, giving homogeneous Woodpecker decks (14 sub-decks total)report_theme_coveragenow returnsunique_motifs_sample,unique_motifs_tranche,coverage_pct(motif-based, honest), andcoverage_pct_all(all-theme, for reference); all existing keys preservedALL_DECKSinbuild_apkg.pyupdated to match the new filenamesTest plan
pytest tests/ -q)_quality_topupcap enforcement, determinism under row shuffle, target deck size, motif coverage keys, upper tranche edgesextract_trancheson a real Lichess CSV sample and verifycoverage_pct≈ 100%, deck sizes ≈ 1200, no motif exceedstarget_per_themehttps://claude.ai/code/session_01VAUnQCt5CM2TVpRbsQbSBL
Generated by Claude Code