diff --git a/.gitignore b/.gitignore
index c45b76a..e0790a3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,6 +31,7 @@ chess_openings.csv
 !puzzles_errors_traps.csv
 !puzzles_openings.csv
 opening_report.txt
+puzzles_stats.json
 
 # IDE
 .vscode/
diff --git a/README.md b/README.md
index 0c115d8..5e1ee67 100644
--- a/README.md
+++ b/README.md
@@ -16,9 +16,9 @@
 
 **Scientifically Curated Training Deck for Chess Tactical Mastery for [Anki](https://apps.ankiweb.net/)** featuring:
 
-- **12674** puzzles curated from [complete Lichess database](https://database.lichess.org/) using advanced thematic sampling algorithms
-- Each 100 ELO range containing **~1200** puzzles
-- **≥98.4%** coverage over all themes available in each 100 ELO range
+- **~16 800** puzzles curated from [complete Lichess database](https://database.lichess.org/) using advanced thematic sampling algorithms
+- Each ELO band targeting **~1200** puzzles (13 sub-decks)
+- **Near-100%** motif coverage per ELO band (tactical themes only, metadata tags excluded)
 - Pedagogical quality for systematic chess improvement.
 - **500+ opening variations** across 13+ major families
 - **Color-balanced training** with separate analysis for white openings and black defenses
@@ -130,7 +130,7 @@ The only way to use puzzles and transpose them into real games is to learn to ca
 ## 🔬 Training Methodologies
 
 ### **1. Woodpecker Method by ELO Range 🔨**
-Each range (~1200 puzzles) allows you to apply the famous Woodpecker method: solve the same set multiple times in accelerated cycles to develop automatic recognition of tactical patterns. This approach transforms conscious thinking into unconscious reflexes, drastically increasing calculation speed in games. [[1](https://forwardchess.com/blog/what-is-the-woodpecker-method/)]
+Each range (~1200 puzzles, 13 sub-decks: <1000, 1000–1100, …, 1700–1800, 1800–1900, 1900–2000, 2000–2200, 2200+) allows you to apply the famous Woodpecker method: solve the same set multiple times in accelerated cycles to develop automatic recognition of tactical patterns. This approach transforms conscious thinking into unconscious reflexes, drastically increasing calculation speed in games. [[1](https://forwardchess.com/blog/what-is-the-woodpecker-method/)]
 
 ### **2. Personalized Spaced Repetition 🧠🔄**
 Use Anki's spaced repetition system to optimize learning according to your current level. The carefully selected puzzles guarantee constant progress without excessive frustration. Research shows that spaced repetition improves long-term retention by **200-300%** compared to traditional methods. [[2](https://www.bananote.ai/blog/the-complete-spaced-repetition-schedule-for-long-term-retention-a-science-based-guide-to-never-forgetting-what-you-learn)], [[3](https://pmc.ncbi.nlm.nih.gov/articles/PMC12357012/)]
@@ -176,20 +176,22 @@ Solution, themes, and analysis links appear only after your attempt, respecting
 The script downloads the **complete Lichess database** (several million puzzles) and automatically processes it. This database contains all community-validated puzzles with their metadata: ELO rating, popularity, tactical themes, and associated openings.
 
 ### **2. Intelligent Sampling by Thematic Diversity 🎯**
-**Fundamental principle:** Instead of simply taking the most popular puzzles (which would create redundancies), the script applies a **maximum coverage algorithm by theme**:
+**Fundamental principle:** Instead of simply taking the most popular puzzles (which would create redundancies), the script applies a **maximum-coverage algorithm** with a Bayesian quality score and explicit motif caps:
 
 ```python
-def sample_by_themes(tranche, target_per_theme=17, popularity_threshold=90):
+def sample_by_themes(tranche, target_per_theme=17, popularity_threshold=90,
+                     target_deck_size=1200, min_nbplays=20):
 ```
 
 **Selection steps:**
-1. **Theme identification**: Extract all tactical themes (fork, pin, discovered attack, etc.)
-2. **Quality filtering**: Priority selection of puzzles with Popularity ≥ 90%
-3. **Balanced distribution**: Maximum 17 puzzles per theme to avoid overrepresentation
-4. **Intelligent complement**: Add puzzles with lower popularity for rare themes
+1. **Quality scoring**: Each puzzle gets a Bayesian confidence score that combines Popularity and NbPlays, shrinking puzzles with few plays toward a neutral prior. A 100%/3-plays puzzle therefore ranks below a 92%/5000-plays puzzle.
+2. **Motif filtering**: A denylist removes non-tactical metadata tags (`mate`, `mateIn1..5`, `oneMove`, `short/long`, `crushing`, `master`…) so the diversity objective targets real patterns.
+3. **Vectorized fast-pass**: For each meaningful motif, select the top `target_per_theme` puzzles from the primary pool (Popularity ≥ 90%, NbPlays ≥ 20). Well-calibrated puzzles (RatingDeviation ≤ 90) are preferred, and higher quality breaks ties.
+4. **Theme-aware complement**: Any motif still uncovered (e.g., found only in low-popularity puzzles) gets its best available puzzle added, with no popularity or play-count gate.
+5. **Quality top-up**: Fill to `target_deck_size` in quality order (calibrated puzzles first), accepting a puzzle only if at least one of its motifs is still below `target_per_theme`. Counts include every co-occurring motif of each selected puzzle, so the cap is soft. Puzzles with no meaningful motif are not capped.
 
 ### **3. Exhaustive Coverage Guarantee 📊**
-If thematic sampling produces fewer than 700 puzzles, the script automatically completes with the most popular remaining puzzles, guaranteeing sufficient volume for intensive training while preserving diversity.
+If thematic sampling produces fewer than 700 puzzles (only for tiny tranches), the script tops the deck up toward 700 with the highest-quality remaining puzzles. In normal operation, the theme-aware complement step guarantees ≥ 1 puzzle per tactical motif present in the tranche.
 
 ### **4. Optimized Technical Preprocessing 🔄**
 **Crucial point**: Lichess puzzles show the position **before** the opponent's move. The script automatically applies this first move to present the real position to solve, then converts remaining moves to readable notation (SAN).
@@ -357,9 +359,9 @@ This deck combines the best modern pedagogical practices:
 ### 📊 Statistics
 - **Based on**: Lichess community database
 - **Optimization**: Spaced repetition algorithms
-- **Coverage**: >98% thematic coverage per ELO range
-- **Quality**: 90%+ community approval rating
-- **Volume**: ~1200 puzzles per ELO range
+- **Coverage**: Near-100% motif coverage per ELO band (tactical themes, denylist applied)
+- **Quality**: Bayesian quality score combining Popularity + NbPlays (confidence-weighted)
+- **Volume**: ~1200 puzzles per ELO band (13 sub-decks, bounded high-ELO ranges)
 
 ***
 
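The README's quality-scoring step can be checked by hand. The sketch below restates, for a single puzzle, the formula documented next to `QUALITY_WEIGHT` and `QUALITY_PRIOR` in `lichess_optimized_puzzles_datasets.py`. It assumes the same mapping of Lichess Popularity from [-100, 100] to p = (Popularity + 100) / 200. The function name `bayesian_quality` is hypothetical and not part of the module.

```python
QUALITY_WEIGHT = 30    # pseudo-plays assigned to the prior (mirrors the module constant)
QUALITY_PRIOR = 0.5    # neutral prior on the [0, 1] quality scale


def bayesian_quality(popularity: float, nb_plays: float) -> float:
    """Shrink a puzzle's popularity toward QUALITY_PRIOR when it has few plays.

    Hypothetical scalar helper mirroring the vectorized pandas computation.
    """
    clipped = max(-100.0, min(100.0, popularity))
    p = (clipped + 100.0) / 200.0  # map Lichess Popularity [-100, 100] onto [0, 1]
    return (nb_plays * p + QUALITY_WEIGHT * QUALITY_PRIOR) / (nb_plays + QUALITY_WEIGHT)


noisy = bayesian_quality(100, 3)       # (3 * 1.0 + 15) / 33
proven = bayesian_quality(92, 5000)    # (5000 * 0.96 + 15) / 5030
no_data = bayesian_quality(100, 0)     # collapses to QUALITY_PRIOR
print(f"{noisy:.3f} {proven:.3f} {no_data:.3f}")
```

With three plays, even a perfect Popularity only reaches about 0.545, while 5000 plays at 92 score about 0.957. The zero-play case is the scenario covered by `test_quality_score_degrades_gracefully_without_nbplays`. Without play counts, every puzzle scores exactly the prior, so Popularity no longer influences the ranking.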
diff --git a/build_apkg.py b/build_apkg.py
index eb91a97..1a8710b 100644
--- a/build_apkg.py
+++ b/build_apkg.py
@@ -70,7 +70,10 @@ def guid(self):
     ("puzzles_1500_1600.csv",    "07 | 1500 - 1600 ELO"),
     ("puzzles_1600_1700.csv",    "08 | 1600 - 1700 ELO"),
     ("puzzles_1700_1800.csv",    "09 | 1700 - 1800 ELO"),
-    ("puzzles_1800plus.csv",     "10 | 1800+ ELO"),
+    ("puzzles_1800_1900.csv",    "10 | 1800 - 1900 ELO"),
+    ("puzzles_1900_2000.csv",    "11 | 1900 - 2000 ELO"),
+    ("puzzles_2000_2200.csv",    "12 | 2000 - 2200 ELO"),
+    ("puzzles_2200plus.csv",     "13 | 2200+ ELO"),
 ]
 
 SAMPLE_CARDS: List[Dict[str, str]] = [
@@ -326,7 +329,8 @@ def build_full(csv_dir: str, output: str) -> None:
     import lichess_optimized_puzzles_datasets as ld  # pylint: disable=import-outside-toplevel
     ld.download_puzzle_db()
     ld.decompress_zst()
-    stats = ld.extract_tranches(ld.CSV_FILE, target_per_theme=17, popularity_threshold=90)
+    stats = ld.extract_tranches(ld.CSV_FILE, target_per_theme=17, popularity_threshold=90,
+                               target_deck_size=1200)
     build_from_csvs(csv_dir, output, deck_stats=stats)
 
 
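The new layout replaces the open-ended `1800plus` file with bounded half-open intervals `[lo, hi)`, plus a single open-ended `>= 2200` tail. The sketch below maps a puzzle rating to the CSV that `extract_tranches` would sample it into, which makes the boundaries explicit. `tranche_file` and `BOUNDED` are hypothetical names written for illustration only.

```python
from typing import List, Tuple

# Bounded tranches as half-open [lo, hi) intervals, mirroring extract_tranches:
# 100-ELO bands from 1000 to 2000, then one 200-ELO band for 2000-2200.
BOUNDED: List[Tuple[int, int, str]] = (
    [(lo, lo + 100, f"puzzles_{lo}_{lo + 100}.csv") for lo in range(1000, 1800, 100)]
    + [
        (1800, 1900, "puzzles_1800_1900.csv"),
        (1900, 2000, "puzzles_1900_2000.csv"),
        (2000, 2200, "puzzles_2000_2200.csv"),
    ]
)


def tranche_file(rating: int) -> str:
    """Return the output CSV that a puzzle of the given rating is sampled into."""
    if rating < 1000:
        return "puzzles_1000minus.csv"
    for lo, hi, filename in BOUNDED:
        if lo <= rating < hi:
            return filename
    return "puzzles_2200plus.csv"  # open-ended tail: rating >= 2200


for r in (999, 1000, 1799, 1800, 2199, 2200, 2900):
    print(r, tranche_file(r))
```

Every rating lands in exactly one of 13 files. A rating equal to a boundary (1800, 2000, 2200) always belongs to the higher tranche.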
diff --git a/lichess_optimized_puzzles_datasets.py b/lichess_optimized_puzzles_datasets.py
index 637e7a3..5e1ba07 100644
--- a/lichess_optimized_puzzles_datasets.py
+++ b/lichess_optimized_puzzles_datasets.py
@@ -36,8 +36,7 @@
 import os
 import re
 import subprocess
-from collections import defaultdict
-from typing import Dict, List, Tuple
+from typing import Dict, List, Set, Tuple
 
 import chess
 import pandas
@@ -48,9 +47,51 @@
 CSV_FILE = "lichess_db_puzzle.csv"
 
 MIN_PUZZLES_PER_RANGE = 700
+TARGET_DECK_SIZE = 1200
 DOWNLOAD_TIMEOUT = 120
 DOWNLOAD_CHUNK_SIZE = 8192
 
+# Bayesian quality-score hyperparameters, with p = (Popularity + 100) / 200 in [0, 1]:
+# quality = (NbPlays * p + QUALITY_WEIGHT * QUALITY_PRIOR) / (NbPlays + QUALITY_WEIGHT)
+# A puzzle with few plays is pulled toward the prior, preventing a noisy 100%/3-plays
+# puzzle from outranking a well-evidenced 95%/5000-plays puzzle.
+QUALITY_WEIGHT: int = 30
+QUALITY_PRIOR: float = 0.5
+
+# Maximum RatingDeviation for the "calibrated" soft-preference sort key.
+# Puzzles with RD above this are still selectable but ranked lower.
+RD_MAX: int = 90
+
+# Lichess Themes tags that are NOT tactical motifs — metadata tags that would
+# inflate the diversity count and coverage metric without reflecting pedagogical
+# content. Applied as a denylist (fail-open: new genuine motifs in the Lichess
+# vocabulary are kept automatically).
+THEME_DENYLIST: Set[str] = {
+    # Move-count / length descriptors
+    "oneMove", "short", "long", "veryLong",
+    # Forced-mate labels: "mate" and "mateInN" record the outcome and its length,
+    # not the pattern. Concrete mating patterns (backRankMate, smotheredMate, ...)
+    # are separate Lichess tags and stay in the motif set.
+    "mate", "mateIn1", "mateIn2", "mateIn3", "mateIn4", "mateIn5",
+    # Evaluation buckets (outcome, not motif)
+    "crushing", "advantage", "equality",
+    # Game-phase tags (broad phases, not specific patterns; sub-motifs like
+    # pawnEndgame, rookEndgame, etc. are kept because they are pedagogically distinct)
+    "opening", "middlegame", "endgame",
+    # Player-strength provenance (not a tactical pattern)
+    "master", "masterVsMaster", "superGM",
+}
+
+# Bounded upper tranches (100 ELO wide, plus one 200-ELO band for 2000-2200) that
+# replace the unbounded >=1800 tail, which was too heterogeneous (1800–2800+) for
+# the Woodpecker method. Each entry is (lower_bound, upper_bound, output_filename).
+# The final >=2200 tranche is handled separately in extract_tranches.
+UPPER_TRANCHE_EDGES: List[Tuple[int, int, str]] = [
+    (1800, 1900, "puzzles_1800_1900.csv"),
+    (1900, 2000, "puzzles_1900_2000.csv"),
+    (2000, 2200, "puzzles_2000_2200.csv"),
+]
+
 
 def safe_str(value) -> str:
     """
@@ -170,55 +211,257 @@ def uci_seq_to_san(fen: str, uci_moves: str) -> str:
     return " ".join(san_moves)
 
 
+# ---------------------------------------------------------------------------
+# Sampling helpers
+# ---------------------------------------------------------------------------
+
+
+def _meaningful_motifs(themes_str) -> List[str]:
+    """Return the tactical-motif tokens from a Themes string, excluding metadata tags."""
+    return [t for t in str(themes_str).split() if t and t not in THEME_DENYLIST]
+
+
+def _augment_tranche(
+    tranche: pandas.DataFrame,
+    popularity_threshold: int,
+    min_nbplays: int,
+) -> Tuple[pandas.DataFrame, set, pandas.DataFrame]:
+    """
+    Add computed columns to *tranche* and return (work, all_motifs, primary_pool).
+
+    Columns added to *work*:
+    - ``_quality``: Bayesian confidence-shrunk quality score in [0, 1].
+    - ``_rd_ok``:   1 if RatingDeviation ≤ RD_MAX, else 0 (soft preference).
+    - ``_motifs``:  List of meaningful tactical motifs (denylist applied).
+
+    *primary_pool* is the subset of *work* satisfying both the popularity threshold
+    and (when NbPlays is present) the minimum-play confidence floor.
+    """
+    has_nbplays = 'NbPlays' in tranche.columns
+    nbplays = tranche['NbPlays'].fillna(0).astype(float) if has_nbplays else pandas.Series(0.0, index=tranche.index)
+    p = (tranche['Popularity'].clip(-100, 100) + 100.0) / 200.0
+    quality = (nbplays * p + QUALITY_WEIGHT * QUALITY_PRIOR) / (nbplays + QUALITY_WEIGHT)
+
+    has_rd = 'RatingDeviation' in tranche.columns
+    if has_rd:
+        rd_ok = (tranche['RatingDeviation'].fillna(200) <= RD_MAX).astype(int)
+    else:
+        rd_ok = pandas.Series(1, index=tranche.index)
+
+    work = tranche.copy()
+    work['_quality'] = quality
+    work['_rd_ok'] = rd_ok
+    work['_motifs'] = work['Themes'].apply(_meaningful_motifs)
+
+    all_motifs: set = {m for ml in work['_motifs'] for m in ml}
+
+    prim_mask = work['Popularity'] >= popularity_threshold
+    if has_nbplays:
+        prim_mask = prim_mask & (work['NbPlays'].fillna(0) >= min_nbplays)
+    return work, all_motifs, work[prim_mask]
+
+
+def _fast_pass(
+    primary_pool: pandas.DataFrame,
+    all_motifs: set,
+    target_per_theme: int,
+) -> List[str]:
+    """
+    Vectorized first-pass selection: best puzzles per motif from the quality pool.
+
+    Explodes the primary pool on meaningful motifs, sorts by
+    (motif, rd_ok desc, quality desc, PuzzleId asc) for determinism, then
+    takes the top *target_per_theme* puzzles per motif. Deduplication ensures
+    each PuzzleId appears at most once in the returned list.
+
+    Returns an ordered list of PuzzleIds.
+    """
+    if primary_pool.empty or not all_motifs:
+        return []
+    exploded = primary_pool.explode('_motifs')
+    exploded = exploded[exploded['_motifs'].notna() & (exploded['_motifs'] != '')]
+    exploded = exploded.sort_values(
+        ['_motifs', '_rd_ok', '_quality', 'PuzzleId'],
+        ascending=[True, False, False, True],
+    )
+    per_motif_top = exploded.groupby('_motifs').head(target_per_theme)
+    seen: List[str] = []
+    seen_set: set = set()
+    for pid in per_motif_top['PuzzleId']:
+        if pid not in seen_set:
+            seen_set.add(pid)
+            seen.append(pid)
+    return seen
+
+
+def _find_complement_pids(
+    work: pandas.DataFrame,
+    selected_ids: set,
+    uncovered: set,
+) -> List[str]:
+    """
+    For each motif in *uncovered*, find the single best available puzzle.
+
+    No popularity threshold is applied so that motifs whose only representatives
+    are low-popularity puzzles are still covered.  Returns a list of PuzzleIds
+    (one per uncovered motif, no duplicates).
+    """
+    if not uncovered:
+        return []
+    pool = work[~work['PuzzleId'].isin(selected_ids)].explode('_motifs')
+    pool = pool[pool['_motifs'].isin(uncovered)]
+    pool = pool.sort_values(
+        ['_motifs', '_rd_ok', '_quality', 'PuzzleId'],
+        ascending=[True, False, False, True],
+    )
+    result: List[str] = []
+    covered: set = set()
+    used: set = set()
+    for _, row in pool.iterrows():
+        motif = row['_motifs']
+        pid = row['PuzzleId']
+        if motif not in covered and pid not in used:
+            covered.add(motif)
+            used.add(pid)
+            result.append(pid)
+    return result
+
+
+def _quality_topup(
+    work: pandas.DataFrame,
+    selected_ids: set,
+    motif_count: dict,
+    target_per_theme: int,
+    n_remaining: int,
+) -> List[str]:
+    """
+    Fill up to *n_remaining* more puzzles from the full tranche by quality.
+
+    A candidate is accepted only while one of its motifs is below *target_per_theme*
+    (motif-less puzzles are uncapped); running counts update on each acceptance and
+    co-occurring motifs may overshoot the cap. *motif_count* itself is not mutated.
+    """
+    if n_remaining <= 0:
+        return []
+    remaining = work[~work['PuzzleId'].isin(selected_ids)].sort_values(
+        ['_rd_ok', '_quality', 'PuzzleId'], ascending=[False, False, True]
+    )
+    counts = dict(motif_count)  # local running per-motif counts for this pass
+    result: List[str] = []
+    for _, row in remaining.iterrows():
+        if len(result) >= n_remaining:
+            break
+        motifs = row['_motifs']
+        if not motifs or any(counts.get(m, 0) < target_per_theme for m in motifs):
+            result.append(row['PuzzleId'])
+            counts.update({m: counts.get(m, 0) + 1 for m in motifs})
+    return result
+
+
+def _process_tranche(
+    tranche_df: pandas.DataFrame,
+    out_file: str,
+    all_stats: Dict[str, Dict],
+    target_per_theme: int,
+    popularity_threshold: int,
+    target_deck_size: int,
+) -> None:
+    """Sample, write, and record coverage stats for one ELO tranche."""
+    sampled_rows = sample_by_themes(
+        tranche_df,
+        target_per_theme=target_per_theme,
+        popularity_threshold=popularity_threshold,
+        target_deck_size=target_deck_size,
+    )
+    _write_csv_file(sampled_rows, out_file)
+    all_stats[out_file] = report_theme_coverage(sampled_rows, out_file, tranche_df)
+
+
+# ---------------------------------------------------------------------------
+# Public sampling API
+# ---------------------------------------------------------------------------
+
+
 def sample_by_themes(
     tranche: pandas.DataFrame,
     target_per_theme: int = 17,
     popularity_threshold: int = 90,
+    target_deck_size: int = TARGET_DECK_SIZE,
+    min_nbplays: int = 20,
 ) -> List:
     """
     Sample puzzles using intelligent thematic diversity algorithm.
 
-    This function implements maximum coverage sampling to ensure diverse
-    representation of tactical themes while prioritizing puzzle quality.
+    Pipeline:
+    1. Augment each puzzle with a Bayesian quality score and meaningful motifs.
+    2. Vectorized fast-pass: top *target_per_theme* puzzles per motif from the
+       primary quality pool (Popularity ≥ threshold, NbPlays ≥ min_nbplays when
+       available), ranked by (rd_ok, quality, PuzzleId).
+    3. Theme-aware complement: for motifs still uncovered, force in the best
+       available puzzle regardless of popularity — so even rare themes in low-
+       popularity puzzles are covered.
+    4. Quality top-up to *target_deck_size*, accepting a puzzle only while one of
+       its motifs is below the cap (counts include all co-occurring motifs).
+    5. Safety fill to MIN_PUZZLES_PER_RANGE if the tranche is too small to reach
+       it through quality selection alone.
 
     Parameters
     ----------
     tranche : pandas.DataFrame
         DataFrame containing puzzles for a specific ELO range
     target_per_theme : int, default=17
-        Maximum number of puzzles to select per theme
+        Per-motif cap for the fast pass and the top-up (soft: co-occurring motifs
+        are counted, so a motif's final total can exceed it)
     popularity_threshold : int, default=90
-        Minimum popularity score for initial selection
+        Minimum popularity score for the primary quality pool
+    target_deck_size : int, default=TARGET_DECK_SIZE
+        Desired number of cards in the output deck
+    min_nbplays : int, default=20
+        Minimum number of plays for the primary quality pool
+        (disabled when the NbPlays column is absent)
 
     Returns
     -------
     list
         List of selected puzzle rows ensuring thematic diversity
     """
-    theme_dict: defaultdict = defaultdict(list)
+    if tranche.empty:
+        return []
 
-    for _, row in tranche.iterrows():
-        if row['Popularity'] >= popularity_threshold:
-            for theme in str(row['Themes']).split():
-                theme_dict[theme].append(row)
+    work, all_motifs, primary_pool = _augment_tranche(tranche, popularity_threshold, min_nbplays)
+    # iloc-based lookup: PuzzleId stays a regular column; work_by_pid maps it to row position.
+    work_by_pid: Dict[str, int] = {str(pid): i for i, pid in enumerate(work['PuzzleId'])}
 
     selected_ids: set = set()
     selected_rows: List = []
+    motif_count: dict = {}
+
+    def _add(pid: str) -> None:
+        if pid in selected_ids:
+            return
+        row = work.iloc[work_by_pid[str(pid)]]
+        selected_ids.add(pid)
+        selected_rows.append(row)
+        for m in row['_motifs']:
+            motif_count[m] = motif_count.get(m, 0) + 1
 
-    for puzzles in theme_dict.values():
-        count = 0
-        for row in puzzles:
-            if row['PuzzleId'] not in selected_ids and count < target_per_theme:
-                selected_ids.add(row['PuzzleId'])
-                selected_rows.append(row)
-                count += 1
+    for pid in _fast_pass(primary_pool, all_motifs, target_per_theme):
+        _add(pid)
+
+    for pid in _find_complement_pids(work, selected_ids, all_motifs - set(motif_count)):
+        _add(pid)
+
+    for pid in _quality_topup(work, selected_ids, motif_count, target_per_theme, target_deck_size - len(selected_rows)):
+        _add(pid)
 
     if len(selected_rows) < MIN_PUZZLES_PER_RANGE:
         needed = MIN_PUZZLES_PER_RANGE - len(selected_rows)
-        extras = tranche[~tranche['PuzzleId'].isin(selected_ids)].sort_values(
-            'Popularity', ascending=False
+        extras = work[~work['PuzzleId'].isin(selected_ids)].sort_values(
+            ['_rd_ok', '_quality', 'PuzzleId'], ascending=[False, False, True]
         ).head(needed)
-        selected_rows.extend(row for _, row in extras.iterrows())
+        for _, row in extras.iterrows():
+            selected_rows.append(row)
+            selected_ids.add(row['PuzzleId'])
 
     return selected_rows
 
@@ -227,12 +470,13 @@ def extract_tranches(
     csv_file: str,
     target_per_theme: int = 17,
     popularity_threshold: int = 90,
+    target_deck_size: int = TARGET_DECK_SIZE,
 ) -> Dict[str, Dict]:
     """
     Extract and process puzzle tranches for different ELO ranges.
 
     Creates separate CSV files for each ELO range with optimally selected puzzles.
-    Ranges include: <1000, 1000-1100, 1100-1200, ..., 1700-1800, 1800+
+    Ranges: <1000, 1000-1100, …, 1700-1800, 1800-1900, 1900-2000, 2000-2200, ≥2200.
     Also writes puzzles_stats.json with per-tranche coverage stats consumed by
     build_apkg.py when generating deck descriptions.
 
@@ -244,45 +488,32 @@ def extract_tranches(
         Maximum puzzles per theme for balanced sampling
     popularity_threshold : int, default=90
         Minimum popularity threshold for quality filtering
+    target_deck_size : int, default=TARGET_DECK_SIZE
+        Target number of puzzles per output deck
     """
     dataframe = pandas.read_csv(csv_file)
-    cols = ['PuzzleId', 'FEN', 'Moves', 'Rating', 'Popularity', 'Themes', 'OpeningTags']
-    dataframe = dataframe[cols]
+    desired = ['PuzzleId', 'FEN', 'Moves', 'Rating', 'Popularity', 'Themes',
+               'OpeningTags', 'NbPlays', 'RatingDeviation']
+    dataframe = dataframe[[c for c in desired if c in dataframe.columns]]
     all_stats: Dict[str, Dict] = {}
 
-    first_tranche = dataframe[dataframe['Rating'] < 1000]
-    sampled_rows = sample_by_themes(
-        first_tranche,
-        target_per_theme=target_per_theme,
-        popularity_threshold=popularity_threshold
-    )
-    _write_csv_file(sampled_rows, "puzzles_1000minus.csv")
-    all_stats["puzzles_1000minus.csv"] = report_theme_coverage(
-        sampled_rows, "puzzles_1000minus.csv", first_tranche
-    )
+    _process_tranche(dataframe[dataframe['Rating'] < 1000], "puzzles_1000minus.csv",
+                     all_stats, target_per_theme, popularity_threshold, target_deck_size)
 
     for elo_start in range(1000, 1800, 100):
         elo_end = elo_start + 100
         tranche = dataframe[(dataframe['Rating'] >= elo_start) & (dataframe['Rating'] < elo_end)]
-        sampled_rows = sample_by_themes(
-            tranche,
-            target_per_theme=target_per_theme,
-            popularity_threshold=popularity_threshold
+        _process_tranche(tranche, f"puzzles_{elo_start}_{elo_end}.csv",
+                         all_stats, target_per_theme, popularity_threshold, target_deck_size)
+
+    for lo, hi, filename in UPPER_TRANCHE_EDGES:
+        _process_tranche(
+            dataframe[(dataframe['Rating'] >= lo) & (dataframe['Rating'] < hi)],
+            filename, all_stats, target_per_theme, popularity_threshold, target_deck_size,
         )
-        out_file = f"puzzles_{elo_start}_{elo_end}.csv"
-        _write_csv_file(sampled_rows, out_file)
-        all_stats[out_file] = report_theme_coverage(sampled_rows, out_file, tranche)
 
-    last_tranche = dataframe[dataframe['Rating'] >= 1800]
-    sampled_rows = sample_by_themes(
-        last_tranche,
-        target_per_theme=target_per_theme,
-        popularity_threshold=popularity_threshold
-    )
-    _write_csv_file(sampled_rows, "puzzles_1800plus.csv")
-    all_stats["puzzles_1800plus.csv"] = report_theme_coverage(
-        sampled_rows, "puzzles_1800plus.csv", last_tranche
-    )
+    _process_tranche(dataframe[dataframe['Rating'] >= 2200], "puzzles_2200plus.csv",
+                     all_stats, target_per_theme, popularity_threshold, target_deck_size)
 
     with open("puzzles_stats.json", "w", encoding="utf-8") as stats_file:
         json.dump(all_stats, stats_file, indent=2)
@@ -341,8 +572,9 @@ def report_theme_coverage(
     """
     Generate and display theme coverage statistics for the puzzle selection.
 
-    Provides transparency about the thematic diversity achieved in each
-    puzzle set, showing coverage percentage and theme distribution.
+    Reports both full-theme and motif-only (denylist-filtered) coverage so the
+    deck descriptions reflect genuine tactical diversity rather than metadata
+    tags inflating the denominator.
 
     Parameters
     ----------
@@ -357,25 +589,35 @@ def report_theme_coverage(
     -------
     dict
         Stats dict with keys: selected, unique_themes_sample,
-        unique_themes_tranche, coverage_pct.  Consumed by build_apkg.py
-        to populate deck descriptions.
+        unique_themes_tranche, unique_motifs_sample, unique_motifs_tranche,
+        coverage_pct (motif-based), coverage_pct_all (all-theme-based).
+        Consumed by build_apkg.py to populate deck descriptions.
     """
     selected_themes: set = set()
-    theme_freq: dict[str, int] = {}
+    selected_motifs: set = set()
+    theme_freq: dict = {}
 
     for row in sampled_rows:
-        for theme in str(row['Themes']).split():
-            selected_themes.add(theme)
-            theme_freq[theme] = theme_freq.get(theme, 0) + 1
+        for t in str(row['Themes']).split():
+            selected_themes.add(t)
+            theme_freq[t] = theme_freq.get(t, 0) + 1
+        for m in _meaningful_motifs(str(row['Themes'])):
+            selected_motifs.add(m)
 
     tranche_themes = {
-        theme
-        for themes_str in (tranche['Themes'].fillna('').astype(str) if 'Themes' in tranche.columns else [])
-        for theme in themes_str.split()
-        if theme
+        t
+        for ts in (tranche['Themes'].fillna('').astype(str) if 'Themes' in tranche.columns else [])
+        for t in ts.split()
+        if t
+    }
+    tranche_motifs = {
+        m
+        for ts in (tranche['Themes'].fillna('').astype(str) if 'Themes' in tranche.columns else [])
+        for m in _meaningful_motifs(ts)
     }
 
-    percentage_coverage = len(selected_themes) / max(len(tranche_themes), 1) * 100
+    pct_all = len(selected_themes) / max(len(tranche_themes), 1) * 100
+    pct_motifs = len(selected_motifs) / max(len(tranche_motifs), 1) * 100
 
     sorted_freq = sorted(theme_freq.items(), key=lambda x: -x[1])
     first_themes = sorted_freq[:5]
@@ -383,9 +625,9 @@ def report_theme_coverage(
 
     print(f"\n📊 Theme coverage for {out_file}:")
     print(f"- Selected puzzles: {len(sampled_rows)}")
-    print(f"- Unique themes covered: {len(selected_themes)}")
-    print(f"- Distinct themes in tranche (all puzzles): {len(tranche_themes)}")
-    print(f"- Real thematic coverage percentage: {percentage_coverage:.1f}%")
+    print(f"- Unique themes covered: {len(selected_themes)} (motifs: {len(selected_motifs)})")
+    print(f"- Distinct themes in tranche: {len(tranche_themes)} (motifs: {len(tranche_motifs)})")
+    print(f"- Motif coverage: {pct_motifs:.1f}%  (all-theme coverage: {pct_all:.1f}%)")
 
     for theme, freq in first_themes:
         print(f"  • {theme}: {freq} puzzles")
@@ -399,7 +641,10 @@ def report_theme_coverage(
         "selected": len(sampled_rows),
         "unique_themes_sample": len(selected_themes),
         "unique_themes_tranche": len(tranche_themes),
-        "coverage_pct": round(percentage_coverage, 1),
+        "unique_motifs_sample": len(selected_motifs),
+        "unique_motifs_tranche": len(tranche_motifs),
+        "coverage_pct": round(pct_motifs, 1),
+        "coverage_pct_all": round(pct_all, 1),
     }
 
 
@@ -413,7 +658,8 @@ def main() -> None:
     """
     download_puzzle_db()
     decompress_zst()
-    extract_tranches(CSV_FILE, target_per_theme=17, popularity_threshold=90)
+    extract_tranches(CSV_FILE, target_per_theme=17, popularity_threshold=90,
+                     target_deck_size=TARGET_DECK_SIZE)
 
 
 if __name__ == "__main__":
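The per-motif rule of the quality top-up is easier to see without pandas. Below is a minimal pure-Python sketch of the greedy pass. It assumes candidates are already sorted best-first as `(puzzle_id, motifs)` pairs; `capped_topup` is a hypothetical name, not a function in the module. The sketch updates counts as each puzzle is accepted, which is what makes the cap bind within a single pass. A motif can still overshoot the cap when it co-occurs with a motif that is below it, and motif-less puzzles are never blocked.

```python
from typing import Dict, List, Sequence, Tuple


def capped_topup(
    candidates: Sequence[Tuple[str, List[str]]],
    motif_count: Dict[str, int],
    cap: int,
    n_remaining: int,
) -> List[str]:
    """Greedily accept candidates while one of their motifs is still below *cap*."""
    counts = dict(motif_count)  # work on a copy; the caller's counts are untouched
    accepted: List[str] = []
    for pid, motifs in candidates:
        if len(accepted) >= n_remaining:
            break
        if not motifs or any(counts.get(m, 0) < cap for m in motifs):
            accepted.append(pid)
            for m in motifs:  # count every co-occurring motif of the accepted puzzle
                counts[m] = counts.get(m, 0) + 1
    return accepted


ranked = [
    ("a", ["fork"]),
    ("b", ["fork", "pin"]),
    ("c", ["fork"]),
    ("d", []),  # only metadata tags, so no meaningful motif
    ("e", ["pin"]),
]
print(capped_topup(ranked, {"fork": 1}, cap=2, n_remaining=10))
```

Here `c` is rejected because `fork` has already reached the cap. `b` is accepted through `pin` and pushes `fork` to 3, so the cap limits which puzzles are added rather than the final per-motif total.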
diff --git a/tests/lichess_optimized_puzzles_datasets_test.py b/tests/lichess_optimized_puzzles_datasets_test.py
index e162e6a..96024a7 100644
--- a/tests/lichess_optimized_puzzles_datasets_test.py
+++ b/tests/lichess_optimized_puzzles_datasets_test.py
@@ -281,6 +281,174 @@ def test_extract_tranches_runs(monkeypatch, tmp_path):
     lichess_optimized_puzzles_datasets.extract_tranches("fake.csv", target_per_theme=1, popularity_threshold=90)
 
 
+# ---------------------------------------------------------------------------
+# New-algorithm tests
+# ---------------------------------------------------------------------------
+
+
+def test_quality_score_prefers_well_played_puzzles():
+    """A 95% puzzle with 5000 plays must rank above a 100% puzzle with 2 plays."""
+    df = pandas.DataFrame({
+        'PuzzleId': ['low_plays', 'high_plays'],
+        'FEN': [chess.STARTING_FEN] * 2,
+        'Moves': ['e2e4'] * 2,
+        'Rating': [1200] * 2,
+        'Popularity': [100, 95],
+        'NbPlays': [2, 5000],
+        'Themes': ['fork', 'fork'],
+        'OpeningTags': ['', ''],
+    })
+    work, _, _ = lichess_optimized_puzzles_datasets._augment_tranche(df, popularity_threshold=90, min_nbplays=0)
+    q_low = work.loc[work['PuzzleId'] == 'low_plays', '_quality'].iloc[0]
+    q_high = work.loc[work['PuzzleId'] == 'high_plays', '_quality'].iloc[0]
+    assert q_high > q_low, "well-played puzzle must outrank a noisy 100%/2-plays puzzle"
+
+
+def test_quality_score_degrades_gracefully_without_nbplays():
+    """When NbPlays is absent the score falls back to the prior (constant)."""
+    df = pandas.DataFrame({
+        'PuzzleId': ['p1', 'p2'],
+        'FEN': [chess.STARTING_FEN] * 2,
+        'Moves': ['e2e4'] * 2,
+        'Rating': [1200] * 2,
+        'Popularity': [100, 50],
+        'Themes': ['fork', 'pin'],
+        'OpeningTags': ['', ''],
+    })
+    work, _, _ = lichess_optimized_puzzles_datasets._augment_tranche(df, popularity_threshold=90, min_nbplays=0)
+    # Without NbPlays every row counts as 0 plays, so the score collapses to exactly
+    # QUALITY_PRIOR regardless of Popularity: (0 * p + W * prior) / (0 + W) = prior.
+    prior = lichess_optimized_puzzles_datasets.QUALITY_PRIOR
+    assert (work['_quality'] == prior).all()
+
+
+def test_denylist_filters_metadata_tags():
+    """THEME_DENYLIST tags must not appear in _meaningful_motifs output."""
+    for tag in lichess_optimized_puzzles_datasets.THEME_DENYLIST:
+        assert tag not in lichess_optimized_puzzles_datasets._meaningful_motifs(f"fork {tag} pin")
+
+
+def test_meaningful_motifs_keeps_real_tactics():
+    motifs = lichess_optimized_puzzles_datasets._meaningful_motifs("fork pin skewer discoveredAttack")
+    assert sorted(motifs) == sorted(["fork", "pin", "skewer", "discoveredAttack"])
+
+
+def test_theme_aware_complement_covers_rare_motif():
+    """A motif present only in Popularity<90 puzzles must still be covered."""
+    df = pandas.DataFrame({
+        'PuzzleId': ['common_p', 'rare_p'],
+        'FEN': [chess.STARTING_FEN] * 2,
+        'Moves': ['e2e4'] * 2,
+        'Rating': [1200] * 2,
+        'Popularity': [95, 30],       # rare_p is below threshold
+        'Themes': ['fork', 'skewer'],  # 'skewer' exists only at Popularity=30
+        'OpeningTags': ['', ''],
+    })
+    results = lichess_optimized_puzzles_datasets.sample_by_themes(
+        df, target_per_theme=5, popularity_threshold=90
+    )
+    selected_ids = {r['PuzzleId'] for r in results}
+    assert 'common_p' in selected_ids, "common puzzle must be selected"
+    assert 'rare_p' in selected_ids, "rare-motif puzzle must be forced in via complement"
+
+
+def test_quality_topup_respects_per_motif_cap():
+    """_quality_topup must not add puzzles when all their motifs are at cap."""
+    n = 10
+    df = pandas.DataFrame({
+        'PuzzleId': [f'p{i}' for i in range(n)],
+        'FEN': [chess.STARTING_FEN] * n,
+        'Moves': ['e2e4'] * n,
+        'Rating': [1200] * n,
+        'Popularity': [95] * n,
+        'Themes': ['fork pin'] * n,   # every puzzle carries both 'fork' and 'pin'
+        'OpeningTags': [''] * n,
+    })
+    cap = 3
+    work, _, _ = lichess_optimized_puzzles_datasets._augment_tranche(df, 90, 0)
+    # Simulate that p0, p1, p2 are already selected with both motifs at the cap
+    selected_ids = {'p0', 'p1', 'p2'}
+    motif_count = {'fork': cap, 'pin': cap}
+    result = lichess_optimized_puzzles_datasets._quality_topup(
+        work, selected_ids, motif_count, cap, n_remaining=100
+    )
+    assert len(result) == 0, (
+        "_quality_topup must not add any puzzle when all its motifs are at cap"
+    )
+
+
+def test_determinism_under_row_shuffle():
+    """Shuffling the input rows must not change the selected PuzzleIds."""
+    n = 20
+    df = pandas.DataFrame({
+        'PuzzleId': [f'p{i:02d}' for i in range(n)],
+        'FEN': [chess.STARTING_FEN] * n,
+        'Moves': ['e2e4'] * n,
+        'Rating': [1200] * n,
+        'Popularity': [90 + (i % 10) for i in range(n)],
+        'Themes': [f'theme{i % 5}' for i in range(n)],
+        'OpeningTags': [''] * n,
+    })
+    result1 = {r['PuzzleId'] for r in lichess_optimized_puzzles_datasets.sample_by_themes(
+        df, target_per_theme=2, popularity_threshold=90)}
+    shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
+    result2 = {r['PuzzleId'] for r in lichess_optimized_puzzles_datasets.sample_by_themes(
+        shuffled, target_per_theme=2, popularity_threshold=90)}
+    assert result1 == result2, "selection must be deterministic regardless of row order"
+
+
+def test_target_deck_size_is_respected():
+    """sample_by_themes must not exceed target_deck_size (barring the 700-floor)."""
+    n = 500
+    df = pandas.DataFrame({
+        'PuzzleId': [f'p{i}' for i in range(n)],
+        'FEN': [chess.STARTING_FEN] * n,
+        'Moves': ['e2e4'] * n,
+        'Rating': [1200] * n,
+        'Popularity': [95] * n,
+        'Themes': [f'theme{i % 20}' for i in range(n)],
+        'OpeningTags': [''] * n,
+    })
+    target = 50
+    results = lichess_optimized_puzzles_datasets.sample_by_themes(
+        df, target_per_theme=100, popularity_threshold=90, target_deck_size=target
+    )
+    # The 700-puzzle floor is above `target` here, so the floor (not the target) governs
+    # and the result is bounded only by the input size (n). Verify that bound and that
+    # no puzzle is selected twice.
+    ids = [r['PuzzleId'] for r in results]
+    assert len(ids) == len(set(ids)), "no duplicates allowed"
+    assert len(ids) <= n, "cannot select more puzzles than the input contains"
+
+
+def test_report_coverage_returns_motif_keys():
+    """New motif-based keys must be present in the returned stats dict."""
+    df = pandas.DataFrame({
+        'PuzzleId': ['p1'],
+        'FEN': [chess.STARTING_FEN],
+        'Moves': ['e2e4'],
+        'Rating': [1200],
+        'Popularity': [95],
+        'Themes': ['fork mateIn2'],   # mateIn2 is in THEME_DENYLIST
+        'OpeningTags': [''],
+    })
+    rows = [df.iloc[0]]
+    stats = lichess_optimized_puzzles_datasets.report_theme_coverage(rows, "test.csv", df)
+    assert 'unique_motifs_sample' in stats
+    assert 'unique_motifs_tranche' in stats
+    assert 'coverage_pct' in stats        # motif-based
+    assert 'coverage_pct_all' in stats    # all-theme-based
+    # 'fork' is a motif, 'mateIn2' is denylisted → motif count < all-theme count
+    assert stats['unique_motifs_sample'] < stats['unique_themes_sample']
+
+
+def test_upper_tranche_edges_constant():
+    """UPPER_TRANCHE_EDGES must cover 1800-2200 in contiguous bands."""
+    edges = lichess_optimized_puzzles_datasets.UPPER_TRANCHE_EDGES
+    assert edges[0][0] == 1800, "must start at 1800"
+    assert edges[-1][1] == 2200, "must end at 2200 (2200+ handled separately)"
+    for i in range(len(edges) - 1):
+        assert edges[i][1] == edges[i + 1][0], "bands must be contiguous"
+
+
 def test_main_runs(monkeypatch):
     """Test that main runs through all logic."""
     monkeypatch.setattr("lichess_optimized_puzzles_datasets.download_puzzle_db", lambda: None)