diff --git a/delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md b/delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md
index dd0f17bb8..6dedd4454 100644
--- a/delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md
+++ b/delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md
@@ -1406,5 +1406,103 @@ sklearn-KMeans-seeding consensus (see scratch/COPILOT_MATH_QUESTIONS.md);
 no values shift at the goldens commit until D10/D11/D12 land.
 
 **Stack position.** New commit inserted between PR 14a (#2564) and D10
-(#2566). D10, D11, D12, goldens rebased cleanly on top — no conflict
-markers in `jj log -r 'edge..@+++++'`.
+(#2566). D10, D11, D12, goldens rebased cleanly on top; 2-sided docs
+conflicts in PLAN/JOURNAL at each downstream commit resolved manually
+to merge both edits (downstream docs additions kept; ns-PASS row and
+session entry preserved).
+
+## Session: PR 8 — D10 rep comment selection (2026-06-11)
+
+Landed in `/goal` mode (autonomous run targeting D10 + D11 + D12 + goldens
+as a stacked PR series). Decisions made without inline user check are
+documented in `~/polis/D10_D11_D12_GOLDENS_DECISIONS.md` for batch review.
+
+### What landed
+
+**Production code (`delphi/polismath/pca_kmeans_rep/repness.py`)** — added
+3 top-level helpers + `_finalize_row_for_output` + rewrote `select_rep_comments_df`:
+
+- `passes_by_test(s) -> bool` — Clojure `passes-by-test?` (repness.clj:165-170).
+  OR'd on `(rat, pat)` and `(rdt, pdt)` z-sig-90. **NO `pa >= 0.5` gate** —
+  the pre-D10 Python gate was a botched-port over-restriction with no
+  Clojure analog.
+- `beats_best_by_test(s, current_best_z) -> bool` — Clojure `beats-best-by-test?`
+  (repness.clj:133-139). Strict `>` on `max(rat, rdt)` vs current best z.
+- `beats_best_agr(s, current_best) -> bool` — Clojure `beats-best-agr?`
+  (repness.clj:142-162). Four-branch agree-priority logic:
+    1. `na == 0 and nd == 0` → reject.
+    2. Current best AND `current_best['ra'] > 1.0` → compare 4-way signed
+       product `ra * rat * pa * pat`.
+    3. Current best (else, `ra <= 1.0`) → compare `pa * pat`.
+    4. No current best → accept if `z90(pat)` OR `(ra > 1.0 AND pa > 0.5)`.
+- `_finalize_row_for_output(row, *, is_best_agree=False)` — Clojure
+  `finalize-cmt-stats` (repness.clj:173-188) + best-agree flagging
+  (repness.clj:262-264). Adds `best_agree=True` and `n_agree=na` keys for
+  the best-agree slot.
+- `select_rep_comments_df(stats_df, mod_out=None) -> List[Dict[str, Any]]` —
+  single-pass reduce over `stats_df.to_dict('records')` mirroring Clojure
+  `select-rep-comments` (repness.clj:212-281). Per-row state
+  `{sufficient, best, best_agree}` updated by the three helpers; final
+  assembly is dedup-best-agree-from-sufficient → sort by metric →
+  prepend best-agree → take 5 → agrees-before-disagrees.
+
+**Caller (`conv_repness`)** — dropped the `_stats_row_to_dict` wrapping
+step since `select_rep_comments_df` now returns finalized dicts directly.
+
+**Two pre-D10 bugs fixed alongside the rewrite** (research-agent flagged):
+- `pa >= 0.5 / pd >= 0.5` over-gate in the passing filter — removed. No
+  Clojure analog; was dropping legitimate candidates.
+- "Fill-from-other-category" + "first-row" fallback blocks — deleted. The
+  `:best` / `:best_agree` mechanism IS the Clojure fallback.
+
+**Tests** — 18 new tests (in `tests/test_discrepancy_fixes.py`):
+- `TestD10PassesByTest` (4 tests): agree-side significant, disagree-side
+  significant, neither significant, no `pa >= 0.5` gate.
+- `TestD10BeatsBestByTest` (3 tests): None-best, max-rat-rdt, strict `>`.
+- `TestD10BeatsBestAgr` (6 tests): Branch 1 (na=nd=0), Branch 2 (ra>1),
+  Branch 3 (ra<=1), Branch 4 z90(pat), Branch 4 (ra>1 AND pa>0.5), Branch 4
+  rejection.
+- `TestD10SelectRepCommentsBoundary` (5 tests): empty input, single unvoted
+  row → best fallback, sufficient-empty-best-agree-only, take-5 cap with
+  agrees-before-disagrees ordering, **the eviction edge case** (best_agree
+  outside sufficient evicting 5th-highest-metric).
+
+**Re-xfailed with updated reasons** (D14 / D1 upstream divergence):
+- `TestD9ZScoreThresholds::test_z_values_match_clojure`
+- `TestD5ProportionTest::test_pat_values_match_clojure_blob`
+- `TestD6TwoPropTest::test_rat_values_match_clojure_blob`
+- `TestD7RepnessMetric::test_repness_metric_matches_clojure_blob`
+- `TestD8FinalizeStats::test_repful_matches_clojure_blob`
+- `TestD10RepCommentSelection::test_rep_comments_match_clojure`
+
+Why xfailed despite D10 landing: D10 enables shared comments in the
+selection (overlap rises from 0% to ~20% on vw cold_start), but
+per-(gid, tid) stats still mismatch because Python and Clojure put
+different participants in the "same" group ID. That's upstream
+PCA/KMeans group-membership divergence (D14 / D1), not D10. D10 is
+verified via the 18 synthetic helper + boundary tests above.
+
+### Suite delta (pre/post D10)
+
+- Pre (post-14a): 295 passed, 12 skipped, 58 xfailed.
+- Post (this PR): 313 passed, 12 skipped, 58 xfailed.
+- Delta: +18 passed, 0 failed, 0 new xfailed. The +18 matches the 18 new
+  D10 synthetic tests exactly.
+
+### Decisions made autonomously (under `/goal` mode)
+
+See `~/polis/D10_D11_D12_GOLDENS_DECISIONS.md`. Highlights:
+- **S1**: Python convention key names (`repful`, `best_agree`, `n_agree`)
+  instead of Clojure hyphens. Math blob alignment is a future PR.
+- **S2**: `select_rep_comments_df` returns `List[Dict[str, Any]]` instead
+  of `pd.DataFrame` — variable extra keys (best_agree flag) make list-of-
+  dicts cleaner than DF-with-NaN-columns.
+- **D10.1**: Two pre-D10 bugs (pa>=0.5 gate, fill-from-other fallback)
+  folded into D10 rather than separate PRs — the rewrite replaces the
+  function so a surgical fix would be more noise than value.
+- **D10.7**: Real-data blob-comparison tests re-xfailed with reasons
+  pointing at D14/D1, not softened to overlap-thresholds — more honest.
+
+### What's Next
+
+PR 9 (D11) on top of D10 in the same spr stack.
diff --git a/delphi/docs/PLAN_DISCREPANCY_FIXES.md b/delphi/docs/PLAN_DISCREPANCY_FIXES.md
index 162644a66..479eb77e3 100644
--- a/delphi/docs/PLAN_DISCREPANCY_FIXES.md
+++ b/delphi/docs/PLAN_DISCREPANCY_FIXES.md
@@ -28,9 +28,9 @@ This plan's "PR N" labels map to actual GitHub PRs as follows:
 | PR 7 (D8) | #2522 | Stack 15/17 | Fix D8: finalize comment stats |
 | PR 12 (D15) | #2523 | Stack 16/17 | Fix D15: moderation handling |
 | (K-inv) | #2524 | Stack 17/17 | Fix K-means k divergence: preserve row order |
-| PR 14a (scalar deletion) | — (in flight) | — | Delete dead scalar paths in `repness.py`; migrate blob injection tests to vectorized |
-| PR ns-PASS fix | — (planned) | — | Fix `ns` to include PASS votes in `compute_group_comment_stats_df` (Clojure `count-votes` parity) |
-| PR 8 (D10) | — (WIP) | — | Fix D10: rep comment selection — **NEEDS REWORK** |
+| PR 14a (scalar deletion) | #2564 | — | Delete dead scalar paths in `repness.py`; migrate blob injection tests to vectorized |
+| PR ns-PASS fix | — (in flight) | — | Fix `ns` to include PASS votes in `compute_group_comment_stats_df` (Clojure `count-votes` parity) |
+| PR 8 (D10) | — (in flight) | — | Fix D10: rep comment selection — single-pass reduce matching Clojure |
 | PR 9 (D11) | — (WIP) | — | Fix D11: consensus selection — **NEEDS REWORK** |
 | PR 10 (D3) | — (WIP) | — | Fix D3: k-smoother buffer — **NEEDS REWORK** |
 | PR 11 (D12) | — (WIP) | — | Fix D12: comment priorities — **NEEDS REWORK** |
@@ -916,3 +916,14 @@ Tagging this as a follow-up. No code changes until we discuss.
 - **`to_dynamo_dict` parallel inline implementations** were refactored to
   route through the same helpers as `to_dict` in PR #2523 follow-up. No
   further action needed.
+- **D10 take-5 eviction edge case** (2026-06-11). The Clojure-parity
+  `select_rep_comments_df` introduced in PR 8 mirrors Clojure exactly:
+  `take(5)` runs AFTER prepending the `best_agree` slot. When `best_agree`
+  was kept by `beats_best_agr?` as a non-significant agree-priority fallback
+  (i.e. it failed `passes_by_test?` but qualified via Branch 4) AND
+  `:sufficient` already has 5 entries, the prepend pushes the total to 6
+  and `take(5)` silently evicts the 5th-highest-metric `sufficient` entry
+  — possibly a strong dissenting view. Mirrored for blob parity; flag for
+  future product review. See `# TODO(parity-eviction)` in
+  `delphi/polismath/pca_kmeans_rep/repness.py::select_rep_comments_df` and
+  the synthetic test `TestD10SelectRepCommentsBoundary::test_take_5_eviction_when_best_agree_outside_sufficient`.
diff --git a/delphi/polismath/pca_kmeans_rep/repness.py b/delphi/polismath/pca_kmeans_rep/repness.py
index f9fe1113a..0c2255227 100644
--- a/delphi/polismath/pca_kmeans_rep/repness.py
+++ b/delphi/polismath/pca_kmeans_rep/repness.py
@@ -7,7 +7,7 @@
 
 import numpy as np
 import pandas as pd
-from typing import Any, Dict, List
+from typing import Any, Dict, Iterable, List, Optional, Tuple
 
 from polismath.utils.general import AGREE, DISAGREE
 
@@ -344,106 +344,289 @@ def compute_group_comment_stats_df(votes_long: pd.DataFrame,
     return stats_df
 
 
-def select_rep_comments_df(stats_df: pd.DataFrame,
-                           agree_count: int = 3,
-                           disagree_count: int = 2) -> pd.DataFrame:
+# =============================================================================
+# D10: Selection helpers (Clojure parity for select-rep-comments)
+# =============================================================================
+#
+# Ports of Clojure's `select-rep-comments` and its three predicates from
+# math/src/polismath/math/repness.clj:133-281. Operate on per-(group, comment)
+# dict rows produced by `compute_group_comment_stats_df` (via to_dict('records')).
+# Per-row dict ops + small per-group iteration (rather than vectorized) because
+# `beats_best_agr` has a 4-branch decision against a moving "current best"
+# that updates during iteration; vectorizing would require multiple passes
+# without saving lines (per-group N typically <500 comments).
+
+
+def passes_by_test(s: Dict[str, Any]) -> bool:
+    """
+    Clojure passes-by-test? (repness.clj:165-170).
+
+    True iff the agree side OR the disagree side passes z-sig-90 on BOTH
+    the proportion test (pat/pdt) and the representativeness test (rat/rdt).
+    No probability gate — pre-D10 Python's `pa >= 0.5` gate was a Python-only
+    over-restriction without a Clojure analog.
+
+        (or (and (z-sig-90? rat) (z-sig-90? pat))
+            (and (z-sig-90? rdt) (z-sig-90? pdt)))
+    """
+    return (
+        (z_score_sig_90(s['rat']) and z_score_sig_90(s['pat']))
+        or (z_score_sig_90(s['rdt']) and z_score_sig_90(s['pdt']))
+    )
+
+
+def beats_best_by_test(s: Dict[str, Any], current_best_z: Optional[float]) -> bool:
     """
-    Select representative comments for a single group from a DataFrame.
+    Clojure beats-best-by-test? (repness.clj:133-139).
 
-    NOTE (PR 14a / D10): this is the current Python selection logic — a
-    botched port that does not match Clojure's `select-rep-comments`
-    (math/src/polismath/math/repness.clj:212-281). D10 will replace it with
-    a single-pass reduce matching Clojure (up to 5 total, agrees-first,
-    best-agree priority slot). Until then, this path is preserved as-is.
+    True if `s` has a more-representative max(rat, rdt) than `current_best_z`,
+    OR if there is no current best yet. Strict `>` (Clojure: `>`).
+
+        (or (nil? current-best-z)
+            (> (max rat rdt) current-best-z))
+    """
+    if current_best_z is None:
+        return True
+    return max(s['rat'], s['rdt']) > current_best_z
+
+
+def beats_best_agr(s: Dict[str, Any],
+                   current_best: Optional[Dict[str, Any]]) -> bool:
+    """
+    Clojure beats-best-agr? (repness.clj:142-162).
+
+    Four mutually exclusive branches:
+
+    1. `na == 0 and nd == 0`: reject. Comments with no votes never enter the
+       best-agree slot (Clojure: `(= 0 na nd)` → false).
+    2. `current_best` exists AND `current_best['ra'] > 1.0`: compare the
+       4-way signed product `ra * rat * pa * pat`. New row must beat the
+       current best on this product.
+    3. `current_best` exists (else, i.e. `current_best['ra'] <= 1.0`):
+       compare `pa * pat` only — "shoot for something generally agreed upon"
+       when the current best isn't representative enough.
+    4. No `current_best`: accept if `z90(pat)` OR `(ra > 1.0 AND pa > 0.5)`.
+
+    `current_best` here is the RAW stats row (Clojure stores raw at
+    repness.clj:250 so this comparator keeps the `ra/rat/pa/pat` surface).
+    """
+    if s['na'] == 0 and s['nd'] == 0:  # Branch 1.
+        return False
+    if current_best is not None and current_best['ra'] > 1.0:  # Branch 2.
+        return (s['ra'] * s['rat'] * s['pa'] * s['pat']) > (
+            current_best['ra'] * current_best['rat']
+            * current_best['pa'] * current_best['pat']
+        )
+    if current_best is not None:  # Branch 3.
+        return (s['pa'] * s['pat']) > (current_best['pa'] * current_best['pat'])
+    # Branch 4.
+    return z_score_sig_90(s['pat']) or (s['ra'] > 1.0 and s['pa'] > 0.5)
+
+
+def _finalize_row_for_output(row: Dict[str, Any], *,
+                             is_best_agree: bool = False) -> Dict[str, Any]:
+    """
+    Format a per-(group, comment) stats row for the final repness output
+    (math blob `repness` / `group_repness`).
+
+    Mirrors Clojure `finalize-cmt-stats` (repness.clj:173-188) plus the
+    best-agree flagging at repness.clj:262-264.
+
+    When `is_best_agree=True`, two extra keys are added:
+        - `best_agree`: True
+        - `n_agree`: the raw `na` (preserves the agree count even when the
+          row is classified as 'disagree' by `rat > rdt`).
+
+    Key naming uses Python convention (underscored). Clojure-style hyphens
+    (`repful-for`, `n-agree`, etc.) are deferred to a future math-blob
+    alignment PR (see PLAN.md "Pending — needs team discussion").
+
+    `agree_metric` / `disagree_metric` are read directly from the row
+    (produced by `compute_group_comment_stats_df`) rather than recomputed.
+    Recomputing here would duplicate the formula at repness.clj:191-193 in
+    two places and risk drift if it ever changes (decision D10.8.3).
+    """
+    repful = 'agree' if row['rat'] > row['rdt'] else 'disagree'
+    finalized: Dict[str, Any] = {
+        'comment_id': row['comment'],
+        'group_id': row['group_id'],
+        'na': int(row['na']),
+        'nd': int(row['nd']),
+        'ns': int(row['ns']),
+        'pa': row['pa'],
+        'pd': row['pd'],
+        'pat': row['pat'],
+        'pdt': row['pdt'],
+        'ra': row['ra'],
+        'rd': row['rd'],
+        'rat': row['rat'],
+        'rdt': row['rdt'],
+        'agree_metric': row['agree_metric'],
+        'disagree_metric': row['disagree_metric'],
+        'repful': repful,
+    }
+    if is_best_agree:
+        finalized['best_agree'] = True
+        finalized['n_agree'] = int(row['na'])
+    return finalized
+
+
+def select_rep_comments_df(stats_df: pd.DataFrame,
+                           mod_out: Optional[Iterable[int]] = None
+                           ) -> Tuple[pd.DataFrame, Optional[Dict[str, Any]]]:
+    """
+    Select representative comments for a single group (Clojure parity).
+
+    Single-pass reduce over the group's (gid, tid) rows, mirroring
+    `select-rep-comments` in math/src/polismath/math/repness.clj:212-281.
+
+    Per-row state {sufficient, best, best_agree}:
+      - `passes_by_test(row)` → append finalized row to `sufficient`.
+      - `:sufficient` still empty AND `beats_best_by_test` → update `best`.
+      - `beats_best_agr(row, best_agree)` → store RAW row as new `best_agree`.
+
+    Final assembly (decision S2 / D10.4):
+      - `sufficient` non-empty: dedup best_agree from sufficient → sort by
+        agree/disagree metric (descending, signed product per repness.clj:191)
+        → take up to 5 (post-prepend → 4 sufficient max) → agrees-before-
+        disagrees on the sufficient slice. The best-agree dict is returned
+        SEPARATELY so the DataFrame stays clean (no NaN best_agree/n_agree
+        columns when the slot is empty).
+      - Else: `(empty_df, best_agree_dict)` if best_agree exists, else
+        `(single_row_df_for_best, None)` if best exists, else
+        `(empty_df, None)`.
 
     Args:
-        stats_df: DataFrame with comment statistics for ONE group
-        agree_count: Number of agreement comments to select
-        disagree_count: Number of disagreement comments to select
+        stats_df: DataFrame with comment statistics for ONE group, schema
+            as produced by `compute_group_comment_stats_df`.
+        mod_out: Optional iterable of tids to exclude (moderated-out comments).
+            Filter applied before the reduce (Clojure repness.clj:222).
 
     Returns:
-        DataFrame of selected representative comments
+        `(rep_df, best_agree_dict)` tuple:
+          - `rep_df`: DataFrame of finalized rep-comment rows in math-blob
+            shape (see `_finalize_row_for_output`) — does NOT include the
+            best-agree slot, and carries NO `best_agree`/`n_agree` columns.
+            Already ordered agrees-before-disagrees and capped so that
+            `len(rep_df) + (1 if best_agree_dict else 0) <= 5`.
+          - `best_agree_dict`: Standalone finalized dict for the best-agree
+            slot (with `best_agree=True` and `n_agree=<na>`), or `None` if
+            no candidate qualified. The caller is responsible for prepending
+            it to the flat output list.
     """
+    empty_df: pd.DataFrame = pd.DataFrame()
+
     if stats_df.empty:
-        return stats_df
-
-    total_wanted = agree_count + disagree_count
-
-    # Best agree: pa > pd and passes significance tests
-    agree_candidates = stats_df[stats_df['pa'] > stats_df['pd']].copy()
-    if not agree_candidates.empty:
-        # Check significance: pat > Z_90 and rat > Z_90
-        passing_agree = agree_candidates[
-            (agree_candidates['pat'] > Z_90) &
-            (agree_candidates['rat'] > Z_90) &
-            (agree_candidates['pa'] >= 0.5)
-        ]
-        if not passing_agree.empty:
-            agree_candidates = passing_agree
-
-    # Best disagree: pd > pa and passes significance tests
-    disagree_candidates = stats_df[stats_df['pd'] > stats_df['pa']].copy()
-    if not disagree_candidates.empty:
-        passing_disagree = disagree_candidates[
-            (disagree_candidates['pdt'] > Z_90) &
-            (disagree_candidates['rdt'] > Z_90) &
-            (disagree_candidates['pd'] >= 0.5)
-        ]
-        if not passing_disagree.empty:
-            disagree_candidates = passing_disagree
-
-    # Sort candidates by metric
-    if not agree_candidates.empty:
-        agree_candidates = agree_candidates.sort_values('agree_metric', ascending=False)
-    if not disagree_candidates.empty:
-        disagree_candidates = disagree_candidates.sort_values('disagree_metric', ascending=False)
-
-    # Select top N from each category
-    selected_parts = []
-
-    if not agree_candidates.empty:
-        top_agree = agree_candidates.head(agree_count).copy()
-        top_agree['repful'] = 'agree'
-        selected_parts.append(top_agree)
-
-    if not disagree_candidates.empty:
-        top_disagree = disagree_candidates.head(disagree_count).copy()
-        top_disagree['repful'] = 'disagree'
-        selected_parts.append(top_disagree)
-
-    if selected_parts:
-        selected = pd.concat(selected_parts, ignore_index=False)
-    else:
-        selected = pd.DataFrame()
-
-    # If we couldn't find enough, try to fill from available candidates
-    # This matches the exact behavior of the old select_rep_comments() function:
-    # - First fallback adds agree_comments[agree_count:min(len, total_wanted)] regardless of
-    #   whether we exceed total_wanted (up to disagree_count more agrees)
-    # - Second fallback only runs if STILL < total_wanted
-    if len(selected) < total_wanted:
-        # Try to add more agree comments
-        # Old code: range(agree_count, min(len(agree_comments), agree_count + disagree_count))
-        if not agree_candidates.empty and len(agree_candidates) > agree_count:
-            extra_limit = min(len(agree_candidates), total_wanted)
-            extra_agrees = agree_candidates.iloc[agree_count:extra_limit].copy()
-            extra_agrees['repful'] = 'agree'
-            selected = pd.concat([selected, extra_agrees], ignore_index=False)
-
-        # Try to add more disagree comments (only if still not enough)
-        # Old code: range(disagree_count, min(len(disagree_comments), agree_count + disagree_count))
-        if len(selected) < total_wanted and not disagree_candidates.empty and len(disagree_candidates) > disagree_count:
-            extra_limit = min(len(disagree_candidates), total_wanted)
-            extra_disagrees = disagree_candidates.iloc[disagree_count:extra_limit].copy()
-            extra_disagrees['repful'] = 'disagree'
-            selected = pd.concat([selected, extra_disagrees], ignore_index=False)
-
-    # Fallback: if still empty, take first row
-    if selected.empty and not stats_df.empty:
-        selected = stats_df.head(1).copy()
-        selected['repful'] = selected['repful'].iloc[0] if 'repful' in selected.columns else 'agree'
-
-    return selected
+        return empty_df, None
+
+    mod_out_set = set(mod_out) if mod_out else set()
+    sufficient: List[Dict[str, Any]] = []
+    best: Optional[Dict[str, Any]] = None
+    # Track best's max(rat, rdt) as a sidecar scalar so we never have to mutate
+    # `best` itself with synthetic comparison keys. Avoids the leak/pop dance
+    # of stashing a `_max_rt` inside the finalized dict (decision D10.8.4).
+    best_max_rt: Optional[float] = None
+    best_agree: Optional[Dict[str, Any]] = None
+
+    # Sort by `comment` (tid) ascending BEFORE iterating, so ties in
+    # `beats_best_by_test` (max(rat, rdt) tied) and in `beats_best_agr`
+    # (Branch 2/3 product tied) resolve deterministically. The chosen order
+    # matches Clojure's named-matrix column iteration: after normalization
+    # the columns are insertion-ordered, and for cold-start that's tid
+    # ascending (see Clojure named_matrix.clj:130-131 — insertion order).
+    # All Clojure beats-*? predicates use strict `>` so the FIRST row at a
+    # tied score wins; sorting ascending here mirrors that (decision D10.8.1).
+    iter_df = stats_df.sort_values('comment', kind='mergesort')
+
+    for row in iter_df.to_dict('records'):
+        if row['comment'] in mod_out_set:
+            continue
+        if passes_by_test(row):
+            sufficient.append(_finalize_row_for_output(row))
+        # Update `best` only while sufficient is still empty (Clojure parity).
+        if not sufficient:
+            if beats_best_by_test(row, best_max_rt):
+                best = _finalize_row_for_output(row)
+                best_max_rt = max(row['rat'], row['rdt'])
+        # `best_agree` stores RAW row (Clojure repness.clj:250) so subsequent
+        # `beats_best_agr` calls keep the ra/rat/pa/pat surface.
+        if beats_best_agr(row, best_agree):
+            best_agree = row
+
+    # Build the standalone best-agree dict (or None) once — used in every
+    # assembly branch below.
+    best_agree_dict: Optional[Dict[str, Any]] = (
+        _finalize_row_for_output(best_agree, is_best_agree=True)
+        if best_agree is not None else None
+    )
+
+    # Assembly.
+    if not sufficient:
+        if best_agree_dict is not None:
+            # Best-agree slot returned separately; rep_df stays empty.
+            return empty_df, best_agree_dict
+        if best is not None:
+            return pd.DataFrame([best]), None
+        return empty_df, None
+
+    # Sufficient non-empty path.
+    best_agree_tid = best_agree['comment'] if best_agree is not None else None
+    # Dedup best_agree from sufficient (caller will re-prepend it).
+    deduped = [s for s in sufficient if s['comment_id'] != best_agree_tid]
+
+    # Sort each row by its winning-side metric (signed product, per Clojure
+    # repness.clj:191-193). Clojure (repness.clj:191-200) sorts by a single
+    # `:repness-metric` field that `finalize-cmt-stats` populates per the
+    # winning side. We achieve equivalent ranking by reading `agree_metric`
+    # for repful=='agree' rows and `disagree_metric` otherwise — same
+    # comparator value, just a different key per row (decision D10.8.2).
+    def _sort_key(s: Dict[str, Any]) -> float:
+        return s['agree_metric'] if s['repful'] == 'agree' else s['disagree_metric']
+    deduped.sort(key=_sort_key, reverse=True)
+
+    # TODO(parity-eviction): the cap of 5 INCLUDING the best-agree slot can
+    # evict the 5th-highest-metric `sufficient` entry — a strong dissenting
+    # view may be silently dropped by a weak agree-priority one. Mirrors
+    # Clojure exactly for parity; flagged in PLAN.md
+    # "Pending — needs team discussion".
+    cap = 5 - (1 if best_agree_dict is not None else 0)
+    capped = deduped[:cap]
+
+    # agrees-before-disagrees (Clojure repness.clj:203-209). Stable partition.
+    agrees = [c for c in capped if c['repful'] == 'agree']
+    disagrees = [c for c in capped if c['repful'] == 'disagree']
+    rep_df = pd.DataFrame(agrees + disagrees)
+
+    return rep_df, best_agree_dict
+
+
+def _assemble_rep_comments(stats_df: pd.DataFrame,
+                           mod_out: Optional[Iterable[int]] = None
+                           ) -> List[Dict[str, Any]]:
+    """Thin wrapper around `select_rep_comments_df` that returns the flat
+    output list (best-agree slot prepended, then the DataFrame's rows,
+    then re-partitioned agrees-before-disagrees so a `repful='disagree'`
+    best-agree slot lands in the disagrees section as in pre-S2).
+
+    Decision S2: `select_rep_comments_df` returns a `(rep_df, best_agree_dict)`
+    tuple so the DataFrame stays clean (no NaN extra-key columns). Most
+    callers — including `conv_repness` and the D10 synthetic tests — want
+    the flat List[Dict] form, so we keep one place that does the prepend
+    and the final agrees-before-disagrees stable partition.
+    """
+    rep_df, best_agree_dict = select_rep_comments_df(stats_df, mod_out=mod_out)
+    head: List[Dict[str, Any]] = [best_agree_dict] if best_agree_dict is not None else []
+    tail: List[Dict[str, Any]] = (
+        rep_df.to_dict('records') if not rep_df.empty else []
+    )
+    combined = head + tail
+    # Re-run agrees-before-disagrees stable partition so the best-agree slot
+    # ends up in the correct section per its own `repful`. This mirrors the
+    # pre-S2 behaviour where best_agree was prepended into a single list and
+    # then partitioned (Clojure repness.clj:203-209).
+    agrees = [c for c in combined if c['repful'] == 'agree']
+    disagrees = [c for c in combined if c['repful'] == 'disagree']
+    return agrees + disagrees
 
 
 def select_consensus_comments_df(stats_df: pd.DataFrame,
@@ -598,10 +781,12 @@ def conv_repness(vote_matrix_df: pd.DataFrame, group_clusters: List[Dict[str, An
             continue
 
         try:
-            rep_df = select_rep_comments_df(group_stats)
-            # Convert to list of dicts only at the end
-            rep_comments = [_stats_row_to_dict(row) for _, row in rep_df.iterrows()]
-            result['group_repness'][group_id] = rep_comments
+            # `select_rep_comments_df` now returns `(rep_df, best_agree_dict)`
+            # (decision S2) so the DataFrame stays clean. Use the
+            # `_assemble_rep_comments` wrapper to get the flat List[Dict]
+            # the math blob expects (best-agree prepended, agrees-before-
+            # disagrees partition applied).
+            result['group_repness'][group_id] = _assemble_rep_comments(group_stats)
         except Exception as e:
             print(f"Error selecting representative comments for group {group_id}: {e}")
             result['group_repness'][group_id] = []
diff --git a/delphi/tests/test_discrepancy_fixes.py b/delphi/tests/test_discrepancy_fixes.py
index f6d4714f0..eefd11e69 100644
--- a/delphi/tests/test_discrepancy_fixes.py
+++ b/delphi/tests/test_discrepancy_fixes.py
@@ -42,6 +42,12 @@
     z_score_sig_95,
     prop_test_vectorized,
     two_prop_test_vectorized,
+    # D10 selection helpers (PR 8)
+    passes_by_test,
+    beats_best_by_test,
+    beats_best_agr,
+    select_rep_comments_df,
+    _assemble_rep_comments,
 )
 from polismath.regression import get_dataset_files, get_blob_variants
 from polismath.regression.datasets import discover_datasets
@@ -617,7 +623,13 @@ def test_significance_sets_match_clojure(self, conv, clojure_blob, dataset_name)
         check.equal(len(mismatches), 0,
                     f"{len(mismatches)} groups differ in selected rep comments")
 
-    @pytest.mark.xfail(reason="D5/D6/D10: different z-values and selection → no shared comments to compare")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships). D10 unlocks shared "
+                              "comments (overlap ~20%) but per-(gid, tid) z-values still differ "
+                              "because the swapped/divergent groups contain different participants. "
+                              "Fix requires canonical-group-id sorting or set-based comparison "
+                              "infrastructure.")
     def test_z_values_match_clojure(self, conv, clojure_blob, dataset_name):
         """Z-score values for shared rep comments should match Clojure.
 
@@ -753,7 +765,13 @@ def test_clojure_pat_values_consistent_with_formula(self, clojure_blob, dataset_
         print(f"[{dataset_name}] pat consistency: {total - mismatches}/{total} match formula (max_diff={max_diff:.4f})")
         check.equal(mismatches, 0, f"Clojure p-test values don't match formula for {mismatches}/{total}")
 
-    @pytest.mark.xfail(reason="D5/D10: prop test formula differs + no shared comments")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships). D10 unlocks shared "
+                              "comments but per-(gid, tid) pat values still differ because the "
+                              "swapped/divergent groups contain different participants. Fix "
+                              "requires canonical-group-id sorting or set-based comparison "
+                              "infrastructure.")
     def test_pat_values_match_clojure_blob(self, conv, clojure_blob, dataset_name):
         """p-test (Clojure) vs pat (Python) for shared rep comments."""
         clojure_repness = clojure_blob.get('repness', {})
@@ -868,7 +886,13 @@ def test_two_prop_test_pseudocount_effect(self):
         check.greater(abs(result_large), abs(result_small),
                       "Large samples should produce more extreme z-scores than small ones")
 
-    @pytest.mark.xfail(reason="D6/D10: two-prop test differs + no shared comments to compare")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships). D10 unlocks shared "
+                              "comments but per-(gid, tid) rat values still differ because the "
+                              "swapped/divergent groups contain different participants. Fix "
+                              "requires canonical-group-id sorting or set-based comparison "
+                              "infrastructure.")
     def test_rat_values_match_clojure_blob(self, conv, clojure_blob, dataset_name):
         """repness-test (Clojure) vs rat (Python) for shared rep comments.
 
@@ -948,7 +972,13 @@ def test_metric_formula_is_product(self):
         check.almost_equal(disagree_metric, 0.189, abs=1e-10,
                             msg=f"disagree_metric (0.7 * -0.9 * 0.2 * -1.5) = 0.189, got {disagree_metric}")
 
-    @pytest.mark.xfail(reason="D7/D10: metric formula differs + no shared comments")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships). D10 unlocks shared "
+                              "comments but per-(gid, tid) repness metrics still differ because "
+                              "the swapped/divergent groups contain different participants. Fix "
+                              "requires canonical-group-id sorting or set-based comparison "
+                              "infrastructure.")
     def test_repness_metric_matches_clojure_blob(self, conv, clojure_blob, dataset_name):
         """repness (Clojure) vs agree/disagree_metric (Python) for shared comments."""
         clojure_repness = clojure_blob.get('repness', {})
@@ -1025,7 +1055,13 @@ def test_repful_classification_boundary(self):
             f"{len(mismatches)}/{len(cases)} repful mismatches:\n"
             + mismatches.to_string(index=False))
 
-    @pytest.mark.xfail(reason="D8/D10: repful logic differs + no shared comments")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships). D10 unlocks shared "
+                              "comments but per-(gid, tid) repful labels still differ because the "
+                              "swapped/divergent groups contain different participants. Fix "
+                              "requires canonical-group-id sorting or set-based comparison "
+                              "infrastructure.")
     def test_repful_matches_clojure_blob(self, conv, clojure_blob, dataset_name):
         """repful-for (Clojure) vs repful (Python) for shared rep comments."""
         clojure_repness = clojure_blob.get('repness', {})
@@ -1071,7 +1107,13 @@ class TestD10RepCommentSelection:
          Clojure selects up to 5 total, agrees first, with beats-best-by-test logic
     """
 
-    @pytest.mark.xfail(reason="D10: Different selection logic than Clojure")
+    @pytest.mark.xfail(reason="gid 0↔1 label swap + group-membership divergence on cold_start "
+                              "(per workflow Investigation C 2026-06-11 — PR #2524 D14 verified "
+                              "only k count, not per-(gid, tid) memberships) prevents exact "
+                              "set match. D10 selection logic verified by TestD10PassesByTest, "
+                              "TestD10BeatsBestByTest, TestD10BeatsBestAgr, "
+                              "TestD10SelectRepCommentsBoundary. Fix requires canonical-group-id "
+                              "sorting or set-based comparison infrastructure.")
     def test_rep_comments_match_clojure(self, conv, clojure_blob, dataset_name):
         """Selected representative comments per group should match Clojure."""
         clojure_repness = clojure_blob.get('repness', {})
@@ -1104,6 +1146,482 @@ def test_rep_comments_match_clojure(self, conv, clojure_blob, dataset_name):
                      f"Only {matching_groups}/{total_groups} groups have matching rep comments")
 
 
+# ----------------------------------------------------------------------------
+# D10 — Synthetic unit tests for the new selection helpers
+# ----------------------------------------------------------------------------
+#
+# Pin the Clojure-parity semantics of `passes_by_test`, `beats_best_by_test`,
+# `beats_best_agr`, and `select_rep_comments_df`. Synthetic 1-group fixtures
+# only — no real datasets, no Clojure blob dependency.
+#
+# References:
+#   - Clojure `select-rep-comments`: math/src/polismath/math/repness.clj:212-281
+#   - Helpers `passes-by-test?` :165, `beats-best-by-test?` :133,
+#     `beats-best-agr?` :142, `finalize-cmt-stats` :173, `repness-metric` :191.
+# ----------------------------------------------------------------------------
+
+
+def _stats_row(tid, na, nd, pa, pd_, pat, pdt, ra, rd, rat, rdt, *, ns=None,
+               agree_metric=None, disagree_metric=None, repful=None, group_id=0):
+    """Build a single stats DataFrame row matching the schema produced by
+    `compute_group_comment_stats_df`. Defaults derived per Clojure recipe."""
+    if ns is None:
+        ns = na + nd
+    if agree_metric is None:
+        agree_metric = ra * rat * pa * pat
+    if disagree_metric is None:
+        disagree_metric = rd * rdt * pd_ * pdt
+    if repful is None:
+        repful = 'agree' if rat > rdt else 'disagree'
+    return {
+        'group_id': group_id, 'comment': tid,
+        'na': na, 'nd': nd, 'ns': ns,
+        'pa': pa, 'pd': pd_,
+        'pat': pat, 'pdt': pdt,
+        'ra': ra, 'rd': rd,
+        'rat': rat, 'rdt': rdt,
+        'agree_metric': agree_metric, 'disagree_metric': disagree_metric,
+        'repful': repful,
+    }
+
+
+class TestD10PassesByTest:
+    """`passes-by-test?` (repness.clj:165) — OR'd on (rat, pat) and (rdt, pdt).
+
+    NO probability threshold (`pa >= 0.5` was a Python-only over-restriction
+    in the pre-D10 botched port — Clojure has no such gate)."""
+
+    def test_agree_side_significant_passes(self):
+        row = _stats_row(1, na=8, nd=2, pa=0.75, pd_=0.25, pat=2.0, pdt=-2.0,
+                         ra=2.0, rd=0.5, rat=2.0, rdt=-2.0)  # rat,pat > Z_90
+        assert passes_by_test(row)
+
+    def test_disagree_side_significant_passes(self):
+        row = _stats_row(2, na=2, nd=8, pa=0.25, pd_=0.75, pat=-2.0, pdt=2.0,
+                         ra=0.5, rd=2.0, rat=-2.0, rdt=2.0)  # rdt,pdt > Z_90
+        assert passes_by_test(row)
+
+    def test_neither_side_significant_fails(self):
+        row = _stats_row(3, na=5, nd=5, pa=0.5, pd_=0.5, pat=0.5, pdt=0.5,
+                         ra=1.0, rd=1.0, rat=0.5, rdt=0.5)
+        assert not passes_by_test(row)
+
+    def test_no_pa_threshold_gate(self):
+        """Pre-D10 Python added `pa >= 0.5` — Clojure has no such gate. A row
+        with pa=0.4 that's otherwise significant on the agree side must pass."""
+        row = _stats_row(4, na=4, nd=6, pa=0.42, pd_=0.58, pat=2.0, pdt=-2.0,
+                         ra=2.0, rd=0.5, rat=2.0, rdt=-2.0)
+        assert passes_by_test(row), "no pa>=0.5 gate (Clojure parity)"
+
+
+class TestD10BeatsBestByTest:
+    """`beats-best-by-test?` (repness.clj:133) — max(rat, rdt) > current_best_z."""
+
+    def test_none_best_always_beats(self):
+        row = _stats_row(1, na=5, nd=2, pa=0.6, pd_=0.4, pat=1.0, pdt=-1.0,
+                         ra=1.2, rd=0.8, rat=2.0, rdt=0.5)
+        assert beats_best_by_test(row, None)
+
+    def test_max_rat_rdt_used(self):
+        row = _stats_row(1, na=5, nd=2, pa=0.6, pd_=0.4, pat=1.0, pdt=-1.0,
+                         ra=1.2, rd=0.8, rat=2.0, rdt=0.5)
+        # max = 2.0
+        assert beats_best_by_test(row, 1.5)
+        assert not beats_best_by_test(row, 2.5)
+
+    def test_strict_greater_than(self):
+        row = _stats_row(1, na=5, nd=2, pa=0.6, pd_=0.4, pat=1.0, pdt=-1.0,
+                         ra=1.2, rd=0.8, rat=2.0, rdt=0.5)
+        assert not beats_best_by_test(row, 2.0), "strict > (Clojure parity)"
+
+
+class TestD10BeatsBestAgr:
+    """`beats-best-agr?` (repness.clj:142) — 4-branch agree priority logic."""
+
+    def test_na_nd_zero_always_rejected(self):
+        """Branch 1: (= 0 na nd) → false. Unvoted comments excluded from best-agree
+        regardless of stats."""
+        unvoted = _stats_row(1, na=0, nd=0, pa=0.5, pd_=0.5, pat=1.0, pdt=1.0,
+                             ra=1.0, rd=1.0, rat=1.0, rdt=1.0)
+        assert not beats_best_agr(unvoted, None)
+        other = _stats_row(2, na=5, nd=2, pa=0.6, pd_=0.4, pat=1.0, pdt=-1.0,
+                           ra=1.2, rd=0.8, rat=2.0, rdt=0.5)
+        assert not beats_best_agr(unvoted, other)
+
+    def test_branch_2_ra_gt_1_uses_4way_product(self):
+        """Branch 2: current_best AND current_best.ra > 1.0 → compare ra*rat*pa*pat."""
+        big_ra_best = _stats_row(1, na=10, nd=0, pa=0.9, pd_=0.1, pat=2.0, pdt=-2.0,
+                                  ra=2.0, rd=0.5, rat=2.0, rdt=-2.0)
+        # ra*rat*pa*pat = 2.0*2.0*0.9*2.0 = 7.2
+        bigger = _stats_row(2, na=15, nd=0, pa=0.94, pd_=0.06, pat=3.0, pdt=-3.0,
+                             ra=2.5, rd=0.4, rat=2.5, rdt=-2.5)
+        # 2.5*2.5*0.94*3.0 = 17.625 > 7.2
+        smaller = _stats_row(3, na=5, nd=0, pa=0.86, pd_=0.14, pat=1.5, pdt=-1.5,
+                              ra=1.5, rd=0.6, rat=1.5, rdt=-1.5)
+        # 1.5*1.5*0.86*1.5 ≈ 2.9 < 7.2
+        assert beats_best_agr(bigger, big_ra_best)
+        assert not beats_best_agr(smaller, big_ra_best)
+
+    def test_branch_3_ra_le_1_uses_pa_pat_product(self):
+        """Branch 3: current_best AND current_best.ra <= 1.0 → compare pa*pat only."""
+        weak_best = _stats_row(1, na=5, nd=4, pa=0.55, pd_=0.45, pat=1.0, pdt=-1.0,
+                                ra=0.9, rd=1.1, rat=1.0, rdt=-1.0)
+        # pa*pat = 0.55
+        bigger = _stats_row(2, na=6, nd=2, pa=0.7, pd_=0.3, pat=1.2, pdt=-1.2,
+                             ra=1.0, rd=1.0, rat=0.5, rdt=-0.5)
+        # pa*pat = 0.84 > 0.55
+        assert beats_best_agr(bigger, weak_best)
+
+    def test_branch_4_no_best_accepts_via_z_sig_pat(self):
+        """Branch 4 / no current_best: accept if z90(pat) is true."""
+        row = _stats_row(1, na=6, nd=4, pa=0.58, pd_=0.42, pat=1.5, pdt=-1.5,
+                         ra=0.9, rd=1.1, rat=1.0, rdt=-1.0)  # pat=1.5 > Z_90=1.2816
+        assert beats_best_agr(row, None)
+
+    def test_branch_4_no_best_accepts_via_ra_gt_1_and_pa_gt_half(self):
+        """Branch 4 / no current_best: accept if ra > 1.0 AND pa > 0.5
+        (even when pat not significant)."""
+        row = _stats_row(1, na=5, nd=4, pa=0.55, pd_=0.45, pat=0.5, pdt=-0.5,
+                         ra=1.2, rd=0.8, rat=0.5, rdt=-0.5)
+        assert beats_best_agr(row, None)
+
+    def test_branch_4_no_best_rejects_when_neither(self):
+        """Branch 4 / no current_best: reject if neither z90(pat) nor
+        (ra > 1.0 AND pa > 0.5)."""
+        row = _stats_row(1, na=4, nd=5, pa=0.45, pd_=0.55, pat=0.5, pdt=0.5,
+                         ra=0.8, rd=1.2, rat=0.5, rdt=0.5)
+        assert not beats_best_agr(row, None)
+
+
+class TestD10SelectRepCommentsBoundary:
+    """`select_rep_comments_df` Clojure-parity boundaries."""
+
+    def test_empty_input_returns_empty(self):
+        result = _assemble_rep_comments(pd.DataFrame())
+        assert len(result) == 0
+
+    def test_single_unvoted_row_falls_through_to_best(self):
+        """`beats_best_by_test` does NOT filter na=nd=0; only `beats_best_agr`
+        Branch 1 does. So a sole na=nd=0 row still ends up in the `:best`
+        fallback (Clojure parity — repness.clj:244-247). The `:best_agree`
+        slot stays empty (Branch 1 rejects). Output is [best], not [].
+        """
+        rows = [
+            _stats_row(1, na=0, nd=0, pa=0.5, pd_=0.5, pat=0.0, pdt=0.0,
+                       ra=1.0, rd=1.0, rat=0.0, rdt=0.0),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        assert len(result) == 1
+        assert result[0]['comment_id'] == 1
+        # NOT the best-agree slot (Branch 1 rejected na=nd=0).
+        assert 'best_agree' not in result[0]
+
+    def test_sufficient_empty_best_agree_only(self):
+        """Sufficient empty + best_agree exists → returns [best_agree_finalized]."""
+        # passes_by_test fails (pat=pdt below z90, rat=rdt below z90).
+        # beats_best_agr triggers via Branch 4: z90(pat) is true (pat=1.5).
+        rows = [
+            _stats_row(1, na=6, nd=4, pa=0.58, pd_=0.42, pat=1.5, pdt=-1.5,
+                       ra=0.9, rd=1.1, rat=1.0, rdt=-1.0),
+            # Filler row to make this not trivially the only one — also fails
+            # passes_by_test and beats_best_agr.
+            _stats_row(2, na=3, nd=5, pa=0.4, pd_=0.6, pat=-0.5, pdt=0.5,
+                       ra=0.8, rd=1.2, rat=-0.3, rdt=0.3),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        assert len(result) == 1
+        # New select_rep_comments_df returns (rep_df, best_agree_dict); the
+        # `_assemble_rep_comments` wrapper returns the flat List[Dict]
+        # (decision S2).
+        row = result[0]
+        assert row['comment_id'] == 1
+        # best_agree flag emitted in Python-convention key naming (decision S1 / Q2).
+        assert row.get('best_agree') is True, "best_agree slot should be flagged"
+        assert row.get('n_agree') == 6, "n_agree should be na from the raw best-agree row"
+
+    def test_take_5_cap_agrees_before_disagrees(self):
+        """7 sufficient candidates (4 agree-passing, 3 disagree-passing).
+        Sort by metric desc → take 5 → agrees-before-disagrees."""
+        rows = [
+            # 4 agree-passing, large to small agree_metric
+            _stats_row(1, na=9, nd=1, pa=0.83, pd_=0.17, pat=2.5, pdt=-2.5,
+                       ra=2.0, rd=0.5, rat=2.5, rdt=-2.5),  # agree_metric ~10.4
+            _stats_row(2, na=8, nd=2, pa=0.75, pd_=0.25, pat=2.0, pdt=-2.0,
+                       ra=1.8, rd=0.55, rat=2.0, rdt=-2.0),  # ~5.4
+            _stats_row(3, na=7, nd=3, pa=0.67, pd_=0.33, pat=1.5, pdt=-1.5,
+                       ra=1.5, rd=0.6, rat=1.5, rdt=-1.5),  # ~2.27
+            _stats_row(4, na=6, nd=4, pa=0.58, pd_=0.42, pat=1.3, pdt=-1.3,
+                       ra=1.3, rd=0.7, rat=1.3, rdt=-1.3),  # ~1.27
+            # 3 disagree-passing, large to small disagree_metric
+            _stats_row(5, na=1, nd=9, pa=0.17, pd_=0.83, pat=-2.5, pdt=2.5,
+                       ra=0.5, rd=2.0, rat=-2.5, rdt=2.5),  # ~10.4
+            _stats_row(6, na=2, nd=8, pa=0.25, pd_=0.75, pat=-2.0, pdt=2.0,
+                       ra=0.55, rd=1.8, rat=-2.0, rdt=2.0),  # ~5.4
+            _stats_row(7, na=3, nd=7, pa=0.33, pd_=0.67, pat=-1.5, pdt=1.5,
+                       ra=0.6, rd=1.5, rat=-1.5, rdt=1.5),  # ~2.27
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        assert len(result) == 5
+        # Agrees-before-disagrees: all agrees precede all disagrees in the output.
+        repful_values = [r['repful'] for r in result]  # List[Dict] per S2
+        last_agree_idx = -1
+        first_disagree_idx = len(repful_values)
+        for i, v in enumerate(repful_values):
+            if v == 'agree':
+                last_agree_idx = i
+            elif v == 'disagree' and first_disagree_idx == len(repful_values):
+                first_disagree_idx = i
+        assert last_agree_idx < first_disagree_idx, \
+            f"agrees must come before disagrees, got order: {repful_values}"
+
+    def test_take_5_eviction_when_best_agree_outside_sufficient(self):
+        """The eviction edge case (flagged in PLAN for future review).
+
+        Sufficient has 5 entries, best_agree is OUTSIDE sufficient (failed
+        passes_by_test). Prepending best_agree pushes total to 6, take(5) drops
+        the lowest-metric sufficient entry.
+
+        Fixture design (subtle):
+          - tid 1: best_agree slot. Fails passes_by_test (rat=1.0, pat=1.0
+            both below Z_90=1.2816). Branch 4 accepts via ra>1.0 AND pa>0.5.
+            Its agree_metric (ra*rat*pa*pat = 0.9) is LARGER than every
+            sufficient row's metric, so subsequent rows can't beat it via
+            Branch 2.
+          - tid 2-6: pass passes_by_test (rat,pat at 1.3 > Z_90), with
+            DECREASING agree_metrics all SMALLER than 0.9, so Branch 2 keeps
+            tid 1 as best_agree throughout.
+
+        Expected: tid 1 prepended, sort gives [tid 5, 4, 3, 2, 6] (desc by
+        agree_metric), take(5) drops tid 6 (smallest metric).
+
+        See PLAN.md "Pending — needs team discussion": take-5 eviction.
+        """
+        rows = [
+            # best_agree slot: fails passes_by_test, qualifies via Branch 4
+            # (ra=1.5>1 AND pa=0.6>0.5). agree_metric = 1.5*1.0*0.6*1.0 = 0.9.
+            _stats_row(1, na=6, nd=4, pa=0.6, pd_=0.4, pat=1.0, pdt=-1.0,
+                       ra=1.5, rd=0.7, rat=1.0, rdt=-1.0),
+            # 5 sufficient rows, each with agree_metric < 0.9.
+            # ra=1.0, rat=1.3, pa=0.5, pat=1.3 → agree_metric = 0.845
+            _stats_row(2, na=5, nd=5, pa=0.5, pd_=0.5, pat=1.3, pdt=-1.3,
+                       ra=1.0, rd=1.0, rat=1.3, rdt=-1.3),
+            # ra=0.9, rat=1.3, pa=0.4, pat=1.3 → agree_metric = 0.609
+            _stats_row(3, na=4, nd=6, pa=0.4, pd_=0.6, pat=1.3, pdt=-1.3,
+                       ra=0.9, rd=1.1, rat=1.3, rdt=-1.3),
+            # ra=0.8, rat=1.3, pa=0.3, pat=1.3 → agree_metric = 0.406
+            _stats_row(4, na=3, nd=7, pa=0.3, pd_=0.7, pat=1.3, pdt=-1.3,
+                       ra=0.8, rd=1.2, rat=1.3, rdt=-1.3),
+            # ra=0.7, rat=1.3, pa=0.2, pat=1.3 → agree_metric = 0.237
+            _stats_row(5, na=2, nd=8, pa=0.2, pd_=0.8, pat=1.3, pdt=-1.3,
+                       ra=0.7, rd=1.3, rat=1.3, rdt=-1.3),
+            # ra=0.6, rat=1.3, pa=0.15, pat=1.3 → agree_metric = 0.152 (smallest, evicted)
+            _stats_row(6, na=1, nd=9, pa=0.15, pd_=0.85, pat=1.3, pdt=-1.3,
+                       ra=0.6, rd=1.4, rat=1.3, rdt=-1.3),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        assert len(result) == 5
+        tids = [r['comment_id'] for r in result]
+        # best_agree (tid 1) prepended at position 0.
+        assert tids[0] == 1, f"best-agree slot at position 0, got {tids[0]}"
+        # Tid 6 (smallest sufficient metric) evicted.
+        assert 6 not in tids, f"lowest-metric sufficient should be evicted, got {tids}"
+        # Rest are tids 2-5 in some agree-first ordering.
+        assert set(tids[1:]) == {2, 3, 4, 5}, f"expected tids 2-5 to remain, got {tids[1:]}"
+        # best_agree flag on position 0.
+        assert result[0].get('best_agree') is True
+        assert result[0].get('n_agree') == 6  # raw na from tid 1
+
+
+class TestD10TestGaps:
+    """Additional D10 coverage filling gaps identified in decisions D10.8.
+
+    These pin behaviours not previously asserted:
+      - mod_out filtering on the best-agree path.
+      - Deterministic tiebreak (lowest tid wins) on `beats_best_by_test`
+        max(rat,rdt) ties and on the `_sort_key` agree_metric ties.
+      - Disagree-only path through assembly + agrees-before-disagrees no-op.
+      - All-uninformative `ns=0` rows: passes_by_test fails, Branch 1 rejects
+        best_agree; best may still get set via Branch 4 / beats_best_by_test.
+      - Negative-ra rows handled correctly by Branch 2 (signed 4-way product).
+    """
+
+    # --- Deliverable 3: deterministic max(rat, rdt) tiebreak ------------------
+
+    def test_tied_max_rt_uses_deterministic_tiebreak(self):
+        """Two rows with identical max(rat, rdt) — strict `>` means the FIRST
+        iterated row wins. Sorting by `comment` (tid) ascending makes that
+        the LOWER tid (Clojure named-matrix insertion order parity, decision
+        D10.8.1)."""
+        rows = [
+            # Pass through `best` slot (neither passes passes_by_test —
+            # rat/rdt below Z_90), tied max(rat, rdt) = 1.0.
+            # Insert in REVERSE tid order to prove we sort, not just take input order.
+            _stats_row(7, na=4, nd=2, pa=0.55, pd_=0.45, pat=0.5, pdt=-0.5,
+                       ra=1.0, rd=1.0, rat=1.0, rdt=-1.0),
+            _stats_row(3, na=4, nd=2, pa=0.55, pd_=0.45, pat=0.5, pdt=-0.5,
+                       ra=1.0, rd=1.0, rat=1.0, rdt=-1.0),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        assert len(result) == 1
+        # Tid 3 (lower) wins the `best` slot under the tid-ascending tiebreak.
+        assert result[0]['comment_id'] == 3, \
+            f"lowest-tid wins tied max(rat,rdt); got {result[0]['comment_id']}"
+
+    # --- Deliverable 5: 5 gap tests -------------------------------------------
+
+    def test_mod_out_excludes_best_agree_candidate(self):
+        """A `mod_out` tid that would otherwise own the best-agree slot is
+        filtered before the reduce (Clojure repness.clj:222). The next-best
+        candidate becomes best_agree."""
+        rows = [
+            # tid 1: would-be best_agree (ra=2.0>1, pa=0.8>0.5 → Branch 4 accepts;
+            # strong ra*rat*pa*pat = 2.0*2.0*0.8*2.0 = 6.4 so it dominates Branch 2).
+            _stats_row(1, na=8, nd=2, pa=0.8, pd_=0.2, pat=2.0, pdt=-2.0,
+                       ra=2.0, rd=0.5, rat=2.0, rdt=-2.0),
+            # tid 2: next-best (ra*rat*pa*pat = 1.5*1.5*0.7*1.5 ≈ 2.36).
+            _stats_row(2, na=6, nd=3, pa=0.7, pd_=0.3, pat=1.5, pdt=-1.5,
+                       ra=1.5, rd=0.6, rat=1.5, rdt=-1.5),
+            # tid 3: weaker.
+            _stats_row(3, na=5, nd=4, pa=0.55, pd_=0.45, pat=1.0, pdt=-1.0,
+                       ra=1.1, rd=0.9, rat=1.0, rdt=-1.0),
+        ]
+        df = pd.DataFrame(rows)
+        # Without mod_out: tid 1 wins best_agree.
+        baseline = _assemble_rep_comments(df)
+        baseline_best_agree = next(r for r in baseline if r.get('best_agree'))
+        assert baseline_best_agree['comment_id'] == 1
+        # With tid 1 moderated out: tid 2 must win best_agree, tid 1 absent.
+        result = _assemble_rep_comments(df, mod_out=[1])
+        tids = [r['comment_id'] for r in result]
+        assert 1 not in tids, f"mod_out tid 1 must be excluded, got {tids}"
+        flagged = [r for r in result if r.get('best_agree')]
+        assert len(flagged) == 1, "exactly one best_agree slot"
+        assert flagged[0]['comment_id'] == 2, \
+            f"next-best (tid 2) should become best_agree, got {flagged[0]['comment_id']}"
+
+    def test_tied_agree_metric_in_sort_uses_deterministic_tiebreak(self):
+        """Two `sufficient` rows with identical `agree_metric` resolve
+        deterministically. `list.sort` is stable in CPython, so the lower-tid
+        row (which entered `sufficient` first thanks to the tid-ascending
+        iter sort) appears first after descending sort by metric.
+
+        Decision D10.8.1: lowest tid wins ties."""
+        # Both rows pass passes_by_test (rat,pat at 2.0 > Z_90).
+        # Identical agree_metric: ra*rat*pa*pat is the SAME for both.
+        # ra=1.5, rat=2.0, pa=0.7, pat=2.0 → agree_metric = 4.2 (both).
+        # Insert in REVERSE tid order to prove the deterministic outcome
+        # comes from the sort, not the input order.
+        rows = [
+            _stats_row(9, na=7, nd=3, pa=0.7, pd_=0.3, pat=2.0, pdt=-2.0,
+                       ra=1.5, rd=0.6, rat=2.0, rdt=-2.0),
+            _stats_row(2, na=7, nd=3, pa=0.7, pd_=0.3, pat=2.0, pdt=-2.0,
+                       ra=1.5, rd=0.6, rat=2.0, rdt=-2.0),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        # 2 sufficient rows; one of them is also best_agree.
+        # Order: best_agree (tid 2, lowest tid wins beats_best_agr ties via
+        # strict-> first-row-wins) prepended, then deduped sufficient (tid 9).
+        tids = [r['comment_id'] for r in result]
+        assert tids[0] == 2, \
+            f"lowest-tid wins tied beats_best_agr Branch 2 product; got {tids}"
+        assert result[0].get('best_agree') is True
+
+    def test_disagree_only_group(self):
+        """All sufficient rows are `repful='disagree'`. Sort works on
+        `disagree_metric`; agrees-before-disagrees partition is a no-op."""
+        rows = [
+            # 3 disagree-passing rows (rdt,pdt > Z_90), descending disagree_metric.
+            _stats_row(1, na=1, nd=9, pa=0.17, pd_=0.83, pat=-2.5, pdt=2.5,
+                       ra=0.5, rd=2.0, rat=-2.5, rdt=2.5),  # disagree_metric ≈ 8.6
+            _stats_row(2, na=2, nd=8, pa=0.25, pd_=0.75, pat=-2.0, pdt=2.0,
+                       ra=0.55, rd=1.8, rat=-2.0, rdt=2.0),  # ≈ 5.4
+            _stats_row(3, na=3, nd=7, pa=0.33, pd_=0.67, pat=-1.5, pdt=1.5,
+                       ra=0.6, rd=1.5, rat=-1.5, rdt=1.5),  # ≈ 2.27
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        # All rows have repful='disagree' (rdt > rat for each).
+        assert len(result) >= 1
+        assert all(r['repful'] == 'disagree' for r in result), \
+            f"all rows should be disagree, got {[r['repful'] for r in result]}"
+        assert len(result) <= 5, "take-5 cap holds"
+        # best_agree may also be present (Branch 4 doesn't require agree side
+        # to dominate — z90(pat) is false here, ra<1 for all, so Branch 4
+        # rejects all candidates and best_agree stays None for all entries).
+        # No row should be flagged as best_agree given the fixture.
+        assert not any(r.get('best_agree') for r in result), \
+            "no row qualifies for best_agree under Branch 4 with ra<1 and pat<Z_90"
+
+    def test_all_ns_zero_uninformative_rows(self):
+        """Every row has ns=0 → pa=pd=0.5 (uninformative). No row passes
+        passes_by_test (pat,pdt collapse to 1.0 via prop_test n=0 shortcut,
+        below Z_90=1.2816). `beats_best_agr` Branch 1 rejects all
+        (na=nd=0), so best_agree stays None.
+
+        `best` MAY get set via `beats_best_by_test` (no na=nd=0 guard there).
+        Output length is 0 (if no row passes either gate) or 1 (the best
+        fallback). With max(rat,rdt) > None=True on first row, best gets set,
+        so output is exactly [best]."""
+        # Build via _stats_row but override pat/pdt to match the n=0 collapse:
+        # prop_test_vectorized(0, 0) = 2*sqrt(1)*(1/1 - 0.5) = 1.0 < Z_90.
+        rows = [
+            _stats_row(1, na=0, nd=0, pa=0.5, pd_=0.5, pat=1.0, pdt=1.0,
+                       ra=1.0, rd=1.0, rat=0.5, rdt=0.5, ns=0),
+            _stats_row(2, na=0, nd=0, pa=0.5, pd_=0.5, pat=1.0, pdt=1.0,
+                       ra=1.0, rd=1.0, rat=0.3, rdt=0.4, ns=0),
+        ]
+        result = _assemble_rep_comments(pd.DataFrame(rows))
+        # passes_by_test fails (pat=1.0<Z_90, pdt=1.0<Z_90, rat<Z_90, rdt<Z_90)
+        # → sufficient empty.
+        # beats_best_agr Branch 1 rejects every row (na=nd=0)
+        # → best_agree stays None.
+        # beats_best_by_test fills `best` (no na=nd=0 guard).
+        # → output is exactly [best], length 1, no best_agree flag.
+        assert len(result) in (0, 1), \
+            f"output length must be 0 or 1, got {len(result)}"
+        # Under the current logic best gets set, so we expect 1 with no flag.
+        assert len(result) == 1
+        assert 'best_agree' not in result[0], \
+            "Branch 1 rejected na=nd=0 from best_agree, so no flag"
+
+    def test_branch_2_handles_negative_ra_correctly(self):
+        """Branch 2 (current_best.ra > 1.0) compares the SIGNED 4-way product
+        `ra * rat * pa * pat`. A candidate with negative `ra` and negative
+        `rat` produces a positive product that can beat the current best,
+        while a candidate with single negative factor produces a negative
+        product that cannot."""
+        # Set up so iteration order: tid 1 (current_best), tid 2 (negative
+        # single factor, should NOT beat), tid 3 (two negatives → positive,
+        # should beat ONLY if its product is larger).
+        current_best_row = _stats_row(
+            1, na=8, nd=2, pa=0.8, pd_=0.2, pat=2.0, pdt=-2.0,
+            ra=2.0, rd=0.5, rat=2.0, rdt=-2.0)
+        # current_best product = 2.0*2.0*0.8*2.0 = 6.4.
+
+        # Single negative factor → negative product → loses on strict >.
+        single_neg = _stats_row(
+            2, na=1, nd=1, pa=0.5, pd_=0.5, pat=1.0, pdt=1.0,
+            ra=-0.5, rd=1.0, rat=1.0, rdt=1.0)
+        # product = -0.5*1.0*0.5*1.0 = -0.25 < 6.4 → does NOT beat.
+        assert not beats_best_agr(single_neg, current_best_row), \
+            "single negative factor → negative product loses Branch 2"
+
+        # Two negatives → positive product. Make it LARGER than 6.4.
+        # ra=-5.0, rat=-2.0, pa=0.9, pat=2.0 → -5 * -2 * 0.9 * 2 = 18.0 > 6.4.
+        two_neg = _stats_row(
+            3, na=5, nd=5, pa=0.9, pd_=0.1, pat=2.0, pdt=-2.0,
+            ra=-5.0, rd=0.2, rat=-2.0, rdt=-2.0)
+        assert beats_best_agr(two_neg, current_best_row), \
+            "two negative factors → positive 18.0 > 6.4 wins Branch 2"
+
+        # Two negatives but product NOT larger → loses.
+        two_neg_small = _stats_row(
+            4, na=1, nd=1, pa=0.5, pd_=0.5, pat=1.0, pdt=1.0,
+            ra=-1.0, rd=1.0, rat=-1.0, rdt=1.0)
+        # product = -1 * -1 * 0.5 * 1 = 0.5 < 6.4 → does NOT beat.
+        assert not beats_best_agr(two_neg_small, current_best_row), \
+            "two negatives but small positive product (0.5) still loses to 6.4"
+
+
 # ============================================================================
 # D11 — Consensus Comment Selection
 # ============================================================================