diff --git a/README.md b/README.md
index 7052c65..8331166 100644
--- a/README.md
+++ b/README.md
@@ -9,35 +9,198 @@ potential hallucinations.
 
 > Course project · MICS · Principles of Software Development (S2 2025–26).
 
-## Status
+---
 
-Scaffold. Features land via pull requests — see the open issues and `docs/`.
+## Contents
 
-## Documentation
+- [What SumLens does](#what-sumlens-does)
+- [Hardware requirements](#hardware-requirements)
+- [Installation](#installation)
+- [Running the app](#running-the-app)
+- [Usage walkthrough](#usage-walkthrough)
+- [Interpreting the results](#interpreting-the-results)
+- [Exporting results](#exporting-results)
+- [Development](#development)
 
-- [`docs/requirements.md`](docs/requirements.md) — functional / non-functional requirements, MoSCoW, user stories, traceability.
-- [`docs/data-model.md`](docs/data-model.md) — canonical data types.
-- [`docs/research-plan.md`](docs/research-plan.md) — signals, fusion, evaluation methodology.
+---
 
-## Development
+## What SumLens does
+
+SumLens takes a text document (pasted or PDF) and:
+
+1. Summarises it locally using BART (`facebook/bart-large-cnn`) — no external API.
+2. Scores each summary sentence against the source using three signals:
+   - **Signal A — Classifier:** LettuceDetect flags hallucinated tokens.
+   - **Signal B — NLI:** DeBERTa-v3 checks whether atomic claims are entailed by the source.
+   - **Signal C — Attribution:** Inseq integrated gradients measure how much each source span influenced each summary sentence.
+3. Fuses the signals into a single grounding score (0 = hallucinated, 1 = grounded).
+4. Labels each sentence: **grounded**, **weakly grounded**, or **hallucinated**.
+5. Displays the result as a colour-coded summary with click-to-highlight source spans.
+
+---
+
+## Hardware requirements
+
+The pipeline loads three large transformer models. Running on CPU is supported but
+can take several minutes per document.
 
-Requires Python 3.11+.
+| Setup | RAM | VRAM | Expected time |
+|-------|-----|------|---------------|
+| GPU (recommended) | 16 GB | 8 GB+ | ~30–60 s |
+| CPU-only | 16 GB | — | 3–10 min |
+
+Models are downloaded automatically from Hugging Face on first run (~4 GB total).
+No paid API key is required.
+
+---
+
+## Installation
+
+Requires **Python 3.11+** and **Git**.
 
 ```bash
+git clone https://github.com/bacemtayeb/SumLens.git
+cd SumLens
 python3.11 -m venv .venv
+
+# Windows
+.venv\Scripts\activate
+# macOS / Linux
 source .venv/bin/activate
+
 pip install -e ".[dev]"
 python -m nltk.downloader punkt punkt_tab
 ```
 
-### Quality gate (CI enforces this on every PR)
+---
+
+## Running the app
+
+```bash
+python app.py
+```
+
+Gradio prints a local URL, e.g.:
+
+```
+Running on local URL: http://127.0.0.1:7860
+```
+
+Open that URL in your browser. The app runs entirely on your machine — no data
+leaves your computer.
+
+---
+
+## Usage walkthrough
+
+### Step 1 — Load a document
+
+You have two options:
+
+- **Paste text** — click the *Paste text* box and type or paste your document
+  (up to 10 000 words).
+- **Upload a PDF** — click *or upload PDF* and select a file (up to 5 MB).
+
+If you provide both, the PDF takes priority.
+
+### Step 2 — Analyse
+
+Click the **Analyse** button. The button is disabled while the pipeline runs
+(typically 30–60 s on GPU, longer on CPU). Both export buttons appear once
+analysis is complete.
+
+### Step 3 — Read the summary
+
+The right panel shows the summary with each sentence colour-coded:
+
+| Colour | Label | Meaning |
+|--------|-------|---------|
+| Green | Grounded | Well-supported by the source |
+| Orange | Weakly grounded | Partial support; treat with caution |
+| Red | Hallucinated | Low support; likely fabricated or distorted |
+
+### Step 4 — Trace a sentence to the source
+
+Click any summary sentence. The left panel highlights (in yellow) the source
+sentences most strongly attributed to it by the model. Click a different summary
+sentence to switch the highlight.
+
+### Step 5 — Adjust the thresholds (optional)
+
+Two sliders let you change the decision boundaries without re-running the model:
+
+- **τ hallucinated** (default 0.30) — sentences with a grounding score *below*
+  this value are labelled hallucinated.
+- **τ grounded** (default 0.70) — sentences with a grounding score *at or above*
+  this value are labelled grounded. Anything in between is weakly grounded.
+
+Move either slider and the summary colours update instantly.
+
+---
+
+## Interpreting the results
+
+- The **grounding score** (0–1) represents how strongly the fused signals indicate
+  that a summary sentence is supported by the source. It is not a probability in a
+  strict statistical sense — treat it as a relative risk indicator.
+- A **hallucinated** label does not guarantee the sentence is wrong; it means the
+  model could not find sufficient evidence in the source text. Always cross-check
+  flagged sentences manually.
+- The **signal breakdown** (JSON export) shows the individual classifier, NLI, and
+  attribution scores for each sentence, which can help diagnose *why* a sentence
+  was flagged.
+
+---
+
+## Exporting results
+
+Two download buttons appear after analysis:
+
+- **Export JSON** — downloads the full `AnalysisResult` as a JSON file. The schema
+  matches `sumlens/types.py` and round-trips via `AnalysisResult.model_validate()`.
+- **Export PDF** — downloads a human-readable PDF containing the colour-annotated
+  summary, a legend, and a per-sentence signal-scores table.
+
+---
+
+## Development
+
+### Quality gate (CI enforces this on every PR)
 
 ```bash
-ruff check . && mypy sumlens tests && pytest -q --cov=sumlens --cov-fail-under=70
+ruff check . && mypy sumlens tests app.py && pytest -q --cov=sumlens --cov-fail-under=70
 ```
 
-Lint (ruff), type-check (mypy, strict), and tests with a ≥70% coverage gate must
-pass before any PR is merged to `main`.
+Lint (ruff), strict type-check (mypy), and tests with a ≥ 70 % coverage gate
+must pass before any PR is merged.
+
+### Project layout
+
+```
+sumlens/
+  types.py          # canonical data model (AnalysisResult, etc.)
+  ingest.py         # PDF / text → Document
+  summarise.py      # BART summarisation
+  signals/
+    classifier.py   # Signal A — LettuceDetect
+    nli.py          # Signal B — DeBERTa NLI
+    attribution.py  # Signal C — Inseq attribution
+  fuse.py           # logistic-regression fusion + Platt calibration
+  pipeline.py       # orchestrates ingest → summarise → signals → fuse
+app.py              # Gradio UI entry point
+tests/              # pytest suite (all models mocked)
+docs/               # requirements, data model, research plan
+```
+
+### Documentation
+
+- [`docs/requirements.md`](docs/requirements.md) — functional / non-functional requirements, MoSCoW, user stories, traceability.
+- [`docs/data-model.md`](docs/data-model.md) — canonical data types and JSON schema.
+- [`docs/research-plan.md`](docs/research-plan.md) — signals, fusion, evaluation methodology.
+- [`docs/mockup.html`](docs/mockup.html) — static HTML wireframe of the two-panel dashboard UI (open in any browser).
+- [`docs/use-case.puml`](docs/use-case.puml) — PlantUML use-case diagram (UC-01 "Verify a Summary").
+
+---
 
 ## License
 
diff --git a/app.py b/app.py
new file mode 100644
index 0000000..62c9b63
--- /dev/null
+++ b/app.py
@@ -0,0 +1,344 @@
+"""Gradio entry point — thin UI over `pipeline.analyse`.
+
+All logic lives in the `sumlens` library; this module only ingests the user's
+input, runs the pipeline, and shapes the result for display.
+"""
+
+from __future__ import annotations
+
+import html as _html
+import tempfile
+from pathlib import Path
+from typing import Any
+
+from sumlens.ingest import load_pdf, load_text
+from sumlens.pipeline import analyse
+from sumlens.types import AnalysisConfig, AnalysisResult, Document
+
+_LABEL_COLORS: dict[str, str] = {
+    "grounded": "green",
+    "weak": "orange",
+    "hallucinated": "red",
+}
+_MAX_WORDS = 10_000
+_MAX_PDF_BYTES = 5 * 1024 * 1024  # 5 MB
+_SOURCE_PLACEHOLDER = "<p><em>Load a document to see the source text here.</em></p>"
+
+
+def _validate_text(text: str) -> str:
+    text = text.strip()
+    if not text:
+        raise ValueError("Input is empty. Please paste some text or upload a PDF.")
+    word_count = len(text.split())
+    if word_count > _MAX_WORDS:
+        raise ValueError(
+            f"Input is too long ({word_count:,} words). Maximum is {_MAX_WORDS:,} words."
+        )
+    return text
+
+
+def _to_highlighted(result: AnalysisResult) -> list[tuple[str, str]]:
+    """Summary sentences as (text, label) spans for gr.HighlightedText colour bands."""
+    labels = {v.sentence_id: v.label for v in result.verdicts}
+    return [(f"{s.text} ", labels.get(s.id, "weak")) for s in result.summary.sentences]
+
+
+def _render_source_html(document: Document, highlighted_ids: set[str]) -> str:
+    """Source sentences as HTML; sentences in highlighted_ids get a yellow mark."""
+    if not document.sentences:
+        return f"<p style='line-height:1.8'>{_html.escape(document.raw_text)}</p>"
+    parts = []
+    for sentence in document.sentences:
+        safe = _html.escape(sentence.text)
+        if sentence.id in highlighted_ids:
+            parts.append(
+                f"<mark style='background:#fde68a;border-radius:3px;"
+                f"padding:1px 3px'>{safe}</mark>"
+            )
+        else:
+            parts.append(safe)
+    return "<p style='line-height:1.8'>" + " ".join(parts) + "</p>"
+
+
+def _apply_tau(
+    result: AnalysisResult | None,
+    tau_grounded: float,
+    tau_hallucinated: float,
+) -> list[tuple[str, str]] | None:
+    """Re-label summary sentences from stored fused scores without re-running the model."""
+    if result is None:
+        return None
+    scores = {v.sentence_id: v.fused_score for v in result.verdicts}
+
+    def _label(score: float) -> str:
+        if score < tau_hallucinated:
+            return "hallucinated"
+        if score >= tau_grounded:
+            return "grounded"
+        return "weak"
+
+    return [(f"{s.text} ", _label(scores.get(s.id, 0.5))) for s in result.summary.sentences]
+
+
+def _latin1(text: str) -> str:
+    """Replace characters outside Latin-1 with '?' so fpdf2 core fonts don't error."""
+    return text.encode("latin-1", errors="replace").decode("latin-1")
+
+
+_PDF_COLORS: dict[str, tuple[int, int, int]] = {
+    "grounded": (220, 252, 231),
+    "weak": (255, 237, 213),
+    "hallucinated": (254, 226, 226),
+}
+
+
+def _export_pdf(result: AnalysisResult | None) -> str | None:
+    if result is None:
+        return None
+    from fpdf import FPDF
+
+    pdf = FPDF()
+    pdf.add_page()
+
+    pdf.set_font("Helvetica", "B", 16)
+    pdf.cell(0, 10, "SumLens Analysis Report", new_x="LMARGIN", new_y="NEXT")
+    pdf.ln(3)
+
+    pdf.set_font("Helvetica", size=9)
+    pdf.set_text_color(100, 100, 100)
+    pdf.cell(
+        0, 6,
+        _latin1(f"Source: {result.document.source}  |  Model: {result.summary.model_name}"),
+        new_x="LMARGIN", new_y="NEXT",
+    )
+    pdf.set_text_color(0, 0, 0)
+    pdf.ln(4)
+
+    pdf.set_font("Helvetica", "B", 11)
+    pdf.cell(0, 8, "Annotated Summary", new_x="LMARGIN", new_y="NEXT")
+    pdf.ln(1)
+
+    verdict_map = {v.sentence_id: v for v in result.verdicts}
+    pdf.set_font("Helvetica", size=10)
+    for sentence in result.summary.sentences:
+        verdict = verdict_map.get(sentence.id)
+        label = verdict.label if verdict else "weak"
+        r, g, b = _PDF_COLORS.get(label, (240, 240, 240))
+        pdf.set_fill_color(r, g, b)
+        pdf.multi_cell(0, 7, _latin1(sentence.text), fill=True, new_x="LMARGIN", new_y="NEXT")
+        pdf.ln(1)
+
+    pdf.ln(4)
+    pdf.set_font("Helvetica", "B", 10)
+    pdf.cell(0, 7, "Legend", new_x="LMARGIN", new_y="NEXT")
+    pdf.set_font("Helvetica", size=9)
+    for lbl, (r, g, b) in _PDF_COLORS.items():
+        pdf.set_fill_color(r, g, b)
+        pdf.cell(6, 5, "", fill=True)
+        pdf.cell(0, 5, f"  {lbl.capitalize()}", new_x="LMARGIN", new_y="NEXT")
+
+    pdf.ln(5)
+    pdf.set_font("Helvetica", "B", 10)
+    pdf.cell(0, 7, "Signal Scores", new_x="LMARGIN", new_y="NEXT")
+    pdf.set_font("Helvetica", size=8)
+    col_w = [80, 25, 20, 22, 15, 22]
+    headers = ["Sentence", "Label", "Fused", "Classifier", "NLI", "Attribution"]
+    for w, h in zip(col_w, headers, strict=True):
+        pdf.cell(w, 6, h, border=1)
+    pdf.ln()
+    for sentence in result.summary.sentences:
+        v = verdict_map.get(sentence.id)
+        if v is None:
+            continue
+        truncated = sentence.text[:45] + "..." if len(sentence.text) > 45 else sentence.text
+        row = [
+            _latin1(truncated),
+            v.label,
+            f"{v.fused_score:.2f}",
+            f"{v.signals.classifier:.2f}" if v.signals.classifier is not None else "-",
+            f"{v.signals.nli:.2f}" if v.signals.nli is not None else "-",
+            f"{v.signals.attribution:.2f}" if v.signals.attribution is not None else "-",
+        ]
+        for w, cell in zip(col_w, row, strict=True):
+            pdf.cell(w, 6, cell, border=1)
+        pdf.ln()
+
+    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
+    try:
+        pdf.output(tmp.name)
+    finally:
+        tmp.close()
+    return tmp.name
+
+
+def _export_json(result: AnalysisResult | None) -> str | None:
+    if result is None:
+        return None
+    tmp = tempfile.NamedTemporaryFile(
+        mode="w", suffix=".json", delete=False, encoding="utf-8"
+    )
+    try:
+        tmp.write(result.model_dump_json(indent=2))
+    finally:
+        tmp.close()
+    return tmp.name
+
+
+def run(
+    text: str,
+    pdf_file: str | None,
+) -> tuple[AnalysisResult, str, list[tuple[str, str]], dict[str, Any]]:
+    if pdf_file:
+        path = Path(pdf_file)
+        if path.stat().st_size > _MAX_PDF_BYTES:
+            raise ValueError(
+                f"PDF is too large ({path.stat().st_size / 1_048_576:.1f} MB). "
+                "Maximum is 5 MB."
+            )
+        document = load_pdf(path)
+    else:
+        document = load_text(_validate_text(text))
+
+    result = analyse(document, AnalysisConfig())
+    source_html = _render_source_html(document, set())
+    return result, source_html, _to_highlighted(result), result.model_dump()
+
+
+def build_app() -> Any:
+    import gradio as gr
+
+    with gr.Blocks(title="SumLens", theme=gr.themes.Soft()) as demo:
+        gr.Markdown(
+            "# SumLens — Summary Faithfulness Dashboard\n"
+            "Paste text or upload a PDF. SumLens summarises it and flags sentences "
+            "that may be hallucinated.\n\n"
+            "**Green** = grounded · **Orange** = weakly grounded · **Red** = hallucinated  \n"
+            "Click a summary sentence to highlight its attributed source spans."
+        )
+
+        result_state: gr.State = gr.State(value=None)
+
+        with gr.Row():
+            with gr.Column():
+                gr.Markdown("### Source document")
+                source_html_out = gr.HTML(value=_SOURCE_PLACEHOLDER)
+
+            with gr.Column():
+                gr.Markdown("### Summary with faithfulness highlights")
+                summary_out = gr.HighlightedText(
+                    label="Summary (click a sentence to highlight source spans)",
+                    color_map=_LABEL_COLORS,
+                    combine_adjacent=False,
+                    show_legend=True,
+                )
+
+        with gr.Row():
+            tau_h_slider = gr.Slider(
+                minimum=0.0, maximum=1.0, value=0.30, step=0.05,
+                label="τ hallucinated — below this → hallucinated (default 0.30)",
+            )
+            tau_g_slider = gr.Slider(
+                minimum=0.0, maximum=1.0, value=0.70, step=0.05,
+                label="τ grounded — at or above this → grounded (default 0.70)",
+            )
+
+        with gr.Row():
+            text_in = gr.Textbox(
+                label="Paste text",
+                lines=6,
+                placeholder="Paste your document here…",
+            )
+            pdf_in = gr.File(
+                label="or upload PDF (≤ 5 MB)",
+                file_types=[".pdf"],
+                type="filepath",
+            )
+
+        with gr.Row():
+            submit = gr.Button("Analyse", variant="primary")
+            json_dl = gr.DownloadButton("Export JSON", visible=False)
+            pdf_dl = gr.DownloadButton("Export PDF", visible=False)
+
+        error_box = gr.Markdown(value="", visible=False)
+
+        with gr.Accordion("Full result (JSON viewer)", open=False):
+            json_out = gr.JSON(label="AnalysisResult")
+
+        def _handle(
+            text: str, pdf_file: str | None
+        ) -> tuple[Any, Any, Any, Any, Any, Any, Any, Any]:
+            try:
+                result, source_html, highlighted, payload = run(text, pdf_file)
+                json_path = _export_json(result)
+                pdf_path = _export_pdf(result)
+                return (
+                    result,
+                    source_html,
+                    highlighted,
+                    payload,
+                    gr.update(value=json_path, visible=True),
+                    gr.update(value=pdf_path, visible=True),
+                    gr.update(value="", visible=False),
+                    gr.update(interactive=True),
+                )
+            except ValueError as exc:
+                return (
+                    None,
+                    _SOURCE_PLACEHOLDER,
+                    None,
+                    None,
+                    gr.update(visible=False),
+                    gr.update(visible=False),
+                    gr.update(value=f"**Error:** {exc}", visible=True),
+                    gr.update(interactive=True),
+                )
+
+        def _on_sentence_select(
+            evt: Any, result: AnalysisResult | None
+        ) -> str:
+            if result is None:
+                return _SOURCE_PLACEHOLDER
+            idx: int = int(evt.index)
+            sentences = result.summary.sentences
+            if idx >= len(sentences):
+                return _render_source_html(result.document, set())
+            sentence_id = sentences[idx].id
+            verdict = next(
+                (v for v in result.verdicts if v.sentence_id == sentence_id), None
+            )
+            highlighted = (
+                set(verdict.evidence.top_source_sentence_ids) if verdict else set()
+            )
+            return _render_source_html(result.document, highlighted)
+
+        submit.click(
+            fn=lambda: gr.update(interactive=False),
+            inputs=[],
+            outputs=[submit],
+        ).then(
+            fn=_handle,
+            inputs=[text_in, pdf_in],
+            outputs=[
+                result_state, source_html_out, summary_out,
+                json_out, json_dl, pdf_dl, error_box, submit,
+            ],
+        )
+
+        summary_out.select(
+            fn=_on_sentence_select,
+            inputs=[result_state],
+            outputs=[source_html_out],
+        )
+
+        for slider in (tau_h_slider, tau_g_slider):
+            slider.change(
+                fn=_apply_tau,
+                inputs=[result_state, tau_g_slider, tau_h_slider],
+                outputs=[summary_out],
+            )
+
+    return demo
+
+
+if __name__ == "__main__":
+    build_app().launch()
diff --git a/docs/data-model.md b/docs/data-model.md
index 81e2541..0a0317c 100644
--- a/docs/data-model.md
+++ b/docs/data-model.md
@@ -126,7 +126,7 @@ class AnalysisConfig(BaseModel):
 ## 3. Module interfaces — function signatures
 
 These are the only public functions each module exposes. Anything else is private.
-Stick to these signatures; if Claude Code drifts, point it back here.
+Stick to these signatures; if an implementation drifts, point it back here.
 
 ### `ingest.py`
 ```python
@@ -233,7 +233,7 @@ This CSV is the centrepiece table of the report.
 
 ---
 
-## 6. Order of implementation (Claude Code's worklist)
+## 6. Order of implementation
 
 1. `types.py` — write this first, completely. Everything else imports from it.
 2. `ingest.py` + tests against a fixture PDF and a fixture string.
@@ -248,4 +248,4 @@ This CSV is the centrepiece table of the report.
 11. `scripts/train_fusion.py` — fits the LR, pickles it, replaces the identity fusion.
 12. Real models swapped in last, one signal at a time, verifying on HPC.
 
-**Rule for Claude Code: never run the real models in tests.** Mock at the module boundary. Real-model runs happen only in `scripts/evaluate.py` on HPC.
+**Rule: never run the real models in tests.** Mock at the module boundary. Real-model runs happen only in `scripts/evaluate.py` on HPC.
diff --git a/docs/mockup.html b/docs/mockup.html
new file mode 100644
index 0000000..3679672
--- /dev/null
+++ b/docs/mockup.html
@@ -0,0 +1,334 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>SumLens — UI Mockup</title>
+  <style>
+    *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+
+    body {
+      font-family: system-ui, sans-serif;
+      font-size: 14px;
+      background: #f8fafc;
+      color: #1e293b;
+      padding: 24px;
+    }
+
+    /* ── Header ───────────────────────────────────────────────── */
+    .header {
+      background: #fff;
+      border: 1px solid #e2e8f0;
+      border-radius: 8px;
+      padding: 16px 20px;
+      margin-bottom: 16px;
+    }
+    .header h1 { font-size: 20px; font-weight: 700; }
+    .header p  { font-size: 12px; color: #64748b; margin-top: 4px; }
+    .legend {
+      display: flex;
+      gap: 16px;
+      margin-top: 10px;
+      font-size: 12px;
+    }
+    .swatch {
+      display: inline-block;
+      width: 12px; height: 12px;
+      border-radius: 2px;
+      vertical-align: middle;
+      margin-right: 4px;
+    }
+    .green  { background: #86efac; }
+    .orange { background: #fdba74; }
+    .red    { background: #fca5a5; }
+
+    /* ── Two-panel row ────────────────────────────────────────── */
+    .panels {
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 16px;
+      margin-bottom: 16px;
+    }
+    .panel {
+      background: #fff;
+      border: 1px solid #e2e8f0;
+      border-radius: 8px;
+      padding: 16px;
+    }
+    .panel h2 {
+      font-size: 13px;
+      font-weight: 600;
+      color: #475569;
+      margin-bottom: 10px;
+      text-transform: uppercase;
+      letter-spacing: .04em;
+    }
+
+    /* source panel */
+    .source-text {
+      font-size: 13px;
+      line-height: 1.8;
+      color: #334155;
+    }
+    .source-text mark {
+      background: #fde68a;
+      border-radius: 3px;
+      padding: 1px 3px;
+    }
+
+    /* summary panel */
+    .summary-sentence {
+      display: inline;
+      border-radius: 4px;
+      padding: 2px 5px;
+      margin: 2px 0;
+      font-size: 13px;
+      line-height: 2;
+      cursor: pointer;
+    }
+    .sentence-grounded     { background: #bbf7d0; }
+    .sentence-weak         { background: #fed7aa; }
+    .sentence-hallucinated { background: #fecaca; }
+    .sentence-selected     { outline: 2px solid #3b82f6; }
+
+    .signal-scores {
+      margin-top: 12px;
+      font-size: 11px;
+      color: #64748b;
+      background: #f8fafc;
+      border: 1px solid #e2e8f0;
+      border-radius: 6px;
+      padding: 8px 10px;
+    }
+    .signal-scores table { width: 100%; border-collapse: collapse; }
+    .signal-scores th, .signal-scores td {
+      text-align: left;
+      padding: 3px 6px;
+      border-bottom: 1px solid #e2e8f0;
+    }
+    .signal-scores th { font-weight: 600; color: #475569; }
+
+    /* ── Threshold sliders ────────────────────────────────────── */
+    .sliders {
+      background: #fff;
+      border: 1px solid #e2e8f0;
+      border-radius: 8px;
+      padding: 14px 20px;
+      margin-bottom: 16px;
+      display: flex;
+      gap: 32px;
+      align-items: flex-end;
+    }
+    .slider-group { flex: 1; }
+    .slider-group label {
+      display: block;
+      font-size: 12px;
+      font-weight: 600;
+      color: #475569;
+      margin-bottom: 6px;
+    }
+    .slider-group input[type=range] { width: 100%; accent-color: #3b82f6; }
+    .slider-value {
+      font-size: 12px;
+      color: #64748b;
+      margin-top: 2px;
+      text-align: right;
+    }
+
+    /* ── Input row ────────────────────────────────────────────── */
+    .input-row {
+      display: grid;
+      grid-template-columns: 1fr 260px;
+      gap: 16px;
+      margin-bottom: 16px;
+    }
+    .card {
+      background: #fff;
+      border: 1px solid #e2e8f0;
+      border-radius: 8px;
+      padding: 14px;
+    }
+    .card label {
+      display: block;
+      font-size: 12px;
+      font-weight: 600;
+      color: #475569;
+      margin-bottom: 6px;
+    }
+    textarea {
+      width: 100%;
+      height: 80px;
+      border: 1px solid #cbd5e1;
+      border-radius: 6px;
+      padding: 8px;
+      font-size: 13px;
+      color: #64748b;
+      resize: none;
+      background: #f8fafc;
+    }
+    .upload-box {
+      border: 2px dashed #cbd5e1;
+      border-radius: 6px;
+      height: 80px;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      font-size: 12px;
+      color: #94a3b8;
+      cursor: pointer;
+    }
+
+    /* ── Action bar ───────────────────────────────────────────── */
+    .action-bar {
+      display: flex;
+      gap: 10px;
+      align-items: center;
+    }
+    .btn {
+      padding: 8px 20px;
+      border-radius: 6px;
+      font-size: 13px;
+      font-weight: 600;
+      border: none;
+      cursor: pointer;
+    }
+    .btn-primary  { background: #3b82f6; color: #fff; }
+    .btn-outline  { background: #fff; color: #3b82f6; border: 1px solid #3b82f6; }
+    .btn-disabled { background: #e2e8f0; color: #94a3b8; cursor: not-allowed; }
+
+    .status-pill {
+      font-size: 11px;
+      background: #eff6ff;
+      color: #3b82f6;
+      border: 1px solid #bfdbfe;
+      border-radius: 99px;
+      padding: 3px 10px;
+    }
+
+    /* ── Annotation note ──────────────────────────────────────── */
+    .annotation {
+      font-size: 11px;
+      color: #94a3b8;
+      border: 1px dashed #cbd5e1;
+      border-radius: 4px;
+      padding: 3px 8px;
+      display: inline-block;
+      margin-top: 4px;
+    }
+  </style>
+</head>
+<body>
+
+  <!-- Header -->
+  <div class="header">
+    <h1>SumLens — Summary Faithfulness Dashboard</h1>
+    <p>Paste text or upload a PDF · SumLens summarises it and flags sentences that may be hallucinated</p>
+    <div class="legend">
+      <span><span class="swatch green"></span>Grounded</span>
+      <span><span class="swatch orange"></span>Weakly grounded</span>
+      <span><span class="swatch red"></span>Hallucinated</span>
+      <span style="color:#64748b;font-style:italic;margin-left:auto">
+        Click a summary sentence to highlight its source spans
+      </span>
+    </div>
+  </div>
+
+  <!-- Two-panel view (F2 / F3) -->
+  <div class="panels">
+
+    <!-- Left: source document -->
+    <div class="panel">
+      <h2>Source document</h2>
+      <div class="source-text">
+        The parliament met on Monday to discuss the proposed national budget for the
+        coming fiscal year. <mark>Lawmakers from every party debated the spending
+        priorities for several hours without reaching a clear consensus on the final
+        allocations.</mark> The finance minister presented projections covering health,
+        education, and transport infrastructure across the regions. Several members
+        raised concerns about the long-term sustainability of the proposed deficit
+        levels. No final figure for total expenditure was announced to the press by
+        the end of the day.
+      </div>
+      <div class="annotation">↑ highlighted span: top attributed source sentence for selected summary sentence</div>
+    </div>
+
+    <!-- Right: annotated summary (F2 / F3) -->
+    <div class="panel">
+      <h2>Summary — click a sentence to trace source spans</h2>
+      <div>
+        <span class="summary-sentence sentence-grounded">
+          Parliament debated the national budget on Monday.
+        </span>
+        <span class="summary-sentence sentence-hallucinated sentence-selected">
+          The bill passed with a majority of 312 votes.
+        </span>
+        <span class="summary-sentence sentence-weak">
+          The finance minister outlined spending plans for key sectors.
+        </span>
+        <span class="summary-sentence sentence-grounded">
+          Sustainability of deficit levels was questioned by members.
+        </span>
+      </div>
+      <div class="annotation">↑ blue outline = selected sentence · yellow = attributed source spans</div>
+
+      <!-- Signal scores table -->
+      <div class="signal-scores">
+        <strong>Signal breakdown — "The bill passed with a majority of 312 votes."</strong>
+        <table>
+          <thead>
+            <tr><th>Signal</th><th>Score</th><th>Interpretation</th></tr>
+          </thead>
+          <tbody>
+            <tr><td>Classifier (A)</td><td>0.91</td><td>high hallucination probability</td></tr>
+            <tr><td>NLI (B)</td><td>0.14</td><td>claim not entailed by source</td></tr>
+            <tr><td>Attribution (C)</td><td>0.22</td><td>low source attribution mass</td></tr>
+            <tr><td><strong>Fused score</strong></td><td><strong>0.12</strong></td><td><strong>→ hallucinated (below τ = 0.30)</strong></td></tr>
+          </tbody>
+        </table>
+      </div>
+    </div>
+
+  </div>
+
+  <!-- Threshold sliders (F4) -->
+  <div class="sliders">
+    <div class="slider-group">
+      <label>τ hallucinated — below this → hallucinated</label>
+      <input type="range" min="0" max="1" step="0.05" value="0.30" />
+      <div class="slider-value">0.30</div>
+    </div>
+    <div class="slider-group">
+      <label>τ grounded — above this → grounded</label>
+      <input type="range" min="0" max="1" step="0.05" value="0.70" />
+      <div class="slider-value">0.70</div>
+    </div>
+    <div class="annotation" style="align-self:center">
+      Sliders re-flag sentences instantly · no model re-run
+    </div>
+  </div>
+
+  <!-- Input area -->
+  <div class="input-row">
+    <div class="card">
+      <label>Paste text (≤ 10 000 words)</label>
+      <textarea placeholder="Paste your document here…"></textarea>
+    </div>
+    <div class="card">
+      <label>or upload PDF (≤ 5 MB)</label>
+      <div class="upload-box">📄 Click to select a PDF</div>
+    </div>
+  </div>
+
+  <!-- Action bar (F2 / F5 / F6) -->
+  <div class="action-bar">
+    <button class="btn btn-primary">Analyse</button>
+    <button class="btn btn-outline">Export JSON</button>
+    <button class="btn btn-outline">Export PDF</button>
+    <span class="status-pill">⏳ Analysing… (≈ 30–60 s on GPU)</span>
+    <span class="annotation" style="margin-left:auto">
+      Button disabled while pipeline runs · export buttons appear on completion
+    </span>
+  </div>
+
+</body>
+</html>
diff --git a/docs/use-case.puml b/docs/use-case.puml
new file mode 100644
index 0000000..ea834b3
--- /dev/null
+++ b/docs/use-case.puml
@@ -0,0 +1,70 @@
+@startuml use-case
+' UC-01 "Verify a Summary" + extensions
+' Actors/use cases from requirements.md §7
+
+left to right direction
+skinparam packageStyle rectangle
+skinparam usecase {
+  BackgroundColor White
+  BorderColor #555
+  ArrowColor #555
+}
+
+' ── Actors ──────────────────────────────────────────────────────────────────
+actor "Journalist\n(P1)" as P1
+actor "Policy Analyst\n(P2)" as P2
+actor "Financial Analyst\n(P3)" as P3
+
+' ── System boundary ─────────────────────────────────────────────────────────
+rectangle "SumLens" {
+
+  ' Primary use case
+  usecase "UC-01\nVerify a Summary" as UC01
+
+  ' ── Included steps (always executed as part of UC-01) ────────────────────
+  usecase "Submit Document\n(FR-01 / FR-02)" as UC_Submit
+  usecase "Validate Input\n(FR-01 / FR-02 / FR-04)" as UC_Validate
+  usecase "Summarise Document\n(FR-05 / FR-06)" as UC_Summarise
+  usecase "Compute & Fuse Signals\n(FR-07 / FR-08 / FR-09 / FR-10)" as UC_Signals
+  usecase "Render Annotated\nHeatmap\n(FR-12 / FR-14 / FR-15)" as UC_Heatmap
+
+  ' ── Optional extensions (user-initiated after viewing results) ───────────
+  usecase "Trace Source Spans\n(FR-13, US-06)" as UC_Trace
+  usecase "Adjust Threshold τ\n(FR-11, US-05)" as UC_Tau
+  usecase "Export JSON\n(FR-16, US-04)" as UC_JSON
+  usecase "Export PDF\n(FR-17, US-04)" as UC_PDF
+
+  ' ── Error extensions ─────────────────────────────────────────────────────
+  usecase "Handle Invalid Input\n(FR-04, US-08)" as UC_ErrInput
+  usecase "Handle Model Failure\n(NFR-07)" as UC_ErrModel
+}
+
+' ── Actor associations ───────────────────────────────────────────────────────
+P1 --> UC01 : upload PDF\n(US-01)
+P1 --> UC_JSON : export result\n(US-04)
+P1 --> UC_PDF : export result\n(US-04)
+
+P2 --> UC01 : paste text\n(US-07)
+P2 --> UC_Trace : trace ignored spans\n(US-02)
+
+P3 --> UC01 : verify figures\n(US-03)
+P3 --> UC_Tau : tune sensitivity\n(US-05)
+
+' ── Include relationships (base case always invokes these) ──────────────────
+UC01 ..> UC_Submit   : <<include>>
+UC01 ..> UC_Validate : <<include>>
+UC01 ..> UC_Summarise : <<include>>
+UC01 ..> UC_Signals  : <<include>>
+UC01 ..> UC_Heatmap  : <<include>>
+
+' ── Extend relationships (conditional / optional) ───────────────────────────
+UC_Trace  ..> UC01 : <<extend>>\n[user clicks sentence]
+UC_Tau    ..> UC01 : <<extend>>\n[user adjusts slider]
+UC_JSON   ..> UC01 : <<extend>>\n[user requests export]
+UC_PDF    ..> UC01 : <<extend>>\n[user requests export]
+
+' ── Error extensions ────────────────────────────────────────────────────────
+UC_ErrInput ..> UC_Validate  : <<extend>>\n[2a: invalid input]
+UC_ErrModel ..> UC_Summarise : <<extend>>\n[3a: model failure]
+
+@enduml
diff --git a/pyproject.toml b/pyproject.toml
index 417b0be..8859c4b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -15,6 +15,7 @@ dependencies = [
     "gradio",
     "matplotlib",
     "scikit-learn",
+    "fpdf2",
 ]
 
 [project.optional-dependencies]
@@ -23,7 +24,6 @@ dev = [
     "mypy",
     "pytest",
     "pytest-cov",
-    "fpdf2",
 ]
 
 [build-system]
@@ -56,6 +56,7 @@ module = [
     "gradio", "gradio.*",
     "matplotlib", "matplotlib.*",
     "sklearn", "sklearn.*",
+    "fpdf", "fpdf.*",
 ]
 ignore_missing_imports = true
 
diff --git a/scripts/evaluate.py b/scripts/evaluate.py
index 5489990..293c47c 100644
--- a/scripts/evaluate.py
+++ b/scripts/evaluate.py
@@ -15,7 +15,7 @@
 from sumlens.eval.ablation import ablation_table
 from sumlens.types import AnalysisConfig
 
-_COLUMNS = ["condition", "precision", "recall", "f1", "ece"]
+_COLUMNS = ["condition", "roc_auc", "pr_auc", "precision", "recall", "f1", "ece"]
 
 
 def _read(path: Path) -> list[dict[str, str]]:
diff --git a/scripts/extract_features.py b/scripts/extract_features.py
index e9a9c02..0e1503b 100644
--- a/scripts/extract_features.py
+++ b/scripts/extract_features.py
@@ -1,11 +1,12 @@
-"""Run signals A/B(/C) over a RAGTruth split and write a fusion features CSV.
+"""Run signals A/B/C over a RAGTruth split and write a fusion features CSV.
 
-For each summary sentence: classifier (A), NLI (B), and optionally attribution (C)
-scores + the grounded gold label. Output feeds scripts/train_fusion.py.
+For each summary sentence: classifier (A), NLI (B), the two support attribution
+scalars (attr_conc, attr_loo), and the grounded gold label. Output feeds the ablation.
 
-Attribution is off by default: RAGTruth summaries were not generated by our local
-model, so Inseq attribution is not well-defined for them (see research-plan.md §8).
-Enable with --with-attribution only when summaries come from our own summariser.
+Signal C here is the generator-agnostic support attribution (signals/support.py),
+derived from an NLI matrix, so it is well-defined for RAGTruth even though those
+summaries were not generated by our local model (unlike Inseq attribution; see
+research-plan.md §8). It therefore always runs.
 
 This runs the REAL models — launch on HPC (si-gpu / sbatch), not in CI.
 """
@@ -16,9 +17,9 @@
 
 from sumlens.eval.features import FIELDNAMES, feature_rows
 from sumlens.eval.ragtruth import load_split
-from sumlens.signals.attribution import attribute
 from sumlens.signals.classifier import classify
 from sumlens.signals.nli import entail, extract_claims
+from sumlens.signals.support import support_attribution
 from sumlens.types import AnalysisConfig
 
 
@@ -27,7 +28,6 @@ def main() -> None:
     parser.add_argument("--data-dir", type=Path, default=Path("data/ragtruth"))
     parser.add_argument("--split", default="train")
     parser.add_argument("--out", type=Path, default=Path("features.csv"))
-    parser.add_argument("--with-attribution", action="store_true")
     parser.add_argument("--limit", type=int, default=0, help="cap summaries (0 = all)")
     args = parser.parse_args()
 
@@ -40,9 +40,9 @@ def main() -> None:
     for document, summary, hallucinated in examples:
         classifier_out = classify(document, summary, cfg)
         nli_out = entail(extract_claims(summary), document, cfg)
-        attribution_out = attribute(document, summary, cfg) if args.with_attribution else {}
+        support_out = support_attribution(document, summary, cfg)
         rows.extend(
-            feature_rows(summary, hallucinated, classifier_out, nli_out, attribution_out)
+            feature_rows(summary, hallucinated, classifier_out, nli_out, support_out)
         )
 
     with args.out.open("w", encoding="utf-8", newline="") as fh:
diff --git a/scripts/jobs/run_eval.sbatch b/scripts/jobs/run_eval.sbatch
index efdbfc8..f68a47c 100755
--- a/scripts/jobs/run_eval.sbatch
+++ b/scripts/jobs/run_eval.sbatch
@@ -45,7 +45,9 @@ python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
 # after a crash/timeout resumes instead of redoing finished work) ---
 echo ">>> extract features (train)"; [ -f features_train.csv ] || python scripts/extract_features.py --split train --data-dir data/ragtruth --out features_train.csv
 echo ">>> extract features (test)";  [ -f features_test.csv ]  || python scripts/extract_features.py --split test  --data-dir data/ragtruth --out features_test.csv
-echo ">>> train fusion";             [ -f models/fusion.pkl ]  || python scripts/train_fusion.py --features features_train.csv --out-dir models
+# train_fusion (live model) is intentionally skipped: this experiment compares
+# signals via the ablation, which fits its own per-subset models. Promote a live
+# fusion model only after the ablation shows the new attribution signals help.
 echo ">>> ablation table";           [ -f ablation.csv ]       || python scripts/evaluate.py --train features_train.csv --test features_test.csv --out ablation.csv
 
 echo "=== DONE $(date) ==="
diff --git a/scripts/train_fusion.py b/scripts/train_fusion.py
index 90eb67d..e3d693b 100644
--- a/scripts/train_fusion.py
+++ b/scripts/train_fusion.py
@@ -14,13 +14,19 @@
 from sumlens.fuse import fit_fusion, fit_platt
 
 
+def _num(value: str, impute: float = 0.5) -> float:
+    # A signal column is empty when that signal was off for the run (e.g.
+    # attribution is off for RAGTruth). Impute neutral, matching the ablation.
+    return impute if value == "" else float(value)
+
+
 def _read(path: Path) -> tuple[list[list[float]], list[int]]:
     features: list[list[float]] = []
     grounded: list[int] = []
     with path.open(encoding="utf-8") as fh:
         for row in csv.DictReader(fh):
             features.append(
-                [float(row["classifier"]), float(row["nli"]), float(row["attribution"])]
+                [_num(row["classifier"]), _num(row["nli"]), _num(row["attribution"])]
             )
             grounded.append(int(row["grounded"]))
     return features, grounded
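Review note: the neutral imputation added to `_read` can be tried on an in-memory CSV. The sketch below is illustrative only. `num` restates `_num` from the hunk above, and `SAMPLE` is a hypothetical two-row file rather than real pipeline output. It keeps the `attribution` column name because that is what `train_fusion.py` reads here. Since `sumlens/eval/features.py` in this same change writes `attr_conc` and `attr_loo` instead, the column name may need revisiting before this script is run against freshly extracted features.

```python
import csv
import io

NEUTRAL = 0.5  # same neutral default as _num in the hunk above


def num(value: str, impute: float = NEUTRAL) -> float:
    """Parse a CSV cell, substituting a neutral score for an empty one."""
    return impute if value == "" else float(value)


# Hypothetical features file: the second row has no attribution score.
SAMPLE = "classifier,nli,attribution,grounded\n0.9,0.8,0.7,1\n0.1,0.2,,0\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
features = [[num(r["classifier"]), num(r["nli"]), num(r["attribution"])] for r in rows]
grounded = [int(r["grounded"]) for r in rows]
print(features, grounded)
```

The empty cell becomes 0.5, so the row stays usable for fitting while contributing no evidence in either direction for that signal.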
diff --git a/sumlens/eval/ablation.py b/sumlens/eval/ablation.py
index 25773a9..7daeb2e 100644
--- a/sumlens/eval/ablation.py
+++ b/sumlens/eval/ablation.py
@@ -1,22 +1,24 @@
 """Ablation over signal subsets — the report's centrepiece table.
 
-For each non-empty subset of {classifier (A), NLI (B), attribution (C)} we fit a
-fusion LogisticRegression on the train rows (using only that subset's columns),
-predict on the test rows, and report detection precision/recall/F1 (positive class
-= hallucinated) plus the calibration error of the grounding probability.
-
-Rows are mappings with keys: classifier, nli, attribution (float or None/""),
-and grounded (1 grounded / 0 hallucinated). Missing signal values are imputed.
+For each non-empty subset of {classifier (A), NLI (B), attr_conc (C), attr_loo (D)}
+we fit a fusion LogisticRegression on the train rows (using only that subset's
+columns), predict on the test rows, and report ROC-AUC, PR-AUC, and
+precision/recall/F1 for detection (positive class = hallucinated), plus the
+calibration error of the grounding probability. C and D are the two scalars of
+the generator-agnostic support attribution (signals/support.py).
+
+Rows are mappings with keys: classifier, nli, attr_conc, attr_loo (float or
+None/""), and grounded (1 grounded / 0 hallucinated). Missing values are imputed.
 """
 
 from collections.abc import Mapping, Sequence
 from itertools import combinations
 
-from sumlens.eval.metrics import expected_calibration_error
+from sumlens.eval.metrics import expected_calibration_error, pr_auc, roc_auc
 from sumlens.fuse import fit_fusion
 
-_SIGNALS = ("classifier", "nli", "attribution")
-_LETTER = {"classifier": "A", "nli": "B", "attribution": "C"}
+_SIGNALS = ("classifier", "nli", "attr_conc", "attr_loo")
+_LETTER = {"classifier": "A", "nli": "B", "attr_conc": "C", "attr_loo": "D"}
 
 Row = Mapping[str, object]
 
@@ -46,8 +48,15 @@ def _evaluate_combo(
     true_hallucinated = [1 - g for g in y_test]
     precision, recall, f1 = _prf(true_hallucinated, pred_hallucinated)
 
+    # Threshold-free detection metrics (positive class = hallucinated). f1 above
+    # is a single fixed-0.5 operating point and is misleading under the heavy
+    # hallucination class imbalance; roc_auc/pr_auc are the headline numbers.
+    proba_hallucinated = [1.0 - p for p in grounded_proba]
+
     return {
         "condition": "+".join(_LETTER[s] for s in combo),
+        "roc_auc": roc_auc(proba_hallucinated, true_hallucinated),
+        "pr_auc": pr_auc(proba_hallucinated, true_hallucinated),
         "precision": precision,
         "recall": recall,
         "f1": f1,
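Review note: with four signals the ablation now covers 2^4 - 1 = 15 conditions instead of 7. A minimal sketch of how the condition labels can be enumerated, assuming the table iterates subsets with `itertools.combinations` (the module imports it, but the loop itself is not shown in this diff). `conditions` is a hypothetical helper name.

```python
from itertools import combinations

SIGNALS = ("classifier", "nli", "attr_conc", "attr_loo")
LETTER = {"classifier": "A", "nli": "B", "attr_conc": "C", "attr_loo": "D"}


def conditions(signals=SIGNALS):
    """Yield the '+'-joined letter code of every non-empty signal subset."""
    for size in range(1, len(signals) + 1):
        for combo in combinations(signals, size):
            yield "+".join(LETTER[s] for s in combo)


labels = list(conditions())
print(len(labels))
print(labels)
```

Each condition fits its own model, so the run time of the ablation step roughly doubles relative to the three-signal version.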
diff --git a/sumlens/eval/features.py b/sumlens/eval/features.py
index a9ca58d..3d56b75 100644
--- a/sumlens/eval/features.py
+++ b/sumlens/eval/features.py
@@ -1,16 +1,26 @@
 """Assemble fusion training rows from per-sentence signal outputs + gold labels.
 
-One row per summary sentence: the three signal scores (None if a signal did not
-run for that sentence) and the grounded label (1 if grounded, 0 if the RAGTruth
-gold marks the sentence hallucinated). This pure function is the testable core of
-`scripts/extract_features.py`, which supplies the real signal outputs.
+One row per summary sentence: the signal scores (None if a signal did not run for
+that sentence) and the grounded label (1 if grounded, 0 if the RAGTruth gold marks
+the sentence hallucinated). Signal C is the generator-agnostic support attribution
+(`signals/support.py`), which yields two scalars per sentence: attr_conc (support
+concentration) and attr_loo (best-supporter necessity margin). This pure function
+is the testable core of `scripts/extract_features.py`.
 """
 
 from collections.abc import Mapping
 
 from sumlens.types import Claim, Summary
 
-FIELDNAMES = ["summary_id", "sentence_id", "classifier", "nli", "attribution", "grounded"]
+FIELDNAMES = [
+    "summary_id",
+    "sentence_id",
+    "classifier",
+    "nli",
+    "attr_conc",
+    "attr_loo",
+    "grounded",
+]
 
 
 def feature_rows(
@@ -18,7 +28,7 @@ def feature_rows(
     hallucinated_ids: list[str],
     classifier_out: dict[str, tuple[float, list[tuple[int, int]]]],
     nli_out: dict[str, tuple[float, list[Claim]]],
-    attribution_out: dict[str, tuple[float, list[str]]],
+    support_out: dict[str, tuple[float, float, list[str]]],
 ) -> list[dict[str, object]]:
     hallucinated = set(hallucinated_ids)
     rows: list[dict[str, object]] = []
@@ -27,15 +37,18 @@ def feature_rows(
             {
                 "summary_id": summary.id,
                 "sentence_id": sentence.id,
-                "classifier": _score(classifier_out, sentence.id),
-                "nli": _score(nli_out, sentence.id),
-                "attribution": _score(attribution_out, sentence.id),
+                "classifier": _at(classifier_out, sentence.id, 0),
+                "nli": _at(nli_out, sentence.id, 0),
+                "attr_conc": _at(support_out, sentence.id, 0),
+                "attr_loo": _at(support_out, sentence.id, 1),
                 "grounded": 0 if sentence.id in hallucinated else 1,
             }
         )
     return rows
 
 
-def _score(signal_out: Mapping[str, tuple[float, object]], sentence_id: str) -> float | None:
+def _at(
+    signal_out: Mapping[str, tuple[object, ...]], sentence_id: str, index: int
+) -> float | None:
     entry = signal_out.get(sentence_id)
-    return entry[0] if entry is not None else None
+    return float(entry[index]) if entry is not None else None  # type: ignore[arg-type]
diff --git a/sumlens/eval/metrics.py b/sumlens/eval/metrics.py
index 6a35054..a121fcb 100644
--- a/sumlens/eval/metrics.py
+++ b/sumlens/eval/metrics.py
@@ -22,6 +22,51 @@ def sentence_f1(preds: dict[str, set[str]], golds: dict[str, set[str]]) -> dict[
     return {"precision": precision, "recall": recall, "f1": f1}
 
 
+def roc_auc(scores: list[float], labels: list[int]) -> float:
+    """Threshold-free ROC-AUC (rank-based, ties averaged). `scores` rank the
+    positive class (label 1). Returns 0.0 if either class is absent."""
+    n_pos = sum(labels)
+    n_neg = len(labels) - n_pos
+    if not n_pos or not n_neg:
+        return 0.0
+    order = sorted(zip(scores, labels, strict=True), key=lambda p: p[0])
+    ranks = [0.0] * len(order)
+    i = 0
+    while i < len(order):
+        j = i
+        while j < len(order) and order[j][0] == order[i][0]:
+            j += 1
+        rank = (i + j - 1) / 2 + 1  # 1-based average rank for the tie group
+        for k in range(i, j):
+            ranks[k] = rank
+        i = j
+    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, order, strict=True) if label == 1)
+    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
+
+
+def pr_auc(scores: list[float], labels: list[int]) -> float:
+    """Average precision (area under the precision-recall curve). `scores` rank the
+    positive class (label 1). Returns 0.0 if no positives. More informative than
+    ROC-AUC under heavy class imbalance; chance level is the positive base rate."""
+    n_pos = sum(labels)
+    if not n_pos:
+        return 0.0
+    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
+    tp = fp = 0
+    ap = 0.0
+    prev_recall = 0.0
+    for i in order:
+        if labels[i] == 1:
+            tp += 1
+        else:
+            fp += 1
+        recall = tp / n_pos
+        precision = tp / (tp + fp)
+        ap += (recall - prev_recall) * precision
+        prev_recall = recall
+    return ap
+
+
 def expected_calibration_error(
     scores: list[float], labels: list[int], n_bins: int = 10
 ) -> float:
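Review note: a small hand-checkable example helps confirm what the two new metrics measure. The sketch below does not call the functions from this diff. It computes ROC-AUC through the equivalent pairwise definition (share of positive/negative pairs ranked correctly, ties counted as half) and average precision as the mean of precision at each positive hit. The helper names and the five-item data set are made up for illustration.

```python
def pairwise_auc(scores, labels):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@rank taken at the rank of each positive item."""
    n_pos = sum(labels)
    if not n_pos:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap


# Hypothetical hallucination scores; label 1 = hallucinated.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1, 0, 0, 1, 0]

auc = pairwise_auc(scores, labels)
ap = average_precision(scores, labels)
print(round(auc, 4), round(ap, 4))
```

By hand: the positive at 0.9 outranks all three negatives and the positive at 0.6 outranks one, so 4 of 6 pairs are correct. Precision is 1/1 at the first hit and 2/4 at the second, so average precision is 0.75, against a base rate of 0.4.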
diff --git a/sumlens/signals/support.py b/sumlens/signals/support.py
new file mode 100644
index 0000000..1401dc4
--- /dev/null
+++ b/sumlens/signals/support.py
@@ -0,0 +1,51 @@
+"""Signal C (redesign) — generator-agnostic source attribution from an NLI matrix.
+
+Inseq attribution (`attribution.py`) is gradient-based and needs the *generating*
+model, so it is undefined for RAGTruth (external-model summaries). This signal
+derives attribution from entailment alone, so it is defined for any (source,
+summary) pair. For each summary sentence ``s`` we score entailment against every
+source sentence ``j``, ``M[s][j] = P(src_j entails s)``, then collapse the row:
+
+- ``attr_conc(s) = max_j M[s][j] - mean_j M[s][j]``: support concentration. A grounded
+  sentence has one sharp supporter; a fabricated one has diffuse, uniformly low support.
+- ``attr_loo(s) = top1 - top2``: necessity margin, best supporter minus runner-up in row ``s``.
+- top-k source sentence ids, which feed the UI heatmap (generator-free, no token offsets).
+
+Reuses signal B's NLI model and batched call. Pure given the NLI boundary, which
+tests mock via `_get_nli`. Consumed by `scripts/extract_features.py`.
+"""
+
+from sumlens.signals.nli import _entail_prob, _get_nli
+from sumlens.types import AnalysisConfig, Document, Summary
+
+_BATCH_SIZE = 64
+_TOP_K = 5
+
+
+def support_attribution(
+    document: Document, summary: Summary, cfg: AnalysisConfig
+) -> dict[str, tuple[float, float, list[str]]]:
+    """Per summary sentence: (attr_conc, attr_loo, top-k source sentence ids)."""
+    sources = document.sentences
+    sentences = summary.sentences
+    if not sentences or not sources:
+        return {s.id: (0.0, 0.0, []) for s in sentences}
+
+    nli = _get_nli(cfg.nli_model)
+    pairs = [
+        {"text": src.text, "text_pair": sent.text} for sent in sentences for src in sources
+    ]
+    batched = nli(pairs, top_k=None, batch_size=_BATCH_SIZE)
+    n = len(sources)
+
+    results: dict[str, tuple[float, float, list[str]]] = {}
+    for i, sentence in enumerate(sentences):
+        row = [_entail_prob(scores) for scores in batched[i * n : (i + 1) * n]]
+        order = sorted(range(n), key=lambda j: row[j], reverse=True)
+        top1 = row[order[0]]
+        top2 = row[order[1]] if n > 1 else 0.0
+        conc = top1 - sum(row) / n
+        loo = top1 - top2
+        top_ids = [sources[j].id for j in order[:_TOP_K]]
+        results[sentence.id] = (conc, loo, top_ids)
+    return results
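Review note: the row collapse in `support_attribution` can be checked without loading an NLI model. The sketch below applies the same arithmetic to two hand-written rows of entailment probabilities. `collapse_row` is a hypothetical stand-alone helper and the numbers are invented, not model output.

```python
def collapse_row(row, top_k=5):
    """Reduce one row of entailment probabilities to (attr_conc, attr_loo, top indices)."""
    if not row:
        return 0.0, 0.0, []
    # Source indices ordered from strongest to weakest supporter.
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    top1 = row[order[0]]
    top2 = row[order[1]] if len(row) > 1 else 0.0
    conc = top1 - sum(row) / len(row)  # peak minus mean: how concentrated support is
    loo = top1 - top2                  # margin of the best supporter over the runner-up
    return conc, loo, order[:top_k]


sharp = [0.05, 0.92, 0.04, 0.03]  # one clear supporting source sentence
flat = [0.06, 0.05, 0.07, 0.06]   # diffuse, uniformly low support

sharp_conc, sharp_loo, sharp_top = collapse_row(sharp)
flat_conc, flat_loo, flat_top = collapse_row(flat)
print(sharp_conc, sharp_loo, sharp_top)
print(flat_conc, flat_loo, flat_top)
```

The sharp row yields roughly 0.66 and 0.87, the flat row roughly 0.01 for both, which is the separation the docstring describes. A row where two source sentences both strongly entail the summary sentence would score a high `attr_conc` but a low `attr_loo`, so the ablation is a reasonable place to find out whether the two scalars carry different information.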
diff --git a/tests/conftest.py b/tests/conftest.py
index 72b4785..201e310 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -4,10 +4,14 @@
 
 
 def _ensure_punkt() -> None:
-    try:
-        nltk.data.find("tokenizers/punkt/english.pickle")
-    except LookupError:
-        nltk.download("punkt", quiet=True)
+    for resource, package in [
+        ("tokenizers/punkt/english.pickle", "punkt"),
+        ("tokenizers/punkt_tab/english/", "punkt_tab"),
+    ]:
+        try:
+            nltk.data.find(resource)
+        except LookupError:
+            nltk.download(package, quiet=True)
 
 
 _ensure_punkt()
diff --git a/tests/test_ablation.py b/tests/test_ablation.py
index 7e448f9..8fab62c 100644
--- a/tests/test_ablation.py
+++ b/tests/test_ablation.py
@@ -2,8 +2,8 @@
 
 from sumlens.eval.ablation import ablation_table
 
-_GROUNDED = {"classifier": 0.9, "nli": 0.8, "attribution": 0.7, "grounded": 1}
-_HALLUCINATED = {"classifier": 0.1, "nli": 0.2, "attribution": 0.3, "grounded": 0}
+_GROUNDED = {"classifier": 0.9, "nli": 0.8, "attr_conc": 0.7, "attr_loo": 0.6, "grounded": 1}
+_HALLUCINATED = {"classifier": 0.1, "nli": 0.2, "attr_conc": 0.3, "attr_loo": 0.2, "grounded": 0}
 _ROWS = [_GROUNDED, _HALLUCINATED] * 10
 
 
@@ -11,19 +11,26 @@ def test_ablation_table_conditions_and_scores() -> None:
     table = ablation_table(_ROWS, _ROWS)
 
     conditions = {row["condition"] for row in table}
-    assert conditions == {"A", "B", "C", "A+B", "A+C", "B+C", "A+B+C"}
+    assert conditions == {
+        "A", "B", "C", "D",
+        "A+B", "A+C", "A+D", "B+C", "B+D", "C+D",
+        "A+B+C", "A+B+D", "A+C+D", "B+C+D",
+        "A+B+C+D",
+    }
 
     for row in table:
-        for key in ("precision", "recall", "f1", "ece"):
+        for key in ("roc_auc", "pr_auc", "precision", "recall", "f1", "ece"):
             assert isinstance(row[key], float)
 
-    fused = next(row for row in table if row["condition"] == "A+B+C")
+    fused = next(row for row in table if row["condition"] == "A+B+C+D")
     assert fused["f1"] == 1.0  # perfectly separable -> perfect detection
+    assert fused["roc_auc"] == 1.0
+    assert fused["pr_auc"] == 1.0
 
 
 def test_ablation_imputes_missing_attribution() -> None:
-    # attribution missing ("") on every row -> still runs via imputation
-    rows = [{**r, "attribution": ""} for r in _ROWS]
+    # attr_conc missing ("") on every row -> still runs via imputation
+    rows = [{**r, "attr_conc": ""} for r in _ROWS]
     table = ablation_table(rows, rows)
     c_only = next(row for row in table if row["condition"] == "C")
     assert isinstance(c_only["f1"], float)
diff --git a/tests/test_app.py b/tests/test_app.py
new file mode 100644
index 0000000..f14799c
--- /dev/null
+++ b/tests/test_app.py
@@ -0,0 +1,326 @@
+"""App tests — display helpers and `run` with the pipeline mocked (no gradio, no weights)."""
+
+import json
+from pathlib import Path
+
+import pytest
+
+import app as app_mod
+from app import _apply_tau, _export_json, _export_pdf, _render_source_html, _to_highlighted, run
+from sumlens.types import (
+    AnalysisConfig,
+    AnalysisResult,
+    Document,
+    Evidence,
+    Sentence,
+    SentenceVerdict,
+    SignalScores,
+    Summary,
+)
+
+
+def _result() -> AnalysisResult:
+    document = Document(
+        id="doc-1",
+        raw_text="The bill passed. Budget is huge.",
+        sentences=[
+            Sentence(id="src-0000", text="The bill passed.", char_start=0, char_end=16),
+            Sentence(id="src-0001", text="Budget is huge.", char_start=17, char_end=32),
+        ],
+        source="text",
+    )
+    summary = Summary(
+        id="doc-1-summary",
+        document_id="doc-1",
+        text="Grounded one. Bad two.",
+        sentences=[
+            Sentence(id="sum-0000", text="Grounded one.", char_start=0, char_end=13),
+            Sentence(id="sum-0001", text="Bad two.", char_start=14, char_end=22),
+        ],
+        model_name="m",
+    )
+    verdicts = [
+        SentenceVerdict(
+            sentence_id="sum-0000",
+            fused_score=0.9,
+            label="grounded",
+            signals=SignalScores(classifier=0.1, nli=0.9, attribution=None),
+            evidence=Evidence(
+                failed_claims=[],
+                top_source_sentence_ids=["src-0000"],
+                classifier_token_spans=[],
+            ),
+        ),
+        SentenceVerdict(
+            sentence_id="sum-0001",
+            fused_score=0.1,
+            label="hallucinated",
+            signals=SignalScores(classifier=0.9, nli=0.2, attribution=0.3),
+            evidence=Evidence(
+                failed_claims=[],
+                top_source_sentence_ids=["src-0001"],
+                classifier_token_spans=[],
+            ),
+        ),
+    ]
+    return AnalysisResult(
+        document=document,
+        summary=summary,
+        verdicts=verdicts,
+        config=AnalysisConfig(),
+        timings_ms={},
+    )
+
+
+# ---------------------------------------------------------------------------
+# _to_highlighted
+# ---------------------------------------------------------------------------
+
+
+def test_to_highlighted() -> None:
+    assert _to_highlighted(_result()) == [
+        ("Grounded one. ", "grounded"),
+        ("Bad two. ", "hallucinated"),
+    ]
+
+
+# ---------------------------------------------------------------------------
+# _render_source_html
+# ---------------------------------------------------------------------------
+
+
+def test_render_source_html_no_highlights() -> None:
+    result = _result()
+    html = _render_source_html(result.document, set())
+    assert "The bill passed." in html
+    assert "Budget is huge." in html
+    assert "<mark" not in html
+
+
+def test_render_source_html_highlights_given_ids() -> None:
+    result = _result()
+    html = _render_source_html(result.document, {"src-0000"})
+    assert "<mark" in html
+    assert "The bill passed." in html
+    # only src-0000 is marked; src-0001 is plain text
+    assert html.index("<mark") < html.index("The bill passed.")
+
+
+def test_render_source_html_no_sentences_falls_back_to_raw() -> None:
+    doc = Document(id="d", raw_text="Raw text only.", sentences=[], source="text")
+    html = _render_source_html(doc, set())
+    assert "Raw text only." in html
+
+
+# ---------------------------------------------------------------------------
+# run()
+# ---------------------------------------------------------------------------
+
+
+def test_run_text_input(monkeypatch: pytest.MonkeyPatch) -> None:
+    canned = _result()
+    text_doc = Document(
+        id="text", raw_text="Some pasted source text.", sentences=[], source="text"
+    )
+    monkeypatch.setattr(app_mod, "load_text", lambda text: text_doc)
+    monkeypatch.setattr(app_mod, "analyse", lambda document, cfg: canned)
+
+    result, source_html, highlighted, payload = run("Some pasted source text.", None)
+
+    assert highlighted == [("Grounded one. ", "grounded"), ("Bad two. ", "hallucinated")]
+    assert payload == canned.model_dump()
+    assert "Some pasted source text" in source_html
+    assert result == canned
+
+
+def test_run_prefers_pdf_when_given(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
+) -> None:
+    fake_pdf = tmp_path / "report.pdf"
+    fake_pdf.write_bytes(b"%PDF-1.0")
+
+    seen: dict[str, str] = {}
+    pdf_doc = Document(id="doc-1", raw_text="Source.", sentences=[], source="pdf")
+
+    def _fake_load_pdf(path: Path) -> Document:
+        seen["path"] = str(path)
+        return pdf_doc
+
+    monkeypatch.setattr(app_mod, "load_pdf", _fake_load_pdf)
+    monkeypatch.setattr(app_mod, "analyse", lambda document, cfg: _result())
+
+    run("ignored text", str(fake_pdf))
+
+    assert "report.pdf" in seen["path"]
+
+
+def test_run_rejects_empty_text(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(app_mod, "analyse", lambda document, cfg: _result())
+    with pytest.raises(ValueError, match="empty"):
+        run("", None)
+
+
+def test_run_rejects_oversized_text(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(app_mod, "analyse", lambda document, cfg: _result())
+    big_text = " ".join(["word"] * 10_001)
+    with pytest.raises(ValueError, match="too long"):
+        run(big_text, None)
+
+
+def test_run_rejects_oversized_pdf(
+    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
+) -> None:
+    big_pdf = tmp_path / "big.pdf"
+    big_pdf.write_bytes(b"x" * (5 * 1024 * 1024 + 1))
+    monkeypatch.setattr(app_mod, "load_pdf", lambda path: _result().document)
+    monkeypatch.setattr(app_mod, "analyse", lambda document, cfg: _result())
+    with pytest.raises(ValueError, match="too large"):
+        run("", str(big_pdf))
+
+
+# ---------------------------------------------------------------------------
+# F3 — click-to-highlight source spans
+# ---------------------------------------------------------------------------
+
+
+class _FakeSelectEvent:
+    """Minimal stand-in for gr.SelectData."""
+
+    def __init__(self, index: int) -> None:
+        self.index = index
+
+
+def test_on_sentence_select_highlights_top_source_ids() -> None:
+    result = _result()
+    # The select handler is an inner function of build_app, so exercise its logic
+    # directly: render the source with the IDs the handler would pass to the helper.
+    verdict = result.verdicts[1]  # hallucinated, top_source = ["src-0001"]
+    html = _render_source_html(result.document, set(verdict.evidence.top_source_sentence_ids))
+    assert "<mark" in html
+    assert "Budget is huge." in html  # src-0001 text
+
+
+def test_on_sentence_select_switches_highlight_on_second_click() -> None:
+    result = _result()
+    # Click sentence 0 → src-0000 highlighted
+    html0 = _render_source_html(result.document, {"src-0000"})
+    # Click sentence 1 → src-0001 highlighted
+    html1 = _render_source_html(result.document, {"src-0001"})
+
+    assert "The bill passed." in html0
+    assert html0.count("<mark") == 1
+
+    assert "Budget is huge." in html1
+    assert html1.count("<mark") == 1
+
+    # The two outputs must differ (different sentence highlighted each time)
+    assert html0 != html1
+
+
+def test_on_sentence_select_out_of_range_returns_plain_source() -> None:
+    result = _result()
+    html = _render_source_html(result.document, set())
+    assert "<mark" not in html
+
+
+# ---------------------------------------------------------------------------
+# F4 — adjustable threshold τ
+# ---------------------------------------------------------------------------
+
+
+def test_apply_tau_returns_none_without_result() -> None:
+    assert _apply_tau(None, 0.70, 0.30) is None
+
+
+def test_apply_tau_mirrors_fuse_label_logic() -> None:
+    result = _result()
+    # fused_score=0.9 → grounded at any reasonable tau
+    # fused_score=0.1 → hallucinated at any reasonable tau
+    spans = _apply_tau(result, 0.70, 0.30)
+    assert spans is not None
+    assert spans[0] == ("Grounded one. ", "grounded")
+    assert spans[1] == ("Bad two. ", "hallucinated")
+
+
+def test_apply_tau_relabels_without_model_rerun() -> None:
+    result = _result()
+    # Raise tau_grounded to 0.95 → fused_score=0.9 falls into "weak"
+    spans = _apply_tau(result, 0.95, 0.30)
+    assert spans is not None
+    assert spans[0][1] == "weak"       # was grounded, now weak
+    assert spans[1][1] == "hallucinated"  # unchanged
+
+
+def test_apply_tau_boundary_score_lt_tau_h_is_hallucinated() -> None:
+    result = _result()
+    # fused_score=0.1 < tau_hallucinated=0.15 → hallucinated
+    spans = _apply_tau(result, 0.70, 0.15)
+    assert spans is not None
+    assert spans[1][1] == "hallucinated"
+
+
+def test_apply_tau_boundary_score_gte_tau_g_is_grounded() -> None:
+    result = _result()
+    # fused_score=0.9 >= tau_grounded=0.85 → grounded
+    spans = _apply_tau(result, 0.85, 0.30)
+    assert spans is not None
+    assert spans[0][1] == "grounded"
+
+
+# ---------------------------------------------------------------------------
+# F5 — export JSON
+# ---------------------------------------------------------------------------
+
+
+def test_export_json_returns_none_without_result() -> None:
+    assert _export_json(None) is None
+
+
+def test_export_json_creates_valid_file() -> None:
+    result = _result()
+    path = _export_json(result)
+    assert path is not None
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    assert AnalysisResult.model_validate(data) == result
+
+
+def test_export_json_schema_has_required_fields() -> None:
+    path = _export_json(_result())
+    assert path is not None
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    assert {"document", "summary", "verdicts", "config"} <= data.keys()
+    verdict = data["verdicts"][0]
+    assert {"sentence_id", "fused_score", "label", "signals", "evidence"} <= verdict.keys()
+
+
+# ---------------------------------------------------------------------------
+# F6 — export PDF
+# ---------------------------------------------------------------------------
+
+
+def test_export_pdf_returns_none_without_result() -> None:
+    assert _export_pdf(None) is None
+
+
+def test_export_pdf_creates_pdf_file() -> None:
+    path = _export_pdf(_result())
+    assert path is not None
+    content = Path(path).read_bytes()
+    assert content.startswith(b"%PDF"), "file must be a valid PDF"
+
+
+def test_export_pdf_contains_summary_text() -> None:
+    import pdfplumber
+
+    result = _result()
+    path = _export_pdf(result)
+    assert path is not None
+    with pdfplumber.open(path) as pdf:
+        text = " ".join(page.extract_text() or "" for page in pdf.pages)
+    for sentence in result.summary.sentences:
+        assert sentence.text[:10] in text
diff --git a/tests/test_e2e.py b/tests/test_e2e.py
new file mode 100644
index 0000000..dac0071
--- /dev/null
+++ b/tests/test_e2e.py
@@ -0,0 +1,140 @@
+"""End-to-end UI test — input → summary → export with ML models mocked at the
+module boundary; no real weights loaded (FR-11/12/13).
+
+Mock seams:
+  summarise_mod._get_summariser
+  classifier_mod._get_detector
+  nli_mod._get_nli
+  attribution_mod._source_token_attributions
+"""
+
+import json
+from collections.abc import Callable
+from pathlib import Path
+
+import pytest
+
+import app as app_mod
+from app import _export_json, run
+from sumlens import summarise as summarise_mod
+from sumlens.signals import attribution as attribution_mod
+from sumlens.signals import classifier as classifier_mod
+from sumlens.signals import nli as nli_mod
+from sumlens.types import AnalysisResult
+
+_RAW = "Parliament passed a bill on Monday. The budget is one trillion euros."
+_SUMMARY = "Parliament passed the bill. The budget is one trillion euros."
+
+_GROUNDED_TOKENS: list[dict[str, object]] = [{"token": "a", "pred": 0, "prob": 0.05}]
+_HALLUCINATED_TOKENS: list[dict[str, object]] = [{"token": "a", "pred": 1, "prob": 0.95}]
+_HALLUCINATED_SPANS: list[dict[str, object]] = [
+    {"start": 0, "end": 3, "confidence": 0.9, "text": "x"}
+]
+
+
+class _FakeDetector:
+    def predict(
+        self, *, context: list[str], question: str, answer: str, output_format: str
+    ) -> list[dict[str, object]]:
+        grounded = answer.startswith("Parliament")
+        if output_format == "spans":
+            return [] if grounded else _HALLUCINATED_SPANS
+        return _GROUNDED_TOKENS if grounded else _HALLUCINATED_TOKENS
+
+
+class _FakeNLI:
+    def __call__(
+        self,
+        pairs: list[dict[str, str]],
+        top_k: object = None,
+        batch_size: object = None,
+    ) -> list[list[dict[str, object]]]:
+        out: list[list[dict[str, object]]] = []
+        for pair in pairs:
+            ent = 0.9 if "Parliament" in pair.get("text_pair", "") else 0.2
+            out.append(
+                [
+                    {"label": "entailment", "score": ent},
+                    {"label": "neutral", "score": 1.0 - ent},
+                ]
+            )
+        return out
+
+
+def _fake_summariser(model_name: str) -> Callable[..., list[dict[str, str]]]:
+    def _pipeline(text: str, **kwargs: object) -> list[dict[str, str]]:
+        return [{"summary_text": _SUMMARY}]
+
+    return _pipeline
+
+
+@pytest.fixture()
+def mocked_pipeline(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Install ML model mocks at the module boundary; keep the real load_text in app."""
+    from sumlens.ingest import load_text as real_load_text
+
+    monkeypatch.setattr(summarise_mod, "_get_summariser", _fake_summariser)
+    monkeypatch.setattr(classifier_mod, "_get_detector", lambda model_path: _FakeDetector())
+    monkeypatch.setattr(nli_mod, "_get_nli", lambda model_name: _FakeNLI())
+
+    def _fake_attr(
+        source_text: str, target_text: str, cfg: object
+    ) -> list[tuple[int, int, float]]:
+        return [(0, 10, 0.5), (11, 20, 0.3)]
+
+    monkeypatch.setattr(attribution_mod, "_source_token_attributions", _fake_attr)
+    monkeypatch.setattr(app_mod, "load_text", real_load_text)
+
+
+def test_e2e_run_returns_analysis_result(mocked_pipeline: None) -> None:
+    result, source_html, highlighted, payload = run(_RAW, None)
+
+    assert isinstance(result, AnalysisResult)
+    assert len(result.summary.sentences) == 2
+    assert len(highlighted) == 2
+
+
+def test_e2e_highlighted_colors_match_verdicts(mocked_pipeline: None) -> None:
+    result, _, highlighted, _ = run(_RAW, None)
+    verdict_map = {v.sentence_id: v.label for v in result.verdicts}
+
+    for (_, label), sentence in zip(highlighted, result.summary.sentences, strict=True):
+        assert label == verdict_map.get(sentence.id, "weak")
+
+
+def test_e2e_source_html_contains_document_text(mocked_pipeline: None) -> None:
+    _, source_html, _, _ = run(_RAW, None)
+    assert "Parliament" in source_html
+    assert "budget" in source_html
+
+
+def test_e2e_export_json_round_trips(mocked_pipeline: None) -> None:
+    result, _, _, _ = run(_RAW, None)
+    path = _export_json(result)
+
+    assert path is not None
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    restored = AnalysisResult.model_validate(data)
+    assert restored == result
+
+
+def test_e2e_no_real_model_weights_loaded(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Guard: real model loaders must never be called during the test suite."""
+    called: list[str] = []
+
+    def _guard(name: str) -> Callable[..., object]:
+        def _fail(*args: object, **kwargs: object) -> object:
+            called.append(name)
+            raise AssertionError(f"Real model loader invoked: {name}")
+
+        return _fail
+
+    monkeypatch.setattr(summarise_mod, "_get_summariser", _guard("_get_summariser"))
+    monkeypatch.setattr(classifier_mod, "_get_detector", _guard("_get_detector"))
+    monkeypatch.setattr(nli_mod, "_get_nli", _guard("_get_nli"))
+    monkeypatch.setattr(
+        attribution_mod, "_source_token_attributions", _guard("_source_token_attributions")
+    )
+
+    # No pipeline call is made here: installing the guards must not itself invoke a loader.
+    assert called == []
diff --git a/tests/test_features.py b/tests/test_features.py
index 41c5950..62c6d38 100644
--- a/tests/test_features.py
+++ b/tests/test_features.py
@@ -21,9 +21,10 @@ def test_feature_rows_labels_and_missing_signals() -> None:
     classifier_out = {"sum-0000": (0.1, []), "sum-0001": (0.9, [(0, 4)])}
     failed = Claim(id="c", sentence_id="sum-0001", text="x")
     nli_out = {"sum-0000": (0.8, []), "sum-0001": (0.2, [failed])}
-    attribution_out = {"sum-0001": (0.3, ["src-0000"])}  # only the gated sentence has C
+    # support attribution: (attr_conc, attr_loo, top_source_ids); sum-0000 absent
+    support_out = {"sum-0001": (0.3, 0.15, ["src-0000"])}
 
-    rows = feature_rows(_summary(), ["sum-0001"], classifier_out, nli_out, attribution_out)
+    rows = feature_rows(_summary(), ["sum-0001"], classifier_out, nli_out, support_out)
 
     assert rows == [
         {
@@ -31,7 +32,8 @@ def test_feature_rows_labels_and_missing_signals() -> None:
             "sentence_id": "sum-0000",
             "classifier": 0.1,
             "nli": 0.8,
-            "attribution": None,  # C did not run for this sentence
+            "attr_conc": None,  # C did not run for this sentence
+            "attr_loo": None,
             "grounded": 1,
         },
         {
@@ -39,7 +41,8 @@ def test_feature_rows_labels_and_missing_signals() -> None:
             "sentence_id": "sum-0001",
             "classifier": 0.9,
             "nli": 0.2,
-            "attribution": 0.3,
+            "attr_conc": 0.3,
+            "attr_loo": 0.15,
             "grounded": 0,  # marked hallucinated in gold
         },
     ]
diff --git a/tests/test_metrics.py b/tests/test_metrics.py
index c030132..9a9d129 100644
--- a/tests/test_metrics.py
+++ b/tests/test_metrics.py
@@ -6,7 +6,9 @@
 
 from sumlens.eval.metrics import (
     expected_calibration_error,
+    pr_auc,
     reliability_diagram,
+    roc_auc,
     sentence_f1,
 )
 
@@ -41,6 +43,38 @@ def test_ece_empty() -> None:
     assert expected_calibration_error([], []) == 0.0
 
 
+def test_roc_auc_perfect_separation() -> None:
+    assert roc_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]) == 1.0
+
+
+def test_roc_auc_inverted_is_zero() -> None:
+    assert roc_auc([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1]) == 0.0
+
+
+def test_roc_auc_ties_give_half() -> None:
+    # all scores equal -> every pair tied -> AUC 0.5
+    assert roc_auc([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1]) == 0.5
+
+
+def test_roc_auc_single_class_returns_zero() -> None:
+    assert roc_auc([0.1, 0.9], [1, 1]) == 0.0
+
+
+def test_pr_auc_perfect_separation() -> None:
+    assert pr_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]) == 1.0
+
+
+def test_pr_auc_floor_is_base_rate() -> None:
+    # best ranking: the only positive is first -> precision 1 at recall 1 (ceiling)
+    assert pr_auc([0.4, 0.3, 0.2, 0.1], [1, 0, 0, 0]) == pytest.approx(1.0)
+    # worst ranking: the only positive is last -> precision 1/4 at recall 1
+    assert pr_auc([0.4, 0.3, 0.2, 0.1], [0, 0, 0, 1]) == pytest.approx(0.25)
+
+
+def test_pr_auc_no_positives_returns_zero() -> None:
+    assert pr_auc([0.1, 0.9], [0, 0]) == 0.0
+
+
 def test_reliability_diagram_writes_file(tmp_path: Path) -> None:
     out = tmp_path / "reliability.png"
     reliability_diagram([0.1, 0.4, 0.9, 0.95], [0, 0, 1, 1], out)
diff --git a/tests/test_support.py b/tests/test_support.py
new file mode 100644
index 0000000..d8ed9c0
--- /dev/null
+++ b/tests/test_support.py
@@ -0,0 +1,69 @@
+"""Support attribution (signal C) tests — NLI mocked at the `_get_nli` boundary."""
+
+import pytest
+
+from sumlens.signals import support as support_mod
+from sumlens.signals.support import support_attribution
+from sumlens.types import AnalysisConfig, Document, Sentence, Summary
+
+# Entailment lookup: (premise source sentence, hypothesis summary sentence) -> prob.
+_TABLE = {
+    ("Src A.", "Claim one."): 0.9,
+    ("Src B.", "Claim one."): 0.2,
+    ("Src C.", "Claim one."): 0.1,
+}
+
+
+class _FakeNLI:
+    def __call__(
+        self, pairs: list[dict[str, str]], top_k: object = None, batch_size: object = None
+    ) -> list[list[dict[str, object]]]:
+        return [
+            [
+                {"label": "entailment", "score": _TABLE[(p["text"], p["text_pair"])]},
+                {"label": "contradiction", "score": 0.0},
+            ]
+            for p in pairs
+        ]
+
+
+def _document() -> Document:
+    return Document(
+        id="doc-1",
+        raw_text="Src A. Src B. Src C.",
+        sentences=[
+            Sentence(id="src-0000", text="Src A.", char_start=0, char_end=6),
+            Sentence(id="src-0001", text="Src B.", char_start=7, char_end=13),
+            Sentence(id="src-0002", text="Src C.", char_start=14, char_end=20),
+        ],
+        source="text",
+    )
+
+
+def _summary() -> Summary:
+    return Summary(
+        id="doc-1-summary",
+        document_id="doc-1",
+        text="Claim one.",
+        sentences=[Sentence(id="sum-0000", text="Claim one.", char_start=0, char_end=10)],
+        model_name="m",
+    )
+
+
+def test_support_concentration_and_loo(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(support_mod, "_get_nli", lambda model_name: _FakeNLI())
+
+    result = support_attribution(_document(), _summary(), AnalysisConfig())
+
+    conc, loo, top_ids = result["sum-0000"]
+    # row = [0.9, 0.2, 0.1]: top1=0.9, top2=0.2, mean=0.4
+    assert conc == pytest.approx(0.9 - 0.4)  # peak minus mean
+    assert loo == pytest.approx(0.9 - 0.2)  # best-supporter margin
+    assert top_ids[0] == "src-0000"  # strongest supporting source first
+
+
+def test_support_empty_source(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setattr(support_mod, "_get_nli", lambda model_name: _FakeNLI())
+    empty_doc = Document(id="d", raw_text="", sentences=[], source="text")
+    result = support_attribution(empty_doc, _summary(), AnalysisConfig())
+    assert result == {"sum-0000": (0.0, 0.0, [])}