diff --git a/README.md b/README.md
index 7052c65..8331166 100644
--- a/README.md
+++ b/README.md
@@ -9,35 +9,198 @@ potential hallucinations.
> Course project · MICS · Principles of Software Development (S2 2025–26).
-## Status
+---
-Scaffold. Features land via pull requests — see the open issues and `docs/`.
+## Contents
-## Documentation
+- [What SumLens does](#what-sumlens-does)
+- [Hardware requirements](#hardware-requirements)
+- [Installation](#installation)
+- [Running the app](#running-the-app)
+- [Usage walkthrough](#usage-walkthrough)
+- [Interpreting the results](#interpreting-the-results)
+- [Exporting results](#exporting-results)
+- [Development](#development)
-- [`docs/requirements.md`](docs/requirements.md) — functional / non-functional requirements, MoSCoW, user stories, traceability.
-- [`docs/data-model.md`](docs/data-model.md) — canonical data types.
-- [`docs/research-plan.md`](docs/research-plan.md) — signals, fusion, evaluation methodology.
+---
-## Development
+## What SumLens does
+
+SumLens takes a text document (pasted or PDF) and:
+
+1. Summarises it locally using BART (`facebook/bart-large-cnn`) — no external API.
+2. Scores each summary sentence against the source using three signals:
+   - **Signal A — Classifier:** LettuceDetect flags hallucinated tokens.
+   - **Signal B — NLI:** DeBERTa-v3 checks whether atomic claims are entailed by the source.
+   - **Signal C — Attribution:** Inseq integrated gradients measure how much each source span influenced each summary sentence.
+3. Fuses the signals into a single grounding score (0 = hallucinated, 1 = grounded).
+4. Labels each sentence: **grounded**, **weakly grounded**, or **hallucinated**.
+5. Displays the result as a colour-coded summary with click-to-highlight source spans.
+
+---
+
+## Hardware requirements
+
+The pipeline loads three large transformer models. Running on CPU is supported but
+can take several minutes per document.
-Requires Python 3.11+.
+| Setup | RAM | VRAM | Expected time |
+|-------|-----|------|---------------|
+| GPU (recommended) | 16 GB | 8 GB+ | ~30–60 s |
+| CPU-only | 16 GB | — | 3–10 min |
+
+Models are downloaded automatically from Hugging Face on first run (~4 GB total).
+No paid API key is required.
+
+---
+
+## Installation
+
+Requires **Python 3.11+** and **Git**.
```bash
+git clone https://github.com/bacemtayeb/SumLens.git
+cd SumLens
python3.11 -m venv .venv
+
+# Windows
+.venv\Scripts\activate
+# macOS / Linux
source .venv/bin/activate
+
pip install -e ".[dev]"
python -m nltk.downloader punkt punkt_tab
```
-### Quality gate (CI enforces this on every PR)
+---
+
+## Running the app
+
+```bash
+python app.py
+```
+
+Gradio prints a local URL, e.g.:
+
+```
+Running on local URL: http://127.0.0.1:7860
+```
+
+Open that URL in your browser. The app runs entirely on your machine — no data
+leaves your computer.
+
+---
+
+## Usage walkthrough
+
+### Step 1 — Load a document
+
+You have two options:
+
+- **Paste text** — click the *Paste text* box and type or paste your document
+ (up to 10 000 words).
+- **Upload a PDF** — click *or upload PDF* and select a file (up to 5 MB).
+
+If you provide both, the PDF takes priority.
+
+### Step 2 — Analyse
+
+Click the **Analyse** button. The button is disabled while the pipeline runs
+(typically 30–60 s on GPU, longer on CPU). Both export buttons appear once
+analysis is complete.
+
+### Step 3 — Read the summary
+
+The right panel shows the summary with each sentence colour-coded:
+
+| Colour | Label | Meaning |
+|--------|-------|---------|
+| Green | Grounded | Well-supported by the source |
+| Orange | Weakly grounded | Partial support; treat with caution |
+| Red | Hallucinated | Low support; likely fabricated or distorted |
+
+### Step 4 — Trace a sentence to the source
+
+Click any summary sentence. The left panel highlights (in yellow) the source
+sentences most strongly attributed to it by the model. Click a different summary
+sentence to switch the highlight.
+
+### Step 5 — Adjust the thresholds (optional)
+
+Two sliders let you change the decision boundaries without re-running the model:
+
+- **τ hallucinated** (default 0.30) — sentences with a grounding score *below*
+ this value are labelled hallucinated.
+- **τ grounded** (default 0.70) — sentences with a grounding score *at or above*
+  this value are labelled grounded. Anything in between is weakly grounded.
+
+Move either slider and the summary colours update instantly.
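
The relabelling can be sketched as below. It mirrors the comparison in `app.py` (`_apply_tau`): a score strictly below τ hallucinated is hallucinated, and a score at or above τ grounded is grounded. The `label_for` name and the example scores are illustrative only.

```python
def label_for(
    score: float,
    tau_hallucinated: float = 0.30,
    tau_grounded: float = 0.70,
) -> str:
    """Turn a stored grounding score into a label without re-running any model."""
    if score < tau_hallucinated:
        return "hallucinated"
    if score >= tau_grounded:
        return "grounded"
    return "weak"


scores = [0.12, 0.55, 0.91]  # made-up grounding scores

print([label_for(s) for s in scores])
# ['hallucinated', 'weak', 'grounded']

# Raising the lower threshold moves the middle sentence into "hallucinated".
print([label_for(s, tau_hallucinated=0.60) for s in scores])
# ['hallucinated', 'hallucinated', 'grounded']
```

Because only the comparison changes, moving a slider is cheap: the scores are computed once per analysis and reused.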
+
+---
+
+## Interpreting the results
+
+- The **grounding score** (0–1) represents how strongly the model believes a
+  summary sentence is supported by the source. It is not a probability in a strict
+  statistical sense; treat it as a relative indicator of support, where a lower
+  score means a higher hallucination risk.
+- A **hallucinated** label does not guarantee the sentence is wrong; it means the
+ model could not find sufficient evidence in the source text. Always cross-check
+ flagged sentences manually.
+- The **signal breakdown** (JSON export) shows the individual classifier, NLI, and
+ attribution scores for each sentence, which can help diagnose *why* a sentence
+ was flagged.
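
A minimal sketch of reading the signal breakdown from an exported file. The field names follow the attributes used in `app.py` (`verdicts`, `sentence_id`, `fused_score`, `signals`), but the sample payload, its IDs, and all scores are invented for illustration. Check `sumlens/types.py` for the actual schema.

```python
import json

# Invented excerpt of an export; a real file contains more fields.
SAMPLE_EXPORT = """
{
  "summary": {"sentences": [
    {"id": "s0", "text": "Parliament debated the national budget on Monday."},
    {"id": "s1", "text": "The bill passed with a majority of 312 votes."}
  ]},
  "verdicts": [
    {"sentence_id": "s0", "label": "grounded", "fused_score": 0.91,
     "signals": {"classifier": 0.95, "nli": 0.88, "attribution": 0.74}},
    {"sentence_id": "s1", "label": "hallucinated", "fused_score": 0.08,
     "signals": {"classifier": 0.05, "nli": 0.10, "attribution": null}}
  ]
}
"""


def flagged_sentences(export: dict) -> list[dict]:
    """Return every sentence that is not labelled grounded, with its signals."""
    text_by_id = {s["id"]: s["text"] for s in export["summary"]["sentences"]}
    return [
        {
            "text": text_by_id.get(v["sentence_id"], ""),
            "label": v["label"],
            "fused_score": v["fused_score"],
            "signals": v["signals"],
        }
        for v in export["verdicts"]
        if v["label"] != "grounded"
    ]


for row in flagged_sentences(json.loads(SAMPLE_EXPORT)):
    print(row["label"], row["fused_score"], row["text"])
    for name, value in row["signals"].items():
        # A null signal means that score was not available for the sentence.
        print(f"  {name}: {'-' if value is None else value}")
```

Comparing the three values side by side shows whether one signal drove the verdict or all of them agreed.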
+
+---
+
+## Exporting results
+
+Two download buttons appear after analysis:
+
+- **Export JSON** — downloads the full `AnalysisResult` as a JSON file. The schema
+ matches `sumlens/types.py` and round-trips via `AnalysisResult.model_validate()`.
+- **Export PDF** — downloads a human-readable PDF containing the colour-annotated
+ summary, a legend, and a per-sentence signal-scores table.
+
+---
+
+## Development
+
+### Quality gate (CI enforces this on every PR)
```bash
-ruff check . && mypy sumlens tests && pytest -q --cov=sumlens --cov-fail-under=70
+ruff check . && mypy sumlens tests app.py && pytest -q --cov=sumlens --cov-fail-under=70
```
-Lint (ruff), type-check (mypy, strict), and tests with a ≥70% coverage gate must
-pass before any PR is merged to `main`.
+Lint (ruff), strict type-check (mypy), and tests with a ≥ 70 % coverage gate
+must pass before any PR is merged.
+
+### Project layout
+
+```
+sumlens/
+  types.py          # canonical data model (AnalysisResult, etc.)
+  ingest.py         # PDF / text → Document
+  summarise.py      # BART summarisation
+  signals/
+    classifier.py   # Signal A — LettuceDetect
+    nli.py          # Signal B — DeBERTa NLI
+    attribution.py  # Signal C — Inseq attribution
+  fuse.py           # logistic-regression fusion + Platt calibration
+  pipeline.py       # orchestrates ingest → summarise → signals → fuse
+app.py              # Gradio UI entry point
+tests/              # pytest suite (all models mocked)
+docs/               # requirements, data model, research plan
+```
+
+### Documentation
+
+- [`docs/requirements.md`](docs/requirements.md) — functional / non-functional requirements, MoSCoW, user stories, traceability.
+- [`docs/data-model.md`](docs/data-model.md) — canonical data types and JSON schema.
+- [`docs/research-plan.md`](docs/research-plan.md) — signals, fusion, evaluation methodology.
+- [`docs/mockup.html`](docs/mockup.html) — static HTML wireframe of the two-panel dashboard UI (open in any browser).
+- [`docs/use-case.puml`](docs/use-case.puml) — PlantUML use-case diagram (UC-01 "Verify a Summary").
+
+---
## License
diff --git a/app.py b/app.py
new file mode 100644
index 0000000..62c9b63
--- /dev/null
+++ b/app.py
@@ -0,0 +1,344 @@
+"""Gradio entry point — thin UI over `pipeline.analyse`.
+
+All logic lives in the `sumlens` library; this module only ingests the user's
+input, runs the pipeline, and shapes the result for display.
+"""
+
+from __future__ import annotations
+
+import html as _html
+import tempfile
+from pathlib import Path
+from typing import Any
+
+from sumlens.ingest import load_pdf, load_text
+from sumlens.pipeline import analyse
+from sumlens.types import AnalysisConfig, AnalysisResult, Document
+
+_LABEL_COLORS: dict[str, str] = {
+    "grounded": "green",
+    "weak": "orange",
+    "hallucinated": "red",
+}
+_MAX_WORDS = 10_000
+_MAX_PDF_BYTES = 5 * 1024 * 1024 # 5 MB
+_SOURCE_PLACEHOLDER = "<p>Load a document to see the source text here.</p>"
+
+
+def _validate_text(text: str) -> str:
+    text = text.strip()
+    if not text:
+        raise ValueError("Input is empty. Please paste some text or upload a PDF.")
+    word_count = len(text.split())
+    if word_count > _MAX_WORDS:
+        raise ValueError(
+            f"Input is too long ({word_count:,} words). Maximum is {_MAX_WORDS:,} words."
+        )
+    return text
+
+
+def _to_highlighted(result: AnalysisResult) -> list[tuple[str, str]]:
+    """Summary sentences as (text, label) spans for gr.HighlightedText colour bands."""
+    labels = {v.sentence_id: v.label for v in result.verdicts}
+    return [(f"{s.text} ", labels.get(s.id, "weak")) for s in result.summary.sentences]
+
+
+def _render_source_html(document: Document, highlighted_ids: set[str]) -> str:
+    """Source sentences as HTML; sentences in highlighted_ids get a yellow mark."""
+    if not document.sentences:
+        return f"<p>{_html.escape(document.raw_text)}</p>"
+    parts = []
+    for sentence in document.sentences:
+        safe = _html.escape(sentence.text)
+        if sentence.id in highlighted_ids:
+            parts.append(f"<mark>{safe}</mark>")
+        else:
+            parts.append(safe)
+    return "<p>" + " ".join(parts) + "</p>"
+
+
+def _apply_tau(
+    result: AnalysisResult | None,
+    tau_grounded: float,
+    tau_hallucinated: float,
+) -> list[tuple[str, str]] | None:
+    """Re-label summary sentences from stored fused scores without re-running the model."""
+    if result is None:
+        return None
+    scores = {v.sentence_id: v.fused_score for v in result.verdicts}
+
+    def _label(score: float) -> str:
+        if score < tau_hallucinated:
+            return "hallucinated"
+        if score >= tau_grounded:
+            return "grounded"
+        return "weak"
+
+    return [(f"{s.text} ", _label(scores.get(s.id, 0.5))) for s in result.summary.sentences]
+
+
+def _latin1(text: str) -> str:
+    """Replace characters outside Latin-1 with '?' so fpdf2 core fonts don't error."""
+    return text.encode("latin-1", errors="replace").decode("latin-1")
+
+
+_PDF_COLORS: dict[str, tuple[int, int, int]] = {
+    "grounded": (220, 252, 231),
+    "weak": (255, 237, 213),
+    "hallucinated": (254, 226, 226),
+}
+
+
+def _export_pdf(result: AnalysisResult | None) -> str | None:
+    if result is None:
+        return None
+    from fpdf import FPDF
+
+    pdf = FPDF()
+    pdf.add_page()
+
+    pdf.set_font("Helvetica", "B", 16)
+    pdf.cell(0, 10, "SumLens Analysis Report", new_x="LMARGIN", new_y="NEXT")
+    pdf.ln(3)
+
+    pdf.set_font("Helvetica", size=9)
+    pdf.set_text_color(100, 100, 100)
+    pdf.cell(
+        0, 6,
+        _latin1(f"Source: {result.document.source} | Model: {result.summary.model_name}"),
+        new_x="LMARGIN", new_y="NEXT",
+    )
+    pdf.set_text_color(0, 0, 0)
+    pdf.ln(4)
+
+    pdf.set_font("Helvetica", "B", 11)
+    pdf.cell(0, 8, "Annotated Summary", new_x="LMARGIN", new_y="NEXT")
+    pdf.ln(1)
+
+    verdict_map = {v.sentence_id: v for v in result.verdicts}
+    pdf.set_font("Helvetica", size=10)
+    for sentence in result.summary.sentences:
+        verdict = verdict_map.get(sentence.id)
+        label = verdict.label if verdict else "weak"
+        r, g, b = _PDF_COLORS.get(label, (240, 240, 240))
+        pdf.set_fill_color(r, g, b)
+        pdf.multi_cell(0, 7, _latin1(sentence.text), fill=True, new_x="LMARGIN", new_y="NEXT")
+        pdf.ln(1)
+
+    pdf.ln(4)
+    pdf.set_font("Helvetica", "B", 10)
+    pdf.cell(0, 7, "Legend", new_x="LMARGIN", new_y="NEXT")
+    pdf.set_font("Helvetica", size=9)
+    for lbl, (r, g, b) in _PDF_COLORS.items():
+        pdf.set_fill_color(r, g, b)
+        pdf.cell(6, 5, "", fill=True)
+        pdf.cell(0, 5, f" {lbl.capitalize()}", new_x="LMARGIN", new_y="NEXT")
+
+    pdf.ln(5)
+    pdf.set_font("Helvetica", "B", 10)
+    pdf.cell(0, 7, "Signal Scores", new_x="LMARGIN", new_y="NEXT")
+    pdf.set_font("Helvetica", size=8)
+    col_w = [80, 25, 20, 22, 15, 22]
+    headers = ["Sentence", "Label", "Fused", "Classifier", "NLI", "Attribution"]
+    for w, h in zip(col_w, headers, strict=True):
+        pdf.cell(w, 6, h, border=1)
+    pdf.ln()
+    for sentence in result.summary.sentences:
+        v = verdict_map.get(sentence.id)
+        if v is None:
+            continue
+        truncated = sentence.text[:45] + "..." if len(sentence.text) > 45 else sentence.text
+        row = [
+            _latin1(truncated),
+            v.label,
+            f"{v.fused_score:.2f}",
+            f"{v.signals.classifier:.2f}" if v.signals.classifier is not None else "-",
+            f"{v.signals.nli:.2f}" if v.signals.nli is not None else "-",
+            f"{v.signals.attribution:.2f}" if v.signals.attribution is not None else "-",
+        ]
+        for w, cell in zip(col_w, row, strict=True):
+            pdf.cell(w, 6, cell, border=1)
+        pdf.ln()
+
+    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
+    try:
+        pdf.output(tmp.name)
+    finally:
+        tmp.close()
+    return tmp.name
+
+
+def _export_json(result: AnalysisResult | None) -> str | None:
+    if result is None:
+        return None
+    tmp = tempfile.NamedTemporaryFile(
+        mode="w", suffix=".json", delete=False, encoding="utf-8"
+    )
+    try:
+        tmp.write(result.model_dump_json(indent=2))
+    finally:
+        tmp.close()
+    return tmp.name
+
+
+def run(
+    text: str,
+    pdf_file: str | None,
+) -> tuple[AnalysisResult, str, list[tuple[str, str]], dict[str, Any]]:
+    if pdf_file:
+        path = Path(pdf_file)
+        if path.stat().st_size > _MAX_PDF_BYTES:
+            raise ValueError(
+                f"PDF is too large ({path.stat().st_size / 1_048_576:.1f} MB). "
+                "Maximum is 5 MB."
+            )
+        document = load_pdf(path)
+    else:
+        document = load_text(_validate_text(text))
+
+    result = analyse(document, AnalysisConfig())
+    source_html = _render_source_html(document, set())
+    return result, source_html, _to_highlighted(result), result.model_dump()
+
+
+def build_app() -> Any:
+    import gradio as gr
+
+    with gr.Blocks(title="SumLens", theme=gr.themes.Soft()) as demo:
+        gr.Markdown(
+            "# SumLens — Summary Faithfulness Dashboard\n"
+            "Paste text or upload a PDF. SumLens summarises it and flags sentences "
+            "that may be hallucinated.\n\n"
+            "**Green** = grounded · **Orange** = weakly grounded · **Red** = hallucinated \n"
+            "Click a summary sentence to highlight its attributed source spans."
+        )
+
+        result_state: gr.State = gr.State(value=None)
+
+        with gr.Row():
+            with gr.Column():
+                gr.Markdown("### Source document")
+                source_html_out = gr.HTML(value=_SOURCE_PLACEHOLDER)
+
+            with gr.Column():
+                gr.Markdown("### Summary with faithfulness highlights")
+                summary_out = gr.HighlightedText(
+                    label="Summary (click a sentence to highlight source spans)",
+                    color_map=_LABEL_COLORS,
+                    combine_adjacent=False,
+                    show_legend=True,
+                )
+
+        with gr.Row():
+            tau_h_slider = gr.Slider(
+                minimum=0.0, maximum=1.0, value=0.30, step=0.05,
+                label="τ hallucinated — below this → hallucinated (default 0.30)",
+            )
+            tau_g_slider = gr.Slider(
+                minimum=0.0, maximum=1.0, value=0.70, step=0.05,
+                label="τ grounded — above this → grounded (default 0.70)",
+            )
+
+        with gr.Row():
+            text_in = gr.Textbox(
+                label="Paste text",
+                lines=6,
+                placeholder="Paste your document here…",
+            )
+            pdf_in = gr.File(
+                label="or upload PDF (≤ 5 MB)",
+                file_types=[".pdf"],
+                type="filepath",
+            )
+
+        with gr.Row():
+            submit = gr.Button("Analyse", variant="primary")
+            json_dl = gr.DownloadButton("Export JSON", visible=False)
+            pdf_dl = gr.DownloadButton("Export PDF", visible=False)
+
+        error_box = gr.Markdown(value="", visible=False)
+
+        with gr.Accordion("Full result (JSON viewer)", open=False):
+            json_out = gr.JSON(label="AnalysisResult")
+
+        def _handle(
+            text: str, pdf_file: str | None
+        ) -> tuple[Any, Any, Any, Any, Any, Any, Any, Any]:
+            try:
+                result, source_html, highlighted, payload = run(text, pdf_file)
+                json_path = _export_json(result)
+                pdf_path = _export_pdf(result)
+                return (
+                    result,
+                    source_html,
+                    highlighted,
+                    payload,
+                    gr.update(value=json_path, visible=True),
+                    gr.update(value=pdf_path, visible=True),
+                    gr.update(value="", visible=False),
+                    gr.update(interactive=True),
+                )
+            except ValueError as exc:
+                return (
+                    None,
+                    _SOURCE_PLACEHOLDER,
+                    None,
+                    None,
+                    gr.update(visible=False),
+                    gr.update(visible=False),
+                    gr.update(value=f"**Error:** {exc}", visible=True),
+                    gr.update(interactive=True),
+                )
+
+        def _on_sentence_select(
+            evt: Any, result: AnalysisResult | None
+        ) -> str:
+            if result is None:
+                return _SOURCE_PLACEHOLDER
+            idx: int = int(evt.index)
+            sentences = result.summary.sentences
+            if idx >= len(sentences):
+                return _render_source_html(result.document, set())
+            sentence_id = sentences[idx].id
+            verdict = next(
+                (v for v in result.verdicts if v.sentence_id == sentence_id), None
+            )
+            highlighted = (
+                set(verdict.evidence.top_source_sentence_ids) if verdict else set()
+            )
+            return _render_source_html(result.document, highlighted)
+
+        submit.click(
+            fn=lambda: gr.update(interactive=False),
+            inputs=[],
+            outputs=[submit],
+        ).then(
+            fn=_handle,
+            inputs=[text_in, pdf_in],
+            outputs=[
+                result_state, source_html_out, summary_out,
+                json_out, json_dl, pdf_dl, error_box, submit,
+            ],
+        )
+
+        summary_out.select(
+            fn=_on_sentence_select,
+            inputs=[result_state],
+            outputs=[source_html_out],
+        )
+
+        for slider in (tau_h_slider, tau_g_slider):
+            slider.change(
+                fn=_apply_tau,
+                inputs=[result_state, tau_g_slider, tau_h_slider],
+                outputs=[summary_out],
+            )
+
+    return demo
+
+
+if __name__ == "__main__":
+    build_app().launch()
diff --git a/docs/data-model.md b/docs/data-model.md
index 81e2541..0a0317c 100644
--- a/docs/data-model.md
+++ b/docs/data-model.md
@@ -126,7 +126,7 @@ class AnalysisConfig(BaseModel):
## 3. Module interfaces — function signatures
These are the only public functions each module exposes. Anything else is private.
-Stick to these signatures; if Claude Code drifts, point it back here.
+Stick to these signatures; if an implementation drifts, point it back here.
### `ingest.py`
```python
@@ -233,7 +233,7 @@ This CSV is the centrepiece table of the report.
---
-## 6. Order of implementation (Claude Code's worklist)
+## 6. Order of implementation
1. `types.py` — write this first, completely. Everything else imports from it.
2. `ingest.py` + tests against a fixture PDF and a fixture string.
@@ -248,4 +248,4 @@ This CSV is the centrepiece table of the report.
11. `scripts/train_fusion.py` — fits the LR, pickles it, replaces the identity fusion.
12. Real models swapped in last, one signal at a time, verifying on HPC.
-**Rule for Claude Code: never run the real models in tests.** Mock at the module boundary. Real-model runs happen only in `scripts/evaluate.py` on HPC.
+**Rule: never run the real models in tests.** Mock at the module boundary. Real-model runs happen only in `scripts/evaluate.py` on HPC.
diff --git a/docs/mockup.html b/docs/mockup.html
new file mode 100644
index 0000000..3679672
--- /dev/null
+++ b/docs/mockup.html
@@ -0,0 +1,334 @@
+
+
+
+
+
+ SumLens — UI Mockup
+
+
+
+
+
+
+ SumLens — Summary Faithfulness Dashboard
+ Paste text or upload a PDF · SumLens summarises it and flags sentences that may be hallucinated
+
+ Grounded
+ Weakly grounded
+ Hallucinated
+
+ Click a summary sentence to highlight its source spans
+
+
+
+
+
+
+
+
+
+ Source document
+
+ The parliament met on Monday to discuss the proposed national budget for the
+ coming fiscal year. Lawmakers from every party debated the spending
+ priorities for several hours without reaching a clear consensus on the final
+ allocations. The finance minister presented projections covering health,
+ education, and transport infrastructure across the regions. Several members
+ raised concerns about the long-term sustainability of the proposed deficit
+ levels. No final figure for total expenditure was announced to the press by
+ the end of the day.
+
+ ↑ highlighted span: top attributed source sentence for selected summary sentence
+
+
+
+
+ Summary — click a sentence to trace source spans
+
+
+ Parliament debated the national budget on Monday.
+
+
+ The bill passed with a majority of 312 votes.
+
+
+ The finance minister outlined spending plans for key sectors.
+
+
+ Sustainability of deficit levels was questioned by members.
+
+