A free, private tool for removing student personal information from assessment documents — before you share them with anyone.
Built for Australian teachers, psychologists, and support staff who handle sensitive student records. Everything runs on your own computer — Mac or Windows. No accounts. No subscriptions. No data ever leaves your machine.
Download the latest version for your platform from GitHub Releases:
| Platform | File | Notes |
|---|---|---|
| macOS (Apple Silicon) | `.dmg` installer | Drag to Applications. First launch shows a one-time security prompt; see Security warnings on first launch. |
| Windows (64-bit) | `.exe` installer | Run the installer. First launch may show "Windows protected your PC"; see Security warnings on first launch. |

LibreOffice (free) is required to process Word documents. The app will prompt you to install it on first launch if it's missing. Download LibreOffice. If you only work with PDFs, LibreOffice is not needed.
If you prefer to run from source, see Installation Guide or Desktop App (Developer) below.
This release is all about reliability and safety — making the app harder to trip up, and clearer when something needs your attention. The way you use it is exactly the same. Most important for a tool that handles students' personal information: when redaction can't be done properly, the app now tells you, instead of quietly handing you a file that might still contain personal details.
- Safer redaction — If a document can't be fully redacted, it's now clearly reported as failed rather than saved as a file that looks finished but might still contain personal information.
- Scanned pages protected — If a page is an image-only scan and the text-reader (OCR) isn't available, the app stops and tells you instead of skipping that page silently.
- No half-finished files — If something fails partway through, the incomplete file is cleaned up automatically, so it can't be mistaken for a finished one.
- More accurate name matching — Short names (like "Joe") no longer accidentally remove unrelated words (like "Joelle") in PDF form fields.
- No more endless spinning — If the engine ever stops responding, the app times out and lets you try again instead of hanging forever.
- Recovers from engine hiccups — If the background engine stops unexpectedly, the app notices and prompts you to reopen it.
- Friendlier messages and a daily check for updates while the app is open.
Earlier reliability work from v1.3.0 — plain-English error messages, the engine startup banner, the "Something went wrong" recovery screen, cancel-and-clean-up, and the nothing-to-redact screen — is all still here.
When sharing student assessment reports — with other schools, services, or agencies — Australian privacy law and professional ethics require that identifying information be removed. Doing this manually is slow, error-prone, and stressful.
The Bulk Redaction Tool automates this process. You point it at a folder of documents, tell it the student's name, and it:
- Finds every piece of personally identifiable information (PII) in the documents
- Shows you each item for approval — you stay in control
- Burns the approved items out of the PDFs permanently (not just visually covered — the text is gone)
- Saves redacted copies alongside the originals, which are never touched
- Produces a full audit log of everything that was redacted
Plain English: It's like using a black marker on paper, except it works on PDFs and Word documents, it finds things you might miss, and it can't be undone by selecting the text.
| | Desktop App | Streamlit (browser) |
|---|---|---|
| What | Native app for Mac and Windows (Electron) | Opens in your web browser |
| Best for | Everyday use | Developers / advanced users |
| Run | Double-click the installed app, or `cd desktop && npm run dev:electron` | `source venv/bin/activate && streamlit run app.py` |
| Status | Primary, actively developed | Legacy, still works, not the focus |
Both use the same detection and redaction engines underneath.
[Screenshots: Step 1, Select Folder & Student Details; Step 2, Document Conversion; Step 4, Final Confirmation; Redaction in Progress]
This matters most for a tool handling children's data.
| Guarantee | Detail |
|---|---|
| Original files never modified | Redacted copies are saved separately. Your source documents are untouched. |
| Text is permanently destroyed | Redacted text cannot be recovered via copy/paste, search, or any PDF tool. It is not hidden — it is gone. |
| Metadata is stripped | Author names, dates, and hidden document properties (XMP data) are removed from output PDFs. |
| 100% local processing | No internet connection required. Your documents never leave your computer. |
| No accounts or cloud services | Nothing is uploaded anywhere. Ever. |
| Scanned pages handled | Image-only pages (scans) are redacted via OCR + image rewriting. No page is left behind. |
| Redaction verified | After redaction, the tool re-scans the output at 300 DPI to confirm the text is visually gone. |
| Form fields cleaned | Interactive PDF form fields (AcroForm widgets) containing PII are deleted — not just hidden. |
| Signatures detected | Handwritten signature images are automatically identified and blacked out using heuristic analysis. |
| Full audit trail | A redaction_log.txt records every item redacted, with page numbers and confidence levels. |
```mermaid
flowchart TD
    S["0. First-Run Setup\nLibreOffice check - first launch only\nSkip if you only work with PDFs"] --> A
    A["1. Select Folder\nChoose a folder of student documents"] --> B
    B["2. Enter Student Details\nName, parent names (optional)"] --> C
    C["3. Convert Documents\nWord files to PDF automatically"] --> D
    D["4. Detect PII\n2 detection engines run in parallel"] --> E
    E["5. You Review Each Item\nApprove or reject every finding"] --> F
    F["6. Confirm\nSummary of what will be redacted"] --> G
    G["7. Apply Redactions\nPermanent, verified, metadata-stripped"] --> H
    H["8. Done\nRedacted files + audit log saved"]
    style S fill:#94A3B8,color:#fff
    style A fill:#4A90D9,color:#fff
    style B fill:#4A90D9,color:#fff
    style C fill:#7B68EE,color:#fff
    style D fill:#E8A838,color:#fff
    style E fill:#E8A838,color:#fff
    style F fill:#5BAD6F,color:#fff
    style G fill:#5BAD6F,color:#fff
    style H fill:#5BAD6F,color:#fff
```
The tool uses two detection engines simultaneously, then merges and deduplicates the results. This dual approach catches far more than either method alone:
```mermaid
graph TD
    DOC["Document Text"] --> RE
    DOC --> PR
    RE["Regex Engine\nAustralian phone, Medicare,\naddress, DOB patterns\n+ names you enter"]
    PR["Presidio + spaCy NER\nMicrosoft's AI recogniser\n+ 6 custom AU patterns\n+ automatic name variations"]
    RE --> MRG
    PR --> MRG
    MRG["Merge and Deduplicate\nHighest confidence wins\nwhen engines overlap"]
    MRG --> OUT["PII Matches\nRanked by confidence (0.0 - 1.0)\nReady for your review"]
    style RE fill:#4A90D9,color:#fff
    style PR fill:#7B68EE,color:#fff
    style MRG fill:#888,color:#fff
    style OUT fill:#5BAD6F,color:#fff
```
Why two engines?
- Regex is fast and precise for structured data (phone numbers, Medicare numbers, addresses) and for the names you enter
- Presidio + spaCy uses machine learning to recognise names and locations even when they appear in unexpected formats. When it finds a name, it automatically generates variations — first name, last name, initials — and searches for those too, so informal mentions aren't missed
All detection is tuned for Australian documents and naming conventions.
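The merge step can be pictured with a short sketch. The `Match` dataclass and `merge_matches` function are hypothetical names for illustration, not the tool's actual `PIIMatch` API, and treating any shared character as an overlap is an assumption about how "overlap" is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for one detected PII item."""
    start: int         # character offset where the match begins
    end: int           # character offset just past the match
    text: str
    category: str
    confidence: float  # 0.0 (uncertain) to 1.0 (certain)
    engine: str        # "regex" or "presidio"


def overlaps(a: Match, b: Match) -> bool:
    """True when the two matches share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_matches(regex_hits: list[Match], presidio_hits: list[Match]) -> list[Match]:
    """Combine both engines' results; on overlap, the higher confidence wins."""
    kept: list[Match] = []
    # Visit the most confident matches first so they claim their span.
    for cand in sorted(regex_hits + presidio_hits, key=lambda m: -m.confidence):
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return kept  # already ordered from highest to lowest confidence


regex_hits = [Match(0, 10, "Joe Bloggs", "Student name", 1.00, "regex")]
presidio_hits = [
    Match(0, 3, "Joe", "PERSON", 0.85, "presidio"),
    Match(20, 26, "Sydney", "LOCATION", 0.60, "presidio"),
]
merged = merge_matches(regex_hits, presidio_hits)
```

Here the Presidio hit for "Joe" is dropped because it overlaps the higher-confidence regex hit for "Joe Bloggs", while the non-overlapping location survives.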
| Category | Examples | Engine |
|---|---|---|
| Student name (all variations) | Full name, first name, last name, initials | Regex + Presidio |
| Parent / guardian names | Names provided by you, or found near keywords like "Mother:", "Father:" | Regex + Presidio |
| Family member names | Siblings, carers, emergency contacts | Regex + Presidio |
| Organisation names | Schools, clinics, hospitals — user-provided, word-level matching | Regex |
| Phone numbers | Mobile (04xx), landline, +61 format | Regex + Presidio |
| Email addresses | Any format | Regex |
| Home address | Street, suburb, state, postcode | Regex + Presidio |
| Date of birth | Only when labelled (DOB:, Date of Birth:, etc.) | Regex |
| Medicare number | 10-digit format, only when "Medicare" appears nearby | Regex + Presidio |
| Centrelink CRN | 9-character reference, only when labelled | Regex |
| Student ID | 3-letter prefix + 3 or more digits | Regex |
| Person names (unlabelled) | AI-detected names in free text | Presidio |
| Location mentions | Suburb and location references | Presidio |
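To make the "only when labelled" and "only when nearby" rules in the table concrete, here is a minimal regex sketch. The patterns, the 40-character context window, and the function name `find_medicare` are illustrative assumptions; the tool's real patterns may be stricter or looser.

```python
import re

# Australian mobile: 04xx... or +61 4xx..., digits optionally separated by spaces.
MOBILE_RE = re.compile(r"(?:\+61[ ]?4|\b04)(?:[ ]?\d){8}\b")

# Date of birth: captured only when a DOB-style label comes first.
DOB_RE = re.compile(
    r"\b(?:DOB|D\.O\.B\.?|Date of Birth)\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})",
    re.IGNORECASE,
)

# Medicare: 10 digits, accepted only if "Medicare" appears close by.
MEDICARE_RE = re.compile(r"\b\d{4}[ ]?\d{5}[ ]?\d\b")

# Student ID: 3-letter prefix followed by 3 or more digits.
STUDENT_ID_RE = re.compile(r"\b[A-Z]{3}\d{3,}\b")


def find_medicare(text: str, window: int = 40) -> list[str]:
    """Return 10-digit numbers that have the word 'Medicare' within `window` characters."""
    hits = []
    for m in MEDICARE_RE.finditer(text):
        context = text[max(0, m.start() - window): m.end() + window]
        if "medicare" in context.lower():
            hits.append(m.group())
    return hits


sample = (
    "DOB: 03/04/2012. Mobile 0412 345 678. Medicare no. 2123 45670 1. "
    "ID ABC12345. Assessed 01/02/2024."
)
dobs = DOB_RE.findall(sample)
mobiles = MOBILE_RE.findall(sample)
medicare = find_medicare(sample)
student_ids = STUDENT_ID_RE.findall(sample)
```

The unlabelled assessment date in the sample is left alone, which mirrors the rule that dates are only treated as PII when labelled as a date of birth.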
The tool doesn't just search for the exact name you typed. It automatically generates variations of the student name and checks for all of them:
| Input | Variations Generated |
|---|---|
| `Joe Bloggs` | "Joe Bloggs", "Joe", "Bloggs", "J Bloggs", "J. Bloggs", "JB", "J.B." |
It also handles:
- Possessive forms: "Joe's" and "Joe’s" (curly apostrophe) are matched as "Joe"
- Contextual family detection: If a line contains "(mother)" or "(father)", nearby names are flagged even without explicit labels
- Parenthetical name patterns: "Joe (parent: Sarah Bloggs)" catches both the student and parent name
- Short names preserved: Even 2-character names like "Jo" are matched if they exactly match the student name you entered
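A rough sketch of how variation generation and whole-word matching might fit together. `name_variations` and `name_pattern` are hypothetical helpers, and deriving initials from the first letters of the first and last name is an assumption; nickname handling and the contextual family rules are not shown.

```python
import re


def name_variations(full_name: str) -> list[str]:
    """Return the spellings to search for, given a 'First Last' name."""
    parts = full_name.split()
    if len(parts) < 2:
        return parts  # single-word name: nothing to derive
    first, last = parts[0], parts[-1]
    fi, li = first[0].upper(), last[0].upper()
    candidates = [
        " ".join(parts),  # full name as entered
        first,
        last,
        f"{fi} {last}",
        f"{fi}. {last}",
        f"{fi}{li}",      # initials
        f"{fi}.{li}.",
    ]
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order


def name_pattern(name: str) -> re.Pattern[str]:
    """Whole-word match for one variation, with an optional possessive."""
    # The lookarounds stop "Joe" from matching inside "Joelle"; the optional
    # group accepts a straight or curly (U+2019) apostrophe followed by "s".
    return re.compile(rf"(?<!\w){re.escape(name)}(?:['\u2019]s)?(?!\w)")


variations = name_variations("Joe Bloggs")
text = "Joe's report. Joelle helped Joe\u2019s sister; Joe agreed."
found = name_pattern("Joe").findall(text)
```

Lookarounds are used instead of `\b` so that variations ending in a full stop, such as "J.B.", still match when followed by a space.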
If the student's name appears in the filename of a document (e.g. Joe_Bloggs_Assessment.pdf), the output file's name will have the PII replaced with [REDACTED]:
```
Input:  Joe_Bloggs_Assessment.pdf
Output: [REDACTED]_[REDACTED]_Assessment_redacted.pdf
```
This prevents accidental disclosure through file names in shared folders or email attachments.
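The renaming rule can be approximated in a few lines. `redact_filename` is a hypothetical helper, and matching name parts case-insensitively between non-letter separators is an assumption about how the tool tokenises file names.

```python
import re
from pathlib import Path


def redact_filename(filename: str, name_parts: list[str]) -> str:
    """Replace each name part in the file stem and append the _redacted suffix."""
    path = Path(filename)
    stem = path.stem
    for part in name_parts:
        # Treat anything that is not a letter (such as "_" or "-") as a separator,
        # so "Joe" is replaced in "Joe_Bloggs" but not inside "Joelle".
        pattern = rf"(?<![A-Za-z]){re.escape(part)}(?![A-Za-z])"
        stem = re.sub(pattern, "[REDACTED]", stem, flags=re.IGNORECASE)
    return f"{stem}_redacted{path.suffix}"


result = redact_filename("Joe_Bloggs_Assessment.pdf", ["Joe", "Bloggs"])
```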
The tool uses three redaction strategies, chosen by the type of content on each PDF page (text layer, scanned image, or form fields), plus a signature check that runs on every page. This happens automatically; you don't need to choose.
Most PDFs have a searchable text layer. For these pages:
- The tool searches the text layer for each approved PII item
- It draws a redaction annotation over the matching text
- It applies the redaction — permanently destroying the underlying text
- The redacted area becomes a solid black rectangle
This uses PyMuPDF's apply_redactions() with images=PDF_REDACT_IMAGE_NONE — meaning images on text-layer pages are never touched, only the text.
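As a minimal sketch of the text-layer path, assuming PyMuPDF is installed: the function name `redact_text_layer`, the black fill, and the save options are illustrative choices, and the tool's own code may be organised differently. The PyMuPDF calls used (`search_for`, `add_redact_annot`, `apply_redactions`) are the library's standard redaction API.

```python
def redact_text_layer(src_path: str, dst_path: str, approved_items: list[str]) -> int:
    """Redact every occurrence of the approved items; return how many areas were redacted."""
    import fitz  # PyMuPDF, imported lazily so the sketch loads without it

    doc = fitz.open(src_path)
    count = 0
    for page in doc:
        for item in approved_items:
            # search_for returns one rectangle per occurrence on the page
            for rect in page.search_for(item):
                page.add_redact_annot(rect, fill=(0, 0, 0))  # solid black box
                count += 1
        # Remove the text under the annotations; leave images untouched.
        page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
    # Write to a new file so the original is never modified.
    doc.save(dst_path, garbage=4, deflate=True)
    doc.close()
    return count
```

Calling `redact_text_layer("report.pdf", "report_redacted.pdf", ["Joe Bloggs"])` would write a separate redacted copy; the file names here are placeholders.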
Scanned documents (where each page is a photograph or scan) have no text layer — the words exist only as pixels in an image. For these pages:
- The page is rendered at 300 DPI to a high-resolution image
- Tesseract OCR reads every word and its position on the page
- Each OCR word is compared against the approved PII list using intelligent matching:
  - Punctuation-stripped comparison (handles "Joe," matching "Joe")
  - Possessive handling ("Joe's" matches "Joe")
  - Special character preservation for emails/URLs ("joe@email.com" matched as-is)
- Matching words are blacked out by drawing filled rectangles on the image
- The original page content is replaced with the redacted image
Plain English: The tool photographs the scanned page, reads the text in the photo using OCR, blacks out the PII words on the photo, then replaces the original page with the blacked-out version.
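The word-matching rules in the list above could look roughly like this. `normalise_ocr_word` and `is_pii_word` are hypothetical names, and comparing in lower case is an added assumption, not something the description states.

```python
import string

# Straight ASCII punctuation plus curly quotes that OCR often produces.
EDGE_CHARS = string.punctuation + "\u2018\u2019\u201c\u201d"


def normalise_ocr_word(word: str) -> str:
    """Reduce an OCR token to the form used for comparison."""
    # Emails and URLs keep their punctuation and are compared as-is.
    if "@" in word or "://" in word:
        return word.lower()
    # Strip punctuation from both ends: "Joe," becomes "Joe".
    stripped = word.strip(EDGE_CHARS)
    # Drop a trailing possessive: "Joe's" becomes "Joe".
    for suffix in ("'s", "\u2019s"):
        if stripped.endswith(suffix):
            stripped = stripped[: -len(suffix)]
    return stripped.lower()


def is_pii_word(word: str, approved: set[str]) -> bool:
    """True when the OCR token matches any approved PII item after normalisation."""
    return normalise_ocr_word(word) in {normalise_ocr_word(a) for a in approved}


approved = {"Joe", "joe@email.com"}
words = ["Joe,", "Joe's", "Joelle", "joe@email.com", "(Joe)"]
flags = [is_pii_word(w, approved) for w in words]
```

Each flagged word's OCR bounding box would then be the rectangle that gets blacked out on the page image.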
Some PDFs contain interactive form fields (text boxes, dropdowns) — called AcroForm widgets. These can contain PII that is invisible to text-layer search. After text-layer and OCR redaction, the tool scans every form widget, reads its field value, and deletes any widget containing PII.
Handwritten signatures embedded as images in PDFs are automatically detected and blacked out. The tool examines every embedded image on every page using four heuristic gates:
- Aspect ratio — signatures are wide and short (width / height > 2.0)
- Position — signatures don't span the full page width (< 250 points)
- Pixel size — large enough to be a real signature (> 50 px wide, < 200 px tall)
- Ink ratio — thin pen strokes on white background (< 30% dark pixels)
This runs on every page, not just pages with other detected PII.
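The four gates translate directly into a small predicate. The thresholds come from the list above; the function name, its parameters, and the idea of passing in a precomputed dark-pixel ratio are assumptions made for this sketch.

```python
def looks_like_signature(
    width_px: int,
    height_px: int,
    width_on_page_pt: float,
    dark_pixel_ratio: float,
) -> bool:
    """Apply the four heuristic gates; all must pass for an image to be flagged."""
    aspect_ok = height_px > 0 and (width_px / height_px) > 2.0  # wide and short
    position_ok = width_on_page_pt < 250                        # not full page width
    size_ok = width_px > 50 and height_px < 200                 # plausible pixel size
    ink_ok = dark_pixel_ratio < 0.30                            # thin strokes on white
    return aspect_ok and position_ok and size_ok and ink_ok


signature = looks_like_signature(300, 80, 180.0, 0.08)
square_logo = looks_like_signature(400, 400, 120.0, 0.45)
page_banner = looks_like_signature(1200, 100, 500.0, 0.10)
```

A square logo fails the aspect-ratio gate and a full-width banner fails the position gate, so neither is blacked out.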
The tool checks each page independently:
| Page Type | Detection Method | Redaction Method |
|---|---|---|
| Has text layer | `page.get_text("words")` returns words | Text-layer redaction (Strategy 1) |
| Image-only (scan) | No text, but images present | OCR image redaction (Strategy 2) |
| Has form widgets | `page.widgets()` returns annotations | Widget deletion (Strategy 3, runs after 1 or 2) |
| Has embedded images | `page.get_images()` returns image refs | Signature detection (Strategy 4, runs on all pages) |
A single PDF can have mixed pages — some with text, some scanned. Each page gets the right strategy automatically.
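The per-page decision can be summarised as a pure function over three counts. `plan_page` and its string labels are hypothetical; in practice the counts would come from calls such as `page.get_text("words")`, `page.get_images()`, and `page.widgets()`.

```python
def plan_page(word_count: int, image_count: int, widget_count: int) -> list[str]:
    """Return the redaction steps for one page, in the order they would run."""
    plan = []
    if word_count > 0:
        plan.append("text-layer redaction")   # Strategy 1
    elif image_count > 0:
        plan.append("OCR image redaction")    # Strategy 2: no text, only images
    if widget_count > 0:
        plan.append("widget deletion")        # Strategy 3: after 1 or 2
    if image_count > 0:
        plan.append("signature detection")    # Strategy 4: any embedded image
    return plan


text_page = plan_page(word_count=120, image_count=1, widget_count=0)
scanned_form = plan_page(word_count=0, image_count=1, widget_count=2)
```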
- Professional names (psychologists, teachers, doctors — unless they match the student name)
- Assessment dates (unless explicitly labelled as a date of birth)
- Technical language, scores, and diagnostic terms
- Non-signature images (logos, charts, photos that don't match the signature heuristic)
Every detected item is scored from 0.0 (uncertain) to 1.0 (certain). You see this score when reviewing — it helps you decide whether to approve or skip borderline items. You always have the final say.
The desktop app includes UX features designed for non-technical users:
- First-run setup screen — automatically checks for LibreOffice on launch. If missing, a guided install prompt appears with a direct download link and a "Check Again" button. Auto-advances when LibreOffice is detected. Skip it if you only work with PDFs.
- Auto-update notifications — the app silently checks for updates in the background. When a new version is ready, a banner appears with download progress and a one-click "Restart Now" to install.
- First-run walkthrough — 4-step guided introduction that appears on first launch. Dismissible, with a "Quick Guide" button in the sidebar to re-open it any time.
- Contextual help tooltips: `?` icons next to every input field explaining what it does and why, in plain English.
- Before/after preview: Split-view comparison of original vs redacted pages on the completion screen. Images are fetched on-demand and never persisted to disk.
- Per-document summary cards — Expandable cards showing category breakdown and confidence indicators for each document. Never displays actual PII text.
- Witty progress comments — During the redaction step (which can take a minute for large batches), rotating teacher-themed comments keep you entertained. Shuffled randomly each time.
- Custom output location — Save redacted files to the default subfolder or browse to any location on your computer.
- About modal — Three tabs (About, How to Use, Features & Detection) accessible from the sidebar. Includes the full walkthrough content plus detection engine explanations.
- Cancel and clean up — Cancelling mid-redaction shows which files were already written and lets you delete them with one click.
- Nothing to redact screen — If detection finds nothing, the app tells you clearly and lets you go back to adjust names or try another folder.
- Engine startup banner — If the backend takes a moment to start, an amber banner explains what's happening instead of showing a misleading folder error.
- Recovery screen — Unexpected crashes show a "Something went wrong" fallback with a one-click restart rather than a blank window.
| Requirement | macOS | Windows |
|---|---|---|
| OS | macOS 12+ (Apple Silicon or Intel) | Windows 10/11 (64-bit) |
| LibreOffice | Needed for Word docs — download | Needed for Word docs — download |
| Tesseract OCR | Bundled in the app | Bundled in the app |
| Disk space | ~2 GB | ~2 GB |
| RAM | 8 GB recommended | 8 GB recommended |
| Internet | Only during installation | Only during installation |
The desktop app bundles Python, Tesseract, and all AI models. You only need to install LibreOffice separately if you want to process Word documents (.doc/.docx). If you only work with PDFs, nothing else is needed.
| Requirement | Details |
|---|---|
| Python | Version 3.13 or later |
| LibreOffice | Required for Word to PDF conversion |
| Tesseract OCR | Required for scanned/image-only PDFs |
| Disk space | ~2 GB (for the spaCy AI model) |
Linux is not currently supported but is on the roadmap.
The app isn't yet registered with Apple or Microsoft (the paid code-signing certificates that remove these prompts are on the roadmap). The first time you open it, your computer shows a one-time "unrecognised app" warning. This is expected — the app is safe, runs entirely on your own computer, and none of your documents ever leave it.
macOS
When you first open it, macOS may say "App can't be opened because it is from an unidentified developer" (or "…because Apple cannot check it for malicious software").
- Drag Redaction Tool from the `.dmg` into your Applications folder.
- In Applications, right-click (or Control-click) the app and choose Open.
- Click Open in the dialog that appears.
You only need to do this once — macOS remembers your choice.
Newer macOS (Sequoia and later): if there's no Open button in the dialog, go to Apple menu → System Settings → Privacy & Security, scroll to the bottom, click Open Anyway, and confirm.
Windows
Because the installer isn't code-signed, Microsoft Defender SmartScreen may show a blue "Windows protected your PC" screen.
- If your browser warns about the download, choose Keep.
- On the blue SmartScreen box, click the small More info link.
- Click the Run anyway button that appears, then follow the installer.
You only need to do this once per new version. Code signing for both platforms is planned for a future release.
Note: This guide is for running from source on macOS. Most users should download the desktop app from GitHub Releases instead — no installation steps required.
This guide assumes no prior experience with Terminal or coding. Take it one step at a time. If anything goes wrong, see Troubleshooting.
Terminal is a built-in Mac app that lets you type instructions to your computer.
- Press Command + Space to open Spotlight Search
- Type `Terminal` and press Enter
- A window with a text prompt will appear; this is normal
Homebrew is a free tool that makes installing other software easy. If you've already done this before, skip to Step 3.
Paste this into Terminal and press Enter:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Follow the on-screen instructions. It may ask for your Mac password (you won't see it as you type; that's normal).

LibreOffice converts Word documents to PDF for processing.

```bash
brew install --cask libreoffice
```

Tesseract reads text from scanned documents and images.

```bash
brew install tesseract
```

```bash
brew install python@3.13
```

If you have git installed:

```bash
git clone https://github.com/mrdavearms/student-doc-redactor.git
cd student-doc-redactor
```

Or download the ZIP file from GitHub:
- Go to github.com/mrdavearms/student-doc-redactor
- Click the green Code button, then Download ZIP
- Unzip the downloaded file
- In Terminal, navigate to the folder: `cd ~/Downloads/student-doc-redactor`
This creates a private workspace for the tool's Python code (so it doesn't interfere with anything else on your Mac):
```bash
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

This step downloads the AI models and may take 5-10 minutes. You'll see a progress bar. That's normal.

```bash
python -m spacy download en_core_web_lg
```

You're ready to run the tool. You won't need to repeat these steps; just start from Running the App next time.
- Open Terminal
- Navigate to the tool's folder: `cd ~/student-doc-redactor`
- Run the app: `./run.sh`
- Your browser will open automatically to `http://localhost:8501`
To stop: press Control + C in Terminal.
To run the desktop app in development mode:
```bash
cd desktop && npm install && npm run dev:electron
```

This starts Vite (hot reload), auto-spawns the FastAPI backend, and opens the Electron window.
To build installers locally:
```bash
# Mac DMG (run on macOS)
cd desktop && npm run dist:mac

# Windows installer (run on Windows)
cd desktop && npm run dist:win
```

The output appears in `desktop/release/`.
If LibreOffice is not installed when the app first opens, you land here instead of the main workflow. The screen prompts you to download and install LibreOffice, then click Check Again — the app re-checks automatically and advances when LibreOffice is found.
Click Skip for now — I only have PDFs to bypass this screen entirely if you don't need Word document support.
This screen only appears once. On subsequent launches with LibreOffice present, the app goes straight to Screen 1.
- Folder path: Paste or type the full path to a folder containing the student's documents (PDFs and/or Word files), or click Browse to select it.
- Student name: First and last name. The tool automatically generates variations (first name only, last name only, initials, etc.)
- Parent/Guardian names: Optional. Helps catch parent names that appear in documents.
- Other family names: Optional. Siblings, carers, emergency contacts.
- Organisation names: Optional. Schools, clinics, hospitals — any org that could identify the student.
- Redact headers & footers: Optional. Blanks the top and bottom of every page to remove letterheads and addresses.
Each field has a ? tooltip explaining what it does and why it matters.
Tip: To find a folder's path on Mac, right-click the folder in Finder, hold Option, and select Copy "folder" as Pathname.
The tool shows which documents were found and whether Word files were successfully converted to PDF. PII detection runs automatically after conversion completes.
This is the most important screen. You review every item the tool found — nothing is redacted without your approval.
- Tick the checkbox next to items you want redacted — everything is ticked by default
- Leave items unticked only if you are certain they should stay (e.g. a score label misread as a name)
- Confidence badges: High (green), Medium (amber), Low (rose) — helps you decide on borderline items
- Accept All & Continue: Accepts all ticked items and moves to the summary — recommended for most users
- Documents with no PII are automatically skipped
A summary of how many items across how many documents will be redacted, broken down by category.
Output folder options:
- Inside the source folder (default): a `redacted` subfolder is created alongside your originals
- Choose a different location: browse to save redacted files anywhere on your computer
- Green banner: Redaction succeeded and was verified
- Orange banner: Some pages were image-only (scanned) and were redacted via OCR — review recommended
- Red banner: A verification check failed — review that document carefully
- Before/after preview: Side-by-side comparison of original and redacted pages
- Document summary cards: Per-document breakdown of what was redacted, by category
- Open Folder button to jump straight to the output
- Redaction Log: Expandable audit trail of every item redacted
After processing, redacted files are saved to your chosen location (default: a redacted subfolder):
```
your-folder/
├── original-document.pdf                 <-- never modified
├── original-document.docx                <-- never modified
├── redacted/
│   ├── original-document_redacted.pdf    <-- redacted copy
│   └── another-doc_redacted.pdf
└── redaction_log.txt                     <-- full audit trail
```
redaction_log.txt records every redaction:
```
Document: Assessment_Report.pdf
  Page 2, Line 4 | "Joe Bloggs" | Student name | confidence: 1.00
  Page 2, Line 7 | "04 1234 5678" | Phone number | confidence: 0.98
  Page 3, Line 1 | "joe@mail.com" | Email address | confidence: 0.97

NOTE: Scanned_Report.pdf
  Pages 1-3 used OCR redaction (image-only pages) — review recommended
```
Keep this log. It is your record of what was removed and when.
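For readers who want to parse or reproduce the log, here is a small sketch of how lines in the sample format could be generated. `LogEntry`, `format_entry`, and `format_document` are hypothetical names; only the line layout is taken from the sample log.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """Hypothetical record for one redacted item."""
    page: int
    line: int
    text: str
    category: str
    confidence: float


def format_entry(entry: LogEntry) -> str:
    """Render one entry in the pipe-separated layout used by the sample log."""
    return (
        f'Page {entry.page}, Line {entry.line} | "{entry.text}" | '
        f"{entry.category} | confidence: {entry.confidence:.2f}"
    )


def format_document(name: str, entries: list[LogEntry]) -> str:
    """Render the header line followed by one line per redacted item."""
    return "\n".join([f"Document: {name}", *(format_entry(e) for e in entries)])


entries = [
    LogEntry(2, 4, "Joe Bloggs", "Student name", 1.0),
    LogEntry(2, 7, "04 1234 5678", "Phone number", 0.98),
]
report = format_document("Assessment_Report.pdf", entries)
```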
```bash
brew install --cask libreoffice
```

```bash
brew install tesseract
```

Your virtual environment may not be active. Run:

```bash
source venv/bin/activate
pip install -r requirements.txt
```

Streamlit will automatically try the next available port (8502, 8503, etc.). Check the Terminal output for the correct URL. For the desktop app, the backend runs on port 8765; if that's in use, check for another instance running.
Manually navigate to: http://localhost:8501
Make sure LibreOffice is installed (Step 3 above). On Intel Macs, the tool also checks /usr/local/bin/soffice (Homebrew) and the app bundle automatically.
Make sure Tesseract is installed (Step 4 above). Documents that are scans of printed pages (image-only PDFs) are read via OCR — the quality of detection depends on the scan quality.
Scanned pages are redacted using OCR (optical character recognition). The quality depends on scan quality — blurry or low-resolution scans may cause Tesseract to misread words. If you see PII surviving redaction:
- Check the scan quality — re-scan at 300 DPI or higher if possible
- The audit log will note which pages used OCR redaction
- For very poor scans, manual redaction may still be needed
```mermaid
gantt
    title Bulk Redaction Tool - Development Status
    dateFormat YYYY-MM
    section Core Engine
    Regex PII detection :done, 2025-02, 2025-04
    2-engine AI detection :done, 2025-04, 2026-01
    Metadata stripping :done, 2026-01, 2026-02
    OCR verification :done, 2026-01, 2026-02
    Form widget redaction :done, 2026-02, 2026-03
    Filename PII redaction :done, 2026-02, 2026-03
    OCR redaction (scanned pages) :done, 2026-02, 2026-03
    Signature detection :done, 2026-02, 2026-03
    Organisation name detection :done, 2026-02, 2026-03
    Header/footer zone blanking :done, 2026-02, 2026-03
    257-test suite :done, 2026-02, 2026-04
    section Desktop App - v1.0
    Electron + React + FastAPI :done, 2026-02, 2026-03
    Mac DMG build :done, 2026-02, 2026-03
    Walkthrough + onboarding :done, 2026-02, 2026-03
    Contextual help tooltips :done, 2026-02, 2026-03
    Before/after preview :done, 2026-02, 2026-03
    Custom output path :done, 2026-02, 2026-03
    Witty progress comments :done, 2026-02, 2026-03
    section Desktop App - v1.1
    Windows support + installer :done, 2026-03, 2026-04
    First-run setup screen :done, 2026-03, 2026-04
    Auto-update system :done, 2026-03, 2026-04
    section Desktop App - v1.2
    292-test suite :done, 2026-04, 2026-05
    Audit security fixes :done, 2026-04, 2026-05
    section Desktop App - v1.3
    Plain-English error messages :done, 2026-05, 2026-05
    Engine startup banner :done, 2026-05, 2026-05
    React error boundary :done, 2026-05, 2026-05
    Cancel + partial output cleanup :done, 2026-05, 2026-05
    Nothing-to-redact screen :done, 2026-05, 2026-05
    section Desktop App - v1.3.1
    Redaction safety hardening :done, 2026-05, 2026-05
    Request timeouts + crash recovery :done, 2026-05, 2026-05
    section Coming Soon
    Mac code signing :active, 2026-05, 2026-07
    section Future
    Linux support :2026-07, 2026-10
    Batch processing (multiple students) :2026-07, 2026-10
```
GitHub: https://github.com/mrdavearms/student-doc-redactor (primary)
GitLab: https://gitlab.com/davearmswork/bulk-redaction-tool (mirror)
Branches: main (stable) · test (development)
```
student-doc-redactor/
├── app.py                              # Streamlit entry point (legacy)
├── run.sh                              # Streamlit launch script
├── requirements.txt                    # Python dependencies
├── CLAUDE.md                           # AI development context
│
├── src/
│   ├── core/
│   │   ├── pii_orchestrator.py         # 2-engine orchestrator (main detection entry point)
│   │   ├── pii_detector.py             # Regex detection engine + PIIMatch dataclass
│   │   ├── presidio_recognizers.py     # 6 custom Australian Presidio recognizers
│   │   ├── nickname_map.py             # ~100-entry Australian nickname dictionary
│   │   ├── redactor.py                 # Multi-path redaction + metadata strip + signature detection
│   │   ├── text_extractor.py           # Text + OCR extraction from PDFs
│   │   ├── document_converter.py       # LibreOffice Word to PDF conversion
│   │   ├── binary_resolver.py          # Cross-platform binary path resolution
│   │   ├── logger.py                   # Audit log generation and save
│   │   └── session_state.py            # Streamlit session management
│   ├── services/
│   │   ├── conversion_service.py       # Document conversion business logic
│   │   ├── detection_service.py        # PII detection business logic
│   │   └── redaction_service.py        # Redaction orchestration + custom output paths
│   └── ui/
│       └── screens.py                  # All 5 Streamlit screens
│
├── backend/
│   ├── main.py                         # FastAPI API layer + detection cache
│   └── schemas.py                      # Pydantic request/response models
│
├── desktop/
│   ├── electron/
│   │   ├── main.cjs                    # Electron main process (spawns backend)
│   │   └── preload.cjs                 # Electron preload (IPC bridge)
│   ├── src/
│   │   ├── App.tsx                     # React entry point, screen router
│   │   ├── main.tsx                    # Vite entry point
│   │   ├── store.ts                    # Zustand single store
│   │   ├── api.ts                      # HTTP client for backend
│   │   ├── types.ts                    # TypeScript type definitions
│   │   ├── index.css                   # Tailwind v4 theme + utility classes
│   │   ├── electron.d.ts               # Electron IPC type declarations
│   │   ├── hooks/
│   │   │   └── useUpdater.ts           # Auto-update state machine
│   │   ├── lib/
│   │   │   └── errorMessage.ts         # User-friendly error message formatter
│   │   ├── pages/
│   │   │   ├── FolderSelection.tsx     # Step 1: folder + student details
│   │   │   ├── ConversionStatus.tsx    # Step 2: Word to PDF conversion
│   │   │   ├── DocumentReview.tsx      # Step 3: review detected PII
│   │   │   ├── NoPiiFound.tsx          # Terminal: nothing to redact
│   │   │   ├── FinalConfirmation.tsx   # Step 4: confirm + output options
│   │   │   ├── Completion.tsx          # Step 5: results + preview
│   │   │   └── Setup.tsx               # First-run: LibreOffice check
│   │   └── components/
│   │       ├── Layout.tsx              # Main layout, animated transitions
│   │       ├── Sidebar.tsx             # Step indicator, logo, walkthrough
│   │       ├── Walkthrough.tsx         # 4-step first-run onboarding
│   │       ├── ErrorBoundary.tsx       # React error boundary wrapper
│   │       ├── ErrorFallback.tsx       # Error boundary fallback UI
│   │       ├── HelpTip.tsx             # Contextual ? tooltip popover
│   │       ├── AboutModal.tsx          # 3-tab About dialog
│   │       ├── PreviewSection.tsx      # Before/after PDF preview
│   │       ├── DocumentCard.tsx        # Per-document summary card
│   │       ├── RedactionProgress.tsx   # Progress bar + witty comments
│   │       └── UpdateBanner.tsx        # Auto-update notification bar
│   ├── tests/
│   │   ├── errorMessage.test.ts        # friendlyError unit tests
│   │   └── routing.test.ts             # Post-detection routing tests
│   ├── package.json
│   ├── vite.config.ts
│   └── vitest.config.ts
│
├── docs/
│   ├── plans/                          # Implementation plans (reference only)
│   └── legacy/                         # Outdated docs moved from root
│
└── tests/
    ├── test_pii_detector.py            # 52 tests
    ├── test_pii_detector_names.py      # 65 tests
    ├── test_pii_orchestrator.py        # 25 tests
    ├── test_presidio_recognizers.py    # 18 tests
    ├── test_redactor.py                # 11 tests
    ├── test_signature_detection.py     # 16 tests
    ├── test_ocr_redaction.py           # 28 tests
    ├── test_ocr_verification.py        # 7 tests
    ├── test_metadata_stripping.py      # 8 tests
    ├── test_widget_redaction.py        # 6 tests
    ├── test_filename_redaction.py      # 13 tests
    ├── test_zone_redaction.py          # 5 tests
    ├── test_session_state.py           # 2 tests
    ├── test_binary_resolver.py         # 6 tests
    ├── test_integration.py             # 10 tests
    ├── test_adversarial.py             # 7 tests
    ├── test_false_positives.py         # 4 tests
    └── test_cleanup_api.py             # 13 tests
```
```bash
venv/bin/python3.13 -m pytest tests/ -v
```

All 292 tests should pass in under 5 minutes.
| Component | Technology |
|---|---|
| Desktop UI | Electron + React + Vite + Tailwind v4 + Framer Motion |
| Desktop state | Zustand |
| API layer | FastAPI + Pydantic |
| Legacy UI | Streamlit |
| PDF processing | PyMuPDF (fitz) |
| Image redaction | Pillow (PIL) ImageDraw |
| AI / NER | Microsoft Presidio + spaCy en_core_web_lg |
| OCR | Tesseract + pytesseract |
| Word conversion | LibreOffice headless |
| Tests | pytest (292 tests) + Vitest |
| Language | Python 3.13+ / TypeScript |
This tool is actively developed. Bug reports and suggestions are welcome — please open an issue on GitHub.
If you are a teacher, school psychologist, or support staff and would like to share feedback about what the tool does or doesn't catch in real documents (without sharing the documents themselves), please open an issue with the label feedback.
This project is currently unlicensed (private development). A licence will be added when the public release is made. Until then, please do not redistribute.
Built for Australian educators handling sensitive student data. All processing is local. Your students' information stays on your computer.



