LogNotes


LogNotes is a lightweight, local speech-to-text application that transcribes your recorded notes and pastes the result at your cursor. It's primarily designed to "log" short notes. I use it often when instructing coding agents (e.g., when providing feedback, describing bugs, or outlining requirements).

The app uses Whisper for transcription. The current version is still very much a work in progress, but I'll definitely be working on further improvements.

LogNotes is built as an Electron front end (the UI) over a Python back end that runs the ML pipeline and OS integration. See ARCHITECTURE.md for how it all fits together.

Inspiration

I built LogNotes because I wanted a free, fully local alternative to Wispr Flow: something I could run on-device, without a subscription. The goal was to recreate the same core experience while processing voice recordings locally, even if that made it a bit slower.

When looking into open-source solutions, I came across Handy by @cjpais. I used that app's code as a reference for several optimizations that make LogNotes faster and more extensible. I would definitely recommend checking out Handy as well.

Requirements

  • Python 3.10+ (the back-end ML pipeline)
  • Node.js 18+ (the Electron front end)

Installation

1. Clone and Set Up the Python Back End

cd LogNotes
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

pip install -r requirements.txt

2. Install the Electron Front End

cd electron
npm install

Desktop App

Build a Windows installer with build\build-electron.ps1 (produces dist-electron\LogNotes Setup *.exe).

Things To Know

  • Transcription Speed: Larger speech-to-text models typically take longer to transcribe but are more accurate. I have optimized the speed as much as I can, and it should be reasonably fast even on CPUs.
  • First Transcription: The very first transcription may be slow because the speech detection tool has to start up. Expect the same slowdown on the first transcription after restarting the app.
  • Hotkeys: If you're using LogNotes primarily on a specific app (e.g., Cursor, Obsidian, Jira), make sure your hotkey doesn't conflict with any existing shortcuts in those apps.
  • Windows App Builds: If you change the code and rebuild the Windows app, expect the build to take several minutes.
  • Antivirus Scans: The first time you run the Windows app, your antivirus software may need to scan it before you can use it. Once the scan is over, reopen the app and it should work as expected.

Key Features

  • Local Processing - All transcription happens on your machine. Silence is automatically filtered out.
  • Flexible Recording Modes - Choose between Hold mode (press and hold the hotkey) or Toggle mode (press once to start, press again to stop).
  • Whisper Transcription - Choose between Whisper base / small. CUDA is auto-detected and used when available.
  • Session Activity Tab - Every transcription this session is retained in RAM so you can retry it with a different model, copy the text, or delete it.
  • Always-Visible Recording Overlay - Small borderless status indicator pinned to a screen corner; drag to reposition, right-click to cycle corners.
  • Checkpoint Pasting - Sentences are pasted as soon as Whisper finishes each one, so partial text is preserved if processing fails mid-stream.
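
The checkpoint pasting idea can be illustrated with a minimal sketch. It is not the LogNotes implementation: `paste_in_checkpoints`, `flaky_segments`, and the `paste` callback are hypothetical names, and the transcriber is replaced by a plain generator of strings. (faster-whisper also returns its segments lazily from a generator, which is what makes this pattern possible, but its segments are not guaranteed to align with sentences.)

```python
from typing import Callable, Iterable, List, Optional, Tuple


def paste_in_checkpoints(
    segments: Iterable[str],
    paste: Callable[[str], None],
) -> Tuple[List[str], Optional[Exception]]:
    """Paste each segment as soon as it arrives.

    Returns (pasted_segments, error). If the segment source fails
    partway through, the text pasted before the failure is kept.
    """
    pasted: List[str] = []
    try:
        for text in segments:
            text = text.strip()
            if not text:
                continue  # skip empty segments
            paste(text + " ")  # checkpoint: this text has been handed off
            pasted.append(text)
    except Exception as exc:  # the segment source failed mid-stream
        return pasted, exc
    return pasted, None


def flaky_segments():
    """Hypothetical stand-in for a transcriber that fails after two segments."""
    yield "First sentence."
    yield "Second sentence."
    raise RuntimeError("decoder crashed")


if __name__ == "__main__":
    output: List[str] = []  # stands in for the focused text field
    pasted, error = paste_in_checkpoints(flaky_segments(), output.append)
    print(pasted)
    print(repr(error))
```

Because each segment is handed to `paste` before the next one is requested, a failure on a later segment cannot undo text that has already been pasted.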

Security Features

  • No Audio Storage - Recordings are held in memory only during the app session and never written to disk. The Activity tab keeps recent clips in RAM so you can retry a transcription with a different model. Everything is cleared when the app closes.
  • Config Validation - All configuration values are validated against whitelists on load.
  • Atomic Config Permissions - The config file is created with 0o600 permissions in a single os.open() call, so there is never a window in which the file exists with broader permissions (as there would be when creating it first and calling chmod afterwards).
  • Bounded Activity Memory - The session audio cap is enforced before adding each new entry, preventing a single long recording from temporarily spiking RAM past the limit.
  • Pinned Model Versions - External model downloads use pinned versions.
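
Two of the points above, atomic config permissions and the bounded activity cap, can be sketched as follows. This is an illustration under stated assumptions rather than the code in src/config.py or src/activity.py: the names `save_config` and `BoundedClipStore` are hypothetical, JSON is assumed as the config format, and evicting the oldest clip is an assumed policy (the real store might reject new clips instead). Note that the mode passed to `os.open()` only applies when the file is created, and that Windows largely ignores POSIX permission bits.

```python
import json
import os
import tempfile
from collections import deque


def save_config(path: str, config: dict) -> None:
    """Write config so a new file is created with owner-only permissions."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode is applied at creation time, so a newly created file never
    # exists with broader permissions (unlike open() followed by chmod()).
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)  # JSON format is an assumption


class BoundedClipStore:
    """Keeps recent audio clips in memory under a fixed byte budget."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._clips: deque = deque()

    def add(self, clip: bytes) -> bool:
        if len(clip) > self.max_bytes:
            return False  # cannot fit even in an empty store
        # Make room BEFORE storing, so the total never exceeds the cap.
        # Evicting the oldest clip first is an assumed policy.
        while self.total_bytes + len(clip) > self.max_bytes:
            self.total_bytes -= len(self._clips.popleft())
        self._clips.append(clip)
        self.total_bytes += len(clip)
        return True

    def __len__(self) -> int:
        return len(self._clips)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = os.path.join(tmp, "config.json")
        save_config(cfg_path, {"mode": "hold"})
        print(oct(os.stat(cfg_path).st_mode & 0o777))

    store = BoundedClipStore(max_bytes=10)
    for clip in (b"aaaa", b"bbbb", b"cccc"):
        store.add(clip)
    print(len(store), store.total_bytes)
```

Checking the budget before appending is what keeps the total from overshooting even briefly; an append-then-trim approach would hold both the new clip and the soon-to-be-evicted clips in memory at the same time.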

Usage

Start the App

cd electron && npm start

This launches the Electron UI, which spawns the Python back end automatically. Activate the venv first so the back end's Python dependencies are available.

Recording

Hold Mode (default):

  1. Open any text editor or input field where you want to paste text
  2. Press and hold the hotkey (default: Ctrl+Shift+D)
  3. Speak your text
  4. Release the hotkey
  5. Wait for processing - the transcribed text will be pasted at your cursor

Toggle Mode:

  1. Open any text editor or input field where you want to paste text
  2. Press the hotkey once to start recording
  3. Speak your text
  4. Press the hotkey again to stop recording
  5. Wait for processing - the transcribed text will be pasted at your cursor
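
The difference between the two modes comes down to how press and release events map to start and stop. The sketch below is a hypothetical state machine, not the code in src/input/hotkey.py: `RecordingController` and its method names are invented for illustration, and it is driven by plain method calls rather than a real hotkey listener. It also ignores repeated press events, since keyboard auto-repeat can deliver several presses while a key is held.

```python
from typing import Optional

VALID_MODES = ("hold", "toggle")


class RecordingController:
    """Maps hotkey press/release events to "start"/"stop" actions."""

    def __init__(self, mode: str = "hold") -> None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._key_down = False

    def on_press(self) -> Optional[str]:
        if self._key_down:
            return None  # auto-repeat while the key is held; ignore
        self._key_down = True
        if self.mode == "hold":
            self.recording = True
            return "start"
        # Toggle mode: each distinct press flips the state.
        self.recording = not self.recording
        return "start" if self.recording else "stop"

    def on_release(self) -> Optional[str]:
        self._key_down = False
        if self.mode == "hold" and self.recording:
            self.recording = False
            return "stop"
        return None  # releases do nothing in toggle mode


if __name__ == "__main__":
    hold = RecordingController("hold")
    print(hold.on_press(), hold.on_release())

    toggle = RecordingController("toggle")
    print(toggle.on_press(), toggle.on_release(), toggle.on_press())
```

In Hold mode the release event ends the recording, while in Toggle mode only a second distinct press does, which is why the steps above differ only at steps 2 and 4.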

Project Structure

LogNotes/
├── sidecar.py                 # Python back end: WebSocket/RPC server + bridge
├── requirements.txt           # Python dependencies
├── ARCHITECTURE.md            # Full architecture reference
├── electron/                  # Electron front end
│   ├── main.js               # Process spawn/supervision, tray, lifecycle
│   ├── preload.js            # Hardened contextBridge surface
│   ├── package.json          # electron-builder config
│   └── renderer/             # index.html/renderer.js (tabs), overlay.html/.js
├── src/
│   ├── controller.py         # LogNotesController — the orchestrator (Tk-free)
│   ├── ui_bridge.py          # UIBridge protocol the controller talks through
│   ├── config.py             # Schema, validation, 0o600 save, ConfigStore
│   ├── activity.py           # In-memory ActivityStore (session-scoped)
│   ├── paths.py              # User data / cache dir + bundled-asset resolution
│   ├── audio/                # recorder.py (sounddevice)
│   ├── transcription/        # registry.py, whisper.py, device.py
│   ├── input/                # hotkey.py (pynput), paster.py (paste/clipboard)
│   └── ui/                   # Legacy Tk UI (reference only; entry main.py)
├── build/                    # PyInstaller specs + build-electron.ps1
├── tests/                    # unittest suite + sidecar protocol smoke test
└── documentation/
    ├── configuration.md      # Config file, schema, settings, validation
    └── troubleshooting.md    # Common issues and fixes

Tech Stack

Component        Library
---------------  --------------------
Front end        Electron
Back-end IPC     WebSocket (loopback)
Transcription    faster-whisper
Audio Recording  sounddevice
Global Hotkeys   pynput
Text Pasting     pynput + pyperclip

Documentation

  • Architecture - The full picture: the Electron + Python back-end split, the transcription pipeline, the IPC protocol, module layout, packaging, and the security model.
  • Configuration - Covers the config file location, full settings schema, valid values for each option, and the validation rules applied on load.
  • Troubleshooting - Step-by-step fixes for common issues including hotkeys not firing, audio problems, transcription quality, and packaged build failures.

Disclaimer

This project was built entirely with Claude Code and OpenAI Codex. While the code has been reviewed, AI-generated code can contain bugs or issues that are easy to miss. If you spot anything significant, please open an issue. I'd genuinely appreciate it.

License

MIT