LocalWhisper

A local, privacy-first voice dictation app for Windows — similar to Wispr Flow. Hold a hotkey, speak, release — your words are transcribed by faster-whisper and optionally cleaned up by a local LLM, then pasted into whatever window is active.

All processing is 100% local. No internet required after setup.

Features

Hold-to-speak with configurable hotkey (default: Space)
Silero VAD for automatic speech segmentation
faster-whisper (tiny → large-v3) with CUDA / CPU fallback
Optional LLM cleanup via llama-cpp-python + any GGUF model
Minimal always-on-top overlay with live waveform
Full settings GUI — no config file editing needed

Requirements

Windows 10 or 11
Python 3.11+
A microphone
(Optional) NVIDIA GPU with CUDA 11.8+ for fast inference

Installation

1 — Clone the repo

git clone https://github.com/your-username/local-whisper.git
cd local-whisper
python -m venv .venv
.venv\Scripts\activate

2 — Install PyTorch

Choose the right command for your hardware at https://pytorch.org/get-started/locally/.

With CUDA 12.1 (recommended for NVIDIA GPUs):

pip install torch --index-url https://download.pytorch.org/whl/cu121

CPU only:

pip install torch --index-url https://download.pytorch.org/whl/cpu

3 — Install remaining dependencies

pip install -r requirements.txt

4 — Install llama-cpp-python with CUDA support

The default llama-cpp-python wheel is CPU-only. For GPU inference install the pre-built CUDA wheel (replace cu121 with your CUDA version):

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

Or build from source (requires CMake and Visual Studio Build Tools):

set CMAKE_ARGS=-DGGML_CUDA=on
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

If you skip this step, LLM cleanup runs on the CPU (slower but functional).

Downloading the default LLM model (Gemma 2B Q4_K_M)

Create a free account at https://huggingface.co
Accept the Gemma licence at https://huggingface.co/google/gemma-2-2b-it
Download the quantised GGUF from the community repo: https://huggingface.co/bartowski/gemma-2-2b-it-GGUF

The file you want: gemma-2-2b-it-Q4_K_M.gguf (~1.6 GB)

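Place it in the models/ directory. The default llm_model_path expects the name shown below, so either rename the downloaded file to match or point Settings → LLM Cleanup at the name you downloaded: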

models/
└── gemma-2-2b-q4_k_m.gguf

You can use any GGUF model — set the path in Settings → LLM Cleanup.
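
A partial or mis-saved download (for example, an HTML error page saved with a .gguf name) can cause a model to fail at load time. The sketch below is not part of LocalWhisper. It is a hypothetical helper, `check_model_file`, that only tests whether a file exists and begins with the four-byte `GGUF` magic that GGUF files start with; it does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def check_model_file(path):
    """Return "ok", "missing", or "not a GGUF file" for the given path.

    Hypothetical helper for illustration; it inspects only the file header.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        magic = f.read(4)
    return "ok" if magic == GGUF_MAGIC else "not a GGUF file"


if __name__ == "__main__":
    # Default path taken from the configuration section of this README.
    print(check_model_file("models/gemma-2-2b-q4_k_m.gguf"))
```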

Running the app

python main.py

A small overlay appears in the bottom-right corner of your screen.

Note: The keyboard library captures global hotkeys and may require running the terminal as Administrator if certain keys don't register. To exit cleanly, right-click the overlay and choose Quit.

Usage

Action	Result
Hold configured hotkey	Start recording; overlay shows 🔴 Listening…
Release hotkey	Stop recording; transcription begins
VAD detects silence (≥ 0.8 s)	Auto-triggers transcription even while holding
Transcription complete	Text is pasted into the active window
Right-click overlay	Open Settings or Quit
Drag overlay	Reposition it anywhere on screen
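
The silence rule in the table can be sketched as a small loop over per-frame speech probabilities. This is an illustration only, not the app's actual code: the function name, the 32 ms frame length, and the 0.5 threshold are assumptions. Only the 0.8 s silence window comes from the table above.

```python
def first_silence_trigger(speech_probs, frame_ms=32, threshold=0.5,
                          min_silence_ms=800):
    """Return the index of the frame at which trailing silence first
    reaches min_silence_ms after some speech was heard, or None.

    speech_probs: iterable of per-frame speech probabilities (0.0 to 1.0),
    such as a VAD model might produce. Durations are kept as integer
    milliseconds to avoid floating-point drift when accumulating.
    """
    heard_speech = False
    silent_ms = 0
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            heard_speech = True
            silent_ms = 0              # speech resets the silence counter
        elif heard_speech:
            silent_ms += frame_ms      # only count silence that follows speech
            if silent_ms >= min_silence_ms:
                return i
    return None
```

Silence before any speech is ignored in this sketch, so holding the hotkey without speaking does not trigger an empty transcription.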

Changing the hotkey

Open Settings (right-click the overlay) → General tab → click the hotkey button and press the key you want to use.

Recommended keys to avoid conflicts with normal typing:

F2 or F3 — easy to reach, rarely used by apps
Right Ctrl — comfortable to hold
Caps Lock — repurpose a rarely-used key
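
Whichever key you pick, holding it down makes the operating system send repeated key-down events, so hold-to-speak logic should react only to the first one. A minimal sketch follows, assuming the `keyboard` package mentioned under Running the app. The `HoldToTalk` class and the `bind` function are hypothetical names, not LocalWhisper's own code.

```python
class HoldToTalk:
    """Turn raw key events into one start and one stop callback per hold."""

    def __init__(self, on_start, on_stop):
        self._on_start = on_start
        self._on_stop = on_stop
        self._held = False

    def key_down(self, _event=None):
        # Auto-repeat delivers many key-down events; act on the first only.
        if not self._held:
            self._held = True
            self._on_start()

    def key_up(self, _event=None):
        if self._held:
            self._held = False
            self._on_stop()


def bind(hotkey, controller):
    """Attach the controller to a global hotkey (requires `keyboard`)."""
    import keyboard  # third-party; imported lazily so the class works without it

    keyboard.on_press_key(hotkey, controller.key_down)
    keyboard.on_release_key(hotkey, controller.key_up)
```

For example, `bind("f2", HoldToTalk(start_recording, stop_recording))` would wire F2 to two callbacks of your own; `start_recording` and `stop_recording` are placeholders here.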

Configuration

Settings are stored in config.json (auto-created on first run). All options are accessible through the Settings GUI.

{
  "hotkey":                 "space",
  "whisper_model":          "small",
  "language":               "auto",
  "llm_enabled":            true,
  "llm_model_path":         "models/gemma-2-2b-q4_k_m.gguf",
  "gpu_layers":             -1,
  "vad_sensitivity":        0.5,
  "insertion_method":       "clipboard",
  "autostart":              false,
  "unload_timeout_minutes": 5
}

Key	Values	Description
`hotkey`	keyboard key name	Key to hold for recording
`whisper_model`	`tiny` `base` `small` `medium` `large-v2` `large-v3`	Whisper model size
`language`	`auto` or language code	Transcription language
`llm_enabled`	`true` / `false`	Enable LLM cleanup step
`llm_model_path`	file path	Path to GGUF model file
`gpu_layers`	`-1` = all, `0` = CPU only	Number of LLM layers offloaded to the GPU
`vad_sensitivity`	`0.0–1.0`	Higher = less sensitive
`insertion_method`	`clipboard` / `typewrite`	How text is inserted into the active window
`unload_timeout_minutes`	integer	Idle timeout before model unload
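
If you want to read or script against config.json outside the app, the pattern of loading it over a set of defaults can be sketched as follows. The default values are copied from the example above. The `load_config` function itself is hypothetical, and the app's real loader may behave differently (for instance in how it treats unknown keys).

```python
import json
from pathlib import Path

# Defaults copied from the example config.json in this README.
DEFAULTS = {
    "hotkey": "space",
    "whisper_model": "small",
    "language": "auto",
    "llm_enabled": True,
    "llm_model_path": "models/gemma-2-2b-q4_k_m.gguf",
    "gpu_layers": -1,
    "vad_sensitivity": 0.5,
    "insertion_method": "clipboard",
    "autostart": False,
    "unload_timeout_minutes": 5,
}


def load_config(path="config.json"):
    """Return settings from `path` merged over DEFAULTS.

    Creates the file with defaults if it does not exist. Unknown keys in
    the file are ignored (a choice made for this sketch).
    """
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.is_file():
        with p.open(encoding="utf-8") as f:
            user = json.load(f)
        cfg.update({k: v for k, v in user.items() if k in DEFAULTS})
    else:
        p.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    return cfg
```

Merging over defaults means a config file written by an older version, with some keys missing, still yields a complete settings dictionary.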

Performance notes

Hardware	Whisper `small`	Gemma 2B Q4
GTX 1650 Super, 16 GB RAM	~0.8 s	~1.2 s
CPU only (Ryzen 5)	~3–6 s	~5–10 s

End-to-end latency target: < 3 s on GPU.

Models are loaded on first use and automatically unloaded after the configured idle timeout to keep background RAM under 200 MB.
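
The load-on-first-use and idle-unload behaviour described above can be sketched with a small wrapper. This is an assumption about the general pattern, not the app's implementation: `LazyModel`, its `loader` argument, and the injectable `clock` are all hypothetical.

```python
import time


class LazyModel:
    """Load a model on first use and drop it after a period of inactivity."""

    def __init__(self, loader, idle_timeout_s, clock=time.monotonic):
        self._loader = loader            # zero-argument callable that builds the model
        self._timeout = idle_timeout_s
        self._clock = clock              # injectable for testing
        self._model = None
        self._last_used = 0.0

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        """Return the model, loading it if necessary, and mark it as used."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self):
        """Drop the model if it has been idle long enough. Returns True if dropped."""
        if self._model is not None and self._clock() - self._last_used >= self._timeout:
            self._model = None           # release the reference so memory can be reclaimed
            return True
        return False
```

In a real app, `unload_if_idle` would be called periodically, for example from a timer, with `idle_timeout_s` set to `unload_timeout_minutes * 60`.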

Troubleshooting

Hotkey doesn't work — try running the terminal as Administrator.

"Cannot open microphone" — check that no other app has exclusive access to the mic, and that the correct device is selected in Settings → Audio.

LLM cleanup disabled / model not found — place the GGUF file at the path shown in Settings → LLM Cleanup, or browse to it.

Text not pasting — some apps (e.g. certain games, admin windows) block Ctrl+V. Switch to Typewrite in Settings → Advanced.

CUDA not detected — ensure the CUDA-enabled PyTorch build is installed (see step 2). The app falls back to CPU silently; check app.log for details.
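
To check what PyTorch sees before digging through app.log, a short diagnostic like the one below can be run from the same virtual environment. It is a sketch, not part of LocalWhisper; `cuda_status` is a hypothetical helper that uses only standard PyTorch attributes.

```python
import importlib.util


def cuda_status():
    """Return a one-line description of PyTorch's view of CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the script still runs without it

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    # torch.version.cuda is None for CPU-only builds.
    return (f"CUDA not available (torch {torch.__version__}, "
            f"built for CUDA: {torch.version.cuda})")


if __name__ == "__main__":
    print(cuda_status())
```

If the output reports `built for CUDA: None`, the CPU-only wheel is installed and step 2 needs to be repeated with the CUDA index URL.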

Logs are written to app.log in the working directory.

Project structure

LocalWhisper/
├── main.py               # Entry point + AppController
├── config.json           # User settings (auto-created)
├── app.log               # Runtime log (auto-created)
├── core/
│   ├── audio.py          # Mic capture + Silero VAD
│   ├── transcriber.py    # faster-whisper wrapper
│   ├── llm.py            # llama-cpp-python wrapper
│   └── inserter.py       # Text insertion (clipboard / typewrite)
├── ui/
│   ├── overlay.py        # Floating status widget
│   └── settings.py       # Settings dialog
├── models/               # Place GGUF files here
└── requirements.txt

Licence

MIT
