shineexxx/LocalWhisper

LocalWhisper

A local, privacy-first voice dictation app for Windows — similar to Wispr Flow. Hold a hotkey, speak, release — your words are transcribed by faster-whisper and optionally cleaned up by a local LLM, then pasted into whatever window is active.

All processing is 100% local. No internet required after setup.


Features

  • Hold-to-speak with configurable hotkey (default: Space)
  • Silero VAD for automatic speech segmentation
  • faster-whisper (tiny → large-v3) with CUDA / CPU fallback
  • Optional LLM cleanup via llama-cpp-python + any GGUF model
  • Minimal always-on-top overlay with live waveform
  • Full settings GUI — no config file editing needed

Requirements

  • Windows 10 or 11
  • Python 3.11+
  • A microphone
  • (Optional) NVIDIA GPU with CUDA 11.8+ for fast inference

Installation

1 — Clone the repo

git clone https://github.com/your-username/local-whisper.git
cd local-whisper
python -m venv .venv
.venv\Scripts\activate

2 — Install PyTorch

Choose the right command for your hardware at https://pytorch.org/get-started/locally/.

With CUDA 12.1 (recommended for NVIDIA GPUs):

pip install torch --index-url https://download.pytorch.org/whl/cu121

CPU only:

pip install torch --index-url https://download.pytorch.org/whl/cpu
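After installing, a short check along these lines can confirm whether the CUDA build is actually usable. This is only a sketch: the function name `pick_device` and the pairing of devices with the faster-whisper compute types `float16` and `int8` are illustrative assumptions, not the app's own logic.

```python
def pick_device():
    """Return (device, compute_type), falling back to CPU when CUDA is unusable."""
    try:
        import torch  # may be missing or broken if step 2 did not succeed
    except (ImportError, OSError):
        return "cpu", "int8"
    if torch.cuda.is_available():
        # Assumption: half precision on GPU, 8-bit integers on CPU.
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device} compute_type={compute_type}")
```

If this reports `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel is most likely the one installed.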

3 — Install remaining dependencies

pip install -r requirements.txt

4 — Install llama-cpp-python with CUDA support

The default llama-cpp-python wheel is CPU-only. For GPU inference install the pre-built CUDA wheel (replace cu121 with your CUDA version):

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

Or build from source (requires CMake and Visual Studio Build Tools):

set CMAKE_ARGS=-DGGML_CUDA=on
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

If you skip this step, LLM cleanup runs on the CPU (slower but functional).


Downloading the default LLM model (Gemma 2B Q4_K_M)

  1. Create a free account at https://huggingface.co

  2. Accept the Gemma licence at https://huggingface.co/google/gemma-2-2b-it

  3. Download the quantised GGUF from the community repo: https://huggingface.co/bartowski/gemma-2-2b-it-GGUF

    The file you want: gemma-2-2b-it-Q4_K_M.gguf (~1.6 GB)

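  4. Place it in the models/ directory and rename it to gemma-2-2b-q4_k_m.gguf so that it matches the default llm_model_path (or keep the original name and point Settings → LLM Cleanup at it):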

    models/
    └── gemma-2-2b-q4_k_m.gguf
    

You can use any GGUF model — set the path in Settings → LLM Cleanup.
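Because the downloaded file name differs from the default path, a small check like the one below can show what is actually in models/. It is a sketch only; `find_gguf` is a hypothetical helper, and the default file name is taken from the config.json example in this README.

```python
from pathlib import Path


def find_gguf(models_dir="models", expected="gemma-2-2b-q4_k_m.gguf"):
    """Return (path, []) if the expected model exists, else (None, other GGUF names)."""
    root = Path(models_dir)
    target = root / expected
    if target.is_file():
        return target, []
    # List other GGUF files so the caller can suggest a rename or a new path.
    others = sorted(p.name for p in root.glob("*.gguf")) if root.is_dir() else []
    return None, others


if __name__ == "__main__":
    found, others = find_gguf()
    if found:
        print(f"Model found: {found}")
    else:
        print(f"Expected model missing; other GGUF files present: {others}")
```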


Running the app

python main.py

A small overlay appears in the bottom-right corner of your screen.

Note: The keyboard library captures global hotkeys, so some keys may not register unless the terminal is run as Administrator. To exit cleanly, right-click the overlay and choose Quit.


Usage

Action                         | Result
-------------------------------|------------------------------------------------
Hold configured hotkey         | Start recording; overlay shows 🔴 Listening…
Release hotkey                 | Stop recording; transcription begins
VAD detects silence (≥ 0.8 s)  | Auto-triggers transcription even while holding
Transcription complete         | Text is pasted into the active window
Right-click overlay            | Open Settings or Quit
Drag overlay                   | Reposition it anywhere on screen
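The silence rule can be illustrated with a small sketch that scans per-frame speech probabilities and reports when trailing silence first reaches 0.8 s. The function name, the 32 ms frame length, and the 0.5 threshold are assumptions made for illustration; the app's actual VAD handling may differ.

```python
def should_auto_transcribe(speech_probs, frame_ms=32, threshold=0.5, min_silence_ms=800):
    """Return the index of the frame at which trailing silence reaches min_silence_ms.

    speech_probs: per-frame speech probabilities in [0, 1], e.g. from a VAD.
    Returns None if that much silence never follows at least one speech frame.
    """
    silence_ms = 0
    heard_speech = False
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            heard_speech = True
            silence_ms = 0  # speech resets the silence counter
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= min_silence_ms:
                return i
    return None
```

Silence before any speech is ignored here, so holding the hotkey without talking does not trigger an empty transcription.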

Changing the hotkey

Open Settings (right-click the overlay) → General tab → click the hotkey button and press the key you want to use.

Recommended keys to avoid conflicts with normal typing:

  • F2 or F3 — easy to reach, rarely used by apps
  • Right Ctrl — comfortable to hold
  • Caps Lock — repurpose a rarely-used key

Configuration

Settings are stored in config.json (auto-created on first run). All options are accessible through the Settings GUI.

{
  "hotkey":                 "space",
  "whisper_model":          "small",
  "language":               "auto",
  "llm_enabled":            true,
  "llm_model_path":         "models/gemma-2-2b-q4_k_m.gguf",
  "gpu_layers":             -1,
  "vad_sensitivity":        0.5,
  "insertion_method":       "clipboard",
  "autostart":              false,
  "unload_timeout_minutes": 5
}
Key                     | Values                                          | Description
------------------------|-------------------------------------------------|----------------------------------
hotkey                  | keyboard key name                               | Key to hold for recording
whisper_model           | tiny, base, small, medium, large-v2, large-v3   | Whisper model size
language                | auto or language code                           | Transcription language
llm_enabled             | true / false                                    | Enable LLM cleanup step
llm_model_path          | file path                                       | Path to GGUF model file
gpu_layers              | -1 = all, 0 = CPU                               | GPU layers for LLM
vad_sensitivity         | 0.0–1.0                                         | Higher = less sensitive
insertion_method        | clipboard / typewrite                           | How text is pasted
unload_timeout_minutes  | integer                                         | Idle timeout before model unload
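As a rough sketch of how these settings could be read, the snippet below merges config.json over the defaults shown above and rejects values outside the documented ranges. `load_config` and the validation rules are illustrative assumptions, not the app's actual code.

```python
import json
from pathlib import Path

# Defaults copied from the config.json example in this README.
DEFAULTS = {
    "hotkey": "space",
    "whisper_model": "small",
    "language": "auto",
    "llm_enabled": True,
    "llm_model_path": "models/gemma-2-2b-q4_k_m.gguf",
    "gpu_layers": -1,
    "vad_sensitivity": 0.5,
    "insertion_method": "clipboard",
    "autostart": False,
    "unload_timeout_minutes": 5,
}
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large-v2", "large-v3"}
INSERTION_METHODS = {"clipboard", "typewrite"}


def load_config(path="config.json"):
    """Merge the file over DEFAULTS and reject values outside the documented sets."""
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.is_file():
        cfg.update(json.loads(p.read_text(encoding="utf-8")))
    if cfg["whisper_model"] not in WHISPER_MODELS:
        raise ValueError(f"unknown whisper_model: {cfg['whisper_model']!r}")
    if cfg["insertion_method"] not in INSERTION_METHODS:
        raise ValueError(f"unknown insertion_method: {cfg['insertion_method']!r}")
    if not 0.0 <= cfg["vad_sensitivity"] <= 1.0:
        raise ValueError("vad_sensitivity must be between 0.0 and 1.0")
    return cfg
```

Merging over defaults means a config.json written by an older version, with keys missing, still loads.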

Performance notes

Hardware                    | Whisper small | Gemma 2B Q4
----------------------------|---------------|------------
GTX 1650 Super, 16 GB RAM   | ~0.8 s        | ~1.2 s
CPU only (Ryzen 5)          | ~3–6 s        | ~5–10 s

End-to-end latency target: < 3 s on GPU.

Models are loaded on first use and automatically unloaded after the configured idle timeout to keep background RAM under 200 MB.
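The load-on-first-use and idle-unload behaviour can be sketched as a small wrapper. The class name `LazyModel`, the injected `loader` callable, and the polling method are hypothetical; thread safety is left out for brevity, and the app's own implementation may work differently.

```python
import time


class LazyModel:
    """Load a model on first use and drop it after an idle period."""

    def __init__(self, loader, timeout_s, clock=time.monotonic):
        self._loader = loader        # callable that builds the model
        self._timeout_s = timeout_s  # e.g. unload_timeout_minutes * 60
        self._clock = clock          # injectable for testing
        self._model = None
        self._last_used = 0.0

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        """Return the model, loading it if needed, and mark it as just used."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self):
        """Call periodically, e.g. from a timer; returns True if the model was dropped."""
        idle = self._clock() - self._last_used
        if self._model is not None and idle >= self._timeout_s:
            self._model = None  # release the reference so memory can be reclaimed
            return True
        return False
```

With the default `unload_timeout_minutes` of 5, `timeout_s` would be 300; the next call to `get()` after an unload simply reloads the model.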


Troubleshooting

Hotkey doesn't work — try running the terminal as Administrator.

"Cannot open microphone" — check that no other app has exclusive access to the mic, and that the correct device is selected in Settings → Audio.

LLM cleanup disabled / model not found — place the GGUF file at the path shown in Settings → LLM Cleanup, or browse to it.

Text not pasting — some apps (e.g. certain games, admin windows) block Ctrl+V. Switch to Typewrite in Settings → Advanced.

CUDA not detected — ensure the CUDA-enabled PyTorch build is installed (see step 2). The app falls back to CPU silently; check app.log for details.

Logs are written to app.log in the working directory.


Project structure

LocalWhisper/
├── main.py               # Entry point + AppController
├── config.json           # User settings (auto-created)
├── app.log               # Runtime log (auto-created)
├── core/
│   ├── audio.py          # Mic capture + Silero VAD
│   ├── transcriber.py    # faster-whisper wrapper
│   ├── llm.py            # llama-cpp-python wrapper
│   └── inserter.py       # Text insertion (clipboard / typewrite)
├── ui/
│   ├── overlay.py        # Floating status widget
│   └── settings.py       # Settings dialog
├── models/               # Place GGUF files here
└── requirements.txt

Licence

MIT
