mrbuslov/extension-voice-transcriber

Voice Transcriber

A VS Code extension that records your voice and transcribes it using OpenAI Whisper or a local Whisper-compatible API. It can optionally clean up the text with an LLM.

Features

  • Record audio directly in VS Code with real-time visualization
  • Upload audio or video files; for video (MP4, MKV, MOV, AVI, WebM) the audio track is extracted automatically
  • Transcribe long recordings (1 hour or more), which are split into 10-minute chunks behind the scenes
  • Transcribe via OpenAI Whisper or your own local server
  • Clean up filler words and fix punctuation with an LLM (optional)
  • Keep your last 10 transcriptions
  • Auto-copy results to clipboard
  • Recover recordings if VS Code crashes
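The "last 10 transcriptions" behaviour can be pictured as a capped, newest-first list. The sketch below is only an illustration: the `TranscriptionEntry` shape and the `addToHistory` helper are hypothetical names, not taken from the extension's source.

```typescript
// Hypothetical entry shape; the real extension may store different fields.
interface TranscriptionEntry {
  text: string;
  createdAt: number; // Unix timestamp in milliseconds
}

const HISTORY_LIMIT = 10; // matches the "last 10" limit from the feature list

// Returns a new array with the entry placed first and the oldest items dropped.
function addToHistory(
  history: readonly TranscriptionEntry[],
  entry: TranscriptionEntry,
  limit: number = HISTORY_LIMIT,
): TranscriptionEntry[] {
  return [entry, ...history].slice(0, limit);
}
```

Returning a new array instead of mutating the old one keeps the helper easy to test and safe to store in extension state.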

ffmpeg Installation (required for native recording and long files)

The extension uses ffmpeg for native recording, splitting long files, and extracting audio from video uploads. Without ffmpeg, recording falls back to the browser (works, but lower quality and no long-file support).

macOS:

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

Windows:

winget install ffmpeg
# or
choco install ffmpeg
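The fallback described in this section implies some check for whether ffmpeg is available. A minimal sketch of such a check follows, assuming ffmpeg is looked up on PATH by running `ffmpeg -version`; `hasFfmpeg` and `pickRecorder` are hypothetical helpers, and the extension may detect ffmpeg differently.

```typescript
import { spawnSync } from "node:child_process";

// The two fields of the spawn result that this check needs.
interface ProbeResult {
  error?: Error;
  status: number | null;
}

// Hypothetical helper: true when `ffmpeg -version` can be spawned and exits with 0.
// `run` is injectable so the logic can be exercised on machines without ffmpeg.
function hasFfmpeg(
  run: (cmd: string, args: string[]) => ProbeResult = (cmd, args) =>
    spawnSync(cmd, args, { stdio: "ignore" }),
): boolean {
  const result = run("ffmpeg", ["-version"]);
  // `error` is set when the binary cannot be started, e.g. it is not on PATH.
  return result.error === undefined && result.status === 0;
}

// Hypothetical helper: choose the recording backend based on the probe.
function pickRecorder(ffmpegAvailable: boolean): "native" | "browser" {
  return ffmpegAvailable ? "native" : "browser";
}
```

Calling `pickRecorder(hasFfmpeg())` once at startup would be enough to decide which recording path to offer.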

Usage

  1. Click the microphone icon in the top-right of your editor
  2. Set up your provider (OpenAI or local)
  3. Hit "Start Recording" and speak
  4. Hit "Stop"; the text is automatically copied to the clipboard

Configuration

OpenAI

Get an API key from platform.openai.com/api-keys, select "OpenAI" as provider, paste your key, and save.

Local server

Any Whisper-compatible API works. Just enter the URL, e.g. http://localhost:8000/v1/audio/transcriptions.
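For readers wiring up their own server, here is a rough sketch of what a request to a Whisper-compatible endpoint looks like. It assumes the server accepts the OpenAI-style multipart form (`file`, `model`, optional `language`) and answers with JSON containing a `text` field. `TranscribeOptions`, `buildForm` and `transcribe` are illustrative names, not the extension's API.

```typescript
// Hypothetical options object for this sketch.
interface TranscribeOptions {
  url: string;       // e.g. http://localhost:8000/v1/audio/transcriptions
  model: string;     // "whisper-1" for OpenAI; local servers may expect another name
  language?: string; // ISO 639-1 code such as "en"; omit to let the server auto-detect
  apiKey?: string;   // only for servers that require a bearer token
}

// Builds the multipart body in the OpenAI-style field layout.
function buildForm(audio: Blob, filename: string, opts: TranscribeOptions): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", opts.model);
  if (opts.language) form.append("language", opts.language);
  return form;
}

// Sends the audio and returns the transcript text (empty string if absent).
async function transcribe(audio: Blob, filename: string, opts: TranscribeOptions): Promise<string> {
  const headers: Record<string, string> = {};
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;
  const res = await fetch(opts.url, {
    method: "POST",
    headers,
    body: buildForm(audio, filename, opts),
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

If a request like this succeeds from a script but the extension still fails, the URL entered in the settings is the first thing to compare.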

LLM text cleanup

When using OpenAI, you can enable "Clean up text with LLM" to remove filler words, fix punctuation, and add paragraph breaks.

Models available: gpt-4o-mini (default, cheapest), gpt-4o, gpt-4-turbo, gpt-3.5-turbo.
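As an illustration of the cleanup step, the sketch below builds a chat-completions request body for one of the models listed above. The prompt wording, the `temperature` value and the `buildCleanupRequest` helper are assumptions made for this example; the extension's actual prompt is not shown in this README.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface CleanupRequest {
  model: string;
  temperature: number;
  messages: ChatMessage[];
}

// Models listed in this README.
const CLEANUP_MODELS = ["gpt-4o-mini", "gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"] as const;
type CleanupModel = (typeof CLEANUP_MODELS)[number];

// Hypothetical helper: builds the JSON body for POST /v1/chat/completions.
function buildCleanupRequest(
  transcript: string,
  model: CleanupModel = "gpt-4o-mini",
): CleanupRequest {
  return {
    model,
    temperature: 0, // assumption: low temperature to keep edits conservative
    messages: [
      {
        role: "system",
        content:
          "Remove filler words, fix punctuation, and add paragraph breaks. " +
          "Do not change the meaning. Return only the cleaned text.",
      },
      { role: "user", content: transcript },
    ],
  };
}
```

Keeping the raw transcript in the user message and the instructions in the system message makes it easy to swap models without touching the prompt.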

Languages

Auto-detect or pick manually: English, Russian, Ukrainian, Spanish, French, German, Italian, Portuguese, Polish, Japanese, Korean, Chinese, and more.

Troubleshooting

Microphone access denied

macOS: System Settings → Privacy & Security → Microphone → enable VS Code → restart VS Code

Windows: Settings → Privacy → Microphone → allow app access

Linux: Check PulseAudio/PipeWire settings with pavucontrol and make sure no other app is blocking the mic

How to check logs

Command Palette (Ctrl+Shift+P / Cmd+Shift+P) → "Developer: Open Webview Developer Tools" → pick Voice Transcriber → Console tab

Transcription fails

  • Check your API key
  • For a local API, make sure the server is running and the URL is correct
  • Check your internet connection

Large files and video uploads

Recordings over 24 MB are transcoded to 128 kbps MP3 and split into 10-minute chunks; each chunk is transcribed separately and the results are concatenated. Video uploads (MP4, MKV, MOV, AVI, WebM) have their audio track extracted automatically. Both features require ffmpeg.
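The splitting rules above can be sketched as a few small functions. This is an approximation under stated assumptions: 1 MB is taken as 1024 × 1024 bytes, the ffmpeg arguments use the standard segment muxer with the libmp3lame encoder, and all helper names are hypothetical. The extension's real ffmpeg invocation may differ.

```typescript
const SIZE_LIMIT_BYTES = 24 * 1024 * 1024; // 24 MB threshold from the text
const CHUNK_SECONDS = 10 * 60;             // 10-minute chunks

// True when a recording is large enough to need transcoding and splitting.
function needsSplitting(sizeBytes: number): boolean {
  return sizeBytes > SIZE_LIMIT_BYTES;
}

// Number of chunks a recording of the given length produces.
function chunkCount(durationSeconds: number): number {
  return Math.max(1, Math.ceil(durationSeconds / CHUNK_SECONDS));
}

// ffmpeg arguments that drop any video stream, encode 128 kbps MP3,
// and write fixed-length segments such as chunk_000.mp3, chunk_001.mp3, ...
function segmentArgs(input: string, outputPattern: string): string[] {
  return [
    "-i", input,
    "-vn",
    "-c:a", "libmp3lame",
    "-b:a", "128k",
    "-f", "segment",
    "-segment_time", String(CHUNK_SECONDS),
    outputPattern, // e.g. "chunk_%03d.mp3"
  ];
}

// Joins per-chunk transcripts in chunk order, skipping empty results.
function joinTranscripts(parts: string[]): string {
  return parts.map((p) => p.trim()).filter((p) => p.length > 0).join(" ");
}
```

For example, a 65-minute recording gives `chunkCount(65 * 60)`, which is 7 chunks: six full 10-minute chunks and one 5-minute remainder.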

Privacy

  • API keys are stored in VS Code's secure storage (system keychain)
  • Audio goes directly to OpenAI or your local API
  • Nothing is saved to disk
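For context on the first point, VS Code exposes secure storage to extensions through `context.secrets`. The sketch below declares a local `SecretStore` interface mirroring the `get`/`store`/`delete` methods of that API, so it compiles outside an extension host. The key name and helper functions are hypothetical.

```typescript
// Local stand-in for the subset of vscode.SecretStorage used here.
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
  delete(key: string): PromiseLike<void>;
}

// Hypothetical key name; the extension may use a different one.
const API_KEY_SLOT = "voiceTranscriber.openaiApiKey";

// Stores the trimmed key, or removes the entry when the input is blank.
async function saveApiKey(secrets: SecretStore, apiKey: string): Promise<void> {
  const trimmed = apiKey.trim();
  if (trimmed.length === 0) {
    await secrets.delete(API_KEY_SLOT);
    return;
  }
  await secrets.store(API_KEY_SLOT, trimmed);
}

// Returns the stored key, or undefined when none has been saved.
async function loadApiKey(secrets: SecretStore): Promise<string | undefined> {
  return secrets.get(API_KEY_SLOT);
}
```

Inside an extension's `activate(context)` function, `context.secrets` could be passed as the `secrets` argument.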

For Developers

Setup

npm install
npm run compile

Press F5 to launch the Extension Development Host.

Commands

npm run compile   # build once
npm run watch     # rebuild on changes

Publishing to VS Code Marketplace

Prerequisites

  1. Microsoft account — account.microsoft.com
  2. Azure DevOps org — dev.azure.com
  3. Publisher ID — marketplace.visualstudio.com/manage

Get a Personal Access Token (PAT)

  1. Go to dev.azure.com → profile → Personal access tokens → New Token
  2. Organization: All accessible organizations
  3. Scopes: Custom defined → Marketplace → Manage
  4. Copy the token (shown only once)

Update package.json

{
  "publisher": "your-publisher-id",
  "icon": "resources/icon.png"
}

Icon must be a 128×128 PNG.

Publish

npm install -g @vscode/vsce
vsce login your-publisher-id
vsce publish

Update version

vsce publish patch  # 0.1.0 → 0.1.1
vsce publish minor  # 0.1.0 → 0.2.0
vsce publish major  # 0.1.0 → 1.0.0

Other useful commands

vsce package                         # create .vsix without publishing
vsce show publisher.extension        # show extension info
vsce unpublish publisher.extension   # remove from marketplace

License

MIT
