ut is a Sway-first dictation CLI for Wayland.
The MVP uses a two-tap toggle flow:
- first
utinvocation (orut start) starts recording - second
utinvocation (orut stop) stops recording, processes audio with the specified LLM, and pastes the result ut abortcancels the active session
Install ut:
cargo install --locked --git https://github.com/dstoc/utCreate ~/.config/ut/config.toml and specify model config (see below).
Confirm dependencies, audio input, and config are valid:
ut healthAdd Sway bindings, e.g.:
bindsym $mod+equal exec ut start
bindsym --release $mod+equal exec ut stopRust:
- stable Rust toolchain
- Cargo
System libraries:
- ALSA development package
pkg-config
On Debian or Ubuntu this typically means:
sudo apt install build-essential pkg-config libasound2-devut expects these commands at runtime:
swaymsgwl-copywl-pastewtypenotify-send
On Debian or Ubuntu:
sudo apt install sway wl-clipboard wtype libnotify-bincargo buildThe default build enables two features:
audio-capture— microphone capture viacpal(requires the ALSA development headers)ui— the status overlay, drawn withwgpuon a Wayland layer-shell surface
To build without the overlay and its graphics dependencies:
cargo build --no-default-features --features audio-captureIf the build fails with an alsa.pc or alsa-sys error, install the ALSA development package and make sure pkg-config can find it.
Optional install without the overlay:
cargo install --locked --git https://github.com/dstoc/ut --no-default-features --features audio-captureThe current default model settings target:
- URL:
http://127.0.0.1:11434/v1 - model:
unsloth/gemma-4-E2B-it-GGUF:Q4_K_XL - optional auth: set
model.api_keydirectly, ormodel.api_key_envto read a bearer token from an environment variable at request time
The request uses the OpenAI-compatible chat completions API with input_audio.
ut
ut toggle
ut start
ut stop
ut abort
ut status
ut healthut without a subcommand is equivalent to ut toggle.
Config is loaded from:
$XDG_CONFIG_HOME/ut/config.toml- or
~/.config/ut/config.toml
Defaults:
[recording]
max_seconds = 29
trim_silence = true
trim_padding_ms = 500
[model]
url = "http://127.0.0.1:11434/v1"
model = "unsloth/gemma-4-E2B-it-GGUF:Q4_K_XL"
timeout_seconds = 60
[paste]
method = "clipboard"
restore_clipboard = true
restore_delay_ms = 100
on_focus_changed = "copy"
[status_ui]
enabled = true
width = 200
height = 200
x = 0.5
y = 0.8
fade_out_ms = 350Optional model auth:
[model]
api_key = "sk-..."
# or:
api_key_env = "OPENAI_API_KEY"If both are set, api_key wins. When a key is resolved, ut sends it as Authorization: Bearer <key>.
Health check:
ut healthThis checks that the config parses and passes basic validation, required helper commands are present in PATH, and an input audio stream can be opened.
App rule examples:
[prompts]
terminal = """You are a dictation engine.
Return only the final insertable text.
Remove filler words, repeated fragments, and obvious false starts.
Follow any formatting instructions in the audio first.
Format the result as a shell command only."""
[[app_rules]]
app_id = "kitty"
prompt = "terminal"
paste_keys = "ctrl+shift+v"[[app_rules]]
app_id = "Code"
prompt = "default"Model examples:
[model]
model = "gemini-3.1-flash-lite"
url = "https://generativelanguage.googleapis.com/v1beta/openai"
api_key_env = "GEMINI_API_KEY"Provider-specific request fields can be added under model.extra_body. These are
merged into the top-level chat-completions JSON body after ut assembles its
standard model, messages, and temperature fields. Reserved keys cannot be
overridden.
For example, with llama-server:
[model]
model = "meta-llama/Llama-3.3-70B-Instruct"
url = "http://127.0.0.1:8080/v1"
[model.extra_body]
thinking_budget_tokens = 1024ut captures Sway context at:
- recording start
- recording stop
- pre-paste
If the focused Sway container changes before paste, ut copies the generated text to the clipboard and does not auto-paste into the new window.
app_rules can also override the clipboard paste shortcut with paste_keys. Use +-separated key names such as ctrl+shift+v; ut translates them into the matching wtype modifier/key sequence and falls back to Ctrl+V when the field is omitted.
Build error:
alsa-sys/alsa.pcmissing Installlibasound2-devandpkg-config.
Runtime error:
failed to run wl-copy,wl-paste,wtype, ornotify-sendInstall the missing Wayland helper packages.
No transcription:
- verify the local model server is reachable at
127.0.0.1:11434 - verify the configured model supports
audioinput