feat: Support pluggable speech recognition engines beyond the Web Speech API

## Context

Today both `Vocal` and `useVocal` are hard-wired to the browser's native **Web Speech API** (`SpeechRecognition`). The recognition engine lives inside `@untemps/vocal`: `createVocal()` instantiates `window.SpeechRecognition`/`webkitSpeechRecognition` internally, and `isSupported()` probes for that global. `react-vocal` only ever consumes the resulting `VocalInstance`.

This couples the whole library to one engine, which has real limitations:

- **No cross-browser coverage**: Firefox has no `SpeechRecognition`, and wherever that global is missing `isSupported()` returns `false` and the component renders nothing.
- **No offline / on-device option**: e.g. Vosk, whisper.cpp / `transformers.js`.
- **No cloud STT option**: e.g. Deepgram, Google Cloud Speech-to-Text, Azure Speech, OpenAI/Whisper API, which consumers may already pay for and want for accuracy, custom vocabulary, or diarization.

There is currently no public seam to swap the engine.

## Proposal

Introduce a **pluggable speech-recognition engine** (adapter) abstraction so consumers can supply their own backend while keeping the existing event model, commands, timeouts and accessibility behaviour untouched. The Web Speech API stays the **default**, so this is purely additive and non-breaking.

The core mechanism belongs in **`@untemps/vocal`** (where the engine is wired); `react-vocal` then surfaces it through `useVocal` and `Vocal`.

### 1. `@untemps/vocal` — engine contract + injection

Define an engine interface that abstracts what `createVocal` currently assumes about `SpeechRecognition` and that emits the existing `eventTypes` (`start`, `end`, `result`, `error`, `speechstart`, `speechend`, `nomatch`, `permission`, …). Sketch:

```ts
export interface SpeechEngine {
  start(options?: { signal?: AbortSignal }): Promise<void>
  stop(): void
  abort(): void
  // normalized event stream consumed by Vocal core
  on<T extends EventType>(type: T, cb: EventHandlerFor<T>): void
  off<T extends EventType>(type: T, cb?: EventHandlerFor<T>): void
  cleanup(): void
  readonly isSupported: boolean
}

export type SpeechEngineFactory = (options: VocalOptions) => SpeechEngine
```
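
To make the contract concrete, here is a minimal sketch of a scripted engine that satisfies the interface above. It is illustrative only: `EventType`, `EventPayloads`, `EventHandlerFor` and `VocalOptions` are reduced, hypothetical stand-ins for whatever `@untemps/vocal` ends up exporting, and `createMockEngine` is an assumed name, not an existing API.

```typescript
// Hypothetical stand-ins for types that @untemps/vocal would export.
type EventType = 'start' | 'end' | 'result' | 'error'
type EventPayloads = {
  start: undefined
  end: undefined
  result: { transcript: string; isFinal: boolean }
  error: Error
}
type EventHandlerFor<T extends EventType> = (payload: EventPayloads[T]) => void
interface VocalOptions {
  lang?: string
  continuous?: boolean
}

interface SpeechEngine {
  start(options?: { signal?: AbortSignal }): Promise<void>
  stop(): void
  abort(): void
  on<T extends EventType>(type: T, cb: EventHandlerFor<T>): void
  off<T extends EventType>(type: T, cb?: EventHandlerFor<T>): void
  cleanup(): void
  readonly isSupported: boolean
}

type SpeechEngineFactory = (options: VocalOptions) => SpeechEngine

// Scripted engine (assumed name): emits start, one final result, then end.
const createMockEngine = (transcript: string): SpeechEngineFactory => (_options) => {
  const handlers = new Map<EventType, Set<(payload: any) => void>>()
  let running = false

  const emit = <T extends EventType>(type: T, payload: EventPayloads[T]): void => {
    handlers.get(type)?.forEach((cb) => cb(payload))
  }

  return {
    isSupported: true,
    async start() {
      if (running) return
      running = true
      emit('start', undefined)
      emit('result', { transcript, isFinal: true })
      running = false
      emit('end', undefined)
    },
    stop() {
      // Graceful stop: only announces `end` if a session is in flight.
      if (running) {
        running = false
        emit('end', undefined)
      }
    },
    abort() {
      // Hard stop: drops the session without emitting anything.
      running = false
    },
    on(type, cb) {
      if (!handlers.has(type)) handlers.set(type, new Set())
      handlers.get(type)!.add(cb)
    },
    off(type, cb) {
      // Without a callback, every handler for that event type is removed.
      if (cb) handlers.get(type)?.delete(cb)
      else handlers.delete(type)
    },
    cleanup() {
      handlers.clear()
      running = false
    },
  }
}
```

A mock like this could double as the reference engine for the README and as a fixture for the engine-injection tests.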

`createVocal` accepts an optional engine factory and defaults to the built-in Web Speech engine:

```ts
createVocal({ lang, grammars, maxAlternatives, continuous, engine: myEngineFactory })
```

The existing Web Speech behaviour is refactored into a default `webSpeechEngine` implementing this interface — no behaviour change when no engine is passed.
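
The defaulting logic could look roughly like the sketch below. The types are deliberately trimmed down and every name in it (`RecognitionOptions`, `MinimalEngine`, `createVocalSketch`, the `webSpeechEngine` stand-in) is hypothetical; the point is only that `engine` is split off from the recognition options and falls back to the built-in factory.

```typescript
// Hypothetical, trimmed-down shapes; the real contract is the SpeechEngine sketch above.
interface RecognitionOptions {
  lang?: string
  maxAlternatives?: number
  continuous?: boolean
}
interface MinimalEngine {
  readonly kind: string
  readonly options: RecognitionOptions
}
type MinimalEngineFactory = (options: RecognitionOptions) => MinimalEngine

interface CreateVocalOptions extends RecognitionOptions {
  engine?: MinimalEngineFactory
}

// Stand-in for the refactored built-in engine.
const webSpeechEngine: MinimalEngineFactory = (options) => ({ kind: 'web-speech', options })

// `engine` is removed from the options before they reach the factory,
// so engines only ever see recognition settings.
const createVocalSketch = ({ engine = webSpeechEngine, ...recognition }: CreateVocalOptions = {}) => {
  const instance = engine(recognition)
  return { engine: instance }
}
```

Calling `createVocalSketch({ lang: 'fr-FR' })` takes the default path, while passing `engine` swaps the backend without touching the rest of the options.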

### 2. `react-vocal` — expose the seam

- `useVocal(...)` gains a way to pass a custom engine factory, forwarded to `createVocal`.
- `Vocal` gains an `engine` prop, forwarded to `useVocal`.
- `isSupported()` becomes engine-aware: when a custom engine is provided, support is determined by the engine (so a cloud/offline engine can render the button even on Firefox).
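
One possible shape for the engine-aware check is sketched below. `isSupportedSketch`, `hasWebSpeech` and the `SupportAware*` types are assumed names for illustration, and instantiating the engine just to read `isSupported` is one option among several (a static flag on the factory would avoid that cost).

```typescript
// Hypothetical minimal view of an engine: only the support flag matters here.
interface SupportAwareEngine {
  readonly isSupported: boolean
}
type SupportAwareFactory = () => SupportAwareEngine

// Mirrors the current probe for the Web Speech globals; safe outside browsers.
const hasWebSpeech = (): boolean =>
  'SpeechRecognition' in globalThis || 'webkitSpeechRecognition' in globalThis

// With a custom engine, the engine decides; otherwise fall back to the global probe.
const isSupportedSketch = (engine?: SupportAwareFactory): boolean =>
  engine ? engine().isSupported : hasWebSpeech()
```

With this shape, a cloud engine that reports `isSupported: true` would let `Vocal` render its button even where the Web Speech globals are missing.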

## Key considerations

- **Result normalization.** `react-vocal`'s `onResult` currently receives a raw `SpeechRecognitionEvent` (and `_onResult` reads `event.results`). A custom engine won't produce that shape. We need either a normalized result payload emitted by all engines, or a documented adapter shape, so `tryMatchCommand` / `useCommands` keep working. This is the main design decision.
- **Permission / `getUserMedia`.** Cloud and on-device engines manage microphone capture themselves; the `permission` event contract must stay meaningful (or each engine must be able to opt out of it).
- **`grammars` / `maxAlternatives` / `continuous`.** Support is engine-specific, so we need to define how unsupported options degrade.
- **Async/streaming engines.** Cloud engines stream audio and return interim/final transcripts asynchronously; the engine adapter must map that onto the existing, largely synchronous event lifecycle.
- **Bundle size.** Engines must be tree-shakeable / opt-in; no cloud SDK should be pulled into the default build.
- **Types.** Export `SpeechEngine` / `SpeechEngineFactory` from both packages.
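
To illustrate the result-normalization point, here is a sketch of what a shared payload and a Web Speech adapter could look like. `NormalizedResult` and both helper functions are hypothetical; the `*Like` interfaces are a structural subset of `SpeechRecognitionEvent` (`resultIndex`, `results`, `isFinal`, `transcript`, `confidence`) so the example runs without DOM typings.

```typescript
// Hypothetical normalized payload that every engine would emit.
interface NormalizedResult {
  transcript: string
  isFinal: boolean
  alternatives: string[]
  confidence?: number // not every backend reports one
}

// Structural subset of SpeechRecognitionEvent.
interface AlternativeLike {
  transcript: string
  confidence: number
}
interface ResultLike {
  isFinal: boolean
  length: number
  [index: number]: AlternativeLike
}
interface EventLike {
  resultIndex: number
  results: { length: number; [index: number]: ResultLike }
}

// Web Speech adapter: picks the result at `resultIndex` and flattens its alternatives.
const normalizeWebSpeechResult = (event: EventLike): NormalizedResult | null => {
  const result = event.results[event.resultIndex]
  if (!result || result.length === 0) return null
  const alternatives = Array.from({ length: result.length }, (_, i) => result[i].transcript.trim())
  return {
    transcript: alternatives[0],
    isFinal: result.isFinal,
    alternatives,
    confidence: result[0].confidence,
  }
}

// A streaming/cloud engine would build the same payload from its own response shape.
const normalizeTranscript = (text: string, isFinal: boolean): NormalizedResult => ({
  transcript: text.trim(),
  isFinal,
  alternatives: [text.trim()],
})
```

If command matching only reads `transcript` and `alternatives`, `tryMatchCommand` / `useCommands` would not need to know which engine produced the result.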

## Backward compatibility

Fully backward compatible: omitting `engine` keeps the current Web Speech API behaviour and the existing `isSupported()` semantics.

## Acceptance criteria

- [ ] `@untemps/vocal` exposes a documented `SpeechEngine` interface and accepts an `engine` factory in `createVocal` (defaulting to the built-in Web Speech engine).
- [ ] `useVocal` and `Vocal` accept and forward a custom engine; `Vocal` gains an `engine` prop.
- [ ] `isSupported` is engine-aware.
- [ ] `onResult` receives a normalized result usable by `useCommands` regardless of engine.
- [ ] Default (no engine) behaviour is unchanged and covered by existing tests.
- [ ] A reference/example custom engine (mock or a real one such as Deepgram/Whisper) is documented in the README.
- [ ] Tests cover engine injection, support detection, and result normalization.

> Note: the bulk of this work (engine contract + default Web Speech engine) lives in `@untemps/vocal`. This issue tracks the `react-vocal` side (props/hook seam, docs, tests) and the cross-package coordination.

