feat(clone): Inworld provider with persisted, reusable voices #431
gianpaj wants to merge 3 commits into
Add Inworld TTS as a third voice-clone provider (alongside Mistral/Replicate), selectable via a new engine dropdown, and let users save and reuse cloned voices.
- Inworld clone provider: voices:clone + tts/v1/voice (non-streaming, MP3 out), server-side Basic auth (INWORLD_API_KEY), $25/1M-char cost, user-selectable.
- Persist cloned voices in a new RLS-scoped `audio_references` table; reuse a saved voice to synthesize without re-uploading or re-cloning; auto-save new clones.
- GET /api/audio-references (list) and DELETE /api/audio-references/[id] which also removes the Inworld-side voice (404-tolerant); clone page dropdown + name field + delete confirm modal.
- i18n across all 6 locales; unit tests for the route, reuse path, and new API routes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude finished @gianpaj's task in 3m 20s
PR Review: Inworld cloned voices
Reviewed the full diff (provider, API routes, migration, queries, frontend, tests, i18n). Overall this is a well-structured, thorough PR: clean provider abstraction, defense-in-depth RLS (policies and explicit
🔴 Worth addressing
🟡 Confirm intent
🟢 Nits / minor
✅ Things done well
I focused on review feedback only and did not change any code. Happy to implement any of the above if you'd like, just say the word.
Code Review
This pull request introduces Inworld TTS voice cloning as a third provider option, allowing users to explicitly select it alongside Mistral and Replicate. It adds support for saving, reusing, and deleting cloned Inworld voices via a new audio_references database table and associated API endpoints (/api/audio-references). The review feedback highlights three important improvements: preventing the delete confirmation dialog from closing prematurely during asynchronous deletion, validating the selected locale for the explicitly chosen Mistral provider, and enforcing a maximum length constraint on the voice name in the backend to match the frontend limits.
<AlertDialogAction asChild>
  <Button
    disabled={isDeleting}
    onClick={handleDelete}
    variant="destructive"
  >
    {isDeleting
      ? `${t('deleteVoiceConfirm')}...`
      : t('deleteVoiceConfirm')}
  </Button>
</AlertDialogAction>
When using AlertDialogAction with an async onClick handler like handleDelete, the dialog will close immediately when clicked. To prevent the dialog from closing prematurely (especially if the deletion fails or while it is in progress), you should call event.preventDefault() in the onClick handler. Since handleDelete already calls setIsOpen(false) on success, this will ensure the dialog only closes after a successful deletion.
<AlertDialogAction asChild>
<Button
disabled={isDeleting}
onClick={(event) => {
event.preventDefault();
handleDelete();
}}
variant="destructive"
>
{isDeleting
? t('deleteVoiceConfirm') + '...'
: t('deleteVoiceConfirm')}
</Button>
</AlertDialogAction>
function validateProviderLocale(provider: CloneProvider, locale: string): void {
  if (provider === 'inworld' && !isInworldSupportedLocale(locale)) {
    throw createRouteError(
      `Inworld voice cloning does not support the language: ${locale}`,
      400,
      'errors.providerLocaleUnsupported',
      { provider, locale },
    );
  }
}
The validateProviderLocale function currently only validates the locale for the inworld provider. However, since the frontend now allows explicitly selecting the mistral provider, we should also validate that the selected locale is supported by Mistral (i.e., is a Voxtral-supported locale). Otherwise, selecting Mistral with an unsupported language like Japanese will result in unexpected API failures or wasted credits.
function validateProviderLocale(provider: CloneProvider, locale: string): void {
if (provider === 'inworld' && !isInworldSupportedLocale(locale)) {
throw createRouteError(
"Inworld voice cloning does not support the language: " + locale,
400,
'errors.providerLocaleUnsupported',
{ provider, locale },
);
}
if (provider === 'mistral' && !isVoxtralCloneLocale(locale)) {
throw createRouteError(
"Mistral voice cloning does not support the language: " + locale,
400,
'errors.providerLocaleUnsupported',
{ provider, locale },
);
}
}

if (provider === 'inworld' && !formInput.voiceName) {
  throw createRouteError(
    'A voice name is required to save the cloned voice.',
    400,
    'errors.voiceNameRequired',
  );
}
To prevent potential database or third-party API issues with excessively long voice names, we should validate the length of voiceName in the backend to match the frontend's maxLength={60} constraint.
// Saving a new Inworld voice requires a name.
if (provider === 'inworld') {
if (!formInput.voiceName) {
throw createRouteError(
'A voice name is required to save the cloned voice.',
400,
'errors.voiceNameRequired',
);
}
if (formInput.voiceName.length > 60) {
throw createRouteError(
'Voice name must be 60 characters or less.',
400,
'errors.voiceNameRequired',
);
}
}

);
}

throw createProviderUnavailableRouteError('inworld');
Inworld 4xx (non-403) errors are surfaced as 503 "temporarily unavailable".
throwInworldRouteError only special-cases 403; every other non-2xx (including 400/413 — e.g. bad reference audio, unsupported langCode, payload too large) is mapped to createProviderUnavailableRouteError('inworld'), which renders as "provider is temporarily unavailable, please try again."
For a genuine client-side problem this is misleading and invites pointless retries that keep failing (and on the new-clone path, each retry mints another Inworld voice). Consider forwarding 4xx as a non-retryable client error and reserving the 503 mapping for error.transient (5xx):
if (error.status === 403) { /* guardrail */ }
if (error.status >= 400 && error.status < 500) {
throw createRouteError(/* a 4xx client error */, error.status, ...);
}
throw createProviderUnavailableRouteError('inworld');
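The pseudocode above can be made concrete as a pure classification step. This is a minimal sketch, not the PR's implementation: `ProviderFailure`, `RouteFailure`, and `mapInworldFailure` are hypothetical stand-ins for `InworldError`, the route error type, and `throwInworldRouteError`, and the `errors.providerGuardrail` and `errors.providerUnavailable` codes are placeholders. `errors.providerRequestRejected` is the code a later commit in this PR reports using.

```typescript
// Hypothetical, simplified shapes; the real PR types are not shown in this thread.
interface ProviderFailure {
  status: number;
  message: string;
}

interface RouteFailure {
  status: number;
  code: string;
  retryable: boolean;
}

// Classify a non-2xx Inworld response:
// - 403 keeps its dedicated guardrail handling,
// - any other 4xx is forwarded as a non-retryable client error,
// - everything else (5xx, unknown) becomes a retryable 503.
function mapInworldFailure(failure: ProviderFailure): RouteFailure {
  if (failure.status === 403) {
    // Placeholder code name, assumed for illustration.
    return { status: 403, code: 'errors.providerGuardrail', retryable: false };
  }
  if (failure.status >= 400 && failure.status < 500) {
    return {
      status: failure.status,
      code: 'errors.providerRequestRejected',
      retryable: false,
    };
  }
  // Placeholder code name, assumed for illustration.
  return { status: 503, code: 'errors.providerUnavailable', retryable: true };
}
```

Keeping the mapping free of `throw` makes each status class easy to unit-test; the route can still throw on the returned value.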
let result: Awaited<ReturnType<typeof synthesizeWithInworld>>;
try {
  result = await synthesizeWithInworld({ text, locale, voiceId });
Reuse synthesizes with the currently selected locale, not the locale the voice was cloned with.
audio_references doesn't persist the clone-time locale/langCode, so reuse takes locale from the request (the dropdown). A user can clone a voice as en and then re-synthesize it as ja simply by changing the language selector. If Inworld TTS-2 is genuinely multilingual per voice this may be intentional — but if a voice is language-bound, store the locale on the audio_references row at creation and reuse that here instead of trusting the request locale. Worth confirming either way.
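If the voice turns out to be language-bound, the reuse path could prefer a stored locale over the request locale. A minimal sketch, assuming a nullable `locale` column is added to `audio_references` (it does not exist in this PR) and using hypothetical names `SavedVoice` and `resolveSynthesisLocale`:

```typescript
// Hypothetical row shape: `locale` is an assumed new column, null for
// voices saved before the column existed.
interface SavedVoice {
  voiceId: string;
  locale: string | null;
}

// Prefer the clone-time locale when it was recorded; otherwise fall back
// to the locale sent with the request (the current behaviour).
function resolveSynthesisLocale(voice: SavedVoice, requestLocale: string): string {
  return voice.locale ?? requestLocale;
}

// A voice cloned as "en" stays "en" even if the dropdown now says "ja".
const pinned = resolveSynthesisLocale({ voiceId: 'v1', locale: 'en' }, 'ja');
// A legacy row without a stored locale keeps following the request.
const legacy = resolveSynthesisLocale({ voiceId: 'v2', locale: null }, 'ja');
```

If Inworld voices are confirmed multilingual, the fallback branch alone already matches what the route does today.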
if (
  provider === 'inworld' &&
  createdInworldVoiceId &&
  formInput.voiceName
) {
  const inserted = await insertAudioReference({
    userId: user.id,
    provider: 'inworld',
    voiceId: createdInworldVoiceId,
    name: formInput.voiceName,
    isPaid: userHasPaid,
  });

  if (inserted.error) {
    captureException(inserted.error, {
      user: { id: user.id },
      extra: { voiceId: createdInworldVoiceId },
    });
  } else if (inserted.data) {
    createdAudioReference = {
      id: inserted.data.id,
      name: inserted.data.name,
      voice_id: inserted.data.voice_id,
    };
  }
}
Orphaned remote voice when the DB insert fails. If insertAudioReference returns an error (e.g. the global voice_id UNIQUE constraint, or a transient DB failure), it's captured to Sentry but the Inworld voice created moments earlier in cloneVoiceWithInworld is never deleted — it lingers in the Inworld workspace untracked and un-reusable. Since deleteInworldVoice already exists and tolerates 404s, consider calling it to roll back the remote voice when persistence fails (the user still gets their generated audio; they just lose the un-saveable voice).
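The rollback described above might look like the following. This is a sketch under stated assumptions: the dependencies are injected with simplified, hypothetical signatures (the PR's `insertAudioReference` takes an options object, and `captureException` takes extra context), and `persistOrRollback` is an illustrative name, not a function in the PR.

```typescript
// Simplified, hypothetical signatures for the helpers named in the review.
interface InsertResult {
  data: { id: string } | null;
  error: Error | null;
}

interface PersistDeps {
  insertAudioReference: (voiceId: string, name: string) => Promise<InsertResult>;
  deleteInworldVoice: (voiceId: string) => Promise<void>;
  captureException: (error: unknown) => void;
}

// Try to persist the freshly minted voice. If the insert fails, delete the
// remote voice (best effort) so nothing untracked lingers in the workspace.
// Returns the new row id, or null when the voice could not be saved.
async function persistOrRollback(
  deps: PersistDeps,
  voiceId: string,
  name: string,
): Promise<string | null> {
  const inserted = await deps.insertAudioReference(voiceId, name);
  if (!inserted.error && inserted.data) {
    return inserted.data.id;
  }
  if (inserted.error) {
    deps.captureException(inserted.error);
  }
  try {
    await deps.deleteInworldVoice(voiceId);
  } catch (rollbackError) {
    // The rollback itself is best effort: report it and carry on.
    deps.captureException(rollbackError);
  }
  return null;
}
```

The caller can still return the generated audio when the result is `null`; it only needs to avoid telling the user that a voice was saved.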
if (provider === 'inworld') {
  // inworld-tts-2 - $25 per 1M characters
  return text
    ? (text.length / 1_000_000) * INWORLD_TTS2_DOLLARS_PER_MILLION_CHARS
    : -1;
}
Heads-up on credit pricing vs. actual cost for Inworld. getDollarCost correctly models Inworld at $25/1M chars, but the credits actually charged to the user come from estimateCredits(text, 'clone') (route.ts) — a word-count × clone multiplier that's provider-agnostic. So a clone charges the same credits whether it runs on Replicate ($0.012 flat) or Inworld. Inworld supports Voxtral locales where paid users can submit up to 4000 chars ($0.10/generation), making it the most expensive provider by far while charging identical credits. Worth confirming the credit model intentionally absorbs that delta, or making the estimate provider-aware.
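The arithmetic behind the $0.10 figure can be checked with a short sketch. The constant values come from the comment above; `inworldDollarCost` and `REPLICATE_FLAT_DOLLARS` are illustrative names, not identifiers from the PR.

```typescript
// Values quoted in the review; the names below are assumptions for illustration.
const INWORLD_TTS2_DOLLARS_PER_MILLION_CHARS = 25;
const REPLICATE_FLAT_DOLLARS = 0.012;

// Per-character pricing: characters / 1,000,000 * $25.
function inworldDollarCost(text: string): number {
  return (text.length / 1_000_000) * INWORLD_TTS2_DOLLARS_PER_MILLION_CHARS;
}

// A paid user's maximum input of 4000 characters:
// 4000 / 1,000,000 * 25 = 0.10 dollars per generation.
const maxPaidCost = inworldDollarCost('a'.repeat(4000));

// Compared with Replicate's flat price this is roughly 8x more expensive,
// while the credits charged are identical.
const ratioToReplicate = maxPaidCost / REPLICATE_FLAT_DOLLARS;
```

A provider-aware credit estimate would need some multiplier derived from a ratio like this one; whether to absorb the difference instead is the open question raised above.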
Important
One issue worth addressing before merge: a successfully-minted (and billable) Inworld voice can be orphaned with no cleanup when a later step in the clone request fails. Everything else is solid — clean RLS, good test coverage, careful delete path.
Reviewed changes — initial review of adding Inworld TTS as a third, persisted/reusable voice-clone provider.
- Inworld client: `lib/clone/inworld.ts` adds `createInworldVoice`, `synthesizeWithInworld`, `cloneVoiceWithInworld`, and a 404-tolerant `deleteInworldVoice`, with `Authorization: Basic <INWORLD_API_KEY>` kept server-side and an `InworldError` that flags 5xx as transient.
- Clone route plumbing: `app/api/clone-voice/route.ts` adds user-selectable provider parsing, `validateProviderLocale`, Inworld 3–15s constraints, mp3 output, a new-clone path that persists the voice, and `handleInworldVoiceReuse` that re-synthesizes a saved `voiceId` (redis-cached) without re-cloning.
- Reusable-voice persistence: new RLS-scoped `audio_references` table (`voice_id` globally UNIQUE, FK cascade, `updated_at` trigger), query helpers, `GET /api/audio-references`, and `DELETE /api/audio-references/[id]` (provider-side delete first, 502 keeps the row, 404 tolerated).
- UI + i18n: engine dropdown, saved-voice dropdown with an AlertDialog delete confirm, voice-name field, reuse/new branching in `new.client.tsx`, and strings translated across all 6 locales (full key parity).
- Tests: `audio-references.test.ts` and a new Inworld suite in `clone-voice.test.ts` cover new-clone, reuse, unsupported-locale, provider 503, and delete 404/502, with MSW + query mocks in `setup.ts`.
⚠️ A minted Inworld voice can be orphaned when a later step fails
Once `createInworldVoice` succeeds it mints a persistent, billable voice in the Inworld workspace. Three later failure points in the new-clone request leave that voice with no cleanup and no way for the user to ever remove it:
- A synthesis failure inside `cloneVoiceWithInworld` (the clone already succeeded) throws into the outer `catch`, which returns an error without deleting the voice.
- `uploadGeneratedAudio` throwing after voice creation hits the same `catch` → 500, voice retained.
- `insertAudioReference` failing is caught and reported to Sentry, but the request still returns 200; the voice exists in Inworld with no `audio_references` row, so it never appears in the saved list and can't be deleted via the UI.
The DELETE route deliberately removes the provider-side voice first so it never orphans a remote voice behind a deleted row — the create path has no symmetric guarantee.
Technical details
# A minted Inworld voice can be orphaned when a later step fails
## Affected sites
- `apps/web/lib/clone/inworld.ts` `cloneVoiceWithInworld` — `createInworldVoice` then `synthesizeWithInworld`; a synth failure after a successful clone discards the new `voiceId`.
- `apps/web/app/api/clone-voice/route.ts` (~new lines 1756-1776) — `cloneVoiceWithInworld` + `uploadGeneratedAudio` inside the main `try`; any throw goes to the outer `catch` (~new line 1877) which performs no provider-side cleanup.
- `apps/web/app/api/clone-voice/route.ts` (~new lines 1822-1841) — `insertAudioReference` error is captured but the request returns 200, leaving the remote voice untracked and undeletable.
## Required outcome
- A voice minted by `createInworldVoice` must not persist in the Inworld workspace if the request cannot complete and persist an `audio_references` row for it.
## Suggested approach (optional)
- Track `createdInworldVoiceId` and, in the failure paths (outer catch for the new-clone branch, and the `insertAudioReference` error branch), best-effort `deleteInworldVoice(createdInworldVoiceId)` (it already tolerates 404). On `insertAudioReference` failure specifically, prefer rolling back the remote voice and returning an error rather than a 200, so the user isn't told the voice was saved when it wasn't.
## Open questions for the human
- Is silently orphaning acceptable for now (e.g. a periodic reconciliation/cleanup job exists or is planned), or should the request fail closed and roll back the remote voice? This affects both cost and the user-facing "saved voice" contract.
ℹ️ New-clone cache hit returns before the voice is minted or saved
The new-clone path runs the shared redis output cache check (redis.get(filename), keyed on reference-audio hash + text + locale + provider) and returns the cached URL before the Inworld voice is created or persisted.
For the same reference audio + text + locale submitted twice with a voice name entered, the second request returns audio but mints/saves no voice — so the headline "save and reuse" affordance silently produces nothing, with no error shown.
Technical details
# New-clone cache hit returns before the voice is minted or saved
## Affected sites
- `apps/web/app/api/clone-voice/route.ts` (~line 1622) — `const cachedOutputUrl = await redis.get<string>(filename)` early-returns for all providers, ahead of the Inworld voice-create/persist block (~lines 1756-1841).
## Required outcome
- For Inworld new-clone requests, a cache hit on the synthesized output should not bypass minting/persisting a reusable voice when the user asked to save one — or the UX should make clear no voice was saved.
## Open questions for the human
- Is this acceptable given the low probability of identical audio+text resubmission, or should the Inworld new-clone path skip the output cache (since each request is meant to create a distinct saved voice)?
ℹ️ Nitpicks
- `handleInworldVoiceReuse` synthesizes with the current `selectedLocale.language`, not the locale the voice was originally cloned with (the original locale isn't stored on `audio_references`). Likely fine for the multilingual `inworld-tts-2`, but switching the language dropdown while reusing a saved voice changes the `language` sent to Inworld.
- The engine dropdown lets users pick Inworld with an Inworld-unsupported locale; the generate button stays enabled and the request only fails server-side with `errors.providerLocaleUnsupported`. Consider disabling/annotating the Inworld option for unsupported locales.
- `createInworldVoice` is exported from `lib/clone/inworld.ts` but only consumed internally by `cloneVoiceWithInworld`; it can be unexported unless reuse elsewhere is intended.
Claude Opus | 𝕏
- Map Inworld 4xx (non-403) responses as non-retryable client errors instead of 503 "temporarily unavailable", which invited retries that re-minted voices.
- Roll back the created Inworld voice via deleteInworldVoice when the audio_references insert fails, so we don't orphan an untracked remote voice.
- Validate the locale when Mistral is explicitly selected (Voxtral-only).
- Cap voiceName length on the backend (60) to match the frontend.
- Keep the delete confirmation dialog open until the request resolves.
- Add translations + tests for the new error paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks for the reviews 🙏 — pushed
Added tests for each new path and translations across all 6 locales. Not changed (by design, happy to revisit):
Important
The new commit added rollback for the insertAudioReference failure path, but only one of the three orphan paths flagged in the prior review is now covered. A minted, billable Inworld voice can still be orphaned on a synthesis-after-clone failure or an upload failure. See the inline comment.
Reviewed changes — incremental review of the single new commit addressing prior review feedback on the Inworld voice-clone path.
- Orphan rollback (partial): on `insertAudioReference` failure the new-clone path now best-effort `deleteInworldVoice(createdInworldVoiceId)` and reports the rollback error to Sentry, instead of returning 200 with an untracked remote voice.
- Inworld `4xx` handling: `throwInworldRouteError` now maps non-403 `4xx` responses to a non-retryable `errors.providerRequestRejected` so failing requests don't masquerade as "temporarily unavailable" and mint another voice on retry.
- Explicit-Mistral locale guard: `validateProviderLocale` rejects unsupported locales when Mistral is explicitly selected (`errors.providerLocaleUnsupported`) rather than failing later/wasting credits.
- Voice-name length cap: server-side `MAX_VOICE_NAME_LENGTH = 60` validation with a new `errors.voiceNameTooLong` code.
- Delete dialog: the AlertDialog confirm button now `preventDefault()`s and awaits `handleDelete`, keeping the dialog open until the request resolves.
- i18n + tests: `voiceNameTooLong`/`providerRequestRejected` added across all 6 locales; new tests cover the Mistral-locale rejection, over-length voice name, Inworld `4xx`, and the `insertAudioReference`-failure rollback.
createdInworldVoiceId = result.voiceId;

outputUrl = await uploadGeneratedAudio(
insertAudioReference failure path. The other two orphan paths the prior review enumerated still mint a billable Inworld voice with no cleanup:
- A synthesis failure inside `cloneVoiceWithInworld` (the clone already succeeded) throws into the inner `catch` → `throwInworldRouteError`, but the `voiceId` is never surfaced, so it can't be rolled back.
- An `uploadGeneratedAudio` failure here throws to the outer `catch` (line ~1930), which performs no provider-side cleanup, and `createdInworldVoiceId` is declared inside the `try` (line 1779), so it isn't even in scope there.
Technical details
# Minted Inworld voice still orphaned on synth/upload failure
## Affected sites
- `apps/web/lib/clone/inworld.ts` `cloneVoiceWithInworld` (~lines 237-258) — `createInworldVoice` mints the voice, then `synthesizeWithInworld`; a synth failure throws without exposing the new `voiceId`.
- `apps/web/app/api/clone-voice/route.ts:1810` — `uploadGeneratedAudio` runs after `createdInworldVoiceId` is set; a throw here reaches the outer catch with no rollback.
- `apps/web/app/api/clone-voice/route.ts:1779` — `createdInworldVoiceId` is scoped inside the `try`, so the outer catch (~line 1930) can't reference it.
## Required outcome
- A voice minted by `createInworldVoice` must not persist in the Inworld workspace if synthesis or upload fails before an `audio_references` row is written.
## Suggested approach (optional)
- Have `cloneVoiceWithInworld` clean up its own minted voice (or surface the `voiceId` on the thrown error) when the synth step fails, so the clone-then-synth wrapper is atomic.
- Hoist `createdInworldVoiceId` above the `try`, or wrap the inworld branch so the outer catch best-effort `deleteInworldVoice(createdInworldVoiceId)` (already 404-tolerant) before returning the 500.
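The first suggestion, making the clone-then-synth wrapper clean up after itself, could be sketched as below. The dependency signatures are simplified assumptions (the real `createInworldVoice` and `synthesizeWithInworld` take request parameters not shown in this thread), and `cloneAndSynthesize` is an illustrative name rather than the PR's `cloneVoiceWithInworld`.

```typescript
// Simplified, hypothetical signatures for the Inworld helpers.
interface CloneDeps {
  createInworldVoice: () => Promise<string>;
  synthesizeWithInworld: (voiceId: string) => Promise<Uint8Array>;
  deleteInworldVoice: (voiceId: string) => Promise<void>;
}

interface CloneResult {
  voiceId: string;
  audio: Uint8Array;
}

// Mint a voice, then synthesize with it. If synthesis fails, delete the
// voice that was just minted before re-throwing the original error, so the
// caller never has to know about a voiceId it did not receive.
async function cloneAndSynthesize(deps: CloneDeps): Promise<CloneResult> {
  const voiceId = await deps.createInworldVoice();
  try {
    const audio = await deps.synthesizeWithInworld(voiceId);
    return { voiceId, audio };
  } catch (error) {
    // Best-effort rollback: a failed delete must not mask the synth error.
    await deps.deleteInworldVoice(voiceId).catch(() => undefined);
    throw error;
  }
}
```

This covers only the synthesis failure; an upload failure in the route would still need the hoisted `createdInworldVoiceId` and a cleanup call in the outer catch.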
## Open questions for the human
- Is best-effort rollback on every failure path the intended contract, or is a periodic reconciliation/cleanup job planned to sweep orphaned voices instead?
Let paid users start a voice call with the Inworld realtime engine using a cloned voice, alongside the default Grok engine.
- Add `inworld-realtime` model + `audioReferenceId` through SessionConfig and the call-token schema/helpers.
- call-token route: when model is inworld-realtime, gate on paid, resolve the audio_references id to the Inworld voice_id (ownership-checked) and pass it to the agent; Grok path unchanged.
- Clone-only `POST /api/audio-references` (paid) mints an Inworld voice from an uploaded sample (no text/synthesis), reusing a shared prepareInworldReferenceAudio helper; rolls back the remote voice if the DB insert fails.
- Call page: paid-gated engine dropdown + reusable Inworld voice picker (refactor CloneInworldVoiceSelect to be callback-based) with upload-to-create; Connect is gated until a voice is selected; voice changes trigger a reconnect.
- i18n across all 6 locales; route + schema tests; test-harness fixes (isFreeUserOverCallLimit mock, crypto.randomUUID in the crypto mock).
- Includes regenerated supabase types and local supabase snippets.
Note: the LiveKit agent worker (sexycall) must be updated out-of-repo to handle model=inworld-realtime + the Inworld voice_id.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Added a follow-up commit (
ℹ️ No new issues in the Call-page integration — one minor repo-hygiene nit below.
Reviewed changes — incremental review of the single new commit (1476dbe) adding an Inworld engine option to the Call page on top of the previously-reviewed clone-page work.
- Clone-only mint endpoint: `POST /api/audio-references` mints a reusable Inworld voice from an upload (no synthesis): auth + `hasUserPaid` gate, multipart validation (name ≤ 60, locale via `isInworldSupportedLocale`, size ≤ `CLONING_FILE_MAX_SIZE`, 3s min duration), the new shared `prepareInworldReferenceAudio` helper, `createInworldVoice` with `4xx`→400 / `5xx`→503 mapping, and best-effort `deleteInworldVoice` rollback when `insertAudioReference` errors.
- Call-token Inworld branch: `app/api/call-token/route.ts` resolves `inworld-realtime` calls via a paid gate + RLS-scoped `getAudioReferenceById(id, user.id)` (403/400/404) into `resolvedVoiceId = voice_id`; the Grok DB-voice path is unchanged.
- Model + session config: `ModelId.INWORLD_REALTIME`, `audioReferenceId` threaded through `session-config`, `default-config`, `call-token-schema`, and `playground-state-helpers`.
- UI: `CloneInworldVoiceSelect` refactored from `dispatch`/`selectedAudioReferenceId` to callback props (`value`/`onChange`/`onVoiceDeleted`); new engine `Select` in `configuration-form` (Inworld disabled for free users) + new `InworldVoiceSection`; `connect-button` disables Connect when Inworld is selected with no voice; `audio_reference_id` added to `RECONNECT_REQUIRED_FIELDS`.
- i18n + tests: new `call` namespace strings across all 6 locales; new `call-token-route.test.ts`, schema coverage in `call-token.test.ts`, and a `POST` suite (mint / 403 / missing-name / insert-rollback) in `audio-references.test.ts`.
ℹ️ Nitpicks
- Two local Supabase CLI scratch files are committed: `scripts/supabase/.branches/_current_branch` (`main`) and `scripts/supabase/.temp/cli-latest` (`v2.107.0`). The `.gitignore` that excludes `.branches`/`.temp` lives only under `apps/web/supabase/`, so the `scripts/supabase/` copies slipped in. Remove them and add a `scripts/supabase/.gitignore` (or extend an existing ignore) so CLI state isn't tracked.
Note: the prior review's open thread on `clone-voice/route.ts:1810` (a minted Inworld voice can still be orphaned on a synthesis-after-clone or upload failure) is out of scope for this commit and remains unaddressed.
Summary
Adds Inworld TTS as a third voice-clone provider (alongside Mistral/Replicate), selectable from a new engine dropdown on the clone page. Inworld cloned voices are now persisted in a new `audio_references` table so users can reuse a saved voice (synthesize without re-uploading/re-cloning) and delete it (DB row + the Inworld-side voice).
Changes
- `voices:clone` + `tts/v1/voice` (MP3 output), server-side `Authorization: Basic` via `INWORLD_API_KEY`, $25/1M-char cost, user-selectable engine; locale→langCode mapping and 3–15s reference constraints.
- `audio_references` table (migration), auto-save on every new Inworld clone, reuse path that skips upload/re-clone, and `GET /api/audio-references` + `DELETE /api/audio-references/[id]` (delete also removes the Inworld voice, tolerating a 404).
How to test
- `INWORLD_API_KEY` (base64 Basic credential) and apply the migration (`audio_references`).
- `/dashboard/clone`, choose Inworld, upload a 5–15s sample, enter a voice name + text → generates an MP3 and saves the voice.
Scope
Checklist
- `pnpm run fixall`
- `pnpm run type-check`
Notes for reviewers
- `audio_references` migration applied and `INWORLD_API_KEY` set.
- `lib/supabase/types.d.ts` was hand-edited to include the new table (gen-types targets the remote project, which won't have the table until the migration is pushed); re-run `pnpm generate-supabase-types` after pushing the migration to regenerate canonically.
- `audio_references` and removed (locally + remotely) on delete.
- `is_paid` column is in place to support free-vs-paid limits later).
🤖 Generated with Claude Code