feat(clone): Inworld provider with persisted, reusable voices#431

Open: gianpaj wants to merge 3 commits into main from feat/inworld-cloned-voices

@gianpaj gianpaj commented Jun 21, 2026

Owner

Summary

Adds Inworld TTS as a third voice-clone provider (alongside Mistral/Replicate), selectable from a new engine dropdown on the clone page. Inworld cloned voices are now persisted in a new audio_references table so users can reuse a saved voice (synthesize without re-uploading/re-cloning) and delete it (DB row + the Inworld-side voice).

Changes

  • Inworld provider: non-streaming voices:clone + tts/v1/voice (MP3 output), server-side Authorization: Basic via INWORLD_API_KEY, $25/1M-char cost, user-selectable engine; locale→langCode mapping and 3–15s reference constraints.
  • Reusable voices: new RLS-scoped audio_references table (migration), auto-save on every new Inworld clone, reuse path that skips upload/re-clone, and GET /api/audio-references + DELETE /api/audio-references/[id] (delete also removes the Inworld voice, tolerating a 404).
  • UI + i18n: clone-page engine dropdown, saved-voice dropdown, voice-name field, and an AlertDialog delete confirm; new strings translated across all 6 locales.

How to test

  1. Set INWORLD_API_KEY (base64 Basic credential) and apply the migration (audio_references).
  2. On /dashboard/clone, choose Inworld, upload a 5–15s sample, enter a voice name + text → generates an MP3 and saves the voice.
  3. Re-select the saved voice from the dropdown (upload hidden) → generates again without re-cloning.
  4. Delete the saved voice via the trash button + confirm → row removed and the Inworld voice is deleted.
  5. Confirm Auto/Mistral/Replicate paths are unchanged.

Scope

  • Frontend
  • Backend
  • Database

Checklist

  • I self-reviewed this PR
  • I ran pnpm run fixall
  • I ran pnpm run type-check
  • I added or updated tests where needed
  • I updated translations for any user-facing text
  • I documented env vars, migrations, or rollout notes if needed

Notes for reviewers

  • Rollout: requires the audio_references migration applied and INWORLD_API_KEY set. lib/supabase/types.d.ts was hand-edited to include the new table (gen-types targets the remote project, which won't have the table until the migration is pushed) — re-run pnpm generate-supabase-types after pushing the migration to regenerate canonically.
  • New Inworld clones each create a persistent voice in the Inworld workspace; they're now tracked in audio_references and removed (locally + remotely) on delete.
  • No per-user cap on saved voices yet (the is_paid column is in place to support free-vs-paid limits later).

🤖 Generated with Claude Code

Add Inworld TTS as a third voice-clone provider (alongside Mistral/Replicate),
selectable via a new engine dropdown, and let users save and reuse cloned voices.

- Inworld clone provider: voices:clone + tts/v1/voice (non-streaming, MP3 out),
  server-side Basic auth (INWORLD_API_KEY), $25/1M-char cost, user-selectable.
- Persist cloned voices in a new RLS-scoped `audio_references` table; reuse a saved
  voice to synthesize without re-uploading or re-cloning; auto-save new clones.
- GET /api/audio-references (list) and DELETE /api/audio-references/[id] which also
  removes the Inworld-side voice (404-tolerant); clone page dropdown + name field +
  delete confirm modal.
- i18n across all 6 locales; unit tests for the route, reuse path, and new API routes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vercel Bot commented Jun 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| sexyvoice | Ready | Preview, Comment | Jun 21, 2026 1:59pm |
| sexyvoice-docs | Ready | Preview, Comment | Jun 21, 2026 1:59pm |

@claude

claude Bot commented Jun 21, 2026

Contributor

Claude finished @gianpaj's task in 3m 20s


PR Review — Inworld cloned voices

Reviewed the full diff (provider, API routes, migration, queries, frontend, tests, i18n). Overall this is a well-structured, thorough PR — clean provider abstraction, defense-in-depth RLS (policies and explicit user_id filters), sensible delete ordering, good test coverage, and complete translations across all 6 locales. A few things worth addressing before merge; I've left inline comments on the most concrete ones.

🔴 Worth addressing

  • Inworld 4xx → 503 mislabeling (clone-voice/route.ts:1388): throwInworldRouteError maps every non-403 error to "provider temporarily unavailable, please try again". A 400/413 (bad audio, bad langCode, payload too large) then looks transient and invites retries that keep failing — and on the new-clone path each failed retry mints another Inworld voice. Distinguish 4xx client errors from 5xx/transient. (inline)
  • Orphaned remote voice on DB-insert failure (clone-voice/route.ts:1817-1842): if insertAudioReference fails (global voice_id UNIQUE, transient DB error), the just-created Inworld voice is never cleaned up and becomes untracked/un-reusable. Consider rolling back via the existing deleteInworldVoice. (inline)

🟡 Confirm intent

  • Reuse uses the request locale, not the clone-time locale (clone-voice/route.ts:1458): audio_references doesn't store the locale, so a voice cloned as en can be re-synthesized as ja by changing the dropdown. Fine if Inworld TTS-2 is per-voice multilingual; otherwise persist & reuse the original locale. (inline)
  • Credit model vs. real cost (utils.ts:411): credits charged come from the provider-agnostic estimateCredits(text, 'clone'), but Inworld ($25/1M chars, up to 4000 chars on paid Voxtral locales ≈ $0.10/gen) is far costlier than Replicate (~$0.012 flat) for identical credits. Confirm this delta is intentionally absorbed or make the estimate provider-aware. (inline)

🟢 Nits / minor

  • No server-side guard for Inworld's ≤4MB reference limit — an oversized clip is rejected by Inworld as a 4xx and (per the issue above) surfaces as 503. A client-side/early check would give a clearer message.
  • deleteVoiceConfirm busy state reuses the same label ("Delete..."); a dedicated "Deleting…" string reads better.
  • DELETE flow is nicely self-healing: provider-first deletion means a failed DB delete leaves a row that a retry will clean up (Inworld 404 tolerated). Good call, just noting it's intentional.

✅ Things done well

  • InworldError with transient/status, server-only Authorization: Basic, 404-tolerant delete.
  • Migration follows repo conventions (FK index, RLS with (select auth.uid()), reuses update_updated_at_column). Hand-edited types.d.ts matches the schema; the rollout note to regenerate after pushing is appreciated.
  • Reuse cache key (locale-provider-text-voiceId) is correctly scoped; cache hits return creditsUsed: 0 without scheduling billing.
  • Solid tests covering 404 tolerance, 502-keeps-row, and the GET/DELETE contracts.

I focused on review feedback only and did not change any code. Happy to implement any of the above if you'd like — just say the word.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces Inworld TTS voice cloning as a third provider option, allowing users to explicitly select it alongside Mistral and Replicate. It adds support for saving, reusing, and deleting cloned Inworld voices via a new audio_references database table and associated API endpoints (/api/audio-references). The review feedback highlights three important improvements: preventing the delete confirmation dialog from closing prematurely during asynchronous deletion, validating the selected locale for the explicitly chosen Mistral provider, and enforcing a maximum length constraint on the voice name in the backend to match the frontend limits.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +138 to +148
                <AlertDialogAction asChild>
                  <Button
                    disabled={isDeleting}
                    onClick={handleDelete}
                    variant="destructive"
                  >
                    {isDeleting
                      ? `${t('deleteVoiceConfirm')}...`
                      : t('deleteVoiceConfirm')}
                  </Button>
                </AlertDialogAction>

Contributor

medium

When using AlertDialogAction with an async onClick handler like handleDelete, the dialog will close immediately when clicked. To prevent the dialog from closing prematurely (especially if the deletion fails or while it is in progress), you should call event.preventDefault() in the onClick handler. Since handleDelete already calls setIsOpen(false) on success, this will ensure the dialog only closes after a successful deletion.

                <AlertDialogAction asChild>
                  <Button
                    disabled={isDeleting}
                    onClick={(event) => {
                      event.preventDefault();
                      handleDelete();
                    }}
                    variant="destructive"
                  >
                    {isDeleting
                      ? t('deleteVoiceConfirm') + '...'
                      : t('deleteVoiceConfirm')}
                  </Button>
                </AlertDialogAction>

Comment on lines +355 to +364
function validateProviderLocale(provider: CloneProvider, locale: string): void {
  if (provider === 'inworld' && !isInworldSupportedLocale(locale)) {
    throw createRouteError(
      `Inworld voice cloning does not support the language: ${locale}`,
      400,
      'errors.providerLocaleUnsupported',
      { provider, locale },
    );
  }
}

Contributor

medium

The validateProviderLocale function currently only validates the locale for the inworld provider. However, since the frontend now allows explicitly selecting the mistral provider, we should also validate that the selected locale is supported by Mistral (i.e., is a Voxtral-supported locale). Otherwise, selecting Mistral with an unsupported language like Japanese will result in unexpected API failures or wasted credits.

function validateProviderLocale(provider: CloneProvider, locale: string): void {
  if (provider === 'inworld' && !isInworldSupportedLocale(locale)) {
    throw createRouteError(
      "Inworld voice cloning does not support the language: " + locale,
      400,
      'errors.providerLocaleUnsupported',
      { provider, locale },
    );
  }
  if (provider === 'mistral' && !isVoxtralCloneLocale(locale)) {
    throw createRouteError(
      "Mistral voice cloning does not support the language: " + locale,
      400,
      'errors.providerLocaleUnsupported',
      { provider, locale },
    );
  }
}

Comment thread apps/web/app/api/clone-voice/route.ts Outdated
Comment on lines +1587 to +1593
    if (provider === 'inworld' && !formInput.voiceName) {
      throw createRouteError(
        'A voice name is required to save the cloned voice.',
        400,
        'errors.voiceNameRequired',
      );
    }

Contributor

medium

To prevent potential database or third-party API issues with excessively long voice names, we should validate the length of voiceName in the backend to match the frontend's maxLength={60} constraint.

    // Saving a new Inworld voice requires a name.
    if (provider === 'inworld') {
      if (!formInput.voiceName) {
        throw createRouteError(
          'A voice name is required to save the cloned voice.',
          400,
          'errors.voiceNameRequired',
        );
      }
      if (formInput.voiceName.length > 60) {
        throw createRouteError(
          'Voice name must be 60 characters or less.',
          400,
          'errors.voiceNameRequired',
        );
      }
    }

);
}

throw createProviderUnavailableRouteError('inworld');

Contributor

Inworld 4xx (non-403) errors are surfaced as 503 "temporarily unavailable".

throwInworldRouteError only special-cases 403; every other non-2xx (including 400/413 — e.g. bad reference audio, unsupported langCode, payload too large) is mapped to createProviderUnavailableRouteError('inworld'), which renders as "provider is temporarily unavailable, please try again."

For a genuine client-side problem this is misleading and invites pointless retries that keep failing (and on the new-clone path, each retry mints another Inworld voice). Consider forwarding 4xx as a non-retryable client error and reserving the 503 mapping for error.transient (5xx):

if (error.status === 403) { /* guardrail */ }
if (error.status >= 400 && error.status < 500) {
  throw createRouteError(/* a 4xx client error */, error.status, ...);
}
throw createProviderUnavailableRouteError('inworld');
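A runnable version of that split, as a minimal sketch. `ProviderFailure`, `RouteOutcome`, and `classifyProviderFailure` are hypothetical names used only for illustration; they are not the PR's `InworldError` or `createRouteError` helpers, and forwarding the provider's own 4xx status is an assumption taken from the suggestion above.

```typescript
// Hypothetical shapes for illustration; the PR's InworldError and
// createRouteError helpers are not reproduced here.
interface ProviderFailure {
  status: number; // HTTP status returned by the provider
}

interface RouteOutcome {
  status: number; // HTTP status the route responds with
  retryable: boolean; // whether the client should be told to try again
  reason: "guardrail" | "client_error" | "provider_unavailable";
}

function classifyProviderFailure(failure: ProviderFailure): RouteOutcome {
  // 403 keeps its own dedicated handling (the "guardrail" case).
  if (failure.status === 403) {
    return { status: 403, retryable: false, reason: "guardrail" };
  }
  // Other 4xx responses are caller mistakes: forward the status and
  // do not invite retries.
  if (failure.status >= 400 && failure.status < 500) {
    return { status: failure.status, retryable: false, reason: "client_error" };
  }
  // Everything else (5xx, unexpected codes) is treated as transient.
  return { status: 503, retryable: true, reason: "provider_unavailable" };
}
```

Keeping the classification pure makes it easy to unit-test each status without mocking the provider.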


let result: Awaited<ReturnType<typeof synthesizeWithInworld>>;
try {
result = await synthesizeWithInworld({ text, locale, voiceId });

Contributor

Reuse synthesizes with the currently selected locale, not the locale the voice was cloned with.

audio_references doesn't persist the clone-time locale/langCode, so reuse takes locale from the request (the dropdown). A user can clone a voice as en and then re-synthesize it as ja simply by changing the language selector. If Inworld TTS-2 is genuinely multilingual per voice this may be intentional — but if a voice is language-bound, store the locale on the audio_references row at creation and reuse that here instead of trusting the request locale. Worth confirming either way.

Comment on lines +1817 to +1842
    if (
      provider === 'inworld' &&
      createdInworldVoiceId &&
      formInput.voiceName
    ) {
      const inserted = await insertAudioReference({
        userId: user.id,
        provider: 'inworld',
        voiceId: createdInworldVoiceId,
        name: formInput.voiceName,
        isPaid: userHasPaid,
      });

      if (inserted.error) {
        captureException(inserted.error, {
          user: { id: user.id },
          extra: { voiceId: createdInworldVoiceId },
        });
      } else if (inserted.data) {
        createdAudioReference = {
          id: inserted.data.id,
          name: inserted.data.name,
          voice_id: inserted.data.voice_id,
        };
      }
    }

Contributor

Orphaned remote voice when the DB insert fails. If insertAudioReference returns an error (e.g. the global voice_id UNIQUE constraint, or a transient DB failure), it's captured to Sentry but the Inworld voice created moments earlier in cloneVoiceWithInworld is never deleted — it lingers in the Inworld workspace untracked and un-reusable. Since deleteInworldVoice already exists and tolerates 404s, consider calling it to roll back the remote voice when persistence fails (the user still gets their generated audio; they just lose the un-saveable voice).
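One way the rollback could look, sketched with injected dependencies so it runs standalone. `persistOrRollback` and the `Deps` shape are hypothetical; the real `insertAudioReference` and `deleteInworldVoice` take richer arguments than shown here, and `report` stands in for the Sentry capture.

```typescript
interface InsertResult {
  error: Error | null;
}

// Hypothetical, simplified stand-ins for the PR's helpers.
interface Deps {
  insertAudioReference: (voiceId: string) => Promise<InsertResult>;
  deleteInworldVoice: (voiceId: string) => Promise<void>;
  report: (error: unknown) => void;
}

// Returns true when the voice was persisted, false when it was rolled back.
async function persistOrRollback(voiceId: string, deps: Deps): Promise<boolean> {
  const inserted = await deps.insertAudioReference(voiceId);
  if (!inserted.error) return true;

  deps.report(inserted.error);
  try {
    // Best effort: remove the remote voice so it is not left untracked.
    await deps.deleteInworldVoice(voiceId);
  } catch (rollbackError) {
    // A failed rollback is reported but does not fail the request.
    deps.report(rollbackError);
  }
  return false;
}
```

The caller can still return the generated audio when this resolves to `false`; it just omits the saved-voice payload.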

Comment thread apps/web/lib/utils.ts
Comment on lines +411 to +416
  if (provider === 'inworld') {
    // inworld-tts-2 - $25 per 1M characters
    return text
      ? (text.length / 1_000_000) * INWORLD_TTS2_DOLLARS_PER_MILLION_CHARS
      : -1;
  }

Contributor

Heads-up on credit pricing vs. actual cost for Inworld. getDollarCost correctly models Inworld at $25/1M chars, but the credits actually charged to the user come from estimateCredits(text, 'clone') (route.ts) — a word-count × clone multiplier that's provider-agnostic. So a clone charges the same credits whether it runs on Replicate ($0.012 flat) or Inworld. Inworld supports Voxtral locales where paid users can submit up to 4000 chars ($0.10/generation), making it the most expensive provider by far while charging identical credits. Worth confirming the credit model intentionally absorbs that delta, or making the estimate provider-aware.
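The arithmetic behind that comparison, as a small worked sketch. It uses only the figures quoted in this comment ($25 per 1M characters, a 4000-character paid limit, and a $0.012 flat Replicate cost); the constant and function names are illustrative, not the ones in `utils.ts`.

```typescript
// Figures quoted in the review comment above.
const INWORLD_DOLLARS_PER_MILLION_CHARS = 25;
const REPLICATE_FLAT_DOLLARS = 0.012;

// Dollar cost of one Inworld generation for a given character count.
function inworldDollarCost(charCount: number): number {
  return (charCount / 1_000_000) * INWORLD_DOLLARS_PER_MILLION_CHARS;
}

// A maximum-length paid request: 4000 chars is about $0.10.
const maxPaidGeneration = inworldDollarCost(4000);

// Relative to the flat Replicate price this is roughly 8x for the same credits.
const costRatio = maxPaidGeneration / REPLICATE_FLAT_DOLLARS;
```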

@argos-ci

argos-ci Bot commented Jun 21, 2026

The latest updates on your projects. Learn more about Argos notifications

| Build | Status | Details | Updated (UTC) |
| --- | --- | --- | --- |
| default (Inspect) | ⚠️ Changes detected (Review) | 3 changed | Jun 21, 2026, 2:00 PM |

@pullfrog pullfrog Bot left a comment


Important

One issue worth addressing before merge: a successfully-minted (and billable) Inworld voice can be orphaned with no cleanup when a later step in the clone request fails. Everything else is solid — clean RLS, good test coverage, careful delete path.

Reviewed changes — initial review of adding Inworld TTS as a third, persisted/reusable voice-clone provider.

  • Inworld client: lib/clone/inworld.ts adds createInworldVoice, synthesizeWithInworld, cloneVoiceWithInworld, and a 404-tolerant deleteInworldVoice, with Authorization: Basic <INWORLD_API_KEY> kept server-side and an InworldError that flags 5xx as transient.
  • Clone route plumbing: app/api/clone-voice/route.ts adds user-selectable provider parsing, validateProviderLocale, Inworld 3–15s constraints, mp3 output, a new-clone path that persists the voice, and handleInworldVoiceReuse that re-synthesizes a saved voiceId (redis-cached) without re-cloning.
  • Reusable-voice persistence: new RLS-scoped audio_references table (voice_id globally UNIQUE, FK cascade, updated_at trigger), query helpers, GET /api/audio-references, and DELETE /api/audio-references/[id] (provider-side delete first, 502 keeps the row, 404 tolerated).
  • UI + i18n: engine dropdown, saved-voice dropdown with an AlertDialog delete confirm, voice-name field, reuse/new branching in new.client.tsx, and strings translated across all 6 locales (full key parity).
  • Tests: audio-references.test.ts and a new Inworld suite in clone-voice.test.ts cover new-clone, reuse, unsupported-locale, provider 503, and delete 404/502, with MSW + query mocks in setup.ts.

⚠️ A minted Inworld voice can be orphaned when a later step fails

Once createInworldVoice succeeds it mints a persistent, billable voice in the Inworld workspace. Three later failure points in the new-clone request leave that voice with no cleanup and no way for the user to ever remove it:

  • A synthesis failure inside cloneVoiceWithInworld (the clone already succeeded) throws into the outer catch, which returns an error without deleting the voice.
  • uploadGeneratedAudio throwing after voice creation hits the same catch → 500, voice retained.
  • insertAudioReference failing is caught and reported to Sentry, but the request still returns 200; the voice exists in Inworld with no audio_references row, so it never appears in the saved list and can't be deleted via the UI.

The DELETE route deliberately removes the provider-side voice first so it never orphans a remote voice behind a deleted row — the create path has no symmetric guarantee.

Technical details
# A minted Inworld voice can be orphaned when a later step fails

## Affected sites
- `apps/web/lib/clone/inworld.ts` `cloneVoiceWithInworld`: `createInworldVoice` then `synthesizeWithInworld`; a synth failure after a successful clone discards the new `voiceId`.
- `apps/web/app/api/clone-voice/route.ts` (~new lines 1756-1776): `cloneVoiceWithInworld` + `uploadGeneratedAudio` inside the main `try`; any throw goes to the outer `catch` (~new line 1877) which performs no provider-side cleanup.
- `apps/web/app/api/clone-voice/route.ts` (~new lines 1822-1841): `insertAudioReference` error is captured but the request returns 200, leaving the remote voice untracked and undeletable.

## Required outcome
- A voice minted by `createInworldVoice` must not persist in the Inworld workspace if the request cannot complete and persist an `audio_references` row for it.

## Suggested approach (optional)
- Track `createdInworldVoiceId` and, in the failure paths (outer catch for the new-clone branch, and the `insertAudioReference` error branch), best-effort `deleteInworldVoice(createdInworldVoiceId)` (it already tolerates 404). On `insertAudioReference` failure specifically, prefer rolling back the remote voice and returning an error rather than a 200, so the user isn't told the voice was saved when it wasn't.

## Open questions for the human
- Is silently orphaning acceptable for now (e.g. a periodic reconciliation/cleanup job exists or is planned), or should the request fail closed and roll back the remote voice? This affects both cost and the user-facing "saved voice" contract.

ℹ️ New-clone cache hit returns before the voice is minted or saved

The new-clone path runs the shared redis output cache check (redis.get(filename), keyed on reference-audio hash + text + locale + provider) and returns the cached URL before the Inworld voice is created or persisted.

For the same reference audio + text + locale submitted twice with a voice name entered, the second request returns audio but mints/saves no voice — so the headline "save and reuse" affordance silently produces nothing, with no error shown.

Technical details
# New-clone cache hit returns before the voice is minted or saved

## Affected sites
- `apps/web/app/api/clone-voice/route.ts` (~line 1622) — `const cachedOutputUrl = await redis.get<string>(filename)` early-returns for all providers, ahead of the Inworld voice-create/persist block (~lines 1756-1841).

## Required outcome
- For Inworld new-clone requests, a cache hit on the synthesized output should not bypass minting/persisting a reusable voice when the user asked to save one — or the UX should make clear no voice was saved.

## Open questions for the human
- Is this acceptable given the low probability of identical audio+text resubmission, or should the Inworld new-clone path skip the output cache (since each request is meant to create a distinct saved voice)?
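If the answer is to skip the output cache for Inworld new clones, the decision could be isolated in a small predicate. This is a sketch of one option only: `canServeFromOutputCache` and `CacheDecisionInput` are hypothetical names, and the `CloneProvider` members listed here are assumed from the PR description rather than copied from the code.

```typescript
// Assumed provider names; the PR's CloneProvider type may differ.
type CloneProvider = "mistral" | "replicate" | "inworld";

interface CacheDecisionInput {
  provider: CloneProvider;
  reusingSavedVoice: boolean; // true on the reuse path, false on a new clone
}

// Returns true when a cached output URL may be served without calling the provider.
function canServeFromOutputCache(input: CacheDecisionInput): boolean {
  // A new Inworld clone is expected to mint and persist a voice, so a
  // cache hit must not short-circuit that work.
  if (input.provider === "inworld" && !input.reusingSavedVoice) {
    return false;
  }
  return true;
}
```

The route would then guard its early return with this predicate instead of returning on any cache hit.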

ℹ️ Nitpicks

  • handleInworldVoiceReuse synthesizes with the current selectedLocale.language, not the locale the voice was originally cloned with (the original locale isn't stored on audio_references). Likely fine for the multilingual inworld-tts-2, but switching the language dropdown while reusing a saved voice changes the language sent to Inworld.
  • The engine dropdown lets users pick Inworld with an Inworld-unsupported locale; the generate button stays enabled and the request only fails server-side with errors.providerLocaleUnsupported. Consider disabling/annotating the Inworld option for unsupported locales.
  • createInworldVoice is exported from lib/clone/inworld.ts but only consumed internally by cloneVoiceWithInworld; it can be unexported unless reuse elsewhere is intended.
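For the second nitpick, the option list could carry a `disabled` flag derived from the selected locale. A minimal sketch under stated assumptions: `buildEngineOptions` and `EngineOption` are hypothetical, and the locale set below is a placeholder, not the list behind the PR's `isInworldSupportedLocale`.

```typescript
interface EngineOption {
  value: "auto" | "mistral" | "replicate" | "inworld";
  disabled: boolean;
}

// Placeholder set for illustration only; the real supported locales
// come from the PR's isInworldSupportedLocale helper.
const ASSUMED_INWORLD_LOCALES = new Set<string>(["en", "es"]);

// Builds the engine dropdown options, disabling Inworld for unsupported locales.
function buildEngineOptions(locale: string): EngineOption[] {
  return [
    { value: "auto", disabled: false },
    { value: "mistral", disabled: false },
    { value: "replicate", disabled: false },
    { value: "inworld", disabled: !ASSUMED_INWORLD_LOCALES.has(locale) },
  ];
}
```

The server-side `errors.providerLocaleUnsupported` check would stay in place; the flag only prevents a request that is known to fail.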


- Map Inworld 4xx (non-403) responses as non-retryable client errors instead
  of 503 "temporarily unavailable", which invited retries that re-minted voices.
- Roll back the created Inworld voice via deleteInworldVoice when the
  audio_references insert fails, so we don't orphan an untracked remote voice.
- Validate the locale when Mistral is explicitly selected (Voxtral-only).
- Cap voiceName length on the backend (60) to match the frontend.
- Keep the delete confirmation dialog open until the request resolves.
- Add translations + tests for the new error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gianpaj

gianpaj commented Jun 21, 2026

Owner Author

Thanks for the reviews 🙏 — pushed 4a8cb07e addressing the actionable findings:

  • Inworld 4xx → 503 mislabel (claude/gemini): throwInworldRouteError now forwards non-403 4xx as a non-retryable client error (errors.providerRequestRejected) instead of "temporarily unavailable", so bad-input requests no longer invite retries that re-mint voices.
  • Orphaned remote voice on insert failure (claude): if insertAudioReference fails, we now roll back the just-created Inworld voice via the existing deleteInworldVoice (best-effort, 404-tolerant). The user still receives their generated audio.
  • Mistral locale not validated on explicit select (gemini): validateProviderLocale now rejects non-Voxtral locales when Mistral is explicitly chosen.
  • Backend voiceName cap (gemini): rejected over 60 chars to match the frontend.
  • Delete dialog closed before completion (gemini): the confirm action now preventDefault()s and only closes once the request resolves.

Added tests for each new path and translations across all 6 locales.

Not changed (by design, happy to revisit):

  • Reuse uses the request locale, not the clone-time locale: Inworld TTS-2 is multilingual per voice, so cross-lingual reuse is intentional. If you'd prefer language-bound voices, I can persist locale on the audio_references row.
  • Credits are provider-agnostic: estimateCredits(text,'clone') is the existing model; Inworld being pricier is a product decision rather than a code bug.

@pullfrog pullfrog Bot left a comment


Important

The new commit added rollback for the insertAudioReference failure path, but only one of the three orphan paths flagged in the prior review is now covered. A minted, billable Inworld voice can still be orphaned on a synthesis-after-clone failure or an upload failure. See the inline comment.

Reviewed changes — incremental review of the single new commit addressing prior review feedback on the Inworld voice-clone path.

  • Orphan rollback (partial): on insertAudioReference failure the new-clone path now makes a best-effort deleteInworldVoice(createdInworldVoiceId) call and reports any rollback error to Sentry, instead of returning 200 with an untracked remote voice.
  • Inworld 4xx handling: throwInworldRouteError now maps non-403 4xx responses to a non-retryable errors.providerRequestRejected so failing requests don't masquerade as "temporarily unavailable" and mint another voice on retry.
  • Explicit-Mistral locale guard: validateProviderLocale rejects unsupported locales when Mistral is explicitly selected (errors.providerLocaleUnsupported) rather than failing later/wasting credits.
  • Voice-name length cap: server-side MAX_VOICE_NAME_LENGTH = 60 validation with a new errors.voiceNameTooLong code.
  • Delete dialog: the AlertDialog confirm button now calls preventDefault() and awaits handleDelete, keeping the dialog open until the request resolves.
  • i18n + tests: voiceNameTooLong / providerRequestRejected added across all 6 locales; new tests cover the Mistral-locale rejection, over-length voice name, Inworld 4xx, and the insertAudioReference-failure rollback.



createdInworldVoiceId = result.voiceId;

outputUrl = await uploadGeneratedAudio(


⚠️ The rollback added below only covers the insertAudioReference failure path. The other two orphan paths the prior review enumerated still mint a billable Inworld voice with no cleanup:

  • A synthesis failure inside cloneVoiceWithInworld (the clone already succeeded) throws into the inner catch → throwInworldRouteError, but the voiceId is never surfaced, so it can't be rolled back.
  • An uploadGeneratedAudio failure here throws to the outer catch (line ~1930), which performs no provider-side cleanup — and createdInworldVoiceId is declared inside the try (line 1779), so it isn't even in scope there.
Technical details
# Minted Inworld voice still orphaned on synth/upload failure

## Affected sites
- `apps/web/lib/clone/inworld.ts` `cloneVoiceWithInworld` (~lines 237-258): `createInworldVoice` mints the voice, then `synthesizeWithInworld`; a synth failure throws without exposing the new `voiceId`.
- `apps/web/app/api/clone-voice/route.ts:1810`: `uploadGeneratedAudio` runs after `createdInworldVoiceId` is set; a throw here reaches the outer catch with no rollback.
- `apps/web/app/api/clone-voice/route.ts:1779`: `createdInworldVoiceId` is scoped inside the `try`, so the outer catch (~line 1930) can't reference it.

## Required outcome
- A voice minted by `createInworldVoice` must not persist in the Inworld workspace if synthesis or upload fails before an `audio_references` row is written.

## Suggested approach (optional)
- Have `cloneVoiceWithInworld` clean up its own minted voice (or surface the `voiceId` on the thrown error) when the synth step fails, so the clone-then-synth wrapper is atomic.
- Hoist `createdInworldVoiceId` above the `try`, or wrap the inworld branch so the outer catch best-effort `deleteInworldVoice(createdInworldVoiceId)` (already 404-tolerant) before returning the 500.

## Open questions for the human
- Is best-effort rollback on every failure path the intended contract, or is a periodic reconciliation/cleanup job planned to sweep orphaned voices instead?
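The hoisting approach could be sketched as follows, assuming the fail-closed contract. `cloneWithCleanup` and its `steps` parameter are hypothetical stand-ins for the route's clone, upload, and persistence calls; only the scoping pattern is the point.

```typescript
// Hypothetical step functions standing in for the route's real helpers.
interface CloneSteps {
  createVoice: () => Promise<string>;
  synthesizeAndUpload: (voiceId: string) => Promise<string>;
  saveReference: (voiceId: string) => Promise<void>;
  deleteVoice: (voiceId: string) => Promise<void>;
}

async function cloneWithCleanup(steps: CloneSteps): Promise<string> {
  // Declared outside the try so the catch block can see it.
  let createdVoiceId: string | undefined;
  try {
    createdVoiceId = await steps.createVoice();
    const outputUrl = await steps.synthesizeAndUpload(createdVoiceId);
    await steps.saveReference(createdVoiceId);
    return outputUrl;
  } catch (error) {
    if (createdVoiceId) {
      // Best-effort rollback; a cleanup failure must not mask the original error.
      await steps.deleteVoice(createdVoiceId).catch(() => undefined);
    }
    throw error;
  }
}
```

Because the ID is assigned before synthesis and upload run, both remaining orphan paths reach the same cleanup.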

Let paid users start a voice call with the Inworld realtime engine using a
cloned voice, alongside the default Grok engine.

- Add `inworld-realtime` model + `audioReferenceId` through SessionConfig and
  the call-token schema/helpers.
- call-token route: when model is inworld-realtime, gate on paid, resolve the
  audio_references id to the Inworld voice_id (ownership-checked) and pass it to
  the agent; Grok path unchanged.
- Clone-only `POST /api/audio-references` (paid) mints an Inworld voice from an
  uploaded sample (no text/synthesis), reusing a shared prepareInworldReferenceAudio
  helper; rolls back the remote voice if the DB insert fails.
- Call page: paid-gated engine dropdown + reusable Inworld voice picker (refactor
  CloneInworldVoiceSelect to be callback-based) with upload-to-create; Connect is
  gated until a voice is selected; voice changes trigger a reconnect.
- i18n across all 6 locales; route + schema tests; test-harness fixes
  (isFreeUserOverCallLimit mock, crypto.randomUUID in the crypto mock).
- Includes regenerated supabase types and local supabase snippets.

Note: the LiveKit agent worker (sexycall) must be updated out-of-repo to handle
model=inworld-realtime + the Inworld voice_id.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gianpaj

gianpaj commented Jun 21, 2026

Owner Author

Added a follow-up commit (1476dbe7) extending this PR to the Call page: paid users can now start a voice call with the Inworld realtime engine using a cloned voice.

  • inworld-realtime model + audioReferenceId threaded through SessionConfig / call-token schema.
  • /api/call-token: paid-gated Inworld branch resolves the audio_references id → Inworld voice_id (ownership-checked) for the agent; Grok path unchanged.
  • Clone-only POST /api/audio-references (paid) mints a voice from an uploaded sample — no text/synthesis — reusing a shared prepareInworldReferenceAudio helper, with remote rollback on insert failure.
  • Call page: engine dropdown (Inworld paid-only) + reusable voice picker (refactored CloneInworldVoiceSelect to be callback-based) with upload-to-create; Connect gated until a voice is chosen.
  • Route + schema tests; i18n for all 6 locales; harness fixes (isFreeUserOverCallLimit mock + crypto.randomUUID). Also includes a regenerated types.d.ts.

⚠️ Out of repo: the LiveKit agent worker (sexycall) must be updated to handle model=inworld-realtime + the Inworld voice_id before calls are actually Inworld-voiced end-to-end.

@pullfrog pullfrog Bot left a comment


ℹ️ No new issues in the Call-page integration — one minor repo-hygiene nit below.

Reviewed changes — incremental review of the single new commit (1476dbe) adding an Inworld engine option to the Call page on top of the previously-reviewed clone-page work.

  • Clone-only mint endpoint: POST /api/audio-references mints a reusable Inworld voice from an upload (no synthesis): auth + hasUserPaid gate, multipart validation (name ≤ 60, locale via isInworldSupportedLocale, size ≤ CLONING_FILE_MAX_SIZE, 3s min duration), the new shared prepareInworldReferenceAudio helper, createInworldVoice with 4xx→400 / 5xx→503 mapping, and best-effort deleteInworldVoice rollback when insertAudioReference errors.
  • Call-token Inworld branch: app/api/call-token/route.ts resolves inworld-realtime calls via a paid gate + RLS-scoped getAudioReferenceById(id, user.id) (403/400/404) into resolvedVoiceId = voice_id; the Grok DB-voice path is unchanged.
  • Model + session config: ModelId.INWORLD_REALTIME, audioReferenceId threaded through session-config, default-config, call-token-schema, and playground-state-helpers.
  • UI: CloneInworldVoiceSelect refactored from dispatch/selectedAudioReferenceId to callback props (value/onChange/onVoiceDeleted); new engine Select in configuration-form (Inworld disabled for free users) + new InworldVoiceSection; connect-button disables Connect when Inworld is selected with no voice; audio_reference_id added to RECONNECT_REQUIRED_FIELDS.
  • i18n + tests: new call namespace strings across all 6 locales; new call-token-route.test.ts, schema coverage in call-token.test.ts, and a POST suite (mint / 403 / missing-name / insert-rollback) in audio-references.test.ts.
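The call-token resolution described above might look roughly like this. It is a sketch, not the route's code: `resolveInworldVoice` is a hypothetical name, the lookup is injected, and the pairing of each status with a condition (403 unpaid, 400 missing id, 404 not found or not owned) is an assumption inferred from the "(403/400/404)" note.

```typescript
interface AudioReference {
  voice_id: string;
}

type Resolution =
  | { ok: true; voiceId: string }
  | { ok: false; status: 400 | 403 | 404 };

async function resolveInworldVoice(
  request: { isPaid: boolean; userId: string; audioReferenceId?: string },
  // Injected lookup; expected to filter by owner, returning null otherwise.
  getAudioReferenceById: (id: string, userId: string) => Promise<AudioReference | null>,
): Promise<Resolution> {
  // Assumed mapping: the Inworld engine is paid-only.
  if (!request.isPaid) return { ok: false, status: 403 };
  // Assumed mapping: a missing reference id is a malformed request.
  if (!request.audioReferenceId) return { ok: false, status: 400 };

  const row = await getAudioReferenceById(request.audioReferenceId, request.userId);
  // Unknown ids and ids owned by another user look the same to the caller.
  if (!row) return { ok: false, status: 404 };

  return { ok: true, voiceId: row.voice_id };
}
```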

ℹ️ Nitpicks

  • Two local Supabase CLI scratch files are committed: scripts/supabase/.branches/_current_branch (main) and scripts/supabase/.temp/cli-latest (v2.107.0). The .gitignore that excludes .branches/.temp lives only under apps/web/supabase/, so the scripts/supabase/ copies slipped in. Remove them and add a scripts/supabase/.gitignore (or extend an existing ignore) so CLI state isn't tracked.

Note: the prior review's open thread on clone-voice/route.ts:1810 (a minted Inworld voice can still be orphaned on a synthesis-after-clone or upload failure) is out of scope for this commit and remains unaddressed.

