feat(api): OpenAI-compatible /v1 endpoint surface (chat, embeddings, audio, realtime) by HexaField · Pull Request #854 · coasys/ad4m

HexaField · 2026-06-08T13:36:25Z

Summary

Adds an industry-standard, OpenAI-compatible HTTP/WebSocket API surface mounted at /v1, alongside the existing AD4M WS-RPC transport. Any tool that speaks OpenAI — the official openai SDKs, LangChain, LlamaIndex, the Vercel AI SDK, Cursor/Continue, Open WebUI, the llm CLI — can point its base_url at a running AD4M executor and just work.

All five phases from the design proposal land together:

Phase	Endpoint(s)	Status
1	`GET /v1/models`, `POST /v1/chat/completions` (non-streaming), `POST /v1/completions` (legacy), `POST /v1/embeddings`, OpenAI error envelope	✅
2	`POST /v1/chat/completions` with `stream: true` — token-by-token SSE	✅
3	`POST /v1/audio/transcriptions` — multipart upload, batch Whisper	✅ (WAV 16 kHz mono PCM first cut)
4	`POST /v1/audio/speech` — TTS	✅ via OpenAI-compatible remote passthrough
5	`GET /v1/realtime` — WebSocket streaming STT (OpenAI Realtime subset)	✅

The native WS-RPC ai.* surface, core/AIClient, Flux, and the launcher are unchanged. Existing clients see no behaviour change.

What changed

New module `rust-executor/src/api/openai_compat/`

File	Role
`mod.rs`	module wiring
`router.rs`	axum router mounted at `/v1` and `/api/v1/openai/v1`
`types.rs`	serde structs mirroring OpenAI's wire schema
`errors.rs`	`{ "error": { "message", "type", "param", "code" } }` envelope + `IntoResponse`
`model_selector.rs`	resolve `model` strings → AD4M `model_id` (id → name → `"default"`)
`models.rs`	`GET /v1/models`
`chat.rs`	`POST /v1/chat/completions` (oneshot + SSE) and `POST /v1/completions`
`embeddings.rs`	`POST /v1/embeddings` — raw `f32` arrays
`audio.rs`	`POST /v1/audio/transcriptions` (multipart STT) + `POST /v1/audio/speech` (TTS)
`tts_passthrough.rs`	OpenAI-compatible upstream passthrough for `/v1/audio/speech`
`realtime.rs`	`GET /v1/realtime` WebSocket (subset of OpenAI Realtime transcription)

AIService surface additions

Two new public methods on AIService so the OpenAI layer can drive the existing per-model LLM threads without bypassing them:

prompt_messages(model_id, messages: Vec<(role, content)>) -> PromptResult — builds an ephemeral in-memory AITask for the duration of the call (no DB writes), spawns it on the per-model thread, prompts, removes. Stateless by design — backs /v1/chat/completions and /v1/completions.
prompt_messages_stream(model_id, messages) -> (mpsc::Receiver<String>, oneshot::Receiver<PromptResult>) — same shape but the LLM thread forwards each Kalosm token through an mpsc channel. Backs stream: true.

Plus:

transcribe_buffer(model_id, samples, auth_token) -> String — open / feed / drain / close loop around the existing transcription engine. Backs the batch /v1/audio/transcriptions endpoint.
A new LLMTaskRequest::PromptStream variant in the per-model thread's command channel. Local Kalosm calls task.run(prompt).into_stream() and forwards tokens; remote upstreams collapse to a single terminal chunk (a native streaming upstream client is a follow-up).

Auth + billing

Reuses the existing JWT extractor (AuthContext) and capability framework verbatim. The OpenAI layer changes the shape of requests, not the policy:

Authorization: Bearer <jwt> (or ?token= for the WebSocket realtime endpoint, same as events_ws)
Capabilities: chat/completions/embeddings → AI_PROMPT, transcriptions + realtime → AI_TRANSCRIBE, speech → AI_PROMPT, models → AI_READ
bill_compute deducts per-user credits with the existing operation labels (ai_prompt, ai_embedding, ai_transcription, ai_tts).
InsufficientCredits → HTTP 429 with type: "insufficient_quota" matching OpenAI's quota-exhaustion shape so SDKs that retry on 429 do the right thing.

Error envelope

Every error response carries the OpenAI envelope:

{ "error": { "message": "...", "type": "invalid_request_error", "param": null, "code": "..." } }

With correct HTTP status codes: 400 / 401 / 403 / 404 / 429 / 500 / 501.

Audio decode (Phase 3 scope note)

The batch transcription endpoint accepts 16 kHz mono PCM WAV out of the box — the simplest container that requires no new dependencies and matches Whisper's reference format. Other containers (mp3, m4a, ogg, flac) return a clear 400 pointing at the Content-Type so clients can transcode locally; full container support via symphonia + rubato is a self-contained follow-up that wasn't worth pulling into the same diff.

TTS backend (Phase 4 scope note)

/v1/audio/speech forwards to a configured OpenAI-compatible upstream via reqwest (no new dependencies). The proposal calls out passthrough as a valid Phase-4 deliverable alongside local Kokoro/Piper backends; the local engines pull in ort + voice model assets that deserve their own review and didn't make sense to ship in the same PR as the rest of the surface.

When no upstream is configured the endpoint returns 501 with a clear message pointing at the configuration path.

Routing

The /v1 surface mounts at two paths:

/v1/... — the canonical OpenAI path
/api/v1/openai/v1/... — alias for proxies that hard-code the /api/v1 prefix from AD4M's native surface

Both share handlers and AppState.

Test plan

What's still deferred (not in this PR)

Full audio container support (symphonia + rubato for mp3/m4a/ogg/flac decode + resample). Out of scope; clients should transcode to 16 kHz mono WAV today.
Local TTS backend (Kokoro via ort, or Piper). ModelType::TextToSpeech enum variant, a dedicated TTS-typed model registration, and the local backend itself ship in a follow-up.
Tokenizer-exact usage counts. Local Kalosm prompts still report chars/4 estimates; remote upstreams already return exact counts and we forward those. Wiring exact counts on the local path means reading the Kalosm tokenizer's encoded length per request — straightforward but better tracked alongside the per-token billing follow-up.
Per-model concurrency. Each model is currently a single thread processing requests serially. A public OpenAI endpoint invites concurrent load; a small worker pool per model + HTTP 429 backpressure is the obvious next step.

Companion / follow-up artefacts

None for this PR — there's no client-side migration needed. External tools point their base_url at the executor and that's it. AD4M-native clients keep using core/AIClient against the WS-RPC surface unchanged.

cc @lucksus @data-bot-coasys for review.

Summary by CodeRabbit

Release Notes

New Features

Added OpenAI-compatible API endpoints at /v1 for chat completions, embeddings, audio transcription, and text-to-speech synthesis.
Streaming chat completions now available via standard OpenAI API protocol.
WebSocket real-time audio transcription endpoint launched.
Models listing endpoint provides access to registered models.

…audio, realtime) Adds an industry-standard, OpenAI-compatible HTTP/WS API mounted at \`/v1\` alongside the existing AD4M WS-RPC transport. Any tool that speaks OpenAI — the official \`openai\` SDKs, LangChain, LlamaIndex, the Vercel AI SDK, Cursor/Continue, Open WebUI, \`llm\` — can point \`base_url\` at the executor and just work. All five proposal phases land together. Surface ======= GET /v1/models list registered models POST /v1/chat/completions stateless chat, oneshot + SSE streaming POST /v1/completions legacy text-completion shim POST /v1/embeddings raw f32 arrays (no zlib+b64) POST /v1/audio/transcriptions multipart batch whisper POST /v1/audio/speech TTS, remote OpenAI-compatible passthrough GET /v1/realtime WS, OpenAI Realtime-style streaming STT Also mounted at \`/api/v1/openai/v1\` for proxies that hard-code the existing AD4M prefix. What changed ============ New module \`api/openai_compat/\` (11 files, ~1200 LOC): mod.rs module wiring router.rs axum router mounted at /v1 and /api/v1/openai/v1 types.rs serde structs mirroring OpenAI's wire schema errors.rs { error: { message, type, param, code } } envelope model_selector.rs resolve model strings → AD4M model_id (id → name → "default") models.rs GET /v1/models chat.rs POST /v1/chat/completions (oneshot + SSE) + completions embeddings.rs POST /v1/embeddings — raw f32 arrays audio.rs POST /v1/audio/transcriptions + speech tts_passthrough.rs OpenAI-compatible upstream passthrough for /v1/audio/speech realtime.rs GET /v1/realtime WebSocket (subset of OpenAI Realtime STT) AIService surface additions ============================ AIService::prompt_messages(model_id, messages: Vec<(role, content)>) Stateless ephemeral task: build → spawn → prompt → remove on the existing per-model LLM thread. No DB writes. AIService::prompt_messages_stream(model_id, messages) Same shape but the LLM thread forwards each Kalosm token through an mpsc::UnboundedReceiver<String>. Backs stream: true. Local Kalosm uses ChatResponseBuilder's Stream<Item=String> impl; remote upstreams degrade to one terminal chunk (native streaming upstream client is a follow-up). AIService::transcribe_buffer(model_id, samples, auth_token) One-shot transcription: open → feed → drain broadcast → close. Backs the batch /v1/audio/transcriptions endpoint. LLMTaskRequest::PromptStream new variant in the per-model thread's command channel; carries token_sender + done_sender. PromptResult now derives Debug + Clone (needed for oneshot transit). Auth + billing =============== Reuses the existing JWT extractor (AuthContext from auth.rs) and capability framework verbatim. Authorization: Bearer <jwt> (or ?token= for the realtime WS, same as events_ws). Capabilities map: chat/completions/embeddings → AI_PROMPT transcriptions + realtime → AI_TRANSCRIBE speech → AI_PROMPT models → AI_READ bill_compute deducts per-user credits with the existing operation labels (ai_prompt, ai_embedding, ai_transcription, ai_tts). InsufficientCredits → HTTP 429 with type: "insufficient_quota" matching OpenAI's quota-exhaustion shape so SDKs retrying on 429 do the right thing. Error envelope =============== Every error response carries: { "error": { "message", "type", "param", "code" } } With correct HTTP status (400 / 401 / 403 / 404 / 429 / 500 / 501). Audio decode (Phase 3 scope note) ================================== Batch transcription accepts 16 kHz mono PCM WAV out of the box (the simplest container that requires no new deps and matches Whisper's reference format). Other containers (mp3, m4a, ogg, flac) return a 400 pointing at the Content-Type so clients can transcode locally; full container support via \`symphonia\` + \`rubato\` is a self- contained follow-up that wasn't worth pulling into the same diff. TTS backend (Phase 4 scope note) ================================= /v1/audio/speech forwards to a configured OpenAI-compatible upstream via the existing reqwest dep (no new deps). The proposal allows passthrough as a valid Phase-4 deliverable alongside local Kokoro/Piper backends; the local engines pull in ort + voice model assets that deserve their own review and didn't make sense to ship in the same diff. No upstream configured → 501 with a clear message. Dependency surface =================== Only change: axum gains the \`multipart\` feature (already had \`ws\`). No new top-level deps. Native surface unchanged ========================= The WS-RPC ai.* methods, core/AIClient, Flux, and the launcher are untouched. Existing clients see no behaviour change. Refs: \`PROPOSAL_AI_OPENAI_COMPATIBLE_ENDPOINT.md\`

coderabbitai · 2026-06-08T13:36:35Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9b5d9ff9-50c8-43d0-8805-d1aafe8de362

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/openai-compat-endpoint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

rust-executor/src/api/openai_compat/realtime.rs (1)

171-186: ⚖️ Poor tradeoff

Synchronous drain may miss transcription deltas between appends.

The try_recv() loop (line 178) only drains events available at the moment of the append. If Whisper produces transcription output after the drain completes but before the next append arrives, those deltas won't be delivered until the subsequent append—or not at all if the client stops sending audio.

Consider spawning a background forwarder task that continuously drains the broadcast receiver, or documenting this as a known limitation of the simplified protocol.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust-executor/src/api/openai_compat/realtime.rs` around lines 171 - 186, The
synchronous drain using try_recv() in the while loop only retrieves
transcription deltas available at the moment of append, which can miss deltas
produced by Whisper after the drain completes but before the next append
arrives. To fix this, either spawn a background forwarder task that continuously
drains the broadcast receiver (rx) in a separate async task instead of draining
synchronously on each append, or add clear documentation explaining this
behavior as a known limitation where transcription deltas may be delayed or
missed if audio appends stop after Whisper produces output.

rust-executor/src/api/openai_compat/chat.rs (1)

247-262: ⚡ Quick win

done_rx result is ignored; consider using it for accurate billing.

The done_rx receiver carries PromptResult with actual prompt_tokens and completion_tokens, but let _ = done_rx.await discards them. The flat 1.0 billing charge (line 253-257) could be replaced with token-based billing using these counts for parity with the non-streaming path.

Suggested improvement

-            if let Some(email) = user_email(&auth_clone) {
-                let _ = bill_compute(
-                    &email,
-                    1.0,
-                    "ai_prompt",
-                    Some("v1/chat/completions[stream]"),
-                );
-            }
-            let _ = done_rx.await;
+            // Await the final result to get token counts for billing
+            if let Ok(Ok(result)) = done_rx.await {
+                if let Some(email) = user_email(&auth_clone) {
+                    let total = (result.prompt_tokens + result.completion_tokens) as f64;
+                    let _ = bill_compute(
+                        &email,
+                        total,  // or apply a per-token rate
+                        "ai_prompt",
+                        Some("v1/chat/completions[stream]"),
+                    );
+                }
+            }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust-executor/src/api/openai_compat/chat.rs` around lines 247 - 262, The code
currently discards the result from done_rx.await by using let _ =, but this
receiver carries PromptResult containing actual prompt_tokens and
completion_tokens that should be used for accurate billing. Instead of the flat
1.0 charge being applied to the bill_compute call, capture the PromptResult from
done_rx.await, extract the prompt_tokens and completion_tokens values from it,
and use those actual token counts when calling bill_compute to achieve
token-based billing parity with the non-streaming chat.completions path.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust-executor/src/ai_service/mod.rs`:
- Around line 1164-1171: The issue is in the "user" match arm where consecutive
user messages cause content loss. When a second user message arrives
(current_user is Some), the first user message is temporarily saved to
final_prompt, but then overwritten by setting current_user to the new message,
resulting in the first message being discarded. Instead of replacing prev_user
with current_user, concatenate them together or append the new content to the
existing content so both consecutive user messages are preserved. Modify the
logic where prev_user is taken and content is assigned to current_user to
combine the messages rather than discard the earlier one.

In `@rust-executor/src/api/openai_compat/audio.rs`:
- Around line 229-267: The WAV chunk parsing loop validates that the header can
be read with `pos + 8 <= bytes.len()` but fails to validate that the entire
chunk data fits within the buffer before advancing `pos` by the chunk size.
After reading the chunk size value with `u32::from_le_bytes`, add a bounds check
to ensure `pos + size <= bytes.len()` before processing any chunk data. This
check should occur immediately after parsing the size and before the match
statement, so that any chunk with an inflated size is rejected early rather than
allowing `pos` to advance beyond the buffer bounds, which could cause panics or
undefined behavior on subsequent iterations.

In `@rust-executor/src/api/openai_compat/chat.rs`:
- Around line 80-88: The billing check in the code block (lines 80-88) occurs
after the inference work is already completed via prompt_messages, meaning
compute resources are consumed before validating the user has sufficient
credits. Move the credit check before calling prompt_messages so that users are
rejected upfront if they lack sufficient credits, rather than having work
performed that cannot be billed. Alternatively, if post-inference billing is
required, remove the error return from the bill_compute failure handling and
instead log the billing failure while still returning the inference result to
the user.

In `@rust-executor/src/api/openai_compat/embeddings.rs`:
- Around line 55-57: The current pattern in the bill_compute call only matches
and handles BillingError::InsufficientCredits, which means other billing errors
are silently dropped and the function continues to return a successful 200
response, creating unbilled calls when billing is degraded. Change the error
handling to catch all billing failures from bill_compute, not just
InsufficientCredits. When any Err result is returned from bill_compute, return
an appropriate error response (such as 402 Payment Required) instead of allowing
the embedding response to succeed. This ensures that all billing failures,
regardless of the error type, properly reject the request rather than silently
succeeding.

In `@rust-executor/src/api/openai_compat/model_selector.rs`:
- Around line 24-36: The special case handling for the requested model being
"default" is occurring too early in the resolution logic, before checking if
there is an actual model with the id or name "default". Move the `if requested
== "default"` block that calls `Ad4mDb::with_global_instance` to resolve the
default model (currently checking the keyword and using `get_default_model` with
`type_label`) to after the id/name resolution passes so that exact id/name
matches are attempted first, and only if no model is found by id or name should
the keyword "default" be resolved to the configured default model.

In `@rust-executor/src/api/openai_compat/tts_passthrough.rs`:
- Around line 102-114: The first `.find()` call in the candidate selection logic
only checks if m.api.is_some() but doesn't verify that the api_type is OpenAi,
allowing a non-OpenAI model with the requested name to incorrectly match. Add
the same api_type verification to the first `.find()` condition that checks
m.api.as_ref().map(|a| matches!(a.api_type,
ModelApiType::OpenAi)).unwrap_or(false), so both the exact-match path and the
fallback path enforce that the model is OpenAI-compatible.
- Around line 48-58: The url variable constructed at lines 48-54 is not being
used in the client.post() call at line 58. Instead of using the url variable,
the code rebuilds the URL with format!("{url}/v1/audio/speech"), which is
redundant and likely incorrect. Remove this dead code by using the url variable
directly in the client.post() method call, rather than reconstructing it with an
additional format! macro.

---

Nitpick comments:
In `@rust-executor/src/api/openai_compat/chat.rs`:
- Around line 247-262: The code currently discards the result from done_rx.await
by using let _ =, but this receiver carries PromptResult containing actual
prompt_tokens and completion_tokens that should be used for accurate billing.
Instead of the flat 1.0 charge being applied to the bill_compute call, capture
the PromptResult from done_rx.await, extract the prompt_tokens and
completion_tokens values from it, and use those actual token counts when calling
bill_compute to achieve token-based billing parity with the non-streaming
chat.completions path.

In `@rust-executor/src/api/openai_compat/realtime.rs`:
- Around line 171-186: The synchronous drain using try_recv() in the while loop
only retrieves transcription deltas available at the moment of append, which can
miss deltas produced by Whisper after the drain completes but before the next
append arrives. To fix this, either spawn a background forwarder task that
continuously drains the broadcast receiver (rx) in a separate async task instead
of draining synchronously on each append, or add clear documentation explaining
this behavior as a known limitation where transcription deltas may be delayed or
missed if audio appends stop after Whisper produces output.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eff96a40-6e67-4337-9fbe-d1a34800b5b6

📥 Commits

Reviewing files that changed from the base of the PR and between f852873 and bfd01c5.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

rust-executor/Cargo.toml
rust-executor/src/ai_service/mod.rs
rust-executor/src/api/mod.rs
rust-executor/src/api/openai_compat/audio.rs
rust-executor/src/api/openai_compat/chat.rs
rust-executor/src/api/openai_compat/embeddings.rs
rust-executor/src/api/openai_compat/errors.rs
rust-executor/src/api/openai_compat/mod.rs
rust-executor/src/api/openai_compat/model_selector.rs
rust-executor/src/api/openai_compat/models.rs
rust-executor/src/api/openai_compat/realtime.rs
rust-executor/src/api/openai_compat/router.rs
rust-executor/src/api/openai_compat/tts_passthrough.rs
rust-executor/src/api/openai_compat/types.rs

coderabbitai · 2026-06-15T23:35:14Z

+                "user" => {
+                    if let Some(prev_user) = current_user.take() {
+                        // user→user with no assistant in between; treat
+                        // earlier user as a finalised final prompt slot
+                        // (will be overwritten if more turns follow).
+                        final_prompt = prev_user;
+                    }
+                    current_user = Some(content.clone());


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Consecutive user messages cause content loss.

When two user messages appear without an intervening assistant message (e.g., [user1, user2]), user1 is assigned to final_prompt but then immediately overwritten when user2 sets current_user. At the end of parsing, only user2 survives as final_prompt—user1 is silently discarded.

Consider concatenating consecutive user messages or converting earlier ones into the system prompt:

Proposed fix

"user" => { if let Some(prev_user) = current_user.take() { - // user→user with no assistant in between; treat - // earlier user as a finalised final prompt slot - // (will be overwritten if more turns follow). - final_prompt = prev_user; + // user→user with no assistant in between; + // concatenate into a single user turn. + current_user = Some(format!("{}\n{}", prev_user, content)); + } else { + current_user = Some(content.clone()); } - current_user = Some(content.clone()); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"user" => {

if let Some(prev_user) = current_user.take() {

// user→user with no assistant in between; treat

// earlier user as a finalised final prompt slot

// (will be overwritten if more turns follow).

final_prompt = prev_user;

}

current_user = Some(content.clone());

"user" => {

if let Some(prev_user) = current_user.take() {

// user→user with no assistant in between;

// concatenate into a single user turn.

current_user = Some(format!("{}\n{}", prev_user, content));

} else {

current_user = Some(content.clone());

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/ai_service/mod.rs` around lines 1164 - 1171, The issue is in the "user" match arm where consecutive user messages cause content loss. When a second user message arrives (current_user is Some), the first user message is temporarily saved to final_prompt, but then overwritten by setting current_user to the new message, resulting in the first message being discarded. Instead of replacing prev_user with current_user, concatenate them together or append the new content to the existing content so both consecutive user messages are preserved. Modify the logic where prev_user is taken and content is assigned to current_user to combine the messages rather than discard the earlier one.

coderabbitai · 2026-06-15T23:35:14Z

+    while pos + 8 <= bytes.len() {
+        let id = &bytes[pos..pos + 4];
+        let size = u32::from_le_bytes([
+            bytes[pos + 4],
+            bytes[pos + 5],
+            bytes[pos + 6],
+            bytes[pos + 7],
+        ]) as usize;
+        pos += 8;
+        match id {
+            b"fmt " => {
+                if pos + 16 > bytes.len() {
+                    return Err(OpenAIError::invalid_request("Truncated WAV fmt chunk"));
+                }
+                audio_format = u16::from_le_bytes([bytes[pos], bytes[pos + 1]]);
+                channels = u16::from_le_bytes([bytes[pos + 2], bytes[pos + 3]]);
+                sample_rate = u32::from_le_bytes([
+                    bytes[pos + 4],
+                    bytes[pos + 5],
+                    bytes[pos + 6],
+                    bytes[pos + 7],
+                ]);
+                bits_per_sample = u16::from_le_bytes([bytes[pos + 14], bytes[pos + 15]]);
+                pos += size;
+            }
+            b"data" => {
+                data_offset = Some(pos);
+                data_size = size;
+                pos += size;
+            }
+            _ => {
+                // Skip unknown chunks (LIST, INFO, etc).
+                pos += size;
+            }
+        }
+        if pos % 2 != 0 {
+            pos += 1; // RIFF chunks are word-aligned.
+        }
+    }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Chunk size validation could read past buffer bounds.

The loop checks pos + 8 <= bytes.len() (line 229) but doesn't validate that pos + size stays within bounds before advancing. A malformed WAV with an inflated chunk size could cause pos to overflow or wrap, potentially reading garbage or causing a panic on subsequent iterations.

Proposed fix

while pos + 8 <= bytes.len() { let id = &bytes[pos..pos + 4]; let size = u32::from_le_bytes([ bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7], ]) as usize; pos += 8; + // Clamp size to remaining bytes to handle malformed/truncated files + let size = size.min(bytes.len().saturating_sub(pos)); match id {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

while pos + 8 <= bytes.len() {

let id = &bytes[pos..pos + 4];

let size = u32::from_le_bytes([

bytes[pos + 4],

bytes[pos + 5],

bytes[pos + 6],

bytes[pos + 7],

]) as usize;

pos += 8;

match id {

b"fmt " => {

if pos + 16 > bytes.len() {

return Err(OpenAIError::invalid_request("Truncated WAV fmt chunk"));

}

audio_format = u16::from_le_bytes([bytes[pos], bytes[pos + 1]]);

channels = u16::from_le_bytes([bytes[pos + 2], bytes[pos + 3]]);

sample_rate = u32::from_le_bytes([

bytes[pos + 4],

bytes[pos + 5],

bytes[pos + 6],

bytes[pos + 7],

]);

bits_per_sample = u16::from_le_bytes([bytes[pos + 14], bytes[pos + 15]]);

pos += size;

}

b"data" => {

data_offset = Some(pos);

data_size = size;

pos += size;

}

_ => {

// Skip unknown chunks (LIST, INFO, etc).

pos += size;

}

}

if pos % 2 != 0 {

pos += 1; // RIFF chunks are word-aligned.

}

}

while pos + 8 <= bytes.len() {

let id = &bytes[pos..pos + 4];

let size = u32::from_le_bytes([

bytes[pos + 4],

bytes[pos + 5],

bytes[pos + 6],

bytes[pos + 7],

]) as usize;

pos += 8;

// Clamp size to remaining bytes to handle malformed/truncated files

let size = size.min(bytes.len().saturating_sub(pos));

match id {

b"fmt " => {

if pos + 16 > bytes.len() {

return Err(OpenAIError::invalid_request("Truncated WAV fmt chunk"));

}

audio_format = u16::from_le_bytes([bytes[pos], bytes[pos + 1]]);

channels = u16::from_le_bytes([bytes[pos + 2], bytes[pos + 3]]);

sample_rate = u32::from_le_bytes([

bytes[pos + 4],

bytes[pos + 5],

bytes[pos + 6],

bytes[pos + 7],

]);

bits_per_sample = u16::from_le_bytes([bytes[pos + 14], bytes[pos + 15]]);

pos += size;

}

b"data" => {

data_offset = Some(pos);

data_size = size;

pos += size;

}

_ => {

// Skip unknown chunks (LIST, INFO, etc).

pos += size;

}

}

if pos % 2 != 0 {

pos += 1; // RIFF chunks are word-aligned.

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/audio.rs` around lines 229 - 267, The WAV chunk parsing loop validates that the header can be read with `pos + 8 <= bytes.len()` but fails to validate that the entire chunk data fits within the buffer before advancing `pos` by the chunk size. After reading the chunk size value with `u32::from_le_bytes`, add a bounds check to ensure `pos + size <= bytes.len()` before processing any chunk data. This check should occur immediately after parsing the size and before the match statement, so that any chunk with an inflated size is rejected early rather than allowing `pos` to advance beyond the buffer bounds, which could cause panics or undefined behavior on subsequent iterations.

coderabbitai · 2026-06-15T23:35:14Z

+    if let Some(email) = user_email(&auth) {
+        if let Err(BillingError::InsufficientCredits) =
+            bill_compute(&email, 1.0, "ai_prompt", Some("v1/completions"))
+        {
+            return Err(OpenAIError::insufficient_quota(
+                "Insufficient compute credits",
+            ));
+        }
+    }


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Billing check after inference creates inconsistent behavior.

The billing check runs after prompt_messages completes (line 76-78), so the compute is consumed before checking credits. If the user has insufficient credits, they receive an error but the work was already done. This differs from typical pre-check patterns that reject requests upfront.

Consider either:

Checking credits before inference and failing fast

Removing the error return here (log the billing failure but return the result)

The current approach risks delivering no value despite consuming resources.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/chat.rs` around lines 80 - 88, The billing check in the code block (lines 80-88) occurs after the inference work is already completed via prompt_messages, meaning compute resources are consumed before validating the user has sufficient credits. Move the credit check before calling prompt_messages so that users are rejected upfront if they lack sufficient credits, rather than having work performed that cannot be billed. Alternatively, if post-inference billing is required, remove the error return from the bill_compute failure handling and instead log the billing failure while still returning the inference result to the user.

coderabbitai · 2026-06-15T23:35:14Z

+        if let Err(BillingError::InsufficientCredits) =
+            bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings"))
+        {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow non-quota billing failures
Lines 55-57 only handle InsufficientCredits; other billing errors are dropped and the embedding response still returns 200. That creates unbilled successful calls when billing is degraded.

Suggested patch

- if let Some(email) = crate::agent::capabilities::user_email_from_token(auth.auth_token.clone()) - { - if let Err(BillingError::InsufficientCredits) = - bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings")) - { - return Err(OpenAIError::insufficient_quota( - "Insufficient compute credits", - )); - } - } + if let Some(email) = crate::agent::capabilities::user_email_from_token(auth.auth_token.clone()) + { + match bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings")) { + Ok(_) => {} + Err(BillingError::InsufficientCredits) => { + return Err(OpenAIError::insufficient_quota( + "Insufficient compute credits", + )); + } + Err(e) => { + return Err(OpenAIError::internal(format!("Billing error: {e}"))); + } + } + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if let Err(BillingError::InsufficientCredits) =

bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings"))

{

if let Some(email) = crate::agent::capabilities::user_email_from_token(auth.auth_token.clone())

{

match bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings")) {

Ok(_) => {}

Err(BillingError::InsufficientCredits) => {

return Err(OpenAIError::insufficient_quota(

"Insufficient compute credits",

));

}

Err(e) => {

return Err(OpenAIError::internal(format!("Billing error: {e}")));

}

}

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/embeddings.rs` around lines 55 - 57, The current pattern in the bill_compute call only matches and handles BillingError::InsufficientCredits, which means other billing errors are silently dropped and the function continues to return a successful 200 response, creating unbilled calls when billing is degraded. Change the error handling to catch all billing failures from bill_compute, not just InsufficientCredits. When any Err result is returned from bill_compute, return an appropriate error response (such as 402 Payment Required) instead of allowing the embedding response to succeed. This ensures that all billing failures, regardless of the error type, properly reject the request rather than silently succeeding.

coderabbitai · 2026-06-15T23:35:14Z

+    // Resolve `default` via the DB's per-type pointer.  This mirrors
+    // `AIService::replace_model_variables` so OpenAI callers can use the
+    // same shorthand as native AD4M ones.
+    if requested == "default" {
+        return Ad4mDb::with_global_instance(|db| db.get_default_model(expected_type.clone()))
+            .map_err(|e| OpenAIError::internal(format!("Database error: {e}")))?
+            .ok_or_else(|| {
+                OpenAIError::not_found(format!(
+                    "No default {} model configured on this executor",
+                    type_label(&expected_type)
+                ))
+            });
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor the documented resolution precedence for "default"
Line 27 resolves "default" before the id/name passes, which makes any real model id/name "default" unreachable. Move the keyword handling after Lines 42-65 to preserve id/name-first semantics.

Suggested patch

- // Resolve `default` via the DB's per-type pointer. This mirrors - // `AIService::replace_model_variables` so OpenAI callers can use the - // same shorthand as native AD4M ones. - if requested == "default" { - return Ad4mDb::with_global_instance(|db| db.get_default_model(expected_type.clone())) - .map_err(|e| OpenAIError::internal(format!("Database error: {e}")))? - .ok_or_else(|| { - OpenAIError::not_found(format!( - "No default {} model configured on this executor", - type_label(&expected_type) - )) - }); - } - let models = Ad4mDb::with_global_instance(|db| db.get_models()) .map_err(|e| OpenAIError::internal(format!("Database error listing models: {e}")))?; @@ if let Some(m) = models.iter().find(|m| m.name == requested) { @@ return Ok(m.id.clone()); } + + // Pass 3: reserved keyword `default`. + if requested == "default" { + return Ad4mDb::with_global_instance(|db| db.get_default_model(expected_type.clone())) + .map_err(|e| OpenAIError::internal(format!("Database error: {e}")))? + .ok_or_else(|| { + OpenAIError::not_found(format!( + "No default {} model configured on this executor", + type_label(&expected_type) + )) + }); + }

Also applies to: 41-65

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/model_selector.rs` around lines 24 - 36, The special case handling for the requested model being "default" is occurring too early in the resolution logic, before checking if there is an actual model with the id or name "default". Move the `if requested == "default"` block that calls `Ad4mDb::with_global_instance` to resolve the default model (currently checking the keyword and using `get_default_model` with `type_label`) to after the id/name resolution passes so that exact id/name matches are attempted first, and only if no model is found by id or name should the keyword "default" be resolved to the configured default model.

coderabbitai · 2026-06-15T23:35:14Z

+    let url = format!(
+        "{}/audio/speech",
+        upstream
+            .base_url
+            .trim_end_matches('/')
+            .trim_end_matches("/v1"),
+    );
+
+    let client = reqwest::Client::new();
+    let resp = client
+        .post(format!("{url}/v1/audio/speech"))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dead code: url on line 48 is immediately discarded.

Line 48 builds a url string including /audio/speech, but line 58 rebuilds the URL from the trimmed base_url and appends /v1/audio/speech again. The intermediate format! at line 48-54 is dead code.

Proposed fix

- let url = format!( - "{}/audio/speech", - upstream - .base_url - .trim_end_matches('/') - .trim_end_matches("/v1"), - ); + let base = upstream + .base_url + .trim_end_matches('/') + .trim_end_matches("/v1"); let client = reqwest::Client::new(); let resp = client - .post(format!("{url}/v1/audio/speech")) + .post(format!("{base}/v1/audio/speech")) .bearer_auth(&upstream.api_key)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let url = format!(

"{}/audio/speech",

upstream

.base_url

.trim_end_matches('/')

.trim_end_matches("/v1"),

);

let client = reqwest::Client::new();

let resp = client

.post(format!("{url}/v1/audio/speech"))

let base = upstream

.base_url

.trim_end_matches('/')

.trim_end_matches("/v1");

let client = reqwest::Client::new();

let resp = client

.post(format!("{base}/v1/audio/speech"))

.bearer_auth(&upstream.api_key)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/tts_passthrough.rs` around lines 48 - 58, The url variable constructed at lines 48-54 is not being used in the client.post() call at line 58. Instead of using the url variable, the code rebuilds the URL with format!("{url}/v1/audio/speech"), which is redundant and likely incorrect. Remove this dead code by using the url variable directly in the client.post() method call, rather than reconstructing it with an additional format! macro.

coderabbitai · 2026-06-15T23:35:14Z

+    let candidate = models
+        .iter()
+        .find(|m| m.model_type == ModelType::Llm && m.api.is_some() && m.name == requested_model)
+        .or_else(|| {
+            models.iter().find(|m| {
+                m.model_type == ModelType::Llm
+                    && m.api
+                        .as_ref()
+                        .map(|a| matches!(a.api_type, ModelApiType::OpenAi))
+                        .unwrap_or(false)
+            })
+        })
+        .ok_or(TtsPassthroughError::NotConfigured)?;


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Exact-match path doesn't verify api_type is OpenAI.

The first .find() at line 104 checks m.api.is_some() but doesn't verify api_type == OpenAi. A model named "tts-1" with a non-OpenAI API (e.g., local) would match, and the passthrough would attempt to use credentials that may not work with an OpenAI-compatible endpoint.

Proposed fix

let candidate = models .iter() - .find(|m| m.model_type == ModelType::Llm && m.api.is_some() && m.name == requested_model) + .find(|m| { + m.model_type == ModelType::Llm + && m.api + .as_ref() + .map(|a| matches!(a.api_type, ModelApiType::OpenAi)) + .unwrap_or(false) + && m.name == requested_model + }) .or_else(|| {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let candidate = models

.iter()

.find(|m| m.model_type == ModelType::Llm && m.api.is_some() && m.name == requested_model)

.or_else(|| {

models.iter().find(|m| {

m.model_type == ModelType::Llm

&& m.api

.as_ref()

.map(|a| matches!(a.api_type, ModelApiType::OpenAi))

.unwrap_or(false)

})

})

.ok_or(TtsPassthroughError::NotConfigured)?;

let candidate = models

.iter()

.find(|m| {

m.model_type == ModelType::Llm

&& m.api

.as_ref()

.map(|a| matches!(a.api_type, ModelApiType::OpenAi))

.unwrap_or(false)

&& m.name == requested_model

})

.or_else(|| {

models.iter().find(|m| {

m.model_type == ModelType::Llm

&& m.api

.as_ref()

.map(|a| matches!(a.api_type, ModelApiType::OpenAi))

.unwrap_or(false)

})

})

.ok_or(TtsPassthroughError::NotConfigured)?;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rust-executor/src/api/openai_compat/tts_passthrough.rs` around lines 102 - 114, The first `.find()` call in the candidate selection logic only checks if m.api.is_some() but doesn't verify that the api_type is OpenAi, allowing a non-OpenAI model with the requested name to incorrectly match. Add the same api_type verification to the first `.find()` condition that checks m.api.as_ref().map(|a| matches!(a.api_type, ModelApiType::OpenAi)).unwrap_or(false), so both the exact-match path and the fallback path enforce that the model is OpenAI-compatible.

HexaField marked this pull request as ready for review June 15, 2026 23:27

Merge branch 'dev' into feat/openai-compat-endpoint

bfd01c5

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(api): OpenAI-compatible /v1 endpoint surface (chat, embeddings, audio, realtime)#854

feat(api): OpenAI-compatible /v1 endpoint surface (chat, embeddings, audio, realtime)#854
HexaField wants to merge 2 commits into
devfrom
feat/openai-compat-endpoint

HexaField commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

Review skipped

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

coderabbitai Bot Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-        if let Err(BillingError::InsufficientCredits) =
-            bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings"))
-        {
+    if let Some(email) = crate::agent::capabilities::user_email_from_token(auth.auth_token.clone())
+    {
+        match bill_compute(&email, 1.0, "ai_embedding", Some("v1/embeddings")) {
+            Ok(_) => {}
+            Err(BillingError::InsufficientCredits) => {
+                return Err(OpenAIError::insufficient_quota(
+                    "Insufficient compute credits",
+                ));
+            }
+            Err(e) => {
+                return Err(OpenAIError::internal(format!("Billing error: {e}")));
+            }
+        }
+    }

Uh oh!

Conversation

HexaField commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

New module rust-executor/src/api/openai_compat/

AIService surface additions

Auth + billing

Error envelope

Audio decode (Phase 3 scope note)

TTS backend (Phase 4 scope note)

Routing

Test plan

What's still deferred (not in this PR)

Companion / follow-up artefacts

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HexaField commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

New module `rust-executor/src/api/openai_compat/`

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading