fix: parity & bug-fix wave on the 0.6.6/0.6.7 features — EvalSession TS port, explicit-kwarg precedence, env-key path, telemetry chain guard#172

Merged: nicolotognoni merged 9 commits into main from fix/parity-bugfix-wave on Jun 11, 2026


Conversation

@nicolotognoni

Collaborator

Summary

  • Consolidation pass on the already-shipped 0.6.6/0.6.7 feature wave: close Python⇄TypeScript parity gaps, fix bugs, and harden integration — no new feature surface beyond parity.
  • Biggest item: EvalSession ported to TypeScript (the last big parity gap) — a real-pipeline eval harness driving the actual StreamHandler, with chainable assertions, scripted LLM provider, and an unstubbed getpatter eval CLI.
  • Plus five targeted fixes: explicit voice=/model= precedence over engine markers (Python), OPENAI_API_KEY env handling on the provider-string path (both SDKs — TS had a latent dead-call), a telemetry close/chain race guard (TS), barge_in_mode/barge_in_confirm_ms exposed on the Python agent() factory, and missing regression tests for the 0.6.7 telemetry delivery fix.

Implementation

  • EvalSession TS (src/evals/): EvalSession constructs a real pipeline-mode StreamHandler and injects turns through the live-call path; only the STT, TTS, audio-sender, and LLM boundaries are faked. Includes expect() assertions, LLMJudge, EvalCase.agent/llmProvider routing, a JSON/YAML suite loader, and the CLI eval run command with exit codes 0/1/2. Field names, defaults, and report rows are byte-compatible with Python. Items fixed in the post-port review: arguments field parity, a bounded await of the aborted dispatch before the timeout throw, crypto-based call ids, and a judge response guard.
  • Explicit-kwarg precedence (Python): voice/model moved to sentinel None defaults so explicit values — even equal to the documented default — win over engine= marker values (mirrors TS ?? resolution). Telemetry model resolution updated consistently.
  • Env-key path (both SDKs): provider="openai_realtime" with no configured key now backfills OPENAI_API_KEY from the environment into the local config so the call path actually uses it. Python previously rejected this configuration even though its error message promised env support; TS accepted it at validation but then dialed with an empty key.
  • Telemetry (TS): the post-flush chain now stops once close() has begun (parity with Python's not _closed check). Also added the missing regression tests in both SDKs for the 0.6.7 "event recorded during an in-flight flush" fix, using a gated real HTTP collector.
  • Python factory parity: barge_in_mode / barge_in_confirm_ms as agent() keywords with validation (were dataclass-only).
  • Hygiene: removed leftover temporary DIAG INFO instrumentation from the TS stream handler; scrubbed external-project references from source comments; dropped a stale DTMF TODO.
  • Docs: local_recording/localRecording documented across both SDK doc trees (8 pages); runnable examples added for pause-resume barge-in, preemptive generation, and smart-turn semantic EOU (both languages) + examples index.
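
For reviewers who want the shape of the telemetry chain guard without reading the diff, here is a minimal Python sketch of the pattern described above. It is not the SDK's code: TelemetryBuffer, its send callback, and the chained_flushes counter are hypothetical names invented for illustration. The sketch assumes a flush that re-runs itself when events arrived while a send was in flight, and a close() that takes over final delivery once it has started.

```python
import asyncio
from typing import Awaitable, Callable, List


class TelemetryBuffer:
    """Hypothetical event buffer; names are illustrative, not the SDK's API."""

    def __init__(self, send: Callable[[List[str]], Awaitable[None]]) -> None:
        self._send = send
        self._pending: List[str] = []
        self._closed = False
        self._idle = asyncio.Event()  # set while no send is in flight
        self._idle.set()
        self.chained_flushes = 0  # exposed only so the behavior can be observed

    def record(self, event: str) -> None:
        # Events recorded after close() has begun are dropped in this sketch.
        if not self._closed:
            self._pending.append(event)

    async def flush(self) -> None:
        if not self._idle.is_set():
            return  # a send is in flight; its post-flush chain picks up new events
        batch, self._pending = self._pending, []
        if not batch:
            return
        self._idle.clear()
        try:
            await self._send(batch)
        finally:
            self._idle.set()
        # Post-flush chain: deliver events recorded during the in-flight send,
        # but only while close() has not begun (the guard).
        if self._pending and not self._closed:
            self.chained_flushes += 1
            await self.flush()

    async def close(self) -> None:
        self._closed = True
        await self._idle.wait()  # let any in-flight send finish
        batch, self._pending = self._pending, []
        if batch:
            await self._send(batch)  # close() owns the final delivery
```

Without the `not self._closed` check, the chained flush and close() could both start a send after the same in-flight flush completes, which is the race the guard is meant to rule out.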

Breaking change?

No. Two edge-case behavior corrections, both documented under ## Unreleased → Fixed:

  • explicit model="gpt-realtime-mini" combined with an engine that carries a different model now runs the explicit model in Python (previously it ran the engine's model; TS already behaved this way);
  • provider="openai_realtime" with only an env key is now accepted in Python (previously rejected) and actually works in TS (previously a dead call).
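
The two corrections above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SDK's implementation: EngineMarker, LocalConfig, resolve_model, and backfill_openai_key are hypothetical names, the engine model strings are placeholders, and treating "gpt-realtime-mini" as the documented default is an assumption drawn from the example in this PR.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional

# Assumption: the documented default model; taken from the example in this PR.
DEFAULT_MODEL = "gpt-realtime-mini"


@dataclass(frozen=True)
class EngineMarker:
    """Hypothetical stand-in for an engine= marker that carries its own model."""
    model: Optional[str] = None


@dataclass(frozen=True)
class LocalConfig:
    """Hypothetical local config holding the provider string and its key."""
    provider: str
    openai_key: Optional[str] = None


def resolve_model(model: Optional[str] = None,
                  engine: Optional[EngineMarker] = None) -> str:
    # A None sentinel means "not passed", so an explicit value wins even
    # when it happens to equal the documented default.
    if model is not None:
        return model
    if engine is not None and engine.model is not None:
        return engine.model
    return DEFAULT_MODEL


def backfill_openai_key(config: LocalConfig,
                        env: Optional[Mapping[str, str]] = None) -> LocalConfig:
    # Copy OPENAI_API_KEY into the config so the call path really uses it.
    env = os.environ if env is None else env
    if config.provider != "openai_realtime" or config.openai_key:
        return config  # other providers and explicit keys are left untouched
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise ValueError(
            "openai_realtime needs an API key: pass one or set OPENAI_API_KEY"
        )
    return replace(config, openai_key=key)
```

The sentinel default is what separates "caller passed the default value" from "caller passed nothing"; with a concrete default in the signature the two cases are indistinguishable, which is how the engine's value could win before this change.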

Test plan

  • Python: pytest tests/ — 2648 passed, 8 skipped, 2 xfailed
  • TypeScript: npm test (2096 passed, 9 skipped) + npm run lint + npm run build
  • Cross-SDK parity suite: 10/10 — verified both with and without OPENAI_API_KEY in the env (the env-set run is what caught the env-key divergence)
  • New regression tests RED-verified against reverted fixes (telemetry chain, kwarg precedence)
  • E2E smoke: not required (no pipeline/handler behavior change beyond the eval harness, which is test infrastructure)

Docs updates

  • docs/python-sdk/{features,reference,local-mode,events}.mdx, docs/typescript-sdk/{features,reference,local-mode,events}.mdx (local recording)
  • docs/examples/: pause-resume-barge-in.{py,ts}, preemptive-generation.{py,ts}, smart-turn-detection.{py,ts}, README.md index

@mintlify

mintlify Bot commented Jun 11, 2026


Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| patter-06b046ce | 🟢 Ready | View Preview | Jun 11, 2026, 2:28 PM |
