feat(model-service): proxy supports stream + replay, byte passthrough, ForwardBackend/ReplayBackend#935
Merged
Merged
Conversation
…d/replay 替换 model-service `proxy` 模式手写的 httpx forward + retry_async,改为基于 litellm SDK 调用,同时新增 chat/completions 轨迹的录制与顺序回放能力,服务 SWE-agent / mini-swe-agent / OpenHands 等 deterministic agent 的无 LLM 成本调试。 主要改动: - proxy.py 改用 litellm.acompletion(num_retries / extra_headers / streaming) - 新增 TrajectoryRecorder(CustomLogger) 录制 StandardLoggingPayload 到 JSONL - 新增 TrajectoryReplayer(CustomLLM) + SequentialCursor 顺序回放单个 jsonl 文件 - ModelServiceConfig 新增 num_retries / traj_enabled / traj_file / replay_traj_path - CLI 新增 --num-retries / --traj-file(同时承担 replay 入口) - local 模式保留旧 record_traj 装饰器,不受影响 - 删除 examples 旧 YAML,改 README 主推纯 CLI 启动方式 - docs/dev/litellm_proxy_refactor.md 写明设计与 breaking change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s cost calc Three fixes to proxy.py uncovered while testing against DashScope (glm-5): - Extract Bearer token from incoming Authorization header and pass as litellm api_key kwarg; setting it via extra_headers does not work because litellm always regenerates Authorization from api_key. Authorization is now stripped from the forwarded header set. - Switch upstream prefix from openai/<model> to custom_openai/<model>. This is litellm's standard pattern for OpenAI-compatible third-party endpoints (DashScope, ModelScope, Groq, Mistral, ...) and avoids "model isn't mapped" on arbitrary upstream model names. - Pass input_cost_per_token=0 / output_cost_per_token=0 so litellm's cost calculator does not raise "model isn't mapped" on unknown models and pollute StandardLoggingPayload.response_cost_failure_debug_information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ursor The replay path no longer goes through litellm. We have a complete OpenAI-shape response on disk, so routing it through CustomLLM/CustomStreamWrapper just to translate formats was pure overhead — and the source of every replay-side bug (cursor-exhausted retried 6× and wrapped as APIConnectionError, GenericStreamingChunk type gymnastics, finish_reason hardcoded to "stop", reasoning_content dropped on streaming, tool_calls reconstruction left as TODO). Changes: - traj_replayer.py: delete TrajectoryReplayer(CustomLLM) and helpers; keep just SequentialCursor. Cursor exhaustion now raises a plain TrajectoryExhausted. - proxy.py: in replay mode, fetch from app.state.replay_cursor and emit either the raw response dict (non-stream) or one SSE chunk + [DONE] (stream). The stream path renames message → delta and preserves all fields verbatim (finish_reason, tool_calls, reasoning_content, ...). - main.py: rename _configure_litellm_for_proxy → _configure_proxy_integrations. Replay branch now just attaches a SequentialCursor to app.state; no litellm.custom_provider_map registration. - Tests: drop the CustomLLM-based replayer tests; keep cursor tests; add three end-to-end proxy replay tests covering non-stream / stream / cursor exhausted. 43 passed. Direct curl against DashScope glm-5: record + replay (both modes) verified end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…penai SDK as parser Replaces litellm.acompletion with raw httpx forwarding. The proxy no longer parses or rewrites the OpenAI protocol on the forward path — body bytes go upstream as-is, response bytes come back as-is. The openai SDK is kept solely as a parser library for the recording side: ChatCompletionChunk + the official ChatCompletionStreamState aggregate streaming chunks into a final ChatCompletion that the recorder writes to JSONL. This restores the proxy's original "transparent forward" intent and eliminates several litellm-specific pain points encountered during testing: - No "model isn't mapped yet" cost-calc exception (no calc happens at all). - No need for the input_cost_per_token=0 / custom_openai prefix workarounds. - Authorization header passes through verbatim (no api_key extraction kludge). - Provider-specific fields (reasoning_content, citations, ...) are preserved byte-for-byte going to the client AND auto-aggregated in the recorded traj (openai SDK uses extra="allow" pydantic mode). - Cursor exhaustion in replay returns 404 directly, never gets retried. Changes: - pyproject.toml: drop litellm>=1.50.0, add openai>=1.50.0 and httpx - proxy.py: rewrite forward path with httpx; record streams via dual-purpose byte forwarding + parallel SSE parsing into ChatCompletionStreamState - traj_recorder.py: drop CustomLogger inheritance; expose explicit recorder.record(request, response, status, ...) API called from proxy.py - main.py: attach recorder/cursor to app.state instead of registering with litellm.callbacks / litellm.custom_provider_map - test_proxy.py: rewritten to use httpx.MockTransport for upstream mocking; cover byte passthrough, provider-specific field preservation, error forwarding, recorder invocation, replay paths - test_traj_recorder.py: rewritten for the explicit-call API 36 passed. End-to-end verified against DashScope glm-5: streaming record, non-stream replay, streaming replay, cursor exhausted -> 404 all work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ments) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orwardBackend Strategy pattern eliminates the replay/forward branch inside chat_completions. The backend is selected once at startup (_configure_proxy_integrations) and attached to app.state.backend; the endpoint just parses the request and dispatches. Each backend keeps the stream/non-stream branch local to itself. A union type alias _CompletionBackend documents the closed set of backends and a typed _get_backend(request) accessor wraps the app.state read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ename replay_traj_path → replay_traj_file The Backend classes are imported by main.py and tests, so the leading underscore mis-signalled them as module-internal. Rename the config field to align with traj_file naming. Also drop the defensive getattr() for args.traj_file — argparse always sets it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion errors
Retry was lost when litellm was dropped (it provided num_retries internally).
Restored using a unified _send_with_retry helper that:
- Always opens upstream with stream=True so the same code path serves both
stream and non-stream callers (non-stream just await resp.aread()).
- Retries on httpx.TimeoutException, httpx.ConnectError, and HTTP statuses
in config.retryable_status_codes (default [429, 500]).
- Defaults: 6 attempts, exponential backoff 2s→32s with jitter — matches
the original perform_llm_request behavior.
- For stream: retry happens before any byte is yielded; mid-stream drops
are not retried (would corrupt downstream).
Module-level retry constants are read at call time so tests can monkeypatch
them. Added 4 tests covering: success after retry, exhausted retries returning
last response, non-whitelisted status not retried, and stream retry path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…upstream
A tiny FastAPI mock app runs in a background thread via uvicorn; the proxy
calls it over real TCP through its own httpx.AsyncClient — production code
path, no transport injection or patching. Three scenarios:
- non-stream forward: vendor field round-trips, recorder writes JSONL
- stream forward: SSE chunks reach the client, recorder gets aggregated
final completion
- record-then-replay: replay phase uses a bogus base_url to prove the
upstream isn't called
Tests use FastAPI's TestClient (sync) so the test bodies read top-down with
no async noise; the async wiring lives inside MockUpstreamServer.
Drive-by cleanups in proxy.py: localize the openai SDK imports inside the
streaming aggregator (only needed there), and drop the now-unused
_RETRY_EXCEPTIONS constant.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…alls deltas
A recorded non-stream message.tool_calls carries no 'index' field, but the
OpenAI streaming spec requires it on chunk deltas. Without it, downstream
clients using the openai SDK reject the replay-stream chunk with a pydantic
ValidationError ('Field required: index'). completion_to_chunk_dict now
injects a positional index when missing (existing 'index' is preserved).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…RecordReplay
Rename test_proxy_e2e.py → test_proxy_record_replay.py to make the file
purpose explicit (the suite revolves around the record→replay capability).
Refactor the test surface:
- MockUpstream class encapsulates the FastAPI app, server lifecycle, the
canonical reply, and an assert_message() helper. Test data and the
handler stay in sync because they share the same constants.
- TestProxyRecordReplay class groups the three tests with shorter names:
test_forward_non_stream
test_forward_stream
test_replay (parametrized over record × replay stream/non-stream)
- _call_chat_completions helper unifies stream/non-stream call sites.
Expand coverage to 2 parallel tool_calls (get_weather + get_time) — exercises
the openai SDK aggregator's multi-index tool_call assembly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sse_utils→sse, merge traj_*→traj The integrations/ directory only ever held two files (traj_recorder, traj_replayer) and the litellm CustomLogger angle that justified the name is long gone. Both modules share one JSONL schema, so collapsing them into a single traj.py keeps the schema and its read/write halves visible together. sse_utils.py → sse.py: the codec is the module's whole purpose, the _utils suffix added nothing. Drop traj_recorder.now() — a one-line wrapper around time.time() with no callers. Also remove a stray _get_or_create_metrics_monitor patch in test_forward_invokes_recorder_on_success: OTLP create-time failure only logs a warning, so the patch was protecting against nothing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_file→replay_file The old names were ambiguous: "traj_file" alone gave no hint of write vs read, and the CLI flag --traj-file was actually wired to config.replay_traj_file — same word pointing in opposite directions depending on context. New names mirror the backend pair (ForwardBackend = recording, ReplayBackend = replay) so the role is obvious at the field. CLI is split into two independent flags accordingly. Recorder constructor still takes traj_file= since it names the JSONL file type, not its role; only the config-field / CLI surface changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…el_validator Setting both at once was silently resolved in favor of replay (the backend factory checks replay_file first), masking what is really a configuration error. A Pydantic model_validator now rejects the combination at construction time; validate_assignment=True extends the check to CLI-style field-by-field overrides applied after a yaml load. Three tests added: construction-time mutex, assignment-time mutex, and the existing yaml-load test split into one-side-only variants since the original deliberately set both fields. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a staticmethod Module-level function with a single call site inside ReplayBackend; the SSE chunk-emit shape is purely a replay-mode implementation detail. Moving it inside the class also makes the pairing with the JSON branch in serve() visible at a glance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…solve_base_url Single call site inside ForwardBackend.serve, and the function reads only self._config — drop the redundant config parameter and rename to _resolve_base_url to make the multi-source fallback (proxy_base_url → proxy_rules[model] → proxy_rules['default']) explicit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Backend as _stream_and_record Same single-call-site argument as the previous moves: 3 of the 7 kwargs were just relaying self._config / self._recorder. As an instance method the parameter list drops to 4 and the streaming path mirrors the structure of ReplayBackend._sse_iter. _send_with_retry stays at module scope — it's a pure helper bound to the httpx.AsyncClient lifecycle, not to any backend state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…-service/ Old examples/model_service/README.md was stale: still mentioned litellm, StandardLoggingPayload, --num-retries, and the conflated --traj-file flag. Rewritten to reflect current shape: ForwardBackend / ReplayBackend pair, recording_file / replay_file mutex, retry-on-status-code with the documented attempt budget, openai SDK only used as the stream-state aggregator behind the forwarding path. Also calls out that the rock model-service start subcommand hasn't been wired up with the new flags yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…model-service start The two flags previously existed only on the python -m rock.sdk.model.server.main entry; rock model-service start (which subprocess-spawns that same module) had no way to thread them through, forcing users to bypass the CLI for any record/replay scenario. Wire them through ModelServiceCommand argparse → ModelService.start → start_sandbox_service → subprocess argv. Add three tests around the argv construction (default omits both flags, recording_file forwarded, replay_file forwarded). Doc updated: drop the python -m caveat and switch all example commands to rock model-service start. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…replay-file wiring The existing tests/unit/sdk/model/test_service.py covers the subprocess argv construction (catches cmd-string typos like --recording_file vs --recording-file) but mocks nothing above the SDK layer, so a missing kwarg in ModelServiceCommand.arun would slip through. Add tests/unit/cli/command/test_model_service.py mirroring the test_job.py pattern: drive the real argparse sub-parser end-to-end and mock ModelService.start to assert the kwargs it receives. Covers the new flags both in isolation and in their default (omitted) state. Two layers, two bug surfaces — together they cover the full path from CLI argv to subprocess argv. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
….py to fix CI collision
tests/integration/envhub/test_service.py shares the same basename, and pytest's
default importmode (prepend) collapses both into a single 'test_service' module
in sys.modules, so collection fails as soon as both files are picked up:
import file mismatch:
imported module 'test_service' has this __file__ attribute: .../envhub/test_service.py
which is not the same as the test file we want to collect: .../sdk/model/test_service.py
Renaming the new file is the smallest fix and keeps importmode=prepend behavior
unchanged for the rest of the suite. The new name also describes the file
better (it tests how start_sandbox_service builds the subprocess argv).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The file is the only one in the model-service unit suite that boots a real uvicorn upstream in a background thread and drives the proxy through real HTTP — append _e2e to make that scope obvious in the file name. It stays under tests/unit/ because the project's integration/ tier is reserved for tests requiring out-of-process services (Docker, Ray, admin), which this one doesn't. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
zhongwen666
pushed a commit
to zhongwen666/ROCK
that referenced
this pull request
May 17, 2026
…, ForwardBackend/ReplayBackend (alibaba#935) * refactor(model-service): rebuild proxy on litellm SDK with traj record/replay 替换 model-service `proxy` 模式手写的 httpx forward + retry_async,改为基于 litellm SDK 调用,同时新增 chat/completions 轨迹的录制与顺序回放能力,服务 SWE-agent / mini-swe-agent / OpenHands 等 deterministic agent 的无 LLM 成本调试。 主要改动: - proxy.py 改用 litellm.acompletion(num_retries / extra_headers / streaming) - 新增 TrajectoryRecorder(CustomLogger) 录制 StandardLoggingPayload 到 JSONL - 新增 TrajectoryReplayer(CustomLLM) + SequentialCursor 顺序回放单个 jsonl 文件 - ModelServiceConfig 新增 num_retries / traj_enabled / traj_file / replay_traj_path - CLI 新增 --num-retries / --traj-file(同时承担 replay 入口) - local 模式保留旧 record_traj 装饰器,不受影响 - 删除 examples 旧 YAML,改 README 主推纯 CLI 启动方式 - docs/dev/litellm_proxy_refactor.md 写明设计与 breaking change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(model-service): pass api_key + use custom_openai prefix + suppress cost calc Three fixes to proxy.py uncovered while testing against DashScope (glm-5): - Extract Bearer token from incoming Authorization header and pass as litellm api_key kwarg; setting it via extra_headers does not work because litellm always regenerates Authorization from api_key. Authorization is now stripped from the forwarded header set. - Switch upstream prefix from openai/<model> to custom_openai/<model>. This is litellm's standard pattern for OpenAI-compatible third-party endpoints (DashScope, ModelScope, Groq, Mistral, ...) and avoids "model isn't mapped" on arbitrary upstream model names. - Pass input_cost_per_token=0 / output_cost_per_token=0 so litellm's cost calculator does not raise "model isn't mapped" on unknown models and pollute StandardLoggingPayload.response_cost_failure_debug_information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(model-service): drop CustomLLM, serve replay directly from cursor The replay path no longer goes through litellm. We have a complete OpenAI-shape response on disk, so routing it through CustomLLM/CustomStreamWrapper just to translate formats was pure overhead — and the source of every replay-side bug (cursor-exhausted retried 6× and wrapped as APIConnectionError, GenericStreamingChunk type gymnastics, finish_reason hardcoded to "stop", reasoning_content dropped on streaming, tool_calls reconstruction left as TODO). Changes: - traj_replayer.py: delete TrajectoryReplayer(CustomLLM) and helpers; keep just SequentialCursor. Cursor exhaustion now raises a plain TrajectoryExhausted. - proxy.py: in replay mode, fetch from app.state.replay_cursor and emit either the raw response dict (non-stream) or one SSE chunk + [DONE] (stream). The stream path renames message → delta and preserves all fields verbatim (finish_reason, tool_calls, reasoning_content, ...). - main.py: rename _configure_litellm_for_proxy → _configure_proxy_integrations. Replay branch now just attaches a SequentialCursor to app.state; no litellm.custom_provider_map registration. - Tests: drop the CustomLLM-based replayer tests; keep cursor tests; add three end-to-end proxy replay tests covering non-stream / stream / cursor exhausted. 43 passed. Direct curl against DashScope glm-5: record + replay (both modes) verified end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(model-service): drop litellm, use httpx byte-passthrough + openai SDK as parser Replaces litellm.acompletion with raw httpx forwarding. The proxy no longer parses or rewrites the OpenAI protocol on the forward path — body bytes go upstream as-is, response bytes come back as-is. The openai SDK is kept solely as a parser library for the recording side: ChatCompletionChunk + the official ChatCompletionStreamState aggregate streaming chunks into a final ChatCompletion that the recorder writes to JSONL. This restores the proxy's original "transparent forward" intent and eliminates several litellm-specific pain points encountered during testing: - No "model isn't mapped yet" cost-calc exception (no calc happens at all). - No need for the input_cost_per_token=0 / custom_openai prefix workarounds. - Authorization header passes through verbatim (no api_key extraction kludge). - Provider-specific fields (reasoning_content, citations, ...) are preserved byte-for-byte going to the client AND auto-aggregated in the recorded traj (openai SDK uses extra="allow" pydantic mode). - Cursor exhaustion in replay returns 404 directly, never gets retried. Changes: - pyproject.toml: drop litellm>=1.50.0, add openai>=1.50.0 and httpx - proxy.py: rewrite forward path with httpx; record streams via dual-purpose byte forwarding + parallel SSE parsing into ChatCompletionStreamState - traj_recorder.py: drop CustomLogger inheritance; expose explicit recorder.record(request, response, status, ...) API called from proxy.py - main.py: attach recorder/cursor to app.state instead of registering with litellm.callbacks / litellm.custom_provider_map - test_proxy.py: rewritten to use httpx.MockTransport for upstream mocking; cover byte passthrough, provider-specific field preservation, error forwarding, recorder invocation, replay paths - test_traj_recorder.py: rewritten for the explicit-call API 36 passed. End-to-end verified against DashScope glm-5: streaming record, non-stream replay, streaming replay, cursor exhausted -> 404 all work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: use openai sdk * chore(model-service): remove litellm remnants (num_retries, stale comments) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(model-service): split proxy handler into _ReplayBackend / _ForwardBackend Strategy pattern eliminates the replay/forward branch inside chat_completions. The backend is selected once at startup (_configure_proxy_integrations) and attached to app.state.backend; the endpoint just parses the request and dispatches. Each backend keeps the stream/non-stream branch local to itself. A union type alias _CompletionBackend documents the closed set of backends and a typed _get_backend(request) accessor wraps the app.state read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(model-service): drop _ prefix from public Backend classes, rename replay_traj_path → replay_traj_file The Backend classes are imported by main.py and tests, so the leading underscore mis-signalled them as module-internal. Rename the config field to align with traj_file naming. Also drop the defensive getattr() for args.traj_file — argparse always sets it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(model-service): restore retry on retryable_status_codes + connection errors Retry was lost when litellm was dropped (it provided num_retries internally). Restored using a unified _send_with_retry helper that: - Always opens upstream with stream=True so the same code path serves both stream and non-stream callers (non-stream just await resp.aread()). - Retries on httpx.TimeoutException, httpx.ConnectError, and HTTP statuses in config.retryable_status_codes (default [429, 500]). - Defaults: 6 attempts, exponential backoff 2s→32s with jitter — matches the original perform_llm_request behavior. - For stream: retry happens before any byte is yielded; mid-stream drops are not retried (would corrupt downstream). Module-level retry constants are read at call time so tests can monkeypatch them. Added 4 tests covering: success after retry, exhausted retries returning last response, non-whitelisted status not retried, and stream retry path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(model-service): add e2e tests against an in-thread uvicorn mock upstream A tiny FastAPI mock app runs in a background thread via uvicorn; the proxy calls it over real TCP through its own httpx.AsyncClient — production code path, no transport injection or patching. Three scenarios: - non-stream forward: vendor field round-trips, recorder writes JSONL - stream forward: SSE chunks reach the client, recorder gets aggregated final completion - record-then-replay: replay phase uses a bogus base_url to prove the upstream isn't called Tests use FastAPI's TestClient (sync) so the test bodies read top-down with no async noise; the async wiring lives inside MockUpstreamServer. Drive-by cleanups in proxy.py: localize the openai SDK imports inside the streaming aggregator (only needed there), and drop the now-unused _RETRY_EXCEPTIONS constant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(model-service): inject positional index into replay-stream tool_calls deltas A recorded non-stream message.tool_calls carries no 'index' field, but the OpenAI streaming spec requires it on chunk deltas. Without it, downstream clients using the openai SDK reject the replay-stream chunk with a pydantic ValidationError ('Field required: index'). completion_to_chunk_dict now injects a positional index when missing (existing 'index' is preserved). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(model-service): refactor proxy e2e into MockUpstream + TestProxyRecordReplay Rename test_proxy_e2e.py → test_proxy_record_replay.py to make the file purpose explicit (the suite revolves around the record→replay capability). Refactor the test surface: - MockUpstream class encapsulates the FastAPI app, server lifecycle, the canonical reply, and an assert_message() helper. Test data and the handler stay in sync because they share the same constants. - TestProxyRecordReplay class groups the three tests with shorter names: test_forward_non_stream test_forward_stream test_replay (parametrized over record × replay stream/non-stream) - _call_chat_completions helper unifies stream/non-stream call sites. Expand coverage to 2 parallel tool_calls (get_weather + get_time) — exercises the openai SDK aggregator's multi-index tool_call assembly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: remove useless comment in pyproject.toml * chore: remove uesless dev docs * refactor(model-service): flatten layout — drop integrations/, rename sse_utils→sse, merge traj_*→traj The integrations/ directory only ever held two files (traj_recorder, traj_replayer) and the litellm CustomLogger angle that justified the name is long gone. Both modules share one JSONL schema, so collapsing them into a single traj.py keeps the schema and its read/write halves visible together. sse_utils.py → sse.py: the codec is the module's whole purpose, the _utils suffix added nothing. Drop traj_recorder.now() — a one-line wrapper around time.time() with no callers. Also remove a stray _get_or_create_metrics_monitor patch in test_forward_invokes_recorder_on_success: OTLP create-time failure only logs a warning, so the patch was protecting against nothing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(model-service): rename traj_file→recording_file, replay_traj_file→replay_file The old names were ambiguous: "traj_file" alone gave no hint of write vs read, and the CLI flag --traj-file was actually wired to config.replay_traj_file — same word pointing in opposite directions depending on context. New names mirror the backend pair (ForwardBackend = recording, ReplayBackend = replay) so the role is obvious at the field. CLI is split into two independent flags accordingly. Recorder constructor still takes traj_file= since it names the JSONL file type, not its role; only the config-field / CLI surface changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-service): enforce recording_file/replay_file mutex via model_validator Setting both at once was silently resolved in favor of replay (the backend factory checks replay_file first), masking what is really a configuration error. A Pydantic model_validator now rejects the combination at construction time; validate_assignment=True extends the check to CLI-style field-by-field overrides applied after a yaml load. Three tests added: construction-time mutex, assignment-time mutex, and the existing yaml-load test split into one-side-only variants since the original deliberately set both fields. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(model-service): move _replay_sse_iter into ReplayBackend as a staticmethod Module-level function with a single call site inside ReplayBackend; the SSE chunk-emit shape is purely a replay-mode implementation detail. Moving it inside the class also makes the pairing with the JSON branch in serve() visible at a glance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(model-service): move get_base_url into ForwardBackend as _resolve_base_url Single call site inside ForwardBackend.serve, and the function reads only self._config — drop the redundant config parameter and rename to _resolve_base_url to make the multi-source fallback (proxy_base_url → proxy_rules[model] → proxy_rules['default']) explicit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(model-service): move _forward_stream_and_record into ForwardBackend as _stream_and_record Same single-call-site argument as the previous moves: 3 of the 7 kwargs were just relaying self._config / self._recorder. As an instance method the parameter list drops to 4 and the streaming path mirrors the structure of ReplayBackend._sse_iter. _send_with_retry stays at module scope — it's a pure helper bound to the httpx.AsyncClient lifecycle, not to any backend state. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(model-service): move + rewrite proxy README under docs/dev/model-service/ Old examples/model_service/README.md was stale: still mentioned litellm, StandardLoggingPayload, --num-retries, and the conflated --traj-file flag. Rewritten to reflect current shape: ForwardBackend / ReplayBackend pair, recording_file / replay_file mutex, retry-on-status-code with the documented attempt budget, openai SDK only used as the stream-state aggregator behind the forwarding path. Also calls out that the rock model-service start subcommand hasn't been wired up with the new flags yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-service): expose --recording-file / --replay-file on rock model-service start The two flags previously existed only on the python -m rock.sdk.model.server.main entry; rock model-service start (which subprocess-spawns that same module) had no way to thread them through, forcing users to bypass the CLI for any record/replay scenario. Wire them through ModelServiceCommand argparse → ModelService.start → start_sandbox_service → subprocess argv. Add three tests around the argv construction (default omits both flags, recording_file forwarded, replay_file forwarded). Doc updated: drop the python -m caveat and switch all example commands to rock model-service start. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(model-service): add CLI-layer coverage for --recording-file / --replay-file wiring The existing tests/unit/sdk/model/test_service.py covers the subprocess argv construction (catches cmd-string typos like --recording_file vs --recording-file) but mocks nothing above the SDK layer, so a missing kwarg in ModelServiceCommand.arun would slip through. Add tests/unit/cli/command/test_model_service.py mirroring the test_job.py pattern: drive the real argparse sub-parser end-to-end and mock ModelService.start to assert the kwargs it receives. Covers the new flags both in isolation and in their default (omitted) state. Two layers, two bug surfaces — together they cover the full path from CLI argv to subprocess argv. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(model-service): rename test_service.py → test_service_subprocess.py to fix CI collision tests/integration/envhub/test_service.py shares the same basename, and pytest's default importmode (prepend) collapses both into a single 'test_service' module in sys.modules, so collection fails as soon as both files are picked up: import file mismatch: imported module 'test_service' has this __file__ attribute: .../envhub/test_service.py which is not the same as the test file we want to collect: .../sdk/model/test_service.py Renaming the new file is the smallest fix and keeps importmode=prepend behavior unchanged for the rest of the suite. The new name also describes the file better (it tests how start_sandbox_service builds the subprocess argv). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(model-service): rename test_proxy_record_replay.py → ..._e2e.py The file is the only one in the model-service unit suite that boots a real uvicorn upstream in a background thread and drives the proxy through real HTTP — append _e2e to make that scope obvious in the file name. It stays under tests/unit/ because the project's integration/ tier is reserved for tests requiring out-of-process services (Docker, Ray, admin), which this one doesn't. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Refit
rock model-serviceproxy to cover three previously missing capabilities while keeping the existing forward path drop-in compatible.resp.aiter_bytes()yielded byte-for-byte to the client; recorder runs on an independent path that feeds chunks into openai SDK'sChatCompletionStreamStateto aggregate the finalChatCompletionfor the trajectory.ReplayBackendserves recorded responses directly from aSequentialCursorwithout any upstream call. Stream replay re-emits the recorded message as one SSE chunk +[DONE]. Cursor exhaustion → 404.reasoning_content,provider_specific_fields, …) survive intact in both forward and replay paths.Strategy split:
ForwardBackendandReplayBackendare attached toapp.state.backendat startup; the endpoint just dispatches.What changed for users
rock model-service start:--recording-file PATH/--replay-file PATH(mutually exclusive). The same fields land in YAML asrecording_file/replay_file.traj_file/replay_traj_fileconfig fields were renamed to make write-vs-read direction explicit; the CLI flag--traj-file(which previously aliased toreplay_traj_file) is gone.model_validatorrejects setting both fields at once (also fires on field-level reassignment viavalidate_assignment=True), preventing the silent "record into the file you're replaying from" bug.retryable_status_codes(default[429, 500]), 6 attempts, 2s × 2 backoff with jitter; for streaming, retries only fire before the first byte reaches the client.File layout
rock/sdk/model/server/integrations/{traj_recorder,traj_replayer}.py→ merged intorock/sdk/model/server/traj.py; the integrations/ subdir is removed.rock/sdk/model/server/sse_utils.py→ renamed tosse.py(the codec is the module's whole purpose).Tests
Three layers, all in
tests/unit/:completion_to_chunk_dictround-trips, the OpenAI-spec-requiredindexinjection on replay-stream tool_calls (regression for a realChatCompletionChunk.model_validatefailure).MockUpstreamowns the canonical reply and assertion helper;TestProxyRecordReplaycovers forward stream/non-stream + the 2×2 record/replay matrix with paralleltool_calls.ModelServiceCommandhandler passes toModelService.start; test_service_subprocess.py mockssubprocess.Popenand asserts the argvstart_sandbox_servicebuilds. Together they catch both handler-omits-a-kwarg and cmd-string typos.108 unit tests pass in <3s.
Test plan
uv run pytest tests/unit/sdk/model/ tests/unit/cli/command/ -q→ 108 passeduv run ruff check .→ clean (pre-commit hooks gate every commit)rock model-service start --type proxy --proxy-base-url https://dashscope.aliyuncs.com/compatible-mode/v1 --recording-file /tmp/r.jsonl→ recordedqwen-plusnon-stream +glm-5non-stream +glm-5stream; restarted with--replay-fileand re-issued the three calls — replayed responses match the recorded ones byte for content /reasoning_content(368 chars on the non-stream call, 1787 chars aggregated on the stream call), stream replay emits one chunk +[DONE]glm-5— every SSE chunk delivered tocurlcarriesdelta.reasoning_content(anddelta.contentonce it starts), upstream 61.7 KB stream forwarded byte-for-byte to the client; recorder writes one aggregated trajectory line at the endfixes #934
🤖 Generated with Claude Code