diff --git a/docs/dev/model-service/proxy.md b/docs/dev/model-service/proxy.md
new file mode 100644
index 0000000000..b0f77da0f3
--- /dev/null
+++ b/docs/dev/model-service/proxy.md
@@ -0,0 +1,154 @@
+# model-service `proxy` mode
+
+The proxy mode of `rock model-service` exposes an OpenAI-compatible forwarding layer at `/v1/chat/completions`.
+It runs in one of two mutually exclusive modes:
+
+| Mode      | Trigger                             | Upstream call | Disk writes           |
+|-----------|-------------------------------------|---------------|-----------------------|
+| Recording | default                             | real call     | appends to JSONL traj |
+| Replay    | `--replay-file` / `replay_file` set | none          | none                  |
+
+The design goal is to let agent frameworks such as SWE-agent / mini-swe-agent / OpenHands switch from recording → replay transparently:
+the agent stays unchanged and only the base URL changes.
+
+All commands below are started with `rock model-service start`; that subcommand ultimately launches
+`rock.sdk.model.server.main` via `subprocess`, and both accept the same flags. For direct debugging you can also run
+`python -m rock.sdk.model.server.main` to skip PID file management.
+
+---
+
+## 1. Recording (default)
+
+Forwards to a single upstream; every call appends one JSONL line to `recording_file` (default `LOG_DIR/LLMTraj.jsonl`,
+where `LOG_DIR = $ROCK_MODEL_SERVICE_DATA_DIR`):
+
+```bash
+export OPENAI_API_KEY="sk-..."
+export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj
+
+rock model-service start \
+  --type proxy \
+  --proxy-base-url https://api.openai.com/v1 \
+  --port 8080
+```
+
+Call it:
+
+```bash
+curl -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hi"}]}'
+
+cat /tmp/rock-traj/LLMTraj.jsonl | jq '.model, .response.choices[0].message.content'
+```
+
+Streaming is supported as well: upstream bytes are passed to the client unchanged, while the recorder aggregates the final `ChatCompletion` in the background and writes it to disk
+(it uses the openai SDK's `ChatCompletionStreamState`, so fields that are spliced across chunks, such as `tool_calls.function.arguments`,
+are restored to their complete form):
+
+```bash
+curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
+```
+
+To write to a different path explicitly:
+
+```bash
+rock model-service start \
+  --type proxy \
+  --proxy-base-url https://api.openai.com/v1 \
+  --recording-file /tmp/my-session.jsonl \
+  --port 8080
+```
+
+---
+
+## 2. Replay
+
+Point `--replay-file` at a recorded jsonl file: the proxy no longer contacts the real LLM and returns responses in recorded order.
+The agent only needs to switch its base URL to `http://127.0.0.1:8081/v1` to replay:
+
+```bash
+rock model-service start \
+  --type proxy \
+  --replay-file /tmp/rock-traj/LLMTraj.jsonl \
+  --port 8081
+```
+
+Behavior details:
+
+- The cursor advances monotonically and each request consumes one record; once the records are exhausted, the proxy returns **404**.
+- Streaming requests re-emit the recorded `ChatCompletion` as a single SSE chunk followed by `[DONE]`.
+  The `index` field of `tool_calls` is injected automatically (the OpenAI streaming protocol requires `index` on chunk deltas,
+  but the recorded `message.tool_calls` does not carry one).
+- The `model` in the request is compared with the recorded `model`; a mismatch only logs a warning and does not block the request.
+
+`recording_file` and `replay_file` are **mutually exclusive**: configuring both (whether via CLI or YAML) is rejected at startup
+by the Pydantic `model_validator` with a `ValidationError`, which avoids subtle bugs such as overwriting the source file halfway through a recording.
+
+---
+
+## 3. Retries and timeouts
+
+- By default, connection errors / timeouts and `retryable_status_codes` (default `[429, 500]`) trigger a retry,
+  up to 6 attempts, with exponential backoff starting at 2s, ×2, plus jitter. If the last attempt still returns a retryable status, the upstream response is forwarded to the client as-is
+  (**not** wrapped into 502/504, so the agent sees the real status code). A non-streaming request whose last attempt fails with a timeout or connection error gets 504 or 502 respectively.
+- For **streaming** requests, retries happen only **before** the first byte reaches the client: once the byte stream starts being forwarded,
+  a dropped connection is not retried (bytes already sent cannot be taken back).
+
+```bash
+rock model-service start \
+  --type proxy \
+  --proxy-base-url https://api.openai.com/v1 \
+  --retryable-status-codes 429,500,502,503 \
+  --request-timeout 60 \
+  --port 8080
+```
+
+---
+
+## 4. Multi-model routing (YAML)
+
+Routing by model name to different upstreams requires YAML (the CLI only exposes a single `--proxy-base-url`). Create `routes.yaml`:
+
+```yaml
+proxy_rules:
+  gpt-3.5-turbo: "https://api.openai.com/v1"
+  gpt-4o: "https://api.openai.com/v1"
+  default: "https://api-inference.modelscope.cn/v1"
+
+retryable_status_codes: [429, 500, 502]
+request_timeout: 60
+recording_file: /tmp/rock-traj/multi.jsonl
+```
+
+Start:
+
+```bash
+rock model-service start \
+  --type proxy \
+  --config-file routes.yaml \
+  --port 8080
+```
+
+CLI flags (`--proxy-base-url` / `--port` / `--retryable-status-codes` / ...) override YAML fields of the same name.
+Route resolution order: `proxy_base_url` → `proxy_rules[model]` → `proxy_rules["default"]`; if none of them is set, the proxy returns 400.
+
+---
+
+## 5. Implementation notes (for reference only)
+
+- The `chat_completions` endpoint dispatches the request to `app.state.backend`, which is either a `ForwardBackend`
+  or a `ReplayBackend`; `_configure_proxy_integrations` injects one of the two at startup depending on whether `replay_file`
+  is set.
+- `ForwardBackend` passes bytes through httpx: non-stream uses `await resp.aread()`, stream yields
+  `resp.aiter_bytes()` straight to the client, with **no** SDK deserialization/re-serialization in between, so arbitrary vendor fields
+  returned by the upstream, such as `reasoning_content` / `provider_specific_fields`, are never dropped.
+  On a separate path, the recorder feeds the byte stream to the openai SDK's stream-state aggregator, solely for writing to disk.
+- `ReplayBackend` is fully local and holds no httpx client.
+
+For a deeper code walkthrough, see the module docstring at the top of
+[rock/sdk/model/server/api/proxy.py](../../../rock/sdk/model/server/api/proxy.py).
diff --git a/pyproject.toml b/pyproject.toml index badb7d1a4b..d7d7a591b0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -86,6 +86,8 @@ model-service = [ "psutil", "swebench", "alibabacloud_cr20181201==2.0.5", + "openai>=1.50.0", + "httpx", ] diff --git a/rock/cli/command/model_service.py b/rock/cli/command/model_service.py index 87e6ca60e6..03cc59582d 100644 --- a/rock/cli/command/model_service.py +++ b/rock/cli/command/model_service.py @@ -82,6 +82,8 @@ async def arun(self, args: argparse.Namespace): proxy_base_url=args.proxy_base_url, retryable_status_codes=args.retryable_status_codes, request_timeout=args.request_timeout, + recording_file=args.recording_file, + replay_file=args.replay_file, ) logger.info(f"model service started, pid: {pid}") with open(self.DEFAULT_MODEL_SERVICE_PID_FILE, "w") as f: @@ -178,6 +180,18 @@ async def add_parser_to(subparsers: argparse._SubParsersAction): default=None, help="Request timeout in seconds. Overrides config file.", ) + start_parser.add_argument( + "--recording-file", + type=str, + default=None, + help="Proxy mode only: where to write the trajectory JSONL. Defaults to LOG_DIR/LLMTraj.jsonl.", + ) + start_parser.add_argument( + "--replay-file", + type=str, + default=None, + help="Proxy mode only: replay from a recorded .jsonl traj file. 
Mutually exclusive with --recording-file.", + ) watch_agent_parser = model_service_subparsers.add_parser( "watch-agent", diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index fb2b7bec3c..73f74e3f62 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -1,13 +1,45 @@ +"""OpenAI-compatible chat/completions proxy with trajectory record/replay. + +Two backends share the ``/v1/chat/completions`` route: + +1. **ForwardBackend** (default) — body bytes are POSTed verbatim to the + configured upstream via plain ``httpx``. The upstream response is forwarded + byte-for-byte back to the client (raw JSON for non-stream, raw SSE bytes + for stream). On the side we run a parser (``ChatCompletionChunk`` + + ``ChatCompletionStreamState`` from the openai SDK) to aggregate streaming + chunks into a final ChatCompletion that the recorder writes to JSONL. The + forward path itself does NOT depend on OpenAI types — anything the upstream + returns (provider-specific ``reasoning_content``, ``citations``, ...) is + passed through untouched. + +2. **ReplayBackend** (``replay_file`` set) — the request is served + directly from the next record in the ``SequentialCursor`` without any + upstream call. Streaming emits the recorded response as one SSE chunk + + ``[DONE]``. 
+""" + +from __future__ import annotations + +import asyncio +import json +import random +import time +from collections.abc import AsyncIterator from typing import Any import httpx from fastapi import APIRouter, HTTPException, Request -from fastapi.responses import JSONResponse +from fastapi.responses import JSONResponse, Response, StreamingResponse from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.utils import record_traj -from rock.utils import retry_async +from rock.sdk.model.server.sse import ( + SSE_DONE, + completion_to_chunk_dict, + encode_sse_event, + parse_sse_data_chunks, +) +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryExhausted, TrajectoryRecorder logger = init_logger(__name__) @@ -15,111 +47,352 @@ proxy_router = APIRouter() -# Global HTTP client with a persistent connection pool -http_client = httpx.AsyncClient() +# Headers we never forward upstream: +# - host / content-length: rebuilt by httpx for the upstream request +# - transfer-encoding / connection: RFC 7230 hop-by-hop, scoped to one connection +_HEADERS_NOT_TO_FORWARD = frozenset({"host", "content-length", "transfer-encoding", "connection"}) +# Retry knobs for upstream POST. Read at call-time so tests can monkeypatch them. +# Default: up to 6 attempts with exponential backoff (2s → 4s → 8s → 16s → 32s, jittered). +_RETRY_MAX_ATTEMPTS = 6 +_RETRY_DELAY_SECONDS = 2.0 +_RETRY_BACKOFF = 2.0 -@retry_async( - max_attempts=6, - delay_seconds=2.0, - backoff=2.0, # Exponential backoff (2s, 4s, 8s, 16s, 32s). - jitter=True, # Adds randomness to prevent "thundering herd" effect on the backend. - exceptions=(httpx.TimeoutException, httpx.ConnectError, httpx.HTTPStatusError), -) -async def perform_llm_request(url: str, body: dict, headers: dict, config: ModelServiceConfig): - """ - Forwards the request and triggers retry ONLY if the status code - is in the explicit retryable whitelist. 
+ +async def _send_with_retry( + client: httpx.AsyncClient, + url: str, + *, + body_bytes: bytes, + headers: dict[str, str], + retryable_codes: list[int], +) -> httpx.Response: + """POST with retry on connection errors and whitelisted statuses, returning + an open streaming response. + + Always uses ``stream=True`` so the same path serves both stream and non-stream + callers — non-stream just calls ``await resp.aread()`` to materialize the body. + Assumes a failed upstream returns its error body before any byte is yielded + to downstream (so retry can still discard it cleanly). + + Caller MUST ``await resp.aclose()`` after consuming. """ - response = await http_client.post(url, json=body, headers=headers, timeout=config.request_timeout) - status_code = response.status_code + last_exc: Exception | None = None + delay = _RETRY_DELAY_SECONDS + for attempt in range(1, _RETRY_MAX_ATTEMPTS + 1): + try: + resp = await client.send( + client.build_request("POST", url, content=body_bytes, headers=headers), + stream=True, + ) + except (httpx.TimeoutException, httpx.ConnectError) as exc: + last_exc = exc + if attempt >= _RETRY_MAX_ATTEMPTS: + raise + logger.warning(f"connect failed (attempt {attempt}/{_RETRY_MAX_ATTEMPTS}): {exc}") + await asyncio.sleep(random.uniform(0, delay * 2)) + delay *= _RETRY_BACKOFF + continue - # Check against the explicit whitelist - if status_code in config.retryable_status_codes: - logger.warning(f"Retryable error detected: {status_code}. 
Triggering retry for {url}...") - response.raise_for_status() + if resp.status_code in retryable_codes and attempt < _RETRY_MAX_ATTEMPTS: + await resp.aclose() + logger.warning(f"upstream status {resp.status_code}, retry {attempt}/{_RETRY_MAX_ATTEMPTS}") + await asyncio.sleep(random.uniform(0, delay * 2)) + delay *= _RETRY_BACKOFF + continue - return response + return resp + raise last_exc # pragma: no cover # unreachable -def get_base_url(model_name: str, config: ModelServiceConfig) -> str: - """ - Selects the target backend URL based on model name matching. - If proxy_base_url is configured, it takes precedence over proxy_rules. - """ - # If direct proxy base URL is configured, return it directly (bypass model name matching) - if config.proxy_base_url: - return config.proxy_base_url.rstrip("/") - - if not model_name: - raise HTTPException(status_code=400, detail="Model name is required for routing.") - - rules = config.proxy_rules - base_url = rules.get(model_name) or rules.get("default") - if not base_url: - raise HTTPException( - status_code=400, detail=f"Model '{model_name}' is not configured and no 'default' rule found." +def _filter_headers(headers) -> dict[str, str]: + """Drop headers that are scoped to the client↔proxy hop or rebuilt by httpx. 
+ ``Authorization`` is forwarded verbatim — proxy stays stateless about which + API key the client uses.""" + out = {} + for key, value in headers.items(): + if key.lower() in _HEADERS_NOT_TO_FORWARD: + continue + out[key] = value + return out + + +class ReplayBackend: + """Serves requests from a pre-recorded trajectory; no upstream calls made.""" + + def __init__(self, cursor: SequentialCursor) -> None: + self._cursor = cursor + + async def serve(self, *, model_name: str, is_stream: bool, **_: Any) -> Response: + try: + record = await self._cursor.next(expected_model=model_name) + except TrajectoryExhausted as exc: + raise HTTPException(status_code=404, detail=str(exc)) + + response_dict = record.get("response") + if not isinstance(response_dict, dict): + raise HTTPException( + status_code=500, + detail=f"replay record at step {self._cursor.position - 1} has no usable response dict", + ) + logger.info(f"[replay] step {self._cursor.position}/{self._cursor.total} served for model={model_name!r}") + + if is_stream: + return StreamingResponse( + self._sse_iter(response_dict, model=model_name), + media_type="text/event-stream", + ) + return JSONResponse(status_code=200, content=response_dict) + + @staticmethod + async def _sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: + """Emit a recorded response as one SSE chunk + ``[DONE]``.""" + yield encode_sse_event(completion_to_chunk_dict(response, model=model)) + yield SSE_DONE + + +class ForwardBackend: + """Forwards requests byte-for-byte to the upstream and optionally records the trajectory.""" + + def __init__(self, config: ModelServiceConfig, recorder: TrajectoryRecorder | None = None) -> None: + self._config = config + self._recorder = recorder + + def _resolve_base_url(self, model_name: str) -> str: + """Pick the upstream base URL by model name. + + ``proxy_base_url`` takes precedence; falls back to ``proxy_rules[model]`` and + then ``proxy_rules["default"]``. 
Trailing slashes are stripped so the caller + can append ``/chat/completions`` directly. + """ + if self._config.proxy_base_url: + return self._config.proxy_base_url.rstrip("/") + + if not model_name: + raise HTTPException(status_code=400, detail="Model name is required for routing.") + + rules = self._config.proxy_rules + base_url = rules.get(model_name) or rules.get("default") + if not base_url: + raise HTTPException( + status_code=400, + detail=f"Model '{model_name}' is not configured and no 'default' rule found.", + ) + + return base_url.rstrip("/") + + async def serve( + self, + *, + model_name: str, + is_stream: bool, + body_bytes: bytes, + fwd_headers: dict[str, str], + request_dict: dict[str, Any], + **_: Any, + ) -> Response: + upstream_url = f"{self._resolve_base_url(model_name)}/chat/completions" + logger.info(f"Routing model {model_name!r} to {upstream_url}") + + if is_stream: + return StreamingResponse( + self._stream_and_record( + upstream_url=upstream_url, + body_bytes=body_bytes, + fwd_headers=fwd_headers, + request_dict=request_dict, + ), + media_type="text/event-stream", + ) + + # Non-stream: same retry path as stream (open with stream=True), then aread() the body. 
+ start = time.time() + async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: + try: + resp = await _send_with_retry( + client, + upstream_url, + body_bytes=body_bytes, + headers=fwd_headers, + retryable_codes=self._config.retryable_status_codes, + ) + except httpx.TimeoutException as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"timeout: {exc}", + ) + raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") + except httpx.RequestError as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") + + try: + response_bytes = await resp.aread() + status_code = resp.status_code + content_type = resp.headers.get("content-type", "application/json") + finally: + await resp.aclose() + + response_text = response_bytes.decode("utf-8", errors="replace") + response_dict: dict | None = None + try: + parsed = json.loads(response_text) if response_text else None + if isinstance(parsed, dict): + response_dict = parsed + except json.JSONDecodeError: + pass + + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=response_dict, + status="success" if status_code < 400 else "failure", + start_time=start, + end_time=time.time(), + error=None if status_code < 400 else f"upstream_status={status_code}", + ) + + # Forward bytes verbatim — preserves any provider-specific fields untouched. 
+ return Response(content=response_bytes, status_code=status_code, media_type=content_type) + + async def _stream_and_record( + self, + *, + upstream_url: str, + body_bytes: bytes, + fwd_headers: dict[str, str], + request_dict: dict[str, Any], + ) -> AsyncIterator[bytes]: + """SSE bytes are forwarded verbatim; chunks are parsed in parallel and + aggregated into the final ChatCompletion that the recorder writes to JSONL. + + Retry on connection errors and whitelisted statuses happens BEFORE any byte + is yielded; mid-stream connection drops are not retried (would corrupt the + client transmission).""" + # openai SDK is used purely as a stream-aggregation parser — keep the import + # local so module load doesn't pull it in for callers that never stream. + from openai.lib.streaming.chat import ChatCompletionStreamState + from openai.types.chat import ChatCompletionChunk + + state = ChatCompletionStreamState() + start = time.time() + parse_buffer = b"" + upstream_status = 0 + + async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: + try: + resp = await _send_with_retry( + client, + upstream_url, + body_bytes=body_bytes, + headers=fwd_headers, + retryable_codes=self._config.retryable_status_codes, + ) + except (httpx.TimeoutException, httpx.ConnectError) as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + + try: + upstream_status = resp.status_code + async for chunk in resp.aiter_bytes(): + yield chunk + chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) + for chunk_dict in chunk_dicts: + try: + state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) + except Exception as exc: # parser error: forward continues, traj will be partial + logger.debug(f"[record] chunk parse failed (forward continues): {exc}") + except httpx.RequestError as 
exc: + # Connection died mid-stream — bytes already sent reach the client; + # record what we got and return. + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + finally: + await resp.aclose() + + if self._recorder is None: + return + + status = "success" if upstream_status < 400 else "failure" + final_dict: dict | None = None + if status == "success": + try: + final_dict = state.get_final_completion().model_dump() + except Exception as exc: + logger.warning(f"[record] stream aggregation failed: {exc}") + + await self._recorder.record( + request=request_dict, + response=final_dict, + status=status, + start_time=start, + end_time=time.time(), + error=None if status == "success" else f"upstream_status={upstream_status}", ) - return base_url.rstrip("/") + +CompletionBackend = ReplayBackend | ForwardBackend + + +def _get_backend(request: Request) -> CompletionBackend: + """Typed accessor for the backend attached at startup by ``_configure_proxy_integrations``.""" + return request.app.state.backend @proxy_router.post("/v1/chat/completions") -@record_traj -async def chat_completions(body: dict[str, Any], request: Request): - """ - OpenAI-compatible chat completions proxy endpoint. - Handles routing, header transparent forwarding, and automatic retries. - """ - config = request.app.state.model_service_config - - # Step 1: Model Routing - model_name = body.get("model", "") - base_url = get_base_url(model_name, config) - target_url = f"{base_url}/chat/completions" - logger.info(f"Routing model '{model_name}' to URL: {target_url}") - - # Step 2: Header Cleaning - # Preserve 'Authorization' for authentication while removing hop-by-hop transport headers. 
- forwarded_headers = {} - for key, value in request.headers.items(): - if key.lower() in ["host", "content-length", "content-type", "transfer-encoding"]: - continue - forwarded_headers[key] = value - - # Step 3: Strategy Enforcement - # Force non-streaming mode for the MVP phase to ensure stability. - if body.get("stream") is True: - raise HTTPException( - status_code=400, - detail="Streaming requests (stream=True) are not supported in the current version. Please set stream=False or omit the stream parameter.", - ) - body["stream"] = False +async def chat_completions(request: Request): + """OpenAI-compatible chat completions proxy endpoint. + Reads the body as raw bytes (no parsing on the forward path) and delegates + to the backend attached at startup (replay or forward). + """ + body_bytes = await request.body() try: - # Step 4: Execute Request with Retry Logic - response = await perform_llm_request(target_url, body, forwarded_headers, config) - return JSONResponse(status_code=response.status_code, content=response.json()) - - except httpx.HTTPStatusError as e: - # Forward the raw backend error message to the client. - # This allows the Agent-side logic to detect keywords like 'context length exceeded' - # or 'content violation' and raise appropriate exceptions. - error_text = e.response.text if e.response else "No error details" - status_code = e.response.status_code if e.response else 502 - logger.error(f"Final failure after retries. 
Status: {status_code}, Response: {error_text}") - return JSONResponse( - status_code=status_code, - content={ - "error": { - "message": f"LLM backend error: {error_text}", - "type": "proxy_retry_failed", - "code": status_code, - } - }, - ) - except Exception as e: - logger.error(f"Unexpected proxy error: {str(e)}") - # Raise standard 500 for non-HTTP related errors or system errors - raise HTTPException(status_code=500, detail=str(e)) + request_dict = json.loads(body_bytes) if body_bytes else {} + except json.JSONDecodeError: + raise HTTPException(status_code=400, detail="Request body is not valid JSON.") + if not isinstance(request_dict, dict): + raise HTTPException(status_code=400, detail="Request body must be a JSON object.") + + model_name = request_dict.get("model", "") + is_stream = bool(request_dict.get("stream")) + fwd_headers = _filter_headers(request.headers) + + backend = _get_backend(request) + return await backend.serve( + model_name=model_name, + is_stream=is_stream, + body_bytes=body_bytes, + fwd_headers=fwd_headers, + request_dict=request_dict, + ) diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 2c96992b5c..e734c29878 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -1,7 +1,7 @@ from pathlib import Path import yaml -from pydantic import BaseModel, Field +from pydantic import BaseModel, ConfigDict, Field, model_validator from rock import env_vars @@ -27,6 +27,10 @@ class ModelServiceConfig(BaseModel): """Configuration for the LLM Model Service.""" + # validate_assignment=True so the recording/replay mutex below also fires when + # CLI overrides are applied field-by-field (not only at construction time). 
+ model_config = ConfigDict(validate_assignment=True) + host: str = "0.0.0.0" """Server host address.""" @@ -51,6 +55,23 @@ class ModelServiceConfig(BaseModel): request_timeout: int = Field(default=120) """Request timeout in seconds.""" + recording_file: str | None = Field(default=None) + """Recording mode output: where ForwardBackend writes the trajectory JSONL. + None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" + + replay_file: str | None = Field(default=None) + """Replay mode input: a .jsonl trajectory file. When set, ReplayBackend serves + requests from recorded responses instead of calling a real upstream.""" + + @model_validator(mode="after") + def _recording_replay_mutually_exclusive(self): + if self.recording_file and self.replay_file: + raise ValueError( + "recording_file and replay_file are mutually exclusive — " + "set one (recording mode) or the other (replay mode), not both." + ) + return self + @classmethod def from_file(cls, config_path: str | None = None): """ diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 7f8dabebe2..89e87ac0f9 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -11,7 +11,7 @@ from rock.logger import init_logger from rock.sdk.model.server.api.local import init_local_api, local_router from rock.sdk.model.server.api.proxy import proxy_router -from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.config import TRAJ_FILE, ModelServiceConfig # Configure logging logger = init_logger(__name__) @@ -52,6 +52,33 @@ async def global_exception_handler(request, exc): return app +def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: + """Attach the appropriate backend to ``app.state.backend``. + + - Replay mode (``replay_file`` set): ``ReplayBackend`` wrapping a + ``SequentialCursor``; no recorder — replaying back into the source file + would corrupt it. 
+ - Forward mode (default): ``ForwardBackend`` with a ``TrajectoryRecorder`` + writing to ``recording_file`` (or ``TRAJ_FILE`` if unset). + """ + from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend + + if config.replay_file: + from rock.sdk.model.server.traj import SequentialCursor + + cursor = SequentialCursor.load(config.replay_file) + app.state.backend = ReplayBackend(cursor) + logger.info(f"replay backend attached, replay_file={config.replay_file}") + return + + from rock.sdk.model.server.traj import TrajectoryRecorder + + recording_path = config.recording_file or TRAJ_FILE + recorder = TrajectoryRecorder(traj_file=recording_path) + app.state.backend = ForwardBackend(config, recorder=recorder) + logger.info(f"forward backend attached, recording_file={recording_path}") + + def main( model_servie_type: str, config: ModelServiceConfig, @@ -63,6 +90,7 @@ def main( asyncio.run(init_local_api()) app.include_router(local_router, prefix="", tags=["local"]) else: + _configure_proxy_integrations(app, config) app.include_router(proxy_router, prefix="", tags=["proxy"]) logger.info(f"Starting LLM Service on {config.host}:{config.port}, type: {model_servie_type}") @@ -100,6 +128,12 @@ def create_config_from_args(args) -> ModelServiceConfig: if args.request_timeout: config.request_timeout = args.request_timeout logger.info(f"request_timeout set from command line: {args.request_timeout}s") + if args.recording_file: + config.recording_file = args.recording_file + logger.info(f"recording_file set from command line: {args.recording_file}") + if args.replay_file: + config.replay_file = args.replay_file + logger.info(f"replay mode enabled via --replay-file: {args.replay_file}") return config @@ -142,6 +176,18 @@ def create_config_from_args(args) -> ModelServiceConfig: parser.add_argument( "--request-timeout", type=int, default=None, help="Request timeout in seconds. Overrides config file." 
) + parser.add_argument( + "--recording-file", + type=str, + default=None, + help="Forward mode: where to write the trajectory JSONL. Defaults to TRAJ_FILE.", + ) + parser.add_argument( + "--replay-file", + type=str, + default=None, + help="Replay mode: path to a recorded .jsonl traj file. Disables real LLM upstreams.", + ) args = parser.parse_args() config = create_config_from_args(args) diff --git a/rock/sdk/model/server/sse.py b/rock/sdk/model/server/sse.py new file mode 100644 index 0000000000..f1cca034e6 --- /dev/null +++ b/rock/sdk/model/server/sse.py @@ -0,0 +1,99 @@ +"""SSE codec utilities for the chat/completions proxy. + +Three pure helpers, no openai/litellm dependencies: + +- :func:`parse_sse_data_chunks` — incremental SSE byte stream → list of decoded + ``data:`` payload dicts (used by the forward path to feed chunks into the + stream-state aggregator while bytes pass through verbatim to the client). +- :func:`completion_to_chunk_dict` — convert a non-streaming ``chat.completion`` + response into a single ``chat.completion.chunk`` dict, by renaming + ``message`` → ``delta``. Used by the replay path's streaming output. +- :func:`encode_sse_event` — encode a payload dict as ``data: <json>\\n\\n`` + bytes (one SSE event). +""" + +from __future__ import annotations + +import json +import time +import uuid +from typing import Final + +# Terminal SSE event sent at the end of a chat/completions stream. +SSE_DONE: Final[bytes] = b"data: [DONE]\n\n" + + +def parse_sse_data_chunks(buffer: bytes) -> tuple[list[dict], bytes]: + """Extract complete SSE events from a (possibly partial) byte buffer. + + Returns ``(chunks, leftover)``: the parsed ``data:`` JSON payload dicts and + the bytes that did not yet form a complete event (``\\n\\n``-terminated). + + - ``data: [DONE]`` is skipped (terminal marker, has no JSON payload). + - Lines that don't start with ``data:`` (``event:`` / ``id:`` / blank) + are ignored. 
+ - Malformed JSON in a ``data:`` line is silently skipped — caller logs at + its own discretion (typically ``debug``). + + Caller pattern:: + + chunks, buffer = parse_sse_data_chunks(buffer + new_bytes) + for chunk_dict in chunks: + ... feed to aggregator, etc ... + """ + chunks: list[dict] = [] + while b"\n\n" in buffer: + event, buffer = buffer.split(b"\n\n", 1) + for raw_line in event.split(b"\n"): + line = raw_line.decode("utf-8", errors="replace").strip() + if not line.startswith("data:"): + continue + payload = line[len("data:") :].strip() + if not payload or payload == "[DONE]": + continue + try: + chunks.append(json.loads(payload)) + except json.JSONDecodeError: + continue + return chunks, buffer + + +def completion_to_chunk_dict(response: dict, *, model: str) -> dict: + """Convert a recorded ``chat.completion`` dict into a single + ``chat.completion.chunk`` dict, suitable for re-streaming. + + Only ``message`` → ``delta`` is renamed; every other field (including + provider-specific extras like ``reasoning_content`` inside the message) + flows through unchanged. ``id`` / ``created`` are synthesized when missing. + + ``tool_calls`` items get a positional ``index`` injected if missing — the + OpenAI streaming spec requires it on chunk deltas (a recorded non-stream + ``message.tool_calls`` carries no ``index``, but downstream stream parsers + e.g. the openai SDK will reject the chunk without one). 
+ """ + choices_in = response.get("choices") or [] + choices_out = [] + for choice in choices_in: + delta = dict(choice.get("message") or {}) + if "tool_calls" in delta and delta["tool_calls"]: + delta["tool_calls"] = [{"index": tc.get("index", i), **tc} for i, tc in enumerate(delta["tool_calls"])] + choices_out.append( + { + "index": choice.get("index", 0), + "delta": delta, + "finish_reason": choice.get("finish_reason"), + "logprobs": choice.get("logprobs"), + } + ) + return { + "id": response.get("id") or f"chatcmpl-{uuid.uuid4()}", + "object": "chat.completion.chunk", + "created": response.get("created") or int(time.time()), + "model": response.get("model") or model, + "choices": choices_out, + } + + +def encode_sse_event(data: dict) -> bytes: + """Encode a JSON payload as one SSE ``data:`` event (terminated by ``\\n\\n``).""" + return f"data: {json.dumps(data, ensure_ascii=False)}\n\n".encode() diff --git a/rock/sdk/model/server/traj.py b/rock/sdk/model/server/traj.py new file mode 100644 index 0000000000..e12c229c7f --- /dev/null +++ b/rock/sdk/model/server/traj.py @@ -0,0 +1,156 @@ +"""Trajectory record + replay for the chat/completions proxy. + +Two halves around the same JSONL schema (one record per line): + +- :class:`TrajectoryRecorder` — invoked by the forward path after each upstream + call (success or failure). Appends a small dict with + ``request`` / ``response`` / ``status`` / ``response_time`` / ``model`` / + ``stream``, and reports OTLP RT/count metrics. Stores responses verbatim + (provider-specific fields like ``reasoning_content`` survive); for streaming + calls ``response`` is the aggregated final ChatCompletion produced by + ``ChatCompletionStreamState.get_final_completion().model_dump()``. + +- :class:`SequentialCursor` — loads a JSONL trajectory once at startup; + ``await cursor.next(expected_model=...)`` hands out the next record (full + payload dict) and advances. 
Going past the end raises + :class:`TrajectoryExhausted` so the proxy can return a clean 404. +""" + +from __future__ import annotations + +import asyncio +import json +import os +from pathlib import Path +from typing import Any + +from rock.logger import init_logger +from rock.sdk.model.server.utils import ( + MODEL_SERVICE_REQUEST_COUNT, + MODEL_SERVICE_REQUEST_RT, + _get_or_create_metrics_monitor, +) + +logger = init_logger(__name__) + + +# --------------------------------------------------------------------------- +# Recorder +# --------------------------------------------------------------------------- + + +class TrajectoryRecorder: + """Appends one JSONL line per chat/completions call and reports OTLP metrics.""" + + def __init__(self, traj_file: str | os.PathLike) -> None: + self.traj_file = Path(traj_file) + self.traj_file.parent.mkdir(parents=True, exist_ok=True) + self._lock = asyncio.Lock() + self._monitor = _get_or_create_metrics_monitor() + + async def record( + self, + *, + request: dict[str, Any], + response: dict[str, Any] | None, + status: str, + start_time: float, + end_time: float, + error: str | None = None, + ) -> None: + rt_seconds = end_time - start_time + payload = { + "model": request.get("model"), + "stream": bool(request.get("stream")), + "status": status, + "response_time": rt_seconds, + "start_time": start_time, + "end_time": end_time, + "request": request, + "response": response, + "error": error, + } + + line = json.dumps(payload, ensure_ascii=False, default=str) + "\n" + async with self._lock: + await asyncio.to_thread(self._write_line, line) + + attrs = { + "type": "chat_completions", + "status": status, + "sandbox_id": os.getenv("ROCK_SANDBOX_ID", "unknown"), + } + self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_seconds * 1000.0, attributes=attrs) + self._monitor.record_counter_by_name(MODEL_SERVICE_REQUEST_COUNT, 1, attributes=attrs) + + def _write_line(self, line: str) -> None: + with self.traj_file.open("a", 
encoding="utf-8") as f: + f.write(line) + + +# --------------------------------------------------------------------------- +# Replay cursor +# --------------------------------------------------------------------------- + + +class TrajectoryExhausted(Exception): + """Raised by ``SequentialCursor.next`` when all recorded steps have been served.""" + + def __init__(self, position: int, total: int) -> None: + super().__init__(f"trajectory exhausted at step {position} (total recorded steps={total})") + self.position = position + self.total = total + + +class SequentialCursor: + """Hands out trajectory records one at a time, in recorded order.""" + + def __init__(self, records: list[dict]) -> None: + self.records = records + self._idx = 0 + self._lock = asyncio.Lock() + + @classmethod + def load(cls, path: str | os.PathLike) -> SequentialCursor: + path = Path(path) + if not path.is_file(): + raise FileNotFoundError(f"traj file not found: {path}") + + records: list[dict] = [] + with path.open("r", encoding="utf-8") as fp: + for line in fp: + line = line.strip() + if not line: + continue + records.append(json.loads(line)) + + logger.info(f"[traj-replay] loaded {len(records)} record(s) from {path}") + return cls(records) + + async def next(self, expected_model: str | None = None) -> dict: + async with self._lock: + if self._idx >= len(self.records): + raise TrajectoryExhausted(position=self._idx, total=len(self.records)) + record = self.records[self._idx] + self._idx += 1 + current_idx = self._idx - 1 + + if expected_model: + recorded_model = record.get("model") + if recorded_model and recorded_model != expected_model: + logger.warning( + f"[traj-replay] step {current_idx} model mismatch: " + f"recorded={recorded_model!r} requested={expected_model!r}" + ) + return record + + def reset(self) -> None: + self._idx = 0 + + @property + def position(self) -> int: + return self._idx + + @property + def total(self) -> int: + return len(self.records) diff --git 
a/rock/sdk/model/server/utils.py b/rock/sdk/model/server/utils.py index 20ae8896dc..639ca3995b 100644 --- a/rock/sdk/model/server/utils.py +++ b/rock/sdk/model/server/utils.py @@ -38,7 +38,7 @@ def _write_traj(data: dict): def record_traj(func: Callable): - """Decorator to record chat completions input/output as traj.""" + """Decorator to record chat completions input/output as traj (local mode only).""" @wraps(func) async def wrapper(*args, **kwargs): diff --git a/rock/sdk/model/service.py b/rock/sdk/model/service.py index b1b523ed27..24cd7ede38 100644 --- a/rock/sdk/model/service.py +++ b/rock/sdk/model/service.py @@ -17,6 +17,8 @@ def start_sandbox_service( proxy_base_url: str | None = None, retryable_status_codes: str | None = None, request_timeout: int | None = None, + recording_file: str | None = None, + replay_file: str | None = None, ) -> subprocess.Popen: """start sandbox service""" current_file = Path(__file__).resolve() @@ -38,6 +40,10 @@ def start_sandbox_service( cmd.extend(["--retryable-status-codes", retryable_status_codes]) if request_timeout: cmd.extend(["--request-timeout", str(request_timeout)]) + if recording_file: + cmd.extend(["--recording-file", recording_file]) + if replay_file: + cmd.extend(["--replay-file", replay_file]) process = subprocess.Popen(cmd, cwd=str(service_dir)) return process @@ -51,6 +57,8 @@ async def start( proxy_base_url: str | None = None, retryable_status_codes: str | None = None, request_timeout: int | None = None, + recording_file: str | None = None, + replay_file: str | None = None, ) -> str: process = self.start_sandbox_service( model_service_type=model_service_type, @@ -60,6 +68,8 @@ async def start( proxy_base_url=proxy_base_url, retryable_status_codes=retryable_status_codes, request_timeout=request_timeout, + recording_file=recording_file, + replay_file=replay_file, ) pid = process.pid diff --git a/tests/unit/cli/command/test_model_service.py b/tests/unit/cli/command/test_model_service.py new file mode 100644 
index 0000000000..86849c718b --- /dev/null +++ b/tests/unit/cli/command/test_model_service.py @@ -0,0 +1,120 @@ +"""Unit tests for rock.cli.command.model_service.ModelServiceCommand. + +Drive the sub-parser end-to-end with argparse so the surface that users +actually type at the terminal is what we exercise. ``ModelService.start`` is +mocked — these tests assert wiring (argparse → handler → SDK call), not the +subprocess command construction (covered separately in +tests/unit/sdk/model/test_service.py). +""" + +from __future__ import annotations + +import argparse +import asyncio +from unittest.mock import AsyncMock + +import pytest + +from rock.cli.command.model_service import ModelServiceCommand + + +def _build_parser() -> argparse.ArgumentParser: + """Top-level parser with `model-service` subcommand wired in, same as the CLI.""" + top = argparse.ArgumentParser(prog="rock") + subparsers = top.add_subparsers(dest="command") + asyncio.run(ModelServiceCommand.add_parser_to(subparsers)) + return top + + +@pytest.fixture +def isolate_pid_file(monkeypatch, tmp_path): + """Redirect PID dir/file into tmp so arun() doesn't touch ./data/cli/model.""" + monkeypatch.setattr(ModelServiceCommand, "DEFAULT_MODEL_SERVICE_DIR", str(tmp_path)) + monkeypatch.setattr(ModelServiceCommand, "DEFAULT_MODEL_SERVICE_PID_FILE", str(tmp_path / "pid.txt")) + + +@pytest.fixture +def fake_start(monkeypatch): + """Replace ModelService.start with an AsyncMock returning a fixed pid.""" + mock = AsyncMock(return_value="12345") + monkeypatch.setattr("rock.cli.command.model_service.ModelService.start", mock) + return mock + + +# ---------- argparse: the new flags must parse ---------- + + +def test_recording_file_flag_parses(): + parser = _build_parser() + ns = parser.parse_args(["model-service", "start", "--type", "proxy", "--recording-file", "/tmp/out.jsonl"]) + assert ns.recording_file == "/tmp/out.jsonl" + assert ns.replay_file is None + + +def test_replay_file_flag_parses(): + parser = 
_build_parser() + ns = parser.parse_args(["model-service", "start", "--type", "proxy", "--replay-file", "/tmp/in.jsonl"]) + assert ns.replay_file == "/tmp/in.jsonl" + assert ns.recording_file is None + + +def test_neither_flag_defaults_to_none(): + parser = _build_parser() + ns = parser.parse_args(["model-service", "start", "--type", "proxy"]) + assert ns.recording_file is None + assert ns.replay_file is None + + +# ---------- handler: passes parsed args through to ModelService.start ---------- + + +def test_start_handler_forwards_recording_file(isolate_pid_file, fake_start): + parser = _build_parser() + ns = parser.parse_args( + [ + "model-service", + "start", + "--type", + "proxy", + "--proxy-base-url", + "https://api.openai.com/v1", + "--recording-file", + "/tmp/out.jsonl", + ] + ) + asyncio.run(ModelServiceCommand().arun(ns)) + + kwargs = fake_start.call_args.kwargs + assert kwargs["recording_file"] == "/tmp/out.jsonl" + assert kwargs["replay_file"] is None + assert kwargs["proxy_base_url"] == "https://api.openai.com/v1" + assert kwargs["model_service_type"] == "proxy" + + +def test_start_handler_forwards_replay_file(isolate_pid_file, fake_start): + parser = _build_parser() + ns = parser.parse_args( + [ + "model-service", + "start", + "--type", + "proxy", + "--replay-file", + "/tmp/in.jsonl", + ] + ) + asyncio.run(ModelServiceCommand().arun(ns)) + + kwargs = fake_start.call_args.kwargs + assert kwargs["replay_file"] == "/tmp/in.jsonl" + assert kwargs["recording_file"] is None + + +def test_start_handler_omits_both_when_unset(isolate_pid_file, fake_start): + parser = _build_parser() + ns = parser.parse_args(["model-service", "start", "--type", "proxy"]) + asyncio.run(ModelServiceCommand().arun(ns)) + + kwargs = fake_start.call_args.kwargs + assert kwargs["recording_file"] is None + assert kwargs["replay_file"] is None diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index edce5584cb..345ea31775 100644 --- 
a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -1,14 +1,24 @@ -from unittest.mock import AsyncMock, MagicMock, patch +"""Tests for the chat/completions proxy. + +Forward path is exercised by pointing the proxy at an httpx ``MockTransport`` +(no real network). Replay path is exercised end-to-end via the FastAPI test +client. Config / CLI / metrics-singleton tests round out the file. +""" + +import argparse +import json +from unittest.mock import MagicMock, patch import httpx import pytest import yaml -from fastapi import FastAPI, Request -from httpx import ASGITransport, AsyncClient, HTTPStatusError, Request, Response +from fastapi import FastAPI +from httpx import ASGITransport, AsyncClient -from rock.sdk.model.server.api.proxy import perform_llm_request, proxy_router +from rock.sdk.model.server.api.proxy import proxy_router from rock.sdk.model.server.config import ModelServiceConfig from rock.sdk.model.server.main import create_config_from_args, lifespan +from rock.sdk.model.server.traj import SequentialCursor from rock.sdk.model.server.utils import ( MODEL_SERVICE_REQUEST_COUNT, MODEL_SERVICE_REQUEST_RT, @@ -16,361 +26,568 @@ record_traj, ) -# Initialize a temporary FastAPI application for testing the router -test_app = FastAPI() -test_app.include_router(proxy_router) -mock_config = ModelServiceConfig() -test_app.state.model_service_config = mock_config +def _build_app(config: ModelServiceConfig, *, replay_cursor=None, recorder=None) -> FastAPI: + """Build a FastAPI app with the proxy router and the given config attached.""" + from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend + + app = FastAPI() + app.state.model_service_config = config + if replay_cursor is not None: + app.state.backend = ReplayBackend(replay_cursor) + else: + app.state.backend = ForwardBackend(config, recorder=recorder) + app.include_router(proxy_router) + return app + + +def _patch_httpx_with_handler(handler): + """Patch 
``proxy.httpx.AsyncClient`` so each ``async with httpx.AsyncClient(...)`` + returns a real client wrapping ``MockTransport(handler)``.""" + real_client_cls = httpx.AsyncClient # capture before patching kicks in + transport = httpx.MockTransport(handler) + + def factory(*args, **kwargs): + kwargs.pop("timeout", None) # transport supplies the response, no timeout needed + return real_client_cls(transport=transport, **kwargs) + + return patch("rock.sdk.model.server.api.proxy.httpx.AsyncClient", side_effect=factory) + + +def _success_response_json(*, model: str = "gpt-3.5-turbo", content: str = "hi") -> dict: + return { + "id": "chatcmpl-1", + "object": "chat.completion", + "created": 1234, + "model": model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": content}, + "finish_reason": "stop", + } + ], + "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}, + } + + +# ---------- Forward path: routing ---------- @pytest.mark.asyncio -async def test_chat_completions_routing_success(): - """ - Test the high-level routing logic. 
- """ - patch_path = "rock.sdk.model.server.api.proxy.perform_llm_request" - - with patch(patch_path, new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-123", "choices": []} - mock_request.return_value = mock_resp - - transport = ASGITransport(app=test_app) +async def test_forward_routes_by_model_name_to_proxy_rules(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + captured["body"] = json.loads(request.content) + return httpx.Response(200, json=_success_response_json()) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 200 - call_args = mock_request.call_args[0] - assert call_args[0] == "https://api.openai.com/v1/chat/completions" - assert mock_request.called + assert r.status_code == 200 + assert captured["url"] == "https://api.openai.com/v1/chat/completions" + assert captured["body"]["model"] == "gpt-3.5-turbo" @pytest.mark.asyncio -async def test_chat_completions_fallback_to_default_when_not_found(): - """ - Test that an unrecognized model name correctly falls back to the 'default' URL. 
- """ - patch_path = "rock.sdk.model.server.api.proxy.perform_llm_request" - - with patch(patch_path, new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-fallback", "choices": []} - mock_request.return_value = mock_resp - - config = test_app.state.model_service_config - default_base_url = config.proxy_rules["default"].rstrip("/") - expected_target_url = f"{default_base_url}/chat/completions" - - transport = ASGITransport(app=test_app) +async def test_forward_falls_back_to_default_for_unknown_model(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + return httpx.Response(200, json=_success_response_json(model="some-random")) + + config = ModelServiceConfig() + expected_default = config.proxy_rules["default"].rstrip("/") + "/chat/completions" + app = _build_app(config) + + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = { - "model": "some-random-unsupported-model", # This model is NOT in proxy_rules - "messages": [{"role": "user", "content": "hello"}], - } - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "some-random", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 200 + assert captured["url"] == expected_default - assert response.status_code == 200 - # Verify that perform_llm_request was called with the DEFAULT URL - call_args = mock_request.call_args[0] - actual_url = call_args[0] +@pytest.mark.asyncio +async def test_forward_400_when_no_rule_and_no_default(): + config = ModelServiceConfig() + config.proxy_rules = {} + app = _build_app(config) + + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await 
ac.post( + "/v1/chat/completions", + json={"model": "any", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert actual_url == expected_target_url - assert mock_request.called + assert r.status_code == 400 + assert "not configured" in r.json()["detail"] @pytest.mark.asyncio -async def test_chat_completions_routing_absolute_fail(): - """ - Test that both the specific model and the 'default' rule are missing. - """ - empty_config = ModelServiceConfig() - empty_config.proxy_rules = {} - - with patch.object(test_app.state, "model_service_config", empty_config): - transport = ASGITransport(app=test_app) +async def test_forward_proxy_base_url_overrides_proxy_rules(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + return httpx.Response(200, json=_success_response_json()) + + config = ModelServiceConfig() + config.proxy_base_url = "https://custom-endpoint.example.com/v1" + app = _build_app(config) + + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "any-model", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 400 - detail = response.json()["detail"] - assert "not configured" in detail + assert captured["url"] == "https://custom-endpoint.example.com/v1/chat/completions" -@pytest.mark.asyncio -async def test_perform_llm_request_retry_on_whitelist(): - """ - Test that the proxy retries when receiving a whitelisted error code. 
- """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" +# ---------- Forward path: byte passthrough ---------- - # Patch asyncio.sleep inside the retry module to avoid actual waiting - with ( - patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - # 1. Setup Failed Response (429) - resp_429 = MagicMock(spec=Response) - resp_429.status_code = 429 - error_429 = HTTPStatusError("Rate Limited", request=MagicMock(spec=Request), response=resp_429) - # 2. Setup Success Response (200) - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 - resp_200.json.return_value = {"ok": True} +@pytest.mark.asyncio +async def test_forward_response_body_is_byte_for_byte_passthrough(): + """Upstream's exact JSON bytes (incl. provider-specific fields) reach the client.""" + upstream_payload = { + "id": "x", + "object": "chat.completion", + "model": "glm-5", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "hi", "reasoning_content": "...think..."}, + "finish_reason": "stop", + } + ], + "provider_specific_fields": {"vendor_field": "vendor_value"}, + } - # Sequence: Fail with 429, then Succeed with 200 - mock_post.side_effect = [error_429, resp_200] + def handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json=upstream_payload) - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "glm-5", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert result.status_code == 200 - assert mock_post.call_count == 2 + body = r.json() + assert body["choices"][0]["message"]["reasoning_content"] == "...think..." 
+ assert body["provider_specific_fields"] == {"vendor_field": "vendor_value"} @pytest.mark.asyncio -async def test_perform_llm_request_no_retry_on_non_whitelist(): - """ - Test that the proxy DOES NOT retry for non-retryable codes (e.g., 401). - It should return the error response immediately. - """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" +async def test_forward_propagates_upstream_status_and_body_on_4xx(): + """Upstream 4xx is forwarded verbatim — proxy doesn't re-shape error JSON.""" + err_body = {"error": {"message": "context length exceeded", "type": "BadRequestError"}} + + def handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(400, json=err_body) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 400 + assert r.json() == err_body - with patch(client_post_path, new_callable=AsyncMock) as mock_post: - # Mock 401 Unauthorized (NOT in the retry whitelist) - resp_401 = MagicMock(spec=Response) - resp_401.status_code = 401 - resp_401.json.return_value = {"error": "Invalid API Key"} - # The function should return this response directly - mock_post.return_value = resp_401 +@pytest.mark.asyncio +async def test_forward_authorization_header_passes_through(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["headers"] = dict(request.headers) + return httpx.Response(200, json=_success_response_json()) - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + 
await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer sk-abc", "X-Trace": "t1"}, + ) - assert result.status_code == 401 - # Call count must be 1, meaning no retries were attempted - assert mock_post.call_count == 1 + # Authorization and custom X-* headers are forwarded verbatim. We don't assert + # on framing headers (connection / content-length / accept-encoding) because + # httpx rebuilds them itself for the outgoing request. + auth_value = captured["headers"].get("Authorization") or captured["headers"].get("authorization") + assert auth_value == "Bearer sk-abc" + fwd_lower = {k.lower() for k in captured["headers"]} + assert "x-trace" in fwd_lower @pytest.mark.asyncio -async def test_perform_llm_request_network_timeout_retry(): - """ - Test that network-level exceptions (like Timeout) also trigger retries. - """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" +async def test_forward_502_on_upstream_connection_failure(monkeypatch): + """ConnectError → 502. 
Retry disabled here to keep the test fast.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_MAX_ATTEMPTS", 1) - with ( - patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 + def handler(request: httpx.Request) -> httpx.Response: + raise httpx.ConnectError("upstream is down") + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - mock_post.side_effect = [httpx.TimeoutException("Network Timeout"), resp_200] + assert r.status_code == 502 - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) - assert result.status_code == 200 - assert mock_post.call_count == 2 +# ---------- Forward path: retry ---------- @pytest.mark.asyncio -async def test_lifespan_initialization_with_config(tmp_path): - """ - Test that the application correctly initializes and overrides defaults - when a valid configuration file path is provided. 
- """ - conf_file = tmp_path / "proxy.yml" - conf_file.write_text(yaml.dump({"proxy_rules": {"my-model": "http://custom-url"}, "request_timeout": 50})) +async def test_forward_retries_on_retryable_status_then_succeeds(monkeypatch): + """Retryable 429s are retried (two here); the third attempt's 200 is returned to the client.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) - # Initialize App and load config from file - config = ModelServiceConfig.from_file(str(conf_file)) - app = FastAPI(lifespan=lambda app: lifespan(app, config)) + attempts = [] - async with lifespan(app, config): - app_config = app.state.model_service_config - # Verify that the config reflects file content instead of defaults - assert app_config.proxy_rules["my-model"] == "http://custom-url" - assert app_config.request_timeout == 50 - assert "gpt-3.5-turbo" not in app_config.proxy_rules + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + if len(attempts) < 3: + return httpx.Response(429, json={"error": "rate limited"}) + return httpx.Response(200, json=_success_response_json(content="finally")) + + app = _build_app(ModelServiceConfig()) # default retryable_status_codes = [429, 500] + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 200 + assert r.json()["choices"][0]["message"]["content"] == "finally" + assert len(attempts) == 3 @pytest.mark.asyncio -async def test_lifespan_initialization_no_config(): - """ - Test that the application initializes with default ModelServiceConfig - settings when no configuration file path is provided.
- """ - config = ModelServiceConfig() - app = FastAPI(lifespan=lambda app: lifespan(app, config)) +async def test_forward_returns_last_response_when_retries_exhausted(monkeypatch): + """All attempts return 429 → the final 429 body+status is forwarded verbatim.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_MAX_ATTEMPTS", 3) + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) - async with lifespan(app, config): - app_config = app.state.model_service_config - # Verify that default rules (e.g., 'gpt-3.5-turbo') are loaded - assert "gpt-3.5-turbo" in app_config.proxy_rules - assert app_config.request_timeout == 120 + attempts = [] + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + return httpx.Response(429, json={"error": "still rate limited"}) -@pytest.mark.asyncio -async def test_lifespan_invalid_config_path(): - """ - Test that providing a non-existent configuration file path causes - ModelServiceConfig.from_file to raise a FileNotFoundError. - """ - # Expect FileNotFoundError when loading from non-existent file - with pytest.raises(FileNotFoundError): - ModelServiceConfig.from_file("/tmp/non_existent_file.yml") + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 429 + assert r.json() == {"error": "still rate limited"} + assert len(attempts) == 3 @pytest.mark.asyncio -async def test_proxy_base_url_overrides_proxy_rules(tmp_path): - """ - Test that when proxy_base_url is set, all requests are forwarded to that URL, - bypassing proxy_rules entirely. 
- """ - config = ModelServiceConfig() - config.proxy_base_url = "https://custom-endpoint.example.com/v1" +async def test_forward_does_not_retry_non_whitelisted_status(monkeypatch): + """400 is not in retryable_status_codes → forwarded immediately, no retry.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) - test_app = FastAPI() - test_app.state.model_service_config = config - test_app.include_router(proxy_router) + attempts = [] - with patch("rock.sdk.model.server.api.proxy.perform_llm_request", new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-123", "choices": []} - mock_request.return_value = mock_resp + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + return httpx.Response(400, json={"error": "bad request"}) - transport = ASGITransport(app=test_app) + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - # Even when requesting gpt-3.5-turbo, should forward to proxy_base_url - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 200 - # Verify request was sent to proxy_base_url - call_args = mock_request.call_args[0] - assert call_args[0] == "https://custom-endpoint.example.com/v1/chat/completions" + assert r.status_code == 400 + assert len(attempts) == 1 @pytest.mark.asyncio -async def test_config_loads_host_and_port_from_file(tmp_path): - """ - Test that ModelServiceConfig correctly loads host and port from config file. 
- """ - conf_file = tmp_path / "proxy.yml" - conf_file.write_text( - yaml.dump({"host": "127.0.0.1", "port": 9000, "proxy_rules": {"my-model": "http://my-backend"}}) +async def test_forward_stream_retries_on_retryable_status_then_succeeds(monkeypatch): + """Streaming: 500 on first attempt, then 200 SSE on second — client sees only the 200 body.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) + + attempts = [] + sse_body = ( + b'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,' + b'"delta":{"content":"hello"},"finish_reason":null}]}\n\n' + b"data: [DONE]\n\n" ) - config = ModelServiceConfig.from_file(str(conf_file)) + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + if len(attempts) < 2: + return httpx.Response(500, json={"error": "internal"}) + return httpx.Response(200, content=sse_body, headers={"content-type": "text/event-stream"}) - assert config.host == "127.0.0.1" - assert config.port == 9000 - assert config.proxy_rules["my-model"] == "http://my-backend" + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + ) + body = r.text + assert "hello" in body + assert "[DONE]" in body + assert "internal" not in body # the 500 attempt is not leaked to client + assert len(attempts) == 2 + + +# ---------- Forward path: recording ---------- + + +@pytest.mark.asyncio +async def test_forward_invokes_recorder_on_success(tmp_path): + """When a recorder is attached to the backend, success calls write a JSONL line.""" + from rock.sdk.model.server.traj import TrajectoryRecorder + + upstream_payload = _success_response_json(content="recorded reply") + traj_file = tmp_path / "traj.jsonl" + + def 
handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json=upstream_payload) -def test_config_default_host_and_port(): - """ - Test default values for host and port. - """ config = ModelServiceConfig() - assert config.host == "0.0.0.0" - assert config.port == 8080 + with _patch_httpx_with_handler(handler): + recorder = TrajectoryRecorder(traj_file=traj_file) + app = _build_app(config, recorder=recorder) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + line = traj_file.read_text(encoding="utf-8").strip() + record = json.loads(line) + assert record["status"] == "success" + assert record["model"] == "gpt-3.5-turbo" + assert record["stream"] is False + assert record["request"]["messages"][0]["content"] == "hi" + assert record["response"] == upstream_payload + + +# ---------- Replay path ---------- @pytest.mark.asyncio -async def test_config_loads_retryable_status_codes_from_file(tmp_path): - """ - Test that ModelServiceConfig correctly loads retryable_status_codes from config file. 
- """ - conf_file = tmp_path / "proxy.yml" - conf_file.write_text(yaml.dump({"retryable_status_codes": [429, 500, 502, 503]})) +async def test_replay_returns_recorded_response_no_upstream_call(tmp_path): + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "rec-1", + "object": "chat.completion", + "model": "gpt-3.5-turbo", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "recorded reply"}, + "finish_reason": "stop", + } + ], + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") - config = ModelServiceConfig.from_file(str(conf_file)) + config = ModelServiceConfig() + config.replay_file = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) + + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 200 + assert r.json()["choices"][0]["message"]["content"] == "recorded reply" - assert config.retryable_status_codes == [429, 500, 502, 503] +@pytest.mark.asyncio +async def test_replay_streaming_emits_recorded_response_as_sse(tmp_path): + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "rec-stream", + "object": "chat.completion", + "model": "gpt-3.5-turbo", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "streamed reply"}, + "finish_reason": "tool_calls", + } + ], + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") -def test_config_default_retryable_status_codes(): - """ - Test default values for retryable_status_codes. 
- """ config = ModelServiceConfig() + config.replay_file = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) + + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + ) - assert config.retryable_status_codes == [429, 500] + body = r.text + assert "data: [DONE]" in body + assert '"object": "chat.completion.chunk"' in body + assert '"delta": {"role": "assistant", "content": "streamed reply"}' in body + assert '"finish_reason": "tool_calls"' in body @pytest.mark.asyncio -async def test_perform_llm_request_respects_custom_retryable_codes(): - """ - Test that custom retryable_status_codes are respected (502 retries, 401 does not). - """ +async def test_replay_returns_404_when_cursor_exhausted(tmp_path): + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "only", + "choices": [{"index": 0, "message": {"role": "assistant", "content": "x"}, "finish_reason": "stop"}], + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") + config = ModelServiceConfig() - config.retryable_status_codes = [502, 503, 504] # Custom retryable status codes + config.replay_file = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) + + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + second = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "again"}]}, + ) - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" + assert second.status_code == 404 + assert "exhausted" in second.json()["detail"] - with ( - 
patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - # 502 should retry (in custom list) - resp_502 = MagicMock(spec=Response) - resp_502.status_code = 502 - error_502 = HTTPStatusError("Bad Gateway", request=MagicMock(spec=Request), response=resp_502) - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 - resp_200.json.return_value = {"ok": True} +# ---------- Lifespan + Config ---------- - # Sequence: 502 fail, then 200 success - mock_post.side_effect = [error_502, resp_200] - result = await perform_llm_request("http://fake.url", {}, {}, config) +@pytest.mark.asyncio +async def test_lifespan_initialization_with_config(tmp_path): + conf_file = tmp_path / "proxy.yml" + conf_file.write_text(yaml.dump({"proxy_rules": {"my-model": "http://custom-url"}, "request_timeout": 50})) - assert result.status_code == 200 - assert mock_post.call_count == 2 + config = ModelServiceConfig.from_file(str(conf_file)) + app = FastAPI(lifespan=lambda app: lifespan(app, config)) + + async with lifespan(app, config): + assert app.state.model_service_config.proxy_rules["my-model"] == "http://custom-url" + assert app.state.model_service_config.request_timeout == 50 @pytest.mark.asyncio -async def test_perform_llm_request_non_retryable_code_not_retried(): - """ - Test that 401 (not in custom retryable_status_codes) does not trigger retry. 
- """ +async def test_lifespan_invalid_config_path(): + with pytest.raises(FileNotFoundError): + ModelServiceConfig.from_file("/tmp/non_existent_file.yml") + + +def test_config_default_host_and_port(): config = ModelServiceConfig() - config.retryable_status_codes = [502, 503, 504] # Custom retryable status codes, excluding 401 + assert config.host == "0.0.0.0" + assert config.port == 8080 - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" - with patch(client_post_path, new_callable=AsyncMock) as mock_post: - # 401 should not retry (not in custom list) - resp_401 = MagicMock(spec=Response) - resp_401.status_code = 401 - resp_401.json.return_value = {"error": "Invalid API Key"} +def test_config_default_recording_and_replay(): + config = ModelServiceConfig() + assert config.recording_file is None + assert config.replay_file is None + - mock_post.return_value = resp_401 +@pytest.mark.asyncio +async def test_config_loads_recording_file_from_yaml(tmp_path): + conf_file = tmp_path / "proxy.yml" + conf_file.write_text(yaml.dump({"recording_file": "/tmp/my-traj.jsonl"})) + config = ModelServiceConfig.from_file(str(conf_file)) + assert config.recording_file == "/tmp/my-traj.jsonl" + assert config.replay_file is None - result = await perform_llm_request("http://fake.url", {}, {}, config) - assert result.status_code == 401 - assert mock_post.call_count == 1 # No retry +@pytest.mark.asyncio +async def test_config_loads_replay_file_from_yaml(tmp_path): + conf_file = tmp_path / "proxy.yml" + conf_file.write_text(yaml.dump({"replay_file": "/tmp/in.jsonl"})) + config = ModelServiceConfig.from_file(str(conf_file)) + assert config.replay_file == "/tmp/in.jsonl" + assert config.recording_file is None -def test_cli_args_override_config_file(tmp_path): - """ - Test that CLI arguments override config file settings. - This tests the logic in create_config_from_args(). 
- """ - import argparse +def test_config_recording_and_replay_are_mutually_exclusive(): + """Setting both at construction time fails Pydantic validation.""" + with pytest.raises(ValueError, match="mutually exclusive"): + ModelServiceConfig(recording_file="/tmp/a.jsonl", replay_file="/tmp/b.jsonl") - # Create args with config file and CLI parameters + +def test_config_recording_replay_mutex_fires_on_assignment(): + """validate_assignment=True so CLI-style field-by-field overrides also trip the mutex.""" + config = ModelServiceConfig(recording_file="/tmp/a.jsonl") + with pytest.raises(ValueError, match="mutually exclusive"): + config.replay_file = "/tmp/b.jsonl" + + +def test_cli_args_override_config_file(tmp_path): conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump( @@ -378,144 +595,67 @@ def test_cli_args_override_config_file(tmp_path): "host": "192.168.1.1", "port": 8080, "proxy_base_url": "https://config-url.example.com/v1", - "retryable_status_codes": [429, 500], "request_timeout": 60, } ) ) - args = argparse.Namespace( config_file=str(conf_file), - host="0.0.0.0", # CLI overrides config file - port=9000, # CLI overrides config file - proxy_base_url="https://cli-url.example.com/v1", # CLI overrides config file - retryable_status_codes="502,503", # CLI overrides config file - request_timeout=30, # CLI overrides config file + host="0.0.0.0", + port=9000, + proxy_base_url="https://cli-url.example.com/v1", + retryable_status_codes=None, + request_timeout=30, + recording_file=None, + replay_file=None, ) - config = create_config_from_args(args) - - # Verify CLI arguments override config file assert config.host == "0.0.0.0" assert config.port == 9000 assert config.proxy_base_url == "https://cli-url.example.com/v1" - assert config.retryable_status_codes == [502, 503] assert config.request_timeout == 30 -@pytest.mark.asyncio -async def test_config_file_overrides_defaults(tmp_path): - """ - Test that config file values override default values. 
- """ - conf_file = tmp_path / "proxy.yml" - conf_file.write_text( - yaml.dump( - { - "host": "10.0.0.1", - "port": 8888, - "request_timeout": 300, - "proxy_rules": {"test-model": "http://test-backend"}, - } - ) +def test_cli_replay_file_enables_replay(): + args = argparse.Namespace( + config_file=None, + host=None, + port=None, + proxy_base_url=None, + retryable_status_codes=None, + request_timeout=None, + recording_file=None, + replay_file="/tmp/in.jsonl", ) + config = create_config_from_args(args) + assert config.replay_file == "/tmp/in.jsonl" - config = ModelServiceConfig.from_file(str(conf_file)) - # Verify config file overrides defaults - assert config.host == "10.0.0.1" - assert config.port == 8888 - assert config.request_timeout == 300 - assert config.proxy_rules["test-model"] == "http://test-backend" - # Verify other fields remain as defaults - assert config.proxy_base_url is None +# ---------- Metrics singleton + legacy record_traj (still used by local mode) ---------- def test_metrics_monitor_is_singleton(): - """ - Test that _get_or_create_metrics_monitor returns the same instance - on repeated calls (module-level singleton, created only once). - """ import rock.sdk.model.server.utils as utils_module with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: - mock_monitor = MagicMock() - mock_cls.create.return_value = mock_monitor - - # Reset singleton so the test is isolated + mock_cls.create.return_value = MagicMock() utils_module._metrics_monitor = None - first = _get_or_create_metrics_monitor() second = _get_or_create_metrics_monitor() - assert first is second - assert mock_cls.create.call_count == 1 - - # Cleanup - utils_module._metrics_monitor = None - - -def test_metrics_monitor_uses_env_endpoint(): - """ - Test that ROCK_METRICS_ENDPOINT env var is passed to MetricsMonitor.create(). 
- """ - import rock.sdk.model.server.utils as utils_module - - custom_endpoint = "http://my-otel-collector:4318/v1/metrics" - - with ( - patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, - patch.dict("os.environ", {"ROCK_METRICS_ENDPOINT": custom_endpoint}), - ): - mock_monitor = MagicMock() - mock_cls.create.return_value = mock_monitor - - utils_module._metrics_monitor = None - _get_or_create_metrics_monitor() - - mock_cls.create.assert_called_once_with(metrics_endpoint=custom_endpoint) - - utils_module._metrics_monitor = None - - -def test_metrics_monitor_registers_gauge_and_counter(): - """ - Test that _get_or_create_metrics_monitor registers both - the RT gauge and request count counter on first creation. - """ - import rock.sdk.model.server.utils as utils_module - - with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: - mock_monitor = MagicMock() - mock_cls.create.return_value = mock_monitor - - utils_module._metrics_monitor = None - _get_or_create_metrics_monitor() - - mock_monitor._register_gauge.assert_called_once_with( - MODEL_SERVICE_REQUEST_RT, "total execution time for request", "ms" - ) - mock_monitor._register_counter.assert_called_once_with( - MODEL_SERVICE_REQUEST_COUNT, "total request count", "count" - ) - utils_module._metrics_monitor = None @pytest.mark.asyncio -async def test_record_traj_reports_rt_and_count(): - """ - Test that record_traj decorator calls record_gauge_by_name (RT) - and record_counter_by_name (count) with correct metric names and attributes. 
- """ +async def test_record_traj_decorator_reports_rt_and_count(): + """Legacy record_traj decorator (still used by local mode) reports RT/count.""" import rock.sdk.model.server.utils as utils_module - mock_monitor = MagicMock() - with ( patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, - patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-test-001"}), + patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-test"}), ): + mock_monitor = MagicMock() mock_cls.create.return_value = mock_monitor utils_module._metrics_monitor = None @@ -525,45 +665,11 @@ async def fake_handler(body: dict): await fake_handler({"model": "gpt-4", "messages": []}) - mock_monitor.record_gauge_by_name.assert_called_once() gauge_call = mock_monitor.record_gauge_by_name.call_args assert gauge_call[0][0] == MODEL_SERVICE_REQUEST_RT - assert gauge_call[1]["attributes"]["type"] == "chat_completions" - assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-test-001" + assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-test" - mock_monitor.record_counter_by_name.assert_called_once() counter_call = mock_monitor.record_counter_by_name.call_args assert counter_call[0][0] == MODEL_SERVICE_REQUEST_COUNT - assert counter_call[0][1] == 1 - assert counter_call[1]["attributes"]["sandbox_id"] == "sandbox-test-001" - - utils_module._metrics_monitor = None - - -@pytest.mark.asyncio -async def test_record_traj_sandbox_id_defaults_to_unknown(): - """ - Test that sandbox_id defaults to 'unknown' when ROCK_SANDBOX_ID is not set. 
- """ - import rock.sdk.model.server.utils as utils_module - - mock_monitor = MagicMock() - - with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, patch.dict("os.environ", {}, clear=False): - # Ensure ROCK_SANDBOX_ID is not set - os_env = __import__("os").environ - os_env.pop("ROCK_SANDBOX_ID", None) - - mock_cls.create.return_value = mock_monitor - utils_module._metrics_monitor = None - - @record_traj - async def fake_handler(body: dict): - return {"id": "resp-2", "choices": []} - - await fake_handler({"model": "gpt-4", "messages": []}) - - gauge_call = mock_monitor.record_gauge_by_name.call_args - assert gauge_call[1]["attributes"]["sandbox_id"] == "unknown" utils_module._metrics_monitor = None diff --git a/tests/unit/sdk/model/test_proxy_record_replay_e2e.py b/tests/unit/sdk/model/test_proxy_record_replay_e2e.py new file mode 100644 index 0000000000..0b70ed0cf8 --- /dev/null +++ b/tests/unit/sdk/model/test_proxy_record_replay_e2e.py @@ -0,0 +1,332 @@ +"""End-to-end: real in-process TCP mock upstream + real proxy router + recorder. + +The mock upstream is a tiny FastAPI app served by uvicorn in a background thread +(real TCP). The proxy stays in-process and is hit via FastAPI's ``TestClient``; +its outbound ``httpx.AsyncClient`` makes a real TCP call to the mock — production +code path, no transport injection, no patching. 
+""" + +from __future__ import annotations + +import asyncio +import json +import threading +import time +from collections.abc import Iterator +from pathlib import Path + +import pytest +import uvicorn +from fastapi import FastAPI, Request +from fastapi.responses import JSONResponse, StreamingResponse +from fastapi.testclient import TestClient + +from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend, proxy_router +from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.sse import parse_sse_data_chunks +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryRecorder +from rock.utils.system import find_free_port + +# --------------------------------------------------------------------------- +# Mock upstream — a tiny FastAPI app behind a real TCP uvicorn in a thread. +# Owns both the canned reply AND the assertion helper, so the response shape +# and the expectations stay in sync if either is edited. +# --------------------------------------------------------------------------- + + +class MockUpstream: + """Mock OpenAI-compatible upstream. + + Single canonical reply (returned for both stream and non-stream requests) + contains three fields the proxy must preserve end-to-end: + - ``content`` (plain text) + - ``reasoning_content`` (vendor-specific thinking) + - ``tool_calls`` (a function call) + The streaming variant splits each field into multiple deltas so the + recorder also exercises the openai SDK's stream-state aggregator. + + Use as ``with MockUpstream() as mock: ...``; ``mock.base_url`` points at + the running server. ``mock.assert_message(msg)`` checks any received + assistant message matches the canonical reply. + """ + + # Canonical reply values — change here, both the handler and the assertion + # helper pick them up automatically. Two parallel tool_calls cover the + # multi-tool-call case (modern LLMs commonly emit several at once). + EXPECTED_CONTENT = "Checking weather and time for you." 
+ EXPECTED_REASONING = "User wants weather + time; calling both tools in parallel." + EXPECTED_TOOL_CALLS = [ + { + "id": "call_weather", + "type": "function", + "function": {"name": "get_weather", "arguments": '{"city":"Tokyo","unit":"celsius"}'}, + }, + { + "id": "call_time", + "type": "function", + "function": {"name": "get_time", "arguments": '{"city":"Tokyo"}'}, + }, + ] + + def __init__(self) -> None: + port = asyncio.run(find_free_port()) + config = uvicorn.Config(self._build_app(), host="127.0.0.1", port=port, log_level="warning", access_log=False) + self._server = uvicorn.Server(config) + self._thread = threading.Thread(target=self._server.run, daemon=True) + self.base_url = f"http://127.0.0.1:{port}/v1" + + # ---- lifecycle ---- + + def __enter__(self) -> MockUpstream: + self._thread.start() + deadline = time.time() + 5.0 + while not self._server.started: + if time.time() > deadline: + raise RuntimeError("mock upstream did not start within 5s") + time.sleep(0.02) + return self + + def __exit__(self, *_exc) -> None: + self._server.should_exit = True + self._thread.join(timeout=5) + + # ---- assertion helper ---- + + def assert_message(self, msg: dict) -> None: + """Assert ``msg`` is the canonical full message (content + reasoning + 2 parallel tool_calls).""" + assert msg["content"] == self.EXPECTED_CONTENT + assert msg["reasoning_content"] == self.EXPECTED_REASONING + tcs = msg["tool_calls"] + assert len(tcs) == len(self.EXPECTED_TOOL_CALLS) + for actual, expected in zip(tcs, self.EXPECTED_TOOL_CALLS, strict=True): + assert actual["id"] == expected["id"] + assert actual["type"] == expected["type"] + assert actual["function"]["name"] == expected["function"]["name"] + assert json.loads(actual["function"]["arguments"]) == json.loads(expected["function"]["arguments"]) + + # ---- internal: FastAPI app + handlers ---- + + def _build_app(self) -> FastAPI: + app = FastAPI() + + @app.post("/v1/chat/completions") + async def chat_completions(request: Request): + 
body = await request.json() + model = body.get("model", "mock") + if body.get("stream"): + return StreamingResponse(self._stream_gen(model), media_type="text/event-stream") + return JSONResponse(status_code=200, content=self._completion_json(model)) + + return app + + def _completion_json(self, model: str) -> dict: + return { + "id": "chatcmpl-mock-1", + "object": "chat.completion", + "created": 0, + "model": model, + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": self.EXPECTED_CONTENT, + "reasoning_content": self.EXPECTED_REASONING, + "tool_calls": self.EXPECTED_TOOL_CALLS, + }, + "finish_reason": "tool_calls", + } + ], + "usage": {"prompt_tokens": 12, "completion_tokens": 24, "total_tokens": 36}, + } + + async def _stream_gen(self, model: str): + base = {"id": "chatcmpl-mock-1", "object": "chat.completion.chunk", "created": 0, "model": model} + + def emit(delta: dict, finish_reason=None) -> bytes: + payload = {**base, "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]} + return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode() + + # 1-2. Reasoning split in two deltas + yield emit({"role": "assistant", "reasoning_content": "User wants weather + time; "}) + await asyncio.sleep(0.005) + yield emit({"reasoning_content": "calling both tools in parallel."}) + await asyncio.sleep(0.005) + # 3-4. Content split in two deltas + yield emit({"content": "Checking weather"}) + await asyncio.sleep(0.005) + yield emit({"content": " and time for you."}) + await asyncio.sleep(0.005) + + # 5-7. 
tool_call[0] (get_weather): announce, then arguments in two pieces + yield emit( + { + "tool_calls": [ + { + "index": 0, + "id": "call_weather", + "type": "function", + "function": {"name": "get_weather", "arguments": ""}, + } + ] + } + ) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 0, "function": {"arguments": '{"city":"Tokyo",'}}]}) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 0, "function": {"arguments": '"unit":"celsius"}'}}]}) + await asyncio.sleep(0.005) + + # 8-9. tool_call[1] (get_time): announce + arguments in one piece + yield emit( + { + "tool_calls": [ + { + "index": 1, + "id": "call_time", + "type": "function", + "function": {"name": "get_time", "arguments": ""}, + } + ] + } + ) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 1, "function": {"arguments": '{"city":"Tokyo"}'}}]}) + await asyncio.sleep(0.005) + + # 10. Finish + yield emit({}, finish_reason="tool_calls") + yield b"data: [DONE]\n\n" + + +@pytest.fixture +def mock_upstream() -> Iterator[MockUpstream]: + with MockUpstream() as m: + yield m + + +# --------------------------------------------------------------------------- +# Proxy app builder + request helper (module-level, generic) +# --------------------------------------------------------------------------- + + +def _build_proxy_app(*, mock_url: str | None = None, traj_file: Path | None = None, replay_cursor=None) -> FastAPI: + config = ModelServiceConfig() + # ReplayBackend never calls upstream, so mock_url is only relevant for forward mode. 
+ if mock_url is not None: + config.proxy_base_url = mock_url + + app = FastAPI() + app.state.model_service_config = config + if replay_cursor is not None: + app.state.backend = ReplayBackend(replay_cursor) + else: + recorder = TrajectoryRecorder(traj_file=traj_file) if traj_file is not None else None + app.state.backend = ForwardBackend(config, recorder=recorder) + app.include_router(proxy_router) + return app + + +def _call_chat_completions(client: TestClient, *, stream: bool) -> dict: + """One chat.completions call. Returns the assistant message dict. + + - non-stream: just unwraps ``choices[0].message``. + - stream: replay always emits exactly one chunk + ``[DONE]`` (see + ``completion_to_chunk_dict``), so the chunk's ``delta`` IS the full + message — no aggregation needed. + """ + payload = {"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]} + if not stream: + r = client.post("/v1/chat/completions", json=payload) + assert r.status_code == 200 + return r.json()["choices"][0]["message"] + + with client.stream("POST", "/v1/chat/completions", json={**payload, "stream": True}) as r: + assert r.status_code == 200 + body_bytes = b"".join(r.iter_bytes()) + chunks, _ = parse_sse_data_chunks(body_bytes) + return chunks[0]["choices"][0]["delta"] + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + + +class TestProxyRecordReplay: + """End-to-end: real TCP mock upstream <-> real proxy router + recorder/replayer.""" + + def test_forward_non_stream(self, mock_upstream: MockUpstream, tmp_path): + """Vendor field reaches the client; recorder writes a JSONL line with the full response.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + + with TestClient(proxy_app) as client: + r = client.post( + "/v1/chat/completions", + json={"model": "mock-model", "messages": 
[{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) + + assert r.status_code == 200 + body = r.json() + assert body["choices"][0]["finish_reason"] == "tool_calls" + mock_upstream.assert_message(body["choices"][0]["message"]) + + rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is False + assert rec["response"]["choices"][0]["finish_reason"] == "tool_calls" + mock_upstream.assert_message(rec["response"]["choices"][0]["message"]) + + def test_forward_stream(self, mock_upstream: MockUpstream, tmp_path): + """Each upstream SSE chunk reaches the client; recorder gets the aggregated final completion + with reasoning_content concatenated and tool_calls.arguments assembled from deltas.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + + with TestClient(proxy_app) as client: + with client.stream( + "POST", + "/v1/chat/completions", + json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) as r: + body = b"".join(r.iter_bytes()).decode("utf-8") + + # Raw chunks make it to the client untouched + assert '"reasoning_content": "User wants weather + time; "' in body + assert '"reasoning_content": "calling both tools in parallel."' in body + assert '"content": "Checking weather"' in body + assert '"content": " and time for you."' in body + assert '"name": "get_weather"' in body + assert '"name": "get_time"' in body + assert '"finish_reason": "tool_calls"' in body + assert body.rstrip().endswith("data: [DONE]") + + # Recorder's aggregated message matches the canonical reply + rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is True + assert rec["response"]["choices"][0]["finish_reason"] == "tool_calls" + 
mock_upstream.assert_message(rec["response"]["choices"][0]["message"]) + + @pytest.mark.parametrize("replay_stream", [False, True], ids=["replay_nonstream", "replay_stream"]) + @pytest.mark.parametrize("record_stream", [False, True], ids=["record_nonstream", "record_stream"]) + def test_replay(self, mock_upstream: MockUpstream, tmp_path, record_stream: bool, replay_stream: bool): + """Recorded mode and replayed mode are orthogonal — all 4 combinations of + (stream/non-stream) on each side must yield the same full message.""" + traj_file = tmp_path / "traj.jsonl" + + # ---- record phase ---- + proxy_record = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + with TestClient(proxy_record) as client: + _call_chat_completions(client, stream=record_stream) + + # ---- replay phase: no upstream URL needed — ReplayBackend never calls upstream ---- + cursor = SequentialCursor.load(traj_file) + proxy_replay = _build_proxy_app(replay_cursor=cursor) + with TestClient(proxy_replay) as client: + msg = _call_chat_completions(client, stream=replay_stream) + + mock_upstream.assert_message(msg) diff --git a/tests/unit/sdk/model/test_service_subprocess.py b/tests/unit/sdk/model/test_service_subprocess.py new file mode 100644 index 0000000000..61176173bd --- /dev/null +++ b/tests/unit/sdk/model/test_service_subprocess.py @@ -0,0 +1,38 @@ +"""Tests for ModelService.start_sandbox_service subprocess command construction. + +Covers the CLI flag wiring without actually spawning a subprocess: mock Popen +and inspect the argv it would have been called with. 
+""" + +from unittest.mock import patch + +from rock.sdk.model.service import ModelService + + +def _captured_argv(**start_kwargs) -> list[str]: + with patch("rock.sdk.model.service.subprocess.Popen") as mock_popen: + ModelService().start_sandbox_service(**start_kwargs) + return mock_popen.call_args[0][0] + + +def test_start_sandbox_service_omits_recording_and_replay_flags_by_default(): + argv = _captured_argv(model_service_type="proxy", proxy_base_url="https://api.openai.com/v1", port=8080) + assert argv[1:5] == ["-m", "main", "--type", "proxy"] + assert "--proxy-base-url" in argv and "https://api.openai.com/v1" in argv + assert "--port" in argv and "8080" in argv + assert "--recording-file" not in argv + assert "--replay-file" not in argv + + +def test_start_sandbox_service_passes_recording_file(): + argv = _captured_argv(model_service_type="proxy", recording_file="/tmp/my-traj.jsonl") + idx = argv.index("--recording-file") + assert argv[idx + 1] == "/tmp/my-traj.jsonl" + assert "--replay-file" not in argv + + +def test_start_sandbox_service_passes_replay_file(): + argv = _captured_argv(model_service_type="proxy", replay_file="/tmp/in.jsonl") + idx = argv.index("--replay-file") + assert argv[idx + 1] == "/tmp/in.jsonl" + assert "--recording-file" not in argv diff --git a/tests/unit/sdk/model/test_sse.py b/tests/unit/sdk/model/test_sse.py new file mode 100644 index 0000000000..251016a0a8 --- /dev/null +++ b/tests/unit/sdk/model/test_sse.py @@ -0,0 +1,223 @@ +"""Tests for the pure SSE codec utilities (no openai/litellm dependencies).""" + +import json + +from rock.sdk.model.server.sse import ( + SSE_DONE, + completion_to_chunk_dict, + encode_sse_event, + parse_sse_data_chunks, +) + +# ---------- parse_sse_data_chunks ---------- + + +def test_parse_returns_complete_events_and_leftover_buffer(): + raw = b'data: {"a": 1}\n\ndata: {"a": 2}\n\ndata: {"a": 3}' # 3rd event is incomplete + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"a": 1}, {"a": 
2}] + assert leftover == b'data: {"a": 3}' + + +def test_parse_skips_done_marker(): + raw = b'data: {"x": 1}\n\ndata: [DONE]\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"x": 1}] + assert leftover == b"" + + +def test_parse_skips_non_data_lines(): + raw = b'event: progress\ndata: {"y": 2}\nid: abc\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"y": 2}] + assert leftover == b"" + + +def test_parse_silently_skips_malformed_json(): + raw = b'data: not-json-at-all\n\ndata: {"ok": true}\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"ok": True}] + assert leftover == b"" + + +def test_parse_handles_empty_buffer(): + chunks, leftover = parse_sse_data_chunks(b"") + assert chunks == [] + assert leftover == b"" + + +def test_parse_incremental_streaming_pattern(): + """Simulates feeding bytes in arbitrary chunks; final concatenation == all events.""" + full_stream = b'data: {"i": 0}\n\ndata: {"i": 1}\n\ndata: {"i": 2}\n\ndata: [DONE]\n\n' + fragments = [full_stream[i : i + 5] for i in range(0, len(full_stream), 5)] + + buffer = b"" + collected: list[dict] = [] + for frag in fragments: + new_chunks, buffer = parse_sse_data_chunks(buffer + frag) + collected.extend(new_chunks) + + assert collected == [{"i": 0}, {"i": 1}, {"i": 2}] + assert buffer == b"" + + +def test_parse_handles_unicode_payload(): + raw = b'data: {"content": "\xe4\xbd\xa0\xe5\xa5\xbd"}\n\n' # "你好" UTF-8 + chunks, _ = parse_sse_data_chunks(raw) + assert chunks == [{"content": "你好"}] + + +# ---------- completion_to_chunk_dict ---------- + + +def test_completion_to_chunk_renames_message_to_delta(): + response = { + "id": "rec-1", + "object": "chat.completion", + "created": 100, + "model": "gpt-4", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "hi"}, + "finish_reason": "stop", + } + ], + } + chunk = completion_to_chunk_dict(response, model="gpt-4") + + assert chunk["object"] == 
"chat.completion.chunk" + assert chunk["id"] == "rec-1" + assert chunk["created"] == 100 + assert chunk["model"] == "gpt-4" + assert chunk["choices"][0]["delta"] == {"role": "assistant", "content": "hi"} + assert chunk["choices"][0]["finish_reason"] == "stop" + assert chunk["choices"][0]["index"] == 0 + assert "message" not in chunk["choices"][0] + + +def test_completion_to_chunk_preserves_provider_specific_message_fields(): + """reasoning_content kept verbatim; tool_calls get a positional index injected + (required by the OpenAI streaming spec — see test below).""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "answer", + "reasoning_content": "step-by-step thinking", + "tool_calls": [{"id": "t1", "type": "function"}], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="glm-5") + + assert chunk["choices"][0]["delta"]["reasoning_content"] == "step-by-step thinking" + assert chunk["choices"][0]["delta"]["tool_calls"] == [{"index": 0, "id": "t1", "type": "function"}] + assert chunk["choices"][0]["finish_reason"] == "tool_calls" + + +def test_completion_to_chunk_injects_tool_call_index_for_openai_sdk_compat(): + """A recorded non-stream message has tool_calls without 'index'; the OpenAI + streaming spec requires it on chunk deltas, and the openai SDK's + ChatCompletionChunk.model_validate() rejects the chunk otherwise. 
We inject + a positional index so replay-stream output is parseable by strict clients.""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "tool_calls": [ + {"id": "a", "type": "function", "function": {"name": "f1", "arguments": "{}"}}, + {"id": "b", "type": "function", "function": {"name": "f2", "arguments": "{}"}}, + ], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="m") + tcs = chunk["choices"][0]["delta"]["tool_calls"] + assert [tc["index"] for tc in tcs] == [0, 1] + + # End-to-end: openai SDK accepts the chunk + from openai.types.chat import ChatCompletionChunk + + ChatCompletionChunk.model_validate(chunk) # must not raise + + +def test_completion_to_chunk_preserves_explicit_tool_call_index(): + """If the recorded tool_calls already have 'index', we don't overwrite it.""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "tool_calls": [ + {"index": 5, "id": "a", "type": "function", "function": {"name": "f", "arguments": "{}"}}, + ], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="m") + assert chunk["choices"][0]["delta"]["tool_calls"][0]["index"] == 5 + + +def test_completion_to_chunk_synthesizes_id_and_created_when_missing(): + chunk = completion_to_chunk_dict( + {"choices": [{"index": 0, "message": {"role": "assistant"}, "finish_reason": "stop"}]}, + model="any", + ) + assert chunk["id"].startswith("chatcmpl-") + assert isinstance(chunk["created"], int) and chunk["created"] > 0 + assert chunk["model"] == "any" + + +def test_completion_to_chunk_handles_empty_choices(): + chunk = completion_to_chunk_dict({"choices": []}, model="m") + assert chunk["choices"] == [] + + +# ---------- encode_sse_event ---------- + + +def test_encode_sse_event_appends_double_newline_terminator(): + out = encode_sse_event({"k": "v"}) + assert out.endswith(b"\n\n") + assert out.startswith(b"data: ") 
+ body = out[len(b"data: ") : -len(b"\n\n")] + assert json.loads(body) == {"k": "v"} + + +def test_encode_sse_event_preserves_unicode_without_escapes(): + out = encode_sse_event({"content": "你好"}) + # ensure_ascii=False is critical so Chinese stays readable in the wire format + assert "你好".encode() in out + + +def test_sse_done_constant(): + assert SSE_DONE == b"data: [DONE]\n\n" + + +# ---------- round-trip ---------- + + +def test_roundtrip_encode_then_parse(): + """encode → parse must round-trip a payload dict.""" + payloads = [{"i": 0, "text": "alpha"}, {"i": 1, "text": "beta 中文"}] + wire = b"".join(encode_sse_event(p) for p in payloads) + SSE_DONE + chunks, leftover = parse_sse_data_chunks(wire) + + assert chunks == payloads + assert leftover == b"" diff --git a/tests/unit/sdk/model/test_traj_recorder.py b/tests/unit/sdk/model/test_traj_recorder.py new file mode 100644 index 0000000000..3f06481639 --- /dev/null +++ b/tests/unit/sdk/model/test_traj_recorder.py @@ -0,0 +1,141 @@ +"""Tests for TrajectoryRecorder (explicit-call API, no longer a litellm CustomLogger).""" + +import json +from unittest.mock import MagicMock, patch + +import pytest + +from rock.sdk.model.server.traj import TrajectoryRecorder + + +@pytest.fixture +def mock_monitor(): + monitor = MagicMock() + with patch( + "rock.sdk.model.server.traj._get_or_create_metrics_monitor", + return_value=monitor, + ): + yield monitor + + +def _make_recorder(traj_file) -> TrajectoryRecorder: + return TrajectoryRecorder(traj_file=traj_file) + + +@pytest.mark.asyncio +async def test_recorder_appends_each_call_as_jsonl_line(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}, + response={"id": "a", "choices": []}, + status="success", + start_time=100.0, + end_time=100.5, + ) + await recorder.record( + request={"model": "gpt-4", "messages": [{"role": "user", 
"content": "again"}]}, + response={"id": "b", "choices": []}, + status="success", + start_time=101.0, + end_time=101.2, + ) + + lines = traj_file.read_text(encoding="utf-8").strip().split("\n") + assert len(lines) == 2 + assert json.loads(lines[0])["response"]["id"] == "a" + assert json.loads(lines[1])["response"]["id"] == "b" + + +@pytest.mark.asyncio +async def test_recorder_writes_request_and_response_verbatim(tmp_path, mock_monitor): + """Provider-specific fields (reasoning_content, citations, ...) survive untouched.""" + traj_file = tmp_path / "traj.jsonl" + recorder = _make_recorder(traj_file) + + request = {"model": "glm-5", "stream": True, "messages": [{"role": "user", "content": "你是谁"}]} + response = { + "id": "x", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "我是 GLM", "reasoning_content": "用户问..."}, + "finish_reason": "stop", + } + ], + } + await recorder.record(request=request, response=response, status="success", start_time=0.0, end_time=1.0) + + record = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert record["model"] == "glm-5" + assert record["stream"] is True + assert record["request"] == request + assert record["response"] == response + assert record["response_time"] == 1.0 + + +@pytest.mark.asyncio +async def test_recorder_emits_metrics_with_status_and_sandbox_id(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = _make_recorder(traj_file) + + with patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-xyz"}): + await recorder.record( + request={"model": "gpt-4"}, + response={"id": "x", "choices": []}, + status="success", + start_time=0.0, + end_time=0.5, + ) + + gauge_call = mock_monitor.record_gauge_by_name.call_args + assert gauge_call[0][0] == "model_service.request.rt" + assert gauge_call[0][1] == 500.0 # 0.5s -> 500 ms + assert gauge_call[1]["attributes"]["status"] == "success" + assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-xyz" + assert 
gauge_call[1]["attributes"]["type"] == "chat_completions" + + mock_monitor.record_counter_by_name.assert_called_once_with( + "model_service.request.count", 1, attributes=gauge_call[1]["attributes"] + ) + + +@pytest.mark.asyncio +async def test_recorder_records_failure_with_error_text(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4"}, + response=None, + status="failure", + start_time=0.0, + end_time=1.0, + error="upstream_status=429", + ) + + record = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert record["status"] == "failure" + assert record["error"] == "upstream_status=429" + assert record["response"] is None + + gauge_call = mock_monitor.record_gauge_by_name.call_args + assert gauge_call[1]["attributes"]["status"] == "failure" + + +@pytest.mark.asyncio +async def test_recorder_creates_parent_directory(tmp_path, mock_monitor): + traj_file = tmp_path / "deep" / "nested" / "traj.jsonl" + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4"}, + response={"id": "x", "choices": []}, + status="success", + start_time=0.0, + end_time=0.5, + ) + + assert traj_file.exists() + assert traj_file.parent.is_dir() diff --git a/tests/unit/sdk/model/test_traj_replayer.py b/tests/unit/sdk/model/test_traj_replayer.py new file mode 100644 index 0000000000..ffcc5c4011 --- /dev/null +++ b/tests/unit/sdk/model/test_traj_replayer.py @@ -0,0 +1,122 @@ +"""Tests for SequentialCursor (the replay cursor used by proxy.py). + +The proxy serves replay responses directly — there is no CustomLLM-based +``TrajectoryReplayer`` anymore. End-to-end replay coverage (cursor + SSE chunk +emit + cursor-exhausted → 404) lives in ``test_proxy.py``. 
+""" + +import json + +import pytest + +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryExhausted + + +def _record(*, msg: str, model: str = "gpt-3.5-turbo", call_id: str = "x") -> dict: + return { + "id": call_id, + "model": model, + "messages": [{"role": "user", "content": msg}], + "response": { + "id": call_id, + "object": "chat.completion", + "model": model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": f"reply: {msg}"}, + "finish_reason": "stop", + } + ], + "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}, + }, + } + + +def _write_jsonl(path, records): + with path.open("w", encoding="utf-8") as f: + for r in records: + f.write(json.dumps(r) + "\n") + + +def test_cursor_load_from_single_file(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a"), _record(msg="b")]) + + cur = SequentialCursor.load(p) + assert cur.total == 2 + assert cur.position == 0 + + +def test_cursor_load_skips_empty_lines(tmp_path): + p = tmp_path / "traj.jsonl" + p.write_text( + json.dumps(_record(msg="a")) + "\n\n \n" + json.dumps(_record(msg="b")) + "\n", + encoding="utf-8", + ) + + cur = SequentialCursor.load(p) + assert cur.total == 2 + + +def test_cursor_load_missing_file_raises(tmp_path): + with pytest.raises(FileNotFoundError): + SequentialCursor.load(tmp_path / "missing.jsonl") + + +def test_cursor_load_directory_raises(tmp_path): + """Path must be a single .jsonl file, not a directory.""" + with pytest.raises(FileNotFoundError): + SequentialCursor.load(tmp_path) + + +@pytest.mark.asyncio +async def test_cursor_next_returns_records_in_order(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a", call_id="1"), _record(msg="b", call_id="2")]) + + cur = SequentialCursor.load(p) + first = await cur.next() + second = await cur.next() + + assert first["id"] == "1" + assert second["id"] == "2" + assert cur.position == 2 + + +@pytest.mark.asyncio +async def 
test_cursor_next_raises_trajectory_exhausted_when_done(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="only")]) + + cur = SequentialCursor.load(p) + await cur.next() + + with pytest.raises(TrajectoryExhausted) as exc_info: + await cur.next() + assert exc_info.value.position == 1 + assert exc_info.value.total == 1 + + +@pytest.mark.asyncio +async def test_cursor_reset_replays_from_start(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a"), _record(msg="b")]) + + cur = SequentialCursor.load(p) + await cur.next() + await cur.next() + cur.reset() + + again = await cur.next() + assert again["messages"][0]["content"] == "a" + + +@pytest.mark.asyncio +async def test_cursor_model_mismatch_only_warns(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a", model="gpt-3.5-turbo")]) + + cur = SequentialCursor.load(p) + record = await cur.next(expected_model="gpt-4o") # different model -> warn but don't raise + assert record["id"] == "x" diff --git a/uv.lock b/uv.lock index e00a7f86b3..cfed10409c 100644 --- a/uv.lock +++ b/uv.lock @@ -1196,6 +1196,15 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/33/6b/e0547afaf41bf2c42e52430072fa5658766e3d65bd4b03a563d1b6336f57/distlib-0.4.0-py2.py3-none-any.whl", hash = "sha256:9659f7d87e46584a30b5780e43ac7a2143098441670ff0a49d5f9034c54a6c16" }, ] +[[package]] +name = "distro" +version = "1.9.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2" }, +] + [[package]] name = 
"docker" version = "7.1.0" @@ -1919,6 +1928,109 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12" }, ] +[[package]] +name = "jiter" +version = "0.14.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/6e/c1/0cddc6eb17d4c53a99840953f95dd3accdc5cfc7a337b0e9b26476276be9/jiter-0.14.0.tar.gz", hash = "sha256:e8a39e66dac7153cf3f964a12aad515afa8d74938ec5cc0018adcdae5367c79e" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/64/2e/a9959997739c403378d0a4a3a1c4ed80b60aeace216c4d37b303a9fc60a4/jiter-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:02f36a5c700f105ac04a6556fe664a59037a2c200db3b7e88784fac2ddf02531" }, + { url = "https://mirrors.aliyun.com/pypi/packages/27/72/b6de8a531e0adbadd839bec301165feb1fccf00e9ff55073ba2dd20f0043/jiter-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:41eab6c09ceffb6f0fe25e214b3068146edb1eda3649ca2aee2a061029c7ba2e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/db/d8/2040b9efa13c917f855c40890ae4119fe02c25b7c7677d5b4fa820a851fc/jiter-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5cf4d4c109641f9cfaf4a7b6aebd51654e405cd00fa9ebbf87163b8b97b325aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/49/62/655c0ad5ce6a8e90f9068c175b8a236877d753e460762b3183c136db1c5b/jiter-0.14.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b80c7b41a628e6be2213ad0ece763c5f88aa5ee003fa394d58acaaee1f4b8342" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f1/66/549c40fa068f08710b7570869c306a051eb67a29758bd64f4114f730554c/jiter-0.14.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fb3dbf7cc0d4dbe73cce307ebe7eefa7f73a7d3d854dd119ea0c243f03e40927" }, 
+ { url = "https://mirrors.aliyun.com/pypi/packages/25/2f/97a32a05fed14ed58a18e181fdfb619e05163f3726b54ee6080ec0539c09/jiter-0.14.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:7054adcdeb06b46efd17b5734f75817a44a2d06d3748e36c3a023a1bb52af9ec" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2a/3b/4347e1d6c2a973d653bbb7a2d671a2d2426e54b52ba735b8ff0d0a29b75c/jiter-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d597cd1bf6790376f3fffc7c708766e57301d99a19314824ea0ccc9c3c70e1e2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/24/ca452fbf2ea33548ed30ce68a39a50442d3f7c9bf0704a7af958a930c057/jiter-0.14.0-cp310-cp310-manylinux_2_31_riscv64.whl", hash = "sha256:df63a14878da754427926281626fd3ee249424a186e25a274e78176d42945264" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/a3/94470a0d199287caabeb4da2bb2ae5f6d17f3cf05dfc975d7cb064d58e0f/jiter-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4ea73187627bcc5810e085df715e8a99da8bdfd96a7eb36b4b4df700ba6d4c9c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cf/71/6768edc09d7c45c39f093feb3de105fa718a3e982b5208b8a2ed6382b44b/jiter-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:9f541eaf7bb8382367a1a23d6fc3d6aad57f8dd8c18c3c17f838bee20f217220" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3d/6b/5c2e17559a0f4e96e934479f7137df46c939e983fa05244e674815befb73/jiter-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:107465250de4fce00fdb47166bcd51df8e634e049541174fe3c71848e44f52ce" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b1/83/c25f3556a60fc74d11199100f1b6cc0c006b815c8494dea8ca16fe398732/jiter-0.14.0-cp310-cp310-win32.whl", hash = "sha256:ffb2a08a406465bb076b7cc1df41d833106d3cf7905076cc73f0cb90078c7d10" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2e/99/781a1b413f0989b7f2ea203b094b331685f1a35e52e0a45e5d000ecaab27/jiter-0.14.0-cp310-cp310-win_amd64.whl", hash = 
"sha256:cb8b682d10cb0cce7ff4c1af7244af7022c9b01ae16d46c357bdd0df13afb25d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/8a/1f/198ae537fccb7080a0ed655eb56abf64a92f79489dfbf79f40fa34225bcd/jiter-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:7e791e247b8044512e070bd1f3633dc08350d32776d2d6e7473309d0edf256a2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cf/34/da67cff3fce964a36d03c3e365fb0f8726ade2a6cfd4d3c70107e216ead6/jiter-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:71527ce13fd5a0c4e40ad37331f8c547177dbb2dd0a93e5278b6a5eecf748804" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ed/36/4c72e67180d4e71a4f5dcf7886d0840e83c49ab11788172177a77570326e/jiter-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:02c4a7ab56f746014874f2c525584c0daca1dec37f66fd707ecef3b7e5c2228c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/db/9b39e09ceafa9878235c0fc29e3e3f9b12a4c6a98ea3085b998cadf3accc/jiter-0.14.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:376e9dafff914253bb9d46cdc5f7965607fbe7feb0a491c34e35f92b2770702e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b0/96/0dcba1d7a82c1b720774b48ef239376addbaf30df24c34742ac4a57b67b2/jiter-0.14.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:23ad2a7a9da1935575c820428dd8d2490ce4d23189691ce33da1fc0a58e14e1c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f1/e3/f61b71543e746e6b8b805e7755814fc242715c16f1dba58e1cbccb8032c2/jiter-0.14.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:54b3ddf5786bc7732d293bba3411ac637ecfa200a39983166d1df86a59a43c9f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/5e/0ddeb7096aca099114abe36c4921016e8d251e6f35f5890240b31f1f60ae/jiter-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5c001d5a646c2a50dc055dd526dad5d5245969e8234d2b1131d0451e81f3a373" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/e9/d1/fe0c46cd7fda9cad8f1ff9ad217dc61f1e4280b21052ec6dfe88c1446ef2/jiter-0.14.0-cp311-cp311-manylinux_2_31_riscv64.whl", hash = "sha256:834bb5bdabca2e91592a03d373838a8d0a1b8bbde7077ae6913fd2fc51812d00" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ac/21/f5317f91729b501019184771c80d60abd89907009e7bfa6c7e348c5bdd44/jiter-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4e9178be60e229b1b2b0710f61b9e24d1f4f8556985a83ff4c4f95920eea7314" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e9/05/79d8f33fb2bf168db0df5c9cd16fe440a8ada57e929d3677b22712c2568f/jiter-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:a7e4ccff04ec03614e62c613e976a3a5860dc9714ce8266f44328bdc8b1cab2c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5c/00/d1e3ff3d2a465e67f08507d74bafb2dcd29eba91dc939820e39e8dea38b8/jiter-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:69539d936fb5d55caf6ecd33e2e884de083ff0ea28579780d56c4403094bb8d9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/5b/bbb2189f62ace8d95e869aa4c84c9946616f301e2d02895a6f20dcc3bba3/jiter-0.14.0-cp311-cp311-win32.whl", hash = "sha256:4927d09b3e572787cc5e0a5318601448e1ab9391bcef95677f5840c2d00eaa6d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b8/86/c500b53dcbf08575f5963e536ebd757a1f7c568272ba5d180b212c9a87fb/jiter-0.14.0-cp311-cp311-win_amd64.whl", hash = "sha256:42d6ed359ac49eb922fdd565f209c57340aa06d589c84c8413e42a0f9ae1b842" }, + { url = "https://mirrors.aliyun.com/pypi/packages/75/4a/a676249049d42cb29bef82233e4fe0524d414cbe3606c7a4b311193c2f77/jiter-0.14.0-cp311-cp311-win_arm64.whl", hash = "sha256:6dd689f5f4a5a33747b28686e051095beb214fe28cfda5e9fe58a295a788f593" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5a/68/7390a418f10897da93b158f2d5a8bd0bcd73a0f9ec3bb36917085bb759ef/jiter-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = 
"sha256:2fb2ce3a7bc331256dfb14cefc34832366bb28a9aca81deaf43bbf2a5659e607" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/a0/5854ac00ff63551c52c6c89534ec6aba4b93474e7924d64e860b1c94165b/jiter-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:5252a7ca23785cef5d02d4ece6077a1b556a410c591b379f82091c3001e14844" }, + { url = "https://mirrors.aliyun.com/pypi/packages/41/a1/4f44832650a16b18e8391f1bf1d6ca4909bc738351826bcc198bba4357f4/jiter-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c409578cbd77c338975670ada777add4efd53379667edf0aceea730cabede6fb" }, + { url = "https://mirrors.aliyun.com/pypi/packages/48/64/a329e9d469f86307203594b1707e11ae51c3348d03bfd514a5f997870012/jiter-0.14.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7ede4331a1899d604463369c730dbb961ffdc5312bc7f16c41c2896415b1304a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/94/c1/5e3dfc59635aa4d4c7bd20a820ac1d09b8ed851568356802cf1c08edb3cf/jiter-0.14.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:92cd8b6025981a041f5310430310b55b25ca593972c16407af8837d3d7d2ca01" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/1b/dd157009dbc058f7b00108f545ccb72a2d56461395c4fc7b9cfdccb00af4/jiter-0.14.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:351bf6eda4e3a7ceb876377840c702e9a3e4ecc4624dbfb2d6463c67ae52637d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/91/78/256013667b7c10b8834f8e6e54cd3e562d4c6e34227a1596addccc05e38c/jiter-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c1dcfbeb93d9ecd9ca128bbf8910120367777973fa193fb9a39c31237d8df165" }, + { url = "https://mirrors.aliyun.com/pypi/packages/de/d9/137d65ade9093a409fe80955ce60b12bb753722c986467aeda47faf450ad/jiter-0.14.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:ae039aaef8de3f8157ecc1fdd4d85043ac4f57538c245a0afaecb8321ec951c3" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/2e/48/76750835b87029342727c1a268bea8878ab988caf81ee4e7b880900eeb5a/jiter-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:7d9d51eb96c82a9652933bd769fe6de66877d6eb2b2440e281f2938c51b5643e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a6/60/456c4e81d5c8045279aefe60e9e483be08793828800a4e64add8fdde7f2a/jiter-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:d824ca4148b705970bf4e120924a212fdfca9859a73e42bd7889a63a4ea6bb98" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a8/9f/2020e0984c235f678dced38fe4eec3058cf528e6af36ebf969b410305941/jiter-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ff3a6465b3a0f54b1a430f45c3c0ba7d61ceb45cbc3e33f9e1a7f638d690baf3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/32/e2d298e1a22a4bbe6062136d1c7192db7dba003a6975e51d9a9eecabc4c2/jiter-0.14.0-cp312-cp312-win32.whl", hash = "sha256:5dec7c0a3e98d2a3f8a2e67382d0d7c3ac60c69103a4b271da889b4e8bb1e129" }, + { url = "https://mirrors.aliyun.com/pypi/packages/36/ac/96369141b3d8a4a8e4590e983085efe1c436f35c0cda940dd76d942e3e40/jiter-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:fc7e37b4b8bc7e80a63ad6cfa5fc11fab27dbfea4cc4ae644b1ab3f273dc348f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/c3/75d847f264647017d7e3052bbcc8b1e24b95fa139c320c5f5066fa7a0bdd/jiter-0.14.0-cp312-cp312-win_arm64.whl", hash = "sha256:ee4a72f12847ef29b072aee9ad5474041ab2924106bdca9fcf5d7d965853e057" }, + { url = "https://mirrors.aliyun.com/pypi/packages/97/2a/09f70020898507a89279659a1afe3364d57fc1b2c89949081975d135f6f5/jiter-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:af72f204cf4d44258e5b4c1745130ac45ddab0e71a06333b01de660ab4187a94" }, + { url = "https://mirrors.aliyun.com/pypi/packages/d6/be/080c96a45cd74f9fce5db4fd68510b88087fb37ffe2541ff73c12db92535/jiter-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = 
"sha256:4b77da71f6e819be5fbcec11a453fde5b1d0267ef6ed487e2a392fd8e14e4e3a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7d/5e/2d0fee155826a968a832cc32438de5e2a193292c8721ca70d0b53e58245b/jiter-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:77f4ea612fe8b84b8b04e51d0e78029ecf3466348e25973f953de6e6a59aa4c1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/70/af/bf9ee0d3a4f8dc0d679fc1337f874fe60cdbf841ebbb304b374e1c9aaceb/jiter-0.14.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:62fe2451f8fcc0240261e6a4df18ecbcd58327857e61e625b2393ea3b468aac9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0f/83/8e8561eadba31f4d3948a5b712fb0447ec71c3560b57a855449e7b8ddc98/jiter-0.14.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6112f26f5afc75bcb475787d29da3aa92f9d09c7858f632f4be6ffe607be82e9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f6/c9/c5299e826a5fe6108d172b344033f61c69b1bb979dd8d9ddd4278a160971/jiter-0.14.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:215a6cb8fb7dc702aa35d475cc00ddc7f970e5c0b1417fb4b4ac5d82fa2a29db" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5d/37/c16d9d15c0a471b8644b1abe3c82668092a707d9bedcf076f24ff2e380cd/jiter-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fc4ab96a30fb3cb2c7e0cd33f7616c8860da5f5674438988a54ac717caccdbaa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/58/ea/8050cb0dc654e728e1bfacbc0c640772f2181af5dedd13ae70145743a439/jiter-0.14.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:3a99c1387b1f2928f799a9de899193484d66206a50e98233b6b088a7f0c1edb2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b0/3b/cf71506d270e5f84d97326bf220e47aed9b95e9a4a060758fb07772170ab/jiter-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:ab18d11074485438695f8d34a1b6da61db9754248f96d51341956607a8f39985" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/b0/cc/8c6c74a3efb5bd671bfd14f51e8a73375464ca914b1551bc3b40e26ac2c9/jiter-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:801028dcfc26ac0895e4964cbc0fd62c73be9fd4a7d7b1aaf6e5790033a719b7" }, + { url = "https://mirrors.aliyun.com/pypi/packages/41/24/68d7b883ec959884ddf00d019b2e0e82ba81b167e1253684fa90519ce33c/jiter-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:ad425b087aafb4a1c7e1e98a279200743b9aaf30c3e0ba723aec93f061bd9bc8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b6/89/b1a0985223bbf3150ff9e8f46f98fc9360c1de94f48abe271bbe1b465682/jiter-0.14.0-cp313-cp313-win32.whl", hash = "sha256:882bcb9b334318e233950b8be366fe5f92c86b66a7e449e76975dfd6d776a01f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4c/19/3f339a5a7f14a11730e67f6be34f9d5105751d547b615ef593fa122a5ded/jiter-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:9b8c571a5dba09b98bd3462b5a53f27209a5cbbe85670391692ede71974e979f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/50/56/752dd89c84be0e022a8ea3720bcfa0a8431db79a962578544812ce061739/jiter-0.14.0-cp313-cp313-win_arm64.whl", hash = "sha256:34f19dcc35cb1abe7c369b3756babf8c7f04595c0807a848df8f26ef8298ef92" }, + { url = "https://mirrors.aliyun.com/pypi/packages/91/28/292916f354f25a1fe8cf2c918d1415c699a4a659ae00be0430e1c5d9ffea/jiter-0.14.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:e89bcd7d426a75bb4952c696b267075790d854a07aad4c9894551a82c5b574ab" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/c7/b002a7d8b8957ac3d469bd59c18ef4b1595a5216ae0de639a287b9816023/jiter-0.14.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7b25beaa0d4447ea8c7ae0c18c688905d34840d7d0b937f2f7bdd52162c98a40" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f9/3b/f8d07580d8706021d255a6356b8fab13ee4c869412995550ce6ed4ddf97d/jiter-0.14.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:651a8758dd413c51e3b7f6557cdc6921faf70b14106f45f969f091f5cda990ea" }, + { url = "https://mirrors.aliyun.com/pypi/packages/47/5b/ac1a974da29e35507230383110ffec59998b290a8732585d04e19a9eb5ba/jiter-0.14.0-cp313-cp313t-win_amd64.whl", hash = "sha256:e1a7eead856a5038a8d291f1447176ab0b525c77a279a058121b5fccee257f6f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/96/6d/9fc8433d667d2454271378a79747d8c76c10b51b482b454e6190e511f244/jiter-0.14.0-cp313-cp313t-win_arm64.whl", hash = "sha256:2e692633a12cda97e352fdcd1c4acc971b1c28707e1e33aeef782b0cbf051975" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4f/1e/354ed92461b165bd581f9ef5150971a572c873ec3b68a916d5aa91da3cc2/jiter-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:6f396837fc7577871ca8c12edaf239ed9ccef3bbe39904ae9b8b63ce0a48b140" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a6/95/8c7c7028aa8636ac21b7a55faef3e34215e6ed0cbf5ae58258427f621aa3/jiter-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a4d50ea3d8ba4176f79754333bd35f1bbcd28e91adc13eb9b7ca91bc52a6cef9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/47/40/e2a852a44c4a089f2681a16611b7ce113224a80fd8504c46d78491b47220/jiter-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce17f8a050447d1b4153bda4fb7d26e6a9e74eb4f4a41913f30934c5075bf615" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fc/1f/670f92adee1e9895eac41e8a4d623b6da68c4d46249d8b556b60b63f949e/jiter-0.14.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f4f1c4b125e1652aefbc2e2c1617b60a160ab789d180e3d423c41439e5f32850" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/2f/541c9ba567d05de1c4874a0f8f8c5e3fd78e2b874266623da9a775cf46e0/jiter-0.14.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be808176a6a3a14321d18c603f2d40741858a7c4fc982f83232842689fe86dd9" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/ce/a9/c31cbec09627e0d5de7aeaec7690dba03e090caa808fefd8133137cf45bc/jiter-0.14.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26679d58ba816f88c3849306dd58cb863a90a1cf352cdd4ef67e30ccf8a77994" }, + { url = "https://mirrors.aliyun.com/pypi/packages/50/02/3c05c1666c41904a2f607475a73e7a4763d1cbde2d18229c4f85b22dc253/jiter-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80381f5a19af8fa9aef743f080e34f6b25ebd89656475f8cf0470ec6157052aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7d/97/e15b33545c2b13518f560d695f974b9891b311641bdcf178d63177e8801e/jiter-0.14.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:004df5fdb8ecbd6d99f3227df18ba1a259254c4359736a2e6f036c944e02d7c5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/d2/8b1461def6b96ba44530df20d07ef7a1c7da22f3f9bf1727e2d611077bf1/jiter-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cff5708f7ed0fa098f2b53446c6fa74c48469118e5cd7497b4f1cd569ab06928" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/88/837566dd6ed6e452e8d3205355afd484ce44b2533edfa4ed73a298ea893e/jiter-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:2492e5f06c36a976d25c7cc347a60e26d5470178d44cde1b9b75e60b4e519f28" }, + { url = "https://mirrors.aliyun.com/pypi/packages/89/6b/b00b45c4d1b4c031777fe161d620b755b5b02cdade1e316dcb46e4471d63/jiter-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7609cfbe3a03d37bfdbf5052012d5a879e72b83168a363deae7b3a26564d57de" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/d8/6fe5b42011d19397433d345716eac16728ac241862a2aac9c91923c7509a/jiter-0.14.0-cp314-cp314-win32.whl", hash = "sha256:7282342d32e357543565286b6450378c3cd402eea333fc1ebe146f1fabb306fc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e5/43/5c2e08da1efad5e410f0eaaabeadd954812612c33fbbd8fd5328b489139d/jiter-0.14.0-cp314-cp314-win_amd64.whl", hash = 
"sha256:bd77945f38866a448e73b0b7637366afa814d4617790ecd88a18ca74377e6c02" }, + { url = "https://mirrors.aliyun.com/pypi/packages/aa/1f/6e39ac0b4cdfa23e606af5b245df5f9adaa76f35e0c5096790da430ca506/jiter-0.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:f2d4c61da0821ee42e0cdf5489da60a6d074306313a377c2b35af464955a3611" }, + { url = "https://mirrors.aliyun.com/pypi/packages/05/57/7dbc0ffbbb5176a27e3518716608aa464aee2e2887dc938f0b900a120449/jiter-0.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1bf7ff85517dd2f20a5750081d2b75083c1b269cf75afc7511bdf1f9548beb3b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/83/6e/7b3314398d8983f06b557aa21b670511ec72d3b79a68ee5e4d9bff972286/jiter-0.14.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c8ef8791c3e78d6c6b157c6d360fbb5c715bebb8113bc6a9303c5caff012754a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ae/4f/8dc674bcd7db6dba566de73c08c763c337058baff1dbeb34567045b27cdc/jiter-0.14.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e74663b8b10da1fe0f4e4703fd7980d24ad17174b6bb35d8498d6e3ebce2ae6a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3b/5f/188e09a1f20906f98bbdec44ed820e19f4e8eb8aff88b9d1a5a497587ff3/jiter-0.14.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1aca29ba52913f78362ec9c2da62f22cdc4c3083313403f90c15460979b84d9b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ac/f0/19046ef965ed8f349e8554775bb12ff4352f443fbe12b95d31f575891256/jiter-0.14.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8b39b7d87a952b79949af5fef44d2544e58c21a28da7f1bae3ef166455c61746" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c4/c3/da43bd8431ee175695777ee78cf0e93eacbb47393ff493f18c45231b427d/jiter-0.14.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:78d918a68b26e9fab068c2b5453577ef04943ab2807b9a6275df2a812599a310" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/72/26/e054771be889707c6161dbdec9c23d33a9ec70945395d70f07cfea1e9a6f/jiter-0.14.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:b08997c35aee1201c1a5361466a8fb9162d03ae7bf6568df70b6c859f1e654a4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c3/0f/7bea65ea2a6d91f2bf989ff11a18136644392bf2b0497a1fa50934c30a9c/jiter-0.14.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:260bf7ca20704d58d41f669e5e9fe7fe2fa72901a6b324e79056f5d52e9c9be2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3c/a1/b1ff7d70deef61ac0b7c6c2f12d2ace950cdeecb4fdc94500a0926802857/jiter-0.14.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:37826e3df29e60f30a382f9294348d0238ef127f4b5d7f5f8da78b5b9e050560" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0b/7b/3b0649983cbaf15eda26a414b5b1982e910c67bd6f7b1b490f3cfc76896a/jiter-0.14.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:645be49c46f2900937ba0eaf871ad5183c96858c0af74b6becc7f4e367e36e06" }, + { url = "https://mirrors.aliyun.com/pypi/packages/97/f8/33d78c83bd93ae0c0af05293a6660f88a1977caef39a6d72a84afab94ce0/jiter-0.14.0-cp314-cp314t-win32.whl", hash = "sha256:2f7877ed45118de283786178eceaf877110abacd04fde31efff3940ae9672674" }, + { url = "https://mirrors.aliyun.com/pypi/packages/d6/ac/2b760516c03e2227826d1f7025d89bf6bf6357a28fe75c2a2800873c50bf/jiter-0.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:14c0cb10337c49f5eafe8e7364daca5e29a020ea03580b8f8e6c597fed4e1588" }, + { url = "https://mirrors.aliyun.com/pypi/packages/dc/2e/a44c20c58aeed0355f2d326969a181696aeb551a25195f47563908a815be/jiter-0.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:5419d4aa2024961da9fe12a9cfe7484996735dca99e8e090b5c88595ef1951ff" }, + { url = "https://mirrors.aliyun.com/pypi/packages/32/a1/ef34ca2cab2962598591636a1804b93645821201cc0095d4a93a9a329c9d/jiter-0.14.0-graalpy311-graalpy242_311_native-macosx_10_12_x86_64.whl", hash = 
"sha256:a25ffa2dbbdf8721855612f6dca15c108224b12d0c4024d0ac3d7902132b4211" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/bb/520576a532a6b8a6f42747afed289c8448c879a34d7802fe2c832d4fd38f/jiter-0.14.0-graalpy311-graalpy242_311_native-macosx_11_0_arm64.whl", hash = "sha256:0ac9cbaa86c10996b92bd12c91659b60f939f8e28fcfa6bc11a0e90a774ce95b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b2/7c/c16db114ea1f2f532f198aa8dc39585026af45af362c69a0492f31bc4821/jiter-0.14.0-graalpy311-graalpy242_311_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:844e73b6c56b505e9e169234ea3bdea2ea43f769f847f47ac559ba1d2361ebea" }, + { url = "https://mirrors.aliyun.com/pypi/packages/99/8f/15e7741ff19e9bcd4d753f7ff22f988fd54592f134ca13701c13ea8c20e0/jiter-0.14.0-graalpy311-graalpy242_311_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e52c076f187405fc21523c746c04399c9af8ece566077ed147b2126f2bcba577" }, + { url = "https://mirrors.aliyun.com/pypi/packages/21/42/9042c3f3019de4adcb8c16591c325ec7255beea9fcd33a42a43f3b0b1000/jiter-0.14.0-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:fbd9e482663ca9d005d051330e4d2d8150bb208a209409c10f7e7dfdf7c49da9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/cf/a7e19b308bd86bb04776803b1f01a5f9a287a4c55205f4708827ee487fbf/jiter-0.14.0-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:33a20d838b91ef376b3a56896d5b04e725c7df5bc4864cc6569cf046a8d73b6d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ca/44/e26ede3f0caeff93f222559cb0cc4ca68579f07d009d7b6010c5b586f9b1/jiter-0.14.0-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:432c4db5255d86a259efde91e55cb4c8d18c0521d844c9e2e7efcce3899fb016" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/da/e9/1f9ada30cef7b05e74bb06f52127e7a724976c225f46adb65c37b1dadfb6/jiter-0.14.0-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:67f00d94b281174144d6532a04b66a12cb866cbdc47c3af3bfe2973677f9861a" }, +] + [[package]] name = "jmespath" version = "0.10.0" @@ -2653,6 +2765,25 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/be/9c/92789c596b8df838baa98fa71844d84283302f7604ed565dafe5a6b5041a/oauthlib-3.3.1-py3-none-any.whl", hash = "sha256:88119c938d2b8fb88561af5f6ee0eec8cc8d552b7bb1f712743136eb7523b7a1" }, ] +[[package]] +name = "openai" +version = "2.36.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = "anyio" }, + { name = "distro" }, + { name = "httpx" }, + { name = "jiter" }, + { name = "pydantic" }, + { name = "sniffio" }, + { name = "tqdm" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/f4/a1/4d5e84cf51720fc1526cc49e10ac1961abcccb55b0efb3d970db1e9a2728/openai-2.36.0.tar.gz", hash = "sha256:139dea0edd2f1b30c33d46ae1a6929e03906254140318e4608e98fe8c566f2e7" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/9d/1c/5d43735b2553baae2a5e899dcbcd0670a86930d993184d72ca909bf11c9b/openai-2.36.0-py3-none-any.whl", hash = "sha256:143f6194b548dbc2c921af1f1b03b9f14c85fed8a75b5b516f5bcc11a2a50c63" }, +] + [[package]] name = "opencensus" version = "0.11.4" @@ -4118,6 +4249,8 @@ builder = [ model-service = [ { name = "alibabacloud-cr20181201" }, { name = "fastapi" }, + { name = "httpx" }, + { name = "openai" }, { name = "psutil" }, { name = "swebench" }, { name = "uvicorn" }, @@ -4180,10 +4313,12 @@ requires-dist = [ { name = "gem-llm", marker = "extra == 'rocklet'", specifier = ">=0.1.0" }, { name = "gem-llm", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.0" }, { name = "httpx" }, + { name = "httpx", marker = "extra == 'model-service'" }, { name = 
"kubernetes", marker = "extra == 'admin'", specifier = ">=35.0.0" }, { name = "nacos-sdk-python", marker = "extra == 'admin'", specifier = ">=0.1.14" }, { name = "nacos-sdk-python", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.14" }, { name = "numpy", marker = "extra == 'rocklet'", specifier = "<=2.2.6" }, + { name = "openai", marker = "extra == 'model-service'", specifier = ">=1.50.0" }, { name = "opentelemetry-api" }, { name = "opentelemetry-exporter-otlp" }, { name = "opentelemetry-exporter-prometheus" },