From 55ec3142467344b07b074ea635dcc142019e201f Mon Sep 17 00:00:00 2001
From: "pengshixin.psx"
Date: Tue, 12 May 2026 02:52:26 +0000
Subject: [PATCH 01/25] refactor(model-service): rebuild proxy on litellm SDK with traj record/replay
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the hand-written httpx forward + retry_async in model-service `proxy` mode with calls through the litellm SDK, and add recording and sequential replay of chat/completions trajectories, so that deterministic agents such as SWE-agent / mini-swe-agent / OpenHands can be debugged at no LLM cost.

Main changes:
- proxy.py now calls litellm.acompletion (num_retries / extra_headers / streaming)
- Add TrajectoryRecorder(CustomLogger), which records StandardLoggingPayload to JSONL
- Add TrajectoryReplayer(CustomLLM) + SequentialCursor for sequential replay of a single jsonl file
- ModelServiceConfig gains num_retries / traj_enabled / traj_file / replay_traj_path
- CLI gains --num-retries / --traj-file (which also serves as the replay entry point)
- local mode keeps the old record_traj decorator and is unaffected
- Remove the old example YAML files; the README now recommends launching purely from the CLI
- docs/dev/litellm_proxy_refactor.md documents the design and the breaking changes

Co-Authored-By: Claude Sonnet 4.6
---
 docs/dev/litellm_proxy_refactor.md            | 535 ++++++++++++++++++
 examples/model_service/README.md              |  90 +++
 pyproject.toml                                |   1 +
 rock/sdk/model/server/api/proxy.py            | 203 ++++---
 rock/sdk/model/server/config.py               |  13 +
 .../sdk/model/server/integrations/__init__.py |   0
 .../server/integrations/traj_recorder.py      |  77 +++
 .../server/integrations/traj_replayer.py      | 139 +++++
 rock/sdk/model/server/main.py                 |  55 +-
 rock/sdk/model/server/utils.py                |  15 +-
 tests/unit/sdk/model/test_proxy.py            | 455 +++++++--------
 tests/unit/sdk/model/test_traj_recorder.py    | 170 ++++++
 tests/unit/sdk/model/test_traj_replayer.py    | 204 +++++++
 uv.lock                                       | 404 +++++++++++++
 14 files changed, 2025 insertions(+), 336 deletions(-)
 create mode 100644 docs/dev/litellm_proxy_refactor.md
 create mode 100644 examples/model_service/README.md
 create mode 100644 rock/sdk/model/server/integrations/__init__.py
 create mode 100644 rock/sdk/model/server/integrations/traj_recorder.py
 create mode
100644 rock/sdk/model/server/integrations/traj_replayer.py
 create mode 100644 tests/unit/sdk/model/test_traj_recorder.py
 create mode 100644 tests/unit/sdk/model/test_traj_replayer.py

diff --git a/docs/dev/litellm_proxy_refactor.md b/docs/dev/litellm_proxy_refactor.md
new file mode 100644
index 0000000000..be30dbb4b2
--- /dev/null
+++ b/docs/dev/litellm_proxy_refactor.md
@@ -0,0 +1,535 @@
+# Refactoring the model-service proxy onto LiteLLM + adding record/replay: handoff document
+
+> This document is for whoever takes over (possibly another Claude session, or a person). Its purpose is to let them continue from where I left off **without reading any of the previous conversation**. It lives at `docs/dev/litellm_proxy_refactor.md`.
+
+---
+
+## 0. TL;DR
+
+**Goal**: replace the hand-written httpx forward + retry behind `rock model-service --type proxy` with an implementation built on the `litellm` SDK, and make "record + sequential replay" of chat/completions trajectories a first-class capability, so that deterministic agents such as SWE-agent / mini-swe-agent / OpenHands can be debugged at no LLM cost.
+
+**Current status**: **code changes, unit tests, and lint are all done and passing**. The next steps are integration verification (actually start the proxy and hit it with curl) and writing the PR.
+
+**Done**:
+- ✅ `pyproject.toml`: the `model-service` extras gain `litellm>=1.50.0`
+- ✅ `ModelServiceConfig` gains 6 fields: `traj_enabled / traj_file / traj_append / replay_enabled / replay_traj_path / num_retries`
+- ✅ New modules `rock/sdk/model/server/integrations/{__init__.py, traj_recorder.py, traj_replayer.py}`
+- ✅ `rock/sdk/model/server/api/proxy.py` rewritten in full to call the litellm SDK
+- ✅ `rock/sdk/model/server/main.py` gains `_configure_litellm_for_proxy()` plus new CLI flags (`--num-retries / --traj-file / --no-traj / --replay-traj`)
+- ✅ `rock/sdk/model/server/utils.py` keeps the `record_traj` decorator (local mode still uses it); proxy mode no longer does
+- ✅ `tests/unit/sdk/model/test_proxy.py` reworked (`patch perform_llm_request` became `patch litellm.acompletion`)
+- ✅ New tests `tests/unit/sdk/model/test_traj_recorder.py` + `test_traj_replayer.py`
+- ✅ `examples/model_service/config_record.yaml` + `config_replay.yaml`
+- ✅ **All unit tests pass** (`uv run pytest tests/unit/sdk/model/` → 47 passed)
+- ✅ **Lint/format fully clean** (`ruff check` + `ruff format --check`; fixed one UP045, `Optional[str]` → `str | None`)
+
+**Not done / blocked**:
+- ⏳ **Integration verification** (actually start the proxy + curl + agent end to end, see section 4.4)
+-
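⏳ **Breaking-change notice in the PR description** (see section 5)
+
+**Original plan file** (more detailed design reasoning): `/home/xinshi/.claude/plans/litellm-chat-completions-traj-replay-ser-lucky-rainbow.md` (in the main Claude config directory, not inside the rock repo).
+
+---
+
+## 1. Background and goals
+
+### Origin
+
+The user asked: "Can litellm persist chat/completions trajectories to disk? I'd also like to see whether we can build a replay server from a traj file, for example so that other agents (swe-agent, openhands) can replay a traj."
+
+### How the requirements evolved (so the next person does not retrace dead ends)
+
+1. **First direction**: build a standalone Python project `litellm-traj` defining a `CustomLogger` subclass (record) and a `CustomLLM` subclass (replay), registered in the litellm proxy's `config.yaml` by dotted path. **Abandoned**.
+2. **Second direction**: build the capability inside the rock repo, in `rock/sdk/model/server/api/proxy.py` (rock already has model-service). The user then asked for more: **replace rock's hand-written proxy implementation with one based on litellm**.
+3. **Final direction (this change)**: use the **litellm SDK** to replace the hand-written httpx forward + `retry_async` in `proxy.py`; record plugs into `CustomLogger`, replay plugs into a `CustomLLM` provider. The `rock model-service` CLI, `local` mode, and the FastAPI app/health/metrics are all left untouched; only proxy mode changes.
+
+### Why the litellm SDK rather than the litellm proxy
+
+We already have rock's own FastAPI app + CLI + auth/metrics middleware, and only need "an OpenAI-compatible upstream call + error normalization + stream aggregation + a hook point for record/replay". **The litellm SDK is the smallest addition that provides this layer**, without dragging in the litellm proxy's whole lifecycle and configuration system. The litellm proxy suits people who have no server at all; we already have one.
+
+### The 4 key design choices the user signed off on
+
+| Dimension | Choice | Rationale |
+|---|---|---|
+| Integration mode | **litellm SDK** | Smallest change surface; keeps rock's existing FastAPI/CLI/metrics |
+| traj schema | **`StandardLoggingPayload` (litellm native)** | Most complete field set (messages/response/usage/timing/error_information/trace_id); interoperates with the litellm ecosystem |
+| Ship replay in this iteration? | **Yes, record + replay together** | Replay was the user's original request; lay the infrastructure once |
+| Streaming | **Enabled along the way** | litellm aggregates automatically, so streaming adds no complexity to record/replay |
+
+---
+
+## 2.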
Change list (by file)
+
+### 2.1 `pyproject.toml` (modified)
+
+Append one entry, `"litellm>=1.50.0"`, to the `model-service` array under `[project.optional-dependencies]`. Other extras are unchanged.
+
+```toml
+model-service = [
+    "fastapi",
+    "uvicorn",
+    "psutil",
+    "swebench",
+    "alibabacloud_cr20181201==2.0.5",
+    "litellm>=1.50.0",  # ← newly added line
+]
+```
+
+Why `>=1.50.0`: from this version on, `StandardLoggingPayload`, the `CustomLogger.async_log_success_event` interface, and `async_mock_completion_streaming_obj` are all stable. The repo's existing model-service test environment never had litellm installed, so this is a fresh dependency with no upgrade conflict.
+
+### 2.2 `rock/sdk/model/server/config.py` (modified)
+
+Add 6 fields at the end of `ModelServiceConfig` (mind the order, types, and defaults):
+
+```python
+num_retries: int = Field(default=6)
+
+traj_enabled: bool = Field(default=True)
+traj_file: str | None = Field(default=None)
+traj_append: bool = Field(default=True)  # note: the old default was False (overwrite); flipped to True here
+
+replay_enabled: bool = Field(default=False)
+replay_traj_path: str | None = Field(default=None)
+```
+
+The semantics and value range of each field are written in its docstring. `traj_append=True` is the **default-behavior change** of this refactor (the old `_write_traj` overwrote by default, which is considered a bug). The module-level constants `TRAJ_FILE`, `LOG_FILE`, and `LOG_DIR` are left as they are.
+
+### 2.3 `rock/sdk/model/server/integrations/__init__.py` (new, empty file)
+
+Exists only to make `integrations` a package; it has no content.
+
+### 2.4 `rock/sdk/model/server/integrations/traj_recorder.py` (new)
+
+`TrajectoryRecorder(CustomLogger)` implements two hooks: `async_log_success_event` and `async_log_failure_event`. On each call it takes the `StandardLoggingPayload` (as a dict) from `kwargs["standard_logging_object"]`, appends one JSON line to `traj_file`, and reports the OTLP `model_service.request.{rt,count}` metrics.
+
+Key design points (expanded in section 3.1):
+- No streaming branch (litellm has already aggregated the chunks into `payload.response` before the callback fires)
+- One `asyncio.Lock` per recorder, with the synchronous write wrapped in `asyncio.to_thread`, so the event loop is not blocked
+- In `append=False` mode the file is truncated only on the **first write** (rather than overwritten on every call)
+- Metrics reuse `rock.sdk.model.server.utils._get_or_create_metrics_monitor` and the `MODEL_SERVICE_REQUEST_RT/COUNT` constants
+
+### 2.5 `rock/sdk/model/server/integrations/traj_replayer.py` (new)
+
+Contains two classes + two helpers:
+
+- `SequentialCursor`: loads records from a jsonl file or a directory; `async next()` returns the next record and advances the cursor, and once it runs past the end it raises
`CustomLLMError(404)`. It holds an `asyncio.Lock` to guard against concurrent advancement. `reset()` returns to the start.
+- `_record_to_model_response(record)` / `_extract_assistant_text(record)`: turn a record back into a `litellm.types.utils.ModelResponse`, or extract the assistant text (used for streaming).
+- `TrajectoryReplayer(CustomLLM)`: implements `acompletion` and `astreaming`. Splitting into stream chunks is delegated to `litellm.utils.async_mock_completion_streaming_obj` instead of reinventing it.
+
+The signature of `acompletion`/`astreaming` is `(self, model, messages, *args, **kwargs)`. litellm calls a CustomLLM **entirely with keyword arguments** (verified against litellm/main.py:4302-4319), so `kwargs.get("model_response")` reliably yields the target object that stream splitting needs.
+
+### 2.6 `rock/sdk/model/server/utils.py` (modified: kept, comments updated)
+
+**Key decision**: `record_traj` / `_write_traj` are not deleted. Reason: `local.py` still uses `@record_traj`, and the plan stated that local mode is left alone. So record_traj stays, and its docstring gains a note saying that proxy no longer uses it, that it exists only for local, and that new code should go through `TrajectoryRecorder`.
+
+`_get_or_create_metrics_monitor` / `MODEL_SERVICE_REQUEST_RT` / `MODEL_SERVICE_REQUEST_COUNT` are unchanged; `traj_recorder.py` reuses them.
+
+### 2.7 `rock/sdk/model/server/api/proxy.py` (whole file rewritten)
+
+Old implementation:
+- A global `httpx.AsyncClient` + `@retry_async` with 6 attempts and exponential backoff
+- `perform_llm_request(url, body, headers, config)` managed retries itself
+- `@record_traj` on the handler wrote the traj and metrics synchronously
+- `stream=False` was enforced (an MVP limitation)
+
+New implementation:
+- `litellm.acompletion(model, api_base, extra_headers, timeout, num_retries, **body)`
+- Error normalization: catch `RateLimitError / APIError / BadRequestError / AuthenticationError / Timeout` → `_format_error_response()` renders the `{error:{message,type,code}}` schema (compatible with the agent-side keyword detection)
+- Streaming is open: `stream=True` goes through `StreamingResponse(_sse_iter(...))`
+- No decorator any more; recording is done by the `litellm.callbacks` entry that `main.py` registers at startup
+
+The routing priority of `get_base_url()` is **fully preserved** (`proxy_base_url` > `proxy_rules[model]` > `proxy_rules["default"]`). `_filter_headers()` drops the hop-by-hop headers (host/content-length/content-type/transfer-encoding/connection) and keeps Authorization and the rest.
+
+In replay mode: `litellm_model = f"traj-replay/{model_name}"` and `api_base=None`. When litellm sees the `traj-replay/` prefix it looks up `litellm.custom_provider_map`, finds the `TrajectoryReplayer` instance, and calls its `acompletion`/`astreaming`.
+
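As an illustration of the routing priority, header filtering, and model-prefix rules described in 2.7, here is a minimal standalone sketch. It is not the code in `proxy.py`: `pick_base_url`, `filter_headers`, and `litellm_model_name` are hypothetical names, the config object is reduced to plain arguments, and a `ValueError` stands in for the HTTP error the real handler returns.

```python
# Illustrative sketch only; the names below are hypothetical, not the proxy.py API.

# Headers that section 2.7 says are not forwarded upstream.
HEADERS_NOT_TO_FORWARD = frozenset(
    {"host", "content-length", "content-type", "transfer-encoding", "connection"}
)


def pick_base_url(model_name, proxy_base_url=None, proxy_rules=None):
    """Resolve the upstream: proxy_base_url > proxy_rules[model] > proxy_rules["default"]."""
    if proxy_base_url:
        return proxy_base_url.rstrip("/")
    rules = proxy_rules or {}
    base_url = rules.get(model_name) or rules.get("default")
    if not base_url:
        # Assumption: the real handler reports this as an HTTP error instead.
        raise ValueError(f"no route configured for model {model_name!r}")
    return base_url.rstrip("/")


def filter_headers(headers):
    """Drop the blacklisted headers (case-insensitive); pass everything else through."""
    return {k: v for k, v in headers.items() if k.lower() not in HEADERS_NOT_TO_FORWARD}


def litellm_model_name(model_name, replay_enabled):
    """Prefix the model with the provider that litellm should route to."""
    provider = "traj-replay" if replay_enabled else "openai"
    return f"{provider}/{model_name}"
```

The point of the sketch is only the ordering of the checks: a configured `proxy_base_url` short-circuits the per-model rules, and the provider prefix alone decides whether a request goes upstream or to the replay handler.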
+### 2.8 `rock/sdk/model/server/main.py` (modified)
+
+Adds a private function `_configure_litellm_for_proxy(config)`, called once when `main()` enters the proxy branch (before `include_router(proxy_router)`). It has two branches:
+
+```python
+if config.replay_enabled:
+    # register TrajectoryReplayer in litellm.custom_provider_map
+    ...
+elif config.traj_enabled:
+    # add TrajectoryRecorder to litellm.callbacks
+    ...
+```
+
+**Note**: replay and record are mutually exclusive (do not record while replaying, or the recorded replay output would pollute the source of truth).
+
+`create_config_from_args()` gains 4 CLI overrides: `--num-retries / --traj-file / --no-traj / --replay-traj`. All of them are read via `getattr(args, "", default)`, so older callers (passing a Namespace without these fields) do not blow up.
+
+`from rock.sdk.model.server.config import TRAJ_FILE, ModelServiceConfig`: the `TRAJ_FILE` import is new, because `_configure_litellm_for_proxy` falls back to `TRAJ_FILE` when `traj_file` is not set.
+
+### 2.9 `tests/unit/sdk/model/test_proxy.py` (rewritten)
+
+- Removed: `test_perform_llm_request_*` (4 tests; perform_llm_request no longer exists)
+- Reworked: `test_chat_completions_routing_*`, `test_proxy_base_url_overrides_proxy_rules`: `patch_path` changed from `proxy.perform_llm_request` to `proxy.litellm.acompletion`
+- Reworked: assertions changed from "the first positional argument of perform_llm_request == URL" to "in the litellm.acompletion kwargs, `api_base == expected value` and `model == 'openai/'`"
+- Added: `test_chat_completions_passes_num_retries_and_timeout` / `test_chat_completions_litellm_error_returns_proxy_schema` / `test_chat_completions_replay_mode_uses_traj_replay_provider` / `test_chat_completions_strips_hop_by_hop_headers` / `test_config_default_traj_and_replay` / `test_config_loads_traj_and_replay_from_file` / `test_cli_replay_traj_enables_replay`
+- Kept: all lifespan / config-load / metrics-monitor / record_traj tests (record_traj still lives in utils.py, for local)
+
+The mocked ModelResponse: `SimpleNamespace(model_dump=lambda: payload)` stands in for a pydantic object, because the handler only calls `.model_dump()` and there is no need to import the real ModelResponse.
+
+### 2.10 `tests/unit/sdk/model/test_traj_recorder.py` (new)
+
+7 tests: JSONL append / first-write truncation with `append=False` / metrics + sandbox_id / failure persisted / skip when standard_logging_object is missing / parent directory auto-created / when `response_time` is missing, fall back to
`endTime - startTime`.
+
+Mocking approach: `patch("rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", return_value=mock_monitor)`. The recorder imports this function internally, so the patch targets that reference.
+
+### 2.11 `tests/unit/sdk/model/test_traj_replayer.py` (new)
+
+11 tests: cursor loads a single file / a directory (sorted by file name) / blank lines / raises on a missing file / `next()` returns records in order / raises past the end / `reset()` returns to the start / model mismatch only warns / Replayer.acompletion hits the record / cursor advances / streaming chunks concatenate back to the original text / raises CustomLLMError past the end.
+
+The streaming test builds a `SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(role=None, content=None), index=0)])` as model_response, because `async_mock_completion_streaming_obj` internally writes `model_response.choices[0].delta.content = ...`.
+
+### 2.12 `examples/model_service/config_record.yaml` and `config_replay.yaml` (new)
+
+Two ready-to-use yaml files with detailed comments. `config_record.yaml` turns on `traj_enabled: true / traj_append: true` and turns replay off. `config_replay.yaml` turns traj_enabled off and replay on, with `replay_traj_path: "/data/logs/LLMTraj.jsonl"` as a placeholder; change it to the actual traj location when deploying.
+
+### 2.13 `/mnt/xinshi/github/litellm-traj/` (deleted)
+
+The skeleton of the first-direction standalone project (`pyproject.toml / src/litellm_traj/cursor.py / .gitignore / LICENSE`) was removed with `rm -rf` when the direction changed. Everything useful was moved into rock's integrations/ module.
+
+---
+
+## 3.
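Key code details (pitfalls + "why it is written this way")
+
+The following expands on the design choices most likely to confuse the next person. Each item points at the relevant litellm source location (the litellm repo is at `/mnt/xinshi/github/litellm/`) for cross-checking.
+
+### 3.1 Stream aggregation happens inside litellm; the Recorder needs no branch
+
+The `StandardLoggingPayload.response` field is **already a fully aggregated OpenAI-shaped dict** before `success_handler` fires. Streaming and non-streaming follow the same path: when a stream ends, litellm calls `stream_chunk_builder` to assemble `complete_streaming_response` (litellm repo, `litellm/litellm_core_utils/litellm_logging.py:1930-1955`) and writes it into `standard_logging_object.response`.
+
+Practical consequence: the payload that `TrajectoryRecorder.async_log_success_event` receives always contains the complete response, so I **do not need to write `async_log_stream_event`**. This is also why enabling streaming is almost free: the recording side needs no extra code.
+
+### 3.2 What the `model: "openai/"` prefix means
+
+litellm routes on the "provider" prefix. `openai/gpt-3.5-turbo` means "the upstream speaks the OpenAI-compatible protocol, and the model is named gpt-3.5-turbo". Combined with `api_base="https://api.modelscope.cn/v1"`, a third-party OpenAI-compatible endpoint works as well, which is exactly the ModelScope/OpenAI scenario in rock's existing `proxy_rules`.
+
+`traj-replay/` is the custom provider we register. When litellm sees this prefix it looks up `litellm.custom_provider_map`, matches the entry with `provider == "traj-replay"`, and calls `custom_handler.acompletion`/`astreaming` as the upstream (litellm repo, `litellm/main.py:4280-4326`).
+
+### 3.3 Error normalization: why those 5 exceptions are caught
+
+`proxy.py` catches, in order: `RateLimitError, APIError, BadRequestError, AuthenticationError, Timeout`. In `litellm/exceptions.py` all five inherit from subclasses of `openai.OpenAIError`, and **all carry a `.status_code` attribute**. `_format_error_response` uses `getattr(exc, "status_code", None) or 502` to extract the real upstream status code; the message comes from `str(exc)`. The `__str__` of litellm exceptions already includes the original upstream error message, so agent-side keyword detection (such as `"context length exceeded"` / `"content violation"`) keeps working.
+
+The `type` field uses `type(exc).__name__` (`"BadRequestError"` and so on) instead of the old fixed `"proxy_retry_failed"`. This is a semantic change in the schema: for the same `error.type` field, the old version returned a fixed string and the new version returns the exception class name. Any downstream consumer that branches on `error.type` needs to adapt.
+
+The catch-all `except Exception` raises `HTTPException(500)`, which is picked up by `global_exception_handler` in `main.py` and returned as `{error:{message,type:"internal_error",code:"internal_error"}}`. This path is identical to the pre-refactor behavior.
+
+### 3.4 Retry behavior: from `retry_async` to `litellm.num_retries`
+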
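+Old implementation: `@retry_async(max_attempts=6, delay_seconds=2.0, backoff=2.0, jitter=True, exceptions=(TimeoutException, ConnectError, HTTPStatusError))`. It raised only when `status_code in retryable_status_codes`, so a 401 did not trigger a retry while a 429/500 did.
+
+New implementation: `config.num_retries` (default 6) is passed straight to `litellm.acompletion(num_retries=...)`. litellm retries `RateLimitError / APIError / Timeout / ServiceUnavailableError` automatically and **does not expose a `retryable_status_codes` dimension**. I kept the `retryable_status_codes` field in the config, but the **handler currently does not use it** (this keeps old yaml files loadable instead of rejecting them for an unknown field).
+
+If someone later complains that the custom retry-code list stopped working, this is a known semantic difference. Fallback plan: hand-write a `for attempt in range(config.num_retries):` loop in the handler and apply a whitelist on the status code. Not done in this iteration, because litellm's default behavior already covers the most common 429/500 cases.
+
+### 3.5 `_filter_headers`: blacklist vs whitelist
+
+I use a blacklist: `host / content-length / content-type / transfer-encoding / connection` are not forwarded, and everything else is passed through to litellm's `extra_headers`. This matches the old implementation (which also dropped the first 4; connection was added to be more standard). Authorization, X-* and the like pass automatically.
+
+Note: litellm merges `extra_headers` into the upstream HTTP request (made by litellm's own OpenAI client), and they do not override the `Authorization: Bearer ` header that litellm generates itself. If rock does not set `OPENAI_API_KEY` and the client sends an Authorization header, litellm uses the client's; otherwise litellm uses the environment variable. All of this logic lives inside litellm.
+
+### 3.6 The "truncate on first write" behavior of `traj_append=False`
+
+With `append=False`, the old `_write_traj` opened the file with **`mode="w"` on every call**, so the jsonl only ever held the last line. That was a bug.
+
+The fix in the new `TrajectoryRecorder`: it keeps a `self._truncated` instance flag. With `append=False`, the **first write** uses `mode="w"` (overwriting the traj left by a previous process) and **later writes** use `mode="a"` (appending within this process). So:
+- At process start: the old traj file is emptied (if it exists)
+- While the process runs: each call appends one line
+- On process restart: the file is emptied again and recording starts over
+
+The net effect is "one complete traj per run". I spelled this out in the docstring, because it is the point that differs most from the old default behavior.
+
+`traj_append=True` (the new default) is pure append-only and leaves any old file alone.
+
+### 3.7 Concurrency model of SequentialCursor
+
+`async next()` protects the index and its increment with an `asyncio.Lock`. **Within a single process handling concurrent requests**, cursor advancement is atomic, but **the meaning is "consume in arrival order"**, so several agents hitting the server concurrently get interleaved into one pseudo-sequence. This is a known v1 constraint (listed explicitly in the plan); the convention is "one agent, replayed serially".
+
+**A model mismatch only warns and does not raise**: expected_model comes from the caller, and the recorded model comes from the `model` field inside the record. When they differ, only a warning is logged and the record is still returned. Rationale: the agent may have switched base_url without changing the model name (a common debugging scenario), and that should not be a hard block.
+
+###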
3.8 CustomLLM calling convention: the trailing `*args, **kwargs` matters
+
+As verified at `litellm/main.py:4302-4319`, the call is made **entirely with keyword arguments**:
+```python
+response = handler_fn(
+    model=model, messages=messages, headers=headers,
+    model_response=model_response, print_verbose=...,
+    api_key=..., api_base=..., acompletion=..., logging_obj=...,
+    optional_params=..., litellm_params=..., logger_fn=...,
+    timeout=..., custom_prompt_dict=..., client=..., encoding=...,
+)
+```
+It is not certain, however, whether litellm minor versions will add or remove fields. A signature like `TrajectoryReplayer.acompletion(self, model, messages, *args, **kwargs)`, with explicit model + messages and everything else swallowed, can still carry PEP 484 annotations and is immune to litellm adding fields later.
+
+**Do not change it to `def acompletion(self, model, messages, *, optional_params, ...)`**, or a new litellm field will cause a TypeError.
+
+### 3.9 `LITELLM_TRAJ_FILE` env vs the `traj_file` field
+
+I did not introduce a new env var. In `main.py:_configure_litellm_for_proxy`, the value is taken as `config.traj_file or TRAJ_FILE`, where `TRAJ_FILE` comes from `config.py:13` and equals `LOG_DIR + "/LLMTraj.jsonl"`, with `LOG_DIR = env_vars.ROCK_MODEL_SERVICE_DATA_DIR` (default `/data/logs`).
+
+So the path priority is: `--traj-file` on the CLI > `traj_file:` in yaml > `LOG_DIR/LLMTraj.jsonl` (LOG_DIR is controlled by the `ROCK_MODEL_SERVICE_DATA_DIR` env var). This matches the old scheme.
+
+### 3.10 Why the `record_traj` decorator is kept
+
+`local.py:75` still decorates its chat_completions handler with `@record_traj`. Local mode does not call litellm; FileHandler talks to Roll directly through file markers, so there is no window in which a litellm callback could fire. To keep "call count + RT reporting" for local mode, I left `record_traj` in `utils.py` for local to keep using, and the docstring states that proxy mode no longer uses it and goes through TrajectoryRecorder instead.
+
+Cost: the traj recorded in local mode uses the old `{request, response}` schema, while proxy mode uses `StandardLoggingPayload`. Both schemas share the same `LLMTraj.jsonl` file path (because `TRAJ_FILE` is the same constant). **In a real deployment the chance of local and proxy running in the same process is 0** (`--type` is mutually exclusive), so a single traj file will not mix the two schemas. But if someone periodically switches `--type` with `traj_append=true` and never rotates the file, the schemas will mix. Recommendation: **for replay, read only traj files recorded in proxy mode** (StandardLoggingPayload format); traj files from local mode are for local debugging only.
+
+---
+
+## 4.
Running tests / verification steps (the next person continues from here)
+
+### 4.1 Prepare the Python environment
+
+**Verified**: after `uv sync`, litellm installs correctly. Run everything through `uv run`; there is no need to activate the venv by hand.
+
+```bash
+cd /mnt/xinshi/github/Self-ROCK
+uv sync --extra model-service --group test
+```
+
+Verify the dependencies (already passing):
+
+```bash
+uv run python -c "from litellm.integrations.custom_logger import CustomLogger; print('ok')"
+uv run python -c "from litellm.llms.custom_llm import CustomLLM, CustomLLMError; print('ok')"
+uv run python -c "from litellm.utils import async_mock_completion_streaming_obj; print('ok')"
+```
+
+### 4.2 Static checks / lint
+
+```bash
+uv run ruff check rock/sdk/model/server/ tests/unit/sdk/model/
+uv run ruff format --check rock/sdk/model/server/ tests/unit/sdk/model/
+```
+
+If ruff format reports a diff, fix it directly with `uv run ruff format rock/sdk/model/server/ tests/unit/sdk/model/`. I did not run ruff while writing the code, so small issues such as line length or import order may show up.
+
+### 4.3 Unit tests (all passing)
+
+```bash
+uv run pytest tests/unit/sdk/model/ -v
+# → 47 passed in ~4s
+```
+
+**Test sets verified as passing**:
+- `test_proxy.py` (27 tests): routing/error/replay/header/cli/config/metrics
+- `test_traj_recorder.py` (7 tests): JSONL append/truncate/metrics/failure/missing payload/mkdir/rt fallback
+- `test_traj_replayer.py` (11 tests): cursor loading/order/past-the-end/reset/model mismatch/acompletion/streaming/exhaustion
+- `test_model_client.py` (2 tests): existing tests, kept and passing
+
+**Known edge cases that do not affect the tests** (watch for them in production):
+- With tool_calls, `_extract_assistant_text` returns `""`, so streaming replay returns an empty stream (known limitation, out of scope for this iteration)
+- `litellm.callbacks` is a global list; test isolation relies on patch, and production starts the server only once, so this is not a problem
+
+### 4.4 Integration verification (after the tests pass)
+
+#### Record mode
+
+```bash
+# terminal 1
+export OPENAI_API_KEY="sk-..."
+export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj
+mkdir -p /tmp/rock-traj
+uv run python -m rock.sdk.model.server.main \
+  --type proxy \
+  --config-file examples/model_service/config_record.yaml \
+  --port 8080
+
+# terminal 2
+curl -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"say hi"}]}'
+
+# verify the traj
+cat /tmp/rock-traj/LLMTraj.jsonl | jq '.id, .model, .response.choices[0].message.content'
+# expect chatcmpl-xxx / gpt-3.5-turbo / "..."
+```
+
+#### Replay mode
+
+```bash
+# terminal 1
+uv run python -m rock.sdk.model.server.main \
+  --type proxy \
+  --replay-traj /tmp/rock-traj/LLMTraj.jsonl \
+  --port 8081
+
+# terminal 2 - the same curl, against 8081
+curl -X POST http://127.0.0.1:8081/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"anything (replay ignores msgs)"}]}'
+
+# expect the same response.choices[0].message.content as at recording time
+# a second curl returns 404 (traj exhausted), which shows the cursor is working
+```
+
+#### Streaming verification
+
+```bash
+curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
+# expect SSE chunks: data: {...}\n\n ... data: [DONE]\n\n
+# in the traj file, that line has .stream == true and .response is the complete aggregated dict
+```
+
+#### Agent end to end (final verification)
+
+Run `mini-swe-agent` on one SWE-bench instance with base_url pointing at 8080 (record); afterwards run the same instance against 8081 (replay) and expect the patch the agent finally produces to match the recorded run. This is the strongest check, but it is cumbersome to run and can wait until the PR review stage.
+
+---
+
+## 5.
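Breaking Changes (must be spelled out in the PR description)
+
+### 5.1 The traj file schema changes
+
+Each line of `LLMTraj.jsonl` changes from `{"request": {...}, "response": {...}}` to a `StandardLoggingPayload` (dozens of fields: `id/trace_id/model/messages/response/model_parameters/usage/startTime/endTime/status/...`).
+
+Downstream consumers that depend on the old two-field schema (scripts, UI, statistics) will break. This iteration provides neither dual-format output nor an old-to-new conversion tool; if needed, a separate `scripts/convert_traj.py` can be written.
+
+### 5.2 The default of `traj_append` is flipped
+
+The old `ROCK_MODEL_SERVICE_TRAJ_APPEND_MODE` defaulted to `"false"` → `_write_traj` used `mode="w"`, which in practice meant "overwrite on every call, only the last record survives". The new `ModelServiceConfig.traj_append` defaults to `True` (append-only).
+
+Anyone who **previously relied on the overwrite to obtain "the most recent call"** (rare but possible) must set `traj_append: false` explicitly in yaml.
+
+### 5.3 The meaning of the `error.type` field changes
+
+Old values: the fixed string `"proxy_retry_failed"` (retries exhausted) or `"internal_error"` (anything else).
+New values: the litellm exception class name, such as `"BadRequestError" / "RateLimitError" / "Timeout" / "AuthenticationError" / "APIError"`.
+
+`error.message` still starts with `"LLM backend error: ..."`, so keyword detection remains compatible.
+
+### 5.4 The `retryable_status_codes` field no longer takes effect
+
+The old version used the `retryable_status_codes` whitelist to decide which status codes trigger a retry (for example, no retry on 401, retry on 429/500). In the new version litellm decides internally (it retries `RateLimitError / APIError / Timeout / ServiceUnavailableError` automatically and generally does not retry 4xx).
+
+The field can stay in yaml without causing an error, but the handler does not read it. If the whitelist needs to be restored later, see the "fallback plan" in section 3.4.
+
+### 5.5 `stream=true` is no longer rejected
+
+The old version answered `stream=true` with 400 + `"Streaming requests (stream=True) are not supported"`. The new version handles it normally and returns SSE.
+
+A client that previously **relied on** the 400 to probe whether streaming is enabled will break. That usage is unusual and is unlikely to exist.
+
+### 5.6 The `perform_llm_request` function is deleted
+
+Downstream code should not import it; it was always an internal helper of proxy.py. Any test or script that imports it directly needs to adapt. I have already updated `tests/unit/sdk/model/test_proxy.py`.
+
+### 5.7 New dependency
+
+`pip install rl-rock[model-service]` now also installs litellm (and its dependency chain: `openai>=1.x / tiktoken / aiohttp / tokenizers / ...`), which adds roughly 50MB to the install size.
+
+---
+
+## 6.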
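Known pitfalls / things to watch when taking over
+
+### 6.1 `local.py` still imports `record_traj`
+
+I **did not change local.py** (the plan says local is left alone). The `from rock.sdk.model.server.utils import record_traj` at `local.py:12` still resolves, because utils.py keeps record_traj. If the next person sees this import and wants to clean it up, **do not**: that would break local mode.
+
+### 6.2 `litellm.callbacks` is a global list
+
+`main.py:_configure_litellm_for_proxy` uses `litellm.callbacks.append(recorder)`. If the same process starts the server more than once (a test scenario), the recorder is registered more than once and every call writes several traj lines. A production deployment runs it once, so this is fine. **To make repeated initialization safe**, change it to `if not any(isinstance(cb, TrajectoryRecorder) for cb in litellm.callbacks): litellm.callbacks.append(recorder)`. I did not do this, because the production path is "start once".
+
+Likewise, `litellm.custom_provider_map = [...]` is an assignment rather than an append, so repeated replay initialization is idempotent.
+
+### 6.3 In tests, watch out for a SequentialCursor shared across cases
+
+`SequentialCursor` keeps its position in the instance attribute `self._idx`, and each test that calls `SequentialCursor.load(p)` gets a fresh instance, so cases do not contaminate one another. But a fixture built as "module-level singleton replayer called by several tests" would collide on idx. The current tests all use per-test instances, which is fine.
+
+### 6.4 Importing `litellm` is slow
+
+Importing litellm loads several OpenAI/HuggingFace clients, and the first import can take 1-2 seconds. `main.py` puts `import litellm` inside `_configure_litellm_for_proxy()` (a function-level lazy import), so it is triggered only when proxy mode starts. `proxy.py` has a module-level `import litellm`, triggered when the handler file is first loaded; that cost is paid at FastAPI route registration and does not affect request-path performance.
+
+### 6.5 The `tzdata` dependency in `pyproject.toml`
+
+I saw ide_diagnostics report `httpx/uuid/anyio/tzdata/...` as not installed for pyproject.toml. That is because the IDE's current Python environment lacks the rock main-repo dependencies; it is unrelated to this change. The hints disappear after `uv sync`.
+
+### 6.6 Leftover `__pycache__`
+
+The old `proxy.py` left `__pycache__/proxy.cpython-310.pyc` behind. It is regenerated on the first import after the rewrite, and **normally this causes no problem**. If running the tests reports `ImportError: cannot import name 'perform_llm_request'`, first clear the cache with `find rock -name __pycache__ -exec rm -rf {} +`.
+
+### 6.7 Remember that `extra_headers` may contain sensitive information
+
+`_filter_headers` passes every non-hop-by-hop header through to the upstream, including the client's `Authorization`. This is **deliberate**: letting the client bring its own API key is the existing rock convention. It does mean that `StandardLoggingPayload.metadata.headers` (if present) in the recorded traj may contain a Bearer token. litellm has switches such as `turn_off_message_logging` / `redact_user_api_key_info`, which are **currently not enabled**. If traj files are ever to be distributed, they must be redacted first.
+
+---
+
+## 7.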
Out of scope / future extensions (v2)
+
+### Out of scope (explicitly not done)
+
+- Any change to local mode (`--type local`)
+- DB persistence (traj goes to JSONL only)
+- Compatibility reading of old `{request, response}` traj files (replay accepts only the new schema)
+- Conversion to and from the native SWE-agent / OpenHands traj formats
+- Fine-grained timing reproduction for streaming during replay (only the chunk sequence is guaranteed to be correct)
+- Incremental stream splitting of tool_calls (streaming replay in this iteration only goes down to message-level chunks)
+
+### Future extensions (hooks are in place)
+
+- **Out-of-order matching based on a messages hash**: add a `HashMatcher` next to `SequentialCursor`, switched via `replay_mode: sequential | hash`. Useful when the agent does not call the LLM strictly in the recorded order (branches/retries).
+- **Concurrent replay**: route by the `run_id` in the request metadata to separate cursors; `SequentialCursor` becomes `dict[run_id, Cursor]`.
+- **Passthrough on miss**: when the cursor is exhausted, fall back to the real LLM (`import litellm; await litellm.acompletion(...)`). Useful when debugging with a traj that is not long enough.
+- **`/admin/reset` HTTP endpoint**: reset the cursor to zero without restarting the proxy.
+- **`scripts/convert_traj.py`**: convert a SWE-agent `.traj` or an OpenHands event log to StandardLoggingPayload, and the reverse as well.
+- **traj redaction hook**: before writing to disk, scrub the fields named in `redact_keys: list[str]`.
+
+---
+
+## 8. Key paths at a glance
+
+### Inside the rock repo (touched by this change)
+
+| Path | Role |
+|---|---|
+| `pyproject.toml` | model-service extras gain litellm |
+| `rock/sdk/model/server/config.py` | New ModelServiceConfig fields |
+| `rock/sdk/model/server/api/proxy.py` | Rewritten on the litellm SDK |
+| `rock/sdk/model/server/main.py` | `_configure_litellm_for_proxy` + new CLI flags |
+| `rock/sdk/model/server/utils.py` | Keeps record_traj for local |
+| `rock/sdk/model/server/integrations/__init__.py` | Empty, only to form a package |
+| `rock/sdk/model/server/integrations/traj_recorder.py` | TrajectoryRecorder(CustomLogger) |
+| `rock/sdk/model/server/integrations/traj_replayer.py` | SequentialCursor + TrajectoryReplayer(CustomLLM) |
+| `rock/sdk/model/server/api/local.py` | **Unchanged** (still uses record_traj) |
+| `tests/unit/sdk/model/test_proxy.py` | Reworked |
+| `tests/unit/sdk/model/test_traj_recorder.py` | New |
+| `tests/unit/sdk/model/test_traj_replayer.py` | New |
+| `examples/model_service/config_record.yaml` | New |
+| `examples/model_service/config_replay.yaml` | New |
+
+### litellm repo (for cross-checking, at `/mnt/xinshi/github/litellm/`)
+
+| Topic | Path |
+|---|---|
+| CustomLogger
interface (base class) | `litellm/integrations/custom_logger.py:67` |
+| CustomLLM interface (base class) | `litellm/llms/custom_llm.py:47` |
+| StandardLoggingPayload schema | `litellm/types/utils.py:2764` |
+| Stream aggregation written into the payload | `litellm/litellm_core_utils/litellm_logging.py:1930-1955` |
+| async_mock_completion_streaming_obj | `litellm/utils.py:6831` |
+| custom_provider_map loading flow (how acompletion is actually called) | `litellm/main.py:4280-4326` |
+| LiteLLM exception base classes (source of status_code) | `litellm/exceptions.py` |
+
+### History / conversation artifacts
+
+- Original plan file (detailed design reasoning): `/home/xinshi/.claude/plans/litellm-chat-completions-traj-replay-ser-lucky-rainbow.md`
+- Abandoned standalone project skeleton: `/mnt/xinshi/github/litellm-traj/` (already removed with `rm -rf`)
+
+---
+
+## 9. One-minute quick start for the next person
+
+1. `cd /mnt/xinshi/github/Self-ROCK && uv sync --extra model-service --group test`
+2. `uv run pytest tests/unit/sdk/model/ -v` → should give **47 passed** (verified)
+3. Run the integration verification (section 4.4)
+4. Write the PR description, **emphasizing the breaking changes in section 5**
+5. If someone asks in PR review "why not keep retry_async's status-code whitelist", answer: see section 3.4 (litellm's default retry already covers the most common cases; the whitelist can be added later if wanted)
+
+For the background of the whole project rather than just this refactor, read the top-level `CLAUDE.md`. For litellm internals, read `/mnt/xinshi/github/litellm/CLAUDE.md` (the one in the litellm repo).
diff --git a/examples/model_service/README.md b/examples/model_service/README.md
new file mode 100644
index 0000000000..7a169764fe
--- /dev/null
+++ b/examples/model_service/README.md
@@ -0,0 +1,90 @@
+# model-service proxy usage examples
+
+The `proxy` mode of `rock model-service` forwards `/v1/chat/completions` to an upstream LLM and appends every call,
+in `StandardLoggingPayload` format, to a JSONL traj file. With `--traj-file`, agents that use the same base URL
+(SWE-agent / mini-swe-agent / OpenHands) can replay from the recorded traj, which allows debugging at no LLM cost.
+
+All commands below start the server with `python -m rock.sdk.model.server.main`, which is equivalent to `rock model-service start`.
+
+## 1. Record mode (default)
+
+Forward to a single upstream and append every call to `LOG_DIR/LLMTraj.jsonl`:
+
+```bash
+export OPENAI_API_KEY="sk-..."
+export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj  # root directory where traj files are written
+
+python -m rock.sdk.model.server.main \
+  --type proxy \
+  --proxy-base-url https://api.openai.com/v1 \
+  --port 8080
+```
+
+Call it:
+
+```bash
+curl -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hi"}]}'
+
+# inspect the traj
+cat /tmp/rock-traj/LLMTraj.jsonl | jq '.id, .model, .response.choices[0].message.content'
+```
+
+Streaming is supported (litellm aggregates the chunks before they are written to the traj):
+
+```bash
+curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
+```
+
+## 2. Replay mode
+
+Point `--traj-file` at a recorded jsonl; the proxy no longer contacts the real LLM and returns responses in the recorded order:
+
+```bash
+python -m rock.sdk.model.server.main \
+  --type proxy \
+  --traj-file /tmp/rock-traj/LLMTraj.jsonl \
+  --port 8081
+```
+
+The agent only needs to switch its base URL to `http://127.0.0.1:8081/v1` to replay; once the cursor is exhausted the proxy returns 404.
+`--traj-file` must be the path of a single jsonl file.
+
+## 3. Adjusting retries and timeout
+
+```bash
+python -m rock.sdk.model.server.main \
+  --type proxy \
+  --proxy-base-url https://api.openai.com/v1 \
+  --num-retries 3 \
+  --request-timeout 60 \
+  --port 8080
+```
+
+## 4.
Multi-model routing (requires YAML)
+
+YAML is needed only when requests are split across different upstreams by model name (the CLI exposes a single `--proxy-base-url`). Create
+`routes.yaml`:
+
+```yaml
+proxy_rules:
+  gpt-3.5-turbo: "https://api.openai.com/v1"
+  gpt-4o: "https://api.openai.com/v1"
+  default: "https://api-inference.modelscope.cn/v1"
+```
+
+Combine it with the CLI at startup:
+
+```bash
+python -m rock.sdk.model.server.main \
+  --type proxy \
+  --config-file routes.yaml \
+  --port 8080
+```
+
+`--proxy-base-url` / `--port` / `--num-retries` and similar flags given on the CLI still override the same-named fields in the YAML.
diff --git a/pyproject.toml b/pyproject.toml
index badb7d1a4b..bf814e0aa1 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -86,6 +86,7 @@ model-service = [
     "psutil",
     "swebench",
     "alibabacloud_cr20181201==2.0.5",
+    "litellm>=1.50.0",
 ]
 
 
diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py
index fb2b7bec3c..4894430641 100644
--- a/rock/sdk/model/server/api/proxy.py
+++ b/rock/sdk/model/server/api/proxy.py
@@ -1,13 +1,28 @@
+"""OpenAI-compatible chat/completions proxy backed by the litellm SDK.
+
+The proxy ``/v1/chat/completions`` handler routes a request to the configured
+upstream LLM (or to the in-process traj-replay handler when ``replay_traj_path``
+is set), forwards header/body, and applies retry via litellm's ``num_retries``.
+
+Trajectory recording is wired up at startup in
+``rock.sdk.model.server.main`` by registering ``TrajectoryRecorder`` as a
+``litellm.callbacks`` entry; this handler does not carry a ``@record_traj``
+decorator anymore.
+""" + +from __future__ import annotations + +import json +from collections.abc import AsyncIterator from typing import Any -import httpx +import litellm from fastapi import APIRouter, HTTPException, Request -from fastapi.responses import JSONResponse +from fastapi.responses import JSONResponse, StreamingResponse +from litellm.exceptions import APIError, AuthenticationError, BadRequestError, RateLimitError, Timeout from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.utils import record_traj -from rock.utils import retry_async logger = init_logger(__name__) @@ -15,40 +30,21 @@ proxy_router = APIRouter() -# Global HTTP client with a persistent connection pool -http_client = httpx.AsyncClient() - - -@retry_async( - max_attempts=6, - delay_seconds=2.0, - backoff=2.0, # Exponential backoff (2s, 4s, 8s, 16s, 32s). - jitter=True, # Adds randomness to prevent "thundering herd" effect on the backend. - exceptions=(httpx.TimeoutException, httpx.ConnectError, httpx.HTTPStatusError), -) -async def perform_llm_request(url: str, body: dict, headers: dict, config: ModelServiceConfig): - """ - Forwards the request and triggers retry ONLY if the status code - is in the explicit retryable whitelist. - """ - response = await http_client.post(url, json=body, headers=headers, timeout=config.request_timeout) - status_code = response.status_code - - # Check against the explicit whitelist - if status_code in config.retryable_status_codes: - logger.warning(f"Retryable error detected: {status_code}. 
Triggering retry for {url}...") - response.raise_for_status() - - return response +# Headers we never forward upstream: +# - host / content-length / content-type: litellm rewrites the body and re-targets, +# so the client's values would be wrong or misleading +# - transfer-encoding / connection: true RFC 7230 hop-by-hop headers, scoped to +# the client↔proxy connection only +_HEADERS_NOT_TO_FORWARD = frozenset({"host", "content-length", "content-type", "transfer-encoding", "connection"}) def get_base_url(model_name: str, config: ModelServiceConfig) -> str: - """ - Selects the target backend URL based on model name matching. + """Pick the upstream base URL by model name. - If proxy_base_url is configured, it takes precedence over proxy_rules. + ``proxy_base_url`` takes precedence; falls back to ``proxy_rules[model]`` and + then ``proxy_rules["default"]``. Trailing slashes are stripped so the caller + can append ``/chat/completions`` directly. """ - # If direct proxy base URL is configured, return it directly (bypass model name matching) if config.proxy_base_url: return config.proxy_base_url.rstrip("/") @@ -59,67 +55,108 @@ def get_base_url(model_name: str, config: ModelServiceConfig) -> str: base_url = rules.get(model_name) or rules.get("default") if not base_url: raise HTTPException( - status_code=400, detail=f"Model '{model_name}' is not configured and no 'default' rule found." + status_code=400, + detail=f"Model '{model_name}' is not configured and no 'default' rule found.", ) return base_url.rstrip("/") +def _filter_headers(headers) -> dict[str, str]: + forwarded = {} + for key, value in headers.items(): + if key.lower() in _HEADERS_NOT_TO_FORWARD: + continue + forwarded[key] = value + return forwarded + + +def _format_error_response(exc: Exception) -> JSONResponse: + """Render a litellm exception as the legacy ``{error:{message,type,code}}`` JSON. + + Agent-side logic keys off message substrings (e.g. 
"context length exceeded", + "content violation"), so we keep the message verbatim from the upstream. + """ + status_code = getattr(exc, "status_code", None) or 502 + message = str(exc) + error_type = type(exc).__name__ + return JSONResponse( + status_code=status_code, + content={ + "error": { + "message": f"LLM backend error: {message}", + "type": error_type, + "code": status_code, + } + }, + ) + + +async def _sse_iter(stream: AsyncIterator[Any]) -> AsyncIterator[bytes]: + """Convert a litellm async chunk stream into Server-Sent Events bytes.""" + try: + async for chunk in stream: + payload = chunk.model_dump() if hasattr(chunk, "model_dump") else chunk + yield f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode() + finally: + yield b"data: [DONE]\n\n" + + @proxy_router.post("/v1/chat/completions") -@record_traj async def chat_completions(body: dict[str, Any], request: Request): + """OpenAI-compatible chat completions proxy endpoint. + + Routes via ``proxy_base_url`` / ``proxy_rules``, forwards Authorization-style + headers, supports streaming, retries via litellm. In replay mode the request + is dispatched to the registered ``traj-replay`` CustomLLM provider instead + of being forwarded upstream. """ - OpenAI-compatible chat completions proxy endpoint. - Handles routing, header transparent forwarding, and automatic retries. - """ - config = request.app.state.model_service_config + config: ModelServiceConfig = request.app.state.model_service_config - # Step 1: Model Routing model_name = body.get("model", "") - base_url = get_base_url(model_name, config) - target_url = f"{base_url}/chat/completions" - logger.info(f"Routing model '{model_name}' to URL: {target_url}") - - # Step 2: Header Cleaning - # Preserve 'Authorization' for authentication while removing hop-by-hop transport headers. 
- forwarded_headers = {} - for key, value in request.headers.items(): - if key.lower() in ["host", "content-length", "content-type", "transfer-encoding"]: - continue - forwarded_headers[key] = value - # Step 3: Strategy Enforcement - # Force non-streaming mode for the MVP phase to ensure stability. - if body.get("stream") is True: - raise HTTPException( - status_code=400, - detail="Streaming requests (stream=True) are not supported in the current version. Please set stream=False or omit the stream parameter.", - ) - body["stream"] = False + # 1. Route selection + if config.replay_traj_path: + litellm_model = f"traj-replay/{model_name or 'replay'}" + api_base: str | None = None + logger.info(f"[replay] dispatching '{model_name}' to traj-replay handler") + else: + api_base = get_base_url(model_name, config) + # Tell litellm to treat the upstream as an OpenAI-compatible server. + litellm_model = f"openai/{model_name}" if model_name else "openai/default" + logger.info(f"Routing model '{model_name}' to {api_base}") + + # 2. Header forwarding (preserve Authorization, drop hop-by-hop) + extra_headers = _filter_headers(request.headers) + + # 3. Build call kwargs (transparent passthrough of body fields) + call_kwargs = dict(body) + call_kwargs.pop("model", None) # avoid duplicate kwargs + is_stream = bool(call_kwargs.get("stream")) try: - # Step 4: Execute Request with Retry Logic - response = await perform_llm_request(target_url, body, forwarded_headers, config) - return JSONResponse(status_code=response.status_code, content=response.json()) - - except httpx.HTTPStatusError as e: - # Forward the raw backend error message to the client. - # This allows the Agent-side logic to detect keywords like 'context length exceeded' - # or 'content violation' and raise appropriate exceptions. - error_text = e.response.text if e.response else "No error details" - status_code = e.response.status_code if e.response else 502 - logger.error(f"Final failure after retries. 
Status: {status_code}, Response: {error_text}") - return JSONResponse( - status_code=status_code, - content={ - "error": { - "message": f"LLM backend error: {error_text}", - "type": "proxy_retry_failed", - "code": status_code, - } - }, + response = await litellm.acompletion( + model=litellm_model, + api_base=api_base, + extra_headers=extra_headers, + timeout=config.request_timeout, + num_retries=config.num_retries, + **call_kwargs, ) - except Exception as e: - logger.error(f"Unexpected proxy error: {str(e)}") - # Raise standard 500 for non-HTTP related errors or system errors - raise HTTPException(status_code=500, detail=str(e)) + except (RateLimitError, APIError, BadRequestError, AuthenticationError, Timeout) as exc: + logger.warning(f"litellm error for model '{model_name}': {exc}") + return _format_error_response(exc) + except Exception as exc: # pragma: no cover - last-resort safety net + logger.error(f"Unexpected proxy error: {exc}", exc_info=True) + raise HTTPException(status_code=500, detail=str(exc)) + + # 4. Streaming vs non-streaming response + if is_stream: + return StreamingResponse(_sse_iter(response), media_type="text/event-stream") + + # litellm returns a ModelResponse pydantic; expose the OpenAI-shape dict. 
+ if hasattr(response, "model_dump"): + body_out = response.model_dump() + else: + body_out = response # already a dict (replay path can short-circuit) + return JSONResponse(status_code=200, content=body_out) diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 2c96992b5c..8c992fb4b3 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -51,6 +51,19 @@ class ModelServiceConfig(BaseModel): request_timeout: int = Field(default=120) """Request timeout in seconds.""" + num_retries: int = Field(default=6) + """Number of retries for retryable failures (passed through to litellm).""" + + traj_enabled: bool = Field(default=True) + """When True, write each chat/completions call as a JSONL trajectory line.""" + + traj_file: str | None = Field(default=None) + """Override default trajectory file path. None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" + + replay_traj_path: str | None = Field(default=None) + """Path to a single .jsonl trajectory file for replay mode (directories are not supported). + When set, requests are served from recorded responses instead of a real upstream.""" + @classmethod def from_file(cls, config_path: str | None = None): """ diff --git a/rock/sdk/model/server/integrations/__init__.py b/rock/sdk/model/server/integrations/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/rock/sdk/model/server/integrations/traj_recorder.py b/rock/sdk/model/server/integrations/traj_recorder.py new file mode 100644 index 0000000000..6aa01a8eed --- /dev/null +++ b/rock/sdk/model/server/integrations/traj_recorder.py @@ -0,0 +1,77 @@ +"""Record chat/completions trajectories as JSONL via litellm's CustomLogger hook. + +One line per call, each line is a ``StandardLoggingPayload`` dict from litellm.
+Streaming chunks are aggregated by litellm before this callback fires (see +litellm/litellm_core_utils/litellm_logging.py around line 1930), so we don't +need to handle the streaming/non-streaming split ourselves. +""" + +from __future__ import annotations + +import asyncio +import json +import os +from pathlib import Path + +from litellm.integrations.custom_logger import CustomLogger + +from rock.logger import init_logger +from rock.sdk.model.server.utils import ( + MODEL_SERVICE_REQUEST_COUNT, + MODEL_SERVICE_REQUEST_RT, + _get_or_create_metrics_monitor, +) + +logger = init_logger(__name__) + + +class TrajectoryRecorder(CustomLogger): + """litellm CustomLogger that appends each call's StandardLoggingPayload to JSONL + and reports OTLP RT/count metrics.""" + + def __init__(self, traj_file: str | os.PathLike) -> None: + super().__init__() + self.traj_file = Path(traj_file) + self.traj_file.parent.mkdir(parents=True, exist_ok=True) + self._lock = asyncio.Lock() + self._monitor = _get_or_create_metrics_monitor() + + async def async_log_success_event(self, kwargs, response_obj, start_time, end_time): + payload = kwargs.get("standard_logging_object") + if payload is None: + logger.debug("[traj-recorder] success event without standard_logging_object, skipping") + return + await self._append_jsonl(payload) + self._record_metrics(payload, status="success") + + async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time): + payload = kwargs.get("standard_logging_object") + if payload is None: + return + await self._append_jsonl(payload) + self._record_metrics(payload, status="failure") + + async def _append_jsonl(self, payload: dict) -> None: + line = json.dumps(payload, ensure_ascii=False, default=str) + "\n" + async with self._lock: + await asyncio.to_thread(self._write_line, line) + + def _write_line(self, line: str) -> None: + with self.traj_file.open("a", encoding="utf-8") as f: + f.write(line) + + def _record_metrics(self, payload: dict, *, 
status: str) -> None: + rt_seconds = payload.get("response_time") + if rt_seconds is None: + start = payload.get("startTime") + end = payload.get("endTime") + rt_seconds = (end - start) if (start is not None and end is not None) else 0.0 + rt_ms = float(rt_seconds) * 1000.0 + + attrs = { + "type": "chat_completions", + "status": status, + "sandbox_id": os.getenv("ROCK_SANDBOX_ID", "unknown"), + } + self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_ms, attributes=attrs) + self._monitor.record_counter_by_name(MODEL_SERVICE_REQUEST_COUNT, 1, attributes=attrs) diff --git a/rock/sdk/model/server/integrations/traj_replayer.py b/rock/sdk/model/server/integrations/traj_replayer.py new file mode 100644 index 0000000000..c87c0fe75f --- /dev/null +++ b/rock/sdk/model/server/integrations/traj_replayer.py @@ -0,0 +1,139 @@ +"""Replay a recorded trajectory by registering a litellm CustomLLM provider. + +Loads a single JSONL trajectory file on init, then hands records out one at a +time in recorded order. This is the simplest matching strategy and works for +deterministic agent runs that replay the same sequence of LLM calls +(SWE-agent / mini-swe-agent / OpenHands). +""" + +from __future__ import annotations + +import asyncio +import json +import os +from collections.abc import AsyncIterator +from pathlib import Path +from typing import Any + +from litellm.llms.custom_llm import CustomLLM, CustomLLMError +from litellm.types.utils import GenericStreamingChunk, ModelResponse +from litellm.utils import async_mock_completion_streaming_obj + +from rock.logger import init_logger + +logger = init_logger(__name__) + + +class SequentialCursor: + """Hands out trajectory records one at a time, in recorded order. + + Going past the end raises CustomLLMError(404) so the proxy returns a clear + error to the caller. 
+ """ + + def __init__(self, records: list[dict]) -> None: + self.records = records + self._idx = 0 + self._lock = asyncio.Lock() + + @classmethod + def load(cls, path: str | os.PathLike) -> SequentialCursor: + path = Path(path) + if not path.is_file(): + raise FileNotFoundError(f"traj file not found: {path}") + + records: list[dict] = [] + with path.open("r", encoding="utf-8") as fp: + for line in fp: + line = line.strip() + if not line: + continue + records.append(json.loads(line)) + + logger.info(f"[traj-replay] loaded {len(records)} record(s) from {path}") + return cls(records) + + async def next(self, expected_model: str | None = None) -> dict: + async with self._lock: + if self._idx >= len(self.records): + raise CustomLLMError( + status_code=404, + message=(f"trajectory exhausted at step {self._idx} (total recorded steps={len(self.records)})"), + ) + record = self.records[self._idx] + self._idx += 1 + current_idx = self._idx - 1 + + if expected_model: + recorded_model = record.get("model") + if recorded_model and recorded_model != expected_model: + logger.warning( + f"[traj-replay] step {current_idx} model mismatch: " + f"recorded={recorded_model!r} requested={expected_model!r}" + ) + return record + + def reset(self) -> None: + self._idx = 0 + + @property + def position(self) -> int: + return self._idx + + @property + def total(self) -> int: + return len(self.records) + + +def _record_to_model_response(record: dict) -> ModelResponse: + response = record.get("response") + if not isinstance(response, dict): + raise CustomLLMError( + status_code=500, + message=f"traj record at step has no usable 'response' dict: got {type(response).__name__}", + ) + return ModelResponse(**response) + + +def _extract_assistant_text(record: dict) -> str: + response = record.get("response") or {} + choices = response.get("choices") or [] + if not choices: + return "" + message = choices[0].get("message") or {} + return message.get("content") or "" + + +class 
TrajectoryReplayer(CustomLLM): + """litellm CustomLLM that returns recorded responses in sequential order.""" + + def __init__(self, traj_path: str | os.PathLike) -> None: + super().__init__() + self.cursor = SequentialCursor.load(traj_path) + + async def acompletion( + self, + model: str, + messages: list, + *args: Any, + **kwargs: Any, + ) -> ModelResponse: + record = await self.cursor.next(expected_model=model) + return _record_to_model_response(record) + + async def astreaming( + self, + model: str, + messages: list, + *args: Any, + **kwargs: Any, + ) -> AsyncIterator[GenericStreamingChunk]: + record = await self.cursor.next(expected_model=model) + text = _extract_assistant_text(record) + model_response = kwargs.get("model_response") + async for chunk in async_mock_completion_streaming_obj( + model_response=model_response, + mock_response=text, + model=model, + ): + yield chunk diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 7f8dabebe2..e2263a1858 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -11,7 +11,7 @@ from rock.logger import init_logger from rock.sdk.model.server.api.local import init_local_api, local_router from rock.sdk.model.server.api.proxy import proxy_router -from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.config import TRAJ_FILE, ModelServiceConfig # Configure logging logger = init_logger(__name__) @@ -52,6 +52,39 @@ async def global_exception_handler(request, exc): return app +def _configure_litellm_for_proxy(config: ModelServiceConfig) -> None: + """Wire up litellm record/replay integrations for the proxy mode. + + - When ``replay_traj_path`` is set, register ``TrajectoryReplayer`` as a + custom provider so requests routed to ``traj-replay/`` return + recorded responses without hitting any upstream. 
+ - When recording is enabled (default), register ``TrajectoryRecorder`` as + a litellm callback so every chat/completions call appends a JSONL line. + + Replay and record are mutually exclusive: in replay mode we don't record, + since replayed responses re-traversing the recorder would inflate metrics + and overwrite the source-of-truth file. + """ + import litellm + + from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + from rock.sdk.model.server.integrations.traj_replayer import TrajectoryReplayer + + if config.replay_traj_path: + replayer = TrajectoryReplayer(config.replay_traj_path) + litellm.custom_provider_map = [ + {"provider": "traj-replay", "custom_handler": replayer}, + ] + logger.info(f"litellm replay handler registered, traj_path={config.replay_traj_path}") + return + + if config.traj_enabled: + traj_path = config.traj_file or TRAJ_FILE + recorder = TrajectoryRecorder(traj_file=traj_path) + litellm.callbacks.append(recorder) + logger.info(f"litellm trajectory recorder registered, traj_file={traj_path}") + + def main( model_servie_type: str, config: ModelServiceConfig, @@ -63,6 +96,7 @@ def main( asyncio.run(init_local_api()) app.include_router(local_router, prefix="", tags=["local"]) else: + _configure_litellm_for_proxy(config) app.include_router(proxy_router, prefix="", tags=["proxy"]) logger.info(f"Starting LLM Service on {config.host}:{config.port}, type: {model_servie_type}") @@ -100,6 +134,13 @@ def create_config_from_args(args) -> ModelServiceConfig: if args.request_timeout: config.request_timeout = args.request_timeout logger.info(f"request_timeout set from command line: {args.request_timeout}s") + if getattr(args, "num_retries", None) is not None: + config.num_retries = args.num_retries + logger.info(f"num_retries set from command line: {args.num_retries}") + if getattr(args, "traj_file", None): + config.replay_traj_path = args.traj_file + config.traj_enabled = False + logger.info(f"replay mode enabled via 
--traj-file: {args.traj_file}") return config @@ -142,6 +183,18 @@ def create_config_from_args(args) -> ModelServiceConfig: parser.add_argument( "--request-timeout", type=int, default=None, help="Request timeout in seconds. Overrides config file." ) + parser.add_argument( + "--num-retries", + type=int, + default=None, + help="Number of retries for retryable failures (passed through to litellm). Overrides config file.", + ) + parser.add_argument( + "--traj-file", + type=str, + default=None, + help="Replay mode: path to a single recorded .jsonl traj file. Disables real LLM upstreams.", + ) args = parser.parse_args() config = create_config_from_args(args) diff --git a/rock/sdk/model/server/utils.py b/rock/sdk/model/server/utils.py index 20ae8896dc..86b7414e29 100644 --- a/rock/sdk/model/server/utils.py +++ b/rock/sdk/model/server/utils.py @@ -26,7 +26,13 @@ def _get_or_create_metrics_monitor() -> MetricsMonitor: def _write_traj(data: dict): - """Write traj data to file in JSONL format.""" + """Write traj data to file in JSONL format. + + Used by the legacy ``@record_traj`` decorator on the ``local`` model-service + flow. The proxy flow now persists trajectories via + :class:`rock.sdk.model.server.integrations.traj_recorder.TrajectoryRecorder` + instead, which uses litellm's StandardLoggingPayload schema. + """ from rock import env_vars append = env_vars.ROCK_MODEL_SERVICE_TRAJ_APPEND_MODE @@ -38,7 +44,12 @@ def _write_traj(data: dict): def record_traj(func: Callable): - """Decorator to record chat completions input/output as traj.""" + """Decorator to record chat completions input/output as traj. + + Kept for the ``local`` model-service mode (rock/sdk/model/server/api/local.py). + The ``proxy`` mode no longer uses this decorator; it relies on the + TrajectoryRecorder litellm callback for richer payloads.
+ """ @wraps(func) async def wrapper(*args, **kwargs): diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index edce5584cb..c994c1c9d6 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -1,12 +1,12 @@ +from types import SimpleNamespace from unittest.mock import AsyncMock, MagicMock, patch -import httpx import pytest import yaml -from fastapi import FastAPI, Request -from httpx import ASGITransport, AsyncClient, HTTPStatusError, Request, Response +from fastapi import FastAPI +from httpx import ASGITransport, AsyncClient -from rock.sdk.model.server.api.proxy import perform_llm_request, proxy_router +from rock.sdk.model.server.api.proxy import proxy_router from rock.sdk.model.server.config import ModelServiceConfig from rock.sdk.model.server.main import create_config_from_args, lifespan from rock.sdk.model.server.utils import ( @@ -24,18 +24,30 @@ test_app.state.model_service_config = mock_config +# Patch path for the litellm.acompletion symbol as imported inside proxy.py. +ACOMPLETION_PATCH = "rock.sdk.model.server.api.proxy.litellm.acompletion" + + +def _fake_model_response(*, id="chat-123", choices=None) -> SimpleNamespace: + """Build a litellm-shaped object that exposes .model_dump() like a Pydantic model.""" + payload = { + "id": id, + "object": "chat.completion", + "model": "gpt-3.5-turbo", + "choices": choices + or [ + {"index": 0, "message": {"role": "assistant", "content": "hi"}, "finish_reason": "stop"}, + ], + "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}, + } + return SimpleNamespace(model_dump=lambda: payload) + + @pytest.mark.asyncio async def test_chat_completions_routing_success(): - """ - Test the high-level routing logic. 
- """ - patch_path = "rock.sdk.model.server.api.proxy.perform_llm_request" - - with patch(patch_path, new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-123", "choices": []} - mock_request.return_value = mock_resp + """Routing: model name maps to its proxy_rules entry, passed to litellm as api_base.""" + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.return_value = _fake_model_response() transport = ASGITransport(app=test_app) async with AsyncClient(transport=transport, base_url="http://test") as ac: @@ -43,51 +55,38 @@ async def test_chat_completions_routing_success(): response = await ac.post("/v1/chat/completions", json=payload) assert response.status_code == 200 - call_args = mock_request.call_args[0] - assert call_args[0] == "https://api.openai.com/v1/chat/completions" - assert mock_request.called + assert mock_acompletion.called + call_kwargs = mock_acompletion.call_args.kwargs + assert call_kwargs["api_base"] == "https://api.openai.com/v1" + assert call_kwargs["model"] == "openai/gpt-3.5-turbo" + assert call_kwargs["messages"] == [{"role": "user", "content": "hello"}] @pytest.mark.asyncio async def test_chat_completions_fallback_to_default_when_not_found(): - """ - Test that an unrecognized model name correctly falls back to the 'default' URL. 
- """ - patch_path = "rock.sdk.model.server.api.proxy.perform_llm_request" - - with patch(patch_path, new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-fallback", "choices": []} - mock_request.return_value = mock_resp + """Unrecognized model name → falls back to the 'default' base URL.""" + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.return_value = _fake_model_response(id="chat-fallback") config = test_app.state.model_service_config default_base_url = config.proxy_rules["default"].rstrip("/") - expected_target_url = f"{default_base_url}/chat/completions" transport = ASGITransport(app=test_app) async with AsyncClient(transport=transport, base_url="http://test") as ac: payload = { - "model": "some-random-unsupported-model", # This model is NOT in proxy_rules + "model": "some-random-unsupported-model", "messages": [{"role": "user", "content": "hello"}], } response = await ac.post("/v1/chat/completions", json=payload) assert response.status_code == 200 - - # Verify that perform_llm_request was called with the DEFAULT URL - call_args = mock_request.call_args[0] - actual_url = call_args[0] - - assert actual_url == expected_target_url - assert mock_request.called + call_kwargs = mock_acompletion.call_args.kwargs + assert call_kwargs["api_base"] == default_base_url @pytest.mark.asyncio async def test_chat_completions_routing_absolute_fail(): - """ - Test that both the specific model and the 'default' rule are missing. - """ + """No matching rule and no 'default' → 400.""" empty_config = ModelServiceConfig() empty_config.proxy_rules = {} @@ -103,98 +102,143 @@ async def test_chat_completions_routing_absolute_fail(): @pytest.mark.asyncio -async def test_perform_llm_request_retry_on_whitelist(): - """ - Test that the proxy retries when receiving a whitelisted error code. 
- """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" - - # Patch asyncio.sleep inside the retry module to avoid actual waiting - with ( - patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - # 1. Setup Failed Response (429) - resp_429 = MagicMock(spec=Response) - resp_429.status_code = 429 - error_429 = HTTPStatusError("Rate Limited", request=MagicMock(spec=Request), response=resp_429) +async def test_proxy_base_url_overrides_proxy_rules(): + """When proxy_base_url is set, all requests go to that URL, ignoring proxy_rules.""" + config = ModelServiceConfig() + config.proxy_base_url = "https://custom-endpoint.example.com/v1" - # 2. Setup Success Response (200) - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 - resp_200.json.return_value = {"ok": True} + local_app = FastAPI() + local_app.state.model_service_config = config + local_app.include_router(proxy_router) - # Sequence: Fail with 429, then Succeed with 200 - mock_post.side_effect = [error_429, resp_200] + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.return_value = _fake_model_response() - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) + transport = ASGITransport(app=local_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} + response = await ac.post("/v1/chat/completions", json=payload) - assert result.status_code == 200 - assert mock_post.call_count == 2 + assert response.status_code == 200 + call_kwargs = mock_acompletion.call_args.kwargs + assert call_kwargs["api_base"] == "https://custom-endpoint.example.com/v1" @pytest.mark.asyncio -async def test_perform_llm_request_no_retry_on_non_whitelist(): - """ - Test that the proxy DOES NOT retry for non-retryable codes (e.g., 401). 
- It should return the error response immediately. - """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" +async def test_chat_completions_passes_num_retries_and_timeout(): + """num_retries and request_timeout from config flow through to litellm.acompletion.""" + config = ModelServiceConfig() + config.num_retries = 3 + config.request_timeout = 45 - with patch(client_post_path, new_callable=AsyncMock) as mock_post: - # Mock 401 Unauthorized (NOT in the retry whitelist) - resp_401 = MagicMock(spec=Response) - resp_401.status_code = 401 - resp_401.json.return_value = {"error": "Invalid API Key"} + local_app = FastAPI() + local_app.state.model_service_config = config + local_app.include_router(proxy_router) - # The function should return this response directly - mock_post.return_value = resp_401 + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.return_value = _fake_model_response() - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) + transport = ASGITransport(app=local_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} + await ac.post("/v1/chat/completions", json=payload) - assert result.status_code == 401 - # Call count must be 1, meaning no retries were attempted - assert mock_post.call_count == 1 + call_kwargs = mock_acompletion.call_args.kwargs + assert call_kwargs["num_retries"] == 3 + assert call_kwargs["timeout"] == 45 @pytest.mark.asyncio -async def test_perform_llm_request_network_timeout_retry(): - """ - Test that network-level exceptions (like Timeout) also trigger retries. - """ - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" +async def test_chat_completions_litellm_error_returns_proxy_schema(): + """A litellm exception is converted to {error:{message,type,code}} JSON + so agent-side keyword detection (e.g. 
'context length exceeded') keeps working.""" + from litellm.exceptions import BadRequestError + + err = BadRequestError( + message="context length exceeded for this model", + model="gpt-3.5-turbo", + llm_provider="openai", + ) - with ( - patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.side_effect = err - mock_post.side_effect = [httpx.TimeoutException("Network Timeout"), resp_200] + transport = ASGITransport(app=test_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} + response = await ac.post("/v1/chat/completions", json=payload) + + body = response.json() + assert "error" in body + assert "context length exceeded" in body["error"]["message"] + assert body["error"]["type"] == "BadRequestError" + assert body["error"]["code"] == response.status_code + + +@pytest.mark.asyncio +async def test_chat_completions_replay_mode_uses_traj_replay_provider(): + """In replay mode the proxy targets traj-replay/ instead of a real upstream.""" + config = ModelServiceConfig() + config.replay_traj_path = "/tmp/does-not-matter-for-this-test" + + local_app = FastAPI() + local_app.state.model_service_config = config + local_app.include_router(proxy_router) + + with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: + mock_acompletion.return_value = _fake_model_response() + + transport = ASGITransport(app=local_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} + response = await ac.post("/v1/chat/completions", json=payload) + + assert response.status_code == 200 + call_kwargs = 
mock_acompletion.call_args.kwargs + assert call_kwargs["model"] == "traj-replay/gpt-3.5-turbo" + assert call_kwargs["api_base"] is None + + +@pytest.mark.asyncio +async def test_chat_completions_strips_hop_by_hop_headers(): + """host / content-length / transfer-encoding etc. are not forwarded.""" + captured = {} - result = await perform_llm_request("http://fake.url", {}, {}, mock_config) + async def capture(*args, **kwargs): + captured.update(kwargs) + return _fake_model_response() - assert result.status_code == 200 - assert mock_post.call_count == 2 + with patch(ACOMPLETION_PATCH, new=capture): + transport = ASGITransport(app=test_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} + await ac.post( + "/v1/chat/completions", + json=payload, + headers={"Authorization": "Bearer abc", "X-Trace": "t1"}, + ) + + forwarded = captured["extra_headers"] + forwarded_lower = {k.lower() for k in forwarded} + assert "authorization" in forwarded_lower + assert "x-trace" in forwarded_lower + assert "host" not in forwarded_lower + assert "content-length" not in forwarded_lower + assert "content-type" not in forwarded_lower + assert "transfer-encoding" not in forwarded_lower @pytest.mark.asyncio async def test_lifespan_initialization_with_config(tmp_path): - """ - Test that the application correctly initializes and overrides defaults - when a valid configuration file path is provided. 
- """ + """Application initializes correctly when a valid config file is provided.""" conf_file = tmp_path / "proxy.yml" conf_file.write_text(yaml.dump({"proxy_rules": {"my-model": "http://custom-url"}, "request_timeout": 50})) - # Initialize App and load config from file config = ModelServiceConfig.from_file(str(conf_file)) app = FastAPI(lifespan=lambda app: lifespan(app, config)) async with lifespan(app, config): app_config = app.state.model_service_config - # Verify that the config reflects file content instead of defaults assert app_config.proxy_rules["my-model"] == "http://custom-url" assert app_config.request_timeout == 50 assert "gpt-3.5-turbo" not in app_config.proxy_rules @@ -202,67 +246,26 @@ async def test_lifespan_initialization_with_config(tmp_path): @pytest.mark.asyncio async def test_lifespan_initialization_no_config(): - """ - Test that the application initializes with default ModelServiceConfig - settings when no configuration file path is provided. - """ + """Defaults are loaded when no config file is provided.""" config = ModelServiceConfig() app = FastAPI(lifespan=lambda app: lifespan(app, config)) async with lifespan(app, config): app_config = app.state.model_service_config - # Verify that default rules (e.g., 'gpt-3.5-turbo') are loaded assert "gpt-3.5-turbo" in app_config.proxy_rules assert app_config.request_timeout == 120 @pytest.mark.asyncio async def test_lifespan_invalid_config_path(): - """ - Test that providing a non-existent configuration file path causes - ModelServiceConfig.from_file to raise a FileNotFoundError. 
- """ - # Expect FileNotFoundError when loading from non-existent file + """Non-existent config path → FileNotFoundError.""" with pytest.raises(FileNotFoundError): ModelServiceConfig.from_file("/tmp/non_existent_file.yml") -@pytest.mark.asyncio -async def test_proxy_base_url_overrides_proxy_rules(tmp_path): - """ - Test that when proxy_base_url is set, all requests are forwarded to that URL, - bypassing proxy_rules entirely. - """ - config = ModelServiceConfig() - config.proxy_base_url = "https://custom-endpoint.example.com/v1" - - test_app = FastAPI() - test_app.state.model_service_config = config - test_app.include_router(proxy_router) - - with patch("rock.sdk.model.server.api.proxy.perform_llm_request", new_callable=AsyncMock) as mock_request: - mock_resp = MagicMock(spec=Response) - mock_resp.status_code = 200 - mock_resp.json.return_value = {"id": "chat-123", "choices": []} - mock_request.return_value = mock_resp - - transport = ASGITransport(app=test_app) - async with AsyncClient(transport=transport, base_url="http://test") as ac: - # Even when requesting gpt-3.5-turbo, should forward to proxy_base_url - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) - - assert response.status_code == 200 - # Verify request was sent to proxy_base_url - call_args = mock_request.call_args[0] - assert call_args[0] == "https://custom-endpoint.example.com/v1/chat/completions" - - @pytest.mark.asyncio async def test_config_loads_host_and_port_from_file(tmp_path): - """ - Test that ModelServiceConfig correctly loads host and port from config file. 
- """ + """ModelServiceConfig loads host and port from config file.""" conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump({"host": "127.0.0.1", "port": 9000, "proxy_rules": {"my-model": "http://my-backend"}}) @@ -276,101 +279,59 @@ async def test_config_loads_host_and_port_from_file(tmp_path): def test_config_default_host_and_port(): - """ - Test default values for host and port. - """ config = ModelServiceConfig() - assert config.host == "0.0.0.0" assert config.port == 8080 @pytest.mark.asyncio async def test_config_loads_retryable_status_codes_from_file(tmp_path): - """ - Test that ModelServiceConfig correctly loads retryable_status_codes from config file. - """ conf_file = tmp_path / "proxy.yml" conf_file.write_text(yaml.dump({"retryable_status_codes": [429, 500, 502, 503]})) config = ModelServiceConfig.from_file(str(conf_file)) - assert config.retryable_status_codes == [429, 500, 502, 503] def test_config_default_retryable_status_codes(): - """ - Test default values for retryable_status_codes. - """ config = ModelServiceConfig() - assert config.retryable_status_codes == [429, 500] -@pytest.mark.asyncio -async def test_perform_llm_request_respects_custom_retryable_codes(): - """ - Test that custom retryable_status_codes are respected (502 retries, 401 does not). 
- """ +def test_config_default_traj_and_replay(): + """New traj/replay defaults: recording on, replay off.""" config = ModelServiceConfig() - config.retryable_status_codes = [502, 503, 504] # Custom retryable status codes - - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" - - with ( - patch(client_post_path, new_callable=AsyncMock) as mock_post, - patch("rock.utils.retry.asyncio.sleep", return_value=None), - ): - # 502 should retry (in custom list) - resp_502 = MagicMock(spec=Response) - resp_502.status_code = 502 - error_502 = HTTPStatusError("Bad Gateway", request=MagicMock(spec=Request), response=resp_502) - - resp_200 = MagicMock(spec=Response) - resp_200.status_code = 200 - resp_200.json.return_value = {"ok": True} - - # Sequence: 502 fail, then 200 success - mock_post.side_effect = [error_502, resp_200] - - result = await perform_llm_request("http://fake.url", {}, {}, config) - - assert result.status_code == 200 - assert mock_post.call_count == 2 + assert config.traj_enabled is True + assert config.traj_file is None + assert config.replay_traj_path is None + assert config.num_retries == 6 @pytest.mark.asyncio -async def test_perform_llm_request_non_retryable_code_not_retried(): - """ - Test that 401 (not in custom retryable_status_codes) does not trigger retry. 
- """ - config = ModelServiceConfig() - config.retryable_status_codes = [502, 503, 504] # Custom retryable status codes, excluding 401 - - client_post_path = "rock.sdk.model.server.api.proxy.http_client.post" - - with patch(client_post_path, new_callable=AsyncMock) as mock_post: - # 401 should not retry (not in custom list) - resp_401 = MagicMock(spec=Response) - resp_401.status_code = 401 - resp_401.json.return_value = {"error": "Invalid API Key"} - - mock_post.return_value = resp_401 - - result = await perform_llm_request("http://fake.url", {}, {}, config) +async def test_config_loads_traj_and_replay_from_file(tmp_path): + conf_file = tmp_path / "proxy.yml" + conf_file.write_text( + yaml.dump( + { + "traj_enabled": False, + "traj_file": "/tmp/my-traj.jsonl", + "replay_traj_path": "/tmp/in.jsonl", + "num_retries": 2, + } + ) + ) - assert result.status_code == 401 - assert mock_post.call_count == 1 # No retry + config = ModelServiceConfig.from_file(str(conf_file)) + assert config.traj_enabled is False + assert config.traj_file == "/tmp/my-traj.jsonl" + assert config.replay_traj_path == "/tmp/in.jsonl" + assert config.num_retries == 2 def test_cli_args_override_config_file(tmp_path): - """ - Test that CLI arguments override config file settings. - This tests the logic in create_config_from_args(). 
- """ + """CLI arguments override config file settings.""" import argparse - # Create args with config file and CLI parameters conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump( @@ -386,28 +347,47 @@ def test_cli_args_override_config_file(tmp_path): args = argparse.Namespace( config_file=str(conf_file), - host="0.0.0.0", # CLI overrides config file - port=9000, # CLI overrides config file - proxy_base_url="https://cli-url.example.com/v1", # CLI overrides config file - retryable_status_codes="502,503", # CLI overrides config file - request_timeout=30, # CLI overrides config file + host="0.0.0.0", + port=9000, + proxy_base_url="https://cli-url.example.com/v1", + retryable_status_codes="502,503", + request_timeout=30, + num_retries=4, + traj_file=None, ) config = create_config_from_args(args) - # Verify CLI arguments override config file assert config.host == "0.0.0.0" assert config.port == 9000 assert config.proxy_base_url == "https://cli-url.example.com/v1" assert config.retryable_status_codes == [502, 503] assert config.request_timeout == 30 + assert config.num_retries == 4 + + +def test_cli_traj_file_enables_replay(): + """--traj-file sets replay_traj_path (enabling replay) and disables recording.""" + import argparse + + args = argparse.Namespace( + config_file=None, + host=None, + port=None, + proxy_base_url=None, + retryable_status_codes=None, + request_timeout=None, + num_retries=None, + traj_file="/tmp/in.jsonl", + ) + + config = create_config_from_args(args) + assert config.replay_traj_path == "/tmp/in.jsonl" + assert config.traj_enabled is False @pytest.mark.asyncio async def test_config_file_overrides_defaults(tmp_path): - """ - Test that config file values override default values. 
- """ conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump( @@ -422,27 +402,20 @@ async def test_config_file_overrides_defaults(tmp_path): config = ModelServiceConfig.from_file(str(conf_file)) - # Verify config file overrides defaults assert config.host == "10.0.0.1" assert config.port == 8888 assert config.request_timeout == 300 assert config.proxy_rules["test-model"] == "http://test-backend" - # Verify other fields remain as defaults assert config.proxy_base_url is None def test_metrics_monitor_is_singleton(): - """ - Test that _get_or_create_metrics_monitor returns the same instance - on repeated calls (module-level singleton, created only once). - """ + """_get_or_create_metrics_monitor returns the same instance on repeated calls.""" import rock.sdk.model.server.utils as utils_module with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: mock_monitor = MagicMock() mock_cls.create.return_value = mock_monitor - - # Reset singleton so the test is isolated utils_module._metrics_monitor = None first = _get_or_create_metrics_monitor() @@ -450,15 +423,11 @@ def test_metrics_monitor_is_singleton(): assert first is second assert mock_cls.create.call_count == 1 - - # Cleanup utils_module._metrics_monitor = None def test_metrics_monitor_uses_env_endpoint(): - """ - Test that ROCK_METRICS_ENDPOINT env var is passed to MetricsMonitor.create(). 
- """ + """ROCK_METRICS_ENDPOINT env var is passed to MetricsMonitor.create().""" import rock.sdk.model.server.utils as utils_module custom_endpoint = "http://my-otel-collector:4318/v1/metrics" @@ -469,26 +438,19 @@ def test_metrics_monitor_uses_env_endpoint(): ): mock_monitor = MagicMock() mock_cls.create.return_value = mock_monitor - utils_module._metrics_monitor = None _get_or_create_metrics_monitor() - mock_cls.create.assert_called_once_with(metrics_endpoint=custom_endpoint) - utils_module._metrics_monitor = None def test_metrics_monitor_registers_gauge_and_counter(): - """ - Test that _get_or_create_metrics_monitor registers both - the RT gauge and request count counter on first creation. - """ + """_get_or_create_metrics_monitor registers both metrics on first creation.""" import rock.sdk.model.server.utils as utils_module with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: mock_monitor = MagicMock() mock_cls.create.return_value = mock_monitor - utils_module._metrics_monitor = None _get_or_create_metrics_monitor() @@ -498,16 +460,12 @@ def test_metrics_monitor_registers_gauge_and_counter(): mock_monitor._register_counter.assert_called_once_with( MODEL_SERVICE_REQUEST_COUNT, "total request count", "count" ) - utils_module._metrics_monitor = None @pytest.mark.asyncio async def test_record_traj_reports_rt_and_count(): - """ - Test that record_traj decorator calls record_gauge_by_name (RT) - and record_counter_by_name (count) with correct metric names and attributes. - """ + """Legacy record_traj decorator (still used by local mode) reports RT/count.""" import rock.sdk.model.server.utils as utils_module mock_monitor = MagicMock() @@ -542,15 +500,12 @@ async def fake_handler(body: dict): @pytest.mark.asyncio async def test_record_traj_sandbox_id_defaults_to_unknown(): - """ - Test that sandbox_id defaults to 'unknown' when ROCK_SANDBOX_ID is not set. 
- """ + """sandbox_id defaults to 'unknown' when ROCK_SANDBOX_ID is not set.""" import rock.sdk.model.server.utils as utils_module mock_monitor = MagicMock() with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, patch.dict("os.environ", {}, clear=False): - # Ensure ROCK_SANDBOX_ID is not set os_env = __import__("os").environ os_env.pop("ROCK_SANDBOX_ID", None) diff --git a/tests/unit/sdk/model/test_traj_recorder.py b/tests/unit/sdk/model/test_traj_recorder.py new file mode 100644 index 0000000000..c9b1c20197 --- /dev/null +++ b/tests/unit/sdk/model/test_traj_recorder.py @@ -0,0 +1,170 @@ +"""Tests for TrajectoryRecorder (litellm CustomLogger that writes JSONL + emits OTLP metrics).""" + +import json +from unittest.mock import MagicMock, patch + +import pytest + +from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + + +def _sample_payload(**overrides): + payload = { + "id": "chatcmpl-abc", + "trace_id": "trace-1", + "call_type": "acompletion", + "stream": False, + "status": "success", + "model": "gpt-3.5-turbo", + "model_id": None, + "model_group": None, + "api_base": "https://api.openai.com/v1", + "messages": [{"role": "user", "content": "hi"}], + "response": { + "id": "chatcmpl-abc", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "hello back"}, + "finish_reason": "stop", + } + ], + }, + "model_parameters": {"temperature": 0.7}, + "startTime": 100.0, + "endTime": 100.5, + "completionStartTime": 100.5, + "response_time": 0.5, + "total_tokens": 12, + "prompt_tokens": 4, + "completion_tokens": 8, + "metadata": {}, + } + payload.update(overrides) + return payload + + +@pytest.fixture +def mock_monitor(): + monitor = MagicMock() + with patch( + "rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", + return_value=monitor, + ): + yield monitor + + +@pytest.mark.asyncio +async def test_recorder_appends_each_call_as_jsonl_line(tmp_path, mock_monitor): + """Each 
successful call adds one JSONL line (always append-only).""" + traj_file = tmp_path / "traj.jsonl" + recorder = TrajectoryRecorder(traj_file=traj_file) + + payload_a = _sample_payload(id="a", trace_id="run-1") + payload_b = _sample_payload(id="b", trace_id="run-1") + + await recorder.async_log_success_event( + kwargs={"standard_logging_object": payload_a}, response_obj=None, start_time=0, end_time=1 + ) + await recorder.async_log_success_event( + kwargs={"standard_logging_object": payload_b}, response_obj=None, start_time=0, end_time=1 + ) + + lines = traj_file.read_text(encoding="utf-8").strip().split("\n") + assert len(lines) == 2 + assert json.loads(lines[0])["id"] == "a" + assert json.loads(lines[1])["id"] == "b" + + +@pytest.mark.asyncio +async def test_recorder_emits_metrics_with_sandbox_id(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = TrajectoryRecorder(traj_file=traj_file) + + with patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-xyz"}): + await recorder.async_log_success_event( + kwargs={"standard_logging_object": _sample_payload()}, + response_obj=None, + start_time=0, + end_time=1, + ) + + mock_monitor.record_gauge_by_name.assert_called_once() + gauge_args = mock_monitor.record_gauge_by_name.call_args + assert gauge_args.args[0] == "model_service.request.rt" + # response_time of 0.5s → 500 ms + assert gauge_args.args[1] == 500.0 + assert gauge_args.kwargs["attributes"]["status"] == "success" + assert gauge_args.kwargs["attributes"]["sandbox_id"] == "sandbox-xyz" + assert gauge_args.kwargs["attributes"]["type"] == "chat_completions" + + mock_monitor.record_counter_by_name.assert_called_once_with( + "model_service.request.count", 1, attributes=gauge_args.kwargs["attributes"] + ) + + +@pytest.mark.asyncio +async def test_recorder_records_failure_with_failure_status(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = TrajectoryRecorder(traj_file=traj_file) + + failed_payload = 
_sample_payload(status="failure", error_information={"error_class": "RateLimitError"}) + + await recorder.async_log_failure_event( + kwargs={"standard_logging_object": failed_payload}, + response_obj=None, + start_time=0, + end_time=1, + ) + + lines = traj_file.read_text(encoding="utf-8").strip().split("\n") + assert len(lines) == 1 + assert json.loads(lines[0])["status"] == "failure" + + gauge_args = mock_monitor.record_gauge_by_name.call_args + assert gauge_args.kwargs["attributes"]["status"] == "failure" + + +@pytest.mark.asyncio +async def test_recorder_skips_when_payload_missing(tmp_path, mock_monitor): + """If litellm doesn't attach a standard_logging_object, the recorder no-ops.""" + traj_file = tmp_path / "traj.jsonl" + recorder = TrajectoryRecorder(traj_file=traj_file) + + await recorder.async_log_success_event(kwargs={}, response_obj=None, start_time=0, end_time=1) + + assert not traj_file.exists() or traj_file.read_text() == "" + mock_monitor.record_gauge_by_name.assert_not_called() + mock_monitor.record_counter_by_name.assert_not_called() + + +@pytest.mark.asyncio +async def test_recorder_creates_parent_directory(tmp_path, mock_monitor): + traj_file = tmp_path / "deep" / "nested" / "traj.jsonl" + + recorder = TrajectoryRecorder(traj_file=traj_file) + await recorder.async_log_success_event( + kwargs={"standard_logging_object": _sample_payload()}, + response_obj=None, + start_time=0, + end_time=1, + ) + + assert traj_file.exists() + assert traj_file.parent.is_dir() + + +@pytest.mark.asyncio +async def test_recorder_falls_back_to_start_end_time_when_response_time_missing(tmp_path, mock_monitor): + traj_file = tmp_path / "traj.jsonl" + recorder = TrajectoryRecorder(traj_file=traj_file) + + payload = _sample_payload(startTime=10.0, endTime=10.25) + payload.pop("response_time", None) + + await recorder.async_log_success_event( + kwargs={"standard_logging_object": payload}, response_obj=None, start_time=0, end_time=1 + ) + + gauge_args = 
mock_monitor.record_gauge_by_name.call_args + assert abs(gauge_args.args[1] - 250.0) < 1e-6 diff --git a/tests/unit/sdk/model/test_traj_replayer.py b/tests/unit/sdk/model/test_traj_replayer.py new file mode 100644 index 0000000000..7bfe30ef4e --- /dev/null +++ b/tests/unit/sdk/model/test_traj_replayer.py @@ -0,0 +1,204 @@ +"""Tests for SequentialCursor + TrajectoryReplayer.""" + +import json +from types import SimpleNamespace + +import pytest +from litellm.llms.custom_llm import CustomLLMError + +from rock.sdk.model.server.integrations.traj_replayer import ( + SequentialCursor, + TrajectoryReplayer, +) + + +def _record(*, msg: str, model: str = "gpt-3.5-turbo", call_id: str = "x") -> dict: + """Build a minimal StandardLoggingPayload-shaped record.""" + return { + "id": call_id, + "model": model, + "messages": [{"role": "user", "content": msg}], + "response": { + "id": call_id, + "object": "chat.completion", + "model": model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": f"reply: {msg}"}, + "finish_reason": "stop", + } + ], + "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}, + }, + } + + +def _write_jsonl(path, records): + with path.open("w", encoding="utf-8") as f: + for r in records: + f.write(json.dumps(r) + "\n") + + +# ----- SequentialCursor ----- + + +def test_cursor_load_from_single_file(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a"), _record(msg="b")]) + + cur = SequentialCursor.load(p) + assert cur.total == 2 + assert cur.position == 0 + + +def test_cursor_load_skips_empty_lines(tmp_path): + p = tmp_path / "traj.jsonl" + p.write_text( + json.dumps(_record(msg="a")) + "\n\n \n" + json.dumps(_record(msg="b")) + "\n", + encoding="utf-8", + ) + + cur = SequentialCursor.load(p) + assert cur.total == 2 + + +def test_cursor_load_missing_file_raises(tmp_path): + with pytest.raises(FileNotFoundError): + SequentialCursor.load(tmp_path / "missing.jsonl") + + +def 
test_cursor_load_directory_raises(tmp_path): + """A directory is no longer a valid traj_file — must point to a single .jsonl.""" + with pytest.raises(FileNotFoundError): + SequentialCursor.load(tmp_path) + + +@pytest.mark.asyncio +async def test_cursor_next_returns_records_in_order(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a", call_id="1"), _record(msg="b", call_id="2")]) + + cur = SequentialCursor.load(p) + first = await cur.next() + second = await cur.next() + + assert first["id"] == "1" + assert second["id"] == "2" + assert cur.position == 2 + + +@pytest.mark.asyncio +async def test_cursor_next_raises_when_exhausted(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="only")]) + + cur = SequentialCursor.load(p) + await cur.next() + + with pytest.raises(CustomLLMError) as exc_info: + await cur.next() + assert exc_info.value.status_code == 404 + assert "exhausted" in exc_info.value.message + + +@pytest.mark.asyncio +async def test_cursor_reset_replays_from_start(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a"), _record(msg="b")]) + + cur = SequentialCursor.load(p) + await cur.next() + await cur.next() + cur.reset() + + again = await cur.next() + assert again["messages"][0]["content"] == "a" + + +@pytest.mark.asyncio +async def test_cursor_model_mismatch_only_warns(tmp_path, caplog): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a", model="gpt-3.5-turbo")]) + + cur = SequentialCursor.load(p) + record = await cur.next(expected_model="gpt-4o") # different model -> warn but don't raise + assert record["id"] == "x" + + +# ----- TrajectoryReplayer ----- + + +@pytest.mark.asyncio +async def test_replayer_acompletion_returns_recorded_response(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="a", call_id="step-1")]) + + replayer = TrajectoryReplayer(p) + response = await replayer.acompletion( + model="gpt-3.5-turbo", + messages=[{"role": "user", "content": 
"anything"}], + ) + + assert response.id == "step-1" + assert response.choices[0].message.content == "reply: a" + + +@pytest.mark.asyncio +async def test_replayer_acompletion_advances_cursor(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl( + p, + [ + _record(msg="a", call_id="step-1"), + _record(msg="b", call_id="step-2"), + ], + ) + + replayer = TrajectoryReplayer(p) + r1 = await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) + r2 = await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) + + assert r1.id == "step-1" + assert r2.id == "step-2" + + +@pytest.mark.asyncio +async def test_replayer_astreaming_yields_chunks_that_recompose_the_text(tmp_path): + """The chunks produced by astreaming should reassemble into the recorded text.""" + p = tmp_path / "traj.jsonl" + recorded_text = "Hello world, this is a deterministic replay." + record = _record(msg="hi") + record["response"]["choices"][0]["message"]["content"] = recorded_text + _write_jsonl(p, [record]) + + replayer = TrajectoryReplayer(p) + + # Build a litellm-shaped ModelResponse mock with one Choice/Delta slot. 
+ fake_choice = SimpleNamespace(delta=SimpleNamespace(role=None, content=None), index=0) + fake_response = SimpleNamespace(choices=[fake_choice]) + + chunks_text = [] + async for chunk in replayer.astreaming( + model="gpt-3.5-turbo", + messages=[], + model_response=fake_response, + ): + if hasattr(chunk, "choices") and chunk.choices and getattr(chunk.choices[0], "delta", None): + piece = chunk.choices[0].delta.content + if piece: + chunks_text.append(piece) + + assert "".join(chunks_text) == recorded_text + + +@pytest.mark.asyncio +async def test_replayer_acompletion_raises_on_exhaustion(tmp_path): + p = tmp_path / "traj.jsonl" + _write_jsonl(p, [_record(msg="only")]) + + replayer = TrajectoryReplayer(p) + await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) + + with pytest.raises(CustomLLMError): + await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) diff --git a/uv.lock b/uv.lock index e00a7f86b3..6a3efc2ba4 100644 --- a/uv.lock +++ b/uv.lock @@ -1196,6 +1196,15 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/33/6b/e0547afaf41bf2c42e52430072fa5658766e3d65bd4b03a563d1b6336f57/distlib-0.4.0-py2.py3-none-any.whl", hash = "sha256:9659f7d87e46584a30b5780e43ac7a2143098441670ff0a49d5f9034c54a6c16" }, ] +[[package]] +name = "distro" +version = "1.9.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2" }, +] + [[package]] name = "docker" version = "7.1.0" @@ -1294,6 +1303,69 @@ wheels = [ { url = 
"https://mirrors.aliyun.com/pypi/packages/2e/7a/c11883a98676e74a405d6503d65f58c3fa076ddd9c0cee6044884f6eac38/fastcore-1.8.15-py3-none-any.whl", hash = "sha256:d005d10d7ee5c2abb7ac0544da7c9f0a0a2f7706b48892a27c1906487ca6dea9" }, ] +[[package]] +name = "fastuuid" +version = "0.14.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/c3/7d/d9daedf0f2ebcacd20d599928f8913e9d2aea1d56d2d355a93bfa2b611d7/fastuuid-0.14.0.tar.gz", hash = "sha256:178947fc2f995b38497a74172adee64fdeb8b7ec18f2a5934d037641ba265d26" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/ad/b2/731a6696e37cd20eed353f69a09f37a984a43c9713764ee3f7ad5f57f7f9/fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:6e6243d40f6c793c3e2ee14c13769e341b90be5ef0c23c82fa6515a96145181a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c5/79/c73c47be2a3b8734d16e628982653517f80bbe0570e27185d91af6096507/fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:13ec4f2c3b04271f62be2e1ce7e95ad2dd1cf97e94503a3760db739afbd48f00" }, + { url = "https://mirrors.aliyun.com/pypi/packages/24/c5/84c1eea05977c8ba5173555b0133e3558dc628bcf868d6bf1689ff14aedc/fastuuid-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b2fdd48b5e4236df145a149d7125badb28e0a383372add3fbaac9a6b7a394470" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0e/23/4e362367b7fa17dbed646922f216b9921efb486e7abe02147e4b917359f8/fastuuid-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f74631b8322d2780ebcf2d2d75d58045c3e9378625ec51865fe0b5620800c39d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b2/72/3985be633b5a428e9eaec4287ed4b873b7c4c53a9639a8b416637223c4cd/fastuuid-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:83cffc144dc93eb604b87b179837f2ce2af44871a7b323f2bfed40e8acb40ba8" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/b3/6d/6ef192a6df34e2266d5c9deb39cd3eea986df650cbcfeaf171aa52a059c3/fastuuid-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a771f135ab4523eb786e95493803942a5d1fc1610915f131b363f55af53b219" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9d/11/8a2ea753c68d4fece29d5d7c6f3f903948cc6e82d1823bc9f7f7c0355db3/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:4edc56b877d960b4eda2c4232f953a61490c3134da94f3c28af129fb9c62a4f6" }, + { url = "https://mirrors.aliyun.com/pypi/packages/23/42/7a32c93b6ce12642d9a152ee4753a078f372c9ebb893bc489d838dd4afd5/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:bcc96ee819c282e7c09b2eed2b9bd13084e3b749fdb2faf58c318d498df2efbe" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b9/e9/a5f6f686b46e3ed4ed3b93770111c233baac87dd6586a411b4988018ef1d/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:7a3c0bca61eacc1843ea97b288d6789fbad7400d16db24e36a66c28c268cfe3d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b4/c9/18abc73c9c5b7fc0e476c1733b678783b2e8a35b0be9babd423571d44e98/fastuuid-0.14.0-cp310-cp310-win32.whl", hash = "sha256:7f2f3efade4937fae4e77efae1af571902263de7b78a0aee1a1653795a093b2a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5e/8a/d9e33f4eb4d4f6d9f2c5c7d7e96b5cdbb535c93f3b1ad6acce97ee9d4bf8/fastuuid-0.14.0-cp310-cp310-win_amd64.whl", hash = "sha256:ae64ba730d179f439b0736208b4c279b8bc9c089b102aec23f86512ea458c8a4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/98/f3/12481bda4e5b6d3e698fbf525df4443cc7dce746f246b86b6fcb2fba1844/fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:73946cb950c8caf65127d4e9a325e2b6be0442a224fd51ba3b6ac44e1912ce34" }, + { url = "https://mirrors.aliyun.com/pypi/packages/59/19/2fc58a1446e4d72b655648eb0879b04e88ed6fa70d474efcf550f640f6ec/fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", 
hash = "sha256:12ac85024637586a5b69645e7ed986f7535106ed3013640a393a03e461740cb7" }, + { url = "https://mirrors.aliyun.com/pypi/packages/78/29/3c74756e5b02c40cfcc8b1d8b5bac4edbd532b55917a6bcc9113550e99d1/fastuuid-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:05a8dde1f395e0c9b4be515b7a521403d1e8349443e7641761af07c7ad1624b1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/52/96/d761da3fccfa84f0f353ce6e3eb8b7f76b3aa21fd25e1b00a19f9c80a063/fastuuid-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09378a05020e3e4883dfdab438926f31fea15fd17604908f3d39cbeb22a0b4dc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fc/c2/f84c90167cc7765cb82b3ff7808057608b21c14a38531845d933a4637307/fastuuid-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbb0c4b15d66b435d2538f3827f05e44e2baafcc003dd7d8472dc67807ab8fd8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/af/7b/4bacd03897b88c12348e7bd77943bac32ccf80ff98100598fcff74f75f2e/fastuuid-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cd5a7f648d4365b41dbf0e38fe8da4884e57bed4e77c83598e076ac0c93995e7" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c0/a2/584f2c29641df8bd810d00c1f21d408c12e9ad0c0dafdb8b7b29e5ddf787/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:c0a94245afae4d7af8c43b3159d5e3934c53f47140be0be624b96acd672ceb73" }, + { url = "https://mirrors.aliyun.com/pypi/packages/24/68/c6b77443bb7764c760e211002c8638c0c7cce11cb584927e723215ba1398/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:2b29e23c97e77c3a9514d70ce343571e469098ac7f5a269320a0f0b3e193ab36" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5a/87/93f553111b33f9bb83145be12868c3c475bf8ea87c107063d01377cc0e8e/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1e690d48f923c253f28151b3a6b4e335f2b06bf669c68a02665bc150b7839e94" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/9e/8c/a04d486ca55b5abb7eaa65b39df8d891b7b1635b22db2163734dc273579a/fastuuid-0.14.0-cp311-cp311-win32.whl", hash = "sha256:a6f46790d59ab38c6aa0e35c681c0484b50dc0acf9e2679c005d61e019313c24" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9c/b2/2d40bf00820de94b9280366a122cbaa60090c8cf59e89ac3938cf5d75895/fastuuid-0.14.0-cp311-cp311-win_amd64.whl", hash = "sha256:e150eab56c95dc9e3fefc234a0eedb342fac433dacc273cd4d150a5b0871e1fa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/02/a2/e78fcc5df65467f0d207661b7ef86c5b7ac62eea337c0c0fcedbeee6fb13/fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77e94728324b63660ebf8adb27055e92d2e4611645bf12ed9d88d30486471d0a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2b/b3/c846f933f22f581f558ee63f81f29fa924acd971ce903dab1a9b6701816e/fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:caa1f14d2102cb8d353096bc6ef6c13b2c81f347e6ab9d6fbd48b9dea41c153d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/54/ea/682551030f8c4fa9a769d9825570ad28c0c71e30cf34020b85c1f7ee7382/fastuuid-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d23ef06f9e67163be38cece704170486715b177f6baae338110983f99a72c070" }, + { url = "https://mirrors.aliyun.com/pypi/packages/14/dd/5927f0a523d8e6a76b70968e6004966ee7df30322f5fc9b6cdfb0276646a/fastuuid-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0c9ec605ace243b6dbe3bd27ebdd5d33b00d8d1d3f580b39fdd15cd96fd71796" }, + { url = "https://mirrors.aliyun.com/pypi/packages/16/6e/c0fb547eef61293153348f12e0f75a06abb322664b34a1573a7760501336/fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:808527f2407f58a76c916d6aa15d58692a4a019fdf8d4c32ac7ff303b7d7af09" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/2d/b1/b9c75e03b768f61cf2e84ee193dc18601aeaf89a4684b20f2f0e9f52b62c/fastuuid-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fb3c0d7fef6674bbeacdd6dbd386924a7b60b26de849266d1ff6602937675c8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fc/fa/f7395fdac07c7a54f18f801744573707321ca0cee082e638e36452355a9d/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab3f5d36e4393e628a4df337c2c039069344db5f4b9d2a3c9cea48284f1dd741" }, + { url = "https://mirrors.aliyun.com/pypi/packages/66/49/c9fd06a4a0b1f0f048aacb6599e7d96e5d6bc6fa680ed0d46bf111929d1b/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:b9a0ca4f03b7e0b01425281ffd44e99d360e15c895f1907ca105854ed85e2057" }, + { url = "https://mirrors.aliyun.com/pypi/packages/be/9c/909e8c95b494e8e140e8be6165d5fc3f61fdc46198c1554df7b3e1764471/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:3acdf655684cc09e60fb7e4cf524e8f42ea760031945aa8086c7eae2eeeabeb8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/90/eb/d29d17521976e673c55ef7f210d4cdd72091a9ec6755d0fd4710d9b3c871/fastuuid-0.14.0-cp312-cp312-win32.whl", hash = "sha256:9579618be6280700ae36ac42c3efd157049fe4dd40ca49b021280481c78c3176" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cc/fc/f5c799a6ea6d877faec0472d0b27c079b47c86b1cdc577720a5386483b36/fastuuid-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:d9e4332dc4ba054434a9594cbfaf7823b57993d7d8e7267831c3e059857cf397" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a5/83/ae12dd39b9a39b55d7f90abb8971f1a5f3c321fd72d5aa83f90dc67fe9ed/fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77a09cb7427e7af74c594e409f7731a0cf887221de2f698e1ca0ebf0f3139021" }, + { url = "https://mirrors.aliyun.com/pypi/packages/53/b0/a4b03ff5d00f563cc7546b933c28cb3f2a07344b2aec5834e874f7d44143/fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", 
hash = "sha256:9bd57289daf7b153bfa3e8013446aa144ce5e8c825e9e366d455155ede5ea2dc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9c/6d/64aee0a0f6a58eeabadd582e55d0d7d70258ffdd01d093b30c53d668303b/fastuuid-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:ac60fc860cdf3c3f327374db87ab8e064c86566ca8c49d2e30df15eda1b0c2d5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/f5/a7e9cda8369e4f7919d36552db9b2ae21db7915083bc6336f1b0082c8b2e/fastuuid-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ab32f74bd56565b186f036e33129da77db8be09178cd2f5206a5d4035fb2a23f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f0/d3/8ce11827c783affffd5bd4d6378b28eb6cc6d2ddf41474006b8d62e7448e/fastuuid-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33e678459cf4addaedd9936bbb038e35b3f6b2061330fd8f2f6a1d80414c0f87" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a2/51/680fb6352d0bbade04036da46264a8001f74b7484e2fd1f4da9e3db1c666/fastuuid-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1e3cc56742f76cd25ecb98e4b82a25f978ccffba02e4bdce8aba857b6d85d87b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fa/7c/2014b5785bd8ebdab04ec857635ebd84d5ee4950186a577db9eff0fb8ff6/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:cb9a030f609194b679e1660f7e32733b7a0f332d519c5d5a6a0a580991290022" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/d2/524d4ceeba9160e7a9bc2ea3e8f4ccf1ad78f3bde34090ca0c51f09a5e91/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:09098762aad4f8da3a888eb9ae01c84430c907a297b97166b8abc07b640f2995" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/17/354d04951ce114bf4afc78e27a18cfbd6ee319ab1829c2d5fb5e94063ac6/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:1383fff584fa249b16329a059c68ad45d030d5a4b70fb7c73a08d98fd53bcdab" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/fb/be/d7be8670151d16d88f15bb121c5b66cdb5ea6a0c2a362d0dcf30276ade53/fastuuid-0.14.0-cp313-cp313-win32.whl", hash = "sha256:a0809f8cc5731c066c909047f9a314d5f536c871a7a22e815cc4967c110ac9ad" }, + { url = "https://mirrors.aliyun.com/pypi/packages/22/1d/5573ef3624ceb7abf4a46073d3554e37191c868abc3aecd5289a72f9810a/fastuuid-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:0df14e92e7ad3276327631c9e7cec09e32572ce82089c55cb1bb8df71cf394ed" }, + { url = "https://mirrors.aliyun.com/pypi/packages/16/c9/8c7660d1fe3862e3f8acabd9be7fc9ad71eb270f1c65cce9a2b7a31329ab/fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:b852a870a61cfc26c884af205d502881a2e59cc07076b60ab4a951cc0c94d1ad" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4c/f4/a989c82f9a90d0ad995aa957b3e572ebef163c5299823b4027986f133dfb/fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:c7502d6f54cd08024c3ea9b3514e2d6f190feb2f46e6dbcd3747882264bb5f7b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/da/6c/a1a24f73574ac995482b1326cf7ab41301af0fabaa3e37eeb6b3df00e6e2/fastuuid-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1ca61b592120cf314cfd66e662a5b54a578c5a15b26305e1b8b618a6f22df714" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1a/20/2a9b59185ba7a6c7b37808431477c2d739fcbdabbf63e00243e37bd6bf49/fastuuid-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa75b6657ec129d0abded3bec745e6f7ab642e6dba3a5272a68247e85f5f316f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/33/4105ca574f6ded0af6a797d39add041bcfb468a1255fbbe82fcb6f592da2/fastuuid-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8a0dfea3972200f72d4c7df02c8ac70bad1bb4c58d7e0ec1e6f341679073a7f" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/fe/8c/fca59f8e21c4deb013f574eae05723737ddb1d2937ce87cb2a5d20992dc3/fastuuid-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1bf539a7a95f35b419f9ad105d5a8a35036df35fdafae48fb2fd2e5f318f0d75" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cb/e2/f78c271b909c034d429218f2798ca4e89eeda7983f4257d7865976ddbb6c/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:9a133bf9cc78fdbd1179cb58a59ad0100aa32d8675508150f3658814aeefeaa4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1e/f0/5ff209d865897667a2ff3e7a572267a9ced8f7313919f6d6043aed8b1caa/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_i686.whl", hash = "sha256:f54d5b36c56a2d5e1a31e73b950b28a0d83eb0c37b91d10408875a5a29494bad" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e0/c8/2ce1c78f983a2c4987ea865d9516dbdfb141a120fd3abb977ae6f02ba7ca/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:ec27778c6ca3393ef662e2762dba8af13f4ec1aaa32d08d77f71f2a70ae9feb8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/df/60/dad662ec9a33b4a5fe44f60699258da64172c39bd041da2994422cdc40fe/fastuuid-0.14.0-cp314-cp314-win32.whl", hash = "sha256:e23fc6a83f112de4be0cc1990e5b127c27663ae43f866353166f87df58e73d06" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1f/f6/da4db31001e854025ffd26bc9ba0740a9cbba2c3259695f7c5834908b336/fastuuid-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:df61342889d0f5e7a32f7284e55ef95103f2110fee433c2ae7c2c0956d76ac8a" }, +] + [[package]] name = "filelock" version = "3.20.0" @@ -1919,6 +1991,121 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12" }, ] +[[package]] +name = "jinja2" +version = "3.1.6" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = 
"markupsafe" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67" }, +] + +[[package]] +name = "jiter" +version = "0.14.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/6e/c1/0cddc6eb17d4c53a99840953f95dd3accdc5cfc7a337b0e9b26476276be9/jiter-0.14.0.tar.gz", hash = "sha256:e8a39e66dac7153cf3f964a12aad515afa8d74938ec5cc0018adcdae5367c79e" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/64/2e/a9959997739c403378d0a4a3a1c4ed80b60aeace216c4d37b303a9fc60a4/jiter-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:02f36a5c700f105ac04a6556fe664a59037a2c200db3b7e88784fac2ddf02531" }, + { url = "https://mirrors.aliyun.com/pypi/packages/27/72/b6de8a531e0adbadd839bec301165feb1fccf00e9ff55073ba2dd20f0043/jiter-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:41eab6c09ceffb6f0fe25e214b3068146edb1eda3649ca2aee2a061029c7ba2e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/db/d8/2040b9efa13c917f855c40890ae4119fe02c25b7c7677d5b4fa820a851fc/jiter-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5cf4d4c109641f9cfaf4a7b6aebd51654e405cd00fa9ebbf87163b8b97b325aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/49/62/655c0ad5ce6a8e90f9068c175b8a236877d753e460762b3183c136db1c5b/jiter-0.14.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b80c7b41a628e6be2213ad0ece763c5f88aa5ee003fa394d58acaaee1f4b8342" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/f1/66/549c40fa068f08710b7570869c306a051eb67a29758bd64f4114f730554c/jiter-0.14.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fb3dbf7cc0d4dbe73cce307ebe7eefa7f73a7d3d854dd119ea0c243f03e40927" }, + { url = "https://mirrors.aliyun.com/pypi/packages/25/2f/97a32a05fed14ed58a18e181fdfb619e05163f3726b54ee6080ec0539c09/jiter-0.14.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:7054adcdeb06b46efd17b5734f75817a44a2d06d3748e36c3a023a1bb52af9ec" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2a/3b/4347e1d6c2a973d653bbb7a2d671a2d2426e54b52ba735b8ff0d0a29b75c/jiter-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d597cd1bf6790376f3fffc7c708766e57301d99a19314824ea0ccc9c3c70e1e2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/24/ca452fbf2ea33548ed30ce68a39a50442d3f7c9bf0704a7af958a930c057/jiter-0.14.0-cp310-cp310-manylinux_2_31_riscv64.whl", hash = "sha256:df63a14878da754427926281626fd3ee249424a186e25a274e78176d42945264" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/a3/94470a0d199287caabeb4da2bb2ae5f6d17f3cf05dfc975d7cb064d58e0f/jiter-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4ea73187627bcc5810e085df715e8a99da8bdfd96a7eb36b4b4df700ba6d4c9c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cf/71/6768edc09d7c45c39f093feb3de105fa718a3e982b5208b8a2ed6382b44b/jiter-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:9f541eaf7bb8382367a1a23d6fc3d6aad57f8dd8c18c3c17f838bee20f217220" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3d/6b/5c2e17559a0f4e96e934479f7137df46c939e983fa05244e674815befb73/jiter-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:107465250de4fce00fdb47166bcd51df8e634e049541174fe3c71848e44f52ce" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/b1/83/c25f3556a60fc74d11199100f1b6cc0c006b815c8494dea8ca16fe398732/jiter-0.14.0-cp310-cp310-win32.whl", hash = "sha256:ffb2a08a406465bb076b7cc1df41d833106d3cf7905076cc73f0cb90078c7d10" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2e/99/781a1b413f0989b7f2ea203b094b331685f1a35e52e0a45e5d000ecaab27/jiter-0.14.0-cp310-cp310-win_amd64.whl", hash = "sha256:cb8b682d10cb0cce7ff4c1af7244af7022c9b01ae16d46c357bdd0df13afb25d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/8a/1f/198ae537fccb7080a0ed655eb56abf64a92f79489dfbf79f40fa34225bcd/jiter-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:7e791e247b8044512e070bd1f3633dc08350d32776d2d6e7473309d0edf256a2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cf/34/da67cff3fce964a36d03c3e365fb0f8726ade2a6cfd4d3c70107e216ead6/jiter-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:71527ce13fd5a0c4e40ad37331f8c547177dbb2dd0a93e5278b6a5eecf748804" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ed/36/4c72e67180d4e71a4f5dcf7886d0840e83c49ab11788172177a77570326e/jiter-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:02c4a7ab56f746014874f2c525584c0daca1dec37f66fd707ecef3b7e5c2228c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/db/9b39e09ceafa9878235c0fc29e3e3f9b12a4c6a98ea3085b998cadf3accc/jiter-0.14.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:376e9dafff914253bb9d46cdc5f7965607fbe7feb0a491c34e35f92b2770702e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b0/96/0dcba1d7a82c1b720774b48ef239376addbaf30df24c34742ac4a57b67b2/jiter-0.14.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:23ad2a7a9da1935575c820428dd8d2490ce4d23189691ce33da1fc0a58e14e1c" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/f1/e3/f61b71543e746e6b8b805e7755814fc242715c16f1dba58e1cbccb8032c2/jiter-0.14.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:54b3ddf5786bc7732d293bba3411ac637ecfa200a39983166d1df86a59a43c9f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/5e/0ddeb7096aca099114abe36c4921016e8d251e6f35f5890240b31f1f60ae/jiter-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5c001d5a646c2a50dc055dd526dad5d5245969e8234d2b1131d0451e81f3a373" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e9/d1/fe0c46cd7fda9cad8f1ff9ad217dc61f1e4280b21052ec6dfe88c1446ef2/jiter-0.14.0-cp311-cp311-manylinux_2_31_riscv64.whl", hash = "sha256:834bb5bdabca2e91592a03d373838a8d0a1b8bbde7077ae6913fd2fc51812d00" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ac/21/f5317f91729b501019184771c80d60abd89907009e7bfa6c7e348c5bdd44/jiter-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4e9178be60e229b1b2b0710f61b9e24d1f4f8556985a83ff4c4f95920eea7314" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e9/05/79d8f33fb2bf168db0df5c9cd16fe440a8ada57e929d3677b22712c2568f/jiter-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:a7e4ccff04ec03614e62c613e976a3a5860dc9714ce8266f44328bdc8b1cab2c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5c/00/d1e3ff3d2a465e67f08507d74bafb2dcd29eba91dc939820e39e8dea38b8/jiter-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:69539d936fb5d55caf6ecd33e2e884de083ff0ea28579780d56c4403094bb8d9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/5b/bbb2189f62ace8d95e869aa4c84c9946616f301e2d02895a6f20dcc3bba3/jiter-0.14.0-cp311-cp311-win32.whl", hash = "sha256:4927d09b3e572787cc5e0a5318601448e1ab9391bcef95677f5840c2d00eaa6d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b8/86/c500b53dcbf08575f5963e536ebd757a1f7c568272ba5d180b212c9a87fb/jiter-0.14.0-cp311-cp311-win_amd64.whl", hash = 
"sha256:42d6ed359ac49eb922fdd565f209c57340aa06d589c84c8413e42a0f9ae1b842" }, + { url = "https://mirrors.aliyun.com/pypi/packages/75/4a/a676249049d42cb29bef82233e4fe0524d414cbe3606c7a4b311193c2f77/jiter-0.14.0-cp311-cp311-win_arm64.whl", hash = "sha256:6dd689f5f4a5a33747b28686e051095beb214fe28cfda5e9fe58a295a788f593" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5a/68/7390a418f10897da93b158f2d5a8bd0bcd73a0f9ec3bb36917085bb759ef/jiter-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:2fb2ce3a7bc331256dfb14cefc34832366bb28a9aca81deaf43bbf2a5659e607" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/a0/5854ac00ff63551c52c6c89534ec6aba4b93474e7924d64e860b1c94165b/jiter-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:5252a7ca23785cef5d02d4ece6077a1b556a410c591b379f82091c3001e14844" }, + { url = "https://mirrors.aliyun.com/pypi/packages/41/a1/4f44832650a16b18e8391f1bf1d6ca4909bc738351826bcc198bba4357f4/jiter-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c409578cbd77c338975670ada777add4efd53379667edf0aceea730cabede6fb" }, + { url = "https://mirrors.aliyun.com/pypi/packages/48/64/a329e9d469f86307203594b1707e11ae51c3348d03bfd514a5f997870012/jiter-0.14.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7ede4331a1899d604463369c730dbb961ffdc5312bc7f16c41c2896415b1304a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/94/c1/5e3dfc59635aa4d4c7bd20a820ac1d09b8ed851568356802cf1c08edb3cf/jiter-0.14.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:92cd8b6025981a041f5310430310b55b25ca593972c16407af8837d3d7d2ca01" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/1b/dd157009dbc058f7b00108f545ccb72a2d56461395c4fc7b9cfdccb00af4/jiter-0.14.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:351bf6eda4e3a7ceb876377840c702e9a3e4ecc4624dbfb2d6463c67ae52637d" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/91/78/256013667b7c10b8834f8e6e54cd3e562d4c6e34227a1596addccc05e38c/jiter-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c1dcfbeb93d9ecd9ca128bbf8910120367777973fa193fb9a39c31237d8df165" }, + { url = "https://mirrors.aliyun.com/pypi/packages/de/d9/137d65ade9093a409fe80955ce60b12bb753722c986467aeda47faf450ad/jiter-0.14.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:ae039aaef8de3f8157ecc1fdd4d85043ac4f57538c245a0afaecb8321ec951c3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2e/48/76750835b87029342727c1a268bea8878ab988caf81ee4e7b880900eeb5a/jiter-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:7d9d51eb96c82a9652933bd769fe6de66877d6eb2b2440e281f2938c51b5643e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a6/60/456c4e81d5c8045279aefe60e9e483be08793828800a4e64add8fdde7f2a/jiter-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:d824ca4148b705970bf4e120924a212fdfca9859a73e42bd7889a63a4ea6bb98" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a8/9f/2020e0984c235f678dced38fe4eec3058cf528e6af36ebf969b410305941/jiter-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:ff3a6465b3a0f54b1a430f45c3c0ba7d61ceb45cbc3e33f9e1a7f638d690baf3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/32/e2d298e1a22a4bbe6062136d1c7192db7dba003a6975e51d9a9eecabc4c2/jiter-0.14.0-cp312-cp312-win32.whl", hash = "sha256:5dec7c0a3e98d2a3f8a2e67382d0d7c3ac60c69103a4b271da889b4e8bb1e129" }, + { url = "https://mirrors.aliyun.com/pypi/packages/36/ac/96369141b3d8a4a8e4590e983085efe1c436f35c0cda940dd76d942e3e40/jiter-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:fc7e37b4b8bc7e80a63ad6cfa5fc11fab27dbfea4cc4ae644b1ab3f273dc348f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/c3/75d847f264647017d7e3052bbcc8b1e24b95fa139c320c5f5066fa7a0bdd/jiter-0.14.0-cp312-cp312-win_arm64.whl", hash = 
"sha256:ee4a72f12847ef29b072aee9ad5474041ab2924106bdca9fcf5d7d965853e057" }, + { url = "https://mirrors.aliyun.com/pypi/packages/97/2a/09f70020898507a89279659a1afe3364d57fc1b2c89949081975d135f6f5/jiter-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:af72f204cf4d44258e5b4c1745130ac45ddab0e71a06333b01de660ab4187a94" }, + { url = "https://mirrors.aliyun.com/pypi/packages/d6/be/080c96a45cd74f9fce5db4fd68510b88087fb37ffe2541ff73c12db92535/jiter-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:4b77da71f6e819be5fbcec11a453fde5b1d0267ef6ed487e2a392fd8e14e4e3a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7d/5e/2d0fee155826a968a832cc32438de5e2a193292c8721ca70d0b53e58245b/jiter-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:77f4ea612fe8b84b8b04e51d0e78029ecf3466348e25973f953de6e6a59aa4c1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/70/af/bf9ee0d3a4f8dc0d679fc1337f874fe60cdbf841ebbb304b374e1c9aaceb/jiter-0.14.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:62fe2451f8fcc0240261e6a4df18ecbcd58327857e61e625b2393ea3b468aac9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0f/83/8e8561eadba31f4d3948a5b712fb0447ec71c3560b57a855449e7b8ddc98/jiter-0.14.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6112f26f5afc75bcb475787d29da3aa92f9d09c7858f632f4be6ffe607be82e9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f6/c9/c5299e826a5fe6108d172b344033f61c69b1bb979dd8d9ddd4278a160971/jiter-0.14.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:215a6cb8fb7dc702aa35d475cc00ddc7f970e5c0b1417fb4b4ac5d82fa2a29db" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5d/37/c16d9d15c0a471b8644b1abe3c82668092a707d9bedcf076f24ff2e380cd/jiter-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fc4ab96a30fb3cb2c7e0cd33f7616c8860da5f5674438988a54ac717caccdbaa" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/58/ea/8050cb0dc654e728e1bfacbc0c640772f2181af5dedd13ae70145743a439/jiter-0.14.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:3a99c1387b1f2928f799a9de899193484d66206a50e98233b6b088a7f0c1edb2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b0/3b/cf71506d270e5f84d97326bf220e47aed9b95e9a4a060758fb07772170ab/jiter-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:ab18d11074485438695f8d34a1b6da61db9754248f96d51341956607a8f39985" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b0/cc/8c6c74a3efb5bd671bfd14f51e8a73375464ca914b1551bc3b40e26ac2c9/jiter-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:801028dcfc26ac0895e4964cbc0fd62c73be9fd4a7d7b1aaf6e5790033a719b7" }, + { url = "https://mirrors.aliyun.com/pypi/packages/41/24/68d7b883ec959884ddf00d019b2e0e82ba81b167e1253684fa90519ce33c/jiter-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:ad425b087aafb4a1c7e1e98a279200743b9aaf30c3e0ba723aec93f061bd9bc8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b6/89/b1a0985223bbf3150ff9e8f46f98fc9360c1de94f48abe271bbe1b465682/jiter-0.14.0-cp313-cp313-win32.whl", hash = "sha256:882bcb9b334318e233950b8be366fe5f92c86b66a7e449e76975dfd6d776a01f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4c/19/3f339a5a7f14a11730e67f6be34f9d5105751d547b615ef593fa122a5ded/jiter-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:9b8c571a5dba09b98bd3462b5a53f27209a5cbbe85670391692ede71974e979f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/50/56/752dd89c84be0e022a8ea3720bcfa0a8431db79a962578544812ce061739/jiter-0.14.0-cp313-cp313-win_arm64.whl", hash = "sha256:34f19dcc35cb1abe7c369b3756babf8c7f04595c0807a848df8f26ef8298ef92" }, + { url = "https://mirrors.aliyun.com/pypi/packages/91/28/292916f354f25a1fe8cf2c918d1415c699a4a659ae00be0430e1c5d9ffea/jiter-0.14.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = 
"sha256:e89bcd7d426a75bb4952c696b267075790d854a07aad4c9894551a82c5b574ab" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/c7/b002a7d8b8957ac3d469bd59c18ef4b1595a5216ae0de639a287b9816023/jiter-0.14.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7b25beaa0d4447ea8c7ae0c18c688905d34840d7d0b937f2f7bdd52162c98a40" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f9/3b/f8d07580d8706021d255a6356b8fab13ee4c869412995550ce6ed4ddf97d/jiter-0.14.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:651a8758dd413c51e3b7f6557cdc6921faf70b14106f45f969f091f5cda990ea" }, + { url = "https://mirrors.aliyun.com/pypi/packages/47/5b/ac1a974da29e35507230383110ffec59998b290a8732585d04e19a9eb5ba/jiter-0.14.0-cp313-cp313t-win_amd64.whl", hash = "sha256:e1a7eead856a5038a8d291f1447176ab0b525c77a279a058121b5fccee257f6f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/96/6d/9fc8433d667d2454271378a79747d8c76c10b51b482b454e6190e511f244/jiter-0.14.0-cp313-cp313t-win_arm64.whl", hash = "sha256:2e692633a12cda97e352fdcd1c4acc971b1c28707e1e33aeef782b0cbf051975" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4f/1e/354ed92461b165bd581f9ef5150971a572c873ec3b68a916d5aa91da3cc2/jiter-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:6f396837fc7577871ca8c12edaf239ed9ccef3bbe39904ae9b8b63ce0a48b140" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a6/95/8c7c7028aa8636ac21b7a55faef3e34215e6ed0cbf5ae58258427f621aa3/jiter-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a4d50ea3d8ba4176f79754333bd35f1bbcd28e91adc13eb9b7ca91bc52a6cef9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/47/40/e2a852a44c4a089f2681a16611b7ce113224a80fd8504c46d78491b47220/jiter-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce17f8a050447d1b4153bda4fb7d26e6a9e74eb4f4a41913f30934c5075bf615" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/fc/1f/670f92adee1e9895eac41e8a4d623b6da68c4d46249d8b556b60b63f949e/jiter-0.14.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f4f1c4b125e1652aefbc2e2c1617b60a160ab789d180e3d423c41439e5f32850" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/2f/541c9ba567d05de1c4874a0f8f8c5e3fd78e2b874266623da9a775cf46e0/jiter-0.14.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be808176a6a3a14321d18c603f2d40741858a7c4fc982f83232842689fe86dd9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ce/a9/c31cbec09627e0d5de7aeaec7690dba03e090caa808fefd8133137cf45bc/jiter-0.14.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26679d58ba816f88c3849306dd58cb863a90a1cf352cdd4ef67e30ccf8a77994" }, + { url = "https://mirrors.aliyun.com/pypi/packages/50/02/3c05c1666c41904a2f607475a73e7a4763d1cbde2d18229c4f85b22dc253/jiter-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80381f5a19af8fa9aef743f080e34f6b25ebd89656475f8cf0470ec6157052aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7d/97/e15b33545c2b13518f560d695f974b9891b311641bdcf178d63177e8801e/jiter-0.14.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:004df5fdb8ecbd6d99f3227df18ba1a259254c4359736a2e6f036c944e02d7c5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/d2/8b1461def6b96ba44530df20d07ef7a1c7da22f3f9bf1727e2d611077bf1/jiter-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cff5708f7ed0fa098f2b53446c6fa74c48469118e5cd7497b4f1cd569ab06928" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/88/837566dd6ed6e452e8d3205355afd484ce44b2533edfa4ed73a298ea893e/jiter-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:2492e5f06c36a976d25c7cc347a60e26d5470178d44cde1b9b75e60b4e519f28" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/89/6b/b00b45c4d1b4c031777fe161d620b755b5b02cdade1e316dcb46e4471d63/jiter-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7609cfbe3a03d37bfdbf5052012d5a879e72b83168a363deae7b3a26564d57de" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ad/d8/6fe5b42011d19397433d345716eac16728ac241862a2aac9c91923c7509a/jiter-0.14.0-cp314-cp314-win32.whl", hash = "sha256:7282342d32e357543565286b6450378c3cd402eea333fc1ebe146f1fabb306fc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e5/43/5c2e08da1efad5e410f0eaaabeadd954812612c33fbbd8fd5328b489139d/jiter-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:bd77945f38866a448e73b0b7637366afa814d4617790ecd88a18ca74377e6c02" }, + { url = "https://mirrors.aliyun.com/pypi/packages/aa/1f/6e39ac0b4cdfa23e606af5b245df5f9adaa76f35e0c5096790da430ca506/jiter-0.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:f2d4c61da0821ee42e0cdf5489da60a6d074306313a377c2b35af464955a3611" }, + { url = "https://mirrors.aliyun.com/pypi/packages/05/57/7dbc0ffbbb5176a27e3518716608aa464aee2e2887dc938f0b900a120449/jiter-0.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1bf7ff85517dd2f20a5750081d2b75083c1b269cf75afc7511bdf1f9548beb3b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/83/6e/7b3314398d8983f06b557aa21b670511ec72d3b79a68ee5e4d9bff972286/jiter-0.14.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c8ef8791c3e78d6c6b157c6d360fbb5c715bebb8113bc6a9303c5caff012754a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ae/4f/8dc674bcd7db6dba566de73c08c763c337058baff1dbeb34567045b27cdc/jiter-0.14.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e74663b8b10da1fe0f4e4703fd7980d24ad17174b6bb35d8498d6e3ebce2ae6a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3b/5f/188e09a1f20906f98bbdec44ed820e19f4e8eb8aff88b9d1a5a497587ff3/jiter-0.14.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = 
"sha256:1aca29ba52913f78362ec9c2da62f22cdc4c3083313403f90c15460979b84d9b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ac/f0/19046ef965ed8f349e8554775bb12ff4352f443fbe12b95d31f575891256/jiter-0.14.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8b39b7d87a952b79949af5fef44d2544e58c21a28da7f1bae3ef166455c61746" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c4/c3/da43bd8431ee175695777ee78cf0e93eacbb47393ff493f18c45231b427d/jiter-0.14.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:78d918a68b26e9fab068c2b5453577ef04943ab2807b9a6275df2a812599a310" }, + { url = "https://mirrors.aliyun.com/pypi/packages/72/26/e054771be889707c6161dbdec9c23d33a9ec70945395d70f07cfea1e9a6f/jiter-0.14.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:b08997c35aee1201c1a5361466a8fb9162d03ae7bf6568df70b6c859f1e654a4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c3/0f/7bea65ea2a6d91f2bf989ff11a18136644392bf2b0497a1fa50934c30a9c/jiter-0.14.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:260bf7ca20704d58d41f669e5e9fe7fe2fa72901a6b324e79056f5d52e9c9be2" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3c/a1/b1ff7d70deef61ac0b7c6c2f12d2ace950cdeecb4fdc94500a0926802857/jiter-0.14.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:37826e3df29e60f30a382f9294348d0238ef127f4b5d7f5f8da78b5b9e050560" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0b/7b/3b0649983cbaf15eda26a414b5b1982e910c67bd6f7b1b490f3cfc76896a/jiter-0.14.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:645be49c46f2900937ba0eaf871ad5183c96858c0af74b6becc7f4e367e36e06" }, + { url = "https://mirrors.aliyun.com/pypi/packages/97/f8/33d78c83bd93ae0c0af05293a6660f88a1977caef39a6d72a84afab94ce0/jiter-0.14.0-cp314-cp314t-win32.whl", hash = "sha256:2f7877ed45118de283786178eceaf877110abacd04fde31efff3940ae9672674" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/d6/ac/2b760516c03e2227826d1f7025d89bf6bf6357a28fe75c2a2800873c50bf/jiter-0.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:14c0cb10337c49f5eafe8e7364daca5e29a020ea03580b8f8e6c597fed4e1588" }, + { url = "https://mirrors.aliyun.com/pypi/packages/dc/2e/a44c20c58aeed0355f2d326969a181696aeb551a25195f47563908a815be/jiter-0.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:5419d4aa2024961da9fe12a9cfe7484996735dca99e8e090b5c88595ef1951ff" }, + { url = "https://mirrors.aliyun.com/pypi/packages/32/a1/ef34ca2cab2962598591636a1804b93645821201cc0095d4a93a9a329c9d/jiter-0.14.0-graalpy311-graalpy242_311_native-macosx_10_12_x86_64.whl", hash = "sha256:a25ffa2dbbdf8721855612f6dca15c108224b12d0c4024d0ac3d7902132b4211" }, + { url = "https://mirrors.aliyun.com/pypi/packages/60/bb/520576a532a6b8a6f42747afed289c8448c879a34d7802fe2c832d4fd38f/jiter-0.14.0-graalpy311-graalpy242_311_native-macosx_11_0_arm64.whl", hash = "sha256:0ac9cbaa86c10996b92bd12c91659b60f939f8e28fcfa6bc11a0e90a774ce95b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b2/7c/c16db114ea1f2f532f198aa8dc39585026af45af362c69a0492f31bc4821/jiter-0.14.0-graalpy311-graalpy242_311_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:844e73b6c56b505e9e169234ea3bdea2ea43f769f847f47ac559ba1d2361ebea" }, + { url = "https://mirrors.aliyun.com/pypi/packages/99/8f/15e7741ff19e9bcd4d753f7ff22f988fd54592f134ca13701c13ea8c20e0/jiter-0.14.0-graalpy311-graalpy242_311_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e52c076f187405fc21523c746c04399c9af8ece566077ed147b2126f2bcba577" }, + { url = "https://mirrors.aliyun.com/pypi/packages/21/42/9042c3f3019de4adcb8c16591c325ec7255beea9fcd33a42a43f3b0b1000/jiter-0.14.0-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:fbd9e482663ca9d005d051330e4d2d8150bb208a209409c10f7e7dfdf7c49da9" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/60/cf/a7e19b308bd86bb04776803b1f01a5f9a287a4c55205f4708827ee487fbf/jiter-0.14.0-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:33a20d838b91ef376b3a56896d5b04e725c7df5bc4864cc6569cf046a8d73b6d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ca/44/e26ede3f0caeff93f222559cb0cc4ca68579f07d009d7b6010c5b586f9b1/jiter-0.14.0-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:432c4db5255d86a259efde91e55cb4c8d18c0521d844c9e2e7efcce3899fb016" }, + { url = "https://mirrors.aliyun.com/pypi/packages/da/e9/1f9ada30cef7b05e74bb06f52127e7a724976c225f46adb65c37b1dadfb6/jiter-0.14.0-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:67f00d94b281174144d6532a04b66a12cb866cbdc47c3af3bfe2973677f9861a" }, +] + [[package]] name = "jmespath" version = "0.10.0" @@ -2122,6 +2309,29 @@ antlr4-13-2 = [ { name = "antlr4-python3-runtime" }, ] +[[package]] +name = "litellm" +version = "1.82.6" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = "aiohttp" }, + { name = "click" }, + { name = "fastuuid" }, + { name = "httpx" }, + { name = "importlib-metadata" }, + { name = "jinja2" }, + { name = "jsonschema" }, + { name = "openai" }, + { name = "pydantic" }, + { name = "python-dotenv" }, + { name = "tiktoken" }, + { name = "tokenizers" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/29/75/1c537aa458426a9127a92bc2273787b2f987f4e5044e21f01f2eed5244fd/litellm-1.82.6.tar.gz", hash = "sha256:2aa1c2da21fe940c33613aa447119674a3ad4d2ad5eb064e4d5ce5ee42420136" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/02/6c/5327667e6dbe9e98cbfbd4261c8e91386a52e38f41419575854248bbab6a/litellm-1.82.6-py3-none-any.whl", hash = "sha256:164a3ef3e19f309e3cabc199bef3d2045212712fefdfa25fc7f75884a5b5b205" }, +] + [[package]] name = "magiccube" version = "0.3.0" @@ -2146,6 +2356,91 @@ 
wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/94/54/e7d793b573f298e1c9013b8c4dade17d481164aa517d1d7148619c2cedbf/markdown_it_py-4.0.0-py3-none-any.whl", hash = "sha256:87327c59b172c5011896038353a81343b6754500a08cd7a4973bb48c6d578147" }, ] +[[package]] +name = "markupsafe" +version = "3.0.3" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/7e/99/7690b6d4034fffd95959cbe0c02de8deb3098cc577c67bb6a24fe5d7caa7/markupsafe-3.0.3.tar.gz", hash = "sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/e8/4b/3541d44f3937ba468b75da9eebcae497dcf67adb65caa16760b0a6807ebb/markupsafe-3.0.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f981d352f04553a7171b8e44369f2af4055f888dfb147d55e42d29e29e74559" }, + { url = "https://mirrors.aliyun.com/pypi/packages/98/1b/fbd8eed11021cabd9226c37342fa6ca4e8a98d8188a8d9b66740494960e4/markupsafe-3.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e1c1493fb6e50ab01d20a22826e57520f1284df32f2d8601fdd90b6304601419" }, + { url = "https://mirrors.aliyun.com/pypi/packages/40/01/e560d658dc0bb8ab762670ece35281dec7b6c1b33f5fbc09ebb57a185519/markupsafe-3.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1ba88449deb3de88bd40044603fafffb7bc2b055d626a330323a9ed736661695" }, + { url = "https://mirrors.aliyun.com/pypi/packages/af/cd/ce6e848bbf2c32314c9b237839119c5a564a59725b53157c856e90937b7a/markupsafe-3.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f42d0984e947b8adf7dd6dde396e720934d12c506ce84eea8476409563607591" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c9/2a/b5c12c809f1c3045c4d580b035a743d12fcde53cf685dbc44660826308da/markupsafe-3.0.3-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = 
"sha256:c0c0b3ade1c0b13b936d7970b1d37a57acde9199dc2aecc4c336773e1d86049c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cf/e3/9427a68c82728d0a88c50f890d0fc072a1484de2f3ac1ad0bfc1a7214fd5/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/36/23578f29e9e582a4d0278e009b38081dbe363c5e7165113fad546918a232/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:d2ee202e79d8ed691ceebae8e0486bd9a2cd4794cec4824e1c99b6f5009502f6" }, + { url = "https://mirrors.aliyun.com/pypi/packages/56/21/dca11354e756ebd03e036bd8ad58d6d7168c80ce1fe5e75218e4945cbab7/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:177b5253b2834fe3678cb4a5f0059808258584c559193998be2601324fdeafb1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/87/99/faba9369a7ad6e4d10b6a5fbf71fa2a188fe4a593b15f0963b73859a1bbd/markupsafe-3.0.3-cp310-cp310-win32.whl", hash = "sha256:2a15a08b17dd94c53a1da0438822d70ebcd13f8c3a95abe3a9ef9f11a94830aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/d6/25/55dc3ab959917602c96985cb1253efaa4ff42f71194bddeb61eb7278b8be/markupsafe-3.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:c4ffb7ebf07cfe8931028e3e4c85f0357459a3f9f9490886198848f4fa002ec8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/d0/9e/0a02226640c255d1da0b8d12e24ac2aa6734da68bff14c05dd53b94a0fc3/markupsafe-3.0.3-cp310-cp310-win_arm64.whl", hash = "sha256:e2103a929dfa2fcaf9bb4e7c091983a49c9ac3b19c9061b6d5427dd7d14d81a1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/08/db/fefacb2136439fc8dd20e797950e749aa1f4997ed584c62cfb8ef7c2be0e/markupsafe-3.0.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1cc7ea17a6824959616c525620e387f6dd30fec8cb44f649e31712db02123dad" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/e1/2e/5898933336b61975ce9dc04decbc0a7f2fee78c30353c5efba7f2d6ff27a/markupsafe-3.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4bd4cd07944443f5a265608cc6aab442e4f74dff8088b0dfc8238647b8f6ae9a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1d/09/adf2df3699d87d1d8184038df46a9c80d78c0148492323f4693df54e17bb/markupsafe-3.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b5420a1d9450023228968e7e6a9ce57f65d148ab56d2313fcd589eee96a7a50" }, + { url = "https://mirrors.aliyun.com/pypi/packages/30/ac/0273f6fcb5f42e314c6d8cd99effae6a5354604d461b8d392b5ec9530a54/markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0bf2a864d67e76e5c9a34dc26ec616a66b9888e25e7b9460e1c76d3293bd9dbf" }, + { url = "https://mirrors.aliyun.com/pypi/packages/19/ae/31c1be199ef767124c042c6c3e904da327a2f7f0cd63a0337e1eca2967a8/markupsafe-3.0.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc51efed119bc9cfdf792cdeaa4d67e8f6fcccab66ed4bfdd6bde3e59bfcbb2f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b2/76/7edcab99d5349a4532a459e1fe64f0b0467a3365056ae550d3bcf3f79e1e/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a4/28/6e74cdd26d7514849143d69f0bf2399f929c37dc2b31e6829fd2045b2765/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:7be7b61bb172e1ed687f1754f8e7484f1c8019780f6f6b0786e76bb01c2ae115" }, + { url = "https://mirrors.aliyun.com/pypi/packages/62/7e/a145f36a5c2945673e590850a6f8014318d5577ed7e5920a4b3448e0865d/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f9e130248f4462aaa8e2552d547f36ddadbeaa573879158d721bbd33dfe4743a" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/0f/62/d9c46a7f5c9adbeeeda52f5b8d802e1094e9717705a645efc71b0913a0a8/markupsafe-3.0.3-cp311-cp311-win32.whl", hash = "sha256:0db14f5dafddbb6d9208827849fad01f1a2609380add406671a26386cdf15a19" }, + { url = "https://mirrors.aliyun.com/pypi/packages/83/8a/4414c03d3f891739326e1783338e48fb49781cc915b2e0ee052aa490d586/markupsafe-3.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:de8a88e63464af587c950061a5e6a67d3632e36df62b986892331d4620a35c01" }, + { url = "https://mirrors.aliyun.com/pypi/packages/35/73/893072b42e6862f319b5207adc9ae06070f095b358655f077f69a35601f0/markupsafe-3.0.3-cp311-cp311-win_arm64.whl", hash = "sha256:3b562dd9e9ea93f13d53989d23a7e775fdfd1066c33494ff43f5418bc8c58a5c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5a/72/147da192e38635ada20e0a2e1a51cf8823d2119ce8883f7053879c2199b5/markupsafe-3.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d53197da72cc091b024dd97249dfc7794d6a56530370992a5e1a08983ad9230e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9a/81/7e4e08678a1f98521201c3079f77db69fb552acd56067661f8c2f534a718/markupsafe-3.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1872df69a4de6aead3491198eaf13810b565bdbeec3ae2dc8780f14458ec73ce" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1e/2c/799f4742efc39633a1b54a92eec4082e4f815314869865d876824c257c1e/markupsafe-3.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3a7e8ae81ae39e62a41ec302f972ba6ae23a5c5396c8e60113e9066ef893da0d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3c/2e/8d0c2ab90a8c1d9a24f0399058ab8519a3279d1bd4289511d74e909f060e/markupsafe-3.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6dd0be5b5b189d31db7cda48b91d7e0a9795f31430b7f271219ab30f1d3ac9d" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/2c/54/887f3092a85238093a0b2154bd629c89444f395618842e8b0c41783898ea/markupsafe-3.0.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:94c6f0bb423f739146aec64595853541634bde58b2135f27f61c1ffd1cd4d16a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c9/2f/336b8c7b6f4a4d95e91119dc8521402461b74a485558d8f238a68312f11c/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:be8813b57049a7dc738189df53d69395eba14fb99345e0a5994914a3864c8a4b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/32/43/67935f2b7e4982ffb50a4d169b724d74b62a3964bc1a9a527f5ac4f1ee2b/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:83891d0e9fb81a825d9a6d61e3f07550ca70a076484292a70fde82c4b807286f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/89/e0/4486f11e51bbba8b0c041098859e869e304d1c261e59244baa3d295d47b7/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:77f0643abe7495da77fb436f50f8dab76dbc6e5fd25d39589a0f1fe6548bfa2b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2f/e1/78ee7a023dac597a5825441ebd17170785a9dab23de95d2c7508ade94e0e/markupsafe-3.0.3-cp312-cp312-win32.whl", hash = "sha256:d88b440e37a16e651bda4c7c2b930eb586fd15ca7406cb39e211fcff3bf3017d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/aa/5b/bec5aa9bbbb2c946ca2733ef9c4ca91c91b6a24580193e891b5f7dbe8e1e/markupsafe-3.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:26a5784ded40c9e318cfc2bdb30fe164bdb8665ded9cd64d500a34fb42067b1c" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e5/f1/216fc1bbfd74011693a4fd837e7026152e89c4bcf3e77b6692fba9923123/markupsafe-3.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:35add3b638a5d900e807944a078b51922212fb3dedb01633a8defc4b01a3c85f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/38/2f/907b9c7bbba283e68f20259574b13d005c121a0fa4c175f9bed27c4597ff/markupsafe-3.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = 
"sha256:e1cf1972137e83c5d4c136c43ced9ac51d0e124706ee1c8aa8532c1287fa8795" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9c/d9/5f7756922cdd676869eca1c4e3c0cd0df60ed30199ffd775e319089cb3ed/markupsafe-3.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:116bb52f642a37c115f517494ea5feb03889e04df47eeff5b130b1808ce7c219" }, + { url = "https://mirrors.aliyun.com/pypi/packages/00/07/575a68c754943058c78f30db02ee03a64b3c638586fba6a6dd56830b30a3/markupsafe-3.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:133a43e73a802c5562be9bbcd03d090aa5a1fe899db609c29e8c8d815c5f6de6" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a9/21/9b05698b46f218fc0e118e1f8168395c65c8a2c750ae2bab54fc4bd4e0e8/markupsafe-3.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ccfcd093f13f0f0b7fdd0f198b90053bf7b2f02a3927a30e63f3ccc9df56b676" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7f/71/544260864f893f18b6827315b988c146b559391e6e7e8f7252839b1b846a/markupsafe-3.0.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:509fa21c6deb7a7a273d629cf5ec029bc209d1a51178615ddf718f5918992ab9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c2/28/b50fc2f74d1ad761af2f5dcce7492648b983d00a65b8c0e0cb457c82ebbe/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:a4afe79fb3de0b7097d81da19090f4df4f8d3a2b3adaa8764138aac2e44f3af1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ed/76/104b2aa106a208da8b17a2fb72e033a5a9d7073c68f7e508b94916ed47a9/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:795e7751525cae078558e679d646ae45574b47ed6e7771863fcc079a6171a0fc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b5/99/16a5eb2d140087ebd97180d95249b00a03aa87e29cc224056274f2e45fd6/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = 
"sha256:8485f406a96febb5140bfeca44a73e3ce5116b2501ac54fe953e488fb1d03b12" }, + { url = "https://mirrors.aliyun.com/pypi/packages/19/bc/e7140ed90c5d61d77cea142eed9f9c303f4c4806f60a1044c13e3f1471d0/markupsafe-3.0.3-cp313-cp313-win32.whl", hash = "sha256:bdd37121970bfd8be76c5fb069c7751683bdf373db1ed6c010162b2a130248ed" }, + { url = "https://mirrors.aliyun.com/pypi/packages/05/73/c4abe620b841b6b791f2edc248f556900667a5a1cf023a6646967ae98335/markupsafe-3.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:9a1abfdc021a164803f4d485104931fb8f8c1efd55bc6b748d2f5774e78b62c5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f0/3a/fa34a0f7cfef23cf9500d68cb7c32dd64ffd58a12b09225fb03dd37d5b80/markupsafe-3.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:7e68f88e5b8799aa49c85cd116c932a1ac15caaa3f5db09087854d218359e485" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e4/d7/e05cd7efe43a88a17a37b3ae96e79a19e846f3f456fe79c57ca61356ef01/markupsafe-3.0.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:218551f6df4868a8d527e3062d0fb968682fe92054e89978594c28e642c43a73" }, + { url = "https://mirrors.aliyun.com/pypi/packages/99/9e/e412117548182ce2148bdeacdda3bb494260c0b0184360fe0d56389b523b/markupsafe-3.0.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:3524b778fe5cfb3452a09d31e7b5adefeea8c5be1d43c4f810ba09f2ceb29d37" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/e6/fa0ffcda717ef64a5108eaa7b4f5ed28d56122c9a6d70ab8b72f9f715c80/markupsafe-3.0.3-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4e885a3d1efa2eadc93c894a21770e4bc67899e3543680313b09f139e149ab19" }, + { url = "https://mirrors.aliyun.com/pypi/packages/96/ec/2102e881fe9d25fc16cb4b25d5f5cde50970967ffa5dddafdb771237062d/markupsafe-3.0.3-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8709b08f4a89aa7586de0aadc8da56180242ee0ada3999749b183aa23df95025" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/4b/30/6f2fce1f1f205fc9323255b216ca8a235b15860c34b6798f810f05828e32/markupsafe-3.0.3-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b8512a91625c9b3da6f127803b166b629725e68af71f8184ae7e7d54686a56d6" }, + { url = "https://mirrors.aliyun.com/pypi/packages/58/47/4a0ccea4ab9f5dcb6f79c0236d954acb382202721e704223a8aafa38b5c8/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9b79b7a16f7fedff2495d684f2b59b0457c3b493778c9eed31111be64d58279f" }, + { url = "https://mirrors.aliyun.com/pypi/packages/6a/70/3780e9b72180b6fecb83a4814d84c3bf4b4ae4bf0b19c27196104149734c/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:12c63dfb4a98206f045aa9563db46507995f7ef6d83b2f68eda65c307c6829eb" }, + { url = "https://mirrors.aliyun.com/pypi/packages/98/c5/c03c7f4125180fc215220c035beac6b9cb684bc7a067c84fc69414d315f5/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:8f71bc33915be5186016f675cd83a1e08523649b0e33efdb898db577ef5bb009" }, + { url = "https://mirrors.aliyun.com/pypi/packages/80/d6/2d1b89f6ca4bff1036499b1e29a1d02d282259f3681540e16563f27ebc23/markupsafe-3.0.3-cp313-cp313t-win32.whl", hash = "sha256:69c0b73548bc525c8cb9a251cddf1931d1db4d2258e9599c28c07ef3580ef354" }, + { url = "https://mirrors.aliyun.com/pypi/packages/2b/98/e48a4bfba0a0ffcf9925fe2d69240bfaa19c6f7507b8cd09c70684a53c1e/markupsafe-3.0.3-cp313-cp313t-win_amd64.whl", hash = "sha256:1b4b79e8ebf6b55351f0d91fe80f893b4743f104bff22e90697db1590e47a218" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0e/72/e3cc540f351f316e9ed0f092757459afbc595824ca724cbc5a5d4263713f/markupsafe-3.0.3-cp313-cp313t-win_arm64.whl", hash = "sha256:ad2cf8aa28b8c020ab2fc8287b0f823d0a7d8630784c31e9ee5edea20f406287" }, + { url = "https://mirrors.aliyun.com/pypi/packages/33/8a/8e42d4838cd89b7dde187011e97fe6c3af66d8c044997d2183fbd6d31352/markupsafe-3.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = 
"sha256:eaa9599de571d72e2daf60164784109f19978b327a3910d3e9de8c97b5b70cfe" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b5/64/7660f8a4a8e53c924d0fa05dc3a55c9cee10bbd82b11c5afb27d44b096ce/markupsafe-3.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c47a551199eb8eb2121d4f0f15ae0f923d31350ab9280078d1e5f12b249e0026" }, + { url = "https://mirrors.aliyun.com/pypi/packages/da/ef/e648bfd021127bef5fa12e1720ffed0c6cbb8310c8d9bea7266337ff06de/markupsafe-3.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f34c41761022dd093b4b6896d4810782ffbabe30f2d443ff5f083e0cbbb8c737" }, + { url = "https://mirrors.aliyun.com/pypi/packages/41/3c/a36c2450754618e62008bf7435ccb0f88053e07592e6028a34776213d877/markupsafe-3.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:457a69a9577064c05a97c41f4e65148652db078a3a509039e64d3467b9e7ef97" }, + { url = "https://mirrors.aliyun.com/pypi/packages/bc/20/b7fdf89a8456b099837cd1dc21974632a02a999ec9bf7ca3e490aacd98e7/markupsafe-3.0.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e8afc3f2ccfa24215f8cb28dcf43f0113ac3c37c2f0f0806d8c70e4228c5cf4d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9a/a7/591f592afdc734f47db08a75793a55d7fbcc6902a723ae4cfbab61010cc5/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ec15a59cf5af7be74194f7ab02d0f59a62bdcf1a537677ce67a2537c9b87fcda" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7d/33/45b24e4f44195b26521bc6f1a82197118f74df348556594bd2262bda1038/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:0eb9ff8191e8498cca014656ae6b8d61f39da5f95b488805da4bb029cccbfbaf" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ff/0e/53dfaca23a69fbfbbf17a4b64072090e70717344c52eaaaa9c5ddff1e5f0/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = 
"sha256:2713baf880df847f2bece4230d4d094280f4e67b1e813eec43b4c0e144a34ffe" }, + { url = "https://mirrors.aliyun.com/pypi/packages/46/11/f333a06fc16236d5238bfe74daccbca41459dcd8d1fa952e8fbd5dccfb70/markupsafe-3.0.3-cp314-cp314-win32.whl", hash = "sha256:729586769a26dbceff69f7a7dbbf59ab6572b99d94576a5592625d5b411576b9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/28/52/182836104b33b444e400b14f797212f720cbc9ed6ba34c800639d154e821/markupsafe-3.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:bdc919ead48f234740ad807933cdf545180bfbe9342c2bb451556db2ed958581" }, + { url = "https://mirrors.aliyun.com/pypi/packages/6f/18/acf23e91bd94fd7b3031558b1f013adfa21a8e407a3fdb32745538730382/markupsafe-3.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:5a7d5dc5140555cf21a6fefbdbf8723f06fcd2f63ef108f2854de715e4422cb4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/3c/f0/57689aa4076e1b43b15fdfa646b04653969d50cf30c32a102762be2485da/markupsafe-3.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:1353ef0c1b138e1907ae78e2f6c63ff67501122006b0f9abad68fda5f4ffc6ab" }, + { url = "https://mirrors.aliyun.com/pypi/packages/89/c3/2e67a7ca217c6912985ec766c6393b636fb0c2344443ff9d91404dc4c79f/markupsafe-3.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1085e7fbddd3be5f89cc898938f42c0b3c711fdcb37d75221de2666af647c175" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f0/00/be561dce4e6ca66b15276e184ce4b8aec61fe83662cce2f7d72bd3249d28/markupsafe-3.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1b52b4fb9df4eb9ae465f8d0c228a00624de2334f216f178a995ccdcf82c4634" }, + { url = "https://mirrors.aliyun.com/pypi/packages/50/09/c419f6f5a92e5fadde27efd190eca90f05e1261b10dbd8cbcb39cd8ea1dc/markupsafe-3.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fed51ac40f757d41b7c48425901843666a6677e3e8eb0abcff09e4ba6e664f50" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/22/44/a0681611106e0b2921b3033fc19bc53323e0b50bc70cffdd19f7d679bb66/markupsafe-3.0.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f190daf01f13c72eac4efd5c430a8de82489d9cff23c364c3ea822545032993e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5f/57/1b0b3f100259dc9fffe780cfb60d4be71375510e435efec3d116b6436d43/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e56b7d45a839a697b5eb268c82a71bd8c7f6c94d6fd50c3d577fa39a9f1409f5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/26/6a/4bf6d0c97c4920f1597cc14dd720705eca0bf7c787aebc6bb4d1bead5388/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:f3e98bb3798ead92273dc0e5fd0f31ade220f59a266ffd8a4f6065e0a3ce0523" }, + { url = "https://mirrors.aliyun.com/pypi/packages/14/c7/ca723101509b518797fedc2fdf79ba57f886b4aca8a7d31857ba3ee8281f/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5678211cb9333a6468fb8d8be0305520aa073f50d17f089b5b4b477ea6e67fdc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fb/df/5bd7a48c256faecd1d36edc13133e51397e41b73bb77e1a69deab746ebac/markupsafe-3.0.3-cp314-cp314t-win32.whl", hash = "sha256:915c04ba3851909ce68ccc2b8e2cd691618c4dc4c4232fb7982bca3f41fd8c3d" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1a/8a/0402ba61a2f16038b48b39bccca271134be00c5c9f0f623208399333c448/markupsafe-3.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4faffd047e07c38848ce017e8725090413cd80cbc23d86e55c587bf979e579c9" }, + { url = "https://mirrors.aliyun.com/pypi/packages/70/bc/6f1c2f612465f5fa89b95bead1f44dcb607670fd42891d8fdcd5d039f4f4/markupsafe-3.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa" }, +] + [[package]] name = "math-verify" version = "0.8.0" @@ -2653,6 +2948,25 @@ wheels = [ { url = 
"https://mirrors.aliyun.com/pypi/packages/be/9c/92789c596b8df838baa98fa71844d84283302f7604ed565dafe5a6b5041a/oauthlib-3.3.1-py3-none-any.whl", hash = "sha256:88119c938d2b8fb88561af5f6ee0eec8cc8d552b7bb1f712743136eb7523b7a1" }, ] +[[package]] +name = "openai" +version = "2.36.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = "anyio" }, + { name = "distro" }, + { name = "httpx" }, + { name = "jiter" }, + { name = "pydantic" }, + { name = "sniffio" }, + { name = "tqdm" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/f4/a1/4d5e84cf51720fc1526cc49e10ac1961abcccb55b0efb3d970db1e9a2728/openai-2.36.0.tar.gz", hash = "sha256:139dea0edd2f1b30c33d46ae1a6929e03906254140318e4608e98fe8c566f2e7" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/9d/1c/5d43735b2553baae2a5e899dcbcd0670a86930d993184d72ca909bf11c9b/openai-2.36.0-py3-none-any.whl", hash = "sha256:143f6194b548dbc2c921af1f1b03b9f14c85fed8a75b5b516f5bcc11a2a50c63" }, +] + [[package]] name = "opencensus" version = "0.11.4" @@ -4118,6 +4432,7 @@ builder = [ model-service = [ { name = "alibabacloud-cr20181201" }, { name = "fastapi" }, + { name = "litellm" }, { name = "psutil" }, { name = "swebench" }, { name = "uvicorn" }, @@ -4181,6 +4496,7 @@ requires-dist = [ { name = "gem-llm", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.0" }, { name = "httpx" }, { name = "kubernetes", marker = "extra == 'admin'", specifier = ">=35.0.0" }, + { name = "litellm", marker = "extra == 'model-service'", specifier = ">=1.50.0" }, { name = "nacos-sdk-python", marker = "extra == 'admin'", specifier = ">=0.1.14" }, { name = "nacos-sdk-python", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.14" }, { name = "numpy", marker = "extra == 'rocklet'", specifier = "<=2.2.6" }, @@ -4633,6 +4949,94 @@ wheels = [ { url = 
"https://mirrors.aliyun.com/pypi/packages/e5/30/643397144bfbfec6f6ef821f36f33e57d35946c44a2352d3c9f0ae847619/tenacity-9.1.2-py3-none-any.whl", hash = "sha256:f77bf36710d8b73a50b2dd155c97b870017ad21afe6ab300326b0371b3b05138" }, ] +[[package]] +name = "tiktoken" +version = "0.12.0" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = "regex" }, + { name = "requests" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/7d/ab/4d017d0f76ec3171d469d80fc03dfbb4e48a4bcaddaa831b31d526f05edc/tiktoken-0.12.0.tar.gz", hash = "sha256:b18ba7ee2b093863978fcb14f74b3707cdc8d4d4d3836853ce7ec60772139931" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/89/b3/2cb7c17b6c4cf8ca983204255d3f1d95eda7213e247e6947a0ee2c747a2c/tiktoken-0.12.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:3de02f5a491cfd179aec916eddb70331814bd6bf764075d39e21d5862e533970" }, + { url = "https://mirrors.aliyun.com/pypi/packages/27/0f/df139f1df5f6167194ee5ab24634582ba9a1b62c6b996472b0277ec80f66/tiktoken-0.12.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b6cfb6d9b7b54d20af21a912bfe63a2727d9cfa8fbda642fd8322c70340aad16" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ef/5d/26a691f28ab220d5edc09b9b787399b130f24327ef824de15e5d85ef21aa/tiktoken-0.12.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:cde24cdb1b8a08368f709124f15b36ab5524aac5fa830cc3fdce9c03d4fb8030" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b2/94/443fab3d4e5ebecac895712abd3849b8da93b7b7dec61c7db5c9c7ebe40c/tiktoken-0.12.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:6de0da39f605992649b9cfa6f84071e3f9ef2cec458d08c5feb1b6f0ff62e134" }, + { url = "https://mirrors.aliyun.com/pypi/packages/54/35/388f941251b2521c70dd4c5958e598ea6d2c88e28445d2fb8189eecc1dfc/tiktoken-0.12.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:6faa0534e0eefbcafaccb75927a4a380463a2eaa7e26000f0173b920e98b720a" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/f8/00/c6681c7f833dd410576183715a530437a9873fa910265817081f65f9105f/tiktoken-0.12.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:82991e04fc860afb933efb63957affc7ad54f83e2216fe7d319007dab1ba5892" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5f/d2/82e795a6a9bafa034bf26a58e68fe9a89eeaaa610d51dbeb22106ba04f0a/tiktoken-0.12.0-cp310-cp310-win_amd64.whl", hash = "sha256:6fb2995b487c2e31acf0a9e17647e3b242235a20832642bb7a9d1a181c0c1bb1" }, + { url = "https://mirrors.aliyun.com/pypi/packages/de/46/21ea696b21f1d6d1efec8639c204bdf20fde8bafb351e1355c72c5d7de52/tiktoken-0.12.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:6e227c7f96925003487c33b1b32265fad2fbcec2b7cf4817afb76d416f40f6bb" }, + { url = "https://mirrors.aliyun.com/pypi/packages/c9/d9/35c5d2d9e22bb2a5f74ba48266fb56c63d76ae6f66e02feb628671c0283e/tiktoken-0.12.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c06cf0fcc24c2cb2adb5e185c7082a82cba29c17575e828518c2f11a01f445aa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/01/84/961106c37b8e49b9fdcf33fe007bb3a8fdcc380c528b20cc7fbba80578b8/tiktoken-0.12.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:f18f249b041851954217e9fd8e5c00b024ab2315ffda5ed77665a05fa91f42dc" }, + { url = "https://mirrors.aliyun.com/pypi/packages/6a/d0/3d9275198e067f8b65076a68894bb52fd253875f3644f0a321a720277b8a/tiktoken-0.12.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:47a5bc270b8c3db00bb46ece01ef34ad050e364b51d406b6f9730b64ac28eded" }, + { url = "https://mirrors.aliyun.com/pypi/packages/78/db/a58e09687c1698a7c592e1038e01c206569b86a0377828d51635561f8ebf/tiktoken-0.12.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:508fa71810c0efdcd1b898fda574889ee62852989f7c1667414736bcb2b9a4bd" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9e/1b/a9e4d2bf91d515c0f74afc526fd773a812232dd6cda33ebea7f531202325/tiktoken-0.12.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = 
"sha256:a1af81a6c44f008cba48494089dd98cccb8b313f55e961a52f5b222d1e507967" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9d/15/963819345f1b1fb0809070a79e9dd96938d4ca41297367d471733e79c76c/tiktoken-0.12.0-cp311-cp311-win_amd64.whl", hash = "sha256:3e68e3e593637b53e56f7237be560f7a394451cb8c11079755e80ae64b9e6def" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a4/85/be65d39d6b647c79800fd9d29241d081d4eeb06271f383bb87200d74cf76/tiktoken-0.12.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b97f74aca0d78a1ff21b8cd9e9925714c15a9236d6ceacf5c7327c117e6e21e8" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4a/42/6573e9129bc55c9bf7300b3a35bef2c6b9117018acca0dc760ac2d93dffe/tiktoken-0.12.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2b90f5ad190a4bb7c3eb30c5fa32e1e182ca1ca79f05e49b448438c3e225a49b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/66/c5/ed88504d2f4a5fd6856990b230b56d85a777feab84e6129af0822f5d0f70/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:65b26c7a780e2139e73acc193e5c63ac754021f160df919add909c1492c0fb37" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f4/90/3dae6cc5436137ebd38944d396b5849e167896fc2073da643a49f372dc4f/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:edde1ec917dfd21c1f2f8046b86348b0f54a2c0547f68149d8600859598769ad" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a3/fe/26df24ce53ffde419a42f5f53d755b995c9318908288c17ec3f3448313a3/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:35a2f8ddd3824608b3d650a000c1ef71f730d0c56486845705a8248da00f9fe5" }, + { url = "https://mirrors.aliyun.com/pypi/packages/20/cc/b064cae1a0e9fac84b0d2c46b89f4e57051a5f41324e385d10225a984c24/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:83d16643edb7fa2c99eff2ab7733508aae1eebb03d5dfc46f5565862810f24e3" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/81/10/b8523105c590c5b8349f2587e2fdfe51a69544bd5a76295fc20f2374f470/tiktoken-0.12.0-cp312-cp312-win_amd64.whl", hash = "sha256:ffc5288f34a8bc02e1ea7047b8d041104791d2ddbf42d1e5fa07822cbffe16bd" }, + { url = "https://mirrors.aliyun.com/pypi/packages/00/61/441588ee21e6b5cdf59d6870f86beb9789e532ee9718c251b391b70c68d6/tiktoken-0.12.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:775c2c55de2310cc1bc9a3ad8826761cbdc87770e586fd7b6da7d4589e13dab3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/1f/05/dcf94486d5c5c8d34496abe271ac76c5b785507c8eae71b3708f1ad9b45a/tiktoken-0.12.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a01b12f69052fbe4b080a2cfb867c4de12c704b56178edf1d1d7b273561db160" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a0/70/5163fe5359b943f8db9946b62f19be2305de8c3d78a16f629d4165e2f40e/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:01d99484dc93b129cd0964f9d34eee953f2737301f18b3c7257bf368d7615baa" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0c/da/c028aa0babf77315e1cef357d4d768800c5f8a6de04d0eac0f377cb619fa/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:4a1a4fcd021f022bfc81904a911d3df0f6543b9e7627b51411da75ff2fe7a1be" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a0/5a/886b108b766aa53e295f7216b509be95eb7d60b166049ce2c58416b25f2a/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:981a81e39812d57031efdc9ec59fa32b2a5a5524d20d4776574c4b4bd2e9014a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f4/f8/4db272048397636ac7a078d22773dd2795b1becee7bc4922fe6207288d57/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9baf52f84a3f42eef3ff4e754a0db79a13a27921b457ca9832cf944c6be4f8f3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/8e/32/45d02e2e0ea2be3a9ed22afc47d93741247e75018aac967b713b2941f8ea/tiktoken-0.12.0-cp313-cp313-win_amd64.whl", hash = 
"sha256:b8a0cd0c789a61f31bf44851defbd609e8dd1e2c8589c614cc1060940ef1f697" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ce/76/994fc868f88e016e6d05b0da5ac24582a14c47893f4474c3e9744283f1d5/tiktoken-0.12.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:d5f89ea5680066b68bcb797ae85219c72916c922ef0fcdd3480c7d2315ffff16" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f6/b8/57ef1456504c43a849821920d582a738a461b76a047f352f18c0b26c6516/tiktoken-0.12.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b4e7ed1c6a7a8a60a3230965bdedba8cc58f68926b835e519341413370e0399a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/72/90/13da56f664286ffbae9dbcfadcc625439142675845baa62715e49b87b68b/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:fc530a28591a2d74bce821d10b418b26a094bf33839e69042a6e86ddb7a7fb27" }, + { url = "https://mirrors.aliyun.com/pypi/packages/05/df/4f80030d44682235bdaecd7346c90f67ae87ec8f3df4a3442cb53834f7e4/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:06a9f4f49884139013b138920a4c393aa6556b2f8f536345f11819389c703ebb" }, + { url = "https://mirrors.aliyun.com/pypi/packages/22/1f/ae535223a8c4ef4c0c1192e3f9b82da660be9eb66b9279e95c99288e9dab/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:04f0e6a985d95913cabc96a741c5ffec525a2c72e9df086ff17ebe35985c800e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/78/a7/f8ead382fce0243cb625c4f266e66c27f65ae65ee9e77f59ea1653b6d730/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0ee8f9ae00c41770b5f9b0bb1235474768884ae157de3beb5439ca0fd70f3e25" }, + { url = "https://mirrors.aliyun.com/pypi/packages/93/e0/6cc82a562bc6365785a3ff0af27a2a092d57c47d7a81d9e2295d8c36f011/tiktoken-0.12.0-cp313-cp313t-win_amd64.whl", hash = "sha256:dc2dd125a62cb2b3d858484d6c614d136b5b848976794edfb63688d539b8b93f" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/72/05/3abc1db5d2c9aadc4d2c76fa5640134e475e58d9fbb82b5c535dc0de9b01/tiktoken-0.12.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:a90388128df3b3abeb2bfd1895b0681412a8d7dc644142519e6f0a97c2111646" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e3/7b/50c2f060412202d6c95f32b20755c7a6273543b125c0985d6fa9465105af/tiktoken-0.12.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:da900aa0ad52247d8794e307d6446bd3cdea8e192769b56276695d34d2c9aa88" }, + { url = "https://mirrors.aliyun.com/pypi/packages/14/27/bf795595a2b897e271771cd31cb847d479073497344c637966bdf2853da1/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:285ba9d73ea0d6171e7f9407039a290ca77efcdb026be7769dccc01d2c8d7fff" }, + { url = "https://mirrors.aliyun.com/pypi/packages/f5/de/9341a6d7a8f1b448573bbf3425fa57669ac58258a667eb48a25dfe916d70/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:d186a5c60c6a0213f04a7a802264083dea1bbde92a2d4c7069e1a56630aef830" }, + { url = "https://mirrors.aliyun.com/pypi/packages/75/0d/881866647b8d1be4d67cb24e50d0c26f9f807f994aa1510cb9ba2fe5f612/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:604831189bd05480f2b885ecd2d1986dc7686f609de48208ebbbddeea071fc0b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b3/1e/b651ec3059474dab649b8d5b69f5c65cd8fcd8918568c1935bd4136c9392/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:8f317e8530bb3a222547b85a58583238c8f74fd7a7408305f9f63246d1a0958b" }, + { url = "https://mirrors.aliyun.com/pypi/packages/80/57/ce64fd16ac390fafde001268c364d559447ba09b509181b2808622420eec/tiktoken-0.12.0-cp314-cp314-win_amd64.whl", hash = "sha256:399c3dd672a6406719d84442299a490420b458c44d3ae65516302a99675888f3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ac/a4/72eed53e8976a099539cdd5eb36f241987212c29629d0a52c305173e0a68/tiktoken-0.12.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = 
"sha256:c2c714c72bc00a38ca969dae79e8266ddec999c7ceccd603cc4f0d04ccd76365" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e6/d7/0110b8f54c008466b19672c615f2168896b83706a6611ba6e47313dbc6e9/tiktoken-0.12.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:cbb9a3ba275165a2cb0f9a83f5d7025afe6b9d0ab01a22b50f0e74fee2ad253e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/5f/77/4f268c41a3957c418b084dd576ea2fad2e95da0d8e1ab705372892c2ca22/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:dfdfaa5ffff8993a3af94d1125870b1d27aed7cb97aa7eb8c1cefdbc87dbee63" }, + { url = "https://mirrors.aliyun.com/pypi/packages/4e/2b/fc46c90fe5028bd094cd6ee25a7db321cb91d45dc87531e2bdbb26b4867a/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:584c3ad3d0c74f5269906eb8a659c8bfc6144a52895d9261cdaf90a0ae5f4de0" }, + { url = "https://mirrors.aliyun.com/pypi/packages/28/c0/3c7a39ff68022ddfd7d93f3337ad90389a342f761c4d71de99a3ccc57857/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:54c891b416a0e36b8e2045b12b33dd66fb34a4fe7965565f1b482da50da3e86a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/ab/0d/c1ad6f4016a3968c048545f5d9b8ffebf577774b2ede3e2e352553b685fe/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5edb8743b88d5be814b1a8a8854494719080c28faaa1ccbef02e87354fe71ef0" }, + { url = "https://mirrors.aliyun.com/pypi/packages/af/df/c7891ef9d2712ad774777271d39fdef63941ffba0a9d59b7ad1fd2765e57/tiktoken-0.12.0-cp314-cp314t-win_amd64.whl", hash = "sha256:f61c0aea5565ac82e2ec50a05e02a6c44734e91b51c10510b084ea1b8e633a71" }, +] + +[[package]] +name = "tokenizers" +version = "0.23.1" +source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } +dependencies = [ + { name = "huggingface-hub" }, +] +sdist = { url = "https://mirrors.aliyun.com/pypi/packages/c1/60/21f715d9faba5f5407ff759472ade058ec4a507ad62bcea47cb847239a73/tokenizers-0.23.1.tar.gz", hash = 
"sha256:1feeeadf865a7915adc25445dea30e9933e593c31bb96c277cee36de227c8bfa" } +wheels = [ + { url = "https://mirrors.aliyun.com/pypi/packages/87/39/b87a87d5bb9470610b80a2d31df42fcffeaf35118b8b97952b2aff598cc7/tokenizers-0.23.1-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:e03d6ffcbe0d56ee9c1ccd070e70a13fa750727c0277e138152acbc0252c2224" }, + { url = "https://mirrors.aliyun.com/pypi/packages/e2/6a/068ed9f6e444c9d7e9d55ce134181325700f3d7f30410721bdc8f848d727/tokenizers-0.23.1-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:e0948bbb1ac1d7cdfc9fb6d62c596e3b7550036ad60ecd654a66ad273326324e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/6c/36/e006edf031154cba92b8416057d92c3abe3635e4c4b0aa0b5b9bb39dde70/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1bf13402aff9bc533c89cb849ec3b412dc3fbeacc9744840e423d7bf3f7dc0e3" }, + { url = "https://mirrors.aliyun.com/pypi/packages/a2/ef/7735d226f9c7f874a6bee5e3f27fb25ecabdf207d37b8cf45286d0795893/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f836ca703b89ae07919a309f9651f7a88fd5a33d5f718ba5ad0870ec0256bad6" }, + { url = "https://mirrors.aliyun.com/pypi/packages/b9/d9/24827036f6e21297bfffda0768e58eb6096a4f411e932964a01707857931/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ae848657742035523fdf261773630cb819a26995fcd3d9ecae0c1daf6e5a4959" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0c/9a/22f3582b3a4f49358293a5206e25317621ee4526bfe9cdaa0f07a12e770e/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:53b09e85775d5187941e7bab30e941b4134ab4a7dd8c68e783d231fb7ca27c51" }, + { url = "https://mirrors.aliyun.com/pypi/packages/7e/65/b8f8814eef95800f20721384136d9a1d22241d50b2874357cb70542c392f/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = 
"sha256:ea5a0ce170074329faaa8ea3f6400ecde604b6678192688533af80980daae71a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0d/d5/1353e5f677ec27c2494fb6a6725e82d56c985f53e90ec511369e7e4f02c6/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5075b405006415ea148a992d093699c66eb01952bf59f4d5727089a98bda45a4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/71/89/39b6b8fc073fb6d413d0147aa333dc7eff7be65639ac9d19930a0b21bf33/tokenizers-0.23.1-cp310-abi3-manylinux_2_31_riscv64.whl", hash = "sha256:56f3a77de629917652f876294dc9fe6bad4a0c43bc229dc72e59bb23a0f4729a" }, + { url = "https://mirrors.aliyun.com/pypi/packages/0f/80/127c854da64827e5b79264ce524993a90dddcb320e5cd42412c5c02f9e8a/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9d10a6d957ef01896dc274e890eee27d41bd0e74ef31e60616f0fc311345184e" }, + { url = "https://mirrors.aliyun.com/pypi/packages/fe/ba/44c2502feb1a058f096ddfb4e0996ef3225a01a388e1a9b094e91689fe93/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:1974288a609c343774f1b897c8b482c791ab17b75ab5c8c2b1737565c1d82288" }, + { url = "https://mirrors.aliyun.com/pypi/packages/9e/c1/464019a9fb059870bfe4eebb4ba12208f3042035e258bf5e782906bd3847/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_i686.whl", hash = "sha256:120468fb4c24faf0543c835a4fabafa4deb3f20a035c9b6e83d0b553a97615d4" }, + { url = "https://mirrors.aliyun.com/pypi/packages/79/94/3ac1432bda31626071e9b6a12709b97ae05131c804b94c8f3ac622c5da32/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:e3d8f40ea6268047de7046906326abed5134f27d4e8447b23763afe5808c8a96" }, + { url = "https://mirrors.aliyun.com/pypi/packages/6a/dd/631b21433c771b1382535326f0eca80b9c9cee2e64961dd993bc9ac4669e/tokenizers-0.23.1-cp310-abi3-win32.whl", hash = "sha256:93120a930b919416da7cd10a2f606ac9919cc69cacae7980fa2140e277660948" }, + { url = 
"https://mirrors.aliyun.com/pypi/packages/97/c9/2553f72aaf65a2797d4229e37fa7fbe38ffbf3e32912d31bdd78b3323e59/tokenizers-0.23.1-cp310-abi3-win_amd64.whl", hash = "sha256:e7bfaf995c1bdbbd21d13539decb6650967013759318627d85daeb7881af16b7" }, + { url = "https://mirrors.aliyun.com/pypi/packages/cd/2b/2be299bab55fc595e3d38567edb1a87f86e594842968fa9515a07bdcf422/tokenizers-0.23.1-cp310-abi3-win_arm64.whl", hash = "sha256:a26197957d8e4425dfba746315f3c425ea00cfa8367c5fbc4ec73447893dcea9" }, +] + [[package]] name = "toml" version = "0.10.2" From a8fb54c66b7aa2e9376f4c593ca4bce75cb36be1 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 03:18:28 +0000 Subject: [PATCH 02/25] fix(model-service): pass api_key + use custom_openai prefix + suppress cost calc Three fixes to proxy.py uncovered while testing against DashScope (glm-5): - Extract Bearer token from incoming Authorization header and pass as litellm api_key kwarg; setting it via extra_headers does not work because litellm always regenerates Authorization from api_key. Authorization is now stripped from the forwarded header set. - Switch upstream prefix from openai/ to custom_openai/. This is litellm's standard pattern for OpenAI-compatible third-party endpoints (DashScope, ModelScope, Groq, Mistral, ...) and avoids "model isn't mapped" on arbitrary upstream model names. - Pass input_cost_per_token=0 / output_cost_per_token=0 so litellm's cost calculator does not raise "model isn't mapped" on unknown models and pollute StandardLoggingPayload.response_cost_failure_debug_information. 
Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 42 ++++++++++++++++++++++++++---- tests/unit/sdk/model/test_proxy.py | 12 ++++++--- 2 files changed, 45 insertions(+), 9 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 4894430641..18d0ead93c 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -35,7 +35,26 @@ # so the client's values would be wrong or misleading # - transfer-encoding / connection: true RFC 7230 hop-by-hop headers, scoped to # the client↔proxy connection only -_HEADERS_NOT_TO_FORWARD = frozenset({"host", "content-length", "content-type", "transfer-encoding", "connection"}) +_HEADERS_NOT_TO_FORWARD = frozenset( + {"host", "content-length", "content-type", "transfer-encoding", "connection", "authorization"} +) + + +def _extract_bearer_token(headers) -> str | None: + """Pull the Bearer token out of the Authorization header. + + litellm's OpenAI client needs the API key as an explicit ``api_key=`` kwarg — + setting Authorization in extra_headers does not work because litellm always + regenerates that header from ``api_key`` (or env vars). So we extract it here + and let the proxy stay stateless about which key the client is using. + """ + auth = headers.get("authorization") or headers.get("Authorization") + if not auth: + return None + parts = auth.split(None, 1) + if len(parts) == 2 and parts[0].lower() == "bearer": + return parts[1].strip() + return auth.strip() def get_base_url(model_name: str, config: ModelServiceConfig) -> str: @@ -122,14 +141,20 @@ async def chat_completions(body: dict[str, Any], request: Request): logger.info(f"[replay] dispatching '{model_name}' to traj-replay handler") else: api_base = get_base_url(model_name, config) - # Tell litellm to treat the upstream as an OpenAI-compatible server. 
- litellm_model = f"openai/{model_name}" if model_name else "openai/default" + # custom_openai is litellm's catch-all for OpenAI-compatible third-party endpoints + # (DashScope, ModelScope, Groq, Mistral, ...). Unlike `openai/`, it does NOT do + # model-name lookup, so arbitrary upstream model names like "glm-5" / "qwen-turbo" + # work without "This model isn't mapped yet" errors. + litellm_model = f"custom_openai/{model_name}" if model_name else "custom_openai/default" logger.info(f"Routing model '{model_name}' to {api_base}") - # 2. Header forwarding (preserve Authorization, drop hop-by-hop) + # 2. Extract Bearer token (litellm needs api_key explicitly, not via headers) + api_key = _extract_bearer_token(request.headers) + + # 3. Header forwarding (drop Authorization since we pass it via api_key, plus hop-by-hop) extra_headers = _filter_headers(request.headers) - # 3. Build call kwargs (transparent passthrough of body fields) + # 4. Build call kwargs (transparent passthrough of body fields) call_kwargs = dict(body) call_kwargs.pop("model", None) # avoid duplicate kwargs is_stream = bool(call_kwargs.get("stream")) @@ -138,9 +163,16 @@ async def chat_completions(body: dict[str, Any], request: Request): response = await litellm.acompletion( model=litellm_model, api_base=api_base, + api_key=api_key, extra_headers=extra_headers, timeout=config.request_timeout, num_retries=config.num_retries, + # Suppress litellm's "model isn't mapped yet" cost-calc exception for + # arbitrary upstream models (glm-5, qwen-turbo, ...) that aren't in + # litellm's pricing table. We don't care about cost tracking here, so + # zero rates make the calc succeed cleanly with response_cost=0. 
+ input_cost_per_token=0, + output_cost_per_token=0, **call_kwargs, ) except (RateLimitError, APIError, BadRequestError, AuthenticationError, Timeout) as exc: diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index c994c1c9d6..c1b141f930 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -58,7 +58,7 @@ async def test_chat_completions_routing_success(): assert mock_acompletion.called call_kwargs = mock_acompletion.call_args.kwargs assert call_kwargs["api_base"] == "https://api.openai.com/v1" - assert call_kwargs["model"] == "openai/gpt-3.5-turbo" + assert call_kwargs["model"] == "custom_openai/gpt-3.5-turbo" assert call_kwargs["messages"] == [{"role": "user", "content": "hello"}] @@ -200,8 +200,10 @@ async def test_chat_completions_replay_mode_uses_traj_replay_provider(): @pytest.mark.asyncio -async def test_chat_completions_strips_hop_by_hop_headers(): - """host / content-length / transfer-encoding etc. are not forwarded.""" +async def test_chat_completions_extracts_bearer_token_and_strips_framing_headers(): + """Bearer token goes to api_key kwarg; host / content-length / transfer-encoding / + Authorization are not forwarded as extra_headers (litellm regenerates Authorization + from api_key, so passing it both ways would conflict). 
Custom X-* headers pass through.""" captured = {} async def capture(*args, **kwargs): @@ -218,10 +220,12 @@ async def capture(*args, **kwargs): headers={"Authorization": "Bearer abc", "X-Trace": "t1"}, ) + assert captured["api_key"] == "abc" + forwarded = captured["extra_headers"] forwarded_lower = {k.lower() for k in forwarded} - assert "authorization" in forwarded_lower assert "x-trace" in forwarded_lower + assert "authorization" not in forwarded_lower assert "host" not in forwarded_lower assert "content-length" not in forwarded_lower assert "content-type" not in forwarded_lower From 11852aed2c5956343e3dd471face487e66cfa999 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 03:36:52 +0000 Subject: [PATCH 03/25] refactor(model-service): drop CustomLLM, serve replay directly from cursor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The replay path no longer goes through litellm. We have a complete OpenAI-shape response on disk, so routing it through CustomLLM/CustomStreamWrapper just to translate formats was pure overhead — and the source of every replay-side bug (cursor-exhausted retried 6× and wrapped as APIConnectionError, GenericStreamingChunk type gymnastics, finish_reason hardcoded to "stop", reasoning_content dropped on streaming, tool_calls reconstruction left as TODO). Changes: - traj_replayer.py: delete TrajectoryReplayer(CustomLLM) and helpers; keep just SequentialCursor. Cursor exhaustion now raises a plain TrajectoryExhausted. - proxy.py: in replay mode, fetch from app.state.replay_cursor and emit either the raw response dict (non-stream) or one SSE chunk + [DONE] (stream). The stream path renames message → delta and preserves all fields verbatim (finish_reason, tool_calls, reasoning_content, ...). - main.py: rename _configure_litellm_for_proxy → _configure_proxy_integrations. Replay branch now just attaches a SequentialCursor to app.state; no litellm.custom_provider_map registration. 
- Tests: drop the CustomLLM-based replayer tests; keep cursor tests; add three end-to-end proxy replay tests covering non-stream / stream / cursor exhausted. 43 passed. Direct curl against DashScope glm-5: record + replay (both modes) verified end-to-end. Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 146 ++++++++++++------ .../server/integrations/traj_replayer.py | 91 ++--------- rock/sdk/model/server/main.py | 43 +++--- tests/unit/sdk/model/test_proxy.py | 125 +++++++++++++-- tests/unit/sdk/model/test_traj_replayer.py | 108 ++----------- 5 files changed, 270 insertions(+), 243 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 18d0ead93c..e7ded21cd2 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -1,18 +1,27 @@ -"""OpenAI-compatible chat/completions proxy backed by the litellm SDK. - -The proxy ``/v1/chat/completions`` handler routes a request to the configured -upstream LLM (or to the in-process traj-replay handler when ``replay_traj_path`` -is set), forwards header/body, and applies retry via litellm's ``num_retries``. - -Trajectory recording is wired up at startup in -``rock.sdk.model.server.main`` by registering ``TrajectoryRecorder`` as a -``litellm.callbacks`` entry — this handler does not carry a ``@record_traj`` -decorator anymore. +"""OpenAI-compatible chat/completions proxy. + +Two paths share this handler: + +1. **Record / forward mode** (default) — ``litellm.acompletion`` is called with + the user-supplied model/messages, the upstream is selected from + ``proxy_base_url`` / ``proxy_rules``, retries come from litellm's + ``num_retries``, and the recorded JSONL trajectory is written by a + ``litellm.callbacks`` entry registered at startup (see + ``rock.sdk.model.server.main``). + +2. 
**Replay mode** (``replay_traj_path`` set) — the request is served directly + from the next record in ``app.state.replay_cursor`` without going through + litellm at all. We have a complete OpenAI-shape response on disk, so there's + no value in routing through CustomLLM/CustomStreamWrapper just to translate + formats. Streaming emits the recorded response as a single SSE chunk + + ``[DONE]``, mirroring litellm's own ``MockResponseIterator`` strategy. """ from __future__ import annotations import json +import time +import uuid from collections.abc import AsyncIterator from typing import Any @@ -23,6 +32,7 @@ from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted logger = init_logger(__name__) @@ -35,6 +45,7 @@ # so the client's values would be wrong or misleading # - transfer-encoding / connection: true RFC 7230 hop-by-hop headers, scoped to # the client↔proxy connection only +# - authorization: extracted into api_key kwarg, see _extract_bearer_token _HEADERS_NOT_TO_FORWARD = frozenset( {"host", "content-length", "content-type", "transfer-encoding", "connection", "authorization"} ) @@ -121,43 +132,97 @@ async def _sse_iter(stream: AsyncIterator[Any]) -> AsyncIterator[bytes]: yield b"data: [DONE]\n\n" +def _completion_to_chunk(response: dict, *, model: str) -> dict: + """Convert a recorded ``chat.completion`` response into a single + ``chat.completion.chunk`` shape (move ``message`` → ``delta``). + + Mirrors what litellm's ``convert_model_response_to_streaming`` does for its + own non-streaming providers — preserves ``finish_reason``, ``tool_calls`` + and any other fields verbatim by simply renaming the wrapper key. 
+ """ + choices_in = response.get("choices") or [] + choices_out = [] + for choice in choices_in: + delta = dict(choice.get("message") or {}) + choices_out.append( + { + "index": choice.get("index", 0), + "delta": delta, + "finish_reason": choice.get("finish_reason"), + "logprobs": choice.get("logprobs"), + } + ) + return { + "id": response.get("id") or f"chatcmpl-{uuid.uuid4()}", + "object": "chat.completion.chunk", + "created": response.get("created") or int(time.time()), + "model": response.get("model") or model, + "choices": choices_out, + } + + +async def _replay_sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: + """Emit a recorded response as a single SSE chunk + ``[DONE]``. + + The whole recorded answer goes out in one chunk — same strategy as + litellm's ``MockResponseIterator``. Most agents accumulate SSE into a + final string anyway; faking finer-grained streaming would just add code + without buying anyone anything. + """ + chunk = _completion_to_chunk(response, model=model) + yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode() + yield b"data: [DONE]\n\n" + + @proxy_router.post("/v1/chat/completions") async def chat_completions(body: dict[str, Any], request: Request): """OpenAI-compatible chat completions proxy endpoint. - Routes via ``proxy_base_url`` / ``proxy_rules``, forwards Authorization-style - headers, supports streaming, retries via litellm. In replay mode the request - is dispatched to the registered ``traj-replay`` CustomLLM provider instead - of being forwarded upstream. + In replay mode (``replay_traj_path`` set), serves the next record from + ``app.state.replay_cursor`` directly — no litellm involvement. Otherwise + forwards to the configured upstream via ``litellm.acompletion``. """ config: ModelServiceConfig = request.app.state.model_service_config - model_name = body.get("model", "") + is_stream = bool(body.get("stream")) - # 1. 
Route selection + # ---- Replay mode: short-circuit, never touch litellm ---- if config.replay_traj_path: - litellm_model = f"traj-replay/{model_name or 'replay'}" - api_base: str | None = None - logger.info(f"[replay] dispatching '{model_name}' to traj-replay handler") - else: - api_base = get_base_url(model_name, config) - # custom_openai is litellm's catch-all for OpenAI-compatible third-party endpoints - # (DashScope, ModelScope, Groq, Mistral, ...). Unlike `openai/`, it does NOT do - # model-name lookup, so arbitrary upstream model names like "glm-5" / "qwen-turbo" - # work without "This model isn't mapped yet" errors. - litellm_model = f"custom_openai/{model_name}" if model_name else "custom_openai/default" - logger.info(f"Routing model '{model_name}' to {api_base}") - - # 2. Extract Bearer token (litellm needs api_key explicitly, not via headers) - api_key = _extract_bearer_token(request.headers) + cursor: SequentialCursor = request.app.state.replay_cursor + try: + record = await cursor.next(expected_model=model_name) + except TrajectoryExhausted as exc: + raise HTTPException(status_code=404, detail=str(exc)) + + response_dict = record.get("response") + if not isinstance(response_dict, dict): + raise HTTPException( + status_code=500, + detail=f"replay record at step {cursor.position - 1} has no usable response dict", + ) + logger.info(f"[replay] step {cursor.position}/{cursor.total} served for model={model_name!r}") + + if is_stream: + return StreamingResponse( + _replay_sse_iter(response_dict, model=model_name), + media_type="text/event-stream", + ) + return JSONResponse(status_code=200, content=response_dict) + + # ---- Forward / record mode: go through litellm ---- + api_base = get_base_url(model_name, config) + # custom_openai is litellm's catch-all for OpenAI-compatible third-party endpoints + # (DashScope, ModelScope, Groq, Mistral, ...). 
Unlike `openai/`, it does NOT do + # model-name lookup, so arbitrary upstream model names like "glm-5" / "qwen-turbo" + # work without "This model isn't mapped yet" errors. + litellm_model = f"custom_openai/{model_name}" if model_name else "custom_openai/default" + logger.info(f"Routing model '{model_name}' to {api_base}") - # 3. Header forwarding (drop Authorization since we pass it via api_key, plus hop-by-hop) + api_key = _extract_bearer_token(request.headers) extra_headers = _filter_headers(request.headers) - # 4. Build call kwargs (transparent passthrough of body fields) call_kwargs = dict(body) - call_kwargs.pop("model", None) # avoid duplicate kwargs - is_stream = bool(call_kwargs.get("stream")) + call_kwargs.pop("model", None) try: response = await litellm.acompletion( @@ -167,10 +232,8 @@ async def chat_completions(body: dict[str, Any], request: Request): extra_headers=extra_headers, timeout=config.request_timeout, num_retries=config.num_retries, - # Suppress litellm's "model isn't mapped yet" cost-calc exception for - # arbitrary upstream models (glm-5, qwen-turbo, ...) that aren't in - # litellm's pricing table. We don't care about cost tracking here, so - # zero rates make the calc succeed cleanly with response_cost=0. + # Zero-cost rates suppress "model isn't mapped yet" from litellm's + # post-call cost calculator for arbitrary upstream model names. input_cost_per_token=0, output_cost_per_token=0, **call_kwargs, @@ -182,13 +245,8 @@ async def chat_completions(body: dict[str, Any], request: Request): logger.error(f"Unexpected proxy error: {exc}", exc_info=True) raise HTTPException(status_code=500, detail=str(exc)) - # 4. Streaming vs non-streaming response if is_stream: return StreamingResponse(_sse_iter(response), media_type="text/event-stream") - # litellm returns a ModelResponse pydantic; expose the OpenAI-shape dict. 
- if hasattr(response, "model_dump"): - body_out = response.model_dump() - else: - body_out = response # already a dict (replay path can short-circuit) + body_out = response.model_dump() if hasattr(response, "model_dump") else response return JSONResponse(status_code=200, content=body_out) diff --git a/rock/sdk/model/server/integrations/traj_replayer.py b/rock/sdk/model/server/integrations/traj_replayer.py index c87c0fe75f..af2fdd6bb4 100644 --- a/rock/sdk/model/server/integrations/traj_replayer.py +++ b/rock/sdk/model/server/integrations/traj_replayer.py @@ -1,9 +1,10 @@ -"""Replay a recorded trajectory by registering a litellm CustomLLM provider. +"""Sequential cursor over a recorded JSONL trajectory. -Loads a single JSONL trajectory file on init, then hands records out one at a -time in recorded order. This is the simplest matching strategy and works for -deterministic agent runs that replay the same sequence of LLM calls -(SWE-agent / mini-swe-agent / OpenHands). +Loaded once at startup; ``await cursor.next(expected_model=...)`` hands out the +next record (full StandardLoggingPayload dict) and advances. Going past the end +raises :class:`TrajectoryExhausted` so the proxy can return a clean 404 without +involving litellm — that's the whole point: replay does NOT need to go through +litellm's CustomLLM machinery, the proxy serves recorded responses directly. """ from __future__ import annotations @@ -11,25 +12,24 @@ import asyncio import json import os -from collections.abc import AsyncIterator from pathlib import Path -from typing import Any - -from litellm.llms.custom_llm import CustomLLM, CustomLLMError -from litellm.types.utils import GenericStreamingChunk, ModelResponse -from litellm.utils import async_mock_completion_streaming_obj from rock.logger import init_logger logger = init_logger(__name__) -class SequentialCursor: - """Hands out trajectory records one at a time, in recorded order. 
+class TrajectoryExhausted(Exception): + """Raised by ``SequentialCursor.next`` when all recorded steps have been served.""" + + def __init__(self, position: int, total: int) -> None: + super().__init__(f"trajectory exhausted at step {position} (total recorded steps={total})") + self.position = position + self.total = total + - Going past the end raises CustomLLMError(404) so the proxy returns a clear - error to the caller. - """ +class SequentialCursor: + """Hands out trajectory records one at a time, in recorded order.""" def __init__(self, records: list[dict]) -> None: self.records = records @@ -56,10 +56,7 @@ def load(cls, path: str | os.PathLike) -> SequentialCursor: async def next(self, expected_model: str | None = None) -> dict: async with self._lock: if self._idx >= len(self.records): - raise CustomLLMError( - status_code=404, - message=(f"trajectory exhausted at step {self._idx} (total recorded steps={len(self.records)})"), - ) + raise TrajectoryExhausted(position=self._idx, total=len(self.records)) record = self.records[self._idx] self._idx += 1 current_idx = self._idx - 1 @@ -83,57 +80,3 @@ def position(self) -> int: @property def total(self) -> int: return len(self.records) - - -def _record_to_model_response(record: dict) -> ModelResponse: - response = record.get("response") - if not isinstance(response, dict): - raise CustomLLMError( - status_code=500, - message=f"traj record at step has no usable 'response' dict: got {type(response).__name__}", - ) - return ModelResponse(**response) - - -def _extract_assistant_text(record: dict) -> str: - response = record.get("response") or {} - choices = response.get("choices") or [] - if not choices: - return "" - message = choices[0].get("message") or {} - return message.get("content") or "" - - -class TrajectoryReplayer(CustomLLM): - """litellm CustomLLM that returns recorded responses in sequential order.""" - - def __init__(self, traj_path: str | os.PathLike) -> None: - super().__init__() - self.cursor = 
SequentialCursor.load(traj_path) - - async def acompletion( - self, - model: str, - messages: list, - *args: Any, - **kwargs: Any, - ) -> ModelResponse: - record = await self.cursor.next(expected_model=model) - return _record_to_model_response(record) - - async def astreaming( - self, - model: str, - messages: list, - *args: Any, - **kwargs: Any, - ) -> AsyncIterator[GenericStreamingChunk]: - record = await self.cursor.next(expected_model=model) - text = _extract_assistant_text(record) - model_response = kwargs.get("model_response") - async for chunk in async_mock_completion_streaming_obj( - model_response=model_response, - mock_response=text, - model=model, - ): - yield chunk diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index e2263a1858..133605b046 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -52,33 +52,34 @@ async def global_exception_handler(request, exc): return app -def _configure_litellm_for_proxy(config: ModelServiceConfig) -> None: - """Wire up litellm record/replay integrations for the proxy mode. - - - When ``replay_traj_path`` is set, register ``TrajectoryReplayer`` as a - custom provider so requests routed to ``traj-replay/`` return - recorded responses without hitting any upstream. - - When recording is enabled (default), register ``TrajectoryRecorder`` as - a litellm callback so every chat/completions call appends a JSONL line. +def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: + """Wire up record/replay integrations for the proxy mode. + + - When ``replay_traj_path`` is set, load the trajectory into a + ``SequentialCursor`` and attach it to ``app.state.replay_cursor``. The + proxy handler serves recorded responses directly from this cursor; we + do NOT register anything with litellm (replay path bypasses litellm + entirely so cursor-exhausted errors aren't swallowed by retry logic). 
+ - Otherwise (record/forward mode), if ``traj_enabled`` is True, register + ``TrajectoryRecorder`` as a ``litellm.callbacks`` entry so every + chat/completions call appends a JSONL line. Replay and record are mutually exclusive: in replay mode we don't record, - since replayed responses re-traversing the recorder would inflate metrics - and overwrite the source-of-truth file. + since replayed responses round-tripping back into the source file would + inflate metrics and corrupt the trajectory. """ - import litellm - - from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder - from rock.sdk.model.server.integrations.traj_replayer import TrajectoryReplayer - if config.replay_traj_path: - replayer = TrajectoryReplayer(config.replay_traj_path) - litellm.custom_provider_map = [ - {"provider": "traj-replay", "custom_handler": replayer}, - ] - logger.info(f"litellm replay handler registered, traj_path={config.replay_traj_path}") + from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor + + app.state.replay_cursor = SequentialCursor.load(config.replay_traj_path) + logger.info(f"replay cursor loaded, traj_path={config.replay_traj_path}") return if config.traj_enabled: + import litellm + + from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + traj_path = config.traj_file or TRAJ_FILE recorder = TrajectoryRecorder(traj_file=traj_path) litellm.callbacks.append(recorder) @@ -96,7 +97,7 @@ def main( asyncio.run(init_local_api()) app.include_router(local_router, prefix="", tags=["local"]) else: - _configure_litellm_for_proxy(config) + _configure_proxy_integrations(app, config) app.include_router(proxy_router, prefix="", tags=["proxy"]) logger.info(f"Starting LLM Service on {config.host}:{config.port}, type: {model_servie_type}") diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index c1b141f930..5a4d7e8230 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ 
b/tests/unit/sdk/model/test_proxy.py @@ -1,3 +1,4 @@ +import json from types import SimpleNamespace from unittest.mock import AsyncMock, MagicMock, patch @@ -176,27 +177,133 @@ async def test_chat_completions_litellm_error_returns_proxy_schema(): @pytest.mark.asyncio -async def test_chat_completions_replay_mode_uses_traj_replay_provider(): - """In replay mode the proxy targets traj-replay/ instead of a real upstream.""" +async def test_replay_mode_returns_recorded_response_without_calling_litellm(tmp_path): + """In replay mode the proxy serves the next record directly from app.state.replay_cursor; + litellm.acompletion must never be invoked.""" + from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor + + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "rec-1", + "object": "chat.completion", + "model": "gpt-3.5-turbo", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "recorded reply"}, + "finish_reason": "stop", + } + ], + "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}, + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") + config = ModelServiceConfig() - config.replay_traj_path = "/tmp/does-not-matter-for-this-test" + config.replay_traj_path = str(traj) local_app = FastAPI() local_app.state.model_service_config = config + local_app.state.replay_cursor = SequentialCursor.load(traj) local_app.include_router(proxy_router) with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.return_value = _fake_model_response() - transport = ASGITransport(app=local_app) async with AsyncClient(transport=transport, base_url="http://test") as ac: payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} response = await ac.post("/v1/chat/completions", json=payload) - assert response.status_code == 200 - call_kwargs = mock_acompletion.call_args.kwargs - assert call_kwargs["model"] 
== "traj-replay/gpt-3.5-turbo" - assert call_kwargs["api_base"] is None + assert response.status_code == 200 + body = response.json() + assert body["choices"][0]["message"]["content"] == "recorded reply" + mock_acompletion.assert_not_called() + + +@pytest.mark.asyncio +async def test_replay_mode_streaming_emits_recorded_response_as_sse(tmp_path): + """Replay + stream=True emits one SSE chunk (content moved into delta) plus [DONE].""" + from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor + + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "rec-stream", + "object": "chat.completion", + "model": "gpt-3.5-turbo", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "streamed reply"}, + "finish_reason": "tool_calls", + } + ], + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") + + config = ModelServiceConfig() + config.replay_traj_path = str(traj) + + local_app = FastAPI() + local_app.state.model_service_config = config + local_app.state.replay_cursor = SequentialCursor.load(traj) + local_app.include_router(proxy_router) + + transport = ASGITransport(app=local_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + response = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert response.status_code == 200 + body = response.text + assert "data: [DONE]" in body + # The SSE chunk shape is chat.completion.chunk with message → delta, finish_reason preserved + assert '"object": "chat.completion.chunk"' in body + assert '"delta": {"role": "assistant", "content": "streamed reply"}' in body + assert '"finish_reason": "tool_calls"' in body + + +@pytest.mark.asyncio +async def test_replay_mode_returns_404_when_cursor_exhausted(tmp_path): + """Cursor used up → 404 with a clear message; no litellm retries involved.""" + from 
rock.sdk.model.server.integrations.traj_replayer import SequentialCursor + + record = { + "model": "gpt-3.5-turbo", + "response": { + "id": "only", + "choices": [{"index": 0, "message": {"role": "assistant", "content": "x"}, "finish_reason": "stop"}], + }, + } + traj = tmp_path / "t.jsonl" + traj.write_text(json.dumps(record) + "\n", encoding="utf-8") + + config = ModelServiceConfig() + config.replay_traj_path = str(traj) + + local_app = FastAPI() + local_app.state.model_service_config = config + local_app.state.replay_cursor = SequentialCursor.load(traj) + local_app.include_router(proxy_router) + + transport = ASGITransport(app=local_app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + second = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "again"}]}, + ) + + assert second.status_code == 404 + assert "exhausted" in second.json()["detail"] @pytest.mark.asyncio diff --git a/tests/unit/sdk/model/test_traj_replayer.py b/tests/unit/sdk/model/test_traj_replayer.py index 7bfe30ef4e..e4a379bd0d 100644 --- a/tests/unit/sdk/model/test_traj_replayer.py +++ b/tests/unit/sdk/model/test_traj_replayer.py @@ -1,19 +1,18 @@ -"""Tests for SequentialCursor + TrajectoryReplayer.""" +"""Tests for SequentialCursor (the replay cursor used by proxy.py). + +The proxy serves replay responses directly — there is no CustomLLM-based +``TrajectoryReplayer`` anymore. End-to-end replay coverage (cursor + SSE chunk +emit + cursor-exhausted → 404) lives in ``test_proxy.py``. 
+""" import json -from types import SimpleNamespace import pytest -from litellm.llms.custom_llm import CustomLLMError -from rock.sdk.model.server.integrations.traj_replayer import ( - SequentialCursor, - TrajectoryReplayer, -) +from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted def _record(*, msg: str, model: str = "gpt-3.5-turbo", call_id: str = "x") -> dict: - """Build a minimal StandardLoggingPayload-shaped record.""" return { "id": call_id, "model": model, @@ -40,9 +39,6 @@ def _write_jsonl(path, records): f.write(json.dumps(r) + "\n") -# ----- SequentialCursor ----- - - def test_cursor_load_from_single_file(tmp_path): p = tmp_path / "traj.jsonl" _write_jsonl(p, [_record(msg="a"), _record(msg="b")]) @@ -69,7 +65,7 @@ def test_cursor_load_missing_file_raises(tmp_path): def test_cursor_load_directory_raises(tmp_path): - """A directory is no longer a valid traj_file — must point to a single .jsonl.""" + """Path must be a single .jsonl file, not a directory.""" with pytest.raises(FileNotFoundError): SequentialCursor.load(tmp_path) @@ -89,17 +85,17 @@ async def test_cursor_next_returns_records_in_order(tmp_path): @pytest.mark.asyncio -async def test_cursor_next_raises_when_exhausted(tmp_path): +async def test_cursor_next_raises_trajectory_exhausted_when_done(tmp_path): p = tmp_path / "traj.jsonl" _write_jsonl(p, [_record(msg="only")]) cur = SequentialCursor.load(p) await cur.next() - with pytest.raises(CustomLLMError) as exc_info: + with pytest.raises(TrajectoryExhausted) as exc_info: await cur.next() - assert exc_info.value.status_code == 404 - assert "exhausted" in exc_info.value.message + assert exc_info.value.position == 1 + assert exc_info.value.total == 1 @pytest.mark.asyncio @@ -117,88 +113,10 @@ async def test_cursor_reset_replays_from_start(tmp_path): @pytest.mark.asyncio -async def test_cursor_model_mismatch_only_warns(tmp_path, caplog): +async def test_cursor_model_mismatch_only_warns(tmp_path): p = tmp_path 
/ "traj.jsonl" _write_jsonl(p, [_record(msg="a", model="gpt-3.5-turbo")]) cur = SequentialCursor.load(p) record = await cur.next(expected_model="gpt-4o") # different model -> warn but don't raise assert record["id"] == "x" - - -# ----- TrajectoryReplayer ----- - - -@pytest.mark.asyncio -async def test_replayer_acompletion_returns_recorded_response(tmp_path): - p = tmp_path / "traj.jsonl" - _write_jsonl(p, [_record(msg="a", call_id="step-1")]) - - replayer = TrajectoryReplayer(p) - response = await replayer.acompletion( - model="gpt-3.5-turbo", - messages=[{"role": "user", "content": "anything"}], - ) - - assert response.id == "step-1" - assert response.choices[0].message.content == "reply: a" - - -@pytest.mark.asyncio -async def test_replayer_acompletion_advances_cursor(tmp_path): - p = tmp_path / "traj.jsonl" - _write_jsonl( - p, - [ - _record(msg="a", call_id="step-1"), - _record(msg="b", call_id="step-2"), - ], - ) - - replayer = TrajectoryReplayer(p) - r1 = await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) - r2 = await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) - - assert r1.id == "step-1" - assert r2.id == "step-2" - - -@pytest.mark.asyncio -async def test_replayer_astreaming_yields_chunks_that_recompose_the_text(tmp_path): - """The chunks produced by astreaming should reassemble into the recorded text.""" - p = tmp_path / "traj.jsonl" - recorded_text = "Hello world, this is a deterministic replay." - record = _record(msg="hi") - record["response"]["choices"][0]["message"]["content"] = recorded_text - _write_jsonl(p, [record]) - - replayer = TrajectoryReplayer(p) - - # Build a litellm-shaped ModelResponse mock with one Choice/Delta slot. 
- fake_choice = SimpleNamespace(delta=SimpleNamespace(role=None, content=None), index=0) - fake_response = SimpleNamespace(choices=[fake_choice]) - - chunks_text = [] - async for chunk in replayer.astreaming( - model="gpt-3.5-turbo", - messages=[], - model_response=fake_response, - ): - if hasattr(chunk, "choices") and chunk.choices and getattr(chunk.choices[0], "delta", None): - piece = chunk.choices[0].delta.content - if piece: - chunks_text.append(piece) - - assert "".join(chunks_text) == recorded_text - - -@pytest.mark.asyncio -async def test_replayer_acompletion_raises_on_exhaustion(tmp_path): - p = tmp_path / "traj.jsonl" - _write_jsonl(p, [_record(msg="only")]) - - replayer = TrajectoryReplayer(p) - await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) - - with pytest.raises(CustomLLMError): - await replayer.acompletion(model="gpt-3.5-turbo", messages=[]) From 7a4b37fda5c701fd12a8d50168efab8b30e71060 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 03:58:17 +0000 Subject: [PATCH 04/25] refactor(model-service): drop litellm, use httpx byte-passthrough + openai SDK as parser MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces litellm.acompletion with raw httpx forwarding. The proxy no longer parses or rewrites the OpenAI protocol on the forward path — body bytes go upstream as-is, response bytes come back as-is. The openai SDK is kept solely as a parser library for the recording side: ChatCompletionChunk + the official ChatCompletionStreamState aggregate streaming chunks into a final ChatCompletion that the recorder writes to JSONL. This restores the proxy's original "transparent forward" intent and eliminates several litellm-specific pain points encountered during testing: - No "model isn't mapped yet" cost-calc exception (no calc happens at all). - No need for the input_cost_per_token=0 / custom_openai prefix workarounds. 
- Authorization header passes through verbatim (no api_key extraction kludge).
- Provider-specific fields (reasoning_content, citations, ...) are preserved
  byte-for-byte going to the client AND auto-aggregated in the recorded traj
  (openai SDK uses extra="allow" pydantic mode).
- Cursor exhaustion in replay returns 404 directly, never gets retried.

Changes:
- pyproject.toml: drop litellm>=1.50.0, add openai>=1.50.0 and httpx
- proxy.py: rewrite forward path with httpx; record streams via dual-purpose
  byte forwarding + parallel SSE parsing into ChatCompletionStreamState
- traj_recorder.py: drop CustomLogger inheritance; expose explicit
  recorder.record(request, response, status, ...) API called from proxy.py
- main.py: attach recorder/cursor to app.state instead of registering with
  litellm.callbacks / litellm.custom_provider_map
- test_proxy.py: rewritten to use httpx.MockTransport for upstream mocking;
  cover byte passthrough, provider-specific field preservation, error
  forwarding, recorder invocation, replay paths
- test_traj_recorder.py: rewritten for the explicit-call API

36 passed. End-to-end verified against DashScope glm-5: streaming record,
non-stream replay, streaming replay, cursor exhausted -> 404 all work.
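
For reviewers, a minimal standalone sketch of the "forward bytes verbatim, parse on the side" idea follows. It is an illustration only, not the code in proxy.py: `split_sse_events` and `forward_and_parse` are hypothetical names, and the sketch collects decoded JSON payloads into a list where the real handler feeds each chunk to ChatCompletionStreamState. It assumes events are separated by "\n\n", as the proxy's own parser does.

```python
import json


def split_sse_events(buffer: bytes) -> tuple[list[dict], bytes]:
    """Return (decoded data payloads, leftover bytes of an incomplete event)."""
    payloads: list[dict] = []
    while b"\n\n" in buffer:
        # Everything before the blank line is one complete SSE event.
        event, buffer = buffer.split(b"\n\n", 1)
        for raw_line in event.split(b"\n"):
            line = raw_line.decode("utf-8", errors="replace").strip()
            if not line.startswith("data:"):
                continue  # ignore comments / event: / id: lines
            data = line[len("data:"):].strip()
            if not data or data == "[DONE]":
                continue  # the terminator carries no payload
            try:
                payloads.append(json.loads(data))
            except json.JSONDecodeError:
                # A parse failure must never affect what the client receives.
                continue
    return payloads, buffer


def forward_and_parse(network_chunks: list[bytes]) -> tuple[bytes, list[dict], bytes]:
    """Simulate the stream loop: forward every chunk untouched, parse a copy."""
    forwarded: list[bytes] = []
    parsed: list[dict] = []
    leftover = b""
    for chunk in network_chunks:
        forwarded.append(chunk)  # stands in for `yield chunk` to the client
        events, leftover = split_sse_events(leftover + chunk)
        parsed.extend(events)
    return b"".join(forwarded), parsed, leftover


if __name__ == "__main__":
    # The second event is deliberately split across two network chunks.
    chunks = [b'data: {"a": 1}\n\nda', b'ta: {"a": 2}\n\ndata: [DONE]\n\n']
    out, parsed, rest = forward_and_parse(chunks)
    print(out == b"".join(chunks), parsed, rest)
```

The client-facing bytes never pass through the parser, so a malformed or provider-specific chunk can at worst leave the recorded trajectory partial; it cannot alter the forwarded response.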
Co-Authored-By: Claude Sonnet 4.6 --- pyproject.toml | 8 +- rock/sdk/model/server/api/proxy.py | 314 +++++---- .../server/integrations/traj_recorder.py | 93 +-- rock/sdk/model/server/main.py | 28 +- tests/unit/sdk/model/test_proxy.py | 614 ++++++++---------- tests/unit/sdk/model/test_traj_recorder.py | 203 +++--- uv.lock | 277 +------- 7 files changed, 612 insertions(+), 925 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index bf814e0aa1..ac87c14c41 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -86,7 +86,13 @@ model-service = [ "psutil", "swebench", "alibabacloud_cr20181201==2.0.5", - "litellm>=1.50.0", + # openai SDK is used as a TYPE/parser library only — for ChatCompletionChunk + # validation and ChatCompletionStreamState (the official stream chunk aggregator). + # We do NOT use AsyncOpenAI as an HTTP client; transport is plain httpx so the + # proxy can forward upstream bytes verbatim, including any provider-specific + # fields (reasoning_content, citations, ...) without re-encoding OpenAI protocol. + "openai>=1.50.0", + "httpx", ] diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index e7ded21cd2..8161c4adc0 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -1,20 +1,20 @@ -"""OpenAI-compatible chat/completions proxy. +"""OpenAI-compatible chat/completions proxy with trajectory record/replay. Two paths share this handler: -1. **Record / forward mode** (default) — ``litellm.acompletion`` is called with - the user-supplied model/messages, the upstream is selected from - ``proxy_base_url`` / ``proxy_rules``, retries come from litellm's - ``num_retries``, and the recorded JSONL trajectory is written by a - ``litellm.callbacks`` entry registered at startup (see - ``rock.sdk.model.server.main``). +1. **Forward / record mode** (default) — body bytes are POSTed verbatim to the + configured upstream via plain ``httpx``. 
The upstream response is forwarded + byte-for-byte back to the client (raw JSON for non-stream, raw SSE bytes + for stream). On the side we run a parser (``ChatCompletionChunk`` + + ``ChatCompletionStreamState`` from the openai SDK) to aggregate streaming + chunks into a final ChatCompletion that the recorder writes to JSONL. The + forward path itself does NOT depend on OpenAI types — anything the upstream + returns (provider-specific ``reasoning_content``, ``citations``, ...) is + passed through untouched. 2. **Replay mode** (``replay_traj_path`` set) — the request is served directly - from the next record in ``app.state.replay_cursor`` without going through - litellm at all. We have a complete OpenAI-shape response on disk, so there's - no value in routing through CustomLLM/CustomStreamWrapper just to translate - formats. Streaming emits the recorded response as a single SSE chunk + - ``[DONE]``, mirroring litellm's own ``MockResponseIterator`` strategy. + from the next record in ``app.state.replay_cursor`` without any upstream + call. Streaming emits the recorded response as one SSE chunk + ``[DONE]``. 
""" from __future__ import annotations @@ -25,13 +25,15 @@ from collections.abc import AsyncIterator from typing import Any -import litellm +import httpx from fastapi import APIRouter, HTTPException, Request -from fastapi.responses import JSONResponse, StreamingResponse -from litellm.exceptions import APIError, AuthenticationError, BadRequestError, RateLimitError, Timeout +from fastapi.responses import JSONResponse, Response, StreamingResponse +from openai.lib.streaming.chat import ChatCompletionStreamState +from openai.types.chat import ChatCompletionChunk from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted logger = init_logger(__name__) @@ -41,31 +43,9 @@ # Headers we never forward upstream: -# - host / content-length / content-type: litellm rewrites the body and re-targets, -# so the client's values would be wrong or misleading -# - transfer-encoding / connection: true RFC 7230 hop-by-hop headers, scoped to -# the client↔proxy connection only -# - authorization: extracted into api_key kwarg, see _extract_bearer_token -_HEADERS_NOT_TO_FORWARD = frozenset( - {"host", "content-length", "content-type", "transfer-encoding", "connection", "authorization"} -) - - -def _extract_bearer_token(headers) -> str | None: - """Pull the Bearer token out of the Authorization header. - - litellm's OpenAI client needs the API key as an explicit ``api_key=`` kwarg — - setting Authorization in extra_headers does not work because litellm always - regenerates that header from ``api_key`` (or env vars). So we extract it here - and let the proxy stay stateless about which key the client is using. 
- """ - auth = headers.get("authorization") or headers.get("Authorization") - if not auth: - return None - parts = auth.split(None, 1) - if len(parts) == 2 and parts[0].lower() == "bearer": - return parts[1].strip() - return auth.strip() +# - host / content-length: rebuilt by httpx for the upstream request +# - transfer-encoding / connection: RFC 7230 hop-by-hop, scoped to one connection +_HEADERS_NOT_TO_FORWARD = frozenset({"host", "content-length", "transfer-encoding", "connection"}) def get_base_url(model_name: str, config: ModelServiceConfig) -> str: @@ -93,53 +73,21 @@ def get_base_url(model_name: str, config: ModelServiceConfig) -> str: def _filter_headers(headers) -> dict[str, str]: - forwarded = {} + """Drop headers that are scoped to the client↔proxy hop or rebuilt by httpx. + ``Authorization`` is forwarded verbatim — proxy stays stateless about which + API key the client uses.""" + out = {} for key, value in headers.items(): if key.lower() in _HEADERS_NOT_TO_FORWARD: continue - forwarded[key] = value - return forwarded - - -def _format_error_response(exc: Exception) -> JSONResponse: - """Render a litellm exception as the legacy ``{error:{message,type,code}}`` JSON. - - Agent-side logic keys off message substrings (e.g. "context length exceeded", - "content violation"), so we keep the message verbatim from the upstream. 
- """ - status_code = getattr(exc, "status_code", None) or 502 - message = str(exc) - error_type = type(exc).__name__ - return JSONResponse( - status_code=status_code, - content={ - "error": { - "message": f"LLM backend error: {message}", - "type": error_type, - "code": status_code, - } - }, - ) - - -async def _sse_iter(stream: AsyncIterator[Any]) -> AsyncIterator[bytes]: - """Convert a litellm async chunk stream into Server-Sent Events bytes.""" - try: - async for chunk in stream: - payload = chunk.model_dump() if hasattr(chunk, "model_dump") else chunk - yield f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode() - finally: - yield b"data: [DONE]\n\n" + out[key] = value + return out def _completion_to_chunk(response: dict, *, model: str) -> dict: """Convert a recorded ``chat.completion`` response into a single - ``chat.completion.chunk`` shape (move ``message`` → ``delta``). - - Mirrors what litellm's ``convert_model_response_to_streaming`` does for its - own non-streaming providers — preserves ``finish_reason``, ``tool_calls`` - and any other fields verbatim by simply renaming the wrapper key. - """ + ``chat.completion.chunk`` shape (move ``message`` → ``delta``). Used only by + the replay streaming path.""" choices_in = response.get("choices") or [] choices_out = [] for choice in choices_in: @@ -162,31 +110,112 @@ def _completion_to_chunk(response: dict, *, model: str) -> dict: async def _replay_sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: - """Emit a recorded response as a single SSE chunk + ``[DONE]``. - - The whole recorded answer goes out in one chunk — same strategy as - litellm's ``MockResponseIterator``. Most agents accumulate SSE into a - final string anyway; faking finer-grained streaming would just add code - without buying anyone anything. 
- """ + """Emit a recorded response as one SSE chunk + ``[DONE]``.""" chunk = _completion_to_chunk(response, model=model) yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode() yield b"data: [DONE]\n\n" +def _parse_sse_chunks_into_state(buffer: bytes, state: ChatCompletionStreamState) -> bytes: + """Pull complete SSE events out of ``buffer`` and feed each ``data:`` line + (other than ``[DONE]``) to the openai stream-state aggregator. Returns the + leftover bytes that did not yet form a complete event.""" + while b"\n\n" in buffer: + event, buffer = buffer.split(b"\n\n", 1) + for raw_line in event.split(b"\n"): + line = raw_line.decode("utf-8", errors="replace").strip() + if not line.startswith("data:"): + continue + payload = line[len("data:") :].strip() + if not payload or payload == "[DONE]": + continue + try: + state.handle_chunk(ChatCompletionChunk.model_validate(json.loads(payload))) + except Exception as exc: # parser error: forward continues, traj will be partial + logger.debug(f"[record] chunk parse failed (forward continues): {exc}") + return buffer + + +async def _forward_stream_and_record( + *, + upstream_url: str, + body_bytes: bytes, + fwd_headers: dict[str, str], + timeout: float, + request_dict: dict[str, Any], + recorder: TrajectoryRecorder | None, +) -> AsyncIterator[bytes]: + """SSE bytes are forwarded verbatim; chunks are parsed in parallel and + aggregated into the final ChatCompletion that the recorder writes to JSONL.""" + state = ChatCompletionStreamState() + start = time.time() + parse_buffer = b"" + upstream_status = 0 + + try: + async with httpx.AsyncClient(timeout=timeout) as client: + async with client.stream("POST", upstream_url, content=body_bytes, headers=fwd_headers) as r: + upstream_status = r.status_code + async for chunk in r.aiter_bytes(): + yield chunk + parse_buffer = _parse_sse_chunks_into_state(parse_buffer + chunk, state) + except httpx.RequestError as exc: + # Connection died mid-stream. 
The bytes already sent reach the client; + # we still try to record what we got. + if recorder is not None: + await recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + + if recorder is None: + return + + status = "success" if upstream_status < 400 else "failure" + final_dict: dict | None = None + if status == "success": + try: + final_dict = state.get_final_completion().model_dump() + except Exception as exc: + logger.warning(f"[record] stream aggregation failed: {exc}") + + await recorder.record( + request=request_dict, + response=final_dict, + status=status, + start_time=start, + end_time=time.time(), + error=None if status == "success" else f"upstream_status={upstream_status}", + ) + + @proxy_router.post("/v1/chat/completions") -async def chat_completions(body: dict[str, Any], request: Request): +async def chat_completions(request: Request): """OpenAI-compatible chat completions proxy endpoint. - In replay mode (``replay_traj_path`` set), serves the next record from - ``app.state.replay_cursor`` directly — no litellm involvement. Otherwise - forwards to the configured upstream via ``litellm.acompletion``. + Reads the body as raw bytes (no parsing on the forward path) and either + serves it from the replay cursor or forwards it to the configured upstream. 
""" config: ModelServiceConfig = request.app.state.model_service_config - model_name = body.get("model", "") - is_stream = bool(body.get("stream")) + recorder: TrajectoryRecorder | None = getattr(request.app.state, "recorder", None) + + body_bytes = await request.body() + try: + request_dict = json.loads(body_bytes) if body_bytes else {} + except json.JSONDecodeError: + raise HTTPException(status_code=400, detail="Request body is not valid JSON.") + if not isinstance(request_dict, dict): + raise HTTPException(status_code=400, detail="Request body must be a JSON object.") - # ---- Replay mode: short-circuit, never touch litellm ---- + model_name = request_dict.get("model", "") + is_stream = bool(request_dict.get("stream")) + + # ---- Replay mode: short-circuit, no upstream call ---- if config.replay_traj_path: cursor: SequentialCursor = request.app.state.replay_cursor try: @@ -209,44 +238,71 @@ async def chat_completions(body: dict[str, Any], request: Request): ) return JSONResponse(status_code=200, content=response_dict) - # ---- Forward / record mode: go through litellm ---- - api_base = get_base_url(model_name, config) - # custom_openai is litellm's catch-all for OpenAI-compatible third-party endpoints - # (DashScope, ModelScope, Groq, Mistral, ...). Unlike `openai/`, it does NOT do - # model-name lookup, so arbitrary upstream model names like "glm-5" / "qwen-turbo" - # work without "This model isn't mapped yet" errors. 
- litellm_model = f"custom_openai/{model_name}" if model_name else "custom_openai/default" - logger.info(f"Routing model '{model_name}' to {api_base}") + # ---- Forward / record mode: byte-passthrough via httpx ---- + upstream_url = f"{get_base_url(model_name, config)}/chat/completions" + fwd_headers = _filter_headers(request.headers) + logger.info(f"Routing model {model_name!r} to {upstream_url}") - api_key = _extract_bearer_token(request.headers) - extra_headers = _filter_headers(request.headers) + if is_stream: + return StreamingResponse( + _forward_stream_and_record( + upstream_url=upstream_url, + body_bytes=body_bytes, + fwd_headers=fwd_headers, + timeout=config.request_timeout, + request_dict=request_dict, + recorder=recorder, + ), + media_type="text/event-stream", + ) - call_kwargs = dict(body) - call_kwargs.pop("model", None) + # Non-stream: single POST, return upstream's status + body verbatim, record on the side. + start = time.time() + try: + async with httpx.AsyncClient(timeout=config.request_timeout) as client: + r = await client.post(upstream_url, content=body_bytes, headers=fwd_headers) + except httpx.TimeoutException as exc: + if recorder is not None: + await recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"timeout: {exc}", + ) + raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") + except httpx.RequestError as exc: + if recorder is not None: + await recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") + response_text = r.text # bytes already read by httpx; .text decodes once + response_dict: dict | None = None try: - response = await litellm.acompletion( - model=litellm_model, - api_base=api_base, - api_key=api_key, - extra_headers=extra_headers, - 
timeout=config.request_timeout, - num_retries=config.num_retries, - # Zero-cost rates suppress "model isn't mapped yet" from litellm's - # post-call cost calculator for arbitrary upstream model names. - input_cost_per_token=0, - output_cost_per_token=0, - **call_kwargs, + parsed = json.loads(response_text) if response_text else None + if isinstance(parsed, dict): + response_dict = parsed + except json.JSONDecodeError: + pass + + if recorder is not None: + await recorder.record( + request=request_dict, + response=response_dict, + status="success" if r.status_code < 400 else "failure", + start_time=start, + end_time=time.time(), + error=None if r.status_code < 400 else f"upstream_status={r.status_code}", ) - except (RateLimitError, APIError, BadRequestError, AuthenticationError, Timeout) as exc: - logger.warning(f"litellm error for model '{model_name}': {exc}") - return _format_error_response(exc) - except Exception as exc: # pragma: no cover - last-resort safety net - logger.error(f"Unexpected proxy error: {exc}", exc_info=True) - raise HTTPException(status_code=500, detail=str(exc)) - - if is_stream: - return StreamingResponse(_sse_iter(response), media_type="text/event-stream") - body_out = response.model_dump() if hasattr(response, "model_dump") else response - return JSONResponse(status_code=200, content=body_out) + # Forward bytes verbatim — preserves any provider-specific fields untouched. + media_type = r.headers.get("content-type", "application/json") + return Response(content=response_text, status_code=r.status_code, media_type=media_type) diff --git a/rock/sdk/model/server/integrations/traj_recorder.py b/rock/sdk/model/server/integrations/traj_recorder.py index 6aa01a8eed..a0c7e08fc7 100644 --- a/rock/sdk/model/server/integrations/traj_recorder.py +++ b/rock/sdk/model/server/integrations/traj_recorder.py @@ -1,9 +1,13 @@ -"""Record chat/completions trajectories as JSONL via litellm's CustomLogger hook. +"""Append chat/completions trajectories as JSONL. 
-One line per call, each line is a ``StandardLoggingPayload`` dict from litellm. -Streaming chunks are aggregated by litellm before this callback fires (see -litellm/litellm_core_utils/litellm_logging.py around line 1930), so we don't -need to handle the streaming/non-streaming split ourselves. +The recorder is invoked **explicitly** from ``proxy.py`` after each forwarded +call (success or failure). It is no longer a litellm CustomLogger — we removed +the litellm SDK dependency in favor of httpx-based byte forwarding, and call +this object directly so writes stay deterministic and locally testable. + +Schema per line: a small dict with ``request`` / ``response`` / ``status`` / +``response_time`` / ``model`` / ``stream``. Faithful enough to drive the +sequential replayer; not a full StandardLoggingPayload. """ from __future__ import annotations @@ -11,9 +15,9 @@ import asyncio import json import os +import time from pathlib import Path - -from litellm.integrations.custom_logger import CustomLogger +from typing import Any from rock.logger import init_logger from rock.sdk.model.server.utils import ( @@ -25,53 +29,62 @@ logger = init_logger(__name__) -class TrajectoryRecorder(CustomLogger): - """litellm CustomLogger that appends each call's StandardLoggingPayload to JSONL - and reports OTLP RT/count metrics.""" +class TrajectoryRecorder: + """Appends one JSONL line per chat/completions call and reports OTLP metrics.""" def __init__(self, traj_file: str | os.PathLike) -> None: - super().__init__() self.traj_file = Path(traj_file) self.traj_file.parent.mkdir(parents=True, exist_ok=True) self._lock = asyncio.Lock() self._monitor = _get_or_create_metrics_monitor() - async def async_log_success_event(self, kwargs, response_obj, start_time, end_time): - payload = kwargs.get("standard_logging_object") - if payload is None: - logger.debug("[traj-recorder] success event without standard_logging_object, skipping") - return - await self._append_jsonl(payload) - 
self._record_metrics(payload, status="success") - - async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time): - payload = kwargs.get("standard_logging_object") - if payload is None: - return - await self._append_jsonl(payload) - self._record_metrics(payload, status="failure") - - async def _append_jsonl(self, payload: dict) -> None: + async def record( + self, + *, + request: dict[str, Any], + response: dict[str, Any] | None, + status: str, + start_time: float, + end_time: float, + error: str | None = None, + ) -> None: + """Persist one call to the JSONL file and report RT/count metrics. + + ``request`` / ``response`` are stored verbatim (whatever the upstream + returned, including provider-specific fields like ``reasoning_content``). + For streaming calls, ``response`` is the aggregated final ChatCompletion + produced by ``ChatCompletionStreamState.get_final_completion().model_dump()``. + """ + rt_seconds = end_time - start_time + payload = { + "model": request.get("model"), + "stream": bool(request.get("stream")), + "status": status, + "response_time": rt_seconds, + "start_time": start_time, + "end_time": end_time, + "request": request, + "response": response, + "error": error, + } + line = json.dumps(payload, ensure_ascii=False, default=str) + "\n" async with self._lock: await asyncio.to_thread(self._write_line, line) - def _write_line(self, line: str) -> None: - with self.traj_file.open("a", encoding="utf-8") as f: - f.write(line) - - def _record_metrics(self, payload: dict, *, status: str) -> None: - rt_seconds = payload.get("response_time") - if rt_seconds is None: - start = payload.get("startTime") - end = payload.get("endTime") - rt_seconds = (end - start) if (start is not None and end is not None) else 0.0 - rt_ms = float(rt_seconds) * 1000.0 - attrs = { "type": "chat_completions", "status": status, "sandbox_id": os.getenv("ROCK_SANDBOX_ID", "unknown"), } - self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_ms, 
attributes=attrs) + self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_seconds * 1000.0, attributes=attrs) self._monitor.record_counter_by_name(MODEL_SERVICE_REQUEST_COUNT, 1, attributes=attrs) + + def _write_line(self, line: str) -> None: + with self.traj_file.open("a", encoding="utf-8") as f: + f.write(line) + + +def now() -> float: + """Wall-clock seconds (single shim so callers don't import time directly).""" + return time.time() diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 133605b046..31b4918e55 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -53,18 +53,15 @@ async def global_exception_handler(request, exc): def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: - """Wire up record/replay integrations for the proxy mode. - - - When ``replay_traj_path`` is set, load the trajectory into a - ``SequentialCursor`` and attach it to ``app.state.replay_cursor``. The - proxy handler serves recorded responses directly from this cursor; we - do NOT register anything with litellm (replay path bypasses litellm - entirely so cursor-exhausted errors aren't swallowed by retry logic). - - Otherwise (record/forward mode), if ``traj_enabled`` is True, register - ``TrajectoryRecorder`` as a ``litellm.callbacks`` entry so every - chat/completions call appends a JSONL line. - - Replay and record are mutually exclusive: in replay mode we don't record, + """Wire up record/replay integrations and attach them to ``app.state``. + + - Replay mode (``replay_traj_path`` set): load the trajectory into a + ``SequentialCursor`` and stash it as ``app.state.replay_cursor``. + - Forward/record mode (default): if ``traj_enabled`` is True, attach a + ``TrajectoryRecorder`` instance as ``app.state.recorder``. The proxy + handler invokes it explicitly after each forwarded call. 
+ + Replay and record are mutually exclusive — in replay mode we don't record, since replayed responses round-tripping back into the source file would inflate metrics and corrupt the trajectory. """ @@ -76,14 +73,11 @@ def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> N return if config.traj_enabled: - import litellm - from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder traj_path = config.traj_file or TRAJ_FILE - recorder = TrajectoryRecorder(traj_file=traj_path) - litellm.callbacks.append(recorder) - logger.info(f"litellm trajectory recorder registered, traj_file={traj_path}") + app.state.recorder = TrajectoryRecorder(traj_file=traj_path) + logger.info(f"trajectory recorder attached, traj_file={traj_path}") def main( diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index 5a4d7e8230..88c61edcb3 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -1,7 +1,15 @@ +"""Tests for the chat/completions proxy. + +Forward path is exercised by pointing the proxy at an httpx ``MockTransport`` +(no real network). Replay path is exercised end-to-end via the FastAPI test +client. Config / CLI / metrics-singleton tests round out the file. 
+""" + +import argparse import json -from types import SimpleNamespace -from unittest.mock import AsyncMock, MagicMock, patch +from unittest.mock import MagicMock, patch +import httpx import pytest import yaml from fastapi import FastAPI @@ -9,6 +17,7 @@ from rock.sdk.model.server.api.proxy import proxy_router from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor from rock.sdk.model.server.main import create_config_from_args, lifespan from rock.sdk.model.server.utils import ( MODEL_SERVICE_REQUEST_COUNT, @@ -17,171 +26,287 @@ record_traj, ) -# Initialize a temporary FastAPI application for testing the router -test_app = FastAPI() -test_app.include_router(proxy_router) -mock_config = ModelServiceConfig() -test_app.state.model_service_config = mock_config +def _build_app(config: ModelServiceConfig, *, replay_cursor=None) -> FastAPI: + """Build a FastAPI app with the proxy router and the given config attached.""" + app = FastAPI() + app.state.model_service_config = config + if replay_cursor is not None: + app.state.replay_cursor = replay_cursor + app.include_router(proxy_router) + return app + + +def _patch_httpx_with_handler(handler): + """Patch ``proxy.httpx.AsyncClient`` so each ``async with httpx.AsyncClient(...)`` + returns a real client wrapping ``MockTransport(handler)``.""" + real_client_cls = httpx.AsyncClient # capture before patching kicks in + transport = httpx.MockTransport(handler) + def factory(*args, **kwargs): + kwargs.pop("timeout", None) # transport supplies the response, no timeout needed + return real_client_cls(transport=transport, **kwargs) -# Patch path for the litellm.acompletion symbol as imported inside proxy.py. 
-ACOMPLETION_PATCH = "rock.sdk.model.server.api.proxy.litellm.acompletion" + return patch("rock.sdk.model.server.api.proxy.httpx.AsyncClient", side_effect=factory) -def _fake_model_response(*, id="chat-123", choices=None) -> SimpleNamespace: - """Build a litellm-shaped object that exposes .model_dump() like a Pydantic model.""" - payload = { - "id": id, +def _success_response_json(*, model: str = "gpt-3.5-turbo", content: str = "hi") -> dict: + return { + "id": "chatcmpl-1", "object": "chat.completion", - "model": "gpt-3.5-turbo", - "choices": choices - or [ - {"index": 0, "message": {"role": "assistant", "content": "hi"}, "finish_reason": "stop"}, + "created": 1234, + "model": model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": content}, + "finish_reason": "stop", + } ], "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}, } - return SimpleNamespace(model_dump=lambda: payload) + + +# ---------- Forward path: routing ---------- @pytest.mark.asyncio -async def test_chat_completions_routing_success(): - """Routing: model name maps to its proxy_rules entry, passed to litellm as api_base.""" - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.return_value = _fake_model_response() +async def test_forward_routes_by_model_name_to_proxy_rules(): + captured = {} - transport = ASGITransport(app=test_app) + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + captured["body"] = json.loads(request.content) + return httpx.Response(200, json=_success_response_json()) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + 
"/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 200 - assert mock_acompletion.called - call_kwargs = mock_acompletion.call_args.kwargs - assert call_kwargs["api_base"] == "https://api.openai.com/v1" - assert call_kwargs["model"] == "custom_openai/gpt-3.5-turbo" - assert call_kwargs["messages"] == [{"role": "user", "content": "hello"}] + assert r.status_code == 200 + assert captured["url"] == "https://api.openai.com/v1/chat/completions" + assert captured["body"]["model"] == "gpt-3.5-turbo" @pytest.mark.asyncio -async def test_chat_completions_fallback_to_default_when_not_found(): - """Unrecognized model name → falls back to the 'default' base URL.""" - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.return_value = _fake_model_response(id="chat-fallback") +async def test_forward_falls_back_to_default_for_unknown_model(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + return httpx.Response(200, json=_success_response_json(model="some-random")) - config = test_app.state.model_service_config - default_base_url = config.proxy_rules["default"].rstrip("/") + config = ModelServiceConfig() + expected_default = config.proxy_rules["default"].rstrip("/") + "/chat/completions" + app = _build_app(config) - transport = ASGITransport(app=test_app) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = { - "model": "some-random-unsupported-model", - "messages": [{"role": "user", "content": "hello"}], - } - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "some-random", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 200 - call_kwargs = 
mock_acompletion.call_args.kwargs - assert call_kwargs["api_base"] == default_base_url + assert r.status_code == 200 + assert captured["url"] == expected_default @pytest.mark.asyncio -async def test_chat_completions_routing_absolute_fail(): - """No matching rule and no 'default' → 400.""" - empty_config = ModelServiceConfig() - empty_config.proxy_rules = {} +async def test_forward_400_when_no_rule_and_no_default(): + config = ModelServiceConfig() + config.proxy_rules = {} + app = _build_app(config) - with patch.object(test_app.state, "model_service_config", empty_config): - transport = ASGITransport(app=test_app) - async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "any-model", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "any", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 400 - detail = response.json()["detail"] - assert "not configured" in detail + assert r.status_code == 400 + assert "not configured" in r.json()["detail"] @pytest.mark.asyncio -async def test_proxy_base_url_overrides_proxy_rules(): - """When proxy_base_url is set, all requests go to that URL, ignoring proxy_rules.""" +async def test_forward_proxy_base_url_overrides_proxy_rules(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["url"] = str(request.url) + return httpx.Response(200, json=_success_response_json()) + config = ModelServiceConfig() config.proxy_base_url = "https://custom-endpoint.example.com/v1" + app = _build_app(config) - local_app = FastAPI() - local_app.state.model_service_config = config - local_app.include_router(proxy_router) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + 
async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.return_value = _fake_model_response() + assert captured["url"] == "https://custom-endpoint.example.com/v1/chat/completions" - transport = ASGITransport(app=local_app) - async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) - assert response.status_code == 200 - call_kwargs = mock_acompletion.call_args.kwargs - assert call_kwargs["api_base"] == "https://custom-endpoint.example.com/v1" +# ---------- Forward path: byte passthrough ---------- @pytest.mark.asyncio -async def test_chat_completions_passes_num_retries_and_timeout(): - """num_retries and request_timeout from config flow through to litellm.acompletion.""" - config = ModelServiceConfig() - config.num_retries = 3 - config.request_timeout = 45 +async def test_forward_response_body_is_byte_for_byte_passthrough(): + """Upstream's exact JSON bytes (incl. 
provider-specific fields) reach the client.""" + upstream_payload = { + "id": "x", + "object": "chat.completion", + "model": "glm-5", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "hi", "reasoning_content": "...think..."}, + "finish_reason": "stop", + } + ], + "provider_specific_fields": {"vendor_field": "vendor_value"}, + } + + def handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json=upstream_payload) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "glm-5", "messages": [{"role": "user", "content": "hi"}]}, + ) - local_app = FastAPI() - local_app.state.model_service_config = config - local_app.include_router(proxy_router) + body = r.json() + assert body["choices"][0]["message"]["reasoning_content"] == "...think..." 
+ assert body["provider_specific_fields"] == {"vendor_field": "vendor_value"} - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.return_value = _fake_model_response() - transport = ASGITransport(app=local_app) +@pytest.mark.asyncio +async def test_forward_propagates_upstream_status_and_body_on_4xx(): + """Upstream 4xx is forwarded verbatim — proxy doesn't re-shape error JSON.""" + err_body = {"error": {"message": "context length exceeded", "type": "BadRequestError"}} + + def handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(400, json=err_body) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} - await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - call_kwargs = mock_acompletion.call_args.kwargs - assert call_kwargs["num_retries"] == 3 - assert call_kwargs["timeout"] == 45 + assert r.status_code == 400 + assert r.json() == err_body @pytest.mark.asyncio -async def test_chat_completions_litellm_error_returns_proxy_schema(): - """A litellm exception is converted to {error:{message,type,code}} JSON - so agent-side keyword detection (e.g. 
'context length exceeded') keeps working.""" - from litellm.exceptions import BadRequestError - - err = BadRequestError( - message="context length exceeded for this model", - model="gpt-3.5-turbo", - llm_provider="openai", - ) +async def test_forward_authorization_header_passes_through(): + captured = {} + + def handler(request: httpx.Request) -> httpx.Response: + captured["headers"] = dict(request.headers) + return httpx.Response(200, json=_success_response_json()) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer sk-abc", "X-Trace": "t1"}, + ) + + # Authorization and custom X-* headers are forwarded verbatim. We don't assert + # on framing headers (connection / content-length / accept-encoding) because + # httpx rebuilds them itself for the outgoing request. 
+ auth_value = captured["headers"].get("Authorization") or captured["headers"].get("authorization") + assert auth_value == "Bearer sk-abc" + fwd_lower = {k.lower() for k in captured["headers"]} + assert "x-trace" in fwd_lower + - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - mock_acompletion.side_effect = err +@pytest.mark.asyncio +async def test_forward_502_on_upstream_connection_failure(): + def handler(request: httpx.Request) -> httpx.Response: + raise httpx.ConnectError("upstream is down") - transport = ASGITransport(app=test_app) + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hello"}]} - response = await ac.post("/v1/chat/completions", json=payload) + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - body = response.json() - assert "error" in body - assert "context length exceeded" in body["error"]["message"] - assert body["error"]["type"] == "BadRequestError" - assert body["error"]["code"] == response.status_code + assert r.status_code == 502 + + +# ---------- Forward path: recording ---------- @pytest.mark.asyncio -async def test_replay_mode_returns_recorded_response_without_calling_litellm(tmp_path): - """In replay mode the proxy serves the next record directly from app.state.replay_cursor; - litellm.acompletion must never be invoked.""" - from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor +async def test_forward_invokes_recorder_on_success(tmp_path): + """When app.state.recorder is set, success calls write a JSONL line with the + request and the upstream response verbatim.""" + from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + + upstream_payload = 
_success_response_json(content="recorded reply") + + def handler(request: httpx.Request) -> httpx.Response: + return httpx.Response(200, json=upstream_payload) + + config = ModelServiceConfig() + app = _build_app(config) + traj_file = tmp_path / "traj.jsonl" + app.state.recorder = TrajectoryRecorder(traj_file=traj_file) + + with ( + _patch_httpx_with_handler(handler), + patch( + "rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", return_value=MagicMock() + ), + ): + # Re-create the recorder so it picks up the patched monitor. + app.state.recorder = TrajectoryRecorder(traj_file=traj_file) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + line = traj_file.read_text(encoding="utf-8").strip() + record = json.loads(line) + assert record["status"] == "success" + assert record["model"] == "gpt-3.5-turbo" + assert record["stream"] is False + assert record["request"]["messages"][0]["content"] == "hi" + assert record["response"] == upstream_payload + +# ---------- Replay path ---------- + + +@pytest.mark.asyncio +async def test_replay_returns_recorded_response_no_upstream_call(tmp_path): record = { "model": "gpt-3.5-turbo", "response": { @@ -195,7 +320,6 @@ async def test_replay_mode_returns_recorded_response_without_calling_litellm(tmp "finish_reason": "stop", } ], - "usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}, }, } traj = tmp_path / "t.jsonl" @@ -203,29 +327,21 @@ async def test_replay_mode_returns_recorded_response_without_calling_litellm(tmp config = ModelServiceConfig() config.replay_traj_path = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) - local_app = FastAPI() - local_app.state.model_service_config = config - local_app.state.replay_cursor = SequentialCursor.load(traj) - 
local_app.include_router(proxy_router) - - with patch(ACOMPLETION_PATCH, new_callable=AsyncMock) as mock_acompletion: - transport = ASGITransport(app=local_app) - async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} - response = await ac.post("/v1/chat/completions", json=payload) + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) - assert response.status_code == 200 - body = response.json() - assert body["choices"][0]["message"]["content"] == "recorded reply" - mock_acompletion.assert_not_called() + assert r.status_code == 200 + assert r.json()["choices"][0]["message"]["content"] == "recorded reply" @pytest.mark.asyncio -async def test_replay_mode_streaming_emits_recorded_response_as_sse(tmp_path): - """Replay + stream=True emits one SSE chunk (content moved into delta) plus [DONE].""" - from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor - +async def test_replay_streaming_emits_recorded_response_as_sse(tmp_path): record = { "model": "gpt-3.5-turbo", "response": { @@ -246,33 +362,24 @@ async def test_replay_mode_streaming_emits_recorded_response_as_sse(tmp_path): config = ModelServiceConfig() config.replay_traj_path = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) - local_app = FastAPI() - local_app.state.model_service_config = config - local_app.state.replay_cursor = SequentialCursor.load(traj) - local_app.include_router(proxy_router) - - transport = ASGITransport(app=local_app) + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: - response = await ac.post( + r = await ac.post( "/v1/chat/completions", json={"model": "gpt-3.5-turbo", "stream": 
True, "messages": [{"role": "user", "content": "hi"}]}, ) - assert response.status_code == 200 - body = response.text + body = r.text assert "data: [DONE]" in body - # The SSE chunk shape is chat.completion.chunk with message → delta, finish_reason preserved assert '"object": "chat.completion.chunk"' in body assert '"delta": {"role": "assistant", "content": "streamed reply"}' in body assert '"finish_reason": "tool_calls"' in body @pytest.mark.asyncio -async def test_replay_mode_returns_404_when_cursor_exhausted(tmp_path): - """Cursor used up → 404 with a clear message; no litellm retries involved.""" - from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor - +async def test_replay_returns_404_when_cursor_exhausted(tmp_path): record = { "model": "gpt-3.5-turbo", "response": { @@ -285,13 +392,9 @@ async def test_replay_mode_returns_404_when_cursor_exhausted(tmp_path): config = ModelServiceConfig() config.replay_traj_path = str(traj) + app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) - local_app = FastAPI() - local_app.state.model_service_config = config - local_app.state.replay_cursor = SequentialCursor.load(traj) - local_app.include_router(proxy_router) - - transport = ASGITransport(app=local_app) + transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: await ac.post( "/v1/chat/completions", @@ -306,42 +409,11 @@ async def test_replay_mode_returns_404_when_cursor_exhausted(tmp_path): assert "exhausted" in second.json()["detail"] -@pytest.mark.asyncio -async def test_chat_completions_extracts_bearer_token_and_strips_framing_headers(): - """Bearer token goes to api_key kwarg; host / content-length / transfer-encoding / - Authorization are not forwarded as extra_headers (litellm regenerates Authorization - from api_key, so passing it both ways would conflict). 
Custom X-* headers pass through.""" - captured = {} - - async def capture(*args, **kwargs): - captured.update(kwargs) - return _fake_model_response() - - with patch(ACOMPLETION_PATCH, new=capture): - transport = ASGITransport(app=test_app) - async with AsyncClient(transport=transport, base_url="http://test") as ac: - payload = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]} - await ac.post( - "/v1/chat/completions", - json=payload, - headers={"Authorization": "Bearer abc", "X-Trace": "t1"}, - ) - - assert captured["api_key"] == "abc" - - forwarded = captured["extra_headers"] - forwarded_lower = {k.lower() for k in forwarded} - assert "x-trace" in forwarded_lower - assert "authorization" not in forwarded_lower - assert "host" not in forwarded_lower - assert "content-length" not in forwarded_lower - assert "content-type" not in forwarded_lower - assert "transfer-encoding" not in forwarded_lower +# ---------- Lifespan + Config ---------- @pytest.mark.asyncio async def test_lifespan_initialization_with_config(tmp_path): - """Application initializes correctly when a valid config file is provided.""" conf_file = tmp_path / "proxy.yml" conf_file.write_text(yaml.dump({"proxy_rules": {"my-model": "http://custom-url"}, "request_timeout": 50})) @@ -349,73 +421,27 @@ async def test_lifespan_initialization_with_config(tmp_path): app = FastAPI(lifespan=lambda app: lifespan(app, config)) async with lifespan(app, config): - app_config = app.state.model_service_config - assert app_config.proxy_rules["my-model"] == "http://custom-url" - assert app_config.request_timeout == 50 - assert "gpt-3.5-turbo" not in app_config.proxy_rules - - -@pytest.mark.asyncio -async def test_lifespan_initialization_no_config(): - """Defaults are loaded when no config file is provided.""" - config = ModelServiceConfig() - app = FastAPI(lifespan=lambda app: lifespan(app, config)) - - async with lifespan(app, config): - app_config = app.state.model_service_config - assert 
"gpt-3.5-turbo" in app_config.proxy_rules - assert app_config.request_timeout == 120 + assert app.state.model_service_config.proxy_rules["my-model"] == "http://custom-url" + assert app.state.model_service_config.request_timeout == 50 @pytest.mark.asyncio async def test_lifespan_invalid_config_path(): - """Non-existent config path → FileNotFoundError.""" with pytest.raises(FileNotFoundError): ModelServiceConfig.from_file("/tmp/non_existent_file.yml") -@pytest.mark.asyncio -async def test_config_loads_host_and_port_from_file(tmp_path): - """ModelServiceConfig loads host and port from config file.""" - conf_file = tmp_path / "proxy.yml" - conf_file.write_text( - yaml.dump({"host": "127.0.0.1", "port": 9000, "proxy_rules": {"my-model": "http://my-backend"}}) - ) - - config = ModelServiceConfig.from_file(str(conf_file)) - - assert config.host == "127.0.0.1" - assert config.port == 9000 - assert config.proxy_rules["my-model"] == "http://my-backend" - - def test_config_default_host_and_port(): config = ModelServiceConfig() assert config.host == "0.0.0.0" assert config.port == 8080 -@pytest.mark.asyncio -async def test_config_loads_retryable_status_codes_from_file(tmp_path): - conf_file = tmp_path / "proxy.yml" - conf_file.write_text(yaml.dump({"retryable_status_codes": [429, 500, 502, 503]})) - - config = ModelServiceConfig.from_file(str(conf_file)) - assert config.retryable_status_codes == [429, 500, 502, 503] - - -def test_config_default_retryable_status_codes(): - config = ModelServiceConfig() - assert config.retryable_status_codes == [429, 500] - - def test_config_default_traj_and_replay(): - """New traj/replay defaults: recording on (append=True), replay off.""" config = ModelServiceConfig() assert config.traj_enabled is True assert config.traj_file is None assert config.replay_traj_path is None - assert config.num_retries == 6 @pytest.mark.asyncio @@ -427,22 +453,16 @@ async def test_config_loads_traj_and_replay_from_file(tmp_path): "traj_enabled": False, 
"traj_file": "/tmp/my-traj.jsonl", "replay_traj_path": "/tmp/in.jsonl", - "num_retries": 2, } ) ) - config = ModelServiceConfig.from_file(str(conf_file)) assert config.traj_enabled is False assert config.traj_file == "/tmp/my-traj.jsonl" assert config.replay_traj_path == "/tmp/in.jsonl" - assert config.num_retries == 2 def test_cli_args_override_config_file(tmp_path): - """CLI arguments override config file settings.""" - import argparse - conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump( @@ -450,37 +470,28 @@ def test_cli_args_override_config_file(tmp_path): "host": "192.168.1.1", "port": 8080, "proxy_base_url": "https://config-url.example.com/v1", - "retryable_status_codes": [429, 500], "request_timeout": 60, } ) ) - args = argparse.Namespace( config_file=str(conf_file), host="0.0.0.0", port=9000, proxy_base_url="https://cli-url.example.com/v1", - retryable_status_codes="502,503", + retryable_status_codes=None, request_timeout=30, - num_retries=4, + num_retries=None, traj_file=None, ) - config = create_config_from_args(args) - assert config.host == "0.0.0.0" assert config.port == 9000 assert config.proxy_base_url == "https://cli-url.example.com/v1" - assert config.retryable_status_codes == [502, 503] assert config.request_timeout == 30 - assert config.num_retries == 4 def test_cli_traj_file_enables_replay(): - """--traj-file sets replay_enabled, replay_traj_path, and disables recording.""" - import argparse - args = argparse.Namespace( config_file=None, host=None, @@ -491,100 +502,36 @@ def test_cli_traj_file_enables_replay(): num_retries=None, traj_file="/tmp/in.jsonl", ) - config = create_config_from_args(args) assert config.replay_traj_path == "/tmp/in.jsonl" assert config.traj_enabled is False -@pytest.mark.asyncio -async def test_config_file_overrides_defaults(tmp_path): - conf_file = tmp_path / "proxy.yml" - conf_file.write_text( - yaml.dump( - { - "host": "10.0.0.1", - "port": 8888, - "request_timeout": 300, - "proxy_rules": 
{"test-model": "http://test-backend"}, - } - ) - ) - - config = ModelServiceConfig.from_file(str(conf_file)) - - assert config.host == "10.0.0.1" - assert config.port == 8888 - assert config.request_timeout == 300 - assert config.proxy_rules["test-model"] == "http://test-backend" - assert config.proxy_base_url is None +# ---------- Metrics singleton + legacy record_traj (still used by local mode) ---------- def test_metrics_monitor_is_singleton(): - """_get_or_create_metrics_monitor returns the same instance on repeated calls.""" import rock.sdk.model.server.utils as utils_module with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: - mock_monitor = MagicMock() - mock_cls.create.return_value = mock_monitor + mock_cls.create.return_value = MagicMock() utils_module._metrics_monitor = None - first = _get_or_create_metrics_monitor() second = _get_or_create_metrics_monitor() - assert first is second - assert mock_cls.create.call_count == 1 - utils_module._metrics_monitor = None - - -def test_metrics_monitor_uses_env_endpoint(): - """ROCK_METRICS_ENDPOINT env var is passed to MetricsMonitor.create().""" - import rock.sdk.model.server.utils as utils_module - - custom_endpoint = "http://my-otel-collector:4318/v1/metrics" - - with ( - patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, - patch.dict("os.environ", {"ROCK_METRICS_ENDPOINT": custom_endpoint}), - ): - mock_monitor = MagicMock() - mock_cls.create.return_value = mock_monitor - utils_module._metrics_monitor = None - _get_or_create_metrics_monitor() - mock_cls.create.assert_called_once_with(metrics_endpoint=custom_endpoint) - utils_module._metrics_monitor = None - - -def test_metrics_monitor_registers_gauge_and_counter(): - """_get_or_create_metrics_monitor registers both metrics on first creation.""" - import rock.sdk.model.server.utils as utils_module - - with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls: - mock_monitor = MagicMock() - mock_cls.create.return_value = 
mock_monitor - utils_module._metrics_monitor = None - _get_or_create_metrics_monitor() - - mock_monitor._register_gauge.assert_called_once_with( - MODEL_SERVICE_REQUEST_RT, "total execution time for request", "ms" - ) - mock_monitor._register_counter.assert_called_once_with( - MODEL_SERVICE_REQUEST_COUNT, "total request count", "count" - ) utils_module._metrics_monitor = None @pytest.mark.asyncio -async def test_record_traj_reports_rt_and_count(): +async def test_record_traj_decorator_reports_rt_and_count(): """Legacy record_traj decorator (still used by local mode) reports RT/count.""" import rock.sdk.model.server.utils as utils_module - mock_monitor = MagicMock() - with ( patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, - patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-test-001"}), + patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-test"}), ): + mock_monitor = MagicMock() mock_cls.create.return_value = mock_monitor utils_module._metrics_monitor = None @@ -594,42 +541,11 @@ async def fake_handler(body: dict): await fake_handler({"model": "gpt-4", "messages": []}) - mock_monitor.record_gauge_by_name.assert_called_once() gauge_call = mock_monitor.record_gauge_by_name.call_args assert gauge_call[0][0] == MODEL_SERVICE_REQUEST_RT - assert gauge_call[1]["attributes"]["type"] == "chat_completions" - assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-test-001" + assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-test" - mock_monitor.record_counter_by_name.assert_called_once() counter_call = mock_monitor.record_counter_by_name.call_args assert counter_call[0][0] == MODEL_SERVICE_REQUEST_COUNT - assert counter_call[0][1] == 1 - assert counter_call[1]["attributes"]["sandbox_id"] == "sandbox-test-001" - - utils_module._metrics_monitor = None - - -@pytest.mark.asyncio -async def test_record_traj_sandbox_id_defaults_to_unknown(): - """sandbox_id defaults to 'unknown' when ROCK_SANDBOX_ID is not set.""" - import 
rock.sdk.model.server.utils as utils_module - - mock_monitor = MagicMock() - - with patch("rock.sdk.model.server.utils.MetricsMonitor") as mock_cls, patch.dict("os.environ", {}, clear=False): - os_env = __import__("os").environ - os_env.pop("ROCK_SANDBOX_ID", None) - - mock_cls.create.return_value = mock_monitor - utils_module._metrics_monitor = None - - @record_traj - async def fake_handler(body: dict): - return {"id": "resp-2", "choices": []} - - await fake_handler({"model": "gpt-4", "messages": []}) - - gauge_call = mock_monitor.record_gauge_by_name.call_args - assert gauge_call[1]["attributes"]["sandbox_id"] == "unknown" utils_module._metrics_monitor = None diff --git a/tests/unit/sdk/model/test_traj_recorder.py b/tests/unit/sdk/model/test_traj_recorder.py index c9b1c20197..6eb3b49571 100644 --- a/tests/unit/sdk/model/test_traj_recorder.py +++ b/tests/unit/sdk/model/test_traj_recorder.py @@ -1,4 +1,4 @@ -"""Tests for TrajectoryRecorder (litellm CustomLogger that writes JSONL + emits OTLP metrics).""" +"""Tests for TrajectoryRecorder (explicit-call API, no longer a litellm CustomLogger).""" import json from unittest.mock import MagicMock, patch @@ -8,42 +8,6 @@ from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder -def _sample_payload(**overrides): - payload = { - "id": "chatcmpl-abc", - "trace_id": "trace-1", - "call_type": "acompletion", - "stream": False, - "status": "success", - "model": "gpt-3.5-turbo", - "model_id": None, - "model_group": None, - "api_base": "https://api.openai.com/v1", - "messages": [{"role": "user", "content": "hi"}], - "response": { - "id": "chatcmpl-abc", - "choices": [ - { - "index": 0, - "message": {"role": "assistant", "content": "hello back"}, - "finish_reason": "stop", - } - ], - }, - "model_parameters": {"temperature": 0.7}, - "startTime": 100.0, - "endTime": 100.5, - "completionStartTime": 100.5, - "response_time": 0.5, - "total_tokens": 12, - "prompt_tokens": 4, - "completion_tokens": 8, - "metadata": 
{}, - } - payload.update(overrides) - return payload - - @pytest.fixture def mock_monitor(): monitor = MagicMock() @@ -54,117 +18,124 @@ def mock_monitor(): yield monitor +def _make_recorder(traj_file) -> TrajectoryRecorder: + return TrajectoryRecorder(traj_file=traj_file) + + @pytest.mark.asyncio async def test_recorder_appends_each_call_as_jsonl_line(tmp_path, mock_monitor): - """Each successful call adds one JSONL line (always append-only).""" traj_file = tmp_path / "traj.jsonl" - recorder = TrajectoryRecorder(traj_file=traj_file) - - payload_a = _sample_payload(id="a", trace_id="run-1") - payload_b = _sample_payload(id="b", trace_id="run-1") - - await recorder.async_log_success_event( - kwargs={"standard_logging_object": payload_a}, response_obj=None, start_time=0, end_time=1 + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}, + response={"id": "a", "choices": []}, + status="success", + start_time=100.0, + end_time=100.5, ) - await recorder.async_log_success_event( - kwargs={"standard_logging_object": payload_b}, response_obj=None, start_time=0, end_time=1 + await recorder.record( + request={"model": "gpt-4", "messages": [{"role": "user", "content": "again"}]}, + response={"id": "b", "choices": []}, + status="success", + start_time=101.0, + end_time=101.2, ) lines = traj_file.read_text(encoding="utf-8").strip().split("\n") assert len(lines) == 2 - assert json.loads(lines[0])["id"] == "a" - assert json.loads(lines[1])["id"] == "b" + assert json.loads(lines[0])["response"]["id"] == "a" + assert json.loads(lines[1])["response"]["id"] == "b" @pytest.mark.asyncio -async def test_recorder_emits_metrics_with_sandbox_id(tmp_path, mock_monitor): +async def test_recorder_writes_request_and_response_verbatim(tmp_path, mock_monitor): + """Provider-specific fields (reasoning_content, citations, ...) 
survive untouched.""" traj_file = tmp_path / "traj.jsonl" - recorder = TrajectoryRecorder(traj_file=traj_file) - - with patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-xyz"}): - await recorder.async_log_success_event( - kwargs={"standard_logging_object": _sample_payload()}, - response_obj=None, - start_time=0, - end_time=1, - ) - - mock_monitor.record_gauge_by_name.assert_called_once() - gauge_args = mock_monitor.record_gauge_by_name.call_args - assert gauge_args.args[0] == "model_service.request.rt" - # response_time of 0.5s → 500 ms - assert gauge_args.args[1] == 500.0 - assert gauge_args.kwargs["attributes"]["status"] == "success" - assert gauge_args.kwargs["attributes"]["sandbox_id"] == "sandbox-xyz" - assert gauge_args.kwargs["attributes"]["type"] == "chat_completions" + recorder = _make_recorder(traj_file) + + request = {"model": "glm-5", "stream": True, "messages": [{"role": "user", "content": "你是谁"}]} + response = { + "id": "x", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "我是 GLM", "reasoning_content": "用户问..."}, + "finish_reason": "stop", + } + ], + } + await recorder.record(request=request, response=response, status="success", start_time=0.0, end_time=1.0) - mock_monitor.record_counter_by_name.assert_called_once_with( - "model_service.request.count", 1, attributes=gauge_args.kwargs["attributes"] - ) + record = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert record["model"] == "glm-5" + assert record["stream"] is True + assert record["request"] == request + assert record["response"] == response + assert record["response_time"] == 1.0 @pytest.mark.asyncio -async def test_recorder_records_failure_with_failure_status(tmp_path, mock_monitor): +async def test_recorder_emits_metrics_with_status_and_sandbox_id(tmp_path, mock_monitor): traj_file = tmp_path / "traj.jsonl" - recorder = TrajectoryRecorder(traj_file=traj_file) + recorder = _make_recorder(traj_file) - failed_payload = 
_sample_payload(status="failure", error_information={"error_class": "RateLimitError"}) - - await recorder.async_log_failure_event( - kwargs={"standard_logging_object": failed_payload}, - response_obj=None, - start_time=0, - end_time=1, - ) + with patch.dict("os.environ", {"ROCK_SANDBOX_ID": "sandbox-xyz"}): + await recorder.record( + request={"model": "gpt-4"}, + response={"id": "x", "choices": []}, + status="success", + start_time=0.0, + end_time=0.5, + ) - lines = traj_file.read_text(encoding="utf-8").strip().split("\n") - assert len(lines) == 1 - assert json.loads(lines[0])["status"] == "failure" + gauge_call = mock_monitor.record_gauge_by_name.call_args + assert gauge_call[0][0] == "model_service.request.rt" + assert gauge_call[0][1] == 500.0 # 0.5s -> 500 ms + assert gauge_call[1]["attributes"]["status"] == "success" + assert gauge_call[1]["attributes"]["sandbox_id"] == "sandbox-xyz" + assert gauge_call[1]["attributes"]["type"] == "chat_completions" - gauge_args = mock_monitor.record_gauge_by_name.call_args - assert gauge_args.kwargs["attributes"]["status"] == "failure" + mock_monitor.record_counter_by_name.assert_called_once_with( + "model_service.request.count", 1, attributes=gauge_call[1]["attributes"] + ) @pytest.mark.asyncio -async def test_recorder_skips_when_payload_missing(tmp_path, mock_monitor): - """If litellm doesn't attach a standard_logging_object, the recorder no-ops.""" +async def test_recorder_records_failure_with_error_text(tmp_path, mock_monitor): traj_file = tmp_path / "traj.jsonl" - recorder = TrajectoryRecorder(traj_file=traj_file) + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4"}, + response=None, + status="failure", + start_time=0.0, + end_time=1.0, + error="upstream_status=429", + ) - await recorder.async_log_success_event(kwargs={}, response_obj=None, start_time=0, end_time=1) + record = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert record["status"] == "failure" + 
assert record["error"] == "upstream_status=429" + assert record["response"] is None - assert not traj_file.exists() or traj_file.read_text() == "" - mock_monitor.record_gauge_by_name.assert_not_called() - mock_monitor.record_counter_by_name.assert_not_called() + gauge_call = mock_monitor.record_gauge_by_name.call_args + assert gauge_call[1]["attributes"]["status"] == "failure" @pytest.mark.asyncio async def test_recorder_creates_parent_directory(tmp_path, mock_monitor): traj_file = tmp_path / "deep" / "nested" / "traj.jsonl" - - recorder = TrajectoryRecorder(traj_file=traj_file) - await recorder.async_log_success_event( - kwargs={"standard_logging_object": _sample_payload()}, - response_obj=None, - start_time=0, - end_time=1, + recorder = _make_recorder(traj_file) + + await recorder.record( + request={"model": "gpt-4"}, + response={"id": "x", "choices": []}, + status="success", + start_time=0.0, + end_time=0.5, ) assert traj_file.exists() assert traj_file.parent.is_dir() - - -@pytest.mark.asyncio -async def test_recorder_falls_back_to_start_end_time_when_response_time_missing(tmp_path, mock_monitor): - traj_file = tmp_path / "traj.jsonl" - recorder = TrajectoryRecorder(traj_file=traj_file) - - payload = _sample_payload(startTime=10.0, endTime=10.25) - payload.pop("response_time", None) - - await recorder.async_log_success_event( - kwargs={"standard_logging_object": payload}, response_obj=None, start_time=0, end_time=1 - ) - - gauge_args = mock_monitor.record_gauge_by_name.call_args - assert abs(gauge_args.args[1] - 250.0) < 1e-6 diff --git a/uv.lock b/uv.lock index 6a3efc2ba4..cfed10409c 100644 --- a/uv.lock +++ b/uv.lock @@ -1303,69 +1303,6 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/2e/7a/c11883a98676e74a405d6503d65f58c3fa076ddd9c0cee6044884f6eac38/fastcore-1.8.15-py3-none-any.whl", hash = "sha256:d005d10d7ee5c2abb7ac0544da7c9f0a0a2f7706b48892a27c1906487ca6dea9" }, ] -[[package]] -name = "fastuuid" -version = "0.14.0" -source = { registry = 
"https://mirrors.aliyun.com/pypi/simple/" } -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/c3/7d/d9daedf0f2ebcacd20d599928f8913e9d2aea1d56d2d355a93bfa2b611d7/fastuuid-0.14.0.tar.gz", hash = "sha256:178947fc2f995b38497a74172adee64fdeb8b7ec18f2a5934d037641ba265d26" } -wheels = [ - { url = "https://mirrors.aliyun.com/pypi/packages/ad/b2/731a6696e37cd20eed353f69a09f37a984a43c9713764ee3f7ad5f57f7f9/fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:6e6243d40f6c793c3e2ee14c13769e341b90be5ef0c23c82fa6515a96145181a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c5/79/c73c47be2a3b8734d16e628982653517f80bbe0570e27185d91af6096507/fastuuid-0.14.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:13ec4f2c3b04271f62be2e1ce7e95ad2dd1cf97e94503a3760db739afbd48f00" }, - { url = "https://mirrors.aliyun.com/pypi/packages/24/c5/84c1eea05977c8ba5173555b0133e3558dc628bcf868d6bf1689ff14aedc/fastuuid-0.14.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b2fdd48b5e4236df145a149d7125badb28e0a383372add3fbaac9a6b7a394470" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0e/23/4e362367b7fa17dbed646922f216b9921efb486e7abe02147e4b917359f8/fastuuid-0.14.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f74631b8322d2780ebcf2d2d75d58045c3e9378625ec51865fe0b5620800c39d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b2/72/3985be633b5a428e9eaec4287ed4b873b7c4c53a9639a8b416637223c4cd/fastuuid-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:83cffc144dc93eb604b87b179837f2ce2af44871a7b323f2bfed40e8acb40ba8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b3/6d/6ef192a6df34e2266d5c9deb39cd3eea986df650cbcfeaf171aa52a059c3/fastuuid-0.14.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a771f135ab4523eb786e95493803942a5d1fc1610915f131b363f55af53b219" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/9d/11/8a2ea753c68d4fece29d5d7c6f3f903948cc6e82d1823bc9f7f7c0355db3/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:4edc56b877d960b4eda2c4232f953a61490c3134da94f3c28af129fb9c62a4f6" }, - { url = "https://mirrors.aliyun.com/pypi/packages/23/42/7a32c93b6ce12642d9a152ee4753a078f372c9ebb893bc489d838dd4afd5/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:bcc96ee819c282e7c09b2eed2b9bd13084e3b749fdb2faf58c318d498df2efbe" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b9/e9/a5f6f686b46e3ed4ed3b93770111c233baac87dd6586a411b4988018ef1d/fastuuid-0.14.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:7a3c0bca61eacc1843ea97b288d6789fbad7400d16db24e36a66c28c268cfe3d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b4/c9/18abc73c9c5b7fc0e476c1733b678783b2e8a35b0be9babd423571d44e98/fastuuid-0.14.0-cp310-cp310-win32.whl", hash = "sha256:7f2f3efade4937fae4e77efae1af571902263de7b78a0aee1a1653795a093b2a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5e/8a/d9e33f4eb4d4f6d9f2c5c7d7e96b5cdbb535c93f3b1ad6acce97ee9d4bf8/fastuuid-0.14.0-cp310-cp310-win_amd64.whl", hash = "sha256:ae64ba730d179f439b0736208b4c279b8bc9c089b102aec23f86512ea458c8a4" }, - { url = "https://mirrors.aliyun.com/pypi/packages/98/f3/12481bda4e5b6d3e698fbf525df4443cc7dce746f246b86b6fcb2fba1844/fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:73946cb950c8caf65127d4e9a325e2b6be0442a224fd51ba3b6ac44e1912ce34" }, - { url = "https://mirrors.aliyun.com/pypi/packages/59/19/2fc58a1446e4d72b655648eb0879b04e88ed6fa70d474efcf550f640f6ec/fastuuid-0.14.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:12ac85024637586a5b69645e7ed986f7535106ed3013640a393a03e461740cb7" }, - { url = "https://mirrors.aliyun.com/pypi/packages/78/29/3c74756e5b02c40cfcc8b1d8b5bac4edbd532b55917a6bcc9113550e99d1/fastuuid-0.14.0-cp311-cp311-macosx_11_0_arm64.whl", hash = 
"sha256:05a8dde1f395e0c9b4be515b7a521403d1e8349443e7641761af07c7ad1624b1" }, - { url = "https://mirrors.aliyun.com/pypi/packages/52/96/d761da3fccfa84f0f353ce6e3eb8b7f76b3aa21fd25e1b00a19f9c80a063/fastuuid-0.14.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:09378a05020e3e4883dfdab438926f31fea15fd17604908f3d39cbeb22a0b4dc" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fc/c2/f84c90167cc7765cb82b3ff7808057608b21c14a38531845d933a4637307/fastuuid-0.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbb0c4b15d66b435d2538f3827f05e44e2baafcc003dd7d8472dc67807ab8fd8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/af/7b/4bacd03897b88c12348e7bd77943bac32ccf80ff98100598fcff74f75f2e/fastuuid-0.14.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cd5a7f648d4365b41dbf0e38fe8da4884e57bed4e77c83598e076ac0c93995e7" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c0/a2/584f2c29641df8bd810d00c1f21d408c12e9ad0c0dafdb8b7b29e5ddf787/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:c0a94245afae4d7af8c43b3159d5e3934c53f47140be0be624b96acd672ceb73" }, - { url = "https://mirrors.aliyun.com/pypi/packages/24/68/c6b77443bb7764c760e211002c8638c0c7cce11cb584927e723215ba1398/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:2b29e23c97e77c3a9514d70ce343571e469098ac7f5a269320a0f0b3e193ab36" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5a/87/93f553111b33f9bb83145be12868c3c475bf8ea87c107063d01377cc0e8e/fastuuid-0.14.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:1e690d48f923c253f28151b3a6b4e335f2b06bf669c68a02665bc150b7839e94" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9e/8c/a04d486ca55b5abb7eaa65b39df8d891b7b1635b22db2163734dc273579a/fastuuid-0.14.0-cp311-cp311-win32.whl", hash = "sha256:a6f46790d59ab38c6aa0e35c681c0484b50dc0acf9e2679c005d61e019313c24" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/9c/b2/2d40bf00820de94b9280366a122cbaa60090c8cf59e89ac3938cf5d75895/fastuuid-0.14.0-cp311-cp311-win_amd64.whl", hash = "sha256:e150eab56c95dc9e3fefc234a0eedb342fac433dacc273cd4d150a5b0871e1fa" }, - { url = "https://mirrors.aliyun.com/pypi/packages/02/a2/e78fcc5df65467f0d207661b7ef86c5b7ac62eea337c0c0fcedbeee6fb13/fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77e94728324b63660ebf8adb27055e92d2e4611645bf12ed9d88d30486471d0a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/2b/b3/c846f933f22f581f558ee63f81f29fa924acd971ce903dab1a9b6701816e/fastuuid-0.14.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:caa1f14d2102cb8d353096bc6ef6c13b2c81f347e6ab9d6fbd48b9dea41c153d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/54/ea/682551030f8c4fa9a769d9825570ad28c0c71e30cf34020b85c1f7ee7382/fastuuid-0.14.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d23ef06f9e67163be38cece704170486715b177f6baae338110983f99a72c070" }, - { url = "https://mirrors.aliyun.com/pypi/packages/14/dd/5927f0a523d8e6a76b70968e6004966ee7df30322f5fc9b6cdfb0276646a/fastuuid-0.14.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0c9ec605ace243b6dbe3bd27ebdd5d33b00d8d1d3f580b39fdd15cd96fd71796" }, - { url = "https://mirrors.aliyun.com/pypi/packages/16/6e/c0fb547eef61293153348f12e0f75a06abb322664b34a1573a7760501336/fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:808527f2407f58a76c916d6aa15d58692a4a019fdf8d4c32ac7ff303b7d7af09" }, - { url = "https://mirrors.aliyun.com/pypi/packages/2d/b1/b9c75e03b768f61cf2e84ee193dc18601aeaf89a4684b20f2f0e9f52b62c/fastuuid-0.14.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2fb3c0d7fef6674bbeacdd6dbd386924a7b60b26de849266d1ff6602937675c8" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/fc/fa/f7395fdac07c7a54f18f801744573707321ca0cee082e638e36452355a9d/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab3f5d36e4393e628a4df337c2c039069344db5f4b9d2a3c9cea48284f1dd741" }, - { url = "https://mirrors.aliyun.com/pypi/packages/66/49/c9fd06a4a0b1f0f048aacb6599e7d96e5d6bc6fa680ed0d46bf111929d1b/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:b9a0ca4f03b7e0b01425281ffd44e99d360e15c895f1907ca105854ed85e2057" }, - { url = "https://mirrors.aliyun.com/pypi/packages/be/9c/909e8c95b494e8e140e8be6165d5fc3f61fdc46198c1554df7b3e1764471/fastuuid-0.14.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:3acdf655684cc09e60fb7e4cf524e8f42ea760031945aa8086c7eae2eeeabeb8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/90/eb/d29d17521976e673c55ef7f210d4cdd72091a9ec6755d0fd4710d9b3c871/fastuuid-0.14.0-cp312-cp312-win32.whl", hash = "sha256:9579618be6280700ae36ac42c3efd157049fe4dd40ca49b021280481c78c3176" }, - { url = "https://mirrors.aliyun.com/pypi/packages/cc/fc/f5c799a6ea6d877faec0472d0b27c079b47c86b1cdc577720a5386483b36/fastuuid-0.14.0-cp312-cp312-win_amd64.whl", hash = "sha256:d9e4332dc4ba054434a9594cbfaf7823b57993d7d8e7267831c3e059857cf397" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a5/83/ae12dd39b9a39b55d7f90abb8971f1a5f3c321fd72d5aa83f90dc67fe9ed/fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:77a09cb7427e7af74c594e409f7731a0cf887221de2f698e1ca0ebf0f3139021" }, - { url = "https://mirrors.aliyun.com/pypi/packages/53/b0/a4b03ff5d00f563cc7546b933c28cb3f2a07344b2aec5834e874f7d44143/fastuuid-0.14.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:9bd57289daf7b153bfa3e8013446aa144ce5e8c825e9e366d455155ede5ea2dc" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9c/6d/64aee0a0f6a58eeabadd582e55d0d7d70258ffdd01d093b30c53d668303b/fastuuid-0.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = 
"sha256:ac60fc860cdf3c3f327374db87ab8e064c86566ca8c49d2e30df15eda1b0c2d5" }, - { url = "https://mirrors.aliyun.com/pypi/packages/60/f5/a7e9cda8369e4f7919d36552db9b2ae21db7915083bc6336f1b0082c8b2e/fastuuid-0.14.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ab32f74bd56565b186f036e33129da77db8be09178cd2f5206a5d4035fb2a23f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f0/d3/8ce11827c783affffd5bd4d6378b28eb6cc6d2ddf41474006b8d62e7448e/fastuuid-0.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33e678459cf4addaedd9936bbb038e35b3f6b2061330fd8f2f6a1d80414c0f87" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a2/51/680fb6352d0bbade04036da46264a8001f74b7484e2fd1f4da9e3db1c666/fastuuid-0.14.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1e3cc56742f76cd25ecb98e4b82a25f978ccffba02e4bdce8aba857b6d85d87b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fa/7c/2014b5785bd8ebdab04ec857635ebd84d5ee4950186a577db9eff0fb8ff6/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:cb9a030f609194b679e1660f7e32733b7a0f332d519c5d5a6a0a580991290022" }, - { url = "https://mirrors.aliyun.com/pypi/packages/01/d2/524d4ceeba9160e7a9bc2ea3e8f4ccf1ad78f3bde34090ca0c51f09a5e91/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_i686.whl", hash = "sha256:09098762aad4f8da3a888eb9ae01c84430c907a297b97166b8abc07b640f2995" }, - { url = "https://mirrors.aliyun.com/pypi/packages/bc/17/354d04951ce114bf4afc78e27a18cfbd6ee319ab1829c2d5fb5e94063ac6/fastuuid-0.14.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:1383fff584fa249b16329a059c68ad45d030d5a4b70fb7c73a08d98fd53bcdab" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fb/be/d7be8670151d16d88f15bb121c5b66cdb5ea6a0c2a362d0dcf30276ade53/fastuuid-0.14.0-cp313-cp313-win32.whl", hash = "sha256:a0809f8cc5731c066c909047f9a314d5f536c871a7a22e815cc4967c110ac9ad" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/22/1d/5573ef3624ceb7abf4a46073d3554e37191c868abc3aecd5289a72f9810a/fastuuid-0.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:0df14e92e7ad3276327631c9e7cec09e32572ce82089c55cb1bb8df71cf394ed" }, - { url = "https://mirrors.aliyun.com/pypi/packages/16/c9/8c7660d1fe3862e3f8acabd9be7fc9ad71eb270f1c65cce9a2b7a31329ab/fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:b852a870a61cfc26c884af205d502881a2e59cc07076b60ab4a951cc0c94d1ad" }, - { url = "https://mirrors.aliyun.com/pypi/packages/4c/f4/a989c82f9a90d0ad995aa957b3e572ebef163c5299823b4027986f133dfb/fastuuid-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:c7502d6f54cd08024c3ea9b3514e2d6f190feb2f46e6dbcd3747882264bb5f7b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/da/6c/a1a24f73574ac995482b1326cf7ab41301af0fabaa3e37eeb6b3df00e6e2/fastuuid-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1ca61b592120cf314cfd66e662a5b54a578c5a15b26305e1b8b618a6f22df714" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1a/20/2a9b59185ba7a6c7b37808431477c2d739fcbdabbf63e00243e37bd6bf49/fastuuid-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:aa75b6657ec129d0abded3bec745e6f7ab642e6dba3a5272a68247e85f5f316f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ef/33/4105ca574f6ded0af6a797d39add041bcfb468a1255fbbe82fcb6f592da2/fastuuid-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8a0dfea3972200f72d4c7df02c8ac70bad1bb4c58d7e0ec1e6f341679073a7f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fe/8c/fca59f8e21c4deb013f574eae05723737ddb1d2937ce87cb2a5d20992dc3/fastuuid-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1bf539a7a95f35b419f9ad105d5a8a35036df35fdafae48fb2fd2e5f318f0d75" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/cb/e2/f78c271b909c034d429218f2798ca4e89eeda7983f4257d7865976ddbb6c/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:9a133bf9cc78fdbd1179cb58a59ad0100aa32d8675508150f3658814aeefeaa4" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1e/f0/5ff209d865897667a2ff3e7a572267a9ced8f7313919f6d6043aed8b1caa/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_i686.whl", hash = "sha256:f54d5b36c56a2d5e1a31e73b950b28a0d83eb0c37b91d10408875a5a29494bad" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e0/c8/2ce1c78f983a2c4987ea865d9516dbdfb141a120fd3abb977ae6f02ba7ca/fastuuid-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:ec27778c6ca3393ef662e2762dba8af13f4ec1aaa32d08d77f71f2a70ae9feb8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/df/60/dad662ec9a33b4a5fe44f60699258da64172c39bd041da2994422cdc40fe/fastuuid-0.14.0-cp314-cp314-win32.whl", hash = "sha256:e23fc6a83f112de4be0cc1990e5b127c27663ae43f866353166f87df58e73d06" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1f/f6/da4db31001e854025ffd26bc9ba0740a9cbba2c3259695f7c5834908b336/fastuuid-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:df61342889d0f5e7a32f7284e55ef95103f2110fee433c2ae7c2c0956d76ac8a" }, -] - [[package]] name = "filelock" version = "3.20.0" @@ -1991,18 +1928,6 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12" }, ] -[[package]] -name = "jinja2" -version = "3.1.6" -source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } -dependencies = [ - { name = "markupsafe" }, -] -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d" } -wheels = [ - { url = 
"https://mirrors.aliyun.com/pypi/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67" }, -] - [[package]] name = "jiter" version = "0.14.0" @@ -2309,29 +2234,6 @@ antlr4-13-2 = [ { name = "antlr4-python3-runtime" }, ] -[[package]] -name = "litellm" -version = "1.82.6" -source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } -dependencies = [ - { name = "aiohttp" }, - { name = "click" }, - { name = "fastuuid" }, - { name = "httpx" }, - { name = "importlib-metadata" }, - { name = "jinja2" }, - { name = "jsonschema" }, - { name = "openai" }, - { name = "pydantic" }, - { name = "python-dotenv" }, - { name = "tiktoken" }, - { name = "tokenizers" }, -] -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/29/75/1c537aa458426a9127a92bc2273787b2f987f4e5044e21f01f2eed5244fd/litellm-1.82.6.tar.gz", hash = "sha256:2aa1c2da21fe940c33613aa447119674a3ad4d2ad5eb064e4d5ce5ee42420136" } -wheels = [ - { url = "https://mirrors.aliyun.com/pypi/packages/02/6c/5327667e6dbe9e98cbfbd4261c8e91386a52e38f41419575854248bbab6a/litellm-1.82.6-py3-none-any.whl", hash = "sha256:164a3ef3e19f309e3cabc199bef3d2045212712fefdfa25fc7f75884a5b5b205" }, -] - [[package]] name = "magiccube" version = "0.3.0" @@ -2356,91 +2258,6 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/94/54/e7d793b573f298e1c9013b8c4dade17d481164aa517d1d7148619c2cedbf/markdown_it_py-4.0.0-py3-none-any.whl", hash = "sha256:87327c59b172c5011896038353a81343b6754500a08cd7a4973bb48c6d578147" }, ] -[[package]] -name = "markupsafe" -version = "3.0.3" -source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/7e/99/7690b6d4034fffd95959cbe0c02de8deb3098cc577c67bb6a24fe5d7caa7/markupsafe-3.0.3.tar.gz", hash = "sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698" } -wheels = [ - { url = 
"https://mirrors.aliyun.com/pypi/packages/e8/4b/3541d44f3937ba468b75da9eebcae497dcf67adb65caa16760b0a6807ebb/markupsafe-3.0.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f981d352f04553a7171b8e44369f2af4055f888dfb147d55e42d29e29e74559" }, - { url = "https://mirrors.aliyun.com/pypi/packages/98/1b/fbd8eed11021cabd9226c37342fa6ca4e8a98d8188a8d9b66740494960e4/markupsafe-3.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e1c1493fb6e50ab01d20a22826e57520f1284df32f2d8601fdd90b6304601419" }, - { url = "https://mirrors.aliyun.com/pypi/packages/40/01/e560d658dc0bb8ab762670ece35281dec7b6c1b33f5fbc09ebb57a185519/markupsafe-3.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1ba88449deb3de88bd40044603fafffb7bc2b055d626a330323a9ed736661695" }, - { url = "https://mirrors.aliyun.com/pypi/packages/af/cd/ce6e848bbf2c32314c9b237839119c5a564a59725b53157c856e90937b7a/markupsafe-3.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f42d0984e947b8adf7dd6dde396e720934d12c506ce84eea8476409563607591" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c9/2a/b5c12c809f1c3045c4d580b035a743d12fcde53cf685dbc44660826308da/markupsafe-3.0.3-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c0c0b3ade1c0b13b936d7970b1d37a57acde9199dc2aecc4c336773e1d86049c" }, - { url = "https://mirrors.aliyun.com/pypi/packages/cf/e3/9427a68c82728d0a88c50f890d0fc072a1484de2f3ac1ad0bfc1a7214fd5/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/bc/36/23578f29e9e582a4d0278e009b38081dbe363c5e7165113fad546918a232/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:d2ee202e79d8ed691ceebae8e0486bd9a2cd4794cec4824e1c99b6f5009502f6" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/56/21/dca11354e756ebd03e036bd8ad58d6d7168c80ce1fe5e75218e4945cbab7/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:177b5253b2834fe3678cb4a5f0059808258584c559193998be2601324fdeafb1" }, - { url = "https://mirrors.aliyun.com/pypi/packages/87/99/faba9369a7ad6e4d10b6a5fbf71fa2a188fe4a593b15f0963b73859a1bbd/markupsafe-3.0.3-cp310-cp310-win32.whl", hash = "sha256:2a15a08b17dd94c53a1da0438822d70ebcd13f8c3a95abe3a9ef9f11a94830aa" }, - { url = "https://mirrors.aliyun.com/pypi/packages/d6/25/55dc3ab959917602c96985cb1253efaa4ff42f71194bddeb61eb7278b8be/markupsafe-3.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:c4ffb7ebf07cfe8931028e3e4c85f0357459a3f9f9490886198848f4fa002ec8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/d0/9e/0a02226640c255d1da0b8d12e24ac2aa6734da68bff14c05dd53b94a0fc3/markupsafe-3.0.3-cp310-cp310-win_arm64.whl", hash = "sha256:e2103a929dfa2fcaf9bb4e7c091983a49c9ac3b19c9061b6d5427dd7d14d81a1" }, - { url = "https://mirrors.aliyun.com/pypi/packages/08/db/fefacb2136439fc8dd20e797950e749aa1f4997ed584c62cfb8ef7c2be0e/markupsafe-3.0.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1cc7ea17a6824959616c525620e387f6dd30fec8cb44f649e31712db02123dad" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e1/2e/5898933336b61975ce9dc04decbc0a7f2fee78c30353c5efba7f2d6ff27a/markupsafe-3.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4bd4cd07944443f5a265608cc6aab442e4f74dff8088b0dfc8238647b8f6ae9a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1d/09/adf2df3699d87d1d8184038df46a9c80d78c0148492323f4693df54e17bb/markupsafe-3.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b5420a1d9450023228968e7e6a9ce57f65d148ab56d2313fcd589eee96a7a50" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/30/ac/0273f6fcb5f42e314c6d8cd99effae6a5354604d461b8d392b5ec9530a54/markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0bf2a864d67e76e5c9a34dc26ec616a66b9888e25e7b9460e1c76d3293bd9dbf" }, - { url = "https://mirrors.aliyun.com/pypi/packages/19/ae/31c1be199ef767124c042c6c3e904da327a2f7f0cd63a0337e1eca2967a8/markupsafe-3.0.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc51efed119bc9cfdf792cdeaa4d67e8f6fcccab66ed4bfdd6bde3e59bfcbb2f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b2/76/7edcab99d5349a4532a459e1fe64f0b0467a3365056ae550d3bcf3f79e1e/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a4/28/6e74cdd26d7514849143d69f0bf2399f929c37dc2b31e6829fd2045b2765/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:7be7b61bb172e1ed687f1754f8e7484f1c8019780f6f6b0786e76bb01c2ae115" }, - { url = "https://mirrors.aliyun.com/pypi/packages/62/7e/a145f36a5c2945673e590850a6f8014318d5577ed7e5920a4b3448e0865d/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f9e130248f4462aaa8e2552d547f36ddadbeaa573879158d721bbd33dfe4743a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0f/62/d9c46a7f5c9adbeeeda52f5b8d802e1094e9717705a645efc71b0913a0a8/markupsafe-3.0.3-cp311-cp311-win32.whl", hash = "sha256:0db14f5dafddbb6d9208827849fad01f1a2609380add406671a26386cdf15a19" }, - { url = "https://mirrors.aliyun.com/pypi/packages/83/8a/4414c03d3f891739326e1783338e48fb49781cc915b2e0ee052aa490d586/markupsafe-3.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:de8a88e63464af587c950061a5e6a67d3632e36df62b986892331d4620a35c01" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/35/73/893072b42e6862f319b5207adc9ae06070f095b358655f077f69a35601f0/markupsafe-3.0.3-cp311-cp311-win_arm64.whl", hash = "sha256:3b562dd9e9ea93f13d53989d23a7e775fdfd1066c33494ff43f5418bc8c58a5c" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5a/72/147da192e38635ada20e0a2e1a51cf8823d2119ce8883f7053879c2199b5/markupsafe-3.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d53197da72cc091b024dd97249dfc7794d6a56530370992a5e1a08983ad9230e" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9a/81/7e4e08678a1f98521201c3079f77db69fb552acd56067661f8c2f534a718/markupsafe-3.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1872df69a4de6aead3491198eaf13810b565bdbeec3ae2dc8780f14458ec73ce" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1e/2c/799f4742efc39633a1b54a92eec4082e4f815314869865d876824c257c1e/markupsafe-3.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3a7e8ae81ae39e62a41ec302f972ba6ae23a5c5396c8e60113e9066ef893da0d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/3c/2e/8d0c2ab90a8c1d9a24f0399058ab8519a3279d1bd4289511d74e909f060e/markupsafe-3.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6dd0be5b5b189d31db7cda48b91d7e0a9795f31430b7f271219ab30f1d3ac9d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/2c/54/887f3092a85238093a0b2154bd629c89444f395618842e8b0c41783898ea/markupsafe-3.0.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:94c6f0bb423f739146aec64595853541634bde58b2135f27f61c1ffd1cd4d16a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c9/2f/336b8c7b6f4a4d95e91119dc8521402461b74a485558d8f238a68312f11c/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:be8813b57049a7dc738189df53d69395eba14fb99345e0a5994914a3864c8a4b" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/32/43/67935f2b7e4982ffb50a4d169b724d74b62a3964bc1a9a527f5ac4f1ee2b/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:83891d0e9fb81a825d9a6d61e3f07550ca70a076484292a70fde82c4b807286f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/89/e0/4486f11e51bbba8b0c041098859e869e304d1c261e59244baa3d295d47b7/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:77f0643abe7495da77fb436f50f8dab76dbc6e5fd25d39589a0f1fe6548bfa2b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/2f/e1/78ee7a023dac597a5825441ebd17170785a9dab23de95d2c7508ade94e0e/markupsafe-3.0.3-cp312-cp312-win32.whl", hash = "sha256:d88b440e37a16e651bda4c7c2b930eb586fd15ca7406cb39e211fcff3bf3017d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/aa/5b/bec5aa9bbbb2c946ca2733ef9c4ca91c91b6a24580193e891b5f7dbe8e1e/markupsafe-3.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:26a5784ded40c9e318cfc2bdb30fe164bdb8665ded9cd64d500a34fb42067b1c" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e5/f1/216fc1bbfd74011693a4fd837e7026152e89c4bcf3e77b6692fba9923123/markupsafe-3.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:35add3b638a5d900e807944a078b51922212fb3dedb01633a8defc4b01a3c85f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/38/2f/907b9c7bbba283e68f20259574b13d005c121a0fa4c175f9bed27c4597ff/markupsafe-3.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:e1cf1972137e83c5d4c136c43ced9ac51d0e124706ee1c8aa8532c1287fa8795" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9c/d9/5f7756922cdd676869eca1c4e3c0cd0df60ed30199ffd775e319089cb3ed/markupsafe-3.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:116bb52f642a37c115f517494ea5feb03889e04df47eeff5b130b1808ce7c219" }, - { url = "https://mirrors.aliyun.com/pypi/packages/00/07/575a68c754943058c78f30db02ee03a64b3c638586fba6a6dd56830b30a3/markupsafe-3.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:133a43e73a802c5562be9bbcd03d090aa5a1fe899db609c29e8c8d815c5f6de6" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a9/21/9b05698b46f218fc0e118e1f8168395c65c8a2c750ae2bab54fc4bd4e0e8/markupsafe-3.0.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ccfcd093f13f0f0b7fdd0f198b90053bf7b2f02a3927a30e63f3ccc9df56b676" }, - { url = "https://mirrors.aliyun.com/pypi/packages/7f/71/544260864f893f18b6827315b988c146b559391e6e7e8f7252839b1b846a/markupsafe-3.0.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:509fa21c6deb7a7a273d629cf5ec029bc209d1a51178615ddf718f5918992ab9" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c2/28/b50fc2f74d1ad761af2f5dcce7492648b983d00a65b8c0e0cb457c82ebbe/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:a4afe79fb3de0b7097d81da19090f4df4f8d3a2b3adaa8764138aac2e44f3af1" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ed/76/104b2aa106a208da8b17a2fb72e033a5a9d7073c68f7e508b94916ed47a9/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:795e7751525cae078558e679d646ae45574b47ed6e7771863fcc079a6171a0fc" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b5/99/16a5eb2d140087ebd97180d95249b00a03aa87e29cc224056274f2e45fd6/markupsafe-3.0.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8485f406a96febb5140bfeca44a73e3ce5116b2501ac54fe953e488fb1d03b12" }, - { url = "https://mirrors.aliyun.com/pypi/packages/19/bc/e7140ed90c5d61d77cea142eed9f9c303f4c4806f60a1044c13e3f1471d0/markupsafe-3.0.3-cp313-cp313-win32.whl", hash = "sha256:bdd37121970bfd8be76c5fb069c7751683bdf373db1ed6c010162b2a130248ed" }, - { url = "https://mirrors.aliyun.com/pypi/packages/05/73/c4abe620b841b6b791f2edc248f556900667a5a1cf023a6646967ae98335/markupsafe-3.0.3-cp313-cp313-win_amd64.whl", hash = "sha256:9a1abfdc021a164803f4d485104931fb8f8c1efd55bc6b748d2f5774e78b62c5" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/f0/3a/fa34a0f7cfef23cf9500d68cb7c32dd64ffd58a12b09225fb03dd37d5b80/markupsafe-3.0.3-cp313-cp313-win_arm64.whl", hash = "sha256:7e68f88e5b8799aa49c85cd116c932a1ac15caaa3f5db09087854d218359e485" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e4/d7/e05cd7efe43a88a17a37b3ae96e79a19e846f3f456fe79c57ca61356ef01/markupsafe-3.0.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:218551f6df4868a8d527e3062d0fb968682fe92054e89978594c28e642c43a73" }, - { url = "https://mirrors.aliyun.com/pypi/packages/99/9e/e412117548182ce2148bdeacdda3bb494260c0b0184360fe0d56389b523b/markupsafe-3.0.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:3524b778fe5cfb3452a09d31e7b5adefeea8c5be1d43c4f810ba09f2ceb29d37" }, - { url = "https://mirrors.aliyun.com/pypi/packages/bc/e6/fa0ffcda717ef64a5108eaa7b4f5ed28d56122c9a6d70ab8b72f9f715c80/markupsafe-3.0.3-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4e885a3d1efa2eadc93c894a21770e4bc67899e3543680313b09f139e149ab19" }, - { url = "https://mirrors.aliyun.com/pypi/packages/96/ec/2102e881fe9d25fc16cb4b25d5f5cde50970967ffa5dddafdb771237062d/markupsafe-3.0.3-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8709b08f4a89aa7586de0aadc8da56180242ee0ada3999749b183aa23df95025" }, - { url = "https://mirrors.aliyun.com/pypi/packages/4b/30/6f2fce1f1f205fc9323255b216ca8a235b15860c34b6798f810f05828e32/markupsafe-3.0.3-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b8512a91625c9b3da6f127803b166b629725e68af71f8184ae7e7d54686a56d6" }, - { url = "https://mirrors.aliyun.com/pypi/packages/58/47/4a0ccea4ab9f5dcb6f79c0236d954acb382202721e704223a8aafa38b5c8/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9b79b7a16f7fedff2495d684f2b59b0457c3b493778c9eed31111be64d58279f" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/6a/70/3780e9b72180b6fecb83a4814d84c3bf4b4ae4bf0b19c27196104149734c/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:12c63dfb4a98206f045aa9563db46507995f7ef6d83b2f68eda65c307c6829eb" }, - { url = "https://mirrors.aliyun.com/pypi/packages/98/c5/c03c7f4125180fc215220c035beac6b9cb684bc7a067c84fc69414d315f5/markupsafe-3.0.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:8f71bc33915be5186016f675cd83a1e08523649b0e33efdb898db577ef5bb009" }, - { url = "https://mirrors.aliyun.com/pypi/packages/80/d6/2d1b89f6ca4bff1036499b1e29a1d02d282259f3681540e16563f27ebc23/markupsafe-3.0.3-cp313-cp313t-win32.whl", hash = "sha256:69c0b73548bc525c8cb9a251cddf1931d1db4d2258e9599c28c07ef3580ef354" }, - { url = "https://mirrors.aliyun.com/pypi/packages/2b/98/e48a4bfba0a0ffcf9925fe2d69240bfaa19c6f7507b8cd09c70684a53c1e/markupsafe-3.0.3-cp313-cp313t-win_amd64.whl", hash = "sha256:1b4b79e8ebf6b55351f0d91fe80f893b4743f104bff22e90697db1590e47a218" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0e/72/e3cc540f351f316e9ed0f092757459afbc595824ca724cbc5a5d4263713f/markupsafe-3.0.3-cp313-cp313t-win_arm64.whl", hash = "sha256:ad2cf8aa28b8c020ab2fc8287b0f823d0a7d8630784c31e9ee5edea20f406287" }, - { url = "https://mirrors.aliyun.com/pypi/packages/33/8a/8e42d4838cd89b7dde187011e97fe6c3af66d8c044997d2183fbd6d31352/markupsafe-3.0.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:eaa9599de571d72e2daf60164784109f19978b327a3910d3e9de8c97b5b70cfe" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b5/64/7660f8a4a8e53c924d0fa05dc3a55c9cee10bbd82b11c5afb27d44b096ce/markupsafe-3.0.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c47a551199eb8eb2121d4f0f15ae0f923d31350ab9280078d1e5f12b249e0026" }, - { url = "https://mirrors.aliyun.com/pypi/packages/da/ef/e648bfd021127bef5fa12e1720ffed0c6cbb8310c8d9bea7266337ff06de/markupsafe-3.0.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = 
"sha256:f34c41761022dd093b4b6896d4810782ffbabe30f2d443ff5f083e0cbbb8c737" }, - { url = "https://mirrors.aliyun.com/pypi/packages/41/3c/a36c2450754618e62008bf7435ccb0f88053e07592e6028a34776213d877/markupsafe-3.0.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:457a69a9577064c05a97c41f4e65148652db078a3a509039e64d3467b9e7ef97" }, - { url = "https://mirrors.aliyun.com/pypi/packages/bc/20/b7fdf89a8456b099837cd1dc21974632a02a999ec9bf7ca3e490aacd98e7/markupsafe-3.0.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:e8afc3f2ccfa24215f8cb28dcf43f0113ac3c37c2f0f0806d8c70e4228c5cf4d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9a/a7/591f592afdc734f47db08a75793a55d7fbcc6902a723ae4cfbab61010cc5/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:ec15a59cf5af7be74194f7ab02d0f59a62bdcf1a537677ce67a2537c9b87fcda" }, - { url = "https://mirrors.aliyun.com/pypi/packages/7d/33/45b24e4f44195b26521bc6f1a82197118f74df348556594bd2262bda1038/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:0eb9ff8191e8498cca014656ae6b8d61f39da5f95b488805da4bb029cccbfbaf" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ff/0e/53dfaca23a69fbfbbf17a4b64072090e70717344c52eaaaa9c5ddff1e5f0/markupsafe-3.0.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2713baf880df847f2bece4230d4d094280f4e67b1e813eec43b4c0e144a34ffe" }, - { url = "https://mirrors.aliyun.com/pypi/packages/46/11/f333a06fc16236d5238bfe74daccbca41459dcd8d1fa952e8fbd5dccfb70/markupsafe-3.0.3-cp314-cp314-win32.whl", hash = "sha256:729586769a26dbceff69f7a7dbbf59ab6572b99d94576a5592625d5b411576b9" }, - { url = "https://mirrors.aliyun.com/pypi/packages/28/52/182836104b33b444e400b14f797212f720cbc9ed6ba34c800639d154e821/markupsafe-3.0.3-cp314-cp314-win_amd64.whl", hash = "sha256:bdc919ead48f234740ad807933cdf545180bfbe9342c2bb451556db2ed958581" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/6f/18/acf23e91bd94fd7b3031558b1f013adfa21a8e407a3fdb32745538730382/markupsafe-3.0.3-cp314-cp314-win_arm64.whl", hash = "sha256:5a7d5dc5140555cf21a6fefbdbf8723f06fcd2f63ef108f2854de715e4422cb4" }, - { url = "https://mirrors.aliyun.com/pypi/packages/3c/f0/57689aa4076e1b43b15fdfa646b04653969d50cf30c32a102762be2485da/markupsafe-3.0.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:1353ef0c1b138e1907ae78e2f6c63ff67501122006b0f9abad68fda5f4ffc6ab" }, - { url = "https://mirrors.aliyun.com/pypi/packages/89/c3/2e67a7ca217c6912985ec766c6393b636fb0c2344443ff9d91404dc4c79f/markupsafe-3.0.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1085e7fbddd3be5f89cc898938f42c0b3c711fdcb37d75221de2666af647c175" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f0/00/be561dce4e6ca66b15276e184ce4b8aec61fe83662cce2f7d72bd3249d28/markupsafe-3.0.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1b52b4fb9df4eb9ae465f8d0c228a00624de2334f216f178a995ccdcf82c4634" }, - { url = "https://mirrors.aliyun.com/pypi/packages/50/09/c419f6f5a92e5fadde27efd190eca90f05e1261b10dbd8cbcb39cd8ea1dc/markupsafe-3.0.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fed51ac40f757d41b7c48425901843666a6677e3e8eb0abcff09e4ba6e664f50" }, - { url = "https://mirrors.aliyun.com/pypi/packages/22/44/a0681611106e0b2921b3033fc19bc53323e0b50bc70cffdd19f7d679bb66/markupsafe-3.0.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f190daf01f13c72eac4efd5c430a8de82489d9cff23c364c3ea822545032993e" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5f/57/1b0b3f100259dc9fffe780cfb60d4be71375510e435efec3d116b6436d43/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:e56b7d45a839a697b5eb268c82a71bd8c7f6c94d6fd50c3d577fa39a9f1409f5" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/26/6a/4bf6d0c97c4920f1597cc14dd720705eca0bf7c787aebc6bb4d1bead5388/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:f3e98bb3798ead92273dc0e5fd0f31ade220f59a266ffd8a4f6065e0a3ce0523" }, - { url = "https://mirrors.aliyun.com/pypi/packages/14/c7/ca723101509b518797fedc2fdf79ba57f886b4aca8a7d31857ba3ee8281f/markupsafe-3.0.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5678211cb9333a6468fb8d8be0305520aa073f50d17f089b5b4b477ea6e67fdc" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fb/df/5bd7a48c256faecd1d36edc13133e51397e41b73bb77e1a69deab746ebac/markupsafe-3.0.3-cp314-cp314t-win32.whl", hash = "sha256:915c04ba3851909ce68ccc2b8e2cd691618c4dc4c4232fb7982bca3f41fd8c3d" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1a/8a/0402ba61a2f16038b48b39bccca271134be00c5c9f0f623208399333c448/markupsafe-3.0.3-cp314-cp314t-win_amd64.whl", hash = "sha256:4faffd047e07c38848ce017e8725090413cd80cbc23d86e55c587bf979e579c9" }, - { url = "https://mirrors.aliyun.com/pypi/packages/70/bc/6f1c2f612465f5fa89b95bead1f44dcb607670fd42891d8fdcd5d039f4f4/markupsafe-3.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa" }, -] - [[package]] name = "math-verify" version = "0.8.0" @@ -4432,7 +4249,8 @@ builder = [ model-service = [ { name = "alibabacloud-cr20181201" }, { name = "fastapi" }, - { name = "litellm" }, + { name = "httpx" }, + { name = "openai" }, { name = "psutil" }, { name = "swebench" }, { name = "uvicorn" }, @@ -4495,11 +4313,12 @@ requires-dist = [ { name = "gem-llm", marker = "extra == 'rocklet'", specifier = ">=0.1.0" }, { name = "gem-llm", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.0" }, { name = "httpx" }, + { name = "httpx", marker = "extra == 'model-service'" }, { name = "kubernetes", marker = "extra == 'admin'", specifier = ">=35.0.0" }, - { name = "litellm", marker = "extra == 'model-service'", specifier = ">=1.50.0" }, 
{ name = "nacos-sdk-python", marker = "extra == 'admin'", specifier = ">=0.1.14" }, { name = "nacos-sdk-python", marker = "extra == 'sandbox-actor'", specifier = ">=0.1.14" }, { name = "numpy", marker = "extra == 'rocklet'", specifier = "<=2.2.6" }, + { name = "openai", marker = "extra == 'model-service'", specifier = ">=1.50.0" }, { name = "opentelemetry-api" }, { name = "opentelemetry-exporter-otlp" }, { name = "opentelemetry-exporter-prometheus" }, @@ -4949,94 +4768,6 @@ wheels = [ { url = "https://mirrors.aliyun.com/pypi/packages/e5/30/643397144bfbfec6f6ef821f36f33e57d35946c44a2352d3c9f0ae847619/tenacity-9.1.2-py3-none-any.whl", hash = "sha256:f77bf36710d8b73a50b2dd155c97b870017ad21afe6ab300326b0371b3b05138" }, ] -[[package]] -name = "tiktoken" -version = "0.12.0" -source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } -dependencies = [ - { name = "regex" }, - { name = "requests" }, -] -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/7d/ab/4d017d0f76ec3171d469d80fc03dfbb4e48a4bcaddaa831b31d526f05edc/tiktoken-0.12.0.tar.gz", hash = "sha256:b18ba7ee2b093863978fcb14f74b3707cdc8d4d4d3836853ce7ec60772139931" } -wheels = [ - { url = "https://mirrors.aliyun.com/pypi/packages/89/b3/2cb7c17b6c4cf8ca983204255d3f1d95eda7213e247e6947a0ee2c747a2c/tiktoken-0.12.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:3de02f5a491cfd179aec916eddb70331814bd6bf764075d39e21d5862e533970" }, - { url = "https://mirrors.aliyun.com/pypi/packages/27/0f/df139f1df5f6167194ee5ab24634582ba9a1b62c6b996472b0277ec80f66/tiktoken-0.12.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:b6cfb6d9b7b54d20af21a912bfe63a2727d9cfa8fbda642fd8322c70340aad16" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ef/5d/26a691f28ab220d5edc09b9b787399b130f24327ef824de15e5d85ef21aa/tiktoken-0.12.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:cde24cdb1b8a08368f709124f15b36ab5524aac5fa830cc3fdce9c03d4fb8030" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/b2/94/443fab3d4e5ebecac895712abd3849b8da93b7b7dec61c7db5c9c7ebe40c/tiktoken-0.12.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:6de0da39f605992649b9cfa6f84071e3f9ef2cec458d08c5feb1b6f0ff62e134" }, - { url = "https://mirrors.aliyun.com/pypi/packages/54/35/388f941251b2521c70dd4c5958e598ea6d2c88e28445d2fb8189eecc1dfc/tiktoken-0.12.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:6faa0534e0eefbcafaccb75927a4a380463a2eaa7e26000f0173b920e98b720a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f8/00/c6681c7f833dd410576183715a530437a9873fa910265817081f65f9105f/tiktoken-0.12.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:82991e04fc860afb933efb63957affc7ad54f83e2216fe7d319007dab1ba5892" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5f/d2/82e795a6a9bafa034bf26a58e68fe9a89eeaaa610d51dbeb22106ba04f0a/tiktoken-0.12.0-cp310-cp310-win_amd64.whl", hash = "sha256:6fb2995b487c2e31acf0a9e17647e3b242235a20832642bb7a9d1a181c0c1bb1" }, - { url = "https://mirrors.aliyun.com/pypi/packages/de/46/21ea696b21f1d6d1efec8639c204bdf20fde8bafb351e1355c72c5d7de52/tiktoken-0.12.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:6e227c7f96925003487c33b1b32265fad2fbcec2b7cf4817afb76d416f40f6bb" }, - { url = "https://mirrors.aliyun.com/pypi/packages/c9/d9/35c5d2d9e22bb2a5f74ba48266fb56c63d76ae6f66e02feb628671c0283e/tiktoken-0.12.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c06cf0fcc24c2cb2adb5e185c7082a82cba29c17575e828518c2f11a01f445aa" }, - { url = "https://mirrors.aliyun.com/pypi/packages/01/84/961106c37b8e49b9fdcf33fe007bb3a8fdcc380c528b20cc7fbba80578b8/tiktoken-0.12.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:f18f249b041851954217e9fd8e5c00b024ab2315ffda5ed77665a05fa91f42dc" }, - { url = "https://mirrors.aliyun.com/pypi/packages/6a/d0/3d9275198e067f8b65076a68894bb52fd253875f3644f0a321a720277b8a/tiktoken-0.12.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = 
"sha256:47a5bc270b8c3db00bb46ece01ef34ad050e364b51d406b6f9730b64ac28eded" }, - { url = "https://mirrors.aliyun.com/pypi/packages/78/db/a58e09687c1698a7c592e1038e01c206569b86a0377828d51635561f8ebf/tiktoken-0.12.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:508fa71810c0efdcd1b898fda574889ee62852989f7c1667414736bcb2b9a4bd" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9e/1b/a9e4d2bf91d515c0f74afc526fd773a812232dd6cda33ebea7f531202325/tiktoken-0.12.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:a1af81a6c44f008cba48494089dd98cccb8b313f55e961a52f5b222d1e507967" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9d/15/963819345f1b1fb0809070a79e9dd96938d4ca41297367d471733e79c76c/tiktoken-0.12.0-cp311-cp311-win_amd64.whl", hash = "sha256:3e68e3e593637b53e56f7237be560f7a394451cb8c11079755e80ae64b9e6def" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a4/85/be65d39d6b647c79800fd9d29241d081d4eeb06271f383bb87200d74cf76/tiktoken-0.12.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b97f74aca0d78a1ff21b8cd9e9925714c15a9236d6ceacf5c7327c117e6e21e8" }, - { url = "https://mirrors.aliyun.com/pypi/packages/4a/42/6573e9129bc55c9bf7300b3a35bef2c6b9117018acca0dc760ac2d93dffe/tiktoken-0.12.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2b90f5ad190a4bb7c3eb30c5fa32e1e182ca1ca79f05e49b448438c3e225a49b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/66/c5/ed88504d2f4a5fd6856990b230b56d85a777feab84e6129af0822f5d0f70/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:65b26c7a780e2139e73acc193e5c63ac754021f160df919add909c1492c0fb37" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f4/90/3dae6cc5436137ebd38944d396b5849e167896fc2073da643a49f372dc4f/tiktoken-0.12.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:edde1ec917dfd21c1f2f8046b86348b0f54a2c0547f68149d8600859598769ad" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/a3/fe/26df24ce53ffde419a42f5f53d755b995c9318908288c17ec3f3448313a3/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:35a2f8ddd3824608b3d650a000c1ef71f730d0c56486845705a8248da00f9fe5" }, - { url = "https://mirrors.aliyun.com/pypi/packages/20/cc/b064cae1a0e9fac84b0d2c46b89f4e57051a5f41324e385d10225a984c24/tiktoken-0.12.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:83d16643edb7fa2c99eff2ab7733508aae1eebb03d5dfc46f5565862810f24e3" }, - { url = "https://mirrors.aliyun.com/pypi/packages/81/10/b8523105c590c5b8349f2587e2fdfe51a69544bd5a76295fc20f2374f470/tiktoken-0.12.0-cp312-cp312-win_amd64.whl", hash = "sha256:ffc5288f34a8bc02e1ea7047b8d041104791d2ddbf42d1e5fa07822cbffe16bd" }, - { url = "https://mirrors.aliyun.com/pypi/packages/00/61/441588ee21e6b5cdf59d6870f86beb9789e532ee9718c251b391b70c68d6/tiktoken-0.12.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:775c2c55de2310cc1bc9a3ad8826761cbdc87770e586fd7b6da7d4589e13dab3" }, - { url = "https://mirrors.aliyun.com/pypi/packages/1f/05/dcf94486d5c5c8d34496abe271ac76c5b785507c8eae71b3708f1ad9b45a/tiktoken-0.12.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a01b12f69052fbe4b080a2cfb867c4de12c704b56178edf1d1d7b273561db160" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a0/70/5163fe5359b943f8db9946b62f19be2305de8c3d78a16f629d4165e2f40e/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:01d99484dc93b129cd0964f9d34eee953f2737301f18b3c7257bf368d7615baa" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0c/da/c028aa0babf77315e1cef357d4d768800c5f8a6de04d0eac0f377cb619fa/tiktoken-0.12.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:4a1a4fcd021f022bfc81904a911d3df0f6543b9e7627b51411da75ff2fe7a1be" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a0/5a/886b108b766aa53e295f7216b509be95eb7d60b166049ce2c58416b25f2a/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = 
"sha256:981a81e39812d57031efdc9ec59fa32b2a5a5524d20d4776574c4b4bd2e9014a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f4/f8/4db272048397636ac7a078d22773dd2795b1becee7bc4922fe6207288d57/tiktoken-0.12.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:9baf52f84a3f42eef3ff4e754a0db79a13a27921b457ca9832cf944c6be4f8f3" }, - { url = "https://mirrors.aliyun.com/pypi/packages/8e/32/45d02e2e0ea2be3a9ed22afc47d93741247e75018aac967b713b2941f8ea/tiktoken-0.12.0-cp313-cp313-win_amd64.whl", hash = "sha256:b8a0cd0c789a61f31bf44851defbd609e8dd1e2c8589c614cc1060940ef1f697" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ce/76/994fc868f88e016e6d05b0da5ac24582a14c47893f4474c3e9744283f1d5/tiktoken-0.12.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:d5f89ea5680066b68bcb797ae85219c72916c922ef0fcdd3480c7d2315ffff16" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f6/b8/57ef1456504c43a849821920d582a738a461b76a047f352f18c0b26c6516/tiktoken-0.12.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:b4e7ed1c6a7a8a60a3230965bdedba8cc58f68926b835e519341413370e0399a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/72/90/13da56f664286ffbae9dbcfadcc625439142675845baa62715e49b87b68b/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:fc530a28591a2d74bce821d10b418b26a094bf33839e69042a6e86ddb7a7fb27" }, - { url = "https://mirrors.aliyun.com/pypi/packages/05/df/4f80030d44682235bdaecd7346c90f67ae87ec8f3df4a3442cb53834f7e4/tiktoken-0.12.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:06a9f4f49884139013b138920a4c393aa6556b2f8f536345f11819389c703ebb" }, - { url = "https://mirrors.aliyun.com/pypi/packages/22/1f/ae535223a8c4ef4c0c1192e3f9b82da660be9eb66b9279e95c99288e9dab/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:04f0e6a985d95913cabc96a741c5ffec525a2c72e9df086ff17ebe35985c800e" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/78/a7/f8ead382fce0243cb625c4f266e66c27f65ae65ee9e77f59ea1653b6d730/tiktoken-0.12.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:0ee8f9ae00c41770b5f9b0bb1235474768884ae157de3beb5439ca0fd70f3e25" }, - { url = "https://mirrors.aliyun.com/pypi/packages/93/e0/6cc82a562bc6365785a3ff0af27a2a092d57c47d7a81d9e2295d8c36f011/tiktoken-0.12.0-cp313-cp313t-win_amd64.whl", hash = "sha256:dc2dd125a62cb2b3d858484d6c614d136b5b848976794edfb63688d539b8b93f" }, - { url = "https://mirrors.aliyun.com/pypi/packages/72/05/3abc1db5d2c9aadc4d2c76fa5640134e475e58d9fbb82b5c535dc0de9b01/tiktoken-0.12.0-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:a90388128df3b3abeb2bfd1895b0681412a8d7dc644142519e6f0a97c2111646" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e3/7b/50c2f060412202d6c95f32b20755c7a6273543b125c0985d6fa9465105af/tiktoken-0.12.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:da900aa0ad52247d8794e307d6446bd3cdea8e192769b56276695d34d2c9aa88" }, - { url = "https://mirrors.aliyun.com/pypi/packages/14/27/bf795595a2b897e271771cd31cb847d479073497344c637966bdf2853da1/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:285ba9d73ea0d6171e7f9407039a290ca77efcdb026be7769dccc01d2c8d7fff" }, - { url = "https://mirrors.aliyun.com/pypi/packages/f5/de/9341a6d7a8f1b448573bbf3425fa57669ac58258a667eb48a25dfe916d70/tiktoken-0.12.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:d186a5c60c6a0213f04a7a802264083dea1bbde92a2d4c7069e1a56630aef830" }, - { url = "https://mirrors.aliyun.com/pypi/packages/75/0d/881866647b8d1be4d67cb24e50d0c26f9f807f994aa1510cb9ba2fe5f612/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:604831189bd05480f2b885ecd2d1986dc7686f609de48208ebbbddeea071fc0b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b3/1e/b651ec3059474dab649b8d5b69f5c65cd8fcd8918568c1935bd4136c9392/tiktoken-0.12.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = 
"sha256:8f317e8530bb3a222547b85a58583238c8f74fd7a7408305f9f63246d1a0958b" }, - { url = "https://mirrors.aliyun.com/pypi/packages/80/57/ce64fd16ac390fafde001268c364d559447ba09b509181b2808622420eec/tiktoken-0.12.0-cp314-cp314-win_amd64.whl", hash = "sha256:399c3dd672a6406719d84442299a490420b458c44d3ae65516302a99675888f3" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ac/a4/72eed53e8976a099539cdd5eb36f241987212c29629d0a52c305173e0a68/tiktoken-0.12.0-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:c2c714c72bc00a38ca969dae79e8266ddec999c7ceccd603cc4f0d04ccd76365" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e6/d7/0110b8f54c008466b19672c615f2168896b83706a6611ba6e47313dbc6e9/tiktoken-0.12.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:cbb9a3ba275165a2cb0f9a83f5d7025afe6b9d0ab01a22b50f0e74fee2ad253e" }, - { url = "https://mirrors.aliyun.com/pypi/packages/5f/77/4f268c41a3957c418b084dd576ea2fad2e95da0d8e1ab705372892c2ca22/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:dfdfaa5ffff8993a3af94d1125870b1d27aed7cb97aa7eb8c1cefdbc87dbee63" }, - { url = "https://mirrors.aliyun.com/pypi/packages/4e/2b/fc46c90fe5028bd094cd6ee25a7db321cb91d45dc87531e2bdbb26b4867a/tiktoken-0.12.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:584c3ad3d0c74f5269906eb8a659c8bfc6144a52895d9261cdaf90a0ae5f4de0" }, - { url = "https://mirrors.aliyun.com/pypi/packages/28/c0/3c7a39ff68022ddfd7d93f3337ad90389a342f761c4d71de99a3ccc57857/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:54c891b416a0e36b8e2045b12b33dd66fb34a4fe7965565f1b482da50da3e86a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/ab/0d/c1ad6f4016a3968c048545f5d9b8ffebf577774b2ede3e2e352553b685fe/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5edb8743b88d5be814b1a8a8854494719080c28faaa1ccbef02e87354fe71ef0" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/af/df/c7891ef9d2712ad774777271d39fdef63941ffba0a9d59b7ad1fd2765e57/tiktoken-0.12.0-cp314-cp314t-win_amd64.whl", hash = "sha256:f61c0aea5565ac82e2ec50a05e02a6c44734e91b51c10510b084ea1b8e633a71" }, -] - -[[package]] -name = "tokenizers" -version = "0.23.1" -source = { registry = "https://mirrors.aliyun.com/pypi/simple/" } -dependencies = [ - { name = "huggingface-hub" }, -] -sdist = { url = "https://mirrors.aliyun.com/pypi/packages/c1/60/21f715d9faba5f5407ff759472ade058ec4a507ad62bcea47cb847239a73/tokenizers-0.23.1.tar.gz", hash = "sha256:1feeeadf865a7915adc25445dea30e9933e593c31bb96c277cee36de227c8bfa" } -wheels = [ - { url = "https://mirrors.aliyun.com/pypi/packages/87/39/b87a87d5bb9470610b80a2d31df42fcffeaf35118b8b97952b2aff598cc7/tokenizers-0.23.1-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:e03d6ffcbe0d56ee9c1ccd070e70a13fa750727c0277e138152acbc0252c2224" }, - { url = "https://mirrors.aliyun.com/pypi/packages/e2/6a/068ed9f6e444c9d7e9d55ce134181325700f3d7f30410721bdc8f848d727/tokenizers-0.23.1-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:e0948bbb1ac1d7cdfc9fb6d62c596e3b7550036ad60ecd654a66ad273326324e" }, - { url = "https://mirrors.aliyun.com/pypi/packages/6c/36/e006edf031154cba92b8416057d92c3abe3635e4c4b0aa0b5b9bb39dde70/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1bf13402aff9bc533c89cb849ec3b412dc3fbeacc9744840e423d7bf3f7dc0e3" }, - { url = "https://mirrors.aliyun.com/pypi/packages/a2/ef/7735d226f9c7f874a6bee5e3f27fb25ecabdf207d37b8cf45286d0795893/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f836ca703b89ae07919a309f9651f7a88fd5a33d5f718ba5ad0870ec0256bad6" }, - { url = "https://mirrors.aliyun.com/pypi/packages/b9/d9/24827036f6e21297bfffda0768e58eb6096a4f411e932964a01707857931/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = 
"sha256:ae848657742035523fdf261773630cb819a26995fcd3d9ecae0c1daf6e5a4959" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0c/9a/22f3582b3a4f49358293a5206e25317621ee4526bfe9cdaa0f07a12e770e/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:53b09e85775d5187941e7bab30e941b4134ab4a7dd8c68e783d231fb7ca27c51" }, - { url = "https://mirrors.aliyun.com/pypi/packages/7e/65/b8f8814eef95800f20721384136d9a1d22241d50b2874357cb70542c392f/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:ea5a0ce170074329faaa8ea3f6400ecde604b6678192688533af80980daae71a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0d/d5/1353e5f677ec27c2494fb6a6725e82d56c985f53e90ec511369e7e4f02c6/tokenizers-0.23.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5075b405006415ea148a992d093699c66eb01952bf59f4d5727089a98bda45a4" }, - { url = "https://mirrors.aliyun.com/pypi/packages/71/89/39b6b8fc073fb6d413d0147aa333dc7eff7be65639ac9d19930a0b21bf33/tokenizers-0.23.1-cp310-abi3-manylinux_2_31_riscv64.whl", hash = "sha256:56f3a77de629917652f876294dc9fe6bad4a0c43bc229dc72e59bb23a0f4729a" }, - { url = "https://mirrors.aliyun.com/pypi/packages/0f/80/127c854da64827e5b79264ce524993a90dddcb320e5cd42412c5c02f9e8a/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9d10a6d957ef01896dc274e890eee27d41bd0e74ef31e60616f0fc311345184e" }, - { url = "https://mirrors.aliyun.com/pypi/packages/fe/ba/44c2502feb1a058f096ddfb4e0996ef3225a01a388e1a9b094e91689fe93/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:1974288a609c343774f1b897c8b482c791ab17b75ab5c8c2b1737565c1d82288" }, - { url = "https://mirrors.aliyun.com/pypi/packages/9e/c1/464019a9fb059870bfe4eebb4ba12208f3042035e258bf5e782906bd3847/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_i686.whl", hash = "sha256:120468fb4c24faf0543c835a4fabafa4deb3f20a035c9b6e83d0b553a97615d4" }, - { url = 
"https://mirrors.aliyun.com/pypi/packages/79/94/3ac1432bda31626071e9b6a12709b97ae05131c804b94c8f3ac622c5da32/tokenizers-0.23.1-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:e3d8f40ea6268047de7046906326abed5134f27d4e8447b23763afe5808c8a96" }, - { url = "https://mirrors.aliyun.com/pypi/packages/6a/dd/631b21433c771b1382535326f0eca80b9c9cee2e64961dd993bc9ac4669e/tokenizers-0.23.1-cp310-abi3-win32.whl", hash = "sha256:93120a930b919416da7cd10a2f606ac9919cc69cacae7980fa2140e277660948" }, - { url = "https://mirrors.aliyun.com/pypi/packages/97/c9/2553f72aaf65a2797d4229e37fa7fbe38ffbf3e32912d31bdd78b3323e59/tokenizers-0.23.1-cp310-abi3-win_amd64.whl", hash = "sha256:e7bfaf995c1bdbbd21d13539decb6650967013759318627d85daeb7881af16b7" }, - { url = "https://mirrors.aliyun.com/pypi/packages/cd/2b/2be299bab55fc595e3d38567edb1a87f86e594842968fa9515a07bdcf422/tokenizers-0.23.1-cp310-abi3-win_arm64.whl", hash = "sha256:a26197957d8e4425dfba746315f3c425ea00cfa8367c5fbc4ec73447893dcea9" }, -] - [[package]] name = "toml" version = "0.10.2" From 164e4272870735fb2028325d67c901d037a9f1aa Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 06:08:27 +0000 Subject: [PATCH 05/25] refactor: use openai sdk --- rock/sdk/model/server/api/proxy.py | 64 ++------- rock/sdk/model/server/config.py | 3 - rock/sdk/model/server/main.py | 23 ++-- rock/sdk/model/server/sse_utils.py | 92 +++++++++++++ tests/unit/sdk/model/test_proxy.py | 4 - tests/unit/sdk/model/test_sse_utils.py | 172 +++++++++++++++++++++++++ 6 files changed, 287 insertions(+), 71 deletions(-) create mode 100644 rock/sdk/model/server/sse_utils.py create mode 100644 tests/unit/sdk/model/test_sse_utils.py diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 8161c4adc0..bd000cd80d 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -21,7 +21,6 @@ import json import time -import uuid from collections.abc import AsyncIterator from typing import 
Any @@ -35,6 +34,12 @@ from rock.sdk.model.server.config import ModelServiceConfig from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted +from rock.sdk.model.server.sse_utils import ( + SSE_DONE, + completion_to_chunk_dict, + encode_sse_event, + parse_sse_data_chunks, +) logger = init_logger(__name__) @@ -84,56 +89,10 @@ def _filter_headers(headers) -> dict[str, str]: return out -def _completion_to_chunk(response: dict, *, model: str) -> dict: - """Convert a recorded ``chat.completion`` response into a single - ``chat.completion.chunk`` shape (move ``message`` → ``delta``). Used only by - the replay streaming path.""" - choices_in = response.get("choices") or [] - choices_out = [] - for choice in choices_in: - delta = dict(choice.get("message") or {}) - choices_out.append( - { - "index": choice.get("index", 0), - "delta": delta, - "finish_reason": choice.get("finish_reason"), - "logprobs": choice.get("logprobs"), - } - ) - return { - "id": response.get("id") or f"chatcmpl-{uuid.uuid4()}", - "object": "chat.completion.chunk", - "created": response.get("created") or int(time.time()), - "model": response.get("model") or model, - "choices": choices_out, - } - - async def _replay_sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: """Emit a recorded response as one SSE chunk + ``[DONE]``.""" - chunk = _completion_to_chunk(response, model=model) - yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode() - yield b"data: [DONE]\n\n" - - -def _parse_sse_chunks_into_state(buffer: bytes, state: ChatCompletionStreamState) -> bytes: - """Pull complete SSE events out of ``buffer`` and feed each ``data:`` line - (other than ``[DONE]``) to the openai stream-state aggregator. 
Returns the - leftover bytes that did not yet form a complete event.""" - while b"\n\n" in buffer: - event, buffer = buffer.split(b"\n\n", 1) - for raw_line in event.split(b"\n"): - line = raw_line.decode("utf-8", errors="replace").strip() - if not line.startswith("data:"): - continue - payload = line[len("data:") :].strip() - if not payload or payload == "[DONE]": - continue - try: - state.handle_chunk(ChatCompletionChunk.model_validate(json.loads(payload))) - except Exception as exc: # parser error: forward continues, traj will be partial - logger.debug(f"[record] chunk parse failed (forward continues): {exc}") - return buffer + yield encode_sse_event(completion_to_chunk_dict(response, model=model)) + yield SSE_DONE async def _forward_stream_and_record( @@ -158,7 +117,12 @@ async def _forward_stream_and_record( upstream_status = r.status_code async for chunk in r.aiter_bytes(): yield chunk - parse_buffer = _parse_sse_chunks_into_state(parse_buffer + chunk, state) + chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) + for chunk_dict in chunk_dicts: + try: + state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) + except Exception as exc: # parser error: forward continues, traj will be partial + logger.debug(f"[record] chunk parse failed (forward continues): {exc}") except httpx.RequestError as exc: # Connection died mid-stream. The bytes already sent reach the client; # we still try to record what we got. 
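Reviewer note on the hunk above: the forward path yields each upstream fragment to the client untouched and only parses a side copy, so a parse failure can at worst leave the recorded trajectory partial. The sketch below is a minimal, standalone illustration of that pattern. It restates `parse_sse_data_chunks` from the `sse_utils.py` added later in this patch so it runs without the `rock` package. `feed_fragments` and the 7-byte fragment size are hypothetical, introduced only for the demonstration, and the `ChatCompletionStreamState` aggregation step is replaced by a plain list.

```python
import json


def parse_sse_data_chunks(buffer: bytes) -> tuple[list[dict], bytes]:
    """Standalone restatement of the helper this patch adds in sse_utils.py."""
    chunks: list[dict] = []
    while b"\n\n" in buffer:
        # Only complete events (terminated by a blank line) are consumed.
        event, buffer = buffer.split(b"\n\n", 1)
        for raw_line in event.split(b"\n"):
            line = raw_line.decode("utf-8", errors="replace").strip()
            if not line.startswith("data:"):
                continue  # event: / id: / blank lines are ignored
            payload = line[len("data:"):].strip()
            if not payload or payload == "[DONE]":
                continue  # terminal marker carries no JSON
            try:
                chunks.append(json.loads(payload))
            except json.JSONDecodeError:
                continue  # malformed payloads are skipped, not raised
    return chunks, buffer


def feed_fragments(fragments: list[bytes]) -> tuple[list[bytes], list[dict]]:
    """Hypothetical driver: forward every fragment as-is, parse on the side."""
    forwarded: list[bytes] = []
    parsed: list[dict] = []
    leftover = b""
    for fragment in fragments:
        forwarded.append(fragment)  # stands in for `yield chunk` in the proxy
        new_chunks, leftover = parse_sse_data_chunks(leftover + fragment)
        parsed.extend(new_chunks)  # stands in for state.handle_chunk(...)
    return forwarded, parsed


# Split a small stream at arbitrary byte offsets, as a network read might.
stream = b'data: {"i": 0}\n\ndata: {"i": 1}\n\ndata: [DONE]\n\n'
fragments = [stream[i:i + 7] for i in range(0, len(stream), 7)]
forwarded, parsed = feed_fragments(fragments)
```

Because the leftover buffer is carried between calls, an event split across two reads is parsed once its terminating blank line arrives, while the forwarded bytes never depend on the parser.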
diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 8c992fb4b3..3923d43c6a 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -54,9 +54,6 @@ class ModelServiceConfig(BaseModel): num_retries: int = Field(default=6) """Number of retries for retryable failures (passed through to litellm).""" - traj_enabled: bool = Field(default=True) - """When True, write each chat/completions call as a JSONL trajectory line.""" - traj_file: str | None = Field(default=None) """Override default trajectory file path. None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 31b4918e55..1ba0b54a7b 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -56,14 +56,11 @@ def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> N """Wire up record/replay integrations and attach them to ``app.state``. - Replay mode (``replay_traj_path`` set): load the trajectory into a - ``SequentialCursor`` and stash it as ``app.state.replay_cursor``. - - Forward/record mode (default): if ``traj_enabled`` is True, attach a - ``TrajectoryRecorder`` instance as ``app.state.recorder``. The proxy - handler invokes it explicitly after each forwarded call. - - Replay and record are mutually exclusive — in replay mode we don't record, - since replayed responses round-tripping back into the source file would - inflate metrics and corrupt the trajectory. + ``SequentialCursor`` and stash it as ``app.state.replay_cursor``. No + recorder is attached — replaying back into the source file would corrupt it. + - Forward mode (default): attach a ``TrajectoryRecorder`` instance as + ``app.state.recorder``. The proxy handler invokes it explicitly after + each forwarded call. 
""" if config.replay_traj_path: from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor @@ -72,12 +69,11 @@ def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> N logger.info(f"replay cursor loaded, traj_path={config.replay_traj_path}") return - if config.traj_enabled: - from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder - traj_path = config.traj_file or TRAJ_FILE - app.state.recorder = TrajectoryRecorder(traj_file=traj_path) - logger.info(f"trajectory recorder attached, traj_file={traj_path}") + traj_path = config.traj_file or TRAJ_FILE + app.state.recorder = TrajectoryRecorder(traj_file=traj_path) + logger.info(f"trajectory recorder attached, traj_file={traj_path}") def main( @@ -134,7 +130,6 @@ def create_config_from_args(args) -> ModelServiceConfig: logger.info(f"num_retries set from command line: {args.num_retries}") if getattr(args, "traj_file", None): config.replay_traj_path = args.traj_file - config.traj_enabled = False logger.info(f"replay mode enabled via --traj-file: {args.traj_file}") return config diff --git a/rock/sdk/model/server/sse_utils.py b/rock/sdk/model/server/sse_utils.py new file mode 100644 index 0000000000..1cbe27298f --- /dev/null +++ b/rock/sdk/model/server/sse_utils.py @@ -0,0 +1,92 @@ +"""SSE codec utilities for the chat/completions proxy. + +Three pure helpers, no openai/litellm dependencies: + +- :func:`parse_sse_data_chunks` — incremental SSE byte stream → list of decoded + ``data:`` payload dicts (used by the forward path to feed chunks into the + stream-state aggregator while bytes pass through verbatim to the client). +- :func:`completion_to_chunk_dict` — convert a non-streaming ``chat.completion`` + response into a single ``chat.completion.chunk`` dict, by renaming + ``message`` → ``delta``. Used by the replay path's streaming output. 
+- :func:`encode_sse_event` — encode a payload dict as ``data: \\n\\n`` + bytes (one SSE event). +""" + +from __future__ import annotations + +import json +import time +import uuid +from typing import Final + +# Terminal SSE event sent at the end of a chat/completions stream. +SSE_DONE: Final[bytes] = b"data: [DONE]\n\n" + + +def parse_sse_data_chunks(buffer: bytes) -> tuple[list[dict], bytes]: + """Extract complete SSE events from a (possibly partial) byte buffer. + + Returns ``(chunks, leftover)``: the parsed ``data:`` JSON payload dicts and + the bytes that did not yet form a complete event (``\\n\\n``-terminated). + + - ``data: [DONE]`` is skipped (terminal marker, has no JSON payload). + - Lines that don't start with ``data:`` (``event:`` / ``id:`` / blank) + are ignored. + - Malformed JSON in a ``data:`` line is silently skipped — caller logs at + its own discretion (typically ``debug``). + + Caller pattern:: + + chunks, buffer = parse_sse_data_chunks(buffer + new_bytes) + for chunk_dict in chunks: + ... feed to aggregator, etc ... + """ + chunks: list[dict] = [] + while b"\n\n" in buffer: + event, buffer = buffer.split(b"\n\n", 1) + for raw_line in event.split(b"\n"): + line = raw_line.decode("utf-8", errors="replace").strip() + if not line.startswith("data:"): + continue + payload = line[len("data:") :].strip() + if not payload or payload == "[DONE]": + continue + try: + chunks.append(json.loads(payload)) + except json.JSONDecodeError: + continue + return chunks, buffer + + +def completion_to_chunk_dict(response: dict, *, model: str) -> dict: + """Convert a recorded ``chat.completion`` dict into a single + ``chat.completion.chunk`` dict, suitable for re-streaming. + + Only ``message`` → ``delta`` is renamed; every other field (including + provider-specific extras like ``reasoning_content`` inside the message) + flows through unchanged. ``id`` / ``created`` are synthesized when missing. 
+ """ + choices_in = response.get("choices") or [] + choices_out = [] + for choice in choices_in: + delta = dict(choice.get("message") or {}) + choices_out.append( + { + "index": choice.get("index", 0), + "delta": delta, + "finish_reason": choice.get("finish_reason"), + "logprobs": choice.get("logprobs"), + } + ) + return { + "id": response.get("id") or f"chatcmpl-{uuid.uuid4()}", + "object": "chat.completion.chunk", + "created": response.get("created") or int(time.time()), + "model": response.get("model") or model, + "choices": choices_out, + } + + +def encode_sse_event(data: dict) -> bytes: + """Encode a JSON payload as one SSE ``data:`` event (terminated by ``\\n\\n``).""" + return f"data: {json.dumps(data, ensure_ascii=False)}\n\n".encode() diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index 88c61edcb3..fe5634fab1 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -439,7 +439,6 @@ def test_config_default_host_and_port(): def test_config_default_traj_and_replay(): config = ModelServiceConfig() - assert config.traj_enabled is True assert config.traj_file is None assert config.replay_traj_path is None @@ -450,14 +449,12 @@ async def test_config_loads_traj_and_replay_from_file(tmp_path): conf_file.write_text( yaml.dump( { - "traj_enabled": False, "traj_file": "/tmp/my-traj.jsonl", "replay_traj_path": "/tmp/in.jsonl", } ) ) config = ModelServiceConfig.from_file(str(conf_file)) - assert config.traj_enabled is False assert config.traj_file == "/tmp/my-traj.jsonl" assert config.replay_traj_path == "/tmp/in.jsonl" @@ -504,7 +501,6 @@ def test_cli_traj_file_enables_replay(): ) config = create_config_from_args(args) assert config.replay_traj_path == "/tmp/in.jsonl" - assert config.traj_enabled is False # ---------- Metrics singleton + legacy record_traj (still used by local mode) ---------- diff --git a/tests/unit/sdk/model/test_sse_utils.py b/tests/unit/sdk/model/test_sse_utils.py new file 
mode 100644 index 0000000000..6c9318a510 --- /dev/null +++ b/tests/unit/sdk/model/test_sse_utils.py @@ -0,0 +1,172 @@ +"""Tests for the pure SSE codec utilities (no openai/litellm dependencies).""" + +import json + +from rock.sdk.model.server.sse_utils import ( + SSE_DONE, + completion_to_chunk_dict, + encode_sse_event, + parse_sse_data_chunks, +) + +# ---------- parse_sse_data_chunks ---------- + + +def test_parse_returns_complete_events_and_leftover_buffer(): + raw = b'data: {"a": 1}\n\ndata: {"a": 2}\n\ndata: {"a": 3}' # 3rd event is incomplete + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"a": 1}, {"a": 2}] + assert leftover == b'data: {"a": 3}' + + +def test_parse_skips_done_marker(): + raw = b'data: {"x": 1}\n\ndata: [DONE]\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"x": 1}] + assert leftover == b"" + + +def test_parse_skips_non_data_lines(): + raw = b'event: progress\ndata: {"y": 2}\nid: abc\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"y": 2}] + assert leftover == b"" + + +def test_parse_silently_skips_malformed_json(): + raw = b'data: not-json-at-all\n\ndata: {"ok": true}\n\n' + chunks, leftover = parse_sse_data_chunks(raw) + + assert chunks == [{"ok": True}] + assert leftover == b"" + + +def test_parse_handles_empty_buffer(): + chunks, leftover = parse_sse_data_chunks(b"") + assert chunks == [] + assert leftover == b"" + + +def test_parse_incremental_streaming_pattern(): + """Simulates feeding bytes in arbitrary chunks; final concatenation == all events.""" + full_stream = b'data: {"i": 0}\n\ndata: {"i": 1}\n\ndata: {"i": 2}\n\ndata: [DONE]\n\n' + fragments = [full_stream[i : i + 5] for i in range(0, len(full_stream), 5)] + + buffer = b"" + collected: list[dict] = [] + for frag in fragments: + new_chunks, buffer = parse_sse_data_chunks(buffer + frag) + collected.extend(new_chunks) + + assert collected == [{"i": 0}, {"i": 1}, {"i": 2}] + assert buffer == b"" + + +def 
test_parse_handles_unicode_payload(): + raw = b'data: {"content": "\xe4\xbd\xa0\xe5\xa5\xbd"}\n\n' # "你好" UTF-8 + chunks, _ = parse_sse_data_chunks(raw) + assert chunks == [{"content": "你好"}] + + +# ---------- completion_to_chunk_dict ---------- + + +def test_completion_to_chunk_renames_message_to_delta(): + response = { + "id": "rec-1", + "object": "chat.completion", + "created": 100, + "model": "gpt-4", + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": "hi"}, + "finish_reason": "stop", + } + ], + } + chunk = completion_to_chunk_dict(response, model="gpt-4") + + assert chunk["object"] == "chat.completion.chunk" + assert chunk["id"] == "rec-1" + assert chunk["created"] == 100 + assert chunk["model"] == "gpt-4" + assert chunk["choices"][0]["delta"] == {"role": "assistant", "content": "hi"} + assert chunk["choices"][0]["finish_reason"] == "stop" + assert chunk["choices"][0]["index"] == 0 + assert "message" not in chunk["choices"][0] + + +def test_completion_to_chunk_preserves_provider_specific_message_fields(): + """reasoning_content / tool_calls / etc inside message are kept verbatim in delta.""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "answer", + "reasoning_content": "step-by-step thinking", + "tool_calls": [{"id": "t1", "type": "function"}], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="glm-5") + + assert chunk["choices"][0]["delta"]["reasoning_content"] == "step-by-step thinking" + assert chunk["choices"][0]["delta"]["tool_calls"] == [{"id": "t1", "type": "function"}] + assert chunk["choices"][0]["finish_reason"] == "tool_calls" + + +def test_completion_to_chunk_synthesizes_id_and_created_when_missing(): + chunk = completion_to_chunk_dict( + {"choices": [{"index": 0, "message": {"role": "assistant"}, "finish_reason": "stop"}]}, + model="any", + ) + assert chunk["id"].startswith("chatcmpl-") + assert 
isinstance(chunk["created"], int) and chunk["created"] > 0 + assert chunk["model"] == "any" + + +def test_completion_to_chunk_handles_empty_choices(): + chunk = completion_to_chunk_dict({"choices": []}, model="m") + assert chunk["choices"] == [] + + +# ---------- encode_sse_event ---------- + + +def test_encode_sse_event_appends_double_newline_terminator(): + out = encode_sse_event({"k": "v"}) + assert out.endswith(b"\n\n") + assert out.startswith(b"data: ") + body = out[len(b"data: ") : -len(b"\n\n")] + assert json.loads(body) == {"k": "v"} + + +def test_encode_sse_event_preserves_unicode_without_escapes(): + out = encode_sse_event({"content": "你好"}) + # ensure_ascii=False is critical so Chinese stays readable in the wire format + assert "你好".encode() in out + + +def test_sse_done_constant(): + assert SSE_DONE == b"data: [DONE]\n\n" + + +# ---------- round-trip ---------- + + +def test_roundtrip_encode_then_parse(): + """encode → parse must round-trip a payload dict.""" + payloads = [{"i": 0, "text": "alpha"}, {"i": 1, "text": "beta 中文"}] + wire = b"".join(encode_sse_event(p) for p in payloads) + SSE_DONE + chunks, leftover = parse_sse_data_chunks(wire) + + assert chunks == payloads + assert leftover == b"" From a3459c9977950335eff05d708bbabe4451af7089 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 06:14:22 +0000 Subject: [PATCH 06/25] chore(model-service): remove litellm remnants (num_retries, stale comments) Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/config.py | 5 +---- rock/sdk/model/server/main.py | 11 +---------- rock/sdk/model/server/utils.py | 15 ++------------- 3 files changed, 4 insertions(+), 27 deletions(-) diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 3923d43c6a..bd19edf902 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -51,14 +51,11 @@ class ModelServiceConfig(BaseModel): request_timeout: int = Field(default=120) """Request timeout in 
seconds.""" - num_retries: int = Field(default=6) - """Number of retries for retryable failures (passed through to litellm).""" - traj_file: str | None = Field(default=None) """Override default trajectory file path. None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" replay_traj_path: str | None = Field(default=None) - """Path to a .jsonl trajectory file or a directory of .jsonl files for replay mode. + """Path to a .jsonl trajectory file for replay mode. When set, requests are served from recorded responses instead of a real upstream.""" @classmethod diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 1ba0b54a7b..83aec58f56 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -125,9 +125,6 @@ def create_config_from_args(args) -> ModelServiceConfig: if args.request_timeout: config.request_timeout = args.request_timeout logger.info(f"request_timeout set from command line: {args.request_timeout}s") - if getattr(args, "num_retries", None) is not None: - config.num_retries = args.num_retries - logger.info(f"num_retries set from command line: {args.num_retries}") if getattr(args, "traj_file", None): config.replay_traj_path = args.traj_file logger.info(f"replay mode enabled via --traj-file: {args.traj_file}") @@ -173,17 +170,11 @@ def create_config_from_args(args) -> ModelServiceConfig: parser.add_argument( "--request-timeout", type=int, default=None, help="Request timeout in seconds. Overrides config file." ) - parser.add_argument( - "--num-retries", - type=int, - default=None, - help="Number of retries for retryable failures (passed through to litellm). Overrides config file.", - ) parser.add_argument( "--traj-file", type=str, default=None, - help="Replay mode: path to a recorded .jsonl traj file or directory. Disables real LLM upstreams.", + help="Replay mode: path to a recorded .jsonl traj file. 
Disables real LLM upstreams.", ) args = parser.parse_args() diff --git a/rock/sdk/model/server/utils.py b/rock/sdk/model/server/utils.py index 86b7414e29..639ca3995b 100644 --- a/rock/sdk/model/server/utils.py +++ b/rock/sdk/model/server/utils.py @@ -26,13 +26,7 @@ def _get_or_create_metrics_monitor() -> MetricsMonitor: def _write_traj(data: dict): - """Write traj data to file in JSONL format. - - Used by the legacy ``@record_traj`` decorator on the ``local`` model-service - flow. The proxy flow now persists trajectories via - :class:`rock.sdk.model.server.integrations.traj_recorder.TrajectoryRecorder` - instead, which uses litellm's StandardLoggingPayload schema. - """ + """Write traj data to file in JSONL format.""" from rock import env_vars append = env_vars.ROCK_MODEL_SERVICE_TRAJ_APPEND_MODE @@ -44,12 +38,7 @@ def _write_traj(data: dict): def record_traj(func: Callable): - """Decorator to record chat completions input/output as traj. - - Kept for the ``local`` model-service mode (rock/sdk/model/server/api/local.py). - The ``proxy`` mode no longer uses this decorator — it relies on the - TrajectoryRecorder litellm callback for richer payloads. - """ + """Decorator to record chat completions input/output as traj (local mode only).""" @wraps(func) async def wrapper(*args, **kwargs): From f125d335eb469ef288a99e318a7551edb91de281 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 06:38:11 +0000 Subject: [PATCH 07/25] refactor(model-service): split proxy handler into _ReplayBackend / _ForwardBackend Strategy pattern eliminates the replay/forward branch inside chat_completions. The backend is selected once at startup (_configure_proxy_integrations) and attached to app.state.backend; the endpoint just parses the request and dispatches. Each backend keeps the stream/non-stream branch local to itself. A union type alias _CompletionBackend documents the closed set of backends and a typed _get_backend(request) accessor wraps the app.state read. 
Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 213 +++++++++++++++++------------ rock/sdk/model/server/main.py | 26 ++-- tests/unit/sdk/model/test_proxy.py | 19 +-- 3 files changed, 149 insertions(+), 109 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index bd000cd80d..0ad5d698a2 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -1,8 +1,8 @@ """OpenAI-compatible chat/completions proxy with trajectory record/replay. -Two paths share this handler: +Two backends share the ``/v1/chat/completions`` route: -1. **Forward / record mode** (default) — body bytes are POSTed verbatim to the +1. **_ForwardBackend** (default) — body bytes are POSTed verbatim to the configured upstream via plain ``httpx``. The upstream response is forwarded byte-for-byte back to the client (raw JSON for non-stream, raw SSE bytes for stream). On the side we run a parser (``ChatCompletionChunk`` + @@ -12,9 +12,10 @@ returns (provider-specific ``reasoning_content``, ``citations``, ...) is passed through untouched. -2. **Replay mode** (``replay_traj_path`` set) — the request is served directly - from the next record in ``app.state.replay_cursor`` without any upstream - call. Streaming emits the recorded response as one SSE chunk + ``[DONE]``. +2. **_ReplayBackend** (``replay_traj_path`` set) — the request is served + directly from the next record in the ``SequentialCursor`` without any + upstream call. Streaming emits the recorded response as one SSE chunk + + ``[DONE]``. """ from __future__ import annotations @@ -158,32 +159,15 @@ async def _forward_stream_and_record( ) -@proxy_router.post("/v1/chat/completions") -async def chat_completions(request: Request): - """OpenAI-compatible chat completions proxy endpoint. - - Reads the body as raw bytes (no parsing on the forward path) and either - serves it from the replay cursor or forwards it to the configured upstream. 
- """ - config: ModelServiceConfig = request.app.state.model_service_config - recorder: TrajectoryRecorder | None = getattr(request.app.state, "recorder", None) - - body_bytes = await request.body() - try: - request_dict = json.loads(body_bytes) if body_bytes else {} - except json.JSONDecodeError: - raise HTTPException(status_code=400, detail="Request body is not valid JSON.") - if not isinstance(request_dict, dict): - raise HTTPException(status_code=400, detail="Request body must be a JSON object.") +class _ReplayBackend: + """Serves requests from a pre-recorded trajectory; no upstream calls made.""" - model_name = request_dict.get("model", "") - is_stream = bool(request_dict.get("stream")) + def __init__(self, cursor: SequentialCursor) -> None: + self._cursor = cursor - # ---- Replay mode: short-circuit, no upstream call ---- - if config.replay_traj_path: - cursor: SequentialCursor = request.app.state.replay_cursor + async def serve(self, *, model_name: str, is_stream: bool, **_: Any) -> Response: try: - record = await cursor.next(expected_model=model_name) + record = await self._cursor.next(expected_model=model_name) except TrajectoryExhausted as exc: raise HTTPException(status_code=404, detail=str(exc)) @@ -191,9 +175,9 @@ async def chat_completions(request: Request): if not isinstance(response_dict, dict): raise HTTPException( status_code=500, - detail=f"replay record at step {cursor.position - 1} has no usable response dict", + detail=f"replay record at step {self._cursor.position - 1} has no usable response dict", ) - logger.info(f"[replay] step {cursor.position}/{cursor.total} served for model={model_name!r}") + logger.info(f"[replay] step {self._cursor.position}/{self._cursor.total} served for model={model_name!r}") if is_stream: return StreamingResponse( @@ -202,71 +186,124 @@ async def chat_completions(request: Request): ) return JSONResponse(status_code=200, content=response_dict) - # ---- Forward / record mode: byte-passthrough via httpx ---- - 
upstream_url = f"{get_base_url(model_name, config)}/chat/completions" - fwd_headers = _filter_headers(request.headers) - logger.info(f"Routing model {model_name!r} to {upstream_url}") - - if is_stream: - return StreamingResponse( - _forward_stream_and_record( - upstream_url=upstream_url, - body_bytes=body_bytes, - fwd_headers=fwd_headers, - timeout=config.request_timeout, - request_dict=request_dict, - recorder=recorder, - ), - media_type="text/event-stream", - ) - # Non-stream: single POST, return upstream's status + body verbatim, record on the side. - start = time.time() - try: - async with httpx.AsyncClient(timeout=config.request_timeout) as client: - r = await client.post(upstream_url, content=body_bytes, headers=fwd_headers) - except httpx.TimeoutException as exc: - if recorder is not None: - await recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"timeout: {exc}", +class _ForwardBackend: + """Forwards requests byte-for-byte to the upstream and optionally records the trajectory.""" + + def __init__(self, config: ModelServiceConfig, recorder: TrajectoryRecorder | None = None) -> None: + self._config = config + self._recorder = recorder + + async def serve( + self, + *, + model_name: str, + is_stream: bool, + body_bytes: bytes, + fwd_headers: dict[str, str], + request_dict: dict[str, Any], + **_: Any, + ) -> Response: + upstream_url = f"{get_base_url(model_name, self._config)}/chat/completions" + logger.info(f"Routing model {model_name!r} to {upstream_url}") + + if is_stream: + return StreamingResponse( + _forward_stream_and_record( + upstream_url=upstream_url, + body_bytes=body_bytes, + fwd_headers=fwd_headers, + timeout=self._config.request_timeout, + request_dict=request_dict, + recorder=self._recorder, + ), + media_type="text/event-stream", ) - raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") - except httpx.RequestError as exc: - if recorder is not 
None: - await recorder.record( + + # Non-stream: single POST, return upstream's status + body verbatim, record on the side. + start = time.time() + try: + async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: + r = await client.post(upstream_url, content=body_bytes, headers=fwd_headers) + except httpx.TimeoutException as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"timeout: {exc}", + ) + raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") + except httpx.RequestError as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") + + response_text = r.text # bytes already read by httpx; .text decodes once + response_dict: dict | None = None + try: + parsed = json.loads(response_text) if response_text else None + if isinstance(parsed, dict): + response_dict = parsed + except json.JSONDecodeError: + pass + + if self._recorder is not None: + await self._recorder.record( request=request_dict, - response=None, - status="failure", + response=response_dict, + status="success" if r.status_code < 400 else "failure", start_time=start, end_time=time.time(), - error=f"{type(exc).__name__}: {exc}", + error=None if r.status_code < 400 else f"upstream_status={r.status_code}", ) - raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") - response_text = r.text # bytes already read by httpx; .text decodes once - response_dict: dict | None = None + # Forward bytes verbatim — preserves any provider-specific fields untouched. 
+ media_type = r.headers.get("content-type", "application/json") + return Response(content=response_text, status_code=r.status_code, media_type=media_type) + + +_CompletionBackend = _ReplayBackend | _ForwardBackend + + +def _get_backend(request: Request) -> _CompletionBackend: + """Typed accessor for the backend attached at startup by ``_configure_proxy_integrations``.""" + return request.app.state.backend + + +@proxy_router.post("/v1/chat/completions") +async def chat_completions(request: Request): + """OpenAI-compatible chat completions proxy endpoint. + + Reads the body as raw bytes (no parsing on the forward path) and delegates + to the backend attached at startup (replay or forward). + """ + body_bytes = await request.body() try: - parsed = json.loads(response_text) if response_text else None - if isinstance(parsed, dict): - response_dict = parsed + request_dict = json.loads(body_bytes) if body_bytes else {} except json.JSONDecodeError: - pass - - if recorder is not None: - await recorder.record( - request=request_dict, - response=response_dict, - status="success" if r.status_code < 400 else "failure", - start_time=start, - end_time=time.time(), - error=None if r.status_code < 400 else f"upstream_status={r.status_code}", - ) + raise HTTPException(status_code=400, detail="Request body is not valid JSON.") + if not isinstance(request_dict, dict): + raise HTTPException(status_code=400, detail="Request body must be a JSON object.") - # Forward bytes verbatim — preserves any provider-specific fields untouched. 
- media_type = r.headers.get("content-type", "application/json") - return Response(content=response_text, status_code=r.status_code, media_type=media_type) + model_name = request_dict.get("model", "") + is_stream = bool(request_dict.get("stream")) + fwd_headers = _filter_headers(request.headers) + + backend = _get_backend(request) + return await backend.serve( + model_name=model_name, + is_stream=is_stream, + body_bytes=body_bytes, + fwd_headers=fwd_headers, + request_dict=request_dict, + ) diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 83aec58f56..da2ee10a36 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -53,27 +53,29 @@ async def global_exception_handler(request, exc): def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: - """Wire up record/replay integrations and attach them to ``app.state``. - - - Replay mode (``replay_traj_path`` set): load the trajectory into a - ``SequentialCursor`` and stash it as ``app.state.replay_cursor``. No - recorder is attached — replaying back into the source file would corrupt it. - - Forward mode (default): attach a ``TrajectoryRecorder`` instance as - ``app.state.recorder``. The proxy handler invokes it explicitly after - each forwarded call. + """Attach the appropriate backend to ``app.state.backend``. + + - Replay mode (``replay_traj_path`` set): ``_ReplayBackend`` wrapping a + ``SequentialCursor``; no recorder — replaying back into the source file + would corrupt it. + - Forward mode (default): ``_ForwardBackend`` with a ``TrajectoryRecorder``. 
""" + from rock.sdk.model.server.api.proxy import _ForwardBackend, _ReplayBackend + if config.replay_traj_path: from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor - app.state.replay_cursor = SequentialCursor.load(config.replay_traj_path) - logger.info(f"replay cursor loaded, traj_path={config.replay_traj_path}") + cursor = SequentialCursor.load(config.replay_traj_path) + app.state.backend = _ReplayBackend(cursor) + logger.info(f"replay backend attached, traj_path={config.replay_traj_path}") return from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder traj_path = config.traj_file or TRAJ_FILE - app.state.recorder = TrajectoryRecorder(traj_file=traj_path) - logger.info(f"trajectory recorder attached, traj_file={traj_path}") + recorder = TrajectoryRecorder(traj_file=traj_path) + app.state.backend = _ForwardBackend(config, recorder=recorder) + logger.info(f"forward backend attached, traj_file={traj_path}") def main( diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index fe5634fab1..b3e9d5b1ed 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -27,12 +27,16 @@ ) -def _build_app(config: ModelServiceConfig, *, replay_cursor=None) -> FastAPI: +def _build_app(config: ModelServiceConfig, *, replay_cursor=None, recorder=None) -> FastAPI: """Build a FastAPI app with the proxy router and the given config attached.""" + from rock.sdk.model.server.api.proxy import _ForwardBackend, _ReplayBackend + app = FastAPI() app.state.model_service_config = config if replay_cursor is not None: - app.state.replay_cursor = replay_cursor + app.state.backend = _ReplayBackend(replay_cursor) + else: + app.state.backend = _ForwardBackend(config, recorder=recorder) app.include_router(proxy_router) return app @@ -264,19 +268,16 @@ def handler(request: httpx.Request) -> httpx.Response: @pytest.mark.asyncio async def test_forward_invokes_recorder_on_success(tmp_path): - 
"""When app.state.recorder is set, success calls write a JSONL line with the - request and the upstream response verbatim.""" + """When a recorder is attached to the backend, success calls write a JSONL line.""" from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder upstream_payload = _success_response_json(content="recorded reply") + traj_file = tmp_path / "traj.jsonl" def handler(request: httpx.Request) -> httpx.Response: return httpx.Response(200, json=upstream_payload) config = ModelServiceConfig() - app = _build_app(config) - traj_file = tmp_path / "traj.jsonl" - app.state.recorder = TrajectoryRecorder(traj_file=traj_file) with ( _patch_httpx_with_handler(handler), @@ -284,8 +285,8 @@ def handler(request: httpx.Request) -> httpx.Response: "rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", return_value=MagicMock() ), ): - # Re-create the recorder so it picks up the patched monitor. - app.state.recorder = TrajectoryRecorder(traj_file=traj_file) + recorder = TrajectoryRecorder(traj_file=traj_file) + app = _build_app(config, recorder=recorder) transport = ASGITransport(app=app) async with AsyncClient(transport=transport, base_url="http://test") as ac: await ac.post( From 4b7a35abbf266ec6b717541e59b78291461b30a0 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 06:41:52 +0000 Subject: [PATCH 08/25] =?UTF-8?q?refactor(model-service):=20drop=20=5F=20p?= =?UTF-8?q?refix=20from=20public=20Backend=20classes,=20rename=20replay=5F?= =?UTF-8?q?traj=5Fpath=20=E2=86=92=20replay=5Ftraj=5Ffile?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The Backend classes are imported by main.py and tests, so the leading underscore mis-signalled them as module-internal. Rename the config field to align with traj_file naming. Also drop the defensive getattr() for args.traj_file — argparse always sets it. 
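
The argparse claim above can be checked with a minimal sketch. The parser below is hypothetical, not the real model-service CLI; it assumes `--traj-file` is declared with `add_argument`, in which case the attribute is present on the namespace even when the flag is omitted:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the model-service CLI parser; only the
    # --traj-file option mentioned in the commit message is declared here.
    parser = argparse.ArgumentParser(prog="model-service-sketch")
    parser.add_argument("--traj-file", default=None)
    return parser


# Flag omitted: the attribute still exists on the namespace and is None,
# so a plain `args.traj_file` access cannot raise AttributeError.
args = build_parser().parse_args([])
print(args.traj_file)

# Flag given: argparse maps "--traj-file" to the attribute name "traj_file".
args = build_parser().parse_args(["--traj-file", "/tmp/in.jsonl"])
print(args.traj_file)
```

This holds only for namespaces produced by a parser that declares the option; a hand-built `argparse.Namespace` in a test must still pass `traj_file` explicitly.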
Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 12 ++++++------ rock/sdk/model/server/config.py | 2 +- rock/sdk/model/server/main.py | 20 ++++++++++---------- tests/unit/sdk/model/test_proxy.py | 20 ++++++++++---------- 4 files changed, 27 insertions(+), 27 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 0ad5d698a2..2aa911771a 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -2,7 +2,7 @@ Two backends share the ``/v1/chat/completions`` route: -1. **_ForwardBackend** (default) — body bytes are POSTed verbatim to the +1. **ForwardBackend** (default) — body bytes are POSTed verbatim to the configured upstream via plain ``httpx``. The upstream response is forwarded byte-for-byte back to the client (raw JSON for non-stream, raw SSE bytes for stream). On the side we run a parser (``ChatCompletionChunk`` + @@ -12,7 +12,7 @@ returns (provider-specific ``reasoning_content``, ``citations``, ...) is passed through untouched. -2. **_ReplayBackend** (``replay_traj_path`` set) — the request is served +2. **ReplayBackend** (``replay_traj_file`` set) — the request is served directly from the next record in the ``SequentialCursor`` without any upstream call. Streaming emits the recorded response as one SSE chunk + ``[DONE]``. 
@@ -159,7 +159,7 @@ async def _forward_stream_and_record( ) -class _ReplayBackend: +class ReplayBackend: """Serves requests from a pre-recorded trajectory; no upstream calls made.""" def __init__(self, cursor: SequentialCursor) -> None: @@ -187,7 +187,7 @@ async def serve(self, *, model_name: str, is_stream: bool, **_: Any) -> Response return JSONResponse(status_code=200, content=response_dict) -class _ForwardBackend: +class ForwardBackend: """Forwards requests byte-for-byte to the upstream and optionally records the trajectory.""" def __init__(self, config: ModelServiceConfig, recorder: TrajectoryRecorder | None = None) -> None: @@ -272,10 +272,10 @@ async def serve( return Response(content=response_text, status_code=r.status_code, media_type=media_type) -_CompletionBackend = _ReplayBackend | _ForwardBackend +CompletionBackend = ReplayBackend | ForwardBackend -def _get_backend(request: Request) -> _CompletionBackend: +def _get_backend(request: Request) -> CompletionBackend: """Typed accessor for the backend attached at startup by ``_configure_proxy_integrations``.""" return request.app.state.backend diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index bd19edf902..76e080305c 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -54,7 +54,7 @@ class ModelServiceConfig(BaseModel): traj_file: str | None = Field(default=None) """Override default trajectory file path. None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" - replay_traj_path: str | None = Field(default=None) + replay_traj_file: str | None = Field(default=None) """Path to a .jsonl trajectory file for replay mode. 
When set, requests are served from recorded responses instead of a real upstream.""" diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index da2ee10a36..951e0d5dff 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -55,26 +55,26 @@ async def global_exception_handler(request, exc): def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: """Attach the appropriate backend to ``app.state.backend``. - - Replay mode (``replay_traj_path`` set): ``_ReplayBackend`` wrapping a + - Replay mode (``replay_traj_file`` set): ``ReplayBackend`` wrapping a ``SequentialCursor``; no recorder — replaying back into the source file would corrupt it. - - Forward mode (default): ``_ForwardBackend`` with a ``TrajectoryRecorder``. + - Forward mode (default): ``ForwardBackend`` with a ``TrajectoryRecorder``. """ - from rock.sdk.model.server.api.proxy import _ForwardBackend, _ReplayBackend + from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend - if config.replay_traj_path: + if config.replay_traj_file: from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor - cursor = SequentialCursor.load(config.replay_traj_path) - app.state.backend = _ReplayBackend(cursor) - logger.info(f"replay backend attached, traj_path={config.replay_traj_path}") + cursor = SequentialCursor.load(config.replay_traj_file) + app.state.backend = ReplayBackend(cursor) + logger.info(f"replay backend attached, traj_path={config.replay_traj_file}") return from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder traj_path = config.traj_file or TRAJ_FILE recorder = TrajectoryRecorder(traj_file=traj_path) - app.state.backend = _ForwardBackend(config, recorder=recorder) + app.state.backend = ForwardBackend(config, recorder=recorder) logger.info(f"forward backend attached, traj_file={traj_path}") @@ -127,8 +127,8 @@ def create_config_from_args(args) -> ModelServiceConfig: if 
args.request_timeout: config.request_timeout = args.request_timeout logger.info(f"request_timeout set from command line: {args.request_timeout}s") - if getattr(args, "traj_file", None): - config.replay_traj_path = args.traj_file + if args.traj_file: + config.replay_traj_file = args.traj_file logger.info(f"replay mode enabled via --traj-file: {args.traj_file}") return config diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index b3e9d5b1ed..caa5fb2254 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -29,14 +29,14 @@ def _build_app(config: ModelServiceConfig, *, replay_cursor=None, recorder=None) -> FastAPI: """Build a FastAPI app with the proxy router and the given config attached.""" - from rock.sdk.model.server.api.proxy import _ForwardBackend, _ReplayBackend + from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend app = FastAPI() app.state.model_service_config = config if replay_cursor is not None: - app.state.backend = _ReplayBackend(replay_cursor) + app.state.backend = ReplayBackend(replay_cursor) else: - app.state.backend = _ForwardBackend(config, recorder=recorder) + app.state.backend = ForwardBackend(config, recorder=recorder) app.include_router(proxy_router) return app @@ -327,7 +327,7 @@ async def test_replay_returns_recorded_response_no_upstream_call(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_path = str(traj) + config.replay_traj_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport = ASGITransport(app=app) @@ -362,7 +362,7 @@ async def test_replay_streaming_emits_recorded_response_as_sse(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_path = str(traj) + config.replay_traj_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport 
= ASGITransport(app=app) @@ -392,7 +392,7 @@ async def test_replay_returns_404_when_cursor_exhausted(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_path = str(traj) + config.replay_traj_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport = ASGITransport(app=app) @@ -441,7 +441,7 @@ def test_config_default_host_and_port(): def test_config_default_traj_and_replay(): config = ModelServiceConfig() assert config.traj_file is None - assert config.replay_traj_path is None + assert config.replay_traj_file is None @pytest.mark.asyncio @@ -451,13 +451,13 @@ async def test_config_loads_traj_and_replay_from_file(tmp_path): yaml.dump( { "traj_file": "/tmp/my-traj.jsonl", - "replay_traj_path": "/tmp/in.jsonl", + "replay_traj_file": "/tmp/in.jsonl", } ) ) config = ModelServiceConfig.from_file(str(conf_file)) assert config.traj_file == "/tmp/my-traj.jsonl" - assert config.replay_traj_path == "/tmp/in.jsonl" + assert config.replay_traj_file == "/tmp/in.jsonl" def test_cli_args_override_config_file(tmp_path): @@ -501,7 +501,7 @@ def test_cli_traj_file_enables_replay(): traj_file="/tmp/in.jsonl", ) config = create_config_from_args(args) - assert config.replay_traj_path == "/tmp/in.jsonl" + assert config.replay_traj_file == "/tmp/in.jsonl" # ---------- Metrics singleton + legacy record_traj (still used by local mode) ---------- From 0e5cb2093ce2d5ef6a2740ce09bf9b32fdc3471c Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 06:56:11 +0000 Subject: [PATCH 09/25] feat(model-service): restore retry on retryable_status_codes + connection errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Retry was lost when litellm was dropped (it provided num_retries internally). 
Restored using a unified _send_with_retry helper that: - Always opens upstream with stream=True so the same code path serves both stream and non-stream callers (non-stream just await resp.aread()). - Retries on httpx.TimeoutException, httpx.ConnectError, and HTTP statuses in config.retryable_status_codes (default [429, 500]). - Defaults: 6 attempts, exponential backoff 2s→32s with jitter — matches the original perform_llm_request behavior. - For stream: retry happens before any byte is yielded; mid-stream drops are not retried (would corrupt downstream). Module-level retry constants are read at call time so tests can monkeypatch them. Added 4 tests covering: success after retry, exhausted retries returning last response, non-whitelisted status not retried, and stream retry path. Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 211 +++++++++++++++++++++-------- tests/unit/sdk/model/test_proxy.py | 119 +++++++++++++++- 2 files changed, 273 insertions(+), 57 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 2aa911771a..55eec129cf 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -20,7 +20,9 @@ from __future__ import annotations +import asyncio import json +import random import time from collections.abc import AsyncIterator from typing import Any @@ -53,6 +55,64 @@ # - transfer-encoding / connection: RFC 7230 hop-by-hop, scoped to one connection _HEADERS_NOT_TO_FORWARD = frozenset({"host", "content-length", "transfer-encoding", "connection"}) +# Retry knobs for upstream POST. Read at call-time so tests can monkeypatch them. +# Default: up to 6 attempts with exponential backoff (2s → 4s → 8s → 16s → 32s, jittered). +_RETRY_MAX_ATTEMPTS = 6 +_RETRY_DELAY_SECONDS = 2.0 +_RETRY_BACKOFF = 2.0 +_RETRY_EXCEPTIONS: tuple[type[Exception], ...] 
= ( + httpx.TimeoutException, + httpx.ConnectError, + httpx.HTTPStatusError, +) + + +async def _send_with_retry( + client: httpx.AsyncClient, + url: str, + *, + body_bytes: bytes, + headers: dict[str, str], + retryable_codes: list[int], +) -> httpx.Response: + """POST with retry on connection errors and whitelisted statuses, returning + an open streaming response. + + Always uses ``stream=True`` so the same path serves both stream and non-stream + callers — non-stream just calls ``await resp.aread()`` to materialize the body. + Assumes a failed upstream returns its error body before any byte is yielded + to downstream (so retry can still discard it cleanly). + + Caller MUST ``await resp.aclose()`` after consuming. + """ + last_exc: Exception | None = None + delay = _RETRY_DELAY_SECONDS + for attempt in range(1, _RETRY_MAX_ATTEMPTS + 1): + try: + resp = await client.send( + client.build_request("POST", url, content=body_bytes, headers=headers), + stream=True, + ) + except (httpx.TimeoutException, httpx.ConnectError) as exc: + last_exc = exc + if attempt >= _RETRY_MAX_ATTEMPTS: + raise + logger.warning(f"connect failed (attempt {attempt}/{_RETRY_MAX_ATTEMPTS}): {exc}") + await asyncio.sleep(random.uniform(0, delay * 2)) + delay *= _RETRY_BACKOFF + continue + + if resp.status_code in retryable_codes and attempt < _RETRY_MAX_ATTEMPTS: + await resp.aclose() + logger.warning(f"upstream status {resp.status_code}, retry {attempt}/{_RETRY_MAX_ATTEMPTS}") + await asyncio.sleep(random.uniform(0, delay * 2)) + delay *= _RETRY_BACKOFF + continue + + return resp + + raise last_exc # pragma: no cover # unreachable + def get_base_url(model_name: str, config: ModelServiceConfig) -> str: """Pick the upstream base URL by model name. 
@@ -104,39 +164,65 @@ async def _forward_stream_and_record( timeout: float, request_dict: dict[str, Any], recorder: TrajectoryRecorder | None, + retryable_codes: list[int], ) -> AsyncIterator[bytes]: """SSE bytes are forwarded verbatim; chunks are parsed in parallel and - aggregated into the final ChatCompletion that the recorder writes to JSONL.""" + aggregated into the final ChatCompletion that the recorder writes to JSONL. + + Retry on connection errors and whitelisted statuses happens BEFORE any byte + is yielded; mid-stream connection drops are not retried (would corrupt the + client transmission).""" state = ChatCompletionStreamState() start = time.time() parse_buffer = b"" upstream_status = 0 - try: - async with httpx.AsyncClient(timeout=timeout) as client: - async with client.stream("POST", upstream_url, content=body_bytes, headers=fwd_headers) as r: - upstream_status = r.status_code - async for chunk in r.aiter_bytes(): - yield chunk - chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) - for chunk_dict in chunk_dicts: - try: - state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) - except Exception as exc: # parser error: forward continues, traj will be partial - logger.debug(f"[record] chunk parse failed (forward continues): {exc}") - except httpx.RequestError as exc: - # Connection died mid-stream. The bytes already sent reach the client; - # we still try to record what we got. 
- if recorder is not None: - await recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"{type(exc).__name__}: {exc}", + async with httpx.AsyncClient(timeout=timeout) as client: + try: + resp = await _send_with_retry( + client, + upstream_url, + body_bytes=body_bytes, + headers=fwd_headers, + retryable_codes=retryable_codes, ) - return + except (httpx.TimeoutException, httpx.ConnectError) as exc: + if recorder is not None: + await recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + + try: + upstream_status = resp.status_code + async for chunk in resp.aiter_bytes(): + yield chunk + chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) + for chunk_dict in chunk_dicts: + try: + state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) + except Exception as exc: # parser error: forward continues, traj will be partial + logger.debug(f"[record] chunk parse failed (forward continues): {exc}") + except httpx.RequestError as exc: + # Connection died mid-stream — bytes already sent reach the client; + # record what we got and return. + if recorder is not None: + await recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + finally: + await resp.aclose() if recorder is None: return @@ -216,39 +302,53 @@ async def serve( timeout=self._config.request_timeout, request_dict=request_dict, recorder=self._recorder, + retryable_codes=self._config.retryable_status_codes, ), media_type="text/event-stream", ) - # Non-stream: single POST, return upstream's status + body verbatim, record on the side. + # Non-stream: same retry path as stream (open with stream=True), then aread() the body. 
start = time.time() - try: - async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: - r = await client.post(upstream_url, content=body_bytes, headers=fwd_headers) - except httpx.TimeoutException as exc: - if self._recorder is not None: - await self._recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"timeout: {exc}", - ) - raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") - except httpx.RequestError as exc: - if self._recorder is not None: - await self._recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"{type(exc).__name__}: {exc}", + async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: + try: + resp = await _send_with_retry( + client, + upstream_url, + body_bytes=body_bytes, + headers=fwd_headers, + retryable_codes=self._config.retryable_status_codes, ) - raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") - - response_text = r.text # bytes already read by httpx; .text decodes once + except httpx.TimeoutException as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"timeout: {exc}", + ) + raise HTTPException(status_code=504, detail=f"Upstream timed out: {exc}") + except httpx.RequestError as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + raise HTTPException(status_code=502, detail=f"Upstream request failed: {exc}") + + try: + response_bytes = await resp.aread() + status_code = resp.status_code + content_type = resp.headers.get("content-type", "application/json") + finally: + await resp.aclose() + + 
response_text = response_bytes.decode("utf-8", errors="replace") response_dict: dict | None = None try: parsed = json.loads(response_text) if response_text else None @@ -261,15 +361,14 @@ async def serve( await self._recorder.record( request=request_dict, response=response_dict, - status="success" if r.status_code < 400 else "failure", + status="success" if status_code < 400 else "failure", start_time=start, end_time=time.time(), - error=None if r.status_code < 400 else f"upstream_status={r.status_code}", + error=None if status_code < 400 else f"upstream_status={status_code}", ) # Forward bytes verbatim — preserves any provider-specific fields untouched. - media_type = r.headers.get("content-type", "application/json") - return Response(content=response_text, status_code=r.status_code, media_type=media_type) + return Response(content=response_bytes, status_code=status_code, media_type=content_type) CompletionBackend = ReplayBackend | ForwardBackend diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index caa5fb2254..f47813ac8b 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -247,7 +247,10 @@ def handler(request: httpx.Request) -> httpx.Response: @pytest.mark.asyncio -async def test_forward_502_on_upstream_connection_failure(): +async def test_forward_502_on_upstream_connection_failure(monkeypatch): + """ConnectError → 502. 
Retry disabled here to keep the test fast.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_MAX_ATTEMPTS", 1) + def handler(request: httpx.Request) -> httpx.Response: raise httpx.ConnectError("upstream is down") @@ -263,6 +266,120 @@ def handler(request: httpx.Request) -> httpx.Response: assert r.status_code == 502 +# ---------- Forward path: retry ---------- + + +@pytest.mark.asyncio +async def test_forward_retries_on_retryable_status_then_succeeds(monkeypatch): + """A 429 is retried; the next attempt's 200 is returned to the client.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) + + attempts = [] + + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + if len(attempts) < 3: + return httpx.Response(429, json={"error": "rate limited"}) + return httpx.Response(200, json=_success_response_json(content="finally")) + + app = _build_app(ModelServiceConfig()) # default retryable_status_codes = [429, 500] + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 200 + assert r.json()["choices"][0]["message"]["content"] == "finally" + assert len(attempts) == 3 + + +@pytest.mark.asyncio +async def test_forward_returns_last_response_when_retries_exhausted(monkeypatch): + """All attempts return 429 → the final 429 body+status is forwarded verbatim.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_MAX_ATTEMPTS", 3) + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) + + attempts = [] + + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + return httpx.Response(429, json={"error": "still rate limited"}) + + app = _build_app(ModelServiceConfig()) + with 
_patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 429 + assert r.json() == {"error": "still rate limited"} + assert len(attempts) == 3 + + +@pytest.mark.asyncio +async def test_forward_does_not_retry_non_whitelisted_status(monkeypatch): + """400 is not in retryable_status_codes → forwarded immediately, no retry.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) + + attempts = [] + + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + return httpx.Response(400, json={"error": "bad request"}) + + app = _build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}, + ) + + assert r.status_code == 400 + assert len(attempts) == 1 + + +@pytest.mark.asyncio +async def test_forward_stream_retries_on_retryable_status_then_succeeds(monkeypatch): + """Streaming: 500 on first attempt, then 200 SSE on second — client sees only the 200 body.""" + monkeypatch.setattr("rock.sdk.model.server.api.proxy._RETRY_DELAY_SECONDS", 0.0) + + attempts = [] + sse_body = ( + b'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,' + b'"delta":{"content":"hello"},"finish_reason":null}]}\n\n' + b"data: [DONE]\n\n" + ) + + def handler(request: httpx.Request) -> httpx.Response: + attempts.append(1) + if len(attempts) < 2: + return httpx.Response(500, json={"error": "internal"}) + return httpx.Response(200, content=sse_body, headers={"content-type": "text/event-stream"}) + + app = 
_build_app(ModelServiceConfig()) + with _patch_httpx_with_handler(handler): + transport = ASGITransport(app=app) + async with AsyncClient(transport=transport, base_url="http://test") as ac: + r = await ac.post( + "/v1/chat/completions", + json={"model": "gpt-3.5-turbo", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + ) + + body = r.text + assert "hello" in body + assert "[DONE]" in body + assert "internal" not in body # the 500 attempt is not leaked to client + assert len(attempts) == 2 + + # ---------- Forward path: recording ---------- From 1b20a37f423f7f068b84ed7b82a6d9bf42616362 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 07:41:41 +0000 Subject: [PATCH 10/25] test(model-service): add e2e tests against an in-thread uvicorn mock upstream MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A tiny FastAPI mock app runs in a background thread via uvicorn; the proxy calls it over real TCP through its own httpx.AsyncClient — production code path, no transport injection or patching. Three scenarios: - non-stream forward: vendor field round-trips, recorder writes JSONL - stream forward: SSE chunks reach the client, recorder gets aggregated final completion - record-then-replay: replay phase uses a bogus base_url to prove the upstream isn't called Tests use FastAPI's TestClient (sync) so the test bodies read top-down with no async noise; the async wiring lives inside MockUpstreamServer. Drive-by cleanups in proxy.py: localize the openai SDK imports inside the streaming aggregator (only needed there), and drop the now-unused _RETRY_EXCEPTIONS constant. 
Co-Authored-By: Claude Sonnet 4.6 --- rock/sdk/model/server/api/proxy.py | 12 +- tests/unit/sdk/model/test_proxy_e2e.py | 222 +++++++++++++++++++++++++ 2 files changed, 227 insertions(+), 7 deletions(-) create mode 100644 tests/unit/sdk/model/test_proxy_e2e.py diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 55eec129cf..5fa750e5d2 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -30,8 +30,6 @@ import httpx from fastapi import APIRouter, HTTPException, Request from fastapi.responses import JSONResponse, Response, StreamingResponse -from openai.lib.streaming.chat import ChatCompletionStreamState -from openai.types.chat import ChatCompletionChunk from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig @@ -60,11 +58,6 @@ _RETRY_MAX_ATTEMPTS = 6 _RETRY_DELAY_SECONDS = 2.0 _RETRY_BACKOFF = 2.0 -_RETRY_EXCEPTIONS: tuple[type[Exception], ...] = ( - httpx.TimeoutException, - httpx.ConnectError, - httpx.HTTPStatusError, -) async def _send_with_retry( @@ -172,6 +165,11 @@ async def _forward_stream_and_record( Retry on connection errors and whitelisted statuses happens BEFORE any byte is yielded; mid-stream connection drops are not retried (would corrupt the client transmission).""" + # openai SDK is used purely as a stream-aggregation parser — keep the import + # local so module load doesn't pull it in for callers that never stream. + from openai.lib.streaming.chat import ChatCompletionStreamState + from openai.types.chat import ChatCompletionChunk + state = ChatCompletionStreamState() start = time.time() parse_buffer = b"" diff --git a/tests/unit/sdk/model/test_proxy_e2e.py b/tests/unit/sdk/model/test_proxy_e2e.py new file mode 100644 index 0000000000..c028832b8c --- /dev/null +++ b/tests/unit/sdk/model/test_proxy_e2e.py @@ -0,0 +1,222 @@ +"""End-to-end: real in-process TCP mock upstream + real proxy router + recorder. 
+ +The mock upstream is a tiny FastAPI app served by uvicorn in a background thread +(real TCP). The proxy stays in-process and is hit via FastAPI's ``TestClient``; +its outbound ``httpx.AsyncClient`` makes a real TCP call to the mock — production +code path, no transport injection, no patching. +""" + +from __future__ import annotations + +import asyncio +import json +import threading +import time +from collections.abc import Iterator +from pathlib import Path + +import pytest +import uvicorn +from fastapi import FastAPI, Request +from fastapi.responses import JSONResponse, StreamingResponse +from fastapi.testclient import TestClient + +from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend, proxy_router +from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder +from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor +from rock.utils.system import find_free_port + +# ---------- Mock upstream: a tiny FastAPI app behind a real TCP uvicorn ---------- + + +def _build_mock_upstream() -> FastAPI: + """One stream + one non-stream reply, with a vendor field to verify byte-passthrough.""" + app = FastAPI() + + def completion(model: str) -> dict: + return { + "id": "chatcmpl-mock-1", + "object": "chat.completion", + "created": 0, + "model": model, + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": "Hello, ROCK!", + "reasoning_content": "thinking step-by-step", + }, + "finish_reason": "stop", + } + ], + "usage": {"prompt_tokens": 5, "completion_tokens": 4, "total_tokens": 9}, + } + + async def stream_gen(model: str): + base = {"id": "chatcmpl-mock-1", "object": "chat.completion.chunk", "created": 0, "model": model} + for piece in ["Hello", ", ", "ROCK", "!"]: + chunk = { + **base, + "choices": [{"index": 0, "delta": {"role": "assistant", "content": piece}, "finish_reason": None}], + } + yield f"data: {json.dumps(chunk, 
ensure_ascii=False)}\n\n".encode() + await asyncio.sleep(0.005) + final = {**base, "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]} + yield f"data: {json.dumps(final)}\n\n".encode() + yield b"data: [DONE]\n\n" + + @app.post("/v1/chat/completions") + async def chat_completions(request: Request): + body = await request.json() + model = body.get("model", "mock") + if body.get("stream"): + return StreamingResponse(stream_gen(model), media_type="text/event-stream") + return JSONResponse(status_code=200, content=completion(model)) + + return app + + +class MockUpstreamServer: + """Runs ``_build_mock_upstream()`` behind a real TCP uvicorn in a background thread. + + Use as ``with MockUpstreamServer() as base_url: ...``. ``Server.run()`` + spins up its own asyncio loop inside the thread; we poll ``server.started`` + to know when it's accepting connections. + """ + + def __init__(self) -> None: + port = asyncio.run(find_free_port()) + config = uvicorn.Config( + _build_mock_upstream(), host="127.0.0.1", port=port, log_level="warning", access_log=False + ) + self._server = uvicorn.Server(config) + self._thread = threading.Thread(target=self._server.run, daemon=True) + self.base_url = f"http://127.0.0.1:{port}/v1" + + def __enter__(self) -> str: + self._thread.start() + deadline = time.time() + 5.0 + while not self._server.started: + if time.time() > deadline: + raise RuntimeError("mock upstream did not start within 5s") + time.sleep(0.02) + return self.base_url + + def __exit__(self, *_exc) -> None: + self._server.should_exit = True + self._thread.join(timeout=5) + + +@pytest.fixture +def mock_upstream() -> Iterator[str]: + with MockUpstreamServer() as base_url: + yield base_url + + +# ---------- Proxy app builder + tests ---------- + + +def _build_proxy_app(*, mock_url: str, traj_file: Path | None = None, replay_cursor=None) -> FastAPI: + config = ModelServiceConfig() + config.proxy_base_url = mock_url + + app = FastAPI() + app.state.model_service_config = 
config + if replay_cursor is not None: + app.state.backend = ReplayBackend(replay_cursor) + else: + recorder = TrajectoryRecorder(traj_file=traj_file) if traj_file is not None else None + app.state.backend = ForwardBackend(config, recorder=recorder) + app.include_router(proxy_router) + return app + + +def test_e2e_non_stream_forwards_and_records(mock_upstream, tmp_path): + """Vendor field reaches the client; recorder writes a JSONL line with the full response.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) + + with TestClient(proxy_app) as client: + r = client.post( + "/v1/chat/completions", + json={"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) + + assert r.status_code == 200 + msg = r.json()["choices"][0]["message"] + assert msg["content"] == "Hello, ROCK!" + assert msg["reasoning_content"] == "thinking step-by-step" + + rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is False + assert rec["response"]["choices"][0]["message"]["reasoning_content"] == "thinking step-by-step" + + +def test_e2e_stream_forwards_chunks_and_records_aggregated(mock_upstream, tmp_path): + """Each upstream SSE chunk reaches the client; recorder gets the aggregated final completion.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) + + with TestClient(proxy_app) as client: + with client.stream( + "POST", + "/v1/chat/completions", + json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) as r: + body = b"".join(r.iter_bytes()).decode("utf-8") + + for piece in ["Hello", "ROCK"]: + assert f'"content": "{piece}"' in body + assert '"finish_reason": "stop"' in body + assert body.rstrip().endswith("data: [DONE]") + + rec = 
json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is True + assert rec["response"]["choices"][0]["message"]["content"] == "Hello, ROCK!" + + +def test_e2e_record_then_replay_returns_same_content(mock_upstream, tmp_path): + """Record one non-stream + one stream call, then replay them without touching the upstream.""" + traj_file = tmp_path / "traj.jsonl" + + # ---- record phase ---- + proxy_record = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) + with TestClient(proxy_record) as client: + r1 = client.post( + "/v1/chat/completions", + json={"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]}, + ) + with client.stream( + "POST", + "/v1/chat/completions", + json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + ) as st: + for _ in st.iter_bytes(): + pass + recorded = r1.json() + + # ---- replay phase: bogus base_url proves the upstream isn't called ---- + cursor = SequentialCursor.load(traj_file) + proxy_replay = _build_proxy_app(mock_url="http://invalid.local:1/v1", replay_cursor=cursor) + with TestClient(proxy_replay) as client: + ns2 = client.post( + "/v1/chat/completions", + json={"model": "mock-model", "messages": [{"role": "user", "content": "different"}]}, + ) + with client.stream( + "POST", + "/v1/chat/completions", + json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "different"}]}, + ) as st: + st_body = b"".join(st.iter_bytes()).decode("utf-8") + + assert ns2.status_code == 200 + assert ns2.json()["choices"][0]["message"]["content"] == recorded["choices"][0]["message"]["content"] + assert "Hello, ROCK!" 
in st_body + assert st_body.rstrip().endswith("data: [DONE]") From 6a36702e5ecc0f04cc3781acd6f810ce679ab23b Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:07:44 +0000 Subject: [PATCH 11/25] fix(model-service): inject positional index into replay-stream tool_calls deltas A recorded non-stream message.tool_calls carries no 'index' field, but the OpenAI streaming spec requires it on chunk deltas. Without it, downstream clients using the openai SDK reject the replay-stream chunk with a pydantic ValidationError ('Field required: index'). completion_to_chunk_dict now injects a positional index when missing (existing 'index' is preserved). Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/sse_utils.py | 7 ++++ tests/unit/sdk/model/test_sse_utils.py | 55 +++++++++++++++++++++++++- 2 files changed, 60 insertions(+), 2 deletions(-) diff --git a/rock/sdk/model/server/sse_utils.py b/rock/sdk/model/server/sse_utils.py index 1cbe27298f..f1cca034e6 100644 --- a/rock/sdk/model/server/sse_utils.py +++ b/rock/sdk/model/server/sse_utils.py @@ -65,11 +65,18 @@ def completion_to_chunk_dict(response: dict, *, model: str) -> dict: Only ``message`` → ``delta`` is renamed; every other field (including provider-specific extras like ``reasoning_content`` inside the message) flows through unchanged. ``id`` / ``created`` are synthesized when missing. + + ``tool_calls`` items get a positional ``index`` injected if missing — the + OpenAI streaming spec requires it on chunk deltas (a recorded non-stream + ``message.tool_calls`` carries no ``index``, but downstream stream parsers + e.g. the openai SDK will reject the chunk without one). 
""" choices_in = response.get("choices") or [] choices_out = [] for choice in choices_in: delta = dict(choice.get("message") or {}) + if "tool_calls" in delta and delta["tool_calls"]: + delta["tool_calls"] = [{"index": tc.get("index", i), **tc} for i, tc in enumerate(delta["tool_calls"])] choices_out.append( { "index": choice.get("index", 0), diff --git a/tests/unit/sdk/model/test_sse_utils.py b/tests/unit/sdk/model/test_sse_utils.py index 6c9318a510..f660d2751b 100644 --- a/tests/unit/sdk/model/test_sse_utils.py +++ b/tests/unit/sdk/model/test_sse_utils.py @@ -101,7 +101,8 @@ def test_completion_to_chunk_renames_message_to_delta(): def test_completion_to_chunk_preserves_provider_specific_message_fields(): - """reasoning_content / tool_calls / etc inside message are kept verbatim in delta.""" + """reasoning_content kept verbatim; tool_calls get a positional index injected + (required by the OpenAI streaming spec — see test below).""" response = { "choices": [ { @@ -119,10 +120,60 @@ def test_completion_to_chunk_preserves_provider_specific_message_fields(): chunk = completion_to_chunk_dict(response, model="glm-5") assert chunk["choices"][0]["delta"]["reasoning_content"] == "step-by-step thinking" - assert chunk["choices"][0]["delta"]["tool_calls"] == [{"id": "t1", "type": "function"}] + assert chunk["choices"][0]["delta"]["tool_calls"] == [{"index": 0, "id": "t1", "type": "function"}] assert chunk["choices"][0]["finish_reason"] == "tool_calls" +def test_completion_to_chunk_injects_tool_call_index_for_openai_sdk_compat(): + """A recorded non-stream message has tool_calls without 'index'; the OpenAI + streaming spec requires it on chunk deltas, and the openai SDK's + ChatCompletionChunk.model_validate() rejects the chunk otherwise. 
We inject + a positional index so replay-stream output is parseable by strict clients.""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "tool_calls": [ + {"id": "a", "type": "function", "function": {"name": "f1", "arguments": "{}"}}, + {"id": "b", "type": "function", "function": {"name": "f2", "arguments": "{}"}}, + ], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="m") + tcs = chunk["choices"][0]["delta"]["tool_calls"] + assert [tc["index"] for tc in tcs] == [0, 1] + + # End-to-end: openai SDK accepts the chunk + from openai.types.chat import ChatCompletionChunk + + ChatCompletionChunk.model_validate(chunk) # must not raise + + +def test_completion_to_chunk_preserves_explicit_tool_call_index(): + """If the recorded tool_calls already have 'index', we don't overwrite it.""" + response = { + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "tool_calls": [ + {"index": 5, "id": "a", "type": "function", "function": {"name": "f", "arguments": "{}"}}, + ], + }, + "finish_reason": "tool_calls", + } + ], + } + chunk = completion_to_chunk_dict(response, model="m") + assert chunk["choices"][0]["delta"]["tool_calls"][0]["index"] == 5 + + def test_completion_to_chunk_synthesizes_id_and_created_when_missing(): chunk = completion_to_chunk_dict( {"choices": [{"index": 0, "message": {"role": "assistant"}, "finish_reason": "stop"}]}, From 79c868e37519249b054728ae114d7db79ecc9153 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:10:16 +0000 Subject: [PATCH 12/25] test(model-service): refactor proxy e2e into MockUpstream + TestProxyRecordReplay MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename test_proxy_e2e.py → test_proxy_record_replay.py to make the file purpose explicit (the suite revolves around the record→replay capability). 
Refactor the test surface: - MockUpstream class encapsulates the FastAPI app, server lifecycle, the canonical reply, and an assert_message() helper. Test data and the handler stay in sync because they share the same constants. - TestProxyRecordReplay class groups the three tests with shorter names: test_forward_non_stream test_forward_stream test_replay (parametrized over record × replay stream/non-stream) - _call_chat_completions helper unifies stream/non-stream call sites. Expand coverage to 2 parallel tool_calls (get_weather + get_time) — exercises the openai SDK aggregator's multi-index tool_call assembly. Co-Authored-By: Claude Opus 4.7 --- tests/unit/sdk/model/test_proxy_e2e.py | 222 ------------ .../sdk/model/test_proxy_record_replay.py | 333 ++++++++++++++++++ 2 files changed, 333 insertions(+), 222 deletions(-) delete mode 100644 tests/unit/sdk/model/test_proxy_e2e.py create mode 100644 tests/unit/sdk/model/test_proxy_record_replay.py diff --git a/tests/unit/sdk/model/test_proxy_e2e.py b/tests/unit/sdk/model/test_proxy_e2e.py deleted file mode 100644 index c028832b8c..0000000000 --- a/tests/unit/sdk/model/test_proxy_e2e.py +++ /dev/null @@ -1,222 +0,0 @@ -"""End-to-end: real in-process TCP mock upstream + real proxy router + recorder. - -The mock upstream is a tiny FastAPI app served by uvicorn in a background thread -(real TCP). The proxy stays in-process and is hit via FastAPI's ``TestClient``; -its outbound ``httpx.AsyncClient`` makes a real TCP call to the mock — production -code path, no transport injection, no patching. 
-""" - -from __future__ import annotations - -import asyncio -import json -import threading -import time -from collections.abc import Iterator -from pathlib import Path - -import pytest -import uvicorn -from fastapi import FastAPI, Request -from fastapi.responses import JSONResponse, StreamingResponse -from fastapi.testclient import TestClient - -from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend, proxy_router -from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder -from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor -from rock.utils.system import find_free_port - -# ---------- Mock upstream: a tiny FastAPI app behind a real TCP uvicorn ---------- - - -def _build_mock_upstream() -> FastAPI: - """One stream + one non-stream reply, with a vendor field to verify byte-passthrough.""" - app = FastAPI() - - def completion(model: str) -> dict: - return { - "id": "chatcmpl-mock-1", - "object": "chat.completion", - "created": 0, - "model": model, - "choices": [ - { - "index": 0, - "message": { - "role": "assistant", - "content": "Hello, ROCK!", - "reasoning_content": "thinking step-by-step", - }, - "finish_reason": "stop", - } - ], - "usage": {"prompt_tokens": 5, "completion_tokens": 4, "total_tokens": 9}, - } - - async def stream_gen(model: str): - base = {"id": "chatcmpl-mock-1", "object": "chat.completion.chunk", "created": 0, "model": model} - for piece in ["Hello", ", ", "ROCK", "!"]: - chunk = { - **base, - "choices": [{"index": 0, "delta": {"role": "assistant", "content": piece}, "finish_reason": None}], - } - yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n".encode() - await asyncio.sleep(0.005) - final = {**base, "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]} - yield f"data: {json.dumps(final)}\n\n".encode() - yield b"data: [DONE]\n\n" - - @app.post("/v1/chat/completions") - async def 
chat_completions(request: Request): - body = await request.json() - model = body.get("model", "mock") - if body.get("stream"): - return StreamingResponse(stream_gen(model), media_type="text/event-stream") - return JSONResponse(status_code=200, content=completion(model)) - - return app - - -class MockUpstreamServer: - """Runs ``_build_mock_upstream()`` behind a real TCP uvicorn in a background thread. - - Use as ``with MockUpstreamServer() as base_url: ...``. ``Server.run()`` - spins up its own asyncio loop inside the thread; we poll ``server.started`` - to know when it's accepting connections. - """ - - def __init__(self) -> None: - port = asyncio.run(find_free_port()) - config = uvicorn.Config( - _build_mock_upstream(), host="127.0.0.1", port=port, log_level="warning", access_log=False - ) - self._server = uvicorn.Server(config) - self._thread = threading.Thread(target=self._server.run, daemon=True) - self.base_url = f"http://127.0.0.1:{port}/v1" - - def __enter__(self) -> str: - self._thread.start() - deadline = time.time() + 5.0 - while not self._server.started: - if time.time() > deadline: - raise RuntimeError("mock upstream did not start within 5s") - time.sleep(0.02) - return self.base_url - - def __exit__(self, *_exc) -> None: - self._server.should_exit = True - self._thread.join(timeout=5) - - -@pytest.fixture -def mock_upstream() -> Iterator[str]: - with MockUpstreamServer() as base_url: - yield base_url - - -# ---------- Proxy app builder + tests ---------- - - -def _build_proxy_app(*, mock_url: str, traj_file: Path | None = None, replay_cursor=None) -> FastAPI: - config = ModelServiceConfig() - config.proxy_base_url = mock_url - - app = FastAPI() - app.state.model_service_config = config - if replay_cursor is not None: - app.state.backend = ReplayBackend(replay_cursor) - else: - recorder = TrajectoryRecorder(traj_file=traj_file) if traj_file is not None else None - app.state.backend = ForwardBackend(config, recorder=recorder) - 
app.include_router(proxy_router) - return app - - -def test_e2e_non_stream_forwards_and_records(mock_upstream, tmp_path): - """Vendor field reaches the client; recorder writes a JSONL line with the full response.""" - traj_file = tmp_path / "traj.jsonl" - proxy_app = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) - - with TestClient(proxy_app) as client: - r = client.post( - "/v1/chat/completions", - json={"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]}, - headers={"Authorization": "Bearer test-key"}, - ) - - assert r.status_code == 200 - msg = r.json()["choices"][0]["message"] - assert msg["content"] == "Hello, ROCK!" - assert msg["reasoning_content"] == "thinking step-by-step" - - rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) - assert rec["status"] == "success" - assert rec["stream"] is False - assert rec["response"]["choices"][0]["message"]["reasoning_content"] == "thinking step-by-step" - - -def test_e2e_stream_forwards_chunks_and_records_aggregated(mock_upstream, tmp_path): - """Each upstream SSE chunk reaches the client; recorder gets the aggregated final completion.""" - traj_file = tmp_path / "traj.jsonl" - proxy_app = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) - - with TestClient(proxy_app) as client: - with client.stream( - "POST", - "/v1/chat/completions", - json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, - headers={"Authorization": "Bearer test-key"}, - ) as r: - body = b"".join(r.iter_bytes()).decode("utf-8") - - for piece in ["Hello", "ROCK"]: - assert f'"content": "{piece}"' in body - assert '"finish_reason": "stop"' in body - assert body.rstrip().endswith("data: [DONE]") - - rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) - assert rec["status"] == "success" - assert rec["stream"] is True - assert rec["response"]["choices"][0]["message"]["content"] == "Hello, ROCK!" 
- - -def test_e2e_record_then_replay_returns_same_content(mock_upstream, tmp_path): - """Record one non-stream + one stream call, then replay them without touching the upstream.""" - traj_file = tmp_path / "traj.jsonl" - - # ---- record phase ---- - proxy_record = _build_proxy_app(mock_url=mock_upstream, traj_file=traj_file) - with TestClient(proxy_record) as client: - r1 = client.post( - "/v1/chat/completions", - json={"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]}, - ) - with client.stream( - "POST", - "/v1/chat/completions", - json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, - ) as st: - for _ in st.iter_bytes(): - pass - recorded = r1.json() - - # ---- replay phase: bogus base_url proves the upstream isn't called ---- - cursor = SequentialCursor.load(traj_file) - proxy_replay = _build_proxy_app(mock_url="http://invalid.local:1/v1", replay_cursor=cursor) - with TestClient(proxy_replay) as client: - ns2 = client.post( - "/v1/chat/completions", - json={"model": "mock-model", "messages": [{"role": "user", "content": "different"}]}, - ) - with client.stream( - "POST", - "/v1/chat/completions", - json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "different"}]}, - ) as st: - st_body = b"".join(st.iter_bytes()).decode("utf-8") - - assert ns2.status_code == 200 - assert ns2.json()["choices"][0]["message"]["content"] == recorded["choices"][0]["message"]["content"] - assert "Hello, ROCK!" in st_body - assert st_body.rstrip().endswith("data: [DONE]") diff --git a/tests/unit/sdk/model/test_proxy_record_replay.py b/tests/unit/sdk/model/test_proxy_record_replay.py new file mode 100644 index 0000000000..1bb9832784 --- /dev/null +++ b/tests/unit/sdk/model/test_proxy_record_replay.py @@ -0,0 +1,333 @@ +"""End-to-end: real in-process TCP mock upstream + real proxy router + recorder. 
+ +The mock upstream is a tiny FastAPI app served by uvicorn in a background thread +(real TCP). The proxy stays in-process and is hit via FastAPI's ``TestClient``; +its outbound ``httpx.AsyncClient`` makes a real TCP call to the mock — production +code path, no transport injection, no patching. +""" + +from __future__ import annotations + +import asyncio +import json +import threading +import time +from collections.abc import Iterator +from pathlib import Path + +import pytest +import uvicorn +from fastapi import FastAPI, Request +from fastapi.responses import JSONResponse, StreamingResponse +from fastapi.testclient import TestClient + +from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend, proxy_router +from rock.sdk.model.server.config import ModelServiceConfig +from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder +from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor +from rock.sdk.model.server.sse_utils import parse_sse_data_chunks +from rock.utils.system import find_free_port + +# --------------------------------------------------------------------------- +# Mock upstream — a tiny FastAPI app behind a real TCP uvicorn in a thread. +# Owns both the canned reply AND the assertion helper, so the response shape +# and the expectations stay in sync if either is edited. +# --------------------------------------------------------------------------- + + +class MockUpstream: + """Mock OpenAI-compatible upstream. + + Single canonical reply (returned for both stream and non-stream requests) + contains three fields the proxy must preserve end-to-end: + - ``content`` (plain text) + - ``reasoning_content`` (vendor-specific thinking) + - ``tool_calls`` (a function call) + The streaming variant splits each field into multiple deltas so the + recorder also exercises the openai SDK's stream-state aggregator. + + Use as ``with MockUpstream() as mock: ...``; ``mock.base_url`` points at + the running server. 
``mock.assert_message(msg)`` checks any received + assistant message matches the canonical reply. + """ + + # Canonical reply values — change here, both the handler and the assertion + # helper pick them up automatically. Two parallel tool_calls cover the + # multi-tool-call case (modern LLMs commonly emit several at once). + EXPECTED_CONTENT = "Checking weather and time for you." + EXPECTED_REASONING = "User wants weather + time; calling both tools in parallel." + EXPECTED_TOOL_CALLS = [ + { + "id": "call_weather", + "type": "function", + "function": {"name": "get_weather", "arguments": '{"city":"Tokyo","unit":"celsius"}'}, + }, + { + "id": "call_time", + "type": "function", + "function": {"name": "get_time", "arguments": '{"city":"Tokyo"}'}, + }, + ] + + def __init__(self) -> None: + port = asyncio.run(find_free_port()) + config = uvicorn.Config(self._build_app(), host="127.0.0.1", port=port, log_level="warning", access_log=False) + self._server = uvicorn.Server(config) + self._thread = threading.Thread(target=self._server.run, daemon=True) + self.base_url = f"http://127.0.0.1:{port}/v1" + + # ---- lifecycle ---- + + def __enter__(self) -> MockUpstream: + self._thread.start() + deadline = time.time() + 5.0 + while not self._server.started: + if time.time() > deadline: + raise RuntimeError("mock upstream did not start within 5s") + time.sleep(0.02) + return self + + def __exit__(self, *_exc) -> None: + self._server.should_exit = True + self._thread.join(timeout=5) + + # ---- assertion helper ---- + + def assert_message(self, msg: dict) -> None: + """Assert ``msg`` is the canonical full message (content + reasoning + 2 parallel tool_calls).""" + assert msg["content"] == self.EXPECTED_CONTENT + assert msg["reasoning_content"] == self.EXPECTED_REASONING + tcs = msg["tool_calls"] + assert len(tcs) == len(self.EXPECTED_TOOL_CALLS) + for actual, expected in zip(tcs, self.EXPECTED_TOOL_CALLS, strict=True): + assert actual["id"] == expected["id"] + assert actual["type"] 
== expected["type"] + assert actual["function"]["name"] == expected["function"]["name"] + assert json.loads(actual["function"]["arguments"]) == json.loads(expected["function"]["arguments"]) + + # ---- internal: FastAPI app + handlers ---- + + def _build_app(self) -> FastAPI: + app = FastAPI() + + @app.post("/v1/chat/completions") + async def chat_completions(request: Request): + body = await request.json() + model = body.get("model", "mock") + if body.get("stream"): + return StreamingResponse(self._stream_gen(model), media_type="text/event-stream") + return JSONResponse(status_code=200, content=self._completion_json(model)) + + return app + + def _completion_json(self, model: str) -> dict: + return { + "id": "chatcmpl-mock-1", + "object": "chat.completion", + "created": 0, + "model": model, + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": self.EXPECTED_CONTENT, + "reasoning_content": self.EXPECTED_REASONING, + "tool_calls": self.EXPECTED_TOOL_CALLS, + }, + "finish_reason": "tool_calls", + } + ], + "usage": {"prompt_tokens": 12, "completion_tokens": 24, "total_tokens": 36}, + } + + async def _stream_gen(self, model: str): + base = {"id": "chatcmpl-mock-1", "object": "chat.completion.chunk", "created": 0, "model": model} + + def emit(delta: dict, finish_reason=None) -> bytes: + payload = {**base, "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]} + return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode() + + # 1-2. Reasoning split in two deltas + yield emit({"role": "assistant", "reasoning_content": "User wants weather + time; "}) + await asyncio.sleep(0.005) + yield emit({"reasoning_content": "calling both tools in parallel."}) + await asyncio.sleep(0.005) + # 3-4. Content split in two deltas + yield emit({"content": "Checking weather"}) + await asyncio.sleep(0.005) + yield emit({"content": " and time for you."}) + await asyncio.sleep(0.005) + + # 5-7. 
tool_call[0] (get_weather): announce, then arguments in two pieces + yield emit( + { + "tool_calls": [ + { + "index": 0, + "id": "call_weather", + "type": "function", + "function": {"name": "get_weather", "arguments": ""}, + } + ] + } + ) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 0, "function": {"arguments": '{"city":"Tokyo",'}}]}) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 0, "function": {"arguments": '"unit":"celsius"}'}}]}) + await asyncio.sleep(0.005) + + # 8-9. tool_call[1] (get_time): announce + arguments in one piece + yield emit( + { + "tool_calls": [ + { + "index": 1, + "id": "call_time", + "type": "function", + "function": {"name": "get_time", "arguments": ""}, + } + ] + } + ) + await asyncio.sleep(0.005) + yield emit({"tool_calls": [{"index": 1, "function": {"arguments": '{"city":"Tokyo"}'}}]}) + await asyncio.sleep(0.005) + + # 10. Finish + yield emit({}, finish_reason="tool_calls") + yield b"data: [DONE]\n\n" + + +@pytest.fixture +def mock_upstream() -> Iterator[MockUpstream]: + with MockUpstream() as m: + yield m + + +# --------------------------------------------------------------------------- +# Proxy app builder + request helper (module-level, generic) +# --------------------------------------------------------------------------- + + +def _build_proxy_app(*, mock_url: str | None = None, traj_file: Path | None = None, replay_cursor=None) -> FastAPI: + config = ModelServiceConfig() + # ReplayBackend never calls upstream, so mock_url is only relevant for forward mode. 
+ if mock_url is not None: + config.proxy_base_url = mock_url + + app = FastAPI() + app.state.model_service_config = config + if replay_cursor is not None: + app.state.backend = ReplayBackend(replay_cursor) + else: + recorder = TrajectoryRecorder(traj_file=traj_file) if traj_file is not None else None + app.state.backend = ForwardBackend(config, recorder=recorder) + app.include_router(proxy_router) + return app + + +def _call_chat_completions(client: TestClient, *, stream: bool) -> dict: + """One chat.completions call. Returns the assistant message dict. + + - non-stream: just unwraps ``choices[0].message``. + - stream: replay always emits exactly one chunk + ``[DONE]`` (see + ``completion_to_chunk_dict``), so the chunk's ``delta`` IS the full + message — no aggregation needed. + """ + payload = {"model": "mock-model", "messages": [{"role": "user", "content": "hi"}]} + if not stream: + r = client.post("/v1/chat/completions", json=payload) + assert r.status_code == 200 + return r.json()["choices"][0]["message"] + + with client.stream("POST", "/v1/chat/completions", json={**payload, "stream": True}) as r: + assert r.status_code == 200 + body_bytes = b"".join(r.iter_bytes()) + chunks, _ = parse_sse_data_chunks(body_bytes) + return chunks[0]["choices"][0]["delta"] + + +# --------------------------------------------------------------------------- +# Tests +# --------------------------------------------------------------------------- + + +class TestProxyRecordReplay: + """End-to-end: real TCP mock upstream <-> real proxy router + recorder/replayer.""" + + def test_forward_non_stream(self, mock_upstream: MockUpstream, tmp_path): + """Vendor field reaches the client; recorder writes a JSONL line with the full response.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + + with TestClient(proxy_app) as client: + r = client.post( + "/v1/chat/completions", + json={"model": "mock-model", "messages": 
[{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) + + assert r.status_code == 200 + body = r.json() + assert body["choices"][0]["finish_reason"] == "tool_calls" + mock_upstream.assert_message(body["choices"][0]["message"]) + + rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is False + assert rec["response"]["choices"][0]["finish_reason"] == "tool_calls" + mock_upstream.assert_message(rec["response"]["choices"][0]["message"]) + + def test_forward_stream(self, mock_upstream: MockUpstream, tmp_path): + """Each upstream SSE chunk reaches the client; recorder gets the aggregated final completion + with reasoning_content concatenated and tool_calls.arguments assembled from deltas.""" + traj_file = tmp_path / "traj.jsonl" + proxy_app = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + + with TestClient(proxy_app) as client: + with client.stream( + "POST", + "/v1/chat/completions", + json={"model": "mock-model", "stream": True, "messages": [{"role": "user", "content": "hi"}]}, + headers={"Authorization": "Bearer test-key"}, + ) as r: + body = b"".join(r.iter_bytes()).decode("utf-8") + + # Raw chunks make it to the client untouched + assert '"reasoning_content": "User wants weather + time; "' in body + assert '"reasoning_content": "calling both tools in parallel."' in body + assert '"content": "Checking weather"' in body + assert '"content": " and time for you."' in body + assert '"name": "get_weather"' in body + assert '"name": "get_time"' in body + assert '"finish_reason": "tool_calls"' in body + assert body.rstrip().endswith("data: [DONE]") + + # Recorder's aggregated message matches the canonical reply + rec = json.loads(traj_file.read_text(encoding="utf-8").strip()) + assert rec["status"] == "success" + assert rec["stream"] is True + assert rec["response"]["choices"][0]["finish_reason"] == "tool_calls" + 
mock_upstream.assert_message(rec["response"]["choices"][0]["message"]) + + @pytest.mark.parametrize("replay_stream", [False, True], ids=["replay_nonstream", "replay_stream"]) + @pytest.mark.parametrize("record_stream", [False, True], ids=["record_nonstream", "record_stream"]) + def test_replay(self, mock_upstream: MockUpstream, tmp_path, record_stream: bool, replay_stream: bool): + """Recorded mode and replayed mode are orthogonal — all 4 combinations of + (stream/non-stream) on each side must yield the same full message.""" + traj_file = tmp_path / "traj.jsonl" + + # ---- record phase ---- + proxy_record = _build_proxy_app(mock_url=mock_upstream.base_url, traj_file=traj_file) + with TestClient(proxy_record) as client: + _call_chat_completions(client, stream=record_stream) + + # ---- replay phase: no upstream URL needed — ReplayBackend never calls upstream ---- + cursor = SequentialCursor.load(traj_file) + proxy_replay = _build_proxy_app(replay_cursor=cursor) + with TestClient(proxy_replay) as client: + msg = _call_chat_completions(client, stream=replay_stream) + + mock_upstream.assert_message(msg) From c7bbd5a458adae985792d03ba5935e33a454218d Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:12:57 +0000 Subject: [PATCH 13/25] chore: remove useless comment in pyproject.toml --- pyproject.toml | 5 ----- 1 file changed, 5 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index ac87c14c41..d7d7a591b0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -86,11 +86,6 @@ model-service = [ "psutil", "swebench", "alibabacloud_cr20181201==2.0.5", - # openai SDK is used as a TYPE/parser library only — for ChatCompletionChunk - # validation and ChatCompletionStreamState (the official stream chunk aggregator). - # We do NOT use AsyncOpenAI as an HTTP client; transport is plain httpx so the - # proxy can forward upstream bytes verbatim, including any provider-specific - # fields (reasoning_content, citations, ...) 
without re-encoding OpenAI protocol. "openai>=1.50.0", "httpx", ] From 56a644cd04a01675b09656e5b800992ed7a807f4 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:14:33 +0000 Subject: [PATCH 14/25] chore: remove useless dev docs --- docs/dev/litellm_proxy_refactor.md | 535 ----------------------------- 1 file changed, 535 deletions(-) delete mode 100644 docs/dev/litellm_proxy_refactor.md diff --git a/docs/dev/litellm_proxy_refactor.md b/docs/dev/litellm_proxy_refactor.md deleted file mode 100644 index be30dbb4b2..0000000000 --- a/docs/dev/litellm_proxy_refactor.md +++ /dev/null @@ -1,535 +0,0 @@ -# Refactoring the model-service proxy onto LiteLLM + adding record/replay: handoff document - -> This document is for whoever takes over (possibly another Claude session, or a person). Its purpose is to let them pick up where I left off **without reading the previous conversation at all**. It lives at `docs/dev/litellm_proxy_refactor.md`. - ---- - -## 0. TL;DR - -**Goal**: replace the hand-written httpx forward + retry in `rock model-service --type proxy` with an implementation based on the `litellm` SDK; at the same time, build "record + sequential replay" of chat/completions trajectories in as a first-class capability, to support LLM-cost-free debugging of deterministic agents such as SWE-agent / mini-swe-agent / OpenHands. - -**Current status**: **code changes, unit tests, and lint are all done and passing**. Next steps are integration verification (actually start the proxy + curl) and writing the PR. - -**Completed**: -- ✅ `pyproject.toml`: `model-service` extras gain `litellm>=1.50.0` -- ✅ `ModelServiceConfig` gains 6 fields: `traj_enabled / traj_file / traj_append / replay_enabled / replay_traj_path / num_retries` -- ✅ New module `rock/sdk/model/server/integrations/{__init__.py, traj_recorder.py, traj_replayer.py}` -- ✅ `rock/sdk/model/server/api/proxy.py` rewritten in full to call the litellm SDK -- ✅ `rock/sdk/model/server/main.py` gains `_configure_litellm_for_proxy()` + new CLI flags (`--num-retries / --traj-file / --no-traj / --replay-traj`) -- ✅ `rock/sdk/model/server/utils.py` keeps the `record_traj` decorator (still used by local mode); proxy mode no longer uses it -- ✅ `tests/unit/sdk/model/test_proxy.py` reworked (`patch perform_llm_request` changed to `patch litellm.acompletion`) -- ✅ New tests `tests/unit/sdk/model/test_traj_recorder.py` + `test_traj_replayer.py` -- ✅ `examples/model_service/config_record.yaml` + `config_replay.yaml` -- ✅ 
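**All unit tests pass** (`uv run pytest tests/unit/sdk/model/` → 47 passed) -- ✅ **Lint/format fully clean** (`ruff check` + `ruff format --check`; fixed one UP045, `Optional[str]` → `str | None`) - -**Not done / blocked**: -- ⏳ **Integration verification** (actually start the proxy + curl + agent end to end, see section 4.4) -- ⏳ **Breaking-change notice in the PR description** (see section 5) - -**Original plan file** (more detailed design reasoning): `/home/xinshi/.claude/plans/litellm-chat-completions-traj-replay-ser-lucky-rainbow.md` (in the main Claude config directory, not inside the rock repo). - ---- - -## 1. Background and goals - -### Origin - -The user asked: "Can litellm persist chat/completions trajectories to disk? I would then like to see whether a replay server can be built from the traj file, for example so that other agents (swe-agent, openhands) can use it for traj replay." - -### Iterations on the direction of the requirement (so the successor does not retrace detours) - -1. **First direction**: build a standalone Python project `litellm-traj` that defines a `CustomLogger` subclass (record) and a `CustomLLM` subclass (replay), registered into the litellm proxy's `config.yaml` via dotted path. **Abandoned**. -2. **Second direction**: build this capability inside the rock repo, in `rock/sdk/model/server/api/proxy.py` (rock already has a model-service). The user then went further and asked to **refactor away rock's hand-written proxy implementation and base it on litellm**. -3. **Final direction (this work)**: use the **litellm SDK** to replace the hand-written httpx forward + `retry_async` inside `proxy.py`; record hooks into `CustomLogger`, replay hooks into a `CustomLLM` provider. The `rock model-service` CLI, `local` mode, and the FastAPI app/health/metrics are all left untouched; only proxy mode changes. - -### Why the litellm SDK rather than the litellm proxy - -We already have rock's own FastAPI app + CLI + auth/metrics middleware, and only need "OpenAI-compatible upstream calls + error normalization + stream aggregation + record/replay hook points". **The litellm SDK is the smallest add-on that provides this layer**, with no need to drag in the litellm proxy's whole lifecycle/config system. The litellm proxy suits people who have no server at all; we already have one. - -### The 4 key design choices the user finally settled - -| Dimension | Choice | Rationale | -|---|---|---| -| Integration mode | **litellm SDK** | Smallest change surface; keeps rock's existing FastAPI/CLI/metrics | -| traj schema | **`StandardLoggingPayload` (litellm native)** | Most complete fields (messages/response/usage/timing/error_information/trace_id); interoperable with the litellm ecosystem | -| Do replay in this iteration | **Yes, record + replay together** | Replay was the user's original request; lay the infrastructure once | -| Streaming | **Unblocked along the way** | litellm aggregates automatically; streaming adds no complexity to record/replay | - ---- - -## 2. 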
Change list (by file) - -### 2.1 `pyproject.toml`: modified - -Append one entry, `"litellm>=1.50.0"`, to the `model-service` array under `[project.optional-dependencies]`. Other extras are unchanged. - -```toml -model-service = [ - "fastapi", - "uvicorn", - "psutil", - "swebench", - "alibabacloud_cr20181201==2.0.5", - "litellm>=1.50.0", # ← newly added line -] -``` - -Why `>=1.50.0`: from this version on, `StandardLoggingPayload`, the `CustomLogger.async_log_success_event` interface, and `async_mock_completion_streaming_obj` are all stable. The repo's existing model-service test suite never installed litellm, so this is a fresh introduction with no upgrade conflicts. - -### 2.2 `rock/sdk/model/server/config.py`: modified - -Add 6 new fields at the end of `ModelServiceConfig` (mind the order, types, and defaults): - -```python -num_retries: int = Field(default=6) - -traj_enabled: bool = Field(default=True) -traj_file: str | None = Field(default=None) -traj_append: bool = Field(default=True) # Note: the old default was False (overwrite); flipped to True here - -replay_enabled: bool = Field(default=False) -replay_traj_path: str | None = Field(default=None) -``` - -The semantics and value range of each field are written in the docstring. `traj_append=True` is the **default-behavior change** in this work (the old `_write_traj` overwrote by default, which is considered a bug). The module-level constants `TRAJ_FILE`, `LOG_FILE`, and `LOG_DIR` are kept as-is. - -### 2.3 `rock/sdk/model/server/integrations/__init__.py`: new (empty file) - -Exists only to make `integrations` a package; the content is empty. - -### 2.4 `rock/sdk/model/server/integrations/traj_recorder.py`: new - -`TrajectoryRecorder(CustomLogger)` implements two hooks: `async_log_success_event` and `async_log_failure_event`. On each call it takes the `StandardLoggingPayload` (in dict form) from `kwargs["standard_logging_object"]`, appends one line of JSON to `traj_file`, and reports the OTLP `model_service.request.{rt,count}` metrics. - -Key design points (expanded in section 3.1): -- No branching for streaming (litellm has already aggregated the chunks into `payload.response` before the callback fires) -- One `asyncio.Lock` per recorder + `asyncio.to_thread` around the synchronous write, to avoid blocking the event loop -- `append=False` mode truncates only on the **first write** (avoids overwriting on every call) -- Metrics reuse `rock.sdk.model.server.utils._get_or_create_metrics_monitor` and the `MODEL_SERVICE_REQUEST_RT/COUNT` constants - -### 2.5 `rock/sdk/model/server/integrations/traj_replayer.py`: new - -Contains two classes + two helpers: - -- `SequentialCursor`: loads records from a jsonl file or directory; `async next()` returns the next record and advances the cursor; going past the end raises 
`CustomLLMError(404)`. An `asyncio.Lock` guards against concurrent advancement. `reset()` returns to the start. -- `_record_to_model_response(record)` / `_extract_assistant_text(record)`: turn a record back into a `litellm.types.utils.ModelResponse`, or extract the assistant text (used for streaming). -- `TrajectoryReplayer(CustomLLM)`: implements `acompletion` and `astreaming`. Stream splitting calls `litellm.utils.async_mock_completion_streaming_obj` directly rather than reinventing the wheel. - -The signature of `acompletion`/`astreaming` is `(self, model, messages, *args, **kwargs)`. litellm calls a CustomLLM **with keyword arguments only** (verified at litellm/main.py:4302-4319), so `kwargs.get("model_response")` reliably yields the target object that stream splitting needs. - -### 2.6 `rock/sdk/model/server/utils.py`: modified (kept + comments updated) - -**Key decision**: do not delete `record_traj` / `_write_traj`. Reason: `local.py` still uses `@record_traj`, and the plan stage stated that local mode stays untouched. So `record_traj` is kept, with a docstring paragraph explaining that proxy no longer uses it and that it is for local only; new guidance points to `TrajectoryRecorder`. - -`_get_or_create_metrics_monitor` / `MODEL_SERVICE_REQUEST_RT` / `MODEL_SERVICE_REQUEST_COUNT` are unchanged; `traj_recorder.py` reuses them. - -### 2.7 `rock/sdk/model/server/api/proxy.py`: rewritten in full - -Old implementation: -- Global `httpx.AsyncClient` + `@retry_async` with 6 attempts of exponential backoff -- `perform_llm_request(url, body, headers, config)` managing its own retry -- `@record_traj` on the handler, writing to disk synchronously + metrics -- Forced `stream=False` (MVP limitation) - -New implementation: -- `litellm.acompletion(model, api_base, extra_headers, timeout, num_retries, **body)` -- Error normalization: catch `RateLimitError / APIError / BadRequestError / AuthenticationError / Timeout` → `_format_error_response()` falls back to the `{error:{message,type,code}}` schema (compatible with keyword detection on the agent side) -- Streaming enabled: `stream=True` goes through `StreamingResponse(_sse_iter(...))` -- No more decorator: writing records to disk is now done by the `litellm.callbacks` that `main.py` attaches at startup - -The routing priority of `get_base_url()` is **fully preserved** (`proxy_base_url` > `proxy_rules[model]` > `proxy_rules["default"]`). `_filter_headers()` filters out hop-by-hop headers (host/content-length/content-type/transfer-encoding/connection) and keeps Authorization and the like. - -In replay mode: `litellm_model = f"traj-replay/{model_name}"`, `api_base=None`. When litellm sees the `traj-replay/` prefix it looks up `litellm.custom_provider_map`, finds the `TrajectoryReplayer` instance, and calls its `acompletion`/`astreaming`. - 
-### 2.8 `rock/sdk/model/server/main.py`: modified - -Add a private function `_configure_litellm_for_proxy(config)`, called once when `main()` enters the proxy branch (before `include_router(proxy_router)`). Two branches: - -```python -if config.replay_enabled: - # Register TrajectoryReplayer in litellm.custom_provider_map - ... -elif config.traj_enabled: - # Add TrajectoryRecorder to litellm.callbacks - ... -``` - -**Note**: replay and record are mutually exclusive (do not record during replay, or recording replayed results would pollute the source of truth). - -`create_config_from_args()` gains 4 CLI overrides: `--num-retries / --traj-file / --no-traj / --replay-traj`. All are read via `getattr(args, "", default)`, so old callers (passing a Namespace without these fields) do not blow up. - -`from rock.sdk.model.server.config import TRAJ_FILE, ModelServiceConfig`: the `TRAJ_FILE` import is new, because `_configure_litellm_for_proxy` falls back to `TRAJ_FILE` when `traj_file` is not specified. - -### 2.9 `tests/unit/sdk/model/test_proxy.py`: rewritten - -- Removed: `test_perform_llm_request_*` (4 tests; perform_llm_request no longer exists) -- Reworked: `test_chat_completions_routing_*`, `test_proxy_base_url_overrides_proxy_rules`: `patch_path` changed from `proxy.perform_llm_request` to `proxy.litellm.acompletion` -- Reworked: assertions changed from "first positional argument of perform_llm_request == URL" to "in the litellm.acompletion kwargs, `api_base == expected value` and `model == 'openai/'`" -- Added: `test_chat_completions_passes_num_retries_and_timeout` / `test_chat_completions_litellm_error_returns_proxy_schema` / `test_chat_completions_replay_mode_uses_traj_replay_provider` / `test_chat_completions_strips_hop_by_hop_headers` / `test_config_default_traj_and_replay` / `test_config_loads_traj_and_replay_from_file` / `test_cli_replay_traj_enables_replay` -- Kept: all lifespan / config-load / metrics-monitor / record_traj tests (record_traj is still in utils.py, for local) - -The ModelResponse returned by the mock: use `SimpleNamespace(model_dump=lambda: payload)` to impersonate a pydantic object, because the handler only calls `.model_dump()` and there is no need to actually import the whole ModelResponse. - -### 2.10 `tests/unit/sdk/model/test_traj_recorder.py`: new - -7 tests: JSONL append / first-write truncation with `append=False` / metrics + sandbox_id / failure written to disk / skip when standard_logging_object is missing / parent directory auto-created / when `response_time` is missing, fall back to 
`endTime - startTime`. - -Mocking approach: `patch("rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", return_value=mock_monitor)`. The recorder imports this function internally, so the reference inside the recorder module is what gets mocked. - -### 2.11 `tests/unit/sdk/model/test_traj_replayer.py`: new - -11 tests: cursor loading from a single file / from a directory (sorted by filename) / blank lines / raise on missing file / `next()` returns in order / raise past the end / `reset()` returns to the start / model mismatch only warns / Replayer.acompletion hits a record / cursor advances / streaming chunks reassembled == original text / raise CustomLLMError past the end. - -The streaming test builds a `SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(role=None, content=None), index=0)])` to serve as model_response, because `async_mock_completion_streaming_obj` internally writes `model_response.choices[0].delta.content = ...`. - -### 2.12 `examples/model_service/config_record.yaml` and `config_replay.yaml`: new - -Two ready-to-use yaml files with detailed comments. `config_record.yaml` turns on `traj_enabled: true / traj_append: true` by default and turns replay off. `config_replay.yaml` turns traj_enabled off and replay on by default, with `replay_traj_path: "/data/logs/LLMTraj.jsonl"` as a placeholder; change it to the actual traj location at deployment time. - -### 2.13 `/mnt/xinshi/github/litellm-traj/`: deleted - -The first-version standalone project skeleton (`pyproject.toml / src/litellm_traj/cursor.py / .gitignore / LICENSE`) was `rm -rf`'d when the direction changed. Everything of value was migrated back into rock's integrations/ module. - ---- - -## 3. 
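Key code details (pitfalls + "why it is written this way") - -The following expands on the design choices most likely to disorient a successor. Each item cites the source location in the litellm repo (the litellm main repo is at `/mnt/xinshi/github/litellm/`) for cross-checking. - -### 3.1 Streaming aggregation happens inside litellm; the Recorder needs no branching - -The `StandardLoggingPayload.response` field is **already a fully aggregated OpenAI-shaped dict** before `success_handler` fires. Streaming and non-streaming take the same path: when streaming ends, litellm calls `stream_chunk_builder` to assemble `complete_streaming_response` (litellm repo, `litellm/litellm_core_utils/litellm_logging.py:1930-1955`) and then writes it into `standard_logging_object.response`. - -Practical consequence: the payload that `TrajectoryRecorder.async_log_success_event` receives always contains the complete response, so I **do not need to write `async_log_stream_event`**. This is also why unblocking streaming is almost "zero cost": the recording side needs no extra code. - -### 3.2 What the `model: "openai/"` prefix means - -litellm uses the "provider" prefix as its routing key. `openai/gpt-3.5-turbo` means "the upstream is a service speaking the OpenAI-compatible protocol, and the model is named gpt-3.5-turbo". It also works with third-party OpenAI-compatible endpoints such as `api_base="https://api.modelscope.cn/v1"`, which is exactly the ModelScope/OpenAI kind of scenario in rock's existing `proxy_rules`. - -`traj-replay/` is the custom provider we register. When litellm sees this prefix it looks up `litellm.custom_provider_map`, matches the entry with `provider == "traj-replay"`, and calls `custom_handler.acompletion`/`astreaming` as the upstream (litellm repo, `litellm/main.py:4280-4326`). - -### 3.3 Error normalization: why those 5 exceptions are caught - -Catch order in `proxy.py`: `RateLimitError, APIError, BadRequestError, AuthenticationError, Timeout`. In `litellm/exceptions.py` all five inherit from classes derived from `openai.OpenAIError`, and **all carry a `.status_code` attribute**. `_format_error_response` extracts the real upstream status code with `getattr(exc, "status_code", None) or 502`; the message comes from `str(exc)`. The `__str__` of litellm exceptions already includes the original upstream error message, so keyword detection on the agent side (e.g. `"context length exceeded"` / `"content violation"`) keeps working. - -The `type` field uses `type(exc).__name__` (`"BadRequestError"` and so on) and is no longer the old fixed `"proxy_retry_failed"`. This is a semantic change in the schema: for the same `error.type` field, the old version returned a fixed string and the new version returns the exception class name. Any downstream that branches on `error.type` needs to adapt. - -The catch-all `except Exception` raises `HTTPException(500)`, which is picked up by `global_exception_handler` in `main.py` and returns `{error:{message,type:"internal_error",code:"internal_error"}}`. This path is exactly the same as before the refactor. - -### 3.4 Retry behavior: switching from `retry_async` to `litellm.num_retries` - 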
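-Old implementation: `@retry_async(max_attempts=6, delay_seconds=2.0, backoff=2.0, jitter=True, exceptions=(TimeoutException, ConnectError, HTTPStatusError))`. It raised only when `status_code in retryable_status_codes`, so a 401 did not trigger a retry while 429/500 did. - -New implementation: `config.num_retries` (default 6) is passed straight to `litellm.acompletion(num_retries=...)`. litellm retries automatically on `RateLimitError / APIError / Timeout / ServiceUnavailableError` internally and **does not expose a `retryable_status_codes` dimension**. I kept the `retryable_status_codes` field in the config, but the **handler does not currently use it** (backward compatible with old yaml; the extra field does not cause a reject). - -If someone later complains that "the custom retry-code list stopped working", that is a known semantic difference. Fallback option: hand-write a `for attempt in range(config.num_retries):` wrapper in the handler and whitelist by status code. Not done in this iteration, because litellm's default behavior already covers the most common 429/500 cases. - -### 3.5 `_filter_headers`: blacklist vs whitelist - -I use a blacklist: `host / content-length / content-type / transfer-encoding / connection` are not forwarded, and everything else is passed through to litellm's `extra_headers`. This matches the old implementation (which also dropped the first 4; connection was added to be more standard). Authorization/X-* and the like pass through automatically. - -Note: in litellm, `extra_headers` are merged into the upstream HTTP request (litellm's own OpenAI client) and do not override the `Authorization: Bearer ` that litellm generates itself. If rock does not set `OPENAI_API_KEY` and the client sends an Authorization header, litellm uses the client's; otherwise litellm uses the environment variable. All of this logic lives in litellm itself. - -### 3.6 The "truncate on first write" behavior of `traj_append=False` - -The old `_write_traj` used `mode="w"` **on every call** when `append=False`, so the jsonl only ever held the last line. That was a bug. - -The fix in the new `TrajectoryRecorder`: it keeps a `self._truncated` instance flag; with `append=False`, the **first write** uses `mode="w"` (overwriting the old traj left by a previous process) and **subsequent writes** use `mode="a"` (appending within this process). So: -- At process start: the old traj file is cleared (if it exists) -- While the process runs: each call appends one line -- On process restart: cleared again, recording from scratch - -The effect is "one complete traj per run". I spelled these semantics out in the docstring, because this is the point that differs most from the old default behavior. - -`traj_append=True` (the new default) is pure append-only, regardless of any old file. - -### 3.7 Concurrency model of SequentialCursor - -`async next()` uses an `asyncio.Lock` to protect the index + increment. **With multiple concurrent requests in a single process**, cursor advancement is atomic, but **the semantics are "consume in arrival order"**, so several agents hitting it concurrently get serialized into one pseudo-order. This is a known v1 constraint (listed explicitly in the plan); the convention is "single agent, serial replay". - -**Model mismatch only warns, never raises**: expected_model comes from the caller, and the recorded model comes from the `model` field inside the record. A mismatch only logs a warning and the record is still returned. Rationale: the agent side may have switched base_url without changing the model name (a common debugging scenario), and that should not be a hard block. - -### 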
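3.8 CustomLLM calling convention: ending with `*args, **kwargs` matters - -As verified at `litellm/main.py:4302-4319`, the call is made **entirely with keyword arguments**: -```python -response = handler_fn( - model=model, messages=messages, headers=headers, - model_response=model_response, print_verbose=..., - api_key=..., api_base=..., acompletion=..., logging_obj=..., - optional_params=..., litellm_params=..., logger_fn=..., - timeout=..., custom_prompt_dict=..., client=..., encoding=..., -) -``` -However, it is uncertain whether litellm minor versions will add or remove fields. A signature like `TrajectoryReplayer.acompletion(self, model, messages, *args, **kwargs)`, with explicit model + messages and everything else swallowed, can carry PEP 484 annotations and is also immune to litellm adding fields later. - -**Do not change it to `def acompletion(self, model, messages, *, optional_params, ...)`**, or litellm adding a new field will raise a TypeError. - -### 3.9 `LITELLM_TRAJ_FILE` env vs the `traj_file` field - -I did not introduce a new env var. In `main.py:_configure_litellm_for_proxy`, `config.traj_file` is resolved as `config.traj_file or TRAJ_FILE`, where `TRAJ_FILE` comes from `config.py:13` and equals `LOG_DIR + "/LLMTraj.jsonl"`, with `LOG_DIR = env_vars.ROCK_MODEL_SERVICE_DATA_DIR` (default `/data/logs`). - -So the path priority is: `--traj-file CLI` > `traj_file: yaml` > `LOG_DIR/LLMTraj.jsonl` (LOG_DIR is controlled by the `ROCK_MODEL_SERVICE_DATA_DIR` env). This is consistent with the old scheme. - -### 3.10 Why the `record_traj` decorator is kept - -`local.py:75` still decorates its chat_completions handler with `@record_traj`. Local mode does not call litellm; FileHandler talks to Roll directly through file markers, so there is no window in which a litellm callback would fire. To preserve local mode's "call count + RT reporting", I left `record_traj` in `utils.py` for local to keep using, with the docstring stating that proxy mode no longer uses it and goes through TrajectoryRecorder instead. - -Cost: the traj schema recorded in local mode is the old `{request, response}`, while proxy mode uses `StandardLoggingPayload`. The two schemas coexist on the same `LLMTraj.jsonl` file path (because `TRAJ_FILE` is the same constant). **In an actual deployment, the probability of local and proxy sharing one process is 0** (`--type` is mutually exclusive), so a single traj file will not mix the two schemas. But if someone periodically switches `--type` across runs with `traj_append=true` and no file rotation, mixing will occur. Documented recommendation: **for replay, read only traj recorded in proxy mode** (StandardLoggingPayload format); local-mode traj is for local debugging only. - ---- - -## 4. 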
Running tests / verification steps (the successor continues from here) - -### 4.1 Prepare the Python environment - -**Verified**: after `uv sync`, litellm installs correctly. Run things with `uv run`; there is no need to activate the venv manually. - -```bash -cd /mnt/xinshi/github/Self-ROCK -uv sync --extra model-service --group test -``` - -Verify the dependencies (already passing): - -```bash -uv run python -c "from litellm.integrations.custom_logger import CustomLogger; print('ok')" -uv run python -c "from litellm.llms.custom_llm import CustomLLM, CustomLLMError; print('ok')" -uv run python -c "from litellm.utils import async_mock_completion_streaming_obj; print('ok')" -``` - -### 4.2 Static checks / lint - -```bash -uv run ruff check rock/sdk/model/server/ tests/unit/sdk/model/ -uv run ruff format --check rock/sdk/model/server/ tests/unit/sdk/model/ -``` - -If ruff format reports a diff, just fix it with `uv run ruff format rock/sdk/model/server/ tests/unit/sdk/model/`. I did not run ruff while writing the code, so there may be minor issues such as line length or import ordering. - -### 4.3 Unit tests (all passing) - -```bash -uv run pytest tests/unit/sdk/model/ -v -# → 47 passed in ~4s -``` - -**Test suites verified as passing**: -- `test_proxy.py` (27 tests): routing/error/replay/header/cli/config/metrics -- `test_traj_recorder.py` (7 tests): JSONL append/truncate/metrics/failure/missing payload/mkdir/rt fallback -- `test_traj_replayer.py` (11 tests): cursor loading/ordering/overrun/reset/model mismatch/acompletion/streaming/exhaustion -- `test_model_client.py` (2 tests): existing tests kept and passing - -**Known edge cases that do not affect the tests** (watch in production): -- In tool_calls scenarios `_extract_assistant_text` returns `""`, so streaming replay returns an empty stream (known limitation, out of scope for this iteration) -- `litellm.callbacks` is a global list; test isolation relies on patching, and production starts the server only once so there is no problem - -### 4.4 Integration verification (after the tests pass) - -#### Record mode - -```bash -# Terminal 1 -export OPENAI_API_KEY="sk-..." 
-export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj -mkdir -p /tmp/rock-traj -uv run python -m rock.sdk.model.server.main \ - --type proxy \ - --config-file examples/model_service/config_record.yaml \ - --port 8080 - -# Terminal 2 -curl -X POST http://127.0.0.1:8080/v1/chat/completions \ - -H "Authorization: Bearer $OPENAI_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"say hi"}]}' - -# Verify the traj -cat /tmp/rock-traj/LLMTraj.jsonl | jq '.id, .model, .response.choices[0].message.content' -# Should show chatcmpl-xxx / gpt-3.5-turbo / "..." -``` - -#### Replay mode - -```bash -# Terminal 1 -uv run python -m rock.sdk.model.server.main \ - --type proxy \ - --replay-traj /tmp/rock-traj/LLMTraj.jsonl \ - --port 8081 - -# Terminal 2 - the same curl against 8081 -curl -X POST http://127.0.0.1:8081/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"anything (replay ignores msgs)"}]}' - -# Should return the same response.choices[0].message.content as when recorded -# A second curl returns 404 (traj exhausted), proving the cursor works -``` - -#### Streaming verification - -```bash -curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \ - -H "Authorization: Bearer $OPENAI_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}' -# Should show SSE chunks: data: {...}\n\n ... data: [DONE]\n\n -# In the traj file, that line has .stream == true and .response is the complete aggregated dict -``` - -#### Agent end to end (final verification) - -Run one SWE-bench instance with `mini-swe-agent`, base_url pointing at 8080 (record); afterwards run the same instance against 8081 (replay) and expect the patch the agent finally produces to match the one from recording. This is the strongest check, but it is cumbersome to run and can be done during PR review. - ---- - -## 5. 
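Breaking Changes (must be spelled out in the PR description) - -### 5.1 The traj file schema changes - -Each line of `LLMTraj.jsonl` changes from `{"request": {...}, "response": {...}}` to a `StandardLoggingPayload` (dozens of fields: `id/trace_id/model/messages/response/model_parameters/usage/startTime/endTime/status/...`). - -Any downstream consumer that depends on the old two-field schema (scripts, UI, statistics) will break. This iteration provides no "dual-format output" or "old → new conversion" tool; if needed, a separate `scripts/convert_traj.py` can be written. - -### 5.2 The `traj_append` default is flipped - -The old `ROCK_MODEL_SERVICE_TRAJ_APPEND_MODE` defaulted to `"false"` → `_write_traj` used `mode="w"`, which in practice meant "overwrite on every call, leaving only the last record in the file". The new `ModelServiceConfig.traj_append` defaults to `True` (append-only). - -Anyone who **previously relied on the overwrite-every-time behavior to get "the most recent call"** (rare but possible) needs to set `traj_append: false` explicitly in the yaml. - -### 5.3 The semantics of the `error.type` field change - -Old values: the fixed string `"proxy_retry_failed"` (retries exhausted) or `"internal_error"` (everything else). -New values: the litellm exception class name, e.g. `"BadRequestError" / "RateLimitError" / "Timeout" / "AuthenticationError" / "APIError"`. - -`error.message` still begins with `"LLM backend error: ..."`, so keyword detection remains compatible. - -### 5.4 The `retryable_status_codes` field no longer takes effect - -The old version used the `retryable_status_codes` whitelist to decide which status codes trigger a retry (e.g. no retry on 401, retry on 429/500). In the new version litellm decides internally (automatic retry on `RateLimitError / APIError / Timeout / ServiceUnavailableError`; 4xx generally not retried). - -The field may stay in the yaml without error, but the handler does not read it. If the whitelist needs to be restored later, see the "fallback option" in section 3.4. - -### 5.5 `stream=true` is no longer forcibly rejected - -The old version answered `stream=true` with 400 + `"Streaming requests (stream=True) are not supported"`. The new version handles it normally and returns SSE. - -Any client that previously **relied on** the 400 to probe "is streaming enabled" will break. That usage is quite abnormal, though, and is unlikely to exist. - -### 5.6 The `perform_llm_request` function has been removed - -Downstream code should not import it; it was always a helper internal to proxy.py. Any test/script that imports it directly needs to adapt. I have already updated `tests/unit/sdk/model/test_proxy.py`. - -### 5.7 New dependency - -`pip install rl-rock[model-service]` now additionally installs litellm (and its dependency chain: `openai>=1.x / tiktoken / aiohttp / tokenizers / ...`). Package size +~50MB. - ---- - -## 6. 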
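Known pitfalls / things to watch when taking over - -### 6.1 `local.py` still imports `record_traj` - -I **did not change local.py** (the plan explicitly says local stays untouched). The `from rock.sdk.model.server.utils import record_traj` at `local.py:12` still holds, because utils.py keeps record_traj. If the successor sees this import and wants to clean it up, **do not** do so: that would break local mode. - -### 6.2 `litellm.callbacks` is a global list - -`main.py:_configure_litellm_for_proxy` uses `litellm.callbacks.append(recorder)`. If the same process starts up multiple times (test scenarios), the recorder is registered multiple times and every call writes multiple copies to the traj. A production deployment runs it once, so there is no problem. **To make repeated initialization safe**, it can be changed to `if not any(isinstance(cb, TrajectoryRecorder) for cb in litellm.callbacks): litellm.callbacks.append(recorder)`. I did not do this, because the production path is "start once". - -Likewise, `litellm.custom_provider_map = [...]` is an assignment rather than an append, so repeated replay initialization is idempotent. - -### 6.3 In tests, watch out for SequentialCursor state leaking across cases - -`SequentialCursor` keeps its position in the instance attribute `self._idx`, and each test's own `SequentialCursor.load(p)` creates a fresh instance, so nothing leaks across cases. But if someone writes a fixture with "a module-level singleton replayer + multiple tests calling it", the idx will collide. The current tests all use per-test instances, which is OK. - -### 6.4 `litellm` imports slowly - -Importing litellm loads several OpenAI/HuggingFace clients, and the first import can take 1-2 seconds. `main.py` places `import litellm` inside `_configure_litellm_for_proxy()` (a function-level lazy import), triggered only when proxy mode starts. `proxy.py` has a module-level `import litellm`, triggered the first time the handler file is loaded. That cost is paid when fastapi registers the routes and does not affect request-path performance. - -### 6.5 The `tzdata` dependency in `pyproject.toml` - -I saw ide_diagnostics report `httpx/uuid/anyio/tzdata/...` as not installed in pyproject.toml. That is because the IDE's current Python environment does not have the rock main repo's dependencies installed, and it is unrelated to this change. The hints disappear after `uv sync`. - -### 6.6 Leftover `__pycache__` - -The old `proxy.py` has `__pycache__/proxy.cpython-310.pyc`. It is regenerated on the first import after the rewrite, and **normally this causes no problem**. If running the tests reports `ImportError: cannot import name 'perform_llm_request'`, first clear the cache with `find rock -name __pycache__ -exec rm -rf {} +`. - -### 6.7 Remember that `extra_headers` may contain sensitive information - -`_filter_headers` passes every non-hop-by-hop header through to the upstream, including the `Authorization` sent by the client. This is **intentional**: having the client bring its own API key is rock's existing convention. It does mean that the `StandardLoggingPayload.metadata.headers` recorded in the traj (if present) may contain a Bearer token. litellm has switches such as `turn_off_message_logging` / `redact_user_api_key_info`, which are **not enabled at present**. If traj files are ever distributed, they must be redacted first. - ---- - -## 7. 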
Out of scope for this work / future extensions (v2) - -### Out of scope (explicitly not done) - -- Any change to local mode (`--type local`) -- DB persistence (traj goes to JSONL only) -- Compatible reading of old `{request, response}` traj (replay accepts only the new schema) -- Conversion to and from the native SWE-agent / OpenHands traj formats -- Fine-grained timing reproduction for streaming during replay (only the chunk sequence is guaranteed correct) -- Incremental stream splitting of tool_calls (in this iteration, streaming replay only goes down to message-level chunks) - -### Future extensions (hooks left in place) - -- **Out-of-order matching based on a messages hash**: add a `HashMatcher` alongside `SequentialCursor`, switched via `replay_mode: sequential | hash`. For use when the agent does not call the LLM strictly in recorded order (branching/retry). -- **Multi-concurrency replay**: route to different cursors using the `run_id` in request metadata; `SequentialCursor` becomes `dict[run_id, Cursor]`. -- **Passthrough on miss**: when the cursor is exhausted, fall back to the real LLM (`import litellm; await litellm.acompletion(...)`). For the debugging scenario where the recorded traj is not long enough. -- **`/admin/reset` HTTP endpoint**: reset the cursor to zero without restarting the proxy. -- **`scripts/convert_traj.py`**: convert SWE-agent `.traj` or OpenHands event logs to StandardLoggingPayload, and the reverse as well. -- **traj redaction hook**: before writing to disk, scrub the specified fields via `redact_keys: list[str]`. - ---- - -## 8. Key path quick reference - -### Inside the rock repo (changed in this work) - -| Path | Role | -|---|---| -| `pyproject.toml` | model-service extras gain litellm | -| `rock/sdk/model/server/config.py` | New ModelServiceConfig fields | -| `rock/sdk/model/server/api/proxy.py` | Rewritten on the litellm SDK | -| `rock/sdk/model/server/main.py` | `_configure_litellm_for_proxy` + new CLI flags | -| `rock/sdk/model/server/utils.py` | Keeps record_traj for local | -| `rock/sdk/model/server/integrations/__init__.py` | Empty, only to form a package | -| `rock/sdk/model/server/integrations/traj_recorder.py` | TrajectoryRecorder(CustomLogger) | -| `rock/sdk/model/server/integrations/traj_replayer.py` | SequentialCursor + TrajectoryReplayer(CustomLLM) | -| `rock/sdk/model/server/api/local.py` | **Unchanged** (still uses record_traj) | -| `tests/unit/sdk/model/test_proxy.py` | Reworked | -| `tests/unit/sdk/model/test_traj_recorder.py` | New | -| `tests/unit/sdk/model/test_traj_replayer.py` | New | -| `examples/model_service/config_record.yaml` | New | -| `examples/model_service/config_replay.yaml` | New | - -### litellm repo (for cross-checking, at `/mnt/xinshi/github/litellm/`) - -| Concern | Path | -|---|---| -| CustomLogger 
interface (base class) | `litellm/integrations/custom_logger.py:67` |
-| CustomLLM interface (base class) | `litellm/llms/custom_llm.py:47` |
-| StandardLoggingPayload schema | `litellm/types/utils.py:2764` |
-| streaming aggregation written into the payload | `litellm/litellm_core_utils/litellm_logging.py:1930-1955` |
-| async_mock_completion_streaming_obj | `litellm/utils.py:6831` |
-| custom_provider_map loading flow (how acompletion actually gets called) | `litellm/main.py:4280-4326` |
-| LiteLLM exception base classes (where status_code comes from) | `litellm/exceptions.py` |
-
-### History / conversation artifacts
-
-- Original plan file (detailed design reasoning): `/home/xinshi/.claude/plans/litellm-chat-completions-traj-replay-ser-lucky-rainbow.md`
-- Abandoned standalone project skeleton: `/mnt/xinshi/github/litellm-traj/` (already `rm -rf`'d)
-
----
-
-## 9. 1-minute quick start for whoever takes over
-
-1. `cd /mnt/xinshi/github/Self-ROCK && uv sync --extra model-service --group test`
-2. `uv run pytest tests/unit/sdk/model/ -v` → should report **47 passed** (verified)
-3. Run the integration verification (section 4.4)
-4. Write the PR description, **focusing on the breaking changes in section 5**
-5. If someone asks in PR review "why not keep retry_async's status-code whitelist", answer: see section 3.4 (litellm's default retry already covers the most common cases; a whitelist can optionally be added later)
-
-For background on the whole project rather than just this refactor, see the top-level `CLAUDE.md`. For litellm internals, see `/mnt/xinshi/github/litellm/CLAUDE.md` (the one in the litellm main repo).
From 0679d3271085761812264e3ac831ccadf8a518c3 Mon Sep 17 00:00:00 2001
From: "pengshixin.psx"
Date: Tue, 12 May 2026 08:36:09 +0000
Subject: [PATCH 15/25] =?UTF-8?q?refactor(model-service):=20flatten=20layo?=
 =?UTF-8?q?ut=20=E2=80=94=20drop=20integrations/,=20rename=20sse=5Futils?=
 =?UTF-8?q?=E2=86=92sse,=20merge=20traj=5F*=E2=86=92traj?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The integrations/ directory only ever held two files (traj_recorder, traj_replayer) and the litellm CustomLogger angle that justified the name is long gone. Both modules share one JSONL schema, so collapsing them into a single traj.py keeps the schema and its read/write halves visible together.

sse_utils.py → sse.py: the codec is the module's whole purpose, the _utils suffix added nothing.
Drop traj_recorder.now() — a one-line wrapper around time.time() with no callers. Also remove a stray _get_or_create_metrics_monitor patch in test_forward_invokes_recorder_on_success: OTLP create-time failure only logs a warning, so the patch was protecting against nothing. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/api/proxy.py | 5 +- .../sdk/model/server/integrations/__init__.py | 0 .../server/integrations/traj_recorder.py | 90 ---------- .../server/integrations/traj_replayer.py | 82 --------- rock/sdk/model/server/main.py | 4 +- .../sdk/model/server/{sse_utils.py => sse.py} | 0 rock/sdk/model/server/traj.py | 156 ++++++++++++++++++ tests/unit/sdk/model/test_proxy.py | 11 +- .../sdk/model/test_proxy_record_replay.py | 5 +- .../model/{test_sse_utils.py => test_sse.py} | 2 +- tests/unit/sdk/model/test_traj_recorder.py | 4 +- tests/unit/sdk/model/test_traj_replayer.py | 2 +- 12 files changed, 169 insertions(+), 192 deletions(-) delete mode 100644 rock/sdk/model/server/integrations/__init__.py delete mode 100644 rock/sdk/model/server/integrations/traj_recorder.py delete mode 100644 rock/sdk/model/server/integrations/traj_replayer.py rename rock/sdk/model/server/{sse_utils.py => sse.py} (100%) create mode 100644 rock/sdk/model/server/traj.py rename tests/unit/sdk/model/{test_sse_utils.py => test_sse.py} (99%) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 5fa750e5d2..462788fe90 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -33,14 +33,13 @@ from rock.logger import init_logger from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder -from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted -from rock.sdk.model.server.sse_utils import ( +from rock.sdk.model.server.sse import ( SSE_DONE, completion_to_chunk_dict, encode_sse_event, 
parse_sse_data_chunks, ) +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryExhausted, TrajectoryRecorder logger = init_logger(__name__) diff --git a/rock/sdk/model/server/integrations/__init__.py b/rock/sdk/model/server/integrations/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/rock/sdk/model/server/integrations/traj_recorder.py b/rock/sdk/model/server/integrations/traj_recorder.py deleted file mode 100644 index a0c7e08fc7..0000000000 --- a/rock/sdk/model/server/integrations/traj_recorder.py +++ /dev/null @@ -1,90 +0,0 @@ -"""Append chat/completions trajectories as JSONL. - -The recorder is invoked **explicitly** from ``proxy.py`` after each forwarded -call (success or failure). It is no longer a litellm CustomLogger — we removed -the litellm SDK dependency in favor of httpx-based byte forwarding, and call -this object directly so writes stay deterministic and locally testable. - -Schema per line: a small dict with ``request`` / ``response`` / ``status`` / -``response_time`` / ``model`` / ``stream``. Faithful enough to drive the -sequential replayer; not a full StandardLoggingPayload. 
-""" - -from __future__ import annotations - -import asyncio -import json -import os -import time -from pathlib import Path -from typing import Any - -from rock.logger import init_logger -from rock.sdk.model.server.utils import ( - MODEL_SERVICE_REQUEST_COUNT, - MODEL_SERVICE_REQUEST_RT, - _get_or_create_metrics_monitor, -) - -logger = init_logger(__name__) - - -class TrajectoryRecorder: - """Appends one JSONL line per chat/completions call and reports OTLP metrics.""" - - def __init__(self, traj_file: str | os.PathLike) -> None: - self.traj_file = Path(traj_file) - self.traj_file.parent.mkdir(parents=True, exist_ok=True) - self._lock = asyncio.Lock() - self._monitor = _get_or_create_metrics_monitor() - - async def record( - self, - *, - request: dict[str, Any], - response: dict[str, Any] | None, - status: str, - start_time: float, - end_time: float, - error: str | None = None, - ) -> None: - """Persist one call to the JSONL file and report RT/count metrics. - - ``request`` / ``response`` are stored verbatim (whatever the upstream - returned, including provider-specific fields like ``reasoning_content``). - For streaming calls, ``response`` is the aggregated final ChatCompletion - produced by ``ChatCompletionStreamState.get_final_completion().model_dump()``. 
- """ - rt_seconds = end_time - start_time - payload = { - "model": request.get("model"), - "stream": bool(request.get("stream")), - "status": status, - "response_time": rt_seconds, - "start_time": start_time, - "end_time": end_time, - "request": request, - "response": response, - "error": error, - } - - line = json.dumps(payload, ensure_ascii=False, default=str) + "\n" - async with self._lock: - await asyncio.to_thread(self._write_line, line) - - attrs = { - "type": "chat_completions", - "status": status, - "sandbox_id": os.getenv("ROCK_SANDBOX_ID", "unknown"), - } - self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_seconds * 1000.0, attributes=attrs) - self._monitor.record_counter_by_name(MODEL_SERVICE_REQUEST_COUNT, 1, attributes=attrs) - - def _write_line(self, line: str) -> None: - with self.traj_file.open("a", encoding="utf-8") as f: - f.write(line) - - -def now() -> float: - """Wall-clock seconds (single shim so callers don't import time directly).""" - return time.time() diff --git a/rock/sdk/model/server/integrations/traj_replayer.py b/rock/sdk/model/server/integrations/traj_replayer.py deleted file mode 100644 index af2fdd6bb4..0000000000 --- a/rock/sdk/model/server/integrations/traj_replayer.py +++ /dev/null @@ -1,82 +0,0 @@ -"""Sequential cursor over a recorded JSONL trajectory. - -Loaded once at startup; ``await cursor.next(expected_model=...)`` hands out the -next record (full StandardLoggingPayload dict) and advances. Going past the end -raises :class:`TrajectoryExhausted` so the proxy can return a clean 404 without -involving litellm — that's the whole point: replay does NOT need to go through -litellm's CustomLLM machinery, the proxy serves recorded responses directly. 
-""" - -from __future__ import annotations - -import asyncio -import json -import os -from pathlib import Path - -from rock.logger import init_logger - -logger = init_logger(__name__) - - -class TrajectoryExhausted(Exception): - """Raised by ``SequentialCursor.next`` when all recorded steps have been served.""" - - def __init__(self, position: int, total: int) -> None: - super().__init__(f"trajectory exhausted at step {position} (total recorded steps={total})") - self.position = position - self.total = total - - -class SequentialCursor: - """Hands out trajectory records one at a time, in recorded order.""" - - def __init__(self, records: list[dict]) -> None: - self.records = records - self._idx = 0 - self._lock = asyncio.Lock() - - @classmethod - def load(cls, path: str | os.PathLike) -> SequentialCursor: - path = Path(path) - if not path.is_file(): - raise FileNotFoundError(f"traj file not found: {path}") - - records: list[dict] = [] - with path.open("r", encoding="utf-8") as fp: - for line in fp: - line = line.strip() - if not line: - continue - records.append(json.loads(line)) - - logger.info(f"[traj-replay] loaded {len(records)} record(s) from {path}") - return cls(records) - - async def next(self, expected_model: str | None = None) -> dict: - async with self._lock: - if self._idx >= len(self.records): - raise TrajectoryExhausted(position=self._idx, total=len(self.records)) - record = self.records[self._idx] - self._idx += 1 - current_idx = self._idx - 1 - - if expected_model: - recorded_model = record.get("model") - if recorded_model and recorded_model != expected_model: - logger.warning( - f"[traj-replay] step {current_idx} model mismatch: " - f"recorded={recorded_model!r} requested={expected_model!r}" - ) - return record - - def reset(self) -> None: - self._idx = 0 - - @property - def position(self) -> int: - return self._idx - - @property - def total(self) -> int: - return len(self.records) diff --git a/rock/sdk/model/server/main.py 
b/rock/sdk/model/server/main.py index 951e0d5dff..063d4fa31f 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -63,14 +63,14 @@ def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> N from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend if config.replay_traj_file: - from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor + from rock.sdk.model.server.traj import SequentialCursor cursor = SequentialCursor.load(config.replay_traj_file) app.state.backend = ReplayBackend(cursor) logger.info(f"replay backend attached, traj_path={config.replay_traj_file}") return - from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + from rock.sdk.model.server.traj import TrajectoryRecorder traj_path = config.traj_file or TRAJ_FILE recorder = TrajectoryRecorder(traj_file=traj_path) diff --git a/rock/sdk/model/server/sse_utils.py b/rock/sdk/model/server/sse.py similarity index 100% rename from rock/sdk/model/server/sse_utils.py rename to rock/sdk/model/server/sse.py diff --git a/rock/sdk/model/server/traj.py b/rock/sdk/model/server/traj.py new file mode 100644 index 0000000000..e12c229c7f --- /dev/null +++ b/rock/sdk/model/server/traj.py @@ -0,0 +1,156 @@ +"""Trajectory record + replay for the chat/completions proxy. + +Two halves around the same JSONL schema (one record per line): + +- :class:`TrajectoryRecorder` — invoked by the forward path after each upstream + call (success or failure). Appends a small dict with + ``request`` / ``response`` / ``status`` / ``response_time`` / ``model`` / + ``stream``, and reports OTLP RT/count metrics. Stores responses verbatim + (provider-specific fields like ``reasoning_content`` survive); for streaming + calls ``response`` is the aggregated final ChatCompletion produced by + ``ChatCompletionStreamState.get_final_completion().model_dump()``. 
+ +- :class:`SequentialCursor` — loads a JSONL trajectory once at startup; + ``await cursor.next(expected_model=...)`` hands out the next record (full + payload dict) and advances. Going past the end raises + :class:`TrajectoryExhausted` so the proxy can return a clean 404. +""" + +from __future__ import annotations + +import asyncio +import json +import os +from pathlib import Path +from typing import Any + +from rock.logger import init_logger +from rock.sdk.model.server.utils import ( + MODEL_SERVICE_REQUEST_COUNT, + MODEL_SERVICE_REQUEST_RT, + _get_or_create_metrics_monitor, +) + +logger = init_logger(__name__) + + +# --------------------------------------------------------------------------- +# Recorder +# --------------------------------------------------------------------------- + + +class TrajectoryRecorder: + """Appends one JSONL line per chat/completions call and reports OTLP metrics.""" + + def __init__(self, traj_file: str | os.PathLike) -> None: + self.traj_file = Path(traj_file) + self.traj_file.parent.mkdir(parents=True, exist_ok=True) + self._lock = asyncio.Lock() + self._monitor = _get_or_create_metrics_monitor() + + async def record( + self, + *, + request: dict[str, Any], + response: dict[str, Any] | None, + status: str, + start_time: float, + end_time: float, + error: str | None = None, + ) -> None: + rt_seconds = end_time - start_time + payload = { + "model": request.get("model"), + "stream": bool(request.get("stream")), + "status": status, + "response_time": rt_seconds, + "start_time": start_time, + "end_time": end_time, + "request": request, + "response": response, + "error": error, + } + + line = json.dumps(payload, ensure_ascii=False, default=str) + "\n" + async with self._lock: + await asyncio.to_thread(self._write_line, line) + + attrs = { + "type": "chat_completions", + "status": status, + "sandbox_id": os.getenv("ROCK_SANDBOX_ID", "unknown"), + } + self._monitor.record_gauge_by_name(MODEL_SERVICE_REQUEST_RT, rt_seconds * 1000.0, 
attributes=attrs) + self._monitor.record_counter_by_name(MODEL_SERVICE_REQUEST_COUNT, 1, attributes=attrs) + + def _write_line(self, line: str) -> None: + with self.traj_file.open("a", encoding="utf-8") as f: + f.write(line) + + +# --------------------------------------------------------------------------- +# Replay cursor +# --------------------------------------------------------------------------- + + +class TrajectoryExhausted(Exception): + """Raised by ``SequentialCursor.next`` when all recorded steps have been served.""" + + def __init__(self, position: int, total: int) -> None: + super().__init__(f"trajectory exhausted at step {position} (total recorded steps={total})") + self.position = position + self.total = total + + +class SequentialCursor: + """Hands out trajectory records one at a time, in recorded order.""" + + def __init__(self, records: list[dict]) -> None: + self.records = records + self._idx = 0 + self._lock = asyncio.Lock() + + @classmethod + def load(cls, path: str | os.PathLike) -> SequentialCursor: + path = Path(path) + if not path.is_file(): + raise FileNotFoundError(f"traj file not found: {path}") + + records: list[dict] = [] + with path.open("r", encoding="utf-8") as fp: + for line in fp: + line = line.strip() + if not line: + continue + records.append(json.loads(line)) + + logger.info(f"[traj-replay] loaded {len(records)} record(s) from {path}") + return cls(records) + + async def next(self, expected_model: str | None = None) -> dict: + async with self._lock: + if self._idx >= len(self.records): + raise TrajectoryExhausted(position=self._idx, total=len(self.records)) + record = self.records[self._idx] + self._idx += 1 + current_idx = self._idx - 1 + + if expected_model: + recorded_model = record.get("model") + if recorded_model and recorded_model != expected_model: + logger.warning( + f"[traj-replay] step {current_idx} model mismatch: " + f"recorded={recorded_model!r} requested={expected_model!r}" + ) + return record + + def reset(self) 
-> None: + self._idx = 0 + + @property + def position(self) -> int: + return self._idx + + @property + def total(self) -> int: + return len(self.records) diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index f47813ac8b..cd262df2e0 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -17,8 +17,8 @@ from rock.sdk.model.server.api.proxy import proxy_router from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor from rock.sdk.model.server.main import create_config_from_args, lifespan +from rock.sdk.model.server.traj import SequentialCursor from rock.sdk.model.server.utils import ( MODEL_SERVICE_REQUEST_COUNT, MODEL_SERVICE_REQUEST_RT, @@ -386,7 +386,7 @@ def handler(request: httpx.Request) -> httpx.Response: @pytest.mark.asyncio async def test_forward_invokes_recorder_on_success(tmp_path): """When a recorder is attached to the backend, success calls write a JSONL line.""" - from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder + from rock.sdk.model.server.traj import TrajectoryRecorder upstream_payload = _success_response_json(content="recorded reply") traj_file = tmp_path / "traj.jsonl" @@ -396,12 +396,7 @@ def handler(request: httpx.Request) -> httpx.Response: config = ModelServiceConfig() - with ( - _patch_httpx_with_handler(handler), - patch( - "rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", return_value=MagicMock() - ), - ): + with _patch_httpx_with_handler(handler): recorder = TrajectoryRecorder(traj_file=traj_file) app = _build_app(config, recorder=recorder) transport = ASGITransport(app=app) diff --git a/tests/unit/sdk/model/test_proxy_record_replay.py b/tests/unit/sdk/model/test_proxy_record_replay.py index 1bb9832784..0b70ed0cf8 100644 --- a/tests/unit/sdk/model/test_proxy_record_replay.py +++ b/tests/unit/sdk/model/test_proxy_record_replay.py 
@@ -23,9 +23,8 @@ from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend, proxy_router from rock.sdk.model.server.config import ModelServiceConfig -from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder -from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor -from rock.sdk.model.server.sse_utils import parse_sse_data_chunks +from rock.sdk.model.server.sse import parse_sse_data_chunks +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryRecorder from rock.utils.system import find_free_port # --------------------------------------------------------------------------- diff --git a/tests/unit/sdk/model/test_sse_utils.py b/tests/unit/sdk/model/test_sse.py similarity index 99% rename from tests/unit/sdk/model/test_sse_utils.py rename to tests/unit/sdk/model/test_sse.py index f660d2751b..251016a0a8 100644 --- a/tests/unit/sdk/model/test_sse_utils.py +++ b/tests/unit/sdk/model/test_sse.py @@ -2,7 +2,7 @@ import json -from rock.sdk.model.server.sse_utils import ( +from rock.sdk.model.server.sse import ( SSE_DONE, completion_to_chunk_dict, encode_sse_event, diff --git a/tests/unit/sdk/model/test_traj_recorder.py b/tests/unit/sdk/model/test_traj_recorder.py index 6eb3b49571..3f06481639 100644 --- a/tests/unit/sdk/model/test_traj_recorder.py +++ b/tests/unit/sdk/model/test_traj_recorder.py @@ -5,14 +5,14 @@ import pytest -from rock.sdk.model.server.integrations.traj_recorder import TrajectoryRecorder +from rock.sdk.model.server.traj import TrajectoryRecorder @pytest.fixture def mock_monitor(): monitor = MagicMock() with patch( - "rock.sdk.model.server.integrations.traj_recorder._get_or_create_metrics_monitor", + "rock.sdk.model.server.traj._get_or_create_metrics_monitor", return_value=monitor, ): yield monitor diff --git a/tests/unit/sdk/model/test_traj_replayer.py b/tests/unit/sdk/model/test_traj_replayer.py index e4a379bd0d..ffcc5c4011 100644 --- a/tests/unit/sdk/model/test_traj_replayer.py +++ 
b/tests/unit/sdk/model/test_traj_replayer.py @@ -9,7 +9,7 @@ import pytest -from rock.sdk.model.server.integrations.traj_replayer import SequentialCursor, TrajectoryExhausted +from rock.sdk.model.server.traj import SequentialCursor, TrajectoryExhausted def _record(*, msg: str, model: str = "gpt-3.5-turbo", call_id: str = "x") -> dict: From dcd7905ff6249ca25c6d42ab03185df9396ae2bd Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:42:49 +0000 Subject: [PATCH 16/25] =?UTF-8?q?refactor(model-service):=20rename=20traj?= =?UTF-8?q?=5Ffile=E2=86=92recording=5Ffile,=20replay=5Ftraj=5Ffile?= =?UTF-8?q?=E2=86=92replay=5Ffile?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The old names were ambiguous: "traj_file" alone gave no hint of write vs read, and the CLI flag --traj-file was actually wired to config.replay_traj_file — same word pointing in opposite directions depending on context. New names mirror the backend pair (ForwardBackend = recording, ReplayBackend = replay) so the role is obvious at the field. CLI is split into two independent flags accordingly. Recorder constructor still takes traj_file= since it names the JSONL file type, not its role; only the config-field / CLI surface changes. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/api/proxy.py | 2 +- rock/sdk/model/server/config.py | 11 +++++----- rock/sdk/model/server/main.py | 34 +++++++++++++++++++----------- tests/unit/sdk/model/test_proxy.py | 34 +++++++++++++++--------------- 4 files changed, 46 insertions(+), 35 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 462788fe90..6364d4a131 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -12,7 +12,7 @@ returns (provider-specific ``reasoning_content``, ``citations``, ...) is passed through untouched. -2. **ReplayBackend** (``replay_traj_file`` set) — the request is served +2. 
**ReplayBackend** (``replay_file`` set) — the request is served directly from the next record in the ``SequentialCursor`` without any upstream call. Streaming emits the recorded response as one SSE chunk + ``[DONE]``. diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 76e080305c..0f6ed69b66 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -51,12 +51,13 @@ class ModelServiceConfig(BaseModel): request_timeout: int = Field(default=120) """Request timeout in seconds.""" - traj_file: str | None = Field(default=None) - """Override default trajectory file path. None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" + recording_file: str | None = Field(default=None) + """Recording mode output: where ForwardBackend writes the trajectory JSONL. + None → uses TRAJ_FILE (LOG_DIR/LLMTraj.jsonl).""" - replay_traj_file: str | None = Field(default=None) - """Path to a .jsonl trajectory file for replay mode. - When set, requests are served from recorded responses instead of a real upstream.""" + replay_file: str | None = Field(default=None) + """Replay mode input: a .jsonl trajectory file. When set, ReplayBackend serves + requests from recorded responses instead of calling a real upstream.""" @classmethod def from_file(cls, config_path: str | None = None): diff --git a/rock/sdk/model/server/main.py b/rock/sdk/model/server/main.py index 063d4fa31f..89e87ac0f9 100644 --- a/rock/sdk/model/server/main.py +++ b/rock/sdk/model/server/main.py @@ -55,27 +55,28 @@ async def global_exception_handler(request, exc): def _configure_proxy_integrations(app: FastAPI, config: ModelServiceConfig) -> None: """Attach the appropriate backend to ``app.state.backend``. - - Replay mode (``replay_traj_file`` set): ``ReplayBackend`` wrapping a + - Replay mode (``replay_file`` set): ``ReplayBackend`` wrapping a ``SequentialCursor``; no recorder — replaying back into the source file would corrupt it. 
- - Forward mode (default): ``ForwardBackend`` with a ``TrajectoryRecorder``. + - Forward mode (default): ``ForwardBackend`` with a ``TrajectoryRecorder`` + writing to ``recording_file`` (or ``TRAJ_FILE`` if unset). """ from rock.sdk.model.server.api.proxy import ForwardBackend, ReplayBackend - if config.replay_traj_file: + if config.replay_file: from rock.sdk.model.server.traj import SequentialCursor - cursor = SequentialCursor.load(config.replay_traj_file) + cursor = SequentialCursor.load(config.replay_file) app.state.backend = ReplayBackend(cursor) - logger.info(f"replay backend attached, traj_path={config.replay_traj_file}") + logger.info(f"replay backend attached, replay_file={config.replay_file}") return from rock.sdk.model.server.traj import TrajectoryRecorder - traj_path = config.traj_file or TRAJ_FILE - recorder = TrajectoryRecorder(traj_file=traj_path) + recording_path = config.recording_file or TRAJ_FILE + recorder = TrajectoryRecorder(traj_file=recording_path) app.state.backend = ForwardBackend(config, recorder=recorder) - logger.info(f"forward backend attached, traj_file={traj_path}") + logger.info(f"forward backend attached, recording_file={recording_path}") def main( @@ -127,9 +128,12 @@ def create_config_from_args(args) -> ModelServiceConfig: if args.request_timeout: config.request_timeout = args.request_timeout logger.info(f"request_timeout set from command line: {args.request_timeout}s") - if args.traj_file: - config.replay_traj_file = args.traj_file - logger.info(f"replay mode enabled via --traj-file: {args.traj_file}") + if args.recording_file: + config.recording_file = args.recording_file + logger.info(f"recording_file set from command line: {args.recording_file}") + if args.replay_file: + config.replay_file = args.replay_file + logger.info(f"replay mode enabled via --replay-file: {args.replay_file}") return config @@ -173,7 +177,13 @@ def create_config_from_args(args) -> ModelServiceConfig: "--request-timeout", type=int, default=None, 
help="Request timeout in seconds. Overrides config file." ) parser.add_argument( - "--traj-file", + "--recording-file", + type=str, + default=None, + help="Forward mode: where to write the trajectory JSONL. Defaults to TRAJ_FILE.", + ) + parser.add_argument( + "--replay-file", type=str, default=None, help="Replay mode: path to a recorded .jsonl traj file. Disables real LLM upstreams.", diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index cd262df2e0..b658b9343e 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -439,7 +439,7 @@ async def test_replay_returns_recorded_response_no_upstream_call(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_file = str(traj) + config.replay_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport = ASGITransport(app=app) @@ -474,7 +474,7 @@ async def test_replay_streaming_emits_recorded_response_as_sse(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_file = str(traj) + config.replay_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport = ASGITransport(app=app) @@ -504,7 +504,7 @@ async def test_replay_returns_404_when_cursor_exhausted(tmp_path): traj.write_text(json.dumps(record) + "\n", encoding="utf-8") config = ModelServiceConfig() - config.replay_traj_file = str(traj) + config.replay_file = str(traj) app = _build_app(config, replay_cursor=SequentialCursor.load(traj)) transport = ASGITransport(app=app) @@ -550,26 +550,26 @@ def test_config_default_host_and_port(): assert config.port == 8080 -def test_config_default_traj_and_replay(): +def test_config_default_recording_and_replay(): config = ModelServiceConfig() - assert config.traj_file is None - assert config.replay_traj_file is None + assert config.recording_file is None + assert 
config.replay_file is None @pytest.mark.asyncio -async def test_config_loads_traj_and_replay_from_file(tmp_path): +async def test_config_loads_recording_and_replay_from_file(tmp_path): conf_file = tmp_path / "proxy.yml" conf_file.write_text( yaml.dump( { - "traj_file": "/tmp/my-traj.jsonl", - "replay_traj_file": "/tmp/in.jsonl", + "recording_file": "/tmp/my-traj.jsonl", + "replay_file": "/tmp/in.jsonl", } ) ) config = ModelServiceConfig.from_file(str(conf_file)) - assert config.traj_file == "/tmp/my-traj.jsonl" - assert config.replay_traj_file == "/tmp/in.jsonl" + assert config.recording_file == "/tmp/my-traj.jsonl" + assert config.replay_file == "/tmp/in.jsonl" def test_cli_args_override_config_file(tmp_path): @@ -591,8 +591,8 @@ def test_cli_args_override_config_file(tmp_path): proxy_base_url="https://cli-url.example.com/v1", retryable_status_codes=None, request_timeout=30, - num_retries=None, - traj_file=None, + recording_file=None, + replay_file=None, ) config = create_config_from_args(args) assert config.host == "0.0.0.0" @@ -601,7 +601,7 @@ def test_cli_args_override_config_file(tmp_path): assert config.request_timeout == 30 -def test_cli_traj_file_enables_replay(): +def test_cli_replay_file_enables_replay(): args = argparse.Namespace( config_file=None, host=None, @@ -609,11 +609,11 @@ def test_cli_traj_file_enables_replay(): proxy_base_url=None, retryable_status_codes=None, request_timeout=None, - num_retries=None, - traj_file="/tmp/in.jsonl", + recording_file=None, + replay_file="/tmp/in.jsonl", ) config = create_config_from_args(args) - assert config.replay_traj_file == "/tmp/in.jsonl" + assert config.replay_file == "/tmp/in.jsonl" # ---------- Metrics singleton + legacy record_traj (still used by local mode) ---------- From b87b61d52e1eeb380b59f4c4d1a92ec730cf6d7d Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:47:40 +0000 Subject: [PATCH 17/25] feat(model-service): enforce recording_file/replay_file mutex via model_validator 
Setting both at once was silently resolved in favor of replay (the backend factory checks replay_file first), masking what is really a configuration error. A Pydantic model_validator now rejects the combination at construction time; validate_assignment=True extends the check to CLI-style field-by-field overrides applied after a yaml load. Three tests added: construction-time mutex, assignment-time mutex, and the existing yaml-load test split into one-side-only variants since the original deliberately set both fields. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/config.py | 15 +++++++++++++- tests/unit/sdk/model/test_proxy.py | 33 ++++++++++++++++++++++-------- 2 files changed, 38 insertions(+), 10 deletions(-) diff --git a/rock/sdk/model/server/config.py b/rock/sdk/model/server/config.py index 0f6ed69b66..e734c29878 100644 --- a/rock/sdk/model/server/config.py +++ b/rock/sdk/model/server/config.py @@ -1,7 +1,7 @@ from pathlib import Path import yaml -from pydantic import BaseModel, Field +from pydantic import BaseModel, ConfigDict, Field, model_validator from rock import env_vars @@ -27,6 +27,10 @@ class ModelServiceConfig(BaseModel): """Configuration for the LLM Model Service.""" + # validate_assignment=True so the recording/replay mutex below also fires when + # CLI overrides are applied field-by-field (not only at construction time). + model_config = ConfigDict(validate_assignment=True) + host: str = "0.0.0.0" """Server host address.""" @@ -59,6 +63,15 @@ class ModelServiceConfig(BaseModel): """Replay mode input: a .jsonl trajectory file. When set, ReplayBackend serves requests from recorded responses instead of calling a real upstream.""" + @model_validator(mode="after") + def _recording_replay_mutually_exclusive(self): + if self.recording_file and self.replay_file: + raise ValueError( + "recording_file and replay_file are mutually exclusive — " + "set one (recording mode) or the other (replay mode), not both." 
+ ) + return self + @classmethod def from_file(cls, config_path: str | None = None): """ diff --git a/tests/unit/sdk/model/test_proxy.py b/tests/unit/sdk/model/test_proxy.py index b658b9343e..345ea31775 100644 --- a/tests/unit/sdk/model/test_proxy.py +++ b/tests/unit/sdk/model/test_proxy.py @@ -557,19 +557,34 @@ def test_config_default_recording_and_replay(): @pytest.mark.asyncio -async def test_config_loads_recording_and_replay_from_file(tmp_path): +async def test_config_loads_recording_file_from_yaml(tmp_path): conf_file = tmp_path / "proxy.yml" - conf_file.write_text( - yaml.dump( - { - "recording_file": "/tmp/my-traj.jsonl", - "replay_file": "/tmp/in.jsonl", - } - ) - ) + conf_file.write_text(yaml.dump({"recording_file": "/tmp/my-traj.jsonl"})) config = ModelServiceConfig.from_file(str(conf_file)) assert config.recording_file == "/tmp/my-traj.jsonl" + assert config.replay_file is None + + +@pytest.mark.asyncio +async def test_config_loads_replay_file_from_yaml(tmp_path): + conf_file = tmp_path / "proxy.yml" + conf_file.write_text(yaml.dump({"replay_file": "/tmp/in.jsonl"})) + config = ModelServiceConfig.from_file(str(conf_file)) assert config.replay_file == "/tmp/in.jsonl" + assert config.recording_file is None + + +def test_config_recording_and_replay_are_mutually_exclusive(): + """Setting both at construction time fails Pydantic validation.""" + with pytest.raises(ValueError, match="mutually exclusive"): + ModelServiceConfig(recording_file="/tmp/a.jsonl", replay_file="/tmp/b.jsonl") + + +def test_config_recording_replay_mutex_fires_on_assignment(): + """validate_assignment=True so CLI-style field-by-field overrides also trip the mutex.""" + config = ModelServiceConfig(recording_file="/tmp/a.jsonl") + with pytest.raises(ValueError, match="mutually exclusive"): + config.replay_file = "/tmp/b.jsonl" def test_cli_args_override_config_file(tmp_path): From 8448ef24cda380831ff9768b1d60ae0f0d682766 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 
2026 08:50:48 +0000 Subject: [PATCH 18/25] refactor(model-service): move _replay_sse_iter into ReplayBackend as a staticmethod Module-level function with a single call site inside ReplayBackend; the SSE chunk-emit shape is purely a replay-mode implementation detail. Moving it inside the class also makes the pairing with the JSON branch in serve() visible at a glance. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/api/proxy.py | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 6364d4a131..3ada8615c8 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -142,12 +142,6 @@ def _filter_headers(headers) -> dict[str, str]: return out -async def _replay_sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: - """Emit a recorded response as one SSE chunk + ``[DONE]``.""" - yield encode_sse_event(completion_to_chunk_dict(response, model=model)) - yield SSE_DONE - - async def _forward_stream_and_record( *, upstream_url: str, @@ -264,11 +258,17 @@ async def serve(self, *, model_name: str, is_stream: bool, **_: Any) -> Response if is_stream: return StreamingResponse( - _replay_sse_iter(response_dict, model=model_name), + self._sse_iter(response_dict, model=model_name), media_type="text/event-stream", ) return JSONResponse(status_code=200, content=response_dict) + @staticmethod + async def _sse_iter(response: dict, *, model: str) -> AsyncIterator[bytes]: + """Emit a recorded response as one SSE chunk + ``[DONE]``.""" + yield encode_sse_event(completion_to_chunk_dict(response, model=model)) + yield SSE_DONE + class ForwardBackend: """Forwards requests byte-for-byte to the upstream and optionally records the trajectory.""" From d169ebe004beb54ac8177ba4bba79e2e325f147f Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:52:26 +0000 Subject: [PATCH 19/25] refactor(model-service): move get_base_url 
into ForwardBackend as _resolve_base_url MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Single call site inside ForwardBackend.serve, and the function reads only self._config — drop the redundant config parameter and rename to _resolve_base_url to make the multi-source fallback (proxy_base_url → proxy_rules[model] → proxy_rules['default']) explicit. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/api/proxy.py | 49 +++++++++++++++--------------- 1 file changed, 24 insertions(+), 25 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index 3ada8615c8..b9f9910edd 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -106,30 +106,6 @@ async def _send_with_retry( raise last_exc # pragma: no cover # unreachable -def get_base_url(model_name: str, config: ModelServiceConfig) -> str: - """Pick the upstream base URL by model name. - - ``proxy_base_url`` takes precedence; falls back to ``proxy_rules[model]`` and - then ``proxy_rules["default"]``. Trailing slashes are stripped so the caller - can append ``/chat/completions`` directly. - """ - if config.proxy_base_url: - return config.proxy_base_url.rstrip("/") - - if not model_name: - raise HTTPException(status_code=400, detail="Model name is required for routing.") - - rules = config.proxy_rules - base_url = rules.get(model_name) or rules.get("default") - if not base_url: - raise HTTPException( - status_code=400, - detail=f"Model '{model_name}' is not configured and no 'default' rule found.", - ) - - return base_url.rstrip("/") - - def _filter_headers(headers) -> dict[str, str]: """Drop headers that are scoped to the client↔proxy hop or rebuilt by httpx. 
``Authorization`` is forwarded verbatim — proxy stays stateless about which @@ -277,6 +253,29 @@ def __init__(self, config: ModelServiceConfig, recorder: TrajectoryRecorder | No self._config = config self._recorder = recorder + def _resolve_base_url(self, model_name: str) -> str: + """Pick the upstream base URL by model name. + + ``proxy_base_url`` takes precedence; falls back to ``proxy_rules[model]`` and + then ``proxy_rules["default"]``. Trailing slashes are stripped so the caller + can append ``/chat/completions`` directly. + """ + if self._config.proxy_base_url: + return self._config.proxy_base_url.rstrip("/") + + if not model_name: + raise HTTPException(status_code=400, detail="Model name is required for routing.") + + rules = self._config.proxy_rules + base_url = rules.get(model_name) or rules.get("default") + if not base_url: + raise HTTPException( + status_code=400, + detail=f"Model '{model_name}' is not configured and no 'default' rule found.", + ) + + return base_url.rstrip("/") + async def serve( self, *, @@ -287,7 +286,7 @@ async def serve( request_dict: dict[str, Any], **_: Any, ) -> Response: - upstream_url = f"{get_base_url(model_name, self._config)}/chat/completions" + upstream_url = f"{self._resolve_base_url(model_name)}/chat/completions" logger.info(f"Routing model {model_name!r} to {upstream_url}") if is_stream: From 1d6b2d1b60dd09dd5142823805a8b000de63edba Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 08:59:43 +0000 Subject: [PATCH 20/25] refactor(model-service): move _forward_stream_and_record into ForwardBackend as _stream_and_record MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Same single-call-site argument as the previous moves: 3 of the 7 kwargs were just relaying self._config / self._recorder. As an instance method the parameter list drops to 4 and the streaming path mirrors the structure of ReplayBackend._sse_iter. 
_send_with_retry stays at module scope — it's a pure helper bound to the httpx.AsyncClient lifecycle, not to any backend state. Co-Authored-By: Claude Opus 4.7 --- rock/sdk/model/server/api/proxy.py | 190 ++++++++++++++--------------- 1 file changed, 92 insertions(+), 98 deletions(-) diff --git a/rock/sdk/model/server/api/proxy.py b/rock/sdk/model/server/api/proxy.py index b9f9910edd..73f74e3f62 100644 --- a/rock/sdk/model/server/api/proxy.py +++ b/rock/sdk/model/server/api/proxy.py @@ -118,100 +118,6 @@ def _filter_headers(headers) -> dict[str, str]: return out -async def _forward_stream_and_record( - *, - upstream_url: str, - body_bytes: bytes, - fwd_headers: dict[str, str], - timeout: float, - request_dict: dict[str, Any], - recorder: TrajectoryRecorder | None, - retryable_codes: list[int], -) -> AsyncIterator[bytes]: - """SSE bytes are forwarded verbatim; chunks are parsed in parallel and - aggregated into the final ChatCompletion that the recorder writes to JSONL. - - Retry on connection errors and whitelisted statuses happens BEFORE any byte - is yielded; mid-stream connection drops are not retried (would corrupt the - client transmission).""" - # openai SDK is used purely as a stream-aggregation parser — keep the import - # local so module load doesn't pull it in for callers that never stream. 
- from openai.lib.streaming.chat import ChatCompletionStreamState - from openai.types.chat import ChatCompletionChunk - - state = ChatCompletionStreamState() - start = time.time() - parse_buffer = b"" - upstream_status = 0 - - async with httpx.AsyncClient(timeout=timeout) as client: - try: - resp = await _send_with_retry( - client, - upstream_url, - body_bytes=body_bytes, - headers=fwd_headers, - retryable_codes=retryable_codes, - ) - except (httpx.TimeoutException, httpx.ConnectError) as exc: - if recorder is not None: - await recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"{type(exc).__name__}: {exc}", - ) - return - - try: - upstream_status = resp.status_code - async for chunk in resp.aiter_bytes(): - yield chunk - chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) - for chunk_dict in chunk_dicts: - try: - state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) - except Exception as exc: # parser error: forward continues, traj will be partial - logger.debug(f"[record] chunk parse failed (forward continues): {exc}") - except httpx.RequestError as exc: - # Connection died mid-stream — bytes already sent reach the client; - # record what we got and return. 
- if recorder is not None: - await recorder.record( - request=request_dict, - response=None, - status="failure", - start_time=start, - end_time=time.time(), - error=f"{type(exc).__name__}: {exc}", - ) - return - finally: - await resp.aclose() - - if recorder is None: - return - - status = "success" if upstream_status < 400 else "failure" - final_dict: dict | None = None - if status == "success": - try: - final_dict = state.get_final_completion().model_dump() - except Exception as exc: - logger.warning(f"[record] stream aggregation failed: {exc}") - - await recorder.record( - request=request_dict, - response=final_dict, - status=status, - start_time=start, - end_time=time.time(), - error=None if status == "success" else f"upstream_status={upstream_status}", - ) - - class ReplayBackend: """Serves requests from a pre-recorded trajectory; no upstream calls made.""" @@ -291,14 +197,11 @@ async def serve( if is_stream: return StreamingResponse( - _forward_stream_and_record( + self._stream_and_record( upstream_url=upstream_url, body_bytes=body_bytes, fwd_headers=fwd_headers, - timeout=self._config.request_timeout, request_dict=request_dict, - recorder=self._recorder, - retryable_codes=self._config.retryable_status_codes, ), media_type="text/event-stream", ) @@ -366,6 +269,97 @@ async def serve( # Forward bytes verbatim — preserves any provider-specific fields untouched. return Response(content=response_bytes, status_code=status_code, media_type=content_type) + async def _stream_and_record( + self, + *, + upstream_url: str, + body_bytes: bytes, + fwd_headers: dict[str, str], + request_dict: dict[str, Any], + ) -> AsyncIterator[bytes]: + """SSE bytes are forwarded verbatim; chunks are parsed in parallel and + aggregated into the final ChatCompletion that the recorder writes to JSONL. 
+ + Retry on connection errors and whitelisted statuses happens BEFORE any byte + is yielded; mid-stream connection drops are not retried (would corrupt the + client transmission).""" + # openai SDK is used purely as a stream-aggregation parser — keep the import + # local so module load doesn't pull it in for callers that never stream. + from openai.lib.streaming.chat import ChatCompletionStreamState + from openai.types.chat import ChatCompletionChunk + + state = ChatCompletionStreamState() + start = time.time() + parse_buffer = b"" + upstream_status = 0 + + async with httpx.AsyncClient(timeout=self._config.request_timeout) as client: + try: + resp = await _send_with_retry( + client, + upstream_url, + body_bytes=body_bytes, + headers=fwd_headers, + retryable_codes=self._config.retryable_status_codes, + ) + except (httpx.TimeoutException, httpx.ConnectError) as exc: + if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + + try: + upstream_status = resp.status_code + async for chunk in resp.aiter_bytes(): + yield chunk + chunk_dicts, parse_buffer = parse_sse_data_chunks(parse_buffer + chunk) + for chunk_dict in chunk_dicts: + try: + state.handle_chunk(ChatCompletionChunk.model_validate(chunk_dict)) + except Exception as exc: # parser error: forward continues, traj will be partial + logger.debug(f"[record] chunk parse failed (forward continues): {exc}") + except httpx.RequestError as exc: + # Connection died mid-stream — bytes already sent reach the client; + # record what we got and return. 
+ if self._recorder is not None: + await self._recorder.record( + request=request_dict, + response=None, + status="failure", + start_time=start, + end_time=time.time(), + error=f"{type(exc).__name__}: {exc}", + ) + return + finally: + await resp.aclose() + + if self._recorder is None: + return + + status = "success" if upstream_status < 400 else "failure" + final_dict: dict | None = None + if status == "success": + try: + final_dict = state.get_final_completion().model_dump() + except Exception as exc: + logger.warning(f"[record] stream aggregation failed: {exc}") + + await self._recorder.record( + request=request_dict, + response=final_dict, + status=status, + start_time=start, + end_time=time.time(), + error=None if status == "success" else f"upstream_status={upstream_status}", + ) + CompletionBackend = ReplayBackend | ForwardBackend From 55475496b83652596a6f4bc78508b263b2291ae5 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 09:03:27 +0000 Subject: [PATCH 21/25] docs(model-service): move + rewrite proxy README under docs/dev/model-service/ Old examples/model_service/README.md was stale: still mentioned litellm, StandardLoggingPayload, --num-retries, and the conflated --traj-file flag. Rewritten to reflect current shape: ForwardBackend / ReplayBackend pair, recording_file / replay_file mutex, retry-on-status-code with the documented attempt budget, openai SDK only used as the stream-state aggregator behind the forwarding path. Also calls out that the rock model-service start subcommand hasn't been wired up with the new flags yet. 
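A rough sketch of the attempt budget the rewritten doc describes: up to 6 attempts, backoff starting at 2s and doubling, and the final upstream status passed through unchanged. This is not the real `_send_with_retry`. The names `backoff_delays` and `send_with_retry`, the injected `send` / `sleep` callables, and the jitter shape are all hypothetical, and the sketch assumes the budget counts attempts rather than retries.

```python
from __future__ import annotations

import random
from typing import Callable


def backoff_delays(
    max_attempts: int = 6,
    base: float = 2.0,
    factor: float = 2.0,
    jitter: float = 0.0,
    rng: random.Random | None = None,
) -> list[float]:
    """Delays slept between attempts; there are max_attempts - 1 of them."""
    rng = rng or random.Random()
    # Assumed jitter shape: a uniform extra delay in [0, jitter] per step.
    return [base * factor**i + rng.uniform(0.0, jitter) for i in range(max_attempts - 1)]


def send_with_retry(
    send: Callable[[], int],
    retryable_codes: set[int],
    max_attempts: int = 6,
    sleep: Callable[[float], None] = lambda _delay: None,
) -> tuple[int, int]:
    """Call send() until the status is not retryable or the budget runs out.

    Returns (final_status, attempts_used). A status that is still retryable on
    the last attempt is returned as-is instead of being wrapped in 502/504.
    """
    delays = backoff_delays(max_attempts)
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in retryable_codes or attempt == max_attempts:
            return status, attempt
        sleep(delays[attempt - 1])
    return status, max_attempts  # not reached when max_attempts >= 1
```

With zero jitter the schedule is 2, 4, 8, 16, 32 seconds. For streaming requests the same loop would have to finish before the first byte is yielded to the client, since bytes already sent cannot be retried.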
Co-Authored-By: Claude Opus 4.7 --- docs/dev/model-service/proxy.md | 155 +++++++++++++++++++++++++++++++ examples/model_service/README.md | 90 ------------------ 2 files changed, 155 insertions(+), 90 deletions(-) create mode 100644 docs/dev/model-service/proxy.md delete mode 100644 examples/model_service/README.md diff --git a/docs/dev/model-service/proxy.md b/docs/dev/model-service/proxy.md new file mode 100644 index 0000000000..bcf8a64266 --- /dev/null +++ b/docs/dev/model-service/proxy.md @@ -0,0 +1,155 @@ +# model-service `proxy` 模式 + +`rock model-service` 的 proxy 模式在 `/v1/chat/completions` 上提供一个 OpenAI 兼容的转发层, +两种工作模式互斥: + +| 模式 | 触发条件 | 上游调用 | 写盘 | +|-----------|---------------------------------------|----------|----------------------| +| Recording | 默认 | 真实调用 | append 到 JSONL traj | +| Replay | `--replay-file` / `replay_file` 设置 | 不调用 | 不写 | + +设计目标是让 SWE-agent / mini-swe-agent / OpenHands 等 agent 框架在录制 → 回放之间无感切换: +agent 不变,只换 base URL。 + +下文所有命令以 `python -m rock.sdk.model.server.main` 启动。注意 `rock model-service start` +子命令目前**还没**对外暴露 `--recording-file` / `--replay-file`(CLI argparse 在 +[rock/cli/command/model_service.py](../../../rock/cli/command/model_service.py) 单独定义), +所以涉及录制/回放的场景必须走 `python -m` 入口。 + +--- + +## 1. Recording(默认) + +转发到单个上游,每次调用 append 一行 JSONL 到 `recording_file`(缺省 `LOG_DIR/LLMTraj.jsonl`, +其中 `LOG_DIR = $ROCK_MODEL_SERVICE_DATA_DIR`): + +```bash +export OPENAI_API_KEY="sk-..." 
+export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj + +python -m rock.sdk.model.server.main \ + --type proxy \ + --proxy-base-url https://api.openai.com/v1 \ + --port 8080 +``` + +调用: + +```bash +curl -X POST http://127.0.0.1:8080/v1/chat/completions \ + -H "Authorization: Bearer $OPENAI_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hi"}]}' + +cat /tmp/rock-traj/LLMTraj.jsonl | jq '.model, .response.choices[0].message.content' +``` + +流式同样支持,上游字节原样转给客户端,recorder 在后台聚合最终的 `ChatCompletion` 写盘 +(用 openai SDK 的 `ChatCompletionStreamState`,所以 `tool_calls.function.arguments` 等 +跨 chunk 拼接的字段会被还原成完整形态): + +```bash +curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \ + -H "Authorization: Bearer $OPENAI_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}' +``` + +显式指定写到别的路径: + +```bash +python -m rock.sdk.model.server.main \ + --type proxy \ + --proxy-base-url https://api.openai.com/v1 \ + --recording-file /tmp/my-session.jsonl \ + --port 8080 +``` + +--- + +## 2. Replay + +把 `--replay-file` 指到一个录好的 jsonl,proxy 不再访问真实 LLM,按录制顺序返回响应; +agent 把 base URL 换成 `http://127.0.0.1:8081/v1` 即可重放: + +```bash +python -m rock.sdk.model.server.main \ + --type proxy \ + --replay-file /tmp/rock-traj/LLMTraj.jsonl \ + --port 8081 +``` + +行为细节: + +- cursor 单调推进,每次请求消耗一条记录;用尽后返回 **404**。 +- 流式请求会拿录制的 `ChatCompletion` 重新发一帧 SSE chunk + `[DONE]`。 + `tool_calls` 的 `index` 字段会被自动注入(OpenAI 的流式协议要求 chunk delta 上有 `index`, + 但录制态的 `message.tool_calls` 没有)。 +- request 里的 `model` 会跟录制的 `model` 比对,不一致只打 warning,不阻断。 + +`recording_file` 和 `replay_file` 是**互斥**的——同时配置(无论是 CLI 还是 YAML)会在启动时 +被 Pydantic `model_validator` 拦下并报 `ValidationError`,避免"录到一半把源文件覆盖"这类隐性 bug。 + +--- + +## 3. 
重试和超时 + +- 默认对 connection error / timeout 和 `retryable_status_codes`(默认 `[429, 500]`)触发重试, + 最多 6 次,指数退避 2s 起步 ×2 + 抖动;最后一次仍失败时把上游响应原样转给客户端 + (**不**包装成 502/504,让 agent 自己看到真实状态码)。 +- 对**流式**请求,重试只发生在第一个字节抵达客户端**之前**——一旦字节流开始转发, + 连接中断不会重试(已发出去的字节无法收回)。 + +```bash +python -m rock.sdk.model.server.main \ + --type proxy \ + --proxy-base-url https://api.openai.com/v1 \ + --retryable-status-codes 429,500,502,503 \ + --request-timeout 60 \ + --port 8080 +``` + +--- + +## 4. 多模型路由(YAML) + +按 model name 分流到不同上游需要 YAML(CLI 只暴露单一 `--proxy-base-url`)。新建 `routes.yaml`: + +```yaml +proxy_rules: + gpt-3.5-turbo: "https://api.openai.com/v1" + gpt-4o: "https://api.openai.com/v1" + default: "https://api-inference.modelscope.cn/v1" + +retryable_status_codes: [429, 500, 502] +request_timeout: 60 +recording_file: /tmp/rock-traj/multi.jsonl +``` + +启动: + +```bash +python -m rock.sdk.model.server.main \ + --type proxy \ + --config-file routes.yaml \ + --port 8080 +``` + +CLI flag(`--proxy-base-url` / `--port` / `--retryable-status-codes` / ...)覆盖 YAML 同名字段。 +路由解析顺序:`proxy_base_url` → `proxy_rules[model]` → `proxy_rules["default"]`,都没有则 400。 + +--- + +## 5. 
实现要点(仅供参考) + +- `chat_completions` endpoint 把请求分发给 `app.state.backend`,后者要么是 `ForwardBackend` + 要么是 `ReplayBackend`,由启动时的 `_configure_proxy_integrations` 根据 `replay_file` + 是否设置二选一注入。 +- `ForwardBackend` 走 httpx 字节透传:non-stream 是 `await resp.aread()`,stream 是 + `resp.aiter_bytes()` 直接 yield 给客户端,**不**经过任何 SDK 的反序列化/再序列化,所以上游 + 返回的 `reasoning_content` / `provider_specific_fields` 等任意 vendor 字段都不会被吃掉。 + recorder 在另一条独立路径上把字节流喂给 openai SDK 的 stream-state aggregator,仅用于写盘。 +- `ReplayBackend` 完全本地,不持有 httpx client。 + +更深入的代码导览看 [rock/sdk/model/server/api/proxy.py](../../../rock/sdk/model/server/api/proxy.py) +顶部的 module docstring。 diff --git a/examples/model_service/README.md b/examples/model_service/README.md deleted file mode 100644 index 7a169764fe..0000000000 --- a/examples/model_service/README.md +++ /dev/null @@ -1,90 +0,0 @@ -# model-service proxy 用法示例 - -`rock model-service` 的 `proxy` 模式把 `/v1/chat/completions` 转发到上游 LLM,并把每次调用以 -`StandardLoggingPayload` 格式 append 到 JSONL traj 文件。配合 `--traj-file` 可以让相同 base URL 的 -agent(SWE-agent / mini-swe-agent / OpenHands)从录制的 traj 回放,实现"无 LLM 成本"调试。 - -下面所有命令都用 `python -m rock.sdk.model.server.main` 启动,等价于 `rock model-service start`。 - -## 1. Record 模式(默认) - -转发到单个上游,每次调用 append 到 `LOG_DIR/LLMTraj.jsonl`: - -```bash -export OPENAI_API_KEY="sk-..." 
-export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj # traj 文件落盘根目录 - -python -m rock.sdk.model.server.main \ - --type proxy \ - --proxy-base-url https://api.openai.com/v1 \ - --port 8080 -``` - -调用: - -```bash -curl -X POST http://127.0.0.1:8080/v1/chat/completions \ - -H "Authorization: Bearer $OPENAI_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hi"}]}' - -# 查看 traj -cat /tmp/rock-traj/LLMTraj.jsonl | jq '.id, .model, .response.choices[0].message.content' -``` - -支持流式(litellm 自动聚合写入 traj): - -```bash -curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \ - -H "Authorization: Bearer $OPENAI_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}' -``` - -## 2. Replay 模式 - -把 `--traj-file` 指到一个录好的 jsonl,proxy 不再访问真实 LLM,按录制顺序返回响应: - -```bash -python -m rock.sdk.model.server.main \ - --type proxy \ - --traj-file /tmp/rock-traj/LLMTraj.jsonl \ - --port 8081 -``` - -agent 把 base URL 换成 `http://127.0.0.1:8081/v1` 即可重放,cursor 用尽后返回 404。 -`--traj-file` 必须是单个 jsonl 文件路径。 - -## 3. 调整重试和超时 - -```bash -python -m rock.sdk.model.server.main \ - --type proxy \ - --proxy-base-url https://api.openai.com/v1 \ - --num-retries 3 \ - --request-timeout 60 \ - --port 8080 -``` - -## 4. 
多模型路由(需要 YAML) - -只有在按 model name 分流到不同上游时才需要 YAML(CLI 只暴露单一 `--proxy-base-url`)。新建 -`routes.yaml`: - -```yaml -proxy_rules: - gpt-3.5-turbo: "https://api.openai.com/v1" - gpt-4o: "https://api.openai.com/v1" - default: "https://api-inference.modelscope.cn/v1" -``` - -启动时配合 CLI: - -```bash -python -m rock.sdk.model.server.main \ - --type proxy \ - --config-file routes.yaml \ - --port 8080 -``` - -CLI 上指定的 `--proxy-base-url` / `--port` / `--num-retries` 等仍会覆盖 YAML 的同名字段。 From d65269c8a511a2167162daed516389ff7337c0b1 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 09:09:40 +0000 Subject: [PATCH 22/25] feat(model-service): expose --recording-file / --replay-file on rock model-service start MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The two flags previously existed only on the python -m rock.sdk.model.server.main entry; rock model-service start (which subprocess-spawns that same module) had no way to thread them through, forcing users to bypass the CLI for any record/replay scenario. Wire them through ModelServiceCommand argparse → ModelService.start → start_sandbox_service → subprocess argv. Add three tests around the argv construction (default omits both flags, recording_file forwarded, replay_file forwarded). Doc updated: drop the python -m caveat and switch all example commands to rock model-service start. 
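The wiring chain named above, condensed into one place as a sketch. It collapses ModelServiceCommand, ModelService.start and start_sandbox_service into two hypothetical helpers (`make_parser`, `build_argv`); using `sys.executable` as argv[0] and `local` as the default type are assumptions, and only the two new flags are modelled.

```python
from __future__ import annotations

import argparse
import sys


def make_parser() -> argparse.ArgumentParser:
    """Stand-in for the `start` sub-parser; both new flags default to None."""
    parser = argparse.ArgumentParser(prog="rock-sketch")
    parser.add_argument("--type", dest="service_type", default="local")
    parser.add_argument("--recording-file", type=str, default=None)
    parser.add_argument("--replay-file", type=str, default=None)
    return parser


def build_argv(ns: argparse.Namespace) -> list[str]:
    """Forward a flag to the child process only when the user set it."""
    cmd = [sys.executable, "-m", "main", "--type", ns.service_type]
    if ns.recording_file:
        cmd.extend(["--recording-file", ns.recording_file])
    if ns.replay_file:
        cmd.extend(["--replay-file", ns.replay_file])
    return cmd
```

Leaving unset flags out of the argv keeps the defaults, and the recording/replay mutex, owned by the child's config layer rather than duplicated in the CLI.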
Co-Authored-By: Claude Opus 4.7 --- docs/dev/model-service/proxy.md | 17 ++++++------- rock/cli/command/model_service.py | 14 ++++++++++ rock/sdk/model/service.py | 10 ++++++++ tests/unit/sdk/model/test_service.py | 38 ++++++++++++++++++++++++++++ 4 files changed, 70 insertions(+), 9 deletions(-) create mode 100644 tests/unit/sdk/model/test_service.py diff --git a/docs/dev/model-service/proxy.md b/docs/dev/model-service/proxy.md index bcf8a64266..b0f77da0f3 100644 --- a/docs/dev/model-service/proxy.md +++ b/docs/dev/model-service/proxy.md @@ -11,10 +11,9 @@ 设计目标是让 SWE-agent / mini-swe-agent / OpenHands 等 agent 框架在录制 → 回放之间无感切换: agent 不变,只换 base URL。 -下文所有命令以 `python -m rock.sdk.model.server.main` 启动。注意 `rock model-service start` -子命令目前**还没**对外暴露 `--recording-file` / `--replay-file`(CLI argparse 在 -[rock/cli/command/model_service.py](../../../rock/cli/command/model_service.py) 单独定义), -所以涉及录制/回放的场景必须走 `python -m` 入口。 +下文所有命令以 `rock model-service start` 启动;该子命令最终会 `subprocess` 拉起 +`rock.sdk.model.server.main`,两者支持的 flag 一致。直接调试时也可以用 +`python -m rock.sdk.model.server.main` 跳过 PID 文件管理。 --- @@ -27,7 +26,7 @@ agent 不变,只换 base URL。 export OPENAI_API_KEY="sk-..." 
export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj -python -m rock.sdk.model.server.main \ +rock model-service start \ --type proxy \ --proxy-base-url https://api.openai.com/v1 \ --port 8080 @@ -58,7 +57,7 @@ curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \ 显式指定写到别的路径: ```bash -python -m rock.sdk.model.server.main \ +rock model-service start \ --type proxy \ --proxy-base-url https://api.openai.com/v1 \ --recording-file /tmp/my-session.jsonl \ @@ -73,7 +72,7 @@ python -m rock.sdk.model.server.main \ agent 把 base URL 换成 `http://127.0.0.1:8081/v1` 即可重放: ```bash -python -m rock.sdk.model.server.main \ +rock model-service start \ --type proxy \ --replay-file /tmp/rock-traj/LLMTraj.jsonl \ --port 8081 @@ -101,7 +100,7 @@ python -m rock.sdk.model.server.main \ 连接中断不会重试(已发出去的字节无法收回)。 ```bash -python -m rock.sdk.model.server.main \ +rock model-service start \ --type proxy \ --proxy-base-url https://api.openai.com/v1 \ --retryable-status-codes 429,500,502,503 \ @@ -129,7 +128,7 @@ recording_file: /tmp/rock-traj/multi.jsonl 启动: ```bash -python -m rock.sdk.model.server.main \ +rock model-service start \ --type proxy \ --config-file routes.yaml \ --port 8080 diff --git a/rock/cli/command/model_service.py b/rock/cli/command/model_service.py index 87e6ca60e6..03cc59582d 100644 --- a/rock/cli/command/model_service.py +++ b/rock/cli/command/model_service.py @@ -82,6 +82,8 @@ async def arun(self, args: argparse.Namespace): proxy_base_url=args.proxy_base_url, retryable_status_codes=args.retryable_status_codes, request_timeout=args.request_timeout, + recording_file=args.recording_file, + replay_file=args.replay_file, ) logger.info(f"model service started, pid: {pid}") with open(self.DEFAULT_MODEL_SERVICE_PID_FILE, "w") as f: @@ -178,6 +180,18 @@ async def add_parser_to(subparsers: argparse._SubParsersAction): default=None, help="Request timeout in seconds. 
Overrides config file.", ) + start_parser.add_argument( + "--recording-file", + type=str, + default=None, + help="Proxy mode only: where to write the trajectory JSONL. Defaults to LOG_DIR/LLMTraj.jsonl.", + ) + start_parser.add_argument( + "--replay-file", + type=str, + default=None, + help="Proxy mode only: replay from a recorded .jsonl traj file. Mutually exclusive with --recording-file.", + ) watch_agent_parser = model_service_subparsers.add_parser( "watch-agent", diff --git a/rock/sdk/model/service.py b/rock/sdk/model/service.py index b1b523ed27..24cd7ede38 100644 --- a/rock/sdk/model/service.py +++ b/rock/sdk/model/service.py @@ -17,6 +17,8 @@ def start_sandbox_service( proxy_base_url: str | None = None, retryable_status_codes: str | None = None, request_timeout: int | None = None, + recording_file: str | None = None, + replay_file: str | None = None, ) -> subprocess.Popen: """start sandbox service""" current_file = Path(__file__).resolve() @@ -38,6 +40,10 @@ def start_sandbox_service( cmd.extend(["--retryable-status-codes", retryable_status_codes]) if request_timeout: cmd.extend(["--request-timeout", str(request_timeout)]) + if recording_file: + cmd.extend(["--recording-file", recording_file]) + if replay_file: + cmd.extend(["--replay-file", replay_file]) process = subprocess.Popen(cmd, cwd=str(service_dir)) return process @@ -51,6 +57,8 @@ async def start( proxy_base_url: str | None = None, retryable_status_codes: str | None = None, request_timeout: int | None = None, + recording_file: str | None = None, + replay_file: str | None = None, ) -> str: process = self.start_sandbox_service( model_service_type=model_service_type, @@ -60,6 +68,8 @@ async def start( proxy_base_url=proxy_base_url, retryable_status_codes=retryable_status_codes, request_timeout=request_timeout, + recording_file=recording_file, + replay_file=replay_file, ) pid = process.pid diff --git a/tests/unit/sdk/model/test_service.py b/tests/unit/sdk/model/test_service.py new file mode 100644 index 
0000000000..61176173bd --- /dev/null +++ b/tests/unit/sdk/model/test_service.py @@ -0,0 +1,38 @@ +"""Tests for ModelService.start_sandbox_service subprocess command construction. + +Covers the CLI flag wiring without actually spawning a subprocess: mock Popen +and inspect the argv it would have been called with. +""" + +from unittest.mock import patch + +from rock.sdk.model.service import ModelService + + +def _captured_argv(**start_kwargs) -> list[str]: + with patch("rock.sdk.model.service.subprocess.Popen") as mock_popen: + ModelService().start_sandbox_service(**start_kwargs) + return mock_popen.call_args[0][0] + + +def test_start_sandbox_service_omits_recording_and_replay_flags_by_default(): + argv = _captured_argv(model_service_type="proxy", proxy_base_url="https://api.openai.com/v1", port=8080) + assert argv[1:5] == ["-m", "main", "--type", "proxy"] + assert "--proxy-base-url" in argv and "https://api.openai.com/v1" in argv + assert "--port" in argv and "8080" in argv + assert "--recording-file" not in argv + assert "--replay-file" not in argv + + +def test_start_sandbox_service_passes_recording_file(): + argv = _captured_argv(model_service_type="proxy", recording_file="/tmp/my-traj.jsonl") + idx = argv.index("--recording-file") + assert argv[idx + 1] == "/tmp/my-traj.jsonl" + assert "--replay-file" not in argv + + +def test_start_sandbox_service_passes_replay_file(): + argv = _captured_argv(model_service_type="proxy", replay_file="/tmp/in.jsonl") + idx = argv.index("--replay-file") + assert argv[idx + 1] == "/tmp/in.jsonl" + assert "--recording-file" not in argv From 5cf33bd2ca504ac1f2911b651376637d56b6f367 Mon Sep 17 00:00:00 2001 From: "pengshixin.psx" Date: Tue, 12 May 2026 09:12:06 +0000 Subject: [PATCH 23/25] test(model-service): add CLI-layer coverage for --recording-file / --replay-file wiring MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The existing tests/unit/sdk/model/test_service.py covers the subprocess 
argv construction (catches cmd-string typos like --recording_file vs
--recording-file) but mocks nothing above the SDK layer, so a missing
kwarg in ModelServiceCommand.arun would slip through.

Add tests/unit/cli/command/test_model_service.py mirroring the
test_job.py pattern: drive the real argparse sub-parser end-to-end and
mock ModelService.start to assert the kwargs it receives. Covers the new
flags both in isolation and in their default (omitted) state.

Two layers, two bug surfaces — together they cover the full path from
CLI argv to subprocess argv.

Co-Authored-By: Claude Opus 4.7
---
 tests/unit/cli/command/test_model_service.py | 120 +++++++++++++++++++
 1 file changed, 120 insertions(+)
 create mode 100644 tests/unit/cli/command/test_model_service.py

diff --git a/tests/unit/cli/command/test_model_service.py b/tests/unit/cli/command/test_model_service.py
new file mode 100644
index 0000000000..86849c718b
--- /dev/null
+++ b/tests/unit/cli/command/test_model_service.py
@@ -0,0 +1,120 @@
+"""Unit tests for rock.cli.command.model_service.ModelServiceCommand.
+
+Drive the sub-parser end-to-end with argparse so the surface that users
+actually type at the terminal is what we exercise. ``ModelService.start`` is
+mocked — these tests assert wiring (argparse → handler → SDK call), not the
+subprocess command construction (covered separately in
+tests/unit/sdk/model/test_service.py).
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+from unittest.mock import AsyncMock
+
+import pytest
+
+from rock.cli.command.model_service import ModelServiceCommand
+
+
+def _build_parser() -> argparse.ArgumentParser:
+    """Top-level parser with `model-service` subcommand wired in, same as the CLI."""
+    top = argparse.ArgumentParser(prog="rock")
+    subparsers = top.add_subparsers(dest="command")
+    asyncio.run(ModelServiceCommand.add_parser_to(subparsers))
+    return top
+
+
+@pytest.fixture
+def isolate_pid_file(monkeypatch, tmp_path):
+    """Redirect PID dir/file into tmp so arun() doesn't touch ./data/cli/model."""
+    monkeypatch.setattr(ModelServiceCommand, "DEFAULT_MODEL_SERVICE_DIR", str(tmp_path))
+    monkeypatch.setattr(ModelServiceCommand, "DEFAULT_MODEL_SERVICE_PID_FILE", str(tmp_path / "pid.txt"))
+
+
+@pytest.fixture
+def fake_start(monkeypatch):
+    """Replace ModelService.start with an AsyncMock returning a fixed pid."""
+    mock = AsyncMock(return_value="12345")
+    monkeypatch.setattr("rock.cli.command.model_service.ModelService.start", mock)
+    return mock
+
+
+# ---------- argparse: the new flags must parse ----------
+
+
+def test_recording_file_flag_parses():
+    parser = _build_parser()
+    ns = parser.parse_args(["model-service", "start", "--type", "proxy", "--recording-file", "/tmp/out.jsonl"])
+    assert ns.recording_file == "/tmp/out.jsonl"
+    assert ns.replay_file is None
+
+
+def test_replay_file_flag_parses():
+    parser = _build_parser()
+    ns = parser.parse_args(["model-service", "start", "--type", "proxy", "--replay-file", "/tmp/in.jsonl"])
+    assert ns.replay_file == "/tmp/in.jsonl"
+    assert ns.recording_file is None
+
+
+def test_neither_flag_defaults_to_none():
+    parser = _build_parser()
+    ns = parser.parse_args(["model-service", "start", "--type", "proxy"])
+    assert ns.recording_file is None
+    assert ns.replay_file is None
+
+
+# ---------- handler: passes parsed args through to ModelService.start ----------
+
+
+def test_start_handler_forwards_recording_file(isolate_pid_file, fake_start):
+    parser = _build_parser()
+    ns = parser.parse_args(
+        [
+            "model-service",
+            "start",
+            "--type",
+            "proxy",
+            "--proxy-base-url",
+            "https://api.openai.com/v1",
+            "--recording-file",
+            "/tmp/out.jsonl",
+        ]
+    )
+    asyncio.run(ModelServiceCommand().arun(ns))
+
+    kwargs = fake_start.call_args.kwargs
+    assert kwargs["recording_file"] == "/tmp/out.jsonl"
+    assert kwargs["replay_file"] is None
+    assert kwargs["proxy_base_url"] == "https://api.openai.com/v1"
+    assert kwargs["model_service_type"] == "proxy"
+
+
+def test_start_handler_forwards_replay_file(isolate_pid_file, fake_start):
+    parser = _build_parser()
+    ns = parser.parse_args(
+        [
+            "model-service",
+            "start",
+            "--type",
+            "proxy",
+            "--replay-file",
+            "/tmp/in.jsonl",
+        ]
+    )
+    asyncio.run(ModelServiceCommand().arun(ns))
+
+    kwargs = fake_start.call_args.kwargs
+    assert kwargs["replay_file"] == "/tmp/in.jsonl"
+    assert kwargs["recording_file"] is None
+
+
+def test_start_handler_omits_both_when_unset(isolate_pid_file, fake_start):
+    parser = _build_parser()
+    ns = parser.parse_args(["model-service", "start", "--type", "proxy"])
+    asyncio.run(ModelServiceCommand().arun(ns))
+
+    kwargs = fake_start.call_args.kwargs
+    assert kwargs["recording_file"] is None
+    assert kwargs["replay_file"] is None

From dd35b6dafb5d42e5e17bbdde91a2868d1c82ce8a Mon Sep 17 00:00:00 2001
From: "pengshixin.psx"
Date: Tue, 12 May 2026 09:28:21 +0000
Subject: [PATCH 24/25] =?UTF-8?q?test(model-service):=20rename=20test=5Fse?=
 =?UTF-8?q?rvice.py=20=E2=86=92=20test=5Fservice=5Fsubprocess.py=20to=20fi?=
 =?UTF-8?q?x=20CI=20collision?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

tests/integration/envhub/test_service.py shares the same basename, and
pytest's default importmode (prepend) collapses both into a single
'test_service' module in sys.modules, so collection fails as soon as
both files are picked up:

  import file mismatch:
  imported module 'test_service' has this __file__ attribute:
    .../envhub/test_service.py
  which is not the same as the test file we want to collect:
    .../sdk/model/test_service.py

Renaming the new file is the smallest fix and keeps importmode=prepend
behavior unchanged for the rest of the suite. The new name also
describes the file better (it tests how start_sandbox_service builds
the subprocess argv).

Co-Authored-By: Claude Opus 4.7
---
 .../sdk/model/{test_service.py => test_service_subprocess.py} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename tests/unit/sdk/model/{test_service.py => test_service_subprocess.py} (100%)

diff --git a/tests/unit/sdk/model/test_service.py b/tests/unit/sdk/model/test_service_subprocess.py
similarity index 100%
rename from tests/unit/sdk/model/test_service.py
rename to tests/unit/sdk/model/test_service_subprocess.py

From bc52c25c46116f3be89c8293d7ac05760be9f178 Mon Sep 17 00:00:00 2001
From: "pengshixin.psx"
Date: Tue, 12 May 2026 09:31:48 +0000
Subject: [PATCH 25/25] =?UTF-8?q?test(model-service):=20rename=20test=5Fpr?=
 =?UTF-8?q?oxy=5Frecord=5Freplay.py=20=E2=86=92=20...=5Fe2e.py?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The file is the only one in the model-service unit suite that boots a
real uvicorn upstream in a background thread and drives the proxy
through real HTTP — append _e2e to make that scope obvious in the file
name. It stays under tests/unit/ because the project's integration/
tier is reserved for tests requiring out-of-process services (Docker,
Ray, admin), which this one doesn't.

Co-Authored-By: Claude Opus 4.7
---
 ...est_proxy_record_replay.py => test_proxy_record_replay_e2e.py} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename tests/unit/sdk/model/{test_proxy_record_replay.py => test_proxy_record_replay_e2e.py} (100%)

diff --git a/tests/unit/sdk/model/test_proxy_record_replay.py b/tests/unit/sdk/model/test_proxy_record_replay_e2e.py
similarity index 100%
rename from tests/unit/sdk/model/test_proxy_record_replay.py
rename to tests/unit/sdk/model/test_proxy_record_replay_e2e.py