Merged
25 commits
55ec314
refactor(model-service): rebuild proxy on litellm SDK with traj recor…
BCeZn May 12, 2026
a8fb54c
fix(model-service): pass api_key + use custom_openai prefix + suppres…
BCeZn May 12, 2026
11852ae
refactor(model-service): drop CustomLLM, serve replay directly from c…
BCeZn May 12, 2026
7a4b37f
refactor(model-service): drop litellm, use httpx byte-passthrough + o…
BCeZn May 12, 2026
164e427
refactor: use openai sdk
BCeZn May 12, 2026
a3459c9
chore(model-service): remove litellm remnants (num_retries, stale com…
BCeZn May 12, 2026
f125d33
refactor(model-service): split proxy handler into _ReplayBackend / _F…
BCeZn May 12, 2026
4b7a35a
refactor(model-service): drop _ prefix from public Backend classes, r…
BCeZn May 12, 2026
0e5cb20
feat(model-service): restore retry on retryable_status_codes + connec…
BCeZn May 12, 2026
1b20a37
test(model-service): add e2e tests against an in-thread uvicorn mock …
BCeZn May 12, 2026
6a36702
fix(model-service): inject positional index into replay-stream tool_c…
BCeZn May 12, 2026
79c868e
test(model-service): refactor proxy e2e into MockUpstream + TestProxy…
BCeZn May 12, 2026
c7bbd5a
chore: remove useless comment in pyproject.toml
BCeZn May 12, 2026
56a644c
chore: remove useless dev docs
BCeZn May 12, 2026
0679d32
refactor(model-service): flatten layout — drop integrations/, rename …
BCeZn May 12, 2026
dcd7905
refactor(model-service): rename traj_file→recording_file, replay_traj…
BCeZn May 12, 2026
b87b61d
feat(model-service): enforce recording_file/replay_file mutex via mod…
BCeZn May 12, 2026
8448ef2
refactor(model-service): move _replay_sse_iter into ReplayBackend as …
BCeZn May 12, 2026
d169ebe
refactor(model-service): move get_base_url into ForwardBackend as _re…
BCeZn May 12, 2026
1d6b2d1
refactor(model-service): move _forward_stream_and_record into Forward…
BCeZn May 12, 2026
5547549
docs(model-service): move + rewrite proxy README under docs/dev/model…
BCeZn May 12, 2026
d65269c
feat(model-service): expose --recording-file / --replay-file on rock …
BCeZn May 12, 2026
5cf33bd
test(model-service): add CLI-layer coverage for --recording-file / --…
BCeZn May 12, 2026
dd35b6d
test(model-service): rename test_service.py → test_service_subprocess…
BCeZn May 12, 2026
bc52c25
test(model-service): rename test_proxy_record_replay.py → ..._e2e.py
BCeZn May 12, 2026
154 changes: 154 additions & 0 deletions docs/dev/model-service/proxy.md
@@ -0,0 +1,154 @@
# model-service `proxy` mode

The proxy mode of `rock model-service` provides an OpenAI-compatible forwarding layer on `/v1/chat/completions`,
with two mutually exclusive modes of operation:

| Mode      | Trigger                                  | Upstream call | Written to disk           |
|-----------|------------------------------------------|---------------|---------------------------|
| Recording | Default                                  | Real call     | Appended to a JSONL traj  |
| Replay    | `--replay-file` / `replay_file` is set   | Not called    | Nothing written           |

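The design goal is to let agent frameworks such as SWE-agent / mini-swe-agent / OpenHands switch transparently
between recording and replay: the agent stays the same, only its base URL changes.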

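All commands below use `rock model-service start`; this subcommand ultimately launches
`rock.sdk.model.server.main` via `subprocess`, and both accept the same flags. For direct debugging you can
also run `python -m rock.sdk.model.server.main`, which skips PID-file management.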

---

## 1. Recording (default)

Requests are forwarded to a single upstream, and each call appends one JSONL line to `recording_file`
(default `LOG_DIR/LLMTraj.jsonl`, where `LOG_DIR = $ROCK_MODEL_SERVICE_DATA_DIR`):

```bash
export OPENAI_API_KEY="sk-..."
export ROCK_MODEL_SERVICE_DATA_DIR=/tmp/rock-traj

rock model-service start \
  --type proxy \
  --proxy-base-url https://api.openai.com/v1 \
  --port 8080
```

Send a request:

```bash
curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hi"}]}'

jq '.model, .response.choices[0].message.content' /tmp/rock-traj/LLMTraj.jsonl
```
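Conceptually, the recording file is an append-only log: one JSON object per upstream call, one object per line. The sketch below illustrates that pattern under stated assumptions: the record shape (top-level `model` and `response` keys) is inferred from the `jq` query above and the real schema may hold more fields, and `append_record` / `read_records` are hypothetical helper names, not part of the ROCK codebase.

```python
import json
import os
import tempfile
from typing import Any, Iterator


def append_record(path: str, record: dict[str, Any]) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # json.dumps escapes embedded newlines, so each record stays on one line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path: str) -> Iterator[dict[str, Any]]:
    """Yield records in append order, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    traj = os.path.join(tempfile.mkdtemp(), "LLMTraj.jsonl")
    append_record(traj, {
        "model": "gpt-3.5-turbo",
        "response": {"choices": [{"message": {"role": "assistant", "content": "Hi!"}}]},
    })
    for rec in read_records(traj):
        # The same two fields the jq query above extracts.
        print(rec["model"], rec["response"]["choices"][0]["message"]["content"])
```

Appending (mode `"a"`) rather than rewriting the file is what makes a crash mid-session lose at most the in-flight call.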

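Streaming is supported as well: upstream bytes are forwarded to the client unchanged, while the recorder
aggregates the final `ChatCompletion` in the background and writes it to disk. It uses the openai SDK's
`ChatCompletionStreamState`, so fields that are split across chunks, such as `tool_calls.function.arguments`,
are restored to their complete form: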

```bash
curl -N -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","stream":true,"messages":[{"role":"user","content":"count to 5"}]}'
```
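The aggregation step can be pictured as folding `chat.completion.chunk` deltas into one message: string fragments are concatenated, and tool calls are merged by their positional `index`. This is a simplified, hypothetical stand-in over plain dicts so it runs without dependencies; it is **not** the openai SDK's `ChatCompletionStreamState` (which the proxy actually uses and which covers more of the protocol). It also parses a fully buffered SSE byte string, whereas a real stream arrives in arbitrary splits.

```python
import json
from typing import Any, Iterator


def iter_sse_json(raw: bytes) -> Iterator[dict[str, Any]]:
    """Yield JSON payloads from buffered SSE bytes, stopping at [DONE]."""
    for line in raw.decode("utf-8").splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def aggregate_chunks(chunks: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold chat.completion.chunk dicts into one chat.completion dict (simplified)."""
    result: dict[str, Any] = {"object": "chat.completion", "choices": []}
    choices: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        # Top-level metadata is repeated on every chunk; keep the latest value.
        for key in ("id", "model", "created"):
            if key in chunk:
                result[key] = chunk[key]
        for ch in chunk.get("choices", []):
            slot = choices.setdefault(ch["index"], {
                "index": ch["index"],
                "message": {"role": "assistant", "content": None},
                "finish_reason": None,
            })
            msg = slot["message"]
            delta = ch.get("delta", {})
            if delta.get("role"):
                msg["role"] = delta["role"]
            if delta.get("content"):
                msg["content"] = (msg["content"] or "") + delta["content"]
            for tc in delta.get("tool_calls") or []:
                calls = msg.setdefault("tool_calls", [])
                # Grow the list until position `index` exists, then merge into it.
                while len(calls) <= tc["index"]:
                    calls.append({"id": None, "type": "function",
                                  "function": {"name": "", "arguments": ""}})
                target = calls[tc["index"]]
                if tc.get("id"):
                    target["id"] = tc["id"]
                fn = tc.get("function") or {}
                target["function"]["name"] += fn.get("name") or ""
                target["function"]["arguments"] += fn.get("arguments") or ""
            if ch.get("finish_reason"):
                slot["finish_reason"] = ch["finish_reason"]
    result["choices"] = [choices[i] for i in sorted(choices)]
    return result
```

The key point is the `index` on each tool-call delta: without it, argument fragments from parallel tool calls could not be routed to the right call.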

To write to a different path explicitly:

```bash
rock model-service start \
  --type proxy \
  --proxy-base-url https://api.openai.com/v1 \
  --recording-file /tmp/my-session.jsonl \
  --port 8080
```

---

## 2. Replay

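Point `--replay-file` at a recorded JSONL file: the proxy no longer calls the real LLM and returns responses
in recorded order. To replay, the agent only needs to switch its base URL to `http://127.0.0.1:8081/v1`: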

```bash
rock model-service start \
  --type proxy \
  --replay-file /tmp/rock-traj/LLMTraj.jsonl \
  --port 8081
```
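At its core, replay is a cursor over the recorded lines. A hypothetical sketch, assuming each JSONL record carries `model` and `response` keys (the real record layout and `ReplayBackend` internals may differ); `ReplayCursor` is an illustrative name, and returning `None` stands in for the 404 the proxy sends once the file is exhausted.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger("replay-sketch")


class ReplayCursor:
    """Serve recorded responses strictly in recorded order (hypothetical)."""

    def __init__(self, path: str) -> None:
        with open(path, encoding="utf-8") as f:
            self._records = [json.loads(line) for line in f if line.strip()]
        self._pos = 0

    def next_response(self, request: dict[str, Any]) -> Optional[dict[str, Any]]:
        if self._pos >= len(self._records):
            return None  # exhausted: the proxy answers 404 at this point
        record = self._records[self._pos]
        self._pos += 1  # monotonic: every request consumes exactly one record
        if request.get("model") != record.get("model"):
            # A model mismatch is only reported, never treated as an error.
            logger.warning("model mismatch: request=%r recorded=%r",
                           request.get("model"), record.get("model"))
        return record["response"]
```

Because the cursor ignores request content, replay only reproduces a session faithfully if the agent issues the same sequence of calls it issued while recording.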

Behavior details:

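- The cursor advances monotonically and each request consumes one record; once the records are used up,
  the proxy returns **404**.
- For streaming requests, the recorded `ChatCompletion` is re-emitted as a single SSE chunk followed by `[DONE]`.
  The `index` field of each entry in `tool_calls` is injected automatically (OpenAI's streaming protocol requires
  an `index` on chunk deltas, but the recorded `message.tool_calls` does not carry one).
- The `model` in the request is compared against the recorded `model`; a mismatch only logs a warning and does
  not block the request.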
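Re-emitting a recorded completion as a stream amounts to renaming `message` to `delta`, tagging each tool call with its list position, and framing the result as SSE. A sketch over plain dicts; `completion_to_sse` is a hypothetical name, and the exact chunk fields the real `ReplayBackend` emits may differ.

```python
import copy
import json
from typing import Any, Iterator


def completion_to_sse(completion: dict[str, Any]) -> Iterator[bytes]:
    """Yield one SSE frame carrying the whole completion, then [DONE]."""
    chunk = {
        "id": completion.get("id"),
        "object": "chat.completion.chunk",
        "created": completion.get("created"),
        "model": completion.get("model"),
        "choices": [],
    }
    for choice in completion.get("choices", []):
        # Deep-copy so the recorded completion itself is not mutated.
        delta = copy.deepcopy(choice.get("message", {}))
        for position, call in enumerate(delta.get("tool_calls") or []):
            # Streaming deltas identify tool calls by `index`; recorded
            # (non-streaming) messages lack it, so inject the list position.
            call["index"] = position
        chunk["choices"].append({
            "index": choice.get("index", 0),
            "delta": delta,
            "finish_reason": choice.get("finish_reason"),
        })
    yield b"data: " + json.dumps(chunk).encode("utf-8") + b"\n\n"
    yield b"data: [DONE]\n\n"
```

Sending everything in one chunk is valid because a client aggregator treats a single delta holding the full message the same as many small ones.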

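`recording_file` and `replay_file` are **mutually exclusive**: configuring both (whether via CLI or YAML) is
rejected at startup by a Pydantic `model_validator` with a `ValidationError`. This prevents latent bugs such as
overwriting the source file partway through a recording.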
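The rule itself is a single check. The proxy uses a Pydantic `model_validator`; to stay dependency-free, this sketch reproduces the same rule with a stdlib dataclass and `__post_init__`, raising `ValueError` in place of Pydantic's `ValidationError`. `ProxyConfigSketch` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyConfigSketch:
    """Hypothetical stand-in for the proxy config's file fields."""

    recording_file: Optional[str] = None
    replay_file: Optional[str] = None

    def __post_init__(self) -> None:
        # Recording and replay cannot both be configured.
        if self.recording_file is not None and self.replay_file is not None:
            raise ValueError("recording_file and replay_file are mutually exclusive")
```

In Pydantic v2 the equivalent check would live in a `@model_validator(mode="after")` method, so it runs once all fields (from CLI or YAML) have been populated.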

---

## 3. Retries and timeouts

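- By default, retries are triggered on connection errors / timeouts and on `retryable_status_codes`
  (default `[429, 500]`), at most 6 times, with exponential backoff starting at 2s and doubling each time, plus jitter.
  If the last attempt still fails, the upstream response is passed through to the client unchanged
  (it is **not** wrapped into a 502/504, so the agent sees the real status code).
- For **streaming** requests, retries only happen **before** the first byte reaches the client. Once bytes have
  started flowing, a dropped connection is not retried (bytes already sent cannot be taken back).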

```bash
rock model-service start \
  --type proxy \
  --proxy-base-url https://api.openai.com/v1 \
  --retryable-status-codes 429,500,502,503 \
  --request-timeout 60 \
  --port 8080
```
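The non-streaming retry policy can be sketched as a loop around a single attempt. This is a hypothetical, synchronous illustration, not the proxy's code: `send` stands for one upstream attempt that returns a status code or raises `ConnectionError` / `TimeoutError`; "at most 6 times" is read as six retries after the first attempt; the 0 to 1 s jitter range is an assumption; and what the proxy does after a final connection error is not documented, so the sketch simply re-raises.

```python
import random
import time
from typing import Callable, Collection


def backoff_delay(attempt: int, base: float = 2.0, factor: float = 2.0,
                  max_jitter: float = 1.0) -> float:
    """Delay before retry `attempt` (0-based): 2s, 4s, 8s, ... plus random jitter."""
    return base * factor ** attempt + random.uniform(0, max_jitter)


def send_with_retry(
    send: Callable[[], int],
    retryable_status_codes: Collection[int] = (429, 500),
    max_retries: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        is_last = attempt == max_retries
        try:
            status = send()
        except (ConnectionError, TimeoutError):
            if is_last:
                raise  # assumption: no upstream response exists to forward
            sleep(backoff_delay(attempt))
            continue
        if status not in retryable_status_codes or is_last:
            # Return the upstream status unchanged (never rewritten to 502/504).
            return status
        sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable: the last attempt always returns or raises")
```

For streaming, the same loop would wrap only opening the upstream stream and reading its first chunk; after the first byte has been yielded to the client, errors have to propagate instead of being retried.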

---

## 4. Multi-model routing (YAML)

Routing to different upstreams by model name requires YAML (the CLI only exposes a single `--proxy-base-url`). Create `routes.yaml`:

```yaml
proxy_rules:
  gpt-3.5-turbo: "https://api.openai.com/v1"
  gpt-4o: "https://api.openai.com/v1"
  default: "https://api-inference.modelscope.cn/v1"

retryable_status_codes: [429, 500, 502]
request_timeout: 60
recording_file: /tmp/rock-traj/multi.jsonl
```

Start the service:

```bash
rock model-service start \
  --type proxy \
  --config-file routes.yaml \
  --port 8080
```

CLI flags (`--proxy-base-url` / `--port` / `--retryable-status-codes` / ...) override the YAML fields of the same name.
Routes are resolved in this order: `proxy_base_url` → `proxy_rules[model]` → `proxy_rules["default"]`; if none matches, the proxy returns 400.
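Both rules fit in two small pure functions. A hypothetical sketch over plain dicts (the function names are illustrative, not the ROCK API). It assumes a CLI value overrides YAML only when it was actually given, treating an argparse default of `None` as "not given" (several flags in the CLI diff below use `default=None`), and it raises `LookupError` where the proxy would answer 400.

```python
from typing import Any, Mapping


def merge_config(yaml_cfg: Mapping[str, Any], cli_args: Mapping[str, Any]) -> dict[str, Any]:
    """CLI values win over same-named YAML fields, but only when set (not None)."""
    merged = dict(yaml_cfg)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


def resolve_base_url(cfg: Mapping[str, Any], model: str) -> str:
    """Resolve in order: proxy_base_url, proxy_rules[model], proxy_rules["default"]."""
    if cfg.get("proxy_base_url"):
        return cfg["proxy_base_url"]
    rules = cfg.get("proxy_rules") or {}
    if model in rules:
        return rules[model]
    if "default" in rules:
        return rules["default"]
    raise LookupError(f"no upstream configured for model {model!r}")  # -> HTTP 400
```

Note the consequence of the ordering: once `--proxy-base-url` is passed, every `proxy_rules` entry is bypassed, so multi-model routing needs that flag left unset.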

---

## 5. Implementation notes (for reference only)

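- The `chat_completions` endpoint dispatches requests to `app.state.backend`, which is either a `ForwardBackend`
  or a `ReplayBackend`; `_configure_proxy_integrations` injects one of the two at startup depending on whether
  `replay_file` is set.
- `ForwardBackend` does httpx byte passthrough: non-streaming requests use `await resp.aread()`, and streaming
  requests yield `resp.aiter_bytes()` directly to the client. Nothing passes through any SDK's
  deserialization/re-serialization, so arbitrary vendor fields returned by the upstream, such as
  `reasoning_content` / `provider_specific_fields`, are never dropped. On a separate, independent path, the recorder
  feeds the byte stream into the openai SDK's stream-state aggregator, purely for writing to disk.
- `ReplayBackend` is fully local and does not hold an httpx client.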
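The "forward bytes untouched, record on the side" split can be illustrated with a plain async generator. In this hypothetical sketch, `upstream` stands in for `resp.aiter_bytes()` and `on_complete` for the recorder; the real `ForwardBackend` may structure this differently (for example, feeding the aggregator incrementally instead of buffering the whole stream).

```python
import asyncio
from typing import AsyncIterator, Callable


async def passthrough_and_record(
    upstream: AsyncIterator[bytes],
    on_complete: Callable[[bytes], None],
) -> AsyncIterator[bytes]:
    """Yield upstream bytes unchanged while keeping a copy for the recorder."""
    buffer = bytearray()
    async for piece in upstream:
        buffer.extend(piece)  # side copy, used only for recording
        yield piece  # the client receives exactly what the upstream sent
    # Runs only when the stream finishes normally; a recorder would parse the
    # SSE bytes here and aggregate them into a final ChatCompletion.
    on_complete(bytes(buffer))


async def demo() -> tuple[list[bytes], list[bytes]]:
    async def fake_upstream() -> AsyncIterator[bytes]:
        # Chunk boundaries deliberately split a JSON payload mid-object.
        for part in (b'data: {"a":', b' 1}\n\n', b"data: [DONE]\n\n"):
            yield part

    recorded: list[bytes] = []
    received = [p async for p in passthrough_and_record(fake_upstream(), recorded.append)]
    return received, recorded


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the client-facing path never parses anything, a parsing bug in the recorder can at worst corrupt the trajectory file, never the response the agent sees.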

For a more detailed code walkthrough, see the module docstring at the top of
[rock/sdk/model/server/api/proxy.py](../../../rock/sdk/model/server/api/proxy.py).
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -86,6 +86,8 @@ model-service = [
    "psutil",
    "swebench",
    "alibabacloud_cr20181201==2.0.5",
    "openai>=1.50.0",
    "httpx",
]


14 changes: 14 additions & 0 deletions rock/cli/command/model_service.py
@@ -82,6 +82,8 @@ async def arun(self, args: argparse.Namespace):
    proxy_base_url=args.proxy_base_url,
    retryable_status_codes=args.retryable_status_codes,
    request_timeout=args.request_timeout,
    recording_file=args.recording_file,
    replay_file=args.replay_file,
)
logger.info(f"model service started, pid: {pid}")
with open(self.DEFAULT_MODEL_SERVICE_PID_FILE, "w") as f:
@@ -178,6 +180,18 @@ async def add_parser_to(subparsers: argparse._SubParsersAction):
    default=None,
    help="Request timeout in seconds. Overrides config file.",
)
start_parser.add_argument(
    "--recording-file",
    type=str,
    default=None,
    help="Proxy mode only: where to write the trajectory JSONL. Defaults to LOG_DIR/LLMTraj.jsonl.",
)
start_parser.add_argument(
    "--replay-file",
    type=str,
    default=None,
    help="Proxy mode only: replay from a recorded .jsonl traj file. Mutually exclusive with --recording-file.",
)

watch_agent_parser = model_service_subparsers.add_parser(
    "watch-agent",