diff --git a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/Getting Started/rock-agent.md b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/Getting Started/rock-agent.md index bd2dd69b05..1898fbdcd0 100644 --- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/Getting Started/rock-agent.md +++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/Getting Started/rock-agent.md @@ -4,70 +4,73 @@ sidebar_position: 4 # Rock Agent 快速启动 -Rock Agent 是 ROCK 提供的 AI Agent 运行框架,支持在沙箱环境中运行各种类型的 Agent。 +ROCK 提供两种并列的 agent 使用能力,各自适用不同场景: + +- **Job**:通过 BashJob / HarborJob 在 sandbox 里跑一次 agent 评测/任务(典型基准:SWE-bench、Terminal Bench),是入门主要场景。 +- **install-agent**:直接在单个沙箱里安装并运行 agent,适合本地开发、单次调试。 + +下面优先介绍 Job 用法,install-agent 用法见末尾或 [Install Agent in Sandbox (Experimental)](../References/Python%20SDK%20References/rock-agent.md)。 ## 前置条件 -- 确保有可用的ROCK服务, 如果需要本地拉起服务端, 参考[快速启动](quickstart.md) +- 确保有可用的 ROCK 服务,如果需要本地拉起服务端,参考[快速启动](quickstart.md) -## 使用示例 +--- -ROCK 提供了两个Hello World Agent 示例,位于 `examples/agents/` 目录下: +## 一、用 Job 运行 Agent -``` -examples/agents/ -├── claude_code/ # ClaudeCode Agent 示例 -└── iflow_cli/ # IFlowCli Agent 示例 -``` +Job 有两种 backend:**Harbor Job** 用于运行 AI agent 基准评测任务(SWE-bench、Terminal Bench 等);**Bash Job** 用于在沙箱里跑自定义 shell 脚本。 -### 运行 IFlowCli 示例 +### 1.1 准备 yaml -```bash -cd examples/agents/iflow_cli -python iflow_cli_demo.py -``` +挑一类作为起点,直接复制对应模板: -### 运行 ClaudeCode 示例 +- Harbor Job(Terminal Bench):[`examples/job/harbor/tb_job_config.yaml.template`](https://github.com/alibaba/ROCK/tree/master/examples/job/harbor/tb_job_config.yaml.template) +- Bash Job(claw-eval):[`examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template`](https://github.com/alibaba/ROCK/tree/master/examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template) -```bash -cd examples/agents/claude_code -python claude_code_demo.py -``` +按模板填好对应字段即可。两类 Job 的完整字段说明见 [Use Job to Run Agent](../References/Python%20SDK%20References/job.md)。 
-## IFlowCli 配置文件 +### 1.2 通过 Python SDK 启动 -配置文件位于 `examples/agents/iflow_cli/rock_agent_config.yaml`: +```python +import asyncio +from rock.sdk.job import Job, JobConfig -```yaml -run_cmd: "iflow -p ${prompt} --yolo" +async def main(): + config = JobConfig.from_yaml("swe_job_config.yaml") + result = await Job(config).run() -runtime_env_config: - type: node - npm_registry: "https://registry.npmmirror.com" - custom_install_cmd: "npm i -g @iflow-ai/iflow-cli@latest" + print(f"status={result.status}, score={result.score}") + for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") -env: - IFLOW_API_KEY: "" # 填入你的 API Key - IFLOW_BASE_URL: "" # 填入你的 Base URL - IFLOW_MODEL_NAME: "" # 填入你的模型名称 +asyncio.run(main()) ``` -## ClaudeCode 配置文件 +BashJob 的用法、完整字段说明、结果处理详见 [Use Job to Run Agent](../References/Python%20SDK%20References/job.md)。 -配置文件位于 `examples/agents/claude_code/rock_agent_config.yaml`: +--- -```yaml -run_cmd: "claude -p ${prompt}" +## 二、install-agent:在沙箱里安装并运行 Agent -runtime_env_config: - type: node - custom_install_cmd: "npm install -g @anthropic-ai/claude-code" +适合本地开发或单次调试 agent 的场景,核心 API: -env: - ANTHROPIC_BASE_URL: "" # 填入你的anthropic base url - ANTHROPIC_API_KEY: "" # 填入你的anthropic api key +```python +await sandbox.agent.install(config="rock_agent_config.yaml") +result = await sandbox.agent.run(prompt="hello") ``` -## 相关文档 +`examples/install-agents/` 下提供了多个开箱即用的示例: + +- `examples/install-agents/iflow_cli/` — IFlowCli +- `examples/install-agents/claude_code/` — Claude Code +- `examples/install-agents/cursor_cli/`、`qwen_code/`、`swe_agent/`、`openclaw/` — 其他 + +运行 Claude Code 示例: + +```bash +cd examples/install-agents/claude_code +python claude_code_demo.py +``` -- [RockAgent 参考](../References/Python%20SDK%20References/rock-agent.md) +完整 RockAgentConfig 字段说明、占位符语义、API 参考详见 [Install Agent in Sandbox (Experimental)](../References/Python%20SDK%20References/rock-agent.md)。 diff --git 
a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/job.md b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/job.md new file mode 100644 index 0000000000..6dee1f678b --- /dev/null +++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/job.md @@ -0,0 +1,150 @@ +# Use Job to Run Agent + +> 这是 ROCK 两种并列的 agent 使用能力中 **Job** 的参考文档,核心 API 是 `rock.sdk.job.Job` 与 `JobConfig`,用于在沙箱里跑一次 agent 评测/任务。有两种 backend:**Bash Job** 与 **Harbor Bench Job**。 +> +> 另一种能力是在单个沙箱里安装并运行 agent,见 [Install Agent in Sandbox](./rock-agent.md)。两种能力使用各自独立的配置 schema,**不要互相套用**。 + +`rock.sdk.job` 通过同一套 `Job` API 支持两种模式,通过配置类型区分: + +- **Bash Job**:在沙箱中运行自定义 Shell 脚本,适合数据处理、外部评测工具等 +- **Harbor Bench Job**:通过 Harbor 框架运行 AI agent 基准评测任务(SWE-bench、Terminal Bench 等) + +## 端到端示例 + +最小可跑通的 Python 用法: + +```python +import asyncio +from rock.sdk.job import Job, JobConfig + +async def main(): + config = JobConfig.from_yaml("swe_job_config.yaml") # 含 agents: 与 datasets: + result = await Job(config).run() + + print(f"status={result.status}, score={result.score}") + for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") + +asyncio.run(main()) +``` + +完整 yaml 模板参考 `examples/job/harbor/swe_job_config.yaml.template`。 + +--- + +## Bash Job + +Bash Job 适用于在沙箱内执行任意 Shell 脚本的场景,例如运行评测工具、数据处理流程等。 + +完整示例参考:[`examples/job/bash/claw_eval/`](https://github.com/alibaba/ROCK/tree/master/examples/job/bash/claw_eval) + +- `run_claw_eval.py` — 主入口,演示 `JobConfig.from_yaml()` + `Job(config).run()` +- `claw_eval_bashjob.yaml.template` — YAML 配置模板,含 `script_path`、`environment`、`uploads`、`env` 等字段 +- `run_claw_eval.sh` — 沙箱内实际执行的脚本,演示 DinD 启动、日志写入和评分输出 + +### BashJobConfig 配置字段 + +| 字段 | 类型 | 默认值 | 说明 | +|------|------|--------|------| +| `script` | `str \| None` | `None` | 内联脚本内容,与 `script_path` 二选一 | +| `script_path` | `str \| None` | `None` | 
本地脚本文件路径,运行时读取并上传执行 | +| `job_name` | `str` | 当前时间戳 | 任务名称,用于日志和产物路径区分 | +| `environment` | `EnvironmentConfig` | — | 沙箱连接及资源配置,详见下表 | +| `namespace` | `str \| None` | `None` | 命名空间 | +| `experiment_id` | `str \| None` | `None` | 实验 ID | +| `timeout` | `int` | `7200` | 整体超时秒数(2 小时) | + +**`environment` 常用字段:** + +| 字段 | 类型 | 说明 | +|------|------|------| +| `image` | `str` | 沙箱 Docker 镜像 | +| `base_url` | `str` | ROCK 平台地址 | +| `xrl_authorization` | `str` | 鉴权 Token | +| `cluster` | `str` | 目标集群 | +| `memory` | `str` | 内存大小(如 `"64g"`) | +| `cpus` | `int` | CPU 核数 | +| `auto_stop` | `bool` | 任务完成后是否自动停止沙箱 | +| `uploads` | `list` | 本地文件/目录上传列表,格式:`[本地路径, 沙箱目标路径]` | +| `env` | `dict[str, str]` | 注入沙箱会话的环境变量 | + +--- + +## Harbor Bench Job + +Harbor Bench Job 适用于通过 Harbor 框架运行 AI agent 基准评测任务,如 SWE-bench、Terminal Bench 等。 + +> **注意**:`rock.sdk.bench.Job` 已废弃,将在未来移除。请改用 `rock.sdk.job.Job` + `HarborJobConfig`。 + +完整示例参考:[`examples/job/harbor/`](https://github.com/alibaba/ROCK/tree/master/examples/job/harbor) + +- `harbor_demo.py` — 主入口,演示 `JobConfig.from_yaml()` + `Job(config).run()` + 结果遍历 +- `swe_job_config.yaml.template` — SWE-bench 任务配置模板 +- `swe_job_config-verifier.yaml.template` — 附带 `verifier.mode: native` 的变体 +- `tb_job_config.yaml.template` — Terminal Bench 任务配置模板 + +### HarborJobConfig 核心配置字段 + +**基础字段:** + +| 字段 | 类型 | 默认值 | 说明 | +|------|------|--------|------| +| `experiment_id` | `str` | 必填 | 实验 ID,Harbor 中必须提供 | +| `job_name` | `str \| None` | 自动生成 | 格式:`{dataset}_{task}_{uuid[:8]}` | +| `namespace` | `str \| None` | `None` | 命名空间,从沙箱自动反填 | +| `environment` | `RockEnvironmentConfig` | — | 沙箱连接及资源配置 | + +**执行控制字段:** + +| 字段 | 类型 | 默认值 | 说明 | +|------|------|--------|------| +| `n_attempts` | `int` | `1` | 每个 Trial 的尝试次数 | +| `timeout` | `int` | `7200` | 整体超时秒数(自动从 agent_timeout 推算) | +| `debug` | `bool` | `False` | 调试模式,保留更多中间产物 | + +**组件字段:** + +| 字段 | 类型 | 说明 | +|------|------|------| +| `agents` | `list[AgentConfig]` | Harbor 框架自身的 agent 
配置(典型字段:`name`、`model_name`),完整字段见 `examples/job/harbor/swe_job_config.yaml.template` | +| `datasets` | `list[DatasetConfig]` | 数据集配置列表 | +| `verifier` | `VerifierConfig` | Verifier 评测配置 | +| `orchestrator` | `OrchestratorConfig` | 并发调度配置 | + +--- + +## 结果处理 + +两种 Job 模式均返回 `JobResult`: + +```python +result = await Job(config).run() + +print(f"status={result.status}, score={result.score}") +for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") + if trial.exception_info: + print(f" {trial.exception_info.exception_type}: {trial.exception_info.exception_message}") +``` + +### JobResult 字段 + +| 字段 / 属性 | 类型 | 说明 | +|------------|------|------| +| `status` | `JobStatus` | 任务整体状态 | +| `trial_results` | `list[TrialResult]` | 所有 Trial 结果列表 | +| `score` | `float`(属性) | 所有 Trial `score` 的平均值 | +| `n_completed` | `int`(属性) | 状态为 `completed` 的 Trial 数 | +| `n_failed` | `int`(属性) | 状态为 `failed` 的 Trial 数 | + +### TrialResult 字段 + +| 字段 / 属性 | 类型 | 说明 | +|------------|------|------| +| `task_name` | `str` | 任务名称 | +| `exit_code` | `int` | 进程退出码 | +| `raw_output` | `str` | 进程原始输出 | +| `exception_info` | `ExceptionInfo \| None` | 若有异常则填充 | +| `status` | `str`(属性) | `"completed"` 或 `"failed"` | +| `duration_sec` | `float`(属性) | 执行耗时(秒) | +| `score` | `float`(属性) | 评分(Bash Job 默认 `0.0`,Harbor 模式来自 verifier) | diff --git a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/rock-agent.md b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/rock-agent.md index c3f03b1efc..cd252750aa 100644 --- a/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/rock-agent.md +++ b/docs/i18n/zh-Hans/docusaurus-plugin-content-docs/version-1.7.x/References/Python SDK References/rock-agent.md @@ -1,8 +1,12 @@ -# Rock Agent(实验性) +# Install Agent in Sandbox (Experimental) -RockAgent 是 ROCK 框架中的核心 Agent 实现,直接继承自 `Agent` 
抽象基类。它提供了完整的 Agent 生命周期管理,包括环境初始化、ModelService 集成、命令执行等功能。 +> 这是 ROCK 两种并列的 agent 使用能力中 **install-agent** 的参考文档,核心 API 是 `sandbox.agent.install()` 与 `sandbox.agent.run(prompt)`,用于在单个沙箱里安装并运行 agent。 +> +> 另一种能力是用 Job 在沙箱里跑一次 agent 评测/任务,见 [Use Job to Run Agent](./job.md)。两种能力使用各自独立的配置 schema。 -使用 `sandbox.agent.install()` 以及 `sandbox.agent.run(prompt)` 就可以在 Rock 提供的 Sandbox 环境中安装和运行 Agent。 +RockAgent 是 ROCK 框架用来在沙箱中 install 自定义 agent 的能力,负责完整的 agent 生命周期管理:环境初始化、ModelService 集成、命令执行等。 + +使用 `sandbox.agent.install()` 与 `sandbox.agent.run(prompt)` 就可以在 Rock 提供的 Sandbox 环境中安装和运行 Agent。 ## 核心概念 @@ -185,10 +189,10 @@ model_service_config: # 具体参考 ModelService 有 执行 Agent 任务。 **执行流程**: -1. 替换占位符, 准备Agent 运行命令 -4. 启动 agent 进程 -5. 如果启用 ModelService,启动 `watch_agent` -6. 等待任务完成并返回结果 +1. 替换占位符,准备 Agent 运行命令 +2. 启动 agent 进程 +3. 如果启用 ModelService,启动 `watch_agent` +4. 等待任务完成并返回结果 ## 高级用法 @@ -281,10 +285,26 @@ model_service_config: ## 使用示例 -### 使用 YAML 配置文件(推荐) +### 使用 YAML 配置文件(推荐) ```python -# prepare a rock_agent_config.yaml -await sandbox.agent.install(config="rock_agent_config.yaml") -await sandbox.agent.run(prompt="hello") +import asyncio +from rock.sdk.sandbox import Sandbox, SandboxConfig + +async def main(): + sandbox = Sandbox(SandboxConfig()) + await sandbox.start() + try: + # rock_agent_config.yaml 与本文档「快速开始」中的示例一致 + await sandbox.agent.install(config="rock_agent_config.yaml") + result = await sandbox.agent.run(prompt="hello") + print(result) + finally: + await sandbox.stop() + +asyncio.run(main()) ``` + +更多开箱即用的示例参见 `examples/install-agents/`(Claude Code、IFlowCli、Cursor CLI、Qwen Code、SWE-agent、OpenClaw 等)。 + +如需通过 Job 跑 agent 评测/基准任务(另一条代码路径,有独立的配置 schema),见 [Use Job to Run Agent](./job.md)。 diff --git a/docs/versioned_docs/version-1.7.x/Getting Started/rock-agent.md b/docs/versioned_docs/version-1.7.x/Getting Started/rock-agent.md index eb6b54a65b..19369c1aff 100644 --- a/docs/versioned_docs/version-1.7.x/Getting Started/rock-agent.md +++ 
b/docs/versioned_docs/version-1.7.x/Getting Started/rock-agent.md @@ -4,69 +4,73 @@ sidebar_position: 4 # Rock Agent Quick Start -Rock Agent is an AI Agent runtime framework provided by ROCK, supporting various types of Agents running in sandbox environments. +ROCK provides two parallel ways to use agents, each suited to a different scenario: + +- **Job**: Run an agent evaluation/task in a sandbox via BashJob / HarborJob (typical benchmarks: SWE-bench, Terminal Bench) — the primary entry point. +- **install-agent**: Install and run an agent directly inside a single sandbox — for local development and one-off debugging. + +Job is covered first. The install-agent section follows at the end, with full reference at [Install Agent in Sandbox (Experimental)](../References/Python%20SDK%20References/rock-agent.md). ## Prerequisites -- Make sure you have a working ROCK service, if you need to locally start the service side, refer to [Quick Start](quickstart.md). -## Examples +- Make sure you have a working ROCK service. If you need to start the service locally, refer to [Quick Start](quickstart.md). -ROCK provides two Hello World Agent examples in the `examples/agents/` directory: +--- -``` -examples/agents/ -├── claude_code/ # ClaudeCode Agent example -└── iflow_cli/ # IFlowCli Agent example -``` +## 1. Use Job to Run Agent -### Run IFlowCli Example +Job has two backends: **Harbor Job** runs an AI agent benchmark task (SWE-bench, Terminal Bench, etc.); **Bash Job** runs a custom shell script inside a sandbox. 
-```bash -cd examples/agents/iflow_cli -python iflow_cli_demo.py -``` +### 1.1 Prepare a yaml -### Run ClaudeCode Example +Pick a starting point and copy the matching template: -```bash -cd examples/agents/claude_code -python claude_code_demo.py -``` +- Harbor Job (Terminal Bench): [`examples/job/harbor/tb_job_config.yaml.template`](https://github.com/alibaba/ROCK/tree/master/examples/job/harbor/tb_job_config.yaml.template) +- Bash Job (claw-eval): [`examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template`](https://github.com/alibaba/ROCK/tree/master/examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template) + +Fill in the fields per the template. See [Use Job to Run Agent](../References/Python%20SDK%20References/job.md) for the full field reference of both backends. -## IFlowCli Configuration File +### 1.2 Launch via Python SDK -The configuration file is located at `examples/agents/iflow_cli/rock_agent_config.yaml`: +```python +import asyncio +from rock.sdk.job import Job, JobConfig -```yaml -run_cmd: "iflow -p ${prompt} --yolo" +async def main(): + config = JobConfig.from_yaml("swe_job_config.yaml") + result = await Job(config).run() -runtime_env_config: - type: node - npm_registry: "https://registry.npmmirror.com" - custom_install_cmd: "npm i -g @iflow-ai/iflow-cli@latest" + print(f"status={result.status}, score={result.score}") + for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") -env: - IFLOW_API_KEY: "" # Enter your API key - IFLOW_BASE_URL: "" # Enter your base URL - IFLOW_MODEL_NAME: "" # Enter your model name +asyncio.run(main()) ``` -## ClaudeCode Configuration File +For BashJob usage, full field references, and result-handling details, see [Use Job to Run Agent](../References/Python%20SDK%20References/job.md). -The configuration file is located at `examples/agents/claude_code/rock_agent_config.yaml`: +--- -```yaml -run_cmd: "claude -p ${prompt}" +## 2. 
install-agent: Install and Run an Agent in a Sandbox -runtime_env_config: - type: node - custom_install_cmd: "npm install -g @anthropic-ai/claude-code" +For local development or debugging a single agent run, the core API is: -env: - ANTHROPIC_BASE_URL: "" # Enter your anthropic base url - ANTHROPIC_API_KEY: "" # Enter your anthropic api key +```python +await sandbox.agent.install(config="rock_agent_config.yaml") +result = await sandbox.agent.run(prompt="hello") ``` -## Related Documentation +The `examples/install-agents/` directory ships ready-to-run examples: + +- `examples/install-agents/iflow_cli/` — IFlowCli +- `examples/install-agents/claude_code/` — Claude Code +- `examples/install-agents/cursor_cli/`, `qwen_code/`, `swe_agent/`, `openclaw/` — others + +Run the Claude Code example: + +```bash +cd examples/install-agents/claude_code +python claude_code_demo.py +``` -- [RockAgent Reference](../References/Python%20SDK%20References/rock-agent.md) +For full RockAgentConfig field details, placeholder semantics, and API reference, see [Install Agent in Sandbox (Experimental)](../References/Python%20SDK%20References/rock-agent.md). diff --git a/docs/versioned_docs/version-1.7.x/References/Python SDK References/job.md b/docs/versioned_docs/version-1.7.x/References/Python SDK References/job.md new file mode 100644 index 0000000000..a7dd7bcfd1 --- /dev/null +++ b/docs/versioned_docs/version-1.7.x/References/Python SDK References/job.md @@ -0,0 +1,150 @@ +# Use Job to Run Agent + +> This is the reference for **Job**, one of ROCK's two parallel ways to use agents. Its core API is `rock.sdk.job.Job` with `JobConfig`, used to run an agent evaluation/task in a sandbox. Two backends are supported: **Bash Job** and **Harbor Bench Job**. +> +> The other way is to install and run an agent inside a single sandbox — see [Install Agent in Sandbox](./rock-agent.md). The two ways use distinct config schemas — **do not mix them**. 
+ +`rock.sdk.job` exposes a single `Job` API that supports two modes, distinguished by the config type: + +- **Bash Job**: Runs an arbitrary shell script inside a sandbox — useful for data processing, external evaluation tools, etc. +- **Harbor Bench Job**: Runs an AI agent benchmark task via the Harbor framework (SWE-bench, Terminal Bench, etc.). + +## End-to-End Example + +A minimal runnable Python snippet: + +```python +import asyncio +from rock.sdk.job import Job, JobConfig + +async def main(): + config = JobConfig.from_yaml("swe_job_config.yaml") # contains agents: and datasets: + result = await Job(config).run() + + print(f"status={result.status}, score={result.score}") + for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") + +asyncio.run(main()) +``` + +The full yaml template is in `examples/job/harbor/swe_job_config.yaml.template`. + +--- + +## Bash Job + +Bash Job is for running arbitrary shell scripts inside a sandbox — running an external evaluation tool, processing data, etc. + +Full example: [`examples/job/bash/claw_eval/`](https://github.com/alibaba/ROCK/tree/master/examples/job/bash/claw_eval) + +- `run_claw_eval.py` — Entry point demonstrating `JobConfig.from_yaml()` + `Job(config).run()` +- `claw_eval_bashjob.yaml.template` — YAML template with `script_path`, `environment`, `uploads`, `env`, etc. 
+- `run_claw_eval.sh` — The script that actually runs in the sandbox (DinD startup, log writing, score output) + +### BashJobConfig Fields + +| Field | Type | Default | Description | +|------|------|---------|-------------| +| `script` | `str \| None` | `None` | Inline script content (mutually exclusive with `script_path`) | +| `script_path` | `str \| None` | `None` | Local script path; the file is read and uploaded at runtime | +| `job_name` | `str` | current timestamp | Name used for log and artifact paths | +| `environment` | `EnvironmentConfig` | — | Sandbox connection and resource config (see below) | +| `namespace` | `str \| None` | `None` | Namespace | +| `experiment_id` | `str \| None` | `None` | Experiment ID | +| `timeout` | `int` | `7200` | Overall timeout in seconds (2 hours) | + +**Common `environment` fields:** + +| Field | Type | Description | +|------|------|-------------| +| `image` | `str` | Sandbox Docker image | +| `base_url` | `str` | ROCK platform URL | +| `xrl_authorization` | `str` | Auth token | +| `cluster` | `str` | Target cluster | +| `memory` | `str` | Memory size (e.g. `"64g"`) | +| `cpus` | `int` | Number of CPUs | +| `auto_stop` | `bool` | Whether to stop the sandbox after the job | +| `uploads` | `list` | Local-to-sandbox file/dir uploads, format: `[local_path, sandbox_path]` | +| `env` | `dict[str, str]` | Environment variables injected into the sandbox session | + +--- + +## Harbor Bench Job + +Harbor Bench Job runs AI agent benchmark tasks like SWE-bench and Terminal Bench via the Harbor framework. + +> **Note**: `rock.sdk.bench.Job` is deprecated and will be removed in a future release. Use `rock.sdk.job.Job` + `HarborJobConfig` instead. 
+ +Full example: [`examples/job/harbor/`](https://github.com/alibaba/ROCK/tree/master/examples/job/harbor) + +- `harbor_demo.py` — Entry point demonstrating `JobConfig.from_yaml()` + `Job(config).run()` + result iteration +- `swe_job_config.yaml.template` — SWE-bench task config template +- `swe_job_config-verifier.yaml.template` — Variant with `verifier.mode: native` +- `tb_job_config.yaml.template` — Terminal Bench task config template + +### HarborJobConfig Core Fields + +**Basic fields:** + +| Field | Type | Default | Description | +|------|------|---------|-------------| +| `experiment_id` | `str` | required | Experiment ID — required by Harbor | +| `job_name` | `str \| None` | auto-generated | Format: `{dataset}_{task}_{uuid[:8]}` | +| `namespace` | `str \| None` | `None` | Namespace, auto-filled from the sandbox | +| `environment` | `RockEnvironmentConfig` | — | Sandbox connection and resource config | + +**Execution control:** + +| Field | Type | Default | Description | +|------|------|---------|-------------| +| `n_attempts` | `int` | `1` | Attempts per Trial | +| `timeout` | `int` | `7200` | Overall timeout (auto-derived from agent_timeout) | +| `debug` | `bool` | `False` | Debug mode — keeps more intermediate artifacts | + +**Components:** + +| Field | Type | Description | +|------|------|-------------| +| `agents` | `list[AgentConfig]` | Harbor's own agent config (typical fields: `name`, `model_name`) — see `examples/job/harbor/swe_job_config.yaml.template` for the canonical shape | +| `datasets` | `list[DatasetConfig]` | Dataset configs | +| `verifier` | `VerifierConfig` | Verifier evaluation config | +| `orchestrator` | `OrchestratorConfig` | Concurrency / scheduling config | + +--- + +## Result Handling + +Both Job modes return a `JobResult`: + +```python +result = await Job(config).run() + +print(f"status={result.status}, score={result.score}") +for trial in result.trial_results: + print(f" {trial.task_name}: score={trial.score} ({trial.status})") + 
if trial.exception_info: + print(f" {trial.exception_info.exception_type}: {trial.exception_info.exception_message}") +``` + +### JobResult Fields + +| Field / Property | Type | Description | +|------------------|------|-------------| +| `status` | `JobStatus` | Overall task status | +| `trial_results` | `list[TrialResult]` | List of all Trial results | +| `score` | `float` (property) | Average `score` across all Trials | +| `n_completed` | `int` (property) | Number of Trials with status `completed` | +| `n_failed` | `int` (property) | Number of Trials with status `failed` | + +### TrialResult Fields + +| Field / Property | Type | Description | +|------------------|------|-------------| +| `task_name` | `str` | Task name | +| `exit_code` | `int` | Process exit code | +| `raw_output` | `str` | Raw process output | +| `exception_info` | `ExceptionInfo \| None` | Populated if an exception occurred | +| `status` | `str` (property) | `"completed"` or `"failed"` | +| `duration_sec` | `float` (property) | Execution time in seconds | +| `score` | `float` (property) | Score (Bash Job defaults to `0.0`; Harbor mode comes from the verifier) | diff --git a/docs/versioned_docs/version-1.7.x/References/Python SDK References/rock-agent.md b/docs/versioned_docs/version-1.7.x/References/Python SDK References/rock-agent.md index f24ade49ac..9cc184d46c 100644 --- a/docs/versioned_docs/version-1.7.x/References/Python SDK References/rock-agent.md +++ b/docs/versioned_docs/version-1.7.x/References/Python SDK References/rock-agent.md @@ -1,6 +1,10 @@ -# Rock Agent (Experimental) +# Install Agent in Sandbox (Experimental) -RockAgent is the core Agent implementation in the ROCK framework, directly inheriting from the `Agent` abstract base class. It provides complete Agent lifecycle management, including environment initialization, ModelService integration, command execution, and more. +> This is the reference for **install-agent**, one of ROCK's two parallel ways to use agents. 
Its core API is `sandbox.agent.install()` and `sandbox.agent.run(prompt)`, used to install and run an agent inside a single sandbox. +> +> The other way is to run an agent evaluation/task via Job — see [Use Job to Run Agent](./job.md). The two ways use distinct config schemas. + +RockAgent is the ROCK framework's mechanism for installing a custom agent inside a sandbox. It manages the full agent lifecycle — environment initialization, ModelService integration, command execution, and so on. Using `sandbox.agent.install()` and `sandbox.agent.run(prompt)`, you can install and run Agents in the Sandbox environment provided by Rock. @@ -284,7 +288,23 @@ model_service_config: ### Using YAML Configuration File (Recommended) ```python -# prepare a rock_agent_config.yaml -await sandbox.agent.install(config="rock_agent_config.yaml") -await sandbox.agent.run(prompt="hello") +import asyncio +from rock.sdk.sandbox import Sandbox, SandboxConfig + +async def main(): + sandbox = Sandbox(SandboxConfig()) + await sandbox.start() + try: + # rock_agent_config.yaml matches the examples in "Quick Start" above + await sandbox.agent.install(config="rock_agent_config.yaml") + result = await sandbox.agent.run(prompt="hello") + print(result) + finally: + await sandbox.stop() + +asyncio.run(main()) ``` + +More ready-to-run examples are in `examples/install-agents/` (Claude Code, IFlowCli, Cursor CLI, Qwen Code, SWE-agent, OpenClaw, etc.). + +To run an agent evaluation/benchmark task via Job (a different code path with its own config schema), see [Use Job to Run Agent](./job.md). diff --git a/examples/evaluation/README.md b/examples/evaluation/README.md new file mode 100644 index 0000000000..291fe57697 --- /dev/null +++ b/examples/evaluation/README.md @@ -0,0 +1,19 @@ +# evaluation + +End-to-end evaluation demos that combine sandbox lifecycle, agent install/run, and a test suite — useful for understanding how individual pieces fit together at the script level. 
+
+## Layout
+
+| Subdir | Path | Description |
+|--------|------|-------------|
+| [`swe_bench/`](./swe_bench/) | install-agent | Single-task SWE-bench Verified demo: starts a sandbox, installs an agent via `sandbox.agent.install()`, runs the agent on the task, runs the test suite, parses the result |
+
+## When to use this vs `job/harbor/`
+
+| | `evaluation/swe_bench/` | [`job/harbor/`](../job/harbor/) |
+|--|------------------------|-------------------------------|
+| Path | install-agent | Job (Harbor) |
+| When | Debugging task setup or test parsing — full pipeline visible in script form | Production benchmark runs through the Harbor framework |
+| API | `Sandbox` + `sandbox.agent.install()` | `Job(JobConfig.from_yaml(...)).run()` |
+
+If you're running SWE-bench through the standard pipeline, prefer [`job/harbor/`](../job/harbor/).
diff --git a/examples/install-agents/README.md b/examples/install-agents/README.md
new file mode 100644
index 0000000000..951430d556
--- /dev/null
+++ b/examples/install-agents/README.md
@@ -0,0 +1,28 @@
+# install-agents
+
+Examples for the **install-agent** way of using ROCK: install and run an agent inside a single sandbox via `sandbox.agent.install()` + `sandbox.agent.run(prompt)`.
+
+To run an agent evaluation/benchmark task via Job, see [`../job/`](../job/) instead.
+
+## Layout
+
+| Subdir | Agent runtime |
+|--------|---------------|
+| [`claude_code/`](./claude_code/) | Anthropic Claude Code CLI (`@anthropic-ai/claude-code`) |
+| [`cursor_cli/`](./cursor_cli/) | Cursor CLI |
+| [`iflow_cli/`](./iflow_cli/) | iFlow CLI (`@iflow-ai/iflow-cli`) |
+| [`openclaw/`](./openclaw/) | OpenClaw — admin/proxy split-mode demo, has its own README |
+| [`qwen_code/`](./qwen_code/) | qwen-code (`@qwen-code/qwen-code`) |
+| [`swe_agent/`](./swe_agent/) | SWE-agent (`pip install -e` from GitHub) |
+
+Each subdir contains a `*_demo.py` entry point and a `rock_agent_config.yaml` driving the install/run.
+
+## Run
+
+```bash
+# pick any subdir
+cd iflow_cli
+python iflow_cli_demo.py
+```
+
+See the [Install Agent in Sandbox (Experimental)](../../docs/versioned_docs/version-1.7.x/References/Python%20SDK%20References/rock-agent.md) reference for the full RockAgentConfig schema.
diff --git a/examples/agents/claude_code/claude_code_demo.py b/examples/install-agents/claude_code/claude_code_demo.py
similarity index 100%
rename from examples/agents/claude_code/claude_code_demo.py
rename to examples/install-agents/claude_code/claude_code_demo.py
diff --git a/examples/agents/claude_code/rock_agent_config.yaml b/examples/install-agents/claude_code/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/claude_code/rock_agent_config.yaml
rename to examples/install-agents/claude_code/rock_agent_config.yaml
diff --git a/examples/agents/cursor_cli/cursor_cli_demo.py b/examples/install-agents/cursor_cli/cursor_cli_demo.py
similarity index 100%
rename from examples/agents/cursor_cli/cursor_cli_demo.py
rename to examples/install-agents/cursor_cli/cursor_cli_demo.py
diff --git a/examples/agents/cursor_cli/rock_agent_config.yaml b/examples/install-agents/cursor_cli/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/cursor_cli/rock_agent_config.yaml
rename to examples/install-agents/cursor_cli/rock_agent_config.yaml
diff --git a/examples/agents/iflow_cli/iflow_cli_demo.py b/examples/install-agents/iflow_cli/iflow_cli_demo.py
similarity index 100%
rename from examples/agents/iflow_cli/iflow_cli_demo.py
rename to examples/install-agents/iflow_cli/iflow_cli_demo.py
diff --git a/examples/agents/iflow_cli/integration_with_model_service/local/local_demo.py b/examples/install-agents/iflow_cli/integration_with_model_service/local/local_demo.py
similarity index 100%
rename from examples/agents/iflow_cli/integration_with_model_service/local/local_demo.py
rename to examples/install-agents/iflow_cli/integration_with_model_service/local/local_demo.py
diff --git a/examples/agents/iflow_cli/integration_with_model_service/local/rock_agent_config.yaml b/examples/install-agents/iflow_cli/integration_with_model_service/local/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/iflow_cli/integration_with_model_service/local/rock_agent_config.yaml
rename to examples/install-agents/iflow_cli/integration_with_model_service/local/rock_agent_config.yaml
diff --git a/examples/agents/iflow_cli/integration_with_model_service/proxy/proxy_demo.py b/examples/install-agents/iflow_cli/integration_with_model_service/proxy/proxy_demo.py
similarity index 100%
rename from examples/agents/iflow_cli/integration_with_model_service/proxy/proxy_demo.py
rename to examples/install-agents/iflow_cli/integration_with_model_service/proxy/proxy_demo.py
diff --git a/examples/agents/iflow_cli/integration_with_model_service/proxy/rock_agent_config.yaml b/examples/install-agents/iflow_cli/integration_with_model_service/proxy/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/iflow_cli/integration_with_model_service/proxy/rock_agent_config.yaml
rename to examples/install-agents/iflow_cli/integration_with_model_service/proxy/rock_agent_config.yaml
diff --git a/examples/agents/iflow_cli/rock_agent_config.yaml b/examples/install-agents/iflow_cli/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/iflow_cli/rock_agent_config.yaml
rename to examples/install-agents/iflow_cli/rock_agent_config.yaml
diff --git a/examples/agents/openclaw/REAMDE.md b/examples/install-agents/openclaw/README.md
similarity index 98%
rename from examples/agents/openclaw/REAMDE.md
rename to examples/install-agents/openclaw/README.md
index 707c87f636..89f75685cc 100644
--- a/examples/agents/openclaw/REAMDE.md
+++ b/examples/install-agents/openclaw/README.md
@@ -52,7 +52,7 @@ rock admin start --env local-proxy --role proxy --port 9001
 ## 4. Run the Demo
 
 ```bash
-cd examples/agents/openclaw
+cd examples/install-agents/openclaw
 python openclaw_demo.py
 ```
 
diff --git a/examples/agents/openclaw/openclaw.json b/examples/install-agents/openclaw/openclaw.json
similarity index 100%
rename from examples/agents/openclaw/openclaw.json
rename to examples/install-agents/openclaw/openclaw.json
diff --git a/examples/agents/openclaw/openclaw_demo.py b/examples/install-agents/openclaw/openclaw_demo.py
similarity index 100%
rename from examples/agents/openclaw/openclaw_demo.py
rename to examples/install-agents/openclaw/openclaw_demo.py
diff --git a/examples/agents/openclaw/rock_agent_config.yaml b/examples/install-agents/openclaw/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/openclaw/rock_agent_config.yaml
rename to examples/install-agents/openclaw/rock_agent_config.yaml
diff --git a/examples/agents/qwen_code/qwen_code_demo.py b/examples/install-agents/qwen_code/qwen_code_demo.py
similarity index 100%
rename from examples/agents/qwen_code/qwen_code_demo.py
rename to examples/install-agents/qwen_code/qwen_code_demo.py
diff --git a/examples/agents/qwen_code/rock_agent_config.yaml b/examples/install-agents/qwen_code/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/qwen_code/rock_agent_config.yaml
rename to examples/install-agents/qwen_code/rock_agent_config.yaml
diff --git a/examples/agents/swe_agent/rock_agent_config.yaml b/examples/install-agents/swe_agent/rock_agent_config.yaml
similarity index 100%
rename from examples/agents/swe_agent/rock_agent_config.yaml
rename to examples/install-agents/swe_agent/rock_agent_config.yaml
diff --git a/examples/agents/swe_agent/swe_agent_demo.py b/examples/install-agents/swe_agent/swe_agent_demo.py
similarity index 100%
rename from examples/agents/swe_agent/swe_agent_demo.py
rename to examples/install-agents/swe_agent/swe_agent_demo.py
diff --git a/examples/job/README.md b/examples/job/README.md
new file mode 100644
index 0000000000..ed591838fa
--- /dev/null
+++ b/examples/job/README.md
@@ -0,0 +1,16 @@
+# job
+
+Examples for the **Job** way of using ROCK: run an agent evaluation/task in a sandbox via `rock.sdk.job.Job` + `JobConfig`.
+
+For installing and running an agent inside a single sandbox, see [`../install-agents/`](../install-agents/) instead.
+
+## Layout
+
+| Subdir | Backend | Use it for |
+|--------|---------|-----------|
+| [`bash/`](./bash/) | `BashJobConfig` | Run an arbitrary shell script inside a sandbox (data processing, external evaluation tools) |
+| [`harbor/`](./harbor/) | `HarborJobConfig` | Run an AI agent benchmark task (SWE-bench, Terminal Bench, …) via the Harbor framework |
+
+Both backends share a single `Job(config).run()` entrypoint — pick the config type based on your scenario.
+
+See the [Use Job to Run Agent](../../docs/versioned_docs/version-1.7.x/References/Python%20SDK%20References/job.md) reference for the full schema.
diff --git a/examples/job/bash/README.md b/examples/job/bash/README.md
new file mode 100644
index 0000000000..aff46add5e
--- /dev/null
+++ b/examples/job/bash/README.md
@@ -0,0 +1,24 @@
+# job/bash
+
+`BashJob` examples: run an arbitrary shell script inside a sandbox.
+
+## Layout
+
+| File / dir | Form | Description |
+|------------|------|-------------|
+| [`simple_bash_job_demo.sh`](./simple_bash_job_demo.sh) | CLI | Minimal `rock job run --script-content ...` demo |
+| [`claw_eval/`](./claw_eval/) | Python SDK | `claw-eval` benchmark wrapped as a BashJob — uses `JobConfig.from_yaml()` + `Job(config).run()` |
+
+Both forms use the same underlying `BashJobConfig` schema; the CLI just wraps it.
+
+## Quick run
+
+```bash
+# CLI form
+bash simple_bash_job_demo.sh
+
+# SDK form
+cd claw_eval
+cp claw_eval_bashjob.yaml.template claw_eval_bashjob.yaml  # fill in real values
+python run_claw_eval.py
+```
diff --git a/examples/evaluation/claw_eval/claw_eval_bashjob.yaml.template b/examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template
similarity index 100%
rename from examples/evaluation/claw_eval/claw_eval_bashjob.yaml.template
rename to examples/job/bash/claw_eval/claw_eval_bashjob.yaml.template
diff --git a/examples/evaluation/claw_eval/claw_eval_config.yaml.template b/examples/job/bash/claw_eval/claw_eval_config.yaml.template
similarity index 100%
rename from examples/evaluation/claw_eval/claw_eval_config.yaml.template
rename to examples/job/bash/claw_eval/claw_eval_config.yaml.template
diff --git a/examples/evaluation/claw_eval/run_claw_eval.py b/examples/job/bash/claw_eval/run_claw_eval.py
similarity index 95%
rename from examples/evaluation/claw_eval/run_claw_eval.py
rename to examples/job/bash/claw_eval/run_claw_eval.py
index f6beaefec7..21d8609c86 100644
--- a/examples/evaluation/claw_eval/run_claw_eval.py
+++ b/examples/job/bash/claw_eval/run_claw_eval.py
@@ -1,7 +1,7 @@
 """Run claw-eval via BashJob SDK.
 
 Usage:
-    cd examples/agents/claw_eval
+    cd examples/job/bash/claw_eval
     cp claw_eval_bashjob.yaml.template claw_eval_bashjob.yaml  # fill in real values
     python run_claw_eval.py
 
diff --git a/examples/evaluation/claw_eval/run_claw_eval.sh b/examples/job/bash/claw_eval/run_claw_eval.sh
similarity index 100%
rename from examples/evaluation/claw_eval/run_claw_eval.sh
rename to examples/job/bash/claw_eval/run_claw_eval.sh
diff --git a/examples/bash/simple_bash_job_demo.sh b/examples/job/bash/simple_bash_job_demo.sh
similarity index 100%
rename from examples/bash/simple_bash_job_demo.sh
rename to examples/job/bash/simple_bash_job_demo.sh
diff --git a/examples/job/harbor/README.md b/examples/job/harbor/README.md
new file mode 100644
index 0000000000..061b7bdb54
--- /dev/null
+++ b/examples/job/harbor/README.md
@@ -0,0 +1,27 @@
+# job/harbor
+
+`HarborJob` examples: run an AI agent benchmark task via the Harbor framework.
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| [`harbor_demo.py`](./harbor_demo.py) | Entry point — loads `JobConfig.from_yaml()`, runs `Job(config).run()`, iterates trial results |
+| [`swe_job_config.yaml.template`](./swe_job_config.yaml.template) | SWE-bench task config template |
+| [`swe_job_config-verifier.yaml.template`](./swe_job_config-verifier.yaml.template) | SWE-bench variant with `verifier.mode: native` |
+| [`tb_job_config.yaml.template`](./tb_job_config.yaml.template) | Terminal Bench task config template |
+
+## Quick run
+
+```bash
+# 1. copy a template and fill in real values
+cp swe_job_config.yaml.template swe_job_config.yaml
+
+# 2. set required env vars (OSS credentials, etc.) — see harbor_demo.py docstring
+source .env
+
+# 3. run
+python harbor_demo.py -c swe_job_config.yaml
+```
+
+The `agents:` block uses Harbor's own minimal schema (typical fields: `name`, `model_name`) — see the templates above for the canonical shape.
diff --git a/examples/harbor/harbor_demo.py b/examples/job/harbor/harbor_demo.py
similarity index 88%
rename from examples/harbor/harbor_demo.py
rename to examples/job/harbor/harbor_demo.py
index 09f48b3198..e6201acdb7 100644
--- a/examples/harbor/harbor_demo.py
+++ b/examples/job/harbor/harbor_demo.py
@@ -1,13 +1,11 @@
-"""Harbor benchmark demo using ROCK Job SDK (new path).
+"""Harbor benchmark demo using ROCK Job SDK.
 
 Uses ``rock.sdk.job.Job`` with ``HarborJobConfig`` — the recommended path with
 full feature parity (G1-G7 fixed) and scatter / multiple trial types.
 
-For the legacy path (``rock.sdk.bench.Job``), see ``harbor_demo_legacy.py``.
-
 Usage:
-    python examples/harbor/harbor_demo.py -c examples/harbor/swe.intern.yaml
-    python examples/harbor/harbor_demo.py -c examples/harbor/tb_job_config.yaml -t mailman
+    python examples/job/harbor/harbor_demo.py -c examples/job/harbor/swe.intern.yaml
+    python examples/job/harbor/harbor_demo.py -c examples/job/harbor/tb_job_config.yaml -t mailman
 
 Required environment variables (OSS_* are auto-forwarded into the sandbox):
     OSS_ACCESS_KEY_ID       Alibaba Cloud OSS access key ID
@@ -20,7 +18,7 @@ Recommended setup:
 
     1. Copy .env.example to .env and fill in your credentials
     2. source .env
-    3. python examples/harbor/harbor_demo.py -c ...
+    3. python examples/job/harbor/harbor_demo.py -c ...
 """
 
 import argparse
diff --git a/examples/harbor/swe_job_config-verifier.yaml.template b/examples/job/harbor/swe_job_config-verifier.yaml.template
similarity index 100%
rename from examples/harbor/swe_job_config-verifier.yaml.template
rename to examples/job/harbor/swe_job_config-verifier.yaml.template
diff --git a/examples/harbor/swe_job_config.yaml.template b/examples/job/harbor/swe_job_config.yaml.template
similarity index 100%
rename from examples/harbor/swe_job_config.yaml.template
rename to examples/job/harbor/swe_job_config.yaml.template
diff --git a/examples/harbor/tb_job_config.yaml.template b/examples/job/harbor/tb_job_config.yaml.template
similarity index 100%
rename from examples/harbor/tb_job_config.yaml.template
rename to examples/job/harbor/tb_job_config.yaml.template