diff --git a/docs/docs.json b/docs/docs.json
index 2f657a674b..25261a4de2 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -122,7 +122,8 @@
           "group": "Tools",
           "icon": "wrench",
           "pages": [
-            "edge/en/guides/tools/publish-custom-tools"
+            "edge/en/guides/tools/publish-custom-tools",
+            "edge/en/guides/tools/production-code-execution"
           ]
         },
         {
diff --git a/docs/edge/en/concepts/production-architecture.mdx b/docs/edge/en/concepts/production-architecture.mdx
index 82f8738605..81209d02bb 100644
--- a/docs/edge/en/concepts/production-architecture.mdx
+++ b/docs/edge/en/concepts/production-architecture.mdx
@@ -121,6 +121,11 @@ def log_request(context):
     print(f"Agent {context.agent.role} is calling the LLM...")
 ```
 
+### 4. Sandboxed Code Execution
+If a crew needs to run generated code, keep that execution outside the host process. Use a dedicated sandbox, pass in only the files and environment variables the task needs, set timeouts, and validate every artifact before it changes production state.
+
+See the [Production Code Execution](/en/guides/tools/production-code-execution) guide for the recommended sandbox strategy, E2B example crew, and production safety checklist.
+
 ## Deployment Patterns
 
 When deploying your Flow, consider the following:
@@ -128,7 +133,7 @@ When deploying your Flow, consider the following:
 ### CrewAI Enterprise
 The easiest way to deploy your Flow is using CrewAI Enterprise. It handles the infrastructure, authentication, and monitoring for you.
 
-Check out the [Deployment Guide](/en/enterprise/guides/deploy-crew) to get started.
+Check out the [Deployment Guide](/en/enterprise/guides/deploy-to-amp) to get started.
 
 ```bash
 crewai deploy create
diff --git a/docs/edge/en/guides/tools/production-code-execution.mdx b/docs/edge/en/guides/tools/production-code-execution.mdx
new file mode 100644
index 0000000000..ab4708245d
--- /dev/null
+++ b/docs/edge/en/guides/tools/production-code-execution.mdx
@@ -0,0 +1,163 @@
+---
+title: Production Code Execution
+description: How to let CrewAI agents run generated code safely in production using isolated sandbox tools, bounded execution, and explicit review gates.
+icon: terminal
+mode: "wide"
+---
+
+## Overview
+
+Production crews sometimes need to run generated code: data analysis scripts, repository checks, file transforms, test commands, or short automation steps. Treat that capability as a high-risk tool, not as a normal Python helper.
+
+The production default is:
+
+1. Keep the CrewAI process on your trusted host.
+2. Send generated code to an isolated sandbox.
+3. Pass only the files, environment variables, and time budget the task needs.
+4. Validate the result before it changes production state.
+
+
+Do not run model-generated code directly on the host that contains your application, repository, credentials, or user data. The deprecated `CodeInterpreterTool`, `allow_code_execution`, and `code_execution_mode` paths are not the recommended production pattern. Use a dedicated sandbox service such as [E2B Sandbox Tools](/en/tools/ai-ml/e2bsandboxtools), or another sandbox that gives you equivalent isolation, quotas, logging, and cleanup.
+
+
+## Choose an execution strategy
+
+| Strategy | Use it when | Production tradeoffs |
+| --- | --- | --- |
+| No code execution | The task can be solved with normal tools, structured outputs, or deterministic Python you wrote yourself. | Safest and simplest, but limits agent autonomy. |
+| Ephemeral remote sandbox | The agent needs to run one-off Python or shell commands. | Best default. Each call starts with a clean environment and minimizes state leakage. |
+| Persistent remote sandbox | The task needs a session with imports, files, or variables reused across steps. | Faster for multi-step work, but state accumulates. Keep secrets out and close the sandbox when the run ends. |
+| Existing managed sandbox | Another system creates and owns the sandbox lifecycle. | Useful for enterprise policy controls. CrewAI should attach with the narrowest permissions and should not assume it can clean up the sandbox. |
+| Host execution | Only for fully trusted code written by your application team. | Do not expose this path to autonomous agents or untrusted prompts. |
+
+## Recommended pattern with E2B
+
+Use the E2B tools when agents need shell, Python, or filesystem access without exposing the host environment:
+
+- `E2BPythonTool` for Python snippets, data analysis, and rich outputs.
+- `E2BExecTool` for shell commands such as package checks or test commands.
+- `E2BFileTool` for sandbox file reads and writes.
+
+Install the E2B extra and configure the API key outside your prompt templates:
+
+```shell
+uv add "crewai-tools[e2b]"
+export E2B_API_KEY=""
+```
+
+
+Ephemeral mode is the safest default. Leave `persistent=False` unless the task really needs state across calls.
+
+
+```python Code
+from crewai import Agent, Crew, Process, Task
+from crewai_tools import E2BPythonTool
+
+python_sandbox = E2BPythonTool(
+    # Fresh sandbox per call. Good default for untrusted generated code.
+    persistent=False,
+    # Idle timeout for the sandbox lifecycle.
+    sandbox_timeout=120,
+)
+
+analyst = Agent(
+    role="Sandboxed data analyst",
+    goal="Run small Python checks in an isolated sandbox and report the result",
+    backstory="You execute only the code needed for the task and summarize stdout, stderr, and errors.",
+    tools=[python_sandbox],
+    verbose=True,
+)
+
+analysis_task = Task(
+    description=(
+        "Use Python to compute the mean of [1, 2, 3, 4, 5]. "
+        "Return the code you ran, stdout, and the final answer."
+    ),
+    expected_output="The computed mean plus the sandbox execution result.",
+    agent=analyst,
+)
+
+crew = Crew(
+    agents=[analyst],
+    tasks=[analysis_task],
+    process=Process.sequential,
+)
+
+result = crew.kickoff()
+```
+
+## Local repository and environment access
+
+A sandbox should not automatically see your whole repository or process environment. Give it a small, explicit working set:
+
+1. Select only the files the task needs.
+2. Copy those files into the sandbox.
+3. Run the command with per-call environment variables.
+4. Copy back only the artifacts you expect.
+5. Treat all outputs as untrusted until validated.
+
+If one workflow must share files between shell and file tools, create or manage a sandbox outside the crew and pass the same `sandbox_id` to each tool. Do not assume two separate persistent tool instances share state unless they attach to the same sandbox.
+
+```python Code
+from crewai_tools import E2BExecTool, E2BFileTool
+
+sandbox_id = "sbx_..."  # Created by your control plane or sandbox manager.
+
+exec_tool = E2BExecTool(sandbox_id=sandbox_id, sandbox_timeout=300)
+file_tool = E2BFileTool(sandbox_id=sandbox_id, sandbox_timeout=300)
+```
+
+## Production safety checklist
+
+Before enabling generated-code execution in a crew, define these controls:
+
+- **Isolation:** Run code outside the host process. Prefer a fresh sandbox per tool call.
+- **File scope:** Copy in only the required files. Never mount the full repository by default.
+- **Secrets:** Do not place long-lived credentials in prompts, task descriptions, or persistent sandboxes. Use short-lived credentials with the smallest possible scope.
+- **Timeouts:** Set per-call tool timeouts and sandbox idle timeouts. Surface timeout failures to the crew instead of retrying forever.
+- **Resource limits:** Configure CPU, memory, network, and filesystem limits in the sandbox provider or surrounding infrastructure.
+- **Network policy:** Disable outbound network access unless the task needs it. If it does, allowlist destinations.
+- **Package installs:** Pin dependencies or use prebuilt templates. Avoid letting agents install arbitrary packages in production.
+- **Human review:** Require approval before sandbox output writes to production systems, deploys code, sends email, or performs irreversible actions.
+- **Observability:** Log the task id, agent role, sandbox id, command/code hash, exit code, stdout/stderr summary, duration, and artifact paths. Use [CrewAI Tracing](/en/observability/tracing) to correlate tool calls with crew runs.
+- **Cleanup:** Kill or expire sandboxes after use. Persistent sandboxes need an owner, max lifetime, and explicit close path.
+
+## Failure handling
+
+Generated-code tasks should fail closed. Design the surrounding Flow or application code so a sandbox failure returns a clear error instead of silently falling back to host execution.
+
+Recommended error states:
+
+| Failure | Response |
+| --- | --- |
+| Sandbox cannot start | Stop the task and report the provider error. |
+| Timeout | Stop the command, preserve logs, and ask for narrower code or more resources. |
+| Package install fails | Return the install error and require an approved dependency/template update. |
+| Output validation fails | Reject the result and retry with the validation error as context. |
+| Sandbox cleanup fails | Mark the run degraded, alert operations, and expire the sandbox from the provider console. |
+
+## Flow-first production shape
+
+For production systems, wrap code execution inside a Flow so you can keep policy, validation, and side effects outside the agent loop:
+
+```mermaid
+graph TD
+    Start((Start)) --> Flow[Flow Orchestrator]
+    Flow --> Prepare[Select inputs and sandbox policy]
+    Prepare --> Crew[Crew with sandbox tool]
+    Crew --> Validate[Validate stdout, files, and structured result]
+    Validate -- accepted --> Apply[Apply approved side effect]
+    Validate -- rejected --> Review[Human review or retry]
+    Apply --> End((End))
+    Review --> End
+```
+
+This keeps the agent responsible for proposing and running bounded code, while your application remains responsible for policy decisions, approvals, and production writes.
+
+## Related docs
+
+- [E2B Sandbox Tools](/en/tools/ai-ml/e2bsandboxtools)
+- [Production Architecture](/en/concepts/production-architecture)
+- [Task Guardrails](/en/concepts/tasks#task-guardrails)
+- [Execution Hooks](/en/learn/execution-hooks)
+- [CrewAI Tracing](/en/observability/tracing)
diff --git a/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx b/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
index 660c98a608..274b13bc5b 100644
--- a/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
+++ b/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
@@ -8,7 +8,7 @@ mode: "wide"
 # `CodeInterpreterTool`
 
-  **Deprecated:** `CodeInterpreterTool` has been removed from `crewai-tools`. The `allow_code_execution` and `code_execution_mode` parameters on `Agent` are also deprecated. Use a dedicated sandbox service — [E2B](https://e2b.dev) or [Modal](https://modal.com) — for secure, isolated code execution.
+  **Deprecated:** `CodeInterpreterTool` has been removed from `crewai-tools`. The `allow_code_execution` and `code_execution_mode` parameters on `Agent` are also deprecated. Use a dedicated sandbox service — [E2B](https://e2b.dev) or [Modal](https://modal.com) — for secure, isolated code execution. For current production guidance, see [Production Code Execution](/en/guides/tools/production-code-execution).
 
 ## Description
diff --git a/docs/edge/en/tools/overview.mdx b/docs/edge/en/tools/overview.mdx
index fb9926f0b5..f7af8a69a6 100644
--- a/docs/edge/en/tools/overview.mdx
+++ b/docs/edge/en/tools/overview.mdx
@@ -103,6 +103,9 @@ Need a specific tool? Here are some popular choices:
   Execute Python code
+
+  Run generated code safely in isolated sandboxes
+
   Access AWS S3 files