From 45ff5187b458fa106191ec3ba422437c60dd0c76 Mon Sep 17 00:00:00 2001
From: maxpetrusenkoagent <[REDACTED EMAIL]>
Date: Sun, 28 Jun 2026 00:06:10 -0400
Subject: [PATCH] docs: add production code execution guide
---
docs/docs.json | 3 +-
.../en/concepts/production-architecture.mdx | 7 +-
.../tools/production-code-execution.mdx | 163 ++++++++++++++++++
.../en/tools/ai-ml/codeinterpretertool.mdx | 2 +-
docs/edge/en/tools/overview.mdx | 3 +
5 files changed, 175 insertions(+), 3 deletions(-)
create mode 100644 docs/edge/en/guides/tools/production-code-execution.mdx
diff --git a/docs/docs.json b/docs/docs.json
index 2f657a674b..25261a4de2 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -122,7 +122,8 @@
"group": "Tools",
"icon": "wrench",
"pages": [
- "edge/en/guides/tools/publish-custom-tools"
+ "edge/en/guides/tools/publish-custom-tools",
+ "edge/en/guides/tools/production-code-execution"
]
},
{
diff --git a/docs/edge/en/concepts/production-architecture.mdx b/docs/edge/en/concepts/production-architecture.mdx
index 82f8738605..81209d02bb 100644
--- a/docs/edge/en/concepts/production-architecture.mdx
+++ b/docs/edge/en/concepts/production-architecture.mdx
@@ -121,6 +121,11 @@ def log_request(context):
print(f"Agent {context.agent.role} is calling the LLM...")
```
+### 4. Sandboxed Code Execution
+If a crew needs to run generated code, keep that execution outside the host process. Use a dedicated sandbox, pass in only the files and environment variables the task needs, set timeouts, and validate every artifact before it changes production state.
+
+See the [Production Code Execution](/en/guides/tools/production-code-execution) guide for the recommended sandbox strategy, E2B example crew, and production safety checklist.
+
## Deployment Patterns
When deploying your Flow, consider the following:
@@ -128,7 +133,7 @@ When deploying your Flow, consider the following:
### CrewAI Enterprise
The easiest way to deploy your Flow is using CrewAI Enterprise. It handles the infrastructure, authentication, and monitoring for you.
-Check out the [Deployment Guide](/en/enterprise/guides/deploy-crew) to get started.
+Check out the [Deployment Guide](/en/enterprise/guides/deploy-to-amp) to get started.
```bash
crewai deploy create
diff --git a/docs/edge/en/guides/tools/production-code-execution.mdx b/docs/edge/en/guides/tools/production-code-execution.mdx
new file mode 100644
index 0000000000..ab4708245d
--- /dev/null
+++ b/docs/edge/en/guides/tools/production-code-execution.mdx
@@ -0,0 +1,163 @@
+---
+title: Production Code Execution
+description: How to let CrewAI agents run generated code safely in production using isolated sandbox tools, bounded execution, and explicit review gates.
+icon: terminal
+mode: "wide"
+---
+
+## Overview
+
+Production crews sometimes need to run generated code: data analysis scripts, repository checks, file transforms, test commands, or short automation steps. Treat that capability as a high-risk tool, not as a normal Python helper.
+
+The production default is:
+
+1. Keep the CrewAI process on your trusted host.
+2. Send generated code to an isolated sandbox.
+3. Pass only the files, environment variables, and time budget the task needs.
+4. Validate the result before it changes production state.
+
+
+Do not run model-generated code directly on the host that contains your application, repository, credentials, or user data. `CodeInterpreterTool` has been removed from `crewai-tools`, and the `allow_code_execution` and `code_execution_mode` parameters on `Agent` are deprecated; none of these is a recommended production pattern. Use a dedicated sandbox service such as [E2B Sandbox Tools](/en/tools/ai-ml/e2bsandboxtools), or another sandbox that gives you equivalent isolation, quotas, logging, and cleanup.
+
+
+## Choose an execution strategy
+
+| Strategy | Use it when | Production tradeoffs |
+| --- | --- | --- |
+| No code execution | The task can be solved with normal tools, structured outputs, or deterministic Python you wrote yourself. | Safest and simplest, but limits agent autonomy. |
+| Ephemeral remote sandbox | The agent needs to run one-off Python or shell commands. | Best default. Each call starts with a clean environment and minimizes state leakage. |
+| Persistent remote sandbox | The task needs a session with imports, files, or variables reused across steps. | Faster for multi-step work, but state accumulates. Keep secrets out and close the sandbox when the run ends. |
+| Existing managed sandbox | Another system creates and owns the sandbox lifecycle. | Useful for enterprise policy controls. CrewAI should attach with the narrowest permissions and should not assume it can clean up the sandbox. |
+| Host execution | Only for fully trusted code written by your application team. | Do not expose this path to autonomous agents or untrusted prompts. |
+
+## Recommended pattern with E2B
+
+Use the E2B tools when agents need shell, Python, or filesystem access without exposing the host environment:
+
+- `E2BPythonTool` for Python snippets, data analysis, and rich outputs.
+- `E2BExecTool` for shell commands such as package checks or test commands.
+- `E2BFileTool` for sandbox file reads and writes.
+
+Install the E2B extra and configure the API key outside your prompt templates:
+
+```shell
+uv add "crewai-tools[e2b]"
+export E2B_API_KEY=""
+```
+
+
+Ephemeral mode is the safest default. Leave `persistent=False` unless the task really needs state across calls.
+
+
+```python Code
+from crewai import Agent, Crew, Process, Task
+from crewai_tools import E2BPythonTool
+
+python_sandbox = E2BPythonTool(
+ # Fresh sandbox per call. Good default for untrusted generated code.
+ persistent=False,
+ # Idle timeout for the sandbox lifecycle.
+ sandbox_timeout=120,
+)
+
+analyst = Agent(
+ role="Sandboxed data analyst",
+ goal="Run small Python checks in an isolated sandbox and report the result",
+ backstory="You execute only the code needed for the task and summarize stdout, stderr, and errors.",
+ tools=[python_sandbox],
+ verbose=True,
+)
+
+analysis_task = Task(
+ description=(
+ "Use Python to compute the mean of [1, 2, 3, 4, 5]. "
+ "Return the code you ran, stdout, and the final answer."
+ ),
+ expected_output="The computed mean plus the sandbox execution result.",
+ agent=analyst,
+)
+
+crew = Crew(
+ agents=[analyst],
+ tasks=[analysis_task],
+ process=Process.sequential,
+)
+
+result = crew.kickoff()
+```
+
+## Local repository and environment access
+
+A sandbox should not automatically see your whole repository or process environment. Give it a small, explicit working set:
+
+1. Select only the files the task needs.
+2. Copy those files into the sandbox.
+3. Run the command with per-call environment variables.
+4. Copy back only the artifacts you expect.
+5. Treat all outputs as untrusted until validated.
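+
+The first and third steps above can be sketched in plain Python before any sandbox call is made. This is a minimal illustration under stated assumptions: `select_working_set` and `build_sandbox_env` are hypothetical helper names, not CrewAI or E2B APIs, and both allowlists are placeholders for your own policy.
+
+```python
+from pathlib import Path
+
+# Hypothetical policy values; replace with your own allowlists.
+ALLOWED_SUFFIXES = {".py", ".csv", ".json"}
+ALLOWED_ENV_KEYS = {"TZ", "LANG"}
+
+
+def select_working_set(repo_root: Path, requested: list[str]) -> list[Path]:
+    """Return only requested files that stay inside repo_root and match the allowlist."""
+    root = repo_root.resolve()
+    selected = []
+    for rel in requested:
+        candidate = (root / rel).resolve()
+        # Reject path traversal such as "../secrets.env".
+        if root not in candidate.parents:
+            continue
+        if candidate.suffix in ALLOWED_SUFFIXES and candidate.is_file():
+            selected.append(candidate)
+    return selected
+
+
+def build_sandbox_env(host_env: dict[str, str]) -> dict[str, str]:
+    """Copy only allowlisted variables instead of forwarding the host environment."""
+    return {k: v for k, v in host_env.items() if k in ALLOWED_ENV_KEYS}
+```
+
+Filtering on the host side means a prompt-injected request for `../.env` or for the full process environment is dropped before anything reaches the sandbox.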
+
+If one workflow must share files between shell and file tools, create or manage a sandbox outside the crew and pass the same `sandbox_id` to each tool. Do not assume two separate persistent tool instances share state unless they attach to the same sandbox.
+
+```python Code
+from crewai_tools import E2BExecTool, E2BFileTool
+
+sandbox_id = "sbx_..." # Created by your control plane or sandbox manager.
+
+exec_tool = E2BExecTool(sandbox_id=sandbox_id, sandbox_timeout=300)
+file_tool = E2BFileTool(sandbox_id=sandbox_id, sandbox_timeout=300)
+```
+
+## Production safety checklist
+
+Before enabling generated-code execution in a crew, define these controls:
+
+- **Isolation:** Run code outside the host process. Prefer a fresh sandbox per tool call.
+- **File scope:** Copy in only the required files. Never mount the full repository by default.
+- **Secrets:** Do not place long-lived credentials in prompts, task descriptions, or persistent sandboxes. Use short-lived credentials with the smallest possible scope.
+- **Timeouts:** Set per-call tool timeouts and sandbox idle timeouts. Surface timeout failures to the crew instead of retrying forever.
+- **Resource limits:** Configure CPU, memory, network, and filesystem limits in the sandbox provider or surrounding infrastructure.
+- **Network policy:** Disable outbound network access unless the task needs it. If it does, allowlist destinations.
+- **Package installs:** Pin dependencies or use prebuilt templates. Avoid letting agents install arbitrary packages in production.
+- **Human review:** Require approval before sandbox output writes to production systems, deploys code, sends email, or performs irreversible actions.
+- **Observability:** Log the task id, agent role, sandbox id, command/code hash, exit code, stdout/stderr summary, duration, and artifact paths. Use [CrewAI Tracing](/en/observability/tracing) to correlate tool calls with crew runs.
+- **Cleanup:** Kill or expire sandboxes after use. Persistent sandboxes need an owner, max lifetime, and explicit close path.
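+
+As one way to approach the observability item, the sketch below builds a structured record per sandbox call using only the standard library. `build_execution_record` and its field names are hypothetical and mirror the fields listed above; the 500-character truncation is an arbitrary assumption, and this is not a CrewAI Tracing API.
+
+```python
+import hashlib
+import json
+import time
+
+
+def build_execution_record(task_id, agent_role, sandbox_id, code, exit_code,
+                           stdout, stderr, started_at, artifact_paths):
+    """Build one structured log record for a single sandbox call."""
+    return {
+        "task_id": task_id,
+        "agent_role": agent_role,
+        "sandbox_id": sandbox_id,
+        # Log a hash rather than the raw code so records stay small and comparable.
+        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
+        "exit_code": exit_code,
+        # Keep only a truncated summary of each stream in the record.
+        "stdout_summary": stdout[:500],
+        "stderr_summary": stderr[:500],
+        # started_at is expected to come from time.monotonic().
+        "duration_s": round(time.monotonic() - started_at, 3),
+        "artifact_paths": sorted(artifact_paths),
+    }
+```
+
+A record like this can be serialized with `json.dumps` and sent to whatever log pipeline you already use.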
+
+## Failure handling
+
+Generated-code tasks should fail closed. Design the surrounding Flow or application code so a sandbox failure returns a clear error instead of silently falling back to host execution.
+
+Recommended error states:
+
+| Failure | Response |
+| --- | --- |
+| Sandbox cannot start | Stop the task and report the provider error. |
+| Timeout | Stop the command, preserve logs, and ask for narrower code or more resources. |
+| Package install fails | Return the install error and require an approved dependency/template update. |
+| Output validation fails | Reject the result and retry a bounded number of times with the validation error as context; escalate to human review if it keeps failing. |
+| Sandbox cleanup fails | Mark the run degraded, alert operations, and expire the sandbox from the provider console. |
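+
+The fail-closed behavior can be sketched as a small wrapper in the surrounding application code. Everything here is illustrative: `SandboxOutcome`, `run_fail_closed`, and the status strings are hypothetical names, and `run_in_sandbox` stands in for whatever callable submits code to your sandbox provider. For brevity the sketch retries the same code; in a real crew the retry would pass the validation error back to the agent.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Optional
+
+
+@dataclass
+class SandboxOutcome:
+    status: str  # "ok", "sandbox_error", "timeout", or "validation_failed"
+    output: Optional[str] = None
+    error: Optional[str] = None
+
+
+def run_fail_closed(run_in_sandbox: Callable[[str], str],
+                    validate: Callable[[str], bool],
+                    code: str,
+                    max_attempts: int = 2) -> SandboxOutcome:
+    """Run code through the sandbox callable only; never fall back to the host."""
+    last_error = "no attempts made"
+    for _ in range(max_attempts):
+        try:
+            output = run_in_sandbox(code)
+        except TimeoutError as exc:
+            # Timeouts are reported, not retried indefinitely.
+            return SandboxOutcome("timeout", error=str(exc))
+        except Exception as exc:
+            # Provider or startup failure: stop and report the error.
+            return SandboxOutcome("sandbox_error", error=str(exc))
+        if validate(output):
+            return SandboxOutcome("ok", output=output)
+        last_error = "output failed validation"
+    return SandboxOutcome("validation_failed", error=last_error)
+```
+
+The wrapper has no code path that executes `code` locally, so every failure surfaces as an explicit status the caller must handle.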
+
+## Flow-first production shape
+
+For production systems, wrap code execution inside a Flow so you can keep policy, validation, and side effects outside the agent loop:
+
+```mermaid
+graph TD
+ Start((Start)) --> Flow[Flow Orchestrator]
+ Flow --> Prepare[Select inputs and sandbox policy]
+ Prepare --> Crew[Crew with sandbox tool]
+ Crew --> Validate[Validate stdout, files, and structured result]
+ Validate -- accepted --> Apply[Apply approved side effect]
+ Validate -- rejected --> Review[Human review or retry]
+ Apply --> End((End))
+ Review --> End
+```
+
+This keeps the agent responsible for proposing and running bounded code, while your application remains responsible for policy decisions, approvals, and production writes.
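+
+The validation step in the diagram can be as simple as a strict check on the structured result. The sketch below assumes the task asked the sandboxed code to print a single JSON object to stdout; `validate_result` is a hypothetical helper, not part of CrewAI.
+
+```python
+import json
+
+
+def validate_result(stdout: str, expected_keys: set[str]) -> tuple[bool, str]:
+    """Accept only a JSON object that contains exactly the expected keys."""
+    try:
+        payload = json.loads(stdout)
+    except json.JSONDecodeError as exc:
+        return False, f"stdout is not valid JSON: {exc.msg}"
+    if not isinstance(payload, dict):
+        return False, "result must be a JSON object"
+    if set(payload) != expected_keys:
+        # Report both missing and unexpected keys.
+        return False, f"unexpected keys: {sorted(set(payload) ^ expected_keys)}"
+    return True, "accepted"
+```
+
+The returned message can be routed to the human review or retry branch when the result is rejected.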
+
+## Related docs
+
+- [E2B Sandbox Tools](/en/tools/ai-ml/e2bsandboxtools)
+- [Production Architecture](/en/concepts/production-architecture)
+- [Task Guardrails](/en/concepts/tasks#task-guardrails)
+- [Execution Hooks](/en/learn/execution-hooks)
+- [CrewAI Tracing](/en/observability/tracing)
diff --git a/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx b/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
index 660c98a608..274b13bc5b 100644
--- a/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
+++ b/docs/edge/en/tools/ai-ml/codeinterpretertool.mdx
@@ -8,7 +8,7 @@ mode: "wide"
# `CodeInterpreterTool`
- **Deprecated:** `CodeInterpreterTool` has been removed from `crewai-tools`. The `allow_code_execution` and `code_execution_mode` parameters on `Agent` are also deprecated. Use a dedicated sandbox service — [E2B](https://e2b.dev) or [Modal](https://modal.com) — for secure, isolated code execution.
+ **Deprecated:** `CodeInterpreterTool` has been removed from `crewai-tools`. The `allow_code_execution` and `code_execution_mode` parameters on `Agent` are also deprecated. Use a dedicated sandbox service — [E2B](https://e2b.dev) or [Modal](https://modal.com) — for secure, isolated code execution. For current production guidance, see [Production Code Execution](/en/guides/tools/production-code-execution).
## Description
diff --git a/docs/edge/en/tools/overview.mdx b/docs/edge/en/tools/overview.mdx
index fb9926f0b5..f7af8a69a6 100644
--- a/docs/edge/en/tools/overview.mdx
+++ b/docs/edge/en/tools/overview.mdx
@@ -103,6 +103,9 @@ Need a specific tool? Here are some popular choices:
Execute Python code
+
+ Run generated code safely in isolated sandboxes
+
Access AWS S3 files