[RFC] Refactor the Sandbox Execution Layer

## Summary

Refactor our sandbox/runtime layer away from swe-rex toward a thinner, backend-native model that supports **stateful** execution across many container environments **without installing or running an in-container service**, and without modifying task images.

## Background

Today agents talk to task environments through swe-rex. For every container this means installing a runtime server at startup and running it as a resident process exposed over a network port/tunnel. It got us going, but it no longer scales well.

## Problems

1. **Image coupling.** Every base image must be runtime-compatible; maintaining this across hundreds/thousands of heterogeneous images doesn't scale.
2. **Startup cost & reliability.** Installing/launching a server on each cold start is network-dependent, adds latency, and fails often enough to hurt large-scale rollout/eval.
3. **Networking overhead.** Reaching an in-container server needs an exposed port or tunnel, which is awkward and costly on serverless/remote backends and adds another thing that can break.
4. **Limited flexibility.** The current model is hard to extend with richer stateful interactions and ties us to a single runtime implementation.

## Proposed Direction

Introduce a small sandbox abstraction with two layers:

1. **A minimal, backend-native exec primitive.** Each backend only implements "run a command in the environment" using what it already provides. No installed service, no exposed port.
2. **A shared stateful-session layer** on top. State lives inside the container in a persistent execution context; this layer preserves that state across calls and returns structured results. Because the layer is backend-independent, every backend gets the same stateful behavior.

This keeps task images untouched, removes the resident-server and port requirements, and lets us add a backend by implementing only the small primitive. The exact mechanism for keeping state and capturing output is left to prototyping. We'll provide a migration path so existing flows keep working during the transition.
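To make the two layers concrete, here is a minimal sketch of one way they could fit together. It is not a proposed design, and none of these names come from the existing codebase: `ExecResult`, `ExecBackend`, `LocalBackend`, `StatefulSession`, and the `SANDBOX_CWD` variable are hypothetical. It assumes the environment already ships a POSIX `sh`, and it persists state in a file inside the environment, which is only one candidate mechanism.

```python
from __future__ import annotations

import shlex
import subprocess
import uuid
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    """Structured result returned by both layers (hypothetical shape)."""
    exit_code: int
    stdout: str
    stderr: str


class ExecBackend(Protocol):
    """Layer 1: the only thing a backend has to implement."""

    def exec(self, argv: list[str]) -> ExecResult: ...


class LocalBackend:
    """Stand-in backend that runs commands on the local host.

    A container backend could plausibly prefix argv with something like
    ["docker", "exec", container_id]; that wiring is an assumption.
    """

    def exec(self, argv: list[str]) -> ExecResult:
        proc = subprocess.run(argv, capture_output=True, text=True)
        return ExecResult(proc.returncode, proc.stdout, proc.stderr)


class StatefulSession:
    """Layer 2: backend-independent session that carries the working
    directory and exported variables from one call to the next."""

    def __init__(self, backend: ExecBackend, state_dir: str = "/tmp") -> None:
        self.backend = backend
        # State lives in a file inside the environment, not in a server.
        self.state_file = f"{state_dir}/session-{uuid.uuid4().hex}.sh"

    def run(self, command: str) -> ExecResult:
        state = shlex.quote(self.state_file)
        script = "\n".join([
            # Restore exported variables and cwd saved by the previous call.
            f"if [ -f {state} ]; then . {state}; cd \"$SANDBOX_CWD\" || exit 125; fi",
            command,
            "__rc=$?",
            # Save cwd and exported variables for the next call.
            "SANDBOX_CWD=$PWD; export SANDBOX_CWD",
            f"export -p > {state}",
            "exit $__rc",
        ])
        return self.backend.exec(["sh", "-c", script])
```

With this sketch, a `cd` or `export` issued in one `run` call is visible in the next, even though each call is a separate one-shot exec. Known gaps that prototyping would need to address: unexported shell variables, functions, and background processes are not carried over, and a command that calls `exit` itself skips the save step.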


