Summary
Refactor our sandbox/runtime layer away from swe-rex toward a thinner, backend-native model that supports stateful execution across many container environments without installing or running an in-container service, and without modifying task images.
Background
Today agents talk to task environments through swe-rex. For every container this means installing a runtime server at startup and running it as a resident process exposed over a network port/tunnel. It got us going, but it no longer scales well.
Problems
- Image coupling. Every base image must be runtime-compatible; maintaining this across hundreds/thousands of heterogeneous images doesn't scale.
- Startup cost & reliability. Installing/launching a server on each cold start is network-dependent, adds latency, and fails often enough to hurt large-scale rollout/eval.
- Networking overhead. Reaching an in-container server needs an exposed port or tunnel, which is awkward and costly on serverless/remote backends and adds another thing that can break.
- Limited flexibility. The current model is hard to extend with richer stateful interactions and ties us to a single runtime implementation.
Proposed Direction
Introduce a small sandbox abstraction with two layers:
- A minimal, backend-native exec primitive — each backend only implements "run a command in the environment" using what it already provides. No installed service, no exposed port.
- A shared stateful-session layer on top: state lives inside the container in a persistent execution context, and this layer preserves that state between calls and returns structured results. Because this layer is backend-independent, every backend gets the same stateful behavior.
This keeps task images untouched, removes the resident-server and port requirements, and lets us add backends by implementing only the small primitive. The exact mechanism for keeping state and capturing output is left to prototyping. We'll provide a migration path so existing flows keep working during the transition.
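To make the two layers concrete, here is a minimal sketch of one possible shape. It is not the chosen design: the names `ExecBackend`, `LocalBackend`, `StatefulSession`, `ExecResult`, and `SANDBOX_CWD` are hypothetical, and persisting the working directory and exported variables in a state file inside the environment is only one candidate mechanism, assumed here for illustration. `LocalBackend` runs on the host purely as a stand-in; a real backend would wrap whatever exec facility it already provides.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    """Structured result returned by both layers (hypothetical shape)."""
    exit_code: int
    stdout: str
    stderr: str


class ExecBackend(Protocol):
    """Layer 1: the only thing a backend has to implement."""

    def exec(self, argv: list[str]) -> ExecResult: ...


class LocalBackend:
    """Stand-in backend that runs commands on the host via subprocess."""

    def exec(self, argv: list[str]) -> ExecResult:
        proc = subprocess.run(argv, capture_output=True, text=True)
        return ExecResult(proc.returncode, proc.stdout, proc.stderr)


class StatefulSession:
    """Layer 2: backend-independent; carries cwd and exported variables across calls.

    Assumption: state is saved to a file inside the environment after each
    command and restored before the next one. No resident process is needed.
    """

    def __init__(self, backend: ExecBackend, state_path: str):
        self.backend = backend
        self.state_path = state_path  # path inside the environment

    def run(self, command: str) -> ExecResult:
        state = shlex.quote(self.state_path)
        script = "\n".join([
            # Restore exported variables and the working directory, if saved.
            f'[ -f {state} ] && . {state} && cd "$SANDBOX_CWD"',
            # Run the user command in this shell so cd/export take effect.
            f"eval {shlex.quote(command)}",
            "rc=$?",
            # Save the working directory and exported variables for the next call.
            "SANDBOX_CWD=$PWD; export SANDBOX_CWD",
            f"export -p > {state}",
            "exit $rc",
        ])
        return self.backend.exec(["sh", "-c", script])
```

With this sketch, `StatefulSession(LocalBackend(), "/tmp/session_state.sh")` followed by `run("cd / && export FOO=bar")` and then `run('echo "$FOO"; pwd')` sees both the variable and the directory from the first call. A known gap of this particular mechanism: a command that calls `exit` itself ends the shell before the state is saved, and unexported variables, shell functions, and background processes are not preserved.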