[RFC] Refactor the Sandbox Execution Layer #67

@yyDing1

Summary

Refactor our sandbox/runtime layer away from swe-rex toward a thinner, backend-native model that supports stateful execution across many container environments without installing or running an in-container service, and without modifying task images.

Background

Today agents talk to task environments through swe-rex. For every container this means installing a runtime server at startup and running it as a resident process exposed over a network port/tunnel. It got us going, but it no longer scales well.

Problems

  1. Image coupling. Every base image must be runtime-compatible; maintaining this across hundreds/thousands of heterogeneous images doesn't scale.
  2. Startup cost & reliability. Installing/launching a server on each cold start is network-dependent, adds latency, and fails often enough to hurt large-scale rollout/eval.
  3. Networking overhead. Reaching an in-container server needs an exposed port or tunnel, which is awkward and costly on serverless/remote backends and adds another thing that can break.
  4. Limited flexibility. The current model is hard to extend with richer stateful interactions and ties us to a single runtime implementation.

Proposed Direction

Introduce a small sandbox abstraction with two layers:

  1. A minimal, backend-native exec primitive — each backend only implements "run a command in the environment" using what it already provides. No installed service, no exposed port.
  2. A shared stateful-session layer on top: state lives inside the container in a persistent execution context, and this layer preserves it across calls and returns structured results. Because the layer is backend-independent, every backend gets the same stateful behavior.

This keeps task images untouched, removes the resident-server and port requirements, and lets us add backends by implementing only the small primitive. The exact mechanism for keeping state and capturing output is left to prototyping. We'll provide a migration path so existing flows keep working during the transition.
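
To make the two layers concrete, here is a rough sketch of how they might fit together. Everything in it is an assumption for illustration, not a decided design: the names `ExecBackend`, `ExecResult`, `LocalBackend`, and `StatefulSession` are hypothetical, the local `sh -c` backend only stands in for something like `docker exec` or a provider's exec API, and persisting exported variables and the working directory to files inside the environment is just one candidate mechanism among those to be prototyped.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    """Structured result returned to the agent (hypothetical shape)."""
    exit_code: int
    stdout: str
    stderr: str


class ExecBackend(Protocol):
    """Layer 1 (hypothetical): the only thing a backend must implement."""

    def exec(self, command: str) -> ExecResult: ...


class LocalBackend:
    """Stand-in backend that runs commands in a local POSIX shell.

    A real backend would call whatever exec facility it already has;
    nothing is installed in the environment and no port is opened.
    """

    def exec(self, command: str) -> ExecResult:
        proc = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
        return ExecResult(proc.returncode, proc.stdout, proc.stderr)


class StatefulSession:
    """Layer 2 (hypothetical): backend-independent state preservation.

    Assumed mechanism: after each command, exported variables and the
    working directory are saved to files inside the environment and
    restored before the next command. state_prefix should be an
    absolute path, because `.` searches PATH for slash-free names.
    """

    def __init__(self, backend: ExecBackend, state_prefix: str):
        self.backend = backend
        self.env_file = shlex.quote(state_prefix + ".env")
        self.cwd_file = shlex.quote(state_prefix + ".cwd")

    def run(self, command: str) -> ExecResult:
        script = "\n".join([
            # Restore state saved by the previous call, if any.
            f'if [ -f {self.env_file} ]; then . {self.env_file}; cd "$(cat {self.cwd_file})"; fi',
            command,
            "__rc=$?",
            # Save state for the next call, then report the command's status.
            f"export -p > {self.env_file}",
            f"pwd > {self.cwd_file}",
            "exit $__rc",
        ])
        return self.backend.exec(script)
```

With this shape, a call such as `session.run("cd / && export FOO=bar")` followed by `session.run('echo "$FOO"; pwd')` would see both the variable and the directory change, even though each call is a separate stateless exec. Known gaps in the sketch: a command that calls `exit` itself skips the save step, and unexported shell variables, functions, and background processes are not preserved, which is part of why the real mechanism needs prototyping.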
