Atespace: per-tenant scoping for the actor lifecycle #280

Open
Haven Xia (HavenXia) wants to merge 9 commits into agent-substrate:main from HavenXia:atespace-inc1

Conversation

@HavenXia Haven Xia (HavenXia) commented Jun 19, 2026

Collaborator

This is part of the solution for #21.

First incremental slice of the atespace design for actors — part of #21. An atespace is a mandatory tenant boundary that every actor belongs to. It's folded into the actor's identity and storage key (actor:<atespace>:<id>), so list/get/delete within a tenant is a cheap key-prefix operation, and actors in different atespaces can reuse the same id without colliding.

This PR adds atespace through the actor lifecycle end-to-end (proto → store → control API → kubectl-ate).

It doesn't touch DNS, snapshots, scheduling, or auth. Landing it on its own so the future changes are additive, and everything that isn't actor-CRUD is explicitly out of scope below.

This PR changes:

  • Proto: every actor RPC request carries an atespace; Actor carries it as part of its identity.
  • Store: actors are keyed actor:<atespace>:<id>, and GetActor/DeleteActor/ListActors take an atespace. Listing is a scoped SCAN actor:<atespace>:*, or SCAN actor:* for all atespaces.
  • Control API: create / get / delete / suspend / pause / resume / update are all atespace-scoped. atespace is required and validated as a DNS-1123 label. The syncer's dead-worker recovery is made atespace-aware by adding Worker.actor_atespace.
  • Atespace object + CRUD: a first-class Atespace object with CreateAtespace / GetAtespace / ListAtespaces / DeleteAtespace. CreateActor now requires the atespace to exist first (FailedPrecondition otherwise); DeleteAtespace only removes an empty atespace.
  • kubectl-ate:
    • --atespace / -a on every actor subcommand (create/get/delete/resume/suspend/pause/logs).
    • -A / --all-atespaces to list across all tenants.
    • Add an ATESPACE column to the actor table; the existing namespace column is renamed TEMPLATE NS to disambiguate it from the atespace.
    • Create / get / delete atespace subcommands.
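
The key layout from the Store bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: actorKey and splitActorKey are hypothetical helper names, and the sketch assumes the atespace never contains a colon (which the DNS-1123 label rule guarantees).

```go
package main

import (
	"fmt"
	"strings"
)

// actorKey builds the storage key for an actor using the
// "actor:<atespace>:<id>" layout described above.
// Hypothetical helper name, for illustration only.
func actorKey(atespace, id string) string {
	return "actor:" + atespace + ":" + id
}

// splitActorKey is the inverse of actorKey. It relies on the atespace
// containing no ':' so that the first two separators delimit it.
func splitActorKey(key string) (atespace, id string, ok bool) {
	parts := strings.SplitN(key, ":", 3)
	if len(parts) != 3 || parts[0] != "actor" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	// The same actor id in two atespaces yields two distinct keys.
	a := actorKey("team-a", "test")
	b := actorKey("team-b", "test")
	fmt.Println(a, b, a != b)
}
```

Because the atespace sits in front of the id, every key of one atespace shares a common prefix, which is what makes the scoped listing cheap.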

Examples

The walkthrough below covers atespace and actor creation, listing, resume/suspend, and deletion (-a is shorthand for --atespace):

Atespace Creation

$ kubectl ate create atespace team-a
NAME
team-a
$ kubectl ate create atespace team-b
NAME
team-b

Actor Creation

$ kubectl ate create actor test -t ate-demo-counter/counter --atespace team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID     STATUS             ATEOM POD   ATEOM IP   VERSION
team-a     ate-demo-counter   counter    test   STATUS_SUSPENDED   <none>                 1

$ kubectl ate create actor test2 -t ate-demo-counter/counter --atespace team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID      STATUS             ATEOM POD   ATEOM IP   VERSION
team-a     ate-demo-counter   counter    test2   STATUS_SUSPENDED   <none>                 1

$ kubectl ate create actor test -t ate-demo-counter/counter --atespace team-b
ATESPACE   TEMPLATE NS        TEMPLATE   ID     STATUS             ATEOM POD   ATEOM IP   VERSION
team-b     ate-demo-counter   counter    test   STATUS_SUSPENDED   <none>                 1

# Fails when the atespace does not exist
$ kubectl ate create actor test -t ate-demo-counter/counter --atespace team-c
Error: failed to create actor: rpc error: code = FailedPrecondition desc = Atespace team-c not found

Get by atespace

$ kubectl ate get actors -A
ATESPACE   TEMPLATE NS        TEMPLATE   ID      STATUS             ATEOM POD   ATEOM IP   VERSION
team-a     ate-demo-counter   counter    test    STATUS_SUSPENDED   <none>                 1
team-a     ate-demo-counter   counter    test2   STATUS_SUSPENDED   <none>                 1
team-b     ate-demo-counter   counter    test    STATUS_SUSPENDED   <none>                 1

$ kubectl ate get actors -a team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID      STATUS             ATEOM POD   ATEOM IP   VERSION
team-a     ate-demo-counter   counter    test    STATUS_SUSPENDED   <none>                 1
team-a     ate-demo-counter   counter    test2   STATUS_SUSPENDED   <none>                 1

$ kubectl ate get actors -a team-b
ATESPACE   TEMPLATE NS        TEMPLATE   ID     STATUS             ATEOM POD   ATEOM IP   VERSION
team-b     ate-demo-counter   counter    test   STATUS_SUSPENDED   <none>                 1

Resume & Suspend

$ kubectl ate resume actor test -a team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID     STATUS           ATEOM POD                                              ATEOM IP     VERSION
team-a     ate-demo-counter   counter    test   STATUS_RUNNING   ate-demo-counter/counter-deployment-5d9b77d6fd-r846n   10.52.3.86   3

$ kubectl ate get actors -a team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID      STATUS             ATEOM POD                                              ATEOM IP     VERSION
team-a     ate-demo-counter   counter    test    STATUS_RUNNING     ate-demo-counter/counter-deployment-5d9b77d6fd-r846n   10.52.3.86   3
team-a     ate-demo-counter   counter    test2   STATUS_SUSPENDED   <none>                                                              1

$ kubectl ate suspend actor test -a team-a
ATESPACE   TEMPLATE NS        TEMPLATE   ID     STATUS             ATEOM POD   ATEOM IP   VERSION
team-a     ate-demo-counter   counter    test   STATUS_SUSPENDED   <none>                 5

Deletion

$ kubectl ate delete actor test2 -a team-a
actor "test2" deleted

# Fails when deleting a non-empty atespace
$ kubectl ate delete atespace team-a
Error: failed to delete atespace: rpc error: code = FailedPrecondition desc = Atespace team-a is not empty

$ kubectl ate delete actor test -a team-a
actor "test" deleted

$ kubectl ate delete atespace team-a
atespace "team-a" deleted

Scope / non-goals

Deferred to later atespace increments (intentionally not in this PR): cascade delete of a non-empty atespace (soft-delete + finalizers; for now DeleteAtespace only removes an empty atespace), DNS names, snapshot paths, template grants, the worker's own (system) atespace, and quota.

  • Tests pass
  • Appropriate documentation

@HavenXia Haven Xia (HavenXia) changed the title from "Atespace inc1" to "The initial shape of atespace" Jun 19, 2026
@HavenXia Haven Xia (HavenXia) changed the title from "The initial shape of atespace" to "Atespace: per-tenant scoping for the actor lifecycle" Jun 19, 2026
@HavenXia Haven Xia (HavenXia) marked this pull request as ready for review June 19, 2026 19:41

@thockin Tim Hockin (thockin) left a comment

Collaborator

Currently an atespace is created merely by using it in an actor. I think we want to move to Atespaces being explicitly created and managed, right? I'm fine with that as a followup, or as more commits in here.

Comment thread pkg/proto/ateapipb/ateapi.proto
- // Lists all known actors. Returns a page of actors and a next page token.
- ListActors(ctx context.Context, pageSize int32, pageToken string) ([]*ateapipb.Actor, string, error)
+ // Lists actors in the given atespace (scoped scan). Returns a page of actors and a next page token.
+ ListActors(ctx context.Context, atespace string, pageSize int32, pageToken string) ([]*ateapipb.Actor, string, error)

Collaborator

Does "" mean "all atespaces" ?

Collaborator Author

Yes. For ListActors only, an empty atespace means "all atespaces" (it backs kubectl ate get actors -A). Every other operation requires a non-empty atespace. I had documented this at https://github.com/agent-substrate/substrate/pull/280/changes#diff-101ee4ffedf6cf67cec6ea55a251c663205e7333f629d1e3edbb397b2c6e3017R49 but not on the interface here; I have now added comments to make it clear to readers.
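
A rough sketch of how the empty-atespace rule maps onto the scan pattern. The helper names listPrefix, scanPattern, and filterKeys are hypothetical, and the in-memory filter only stands in for a real Redis SCAN:

```go
package main

import (
	"fmt"
	"strings"
)

// listPrefix returns the key prefix shared by the actors to list.
// An empty atespace means "all atespaces" (ListActors only).
func listPrefix(atespace string) string {
	if atespace == "" {
		return "actor:"
	}
	// The trailing ':' keeps "team-a" from also matching "team-ab".
	return "actor:" + atespace + ":"
}

// scanPattern returns the glob pattern that would be handed to SCAN.
func scanPattern(atespace string) string {
	return listPrefix(atespace) + "*"
}

// filterKeys emulates the scoped scan over an in-memory key list.
func filterKeys(keys []string, atespace string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, listPrefix(atespace)) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{"actor:team-a:test", "actor:team-a:test2", "actor:team-b:test"}
	fmt.Println(scanPattern(""), filterKeys(keys, ""))
	fmt.Println(scanPattern("team-a"), filterKeys(keys, "team-a"))
}
```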

Comment thread pkg/proto/ateapipb/ateapi.proto Outdated
Selector worker_selector = 4;

// The atespace to create the actor into.
string atespace = 5;

Collaborator

I appreciate the caution of not renumbering proto tags, but we should strive to make this as readable as possible, so I would put it next to ID.

Collaborator

Agreed. We are still in a phase where we can break the API. Just put a comment in the PR that this is a breaking change and that all actors need to be deleted.

In any case, none of the existing actors will work, since they will be missing the namespace.

Collaborator Author

Resolved by ActorReference — actor_id and atespace are now a single ActorReference ref = 1 at the top of each request, so there's no separate atespace field left to reposition.
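
For readers skimming the thread, a hand-written Go mirror of the idea. This is an assumption for illustration only: the real type is generated from the proto, and its Go field names may differ.

```go
package main

import "fmt"

// ActorReference pairs an actor name with its atespace.
// Hand-written stand-in for the generated proto type (assumption).
type ActorReference struct {
	Atespace string
	Name     string
}

func main() {
	// The same name in two atespaces refers to two different actors,
	// so the pair works directly as a map key.
	status := map[ActorReference]string{
		{Atespace: "team-a", Name: "test"}: "STATUS_RUNNING",
		{Atespace: "team-b", Name: "test"}: "STATUS_SUSPENDED",
	}
	fmt.Println(len(status))
}
```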

func init() {
	createActorCmd.Flags().StringVarP(&templateFlag, "template", "t", "", "Template to derive the actor from in <namespace>/<name> format (required)")
	_ = createActorCmd.MarkFlagRequired("template")
	createActorCmd.Flags().StringVar(&atespaceFlag, "atespace", "", "Atespace (tenant) to create the actor in (required)")

Collaborator

Let's avoid the word "tenant"

Collaborator Author

removed


createReq := &ateapipb.CreateActorRequest{
	ActorId: actorID,
	Atespace: at.ObjectMeta.Namespace,

Collaborator

This is not obvious to me -- we should never assume k8s namespaces == atespaces -- they are not the same.

We are taking a snapshot, so we should probably use an atespace that is specifically set aside for this. If we move to CRUD of atespaces as a first-class resource, we should "reserve" any atespace whose name begins with "ate-", so this could be something like "ate-golden"?

Collaborator Author

Sounds good, will update to use "ate-golden".
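
A small sketch of the reserved-prefix check. The function name is hypothetical, and whether CreateAtespace actually rejects user requests for "ate-" names is not stated in this thread, so treat the enforcement point as an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// reservedAtespacePrefix marks system atespaces such as "ate-golden".
const reservedAtespacePrefix = "ate-"

// isReservedAtespace reports whether a name falls in the reserved range.
// Hypothetical helper name, for illustration only.
func isReservedAtespace(name string) bool {
	return strings.HasPrefix(name, reservedAtespacePrefix)
}

func main() {
	for _, name := range []string{"ate-golden", "team-a", "ateam"} {
		fmt.Println(name, isReservedAtespace(name))
	}
}
```

Matching on the full "ate-" prefix, hyphen included, leaves ordinary names such as "ateam" available to users.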

Comment thread cmd/atenet/internal/dns/corefile.go Outdated
dnsDomainParts := strings.Split("."+resources.ActorDNSSuffix+".", ".")
dnsDomainRef := strings.Join(dnsDomainParts, `\.`)
directives = append(directives, fmt.Sprintf(` match "^%s%s$"`, resources.ActorIDRegexPattern, dnsDomainRef))
directives = append(directives, fmt.Sprintf(` match "^%s\.%s%s$"`, resources.ActorIDRegexPattern, resources.ActorIDRegexPattern, dnsDomainRef))

Collaborator

This is fishy -- why do we need an explicit \. in one place but not the other?

Debugger says: the input is ".actors.resources.substrate.ate.dev." (leading and trailing dots). That makes dnsDomainParts have empty first and last entries. That means dnsDomainRef has leading and trailing \.. That makes this correct.

It's WEIRD but correct. I'll send a followup to document it.

Collaborator Author

I rewrote it to be explicit — escape the suffix's dots and spell out every . in the pattern to improve readability.

// "<actor_id>.<atespace>.actors.resources.substrate.ate.dev" (a trailing dot is
// tolerated) into its atespace and actor id, validating both. It does not accept
// a host:port; callers must strip the port first.
func ParseActorDNSName(name string) (atespace, actorID string, err error) {

Collaborator

This seems like a perfect target for a small unit test?

Collaborator Author

Thanks, added.
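
The test added in the PR is not shown in this thread. As a rough idea of what a table-driven check could look like, here is a self-contained sketch. The parser body is an assumption written from the doc comment alone (it validates both parts as DNS-1123 style labels and skips length checks), and the cases run from main so the snippet executes on its own.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const actorDNSSuffix = ".actors.resources.substrate.ate.dev"

// Assumed label rule for both the actor id and the atespace.
var labelRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// parseActorDNSName is an illustrative stand-in for ParseActorDNSName.
// It tolerates one trailing dot and does not accept host:port.
func parseActorDNSName(name string) (atespace, actorID string, err error) {
	name = strings.TrimSuffix(name, ".")
	if !strings.HasSuffix(name, actorDNSSuffix) {
		return "", "", errors.New("not an actor DNS name")
	}
	prefix := strings.TrimSuffix(name, actorDNSSuffix)
	id, as, ok := strings.Cut(prefix, ".")
	if !ok || !labelRE.MatchString(id) || !labelRE.MatchString(as) {
		return "", "", fmt.Errorf("invalid actor DNS name %q", name)
	}
	return as, id, nil
}

func main() {
	cases := []struct {
		name    string
		wantErr bool
	}{
		{"test.team-a.actors.resources.substrate.ate.dev", false},
		{"test.team-a.actors.resources.substrate.ate.dev.", false},
		{"test.actors.resources.substrate.ate.dev", true},
		{"test.team-a.actors.resources.substrate.ate.dev:8080", true},
		{"x.test.team-a.actors.resources.substrate.ate.dev", true},
	}
	for _, c := range cases {
		as, id, err := parseActorDNSName(c.name)
		fmt.Printf("%q -> atespace=%q id=%q ok=%v\n", c.name, as, id, (err != nil) == c.wantErr)
	}
}
```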

if req.GetAtespace() == "" {
	return status.Error(codes.InvalidArgument, "atespace is required")
}
if err := resources.ValidateAtespace(req.GetAtespace()); err != nil {

Collaborator

I am curious why resources.ValidateAtespace does not reject the empty string.
I see the same behavior in resources.ValidateActorID too.
Maybe we need to fix all the resource validators to check for the empty string.

Julian Gutierrez Oschmann (@juli4n) - WDYT?

Collaborator

We need to bring some consistency to validation early. K8s left it all open-coded for far too long and it got very painful.

Collaborator Author

Marked that as a TODO, will create an issue to track the unification of actor/atespace validation across the control API RPCs.
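
One possible shape for the unified check, sketched under assumptions: validateLabel is a hypothetical name, and applying DNS-1123 label rules to actor IDs as well as atespaces is a guess, since this PR only states that rule for atespaces.

```go
package main

import (
	"fmt"
	"regexp"
)

// DNS-1123 label: lowercase alphanumerics or '-', starting and ending with
// an alphanumeric, at most 63 characters.
var dns1123LabelRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateLabel rejects the empty string itself, so callers do not need a
// separate "is required" check before calling the validator.
func validateLabel(field, value string) error {
	switch {
	case value == "":
		return fmt.Errorf("%s is required", field)
	case len(value) > 63:
		return fmt.Errorf("%s must be at most 63 characters", field)
	case !dns1123LabelRE.MatchString(value):
		return fmt.Errorf("%s %q is not a valid DNS-1123 label", field, value)
	}
	return nil
}

func main() {
	for _, v := range []string{"team-a", "", "Team_A", "-x"} {
		fmt.Printf("%q: %v\n", v, validateLabel("atespace", v))
	}
}
```

Folding the empty-string case into the validator keeps each RPC handler down to a single call per field.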

if req.GetActorId() == "" {
	return status.Error(codes.InvalidArgument, "id is required")
}
if req.GetAtespace() == "" {

Collaborator

Have you considered doing full validation here, e.g. resources.ValidateNamespace?

It might be worth adding similar full validation for the actor ID itself.

P.S. Please make a similar fix for all APIs if you decide to accept this comment.

Collaborator

While I 200% agree that we need rigorous validation done at the right moment in the stack, I don't think this PR should block on it. We can TODO it and come back

Collaborator Author

Marked that as a TODO, will create an issue to track the unification of actor/atespace validation across the control API RPCs.

@HavenXia Haven Xia (HavenXia) force-pushed the atespace-inc1 branch 2 times, most recently from f1f120d to 6ad7922 Compare June 27, 2026 00:12
An actor name is only unique within its atespace, so the two always travel together. Add ActorReference{atespace, name} and use it across the Get/Create/Update/Suspend/Pause/Resume/Delete actor requests in place of separate actor_id + atespace fields. The stored Actor object and Redis key layout are unchanged.
Golden actors are system actors, not user actors; don't put them in an atespace named after the template's k8s namespace (a k8s namespace is not an atespace). The "ate-" prefix is reserved for system atespaces.