Atespace: per-tenant scoping for the actor lifecycle#280
Atespace: per-tenant scoping for the actor lifecycle#280Haven Xia (HavenXia) wants to merge 9 commits into
Conversation
e9eddfb to
1728f96
Compare
838b199 to
26d0074
Compare
Tim Hockin (thockin)
left a comment
There was a problem hiding this comment.
Currently an atespace is created merely by using it in an actor. I think we want to move to Atespaces being explicitly created and managed, right? I'm fine with that as a followup, or as more commits in here.
| // Lists all known actors. Returns a page of actors and a next page token. | ||
| ListActors(ctx context.Context, pageSize int32, pageToken string) ([]*ateapipb.Actor, string, error) | ||
| // Lists actors in the given atespace (scoped scan). Returns a page of actors and a next page token. | ||
| ListActors(ctx context.Context, atespace string, pageSize int32, pageToken string) ([]*ateapipb.Actor, string, error) |
There was a problem hiding this comment.
Does "" mean "all atespaces" ?
There was a problem hiding this comment.
Yes — for ListActors only, an empty atespace means "all atespaces" (it backs kubectl ate get actors -A). Every other operation requires a non-empty atespace. I documented it at https://github.com/agent-substrate/substrate/pull/280/changes#diff-101ee4ffedf6cf67cec6ea55a251c663205e7333f629d1e3edbb397b2c6e3017R49 but not the interface here, added comments to make it clear to readers.
| Selector worker_selector = 4; | ||
|
|
||
| // The atespace to create the actor into. | ||
| string atespace = 5; |
There was a problem hiding this comment.
I appreciate the caution of no renumbering proto tags, but we should strive to make this as readable as possible, so I would put it next to ID
There was a problem hiding this comment.
Agree, we are still in a phase, we can break API. Just put a comment in PR that this is a breaking change and all actors needs to be deleted.
In any case, all existing actors will not work, since they will miss namespace.
There was a problem hiding this comment.
Resolved by ActorReference — actor_id and atespace are now a single ActorReference ref = 1 at the top of each request, so there's no separate atespace field left to reposition.
| func init() { | ||
| createActorCmd.Flags().StringVarP(&templateFlag, "template", "t", "", "Template to derive the actor from in <namespace>/<name> format (required)") | ||
| _ = createActorCmd.MarkFlagRequired("template") | ||
| createActorCmd.Flags().StringVar(&atespaceFlag, "atespace", "", "Atespace (tenant) to create the actor in (required)") |
There was a problem hiding this comment.
Let's avoid the word "tenant"
There was a problem hiding this comment.
removed
|
|
||
| createReq := &ateapipb.CreateActorRequest{ | ||
| ActorId: actorID, | ||
| Atespace: at.ObjectMeta.Namespace, |
There was a problem hiding this comment.
This is not obvious to me -- we should never assume k8s namespaces == atespaces -- they are not the same.
We are taking a snapshot, so we should probably use an atespace that is specifically put aside for this. If we move to CRUD of atespaces as a first-class resource, we should "reserve" any atespace whose name being with "ate-", so this can be something like "ate-golden"?
There was a problem hiding this comment.
Sounds good, will update to use "ate-golden".
| dnsDomainParts := strings.Split("."+resources.ActorDNSSuffix+".", ".") | ||
| dnsDomainRef := strings.Join(dnsDomainParts, `\.`) | ||
| directives = append(directives, fmt.Sprintf(` match "^%s%s$"`, resources.ActorIDRegexPattern, dnsDomainRef)) | ||
| directives = append(directives, fmt.Sprintf(` match "^%s\.%s%s$"`, resources.ActorIDRegexPattern, resources.ActorIDRegexPattern, dnsDomainRef)) |
There was a problem hiding this comment.
This is fishy -- why do we need an explicit \. in one place but not the other?
Debugger says: the input is ".actors.resources.substrate.ate.dev." (leading and trailing dots). That makes dnsDomainParts have empty first and last entries. That means dnsDomainRef has leading and trailing \.. That makes this correct.
It's WEIRD but correct. I'll send a followup to document it
There was a problem hiding this comment.
I rewrote it to be explicit — escape the suffix's dots and spell out every . in the pattern to improve readability.
| // "<actor_id>.<atespace>.actors.resources.substrate.ate.dev" (a trailing dot is | ||
| // tolerated) into its atespace and actor id, validating both. It does not accept | ||
| // a host:port; callers must strip the port first. | ||
| func ParseActorDNSName(name string) (atespace, actorID string, err error) { |
There was a problem hiding this comment.
This seems like a perfect target for a small unit test?
There was a problem hiding this comment.
Thanks, added.
| Selector worker_selector = 4; | ||
|
|
||
| // The atespace to create the actor into. | ||
| string atespace = 5; |
There was a problem hiding this comment.
Agree, we are still in a phase, we can break API. Just put a comment in PR that this is a breaking change and all actors needs to be deleted.
In any case, all existing actors will not work, since they will miss namespace.
| if req.GetAtespace() == "" { | ||
| return status.Error(codes.InvalidArgument, "atespace is required") | ||
| } | ||
| if err := resources.ValidateAtespace(req.GetAtespace()); err != nil { |
There was a problem hiding this comment.
I am curious, why "resources.ValidateAtespace" does not validate for emptry string.
I see similar behavior is defined for resources.ValidateActorID too.
Might be we need to fix all the "resource validator" to check for empty string.
There was a problem hiding this comment.
We need to bring some consistency to validation early. K8s left it all open-coded for far too long and it got very painful.
There was a problem hiding this comment.
Marked that as a TODO, will create an issue to track the unification of actor/atespace validation across the control API RPCs.
| if req.GetActorId() == "" { | ||
| return status.Error(codes.InvalidArgument, "id is required") | ||
| } | ||
| if req.GetAtespace() == "" { |
There was a problem hiding this comment.
have you considered to have full valudation? resources.ValidateNamespace.
It might worth to add similar full validation for actorID itself.
P.S - please make similar fix for all APIs, if you decide to accept the comment.
There was a problem hiding this comment.
While I 200% agree that we need rigorous validation done at the right moment in the stack, I don't think this PR should block on it. We can TODO it and come back
There was a problem hiding this comment.
Marked that as a TODO, will create an issue to track the unification of actor/atespace validation across the control API RPCs.
f1f120d to
6ad7922
Compare
An actor name is only unique within its atespace, so the two always travel together. Add ActorReference{atespace, name} and use it across the Get/Create/Update/Suspend/Pause/Resume/Delete actor requests in place of separate actor_id + atespace fields. The stored Actor object and Redis key layout are unchanged.
Golden actors are system actors, not user actors; don't put them in an atespace named after the template's k8s namespace (a k8s namespace is not an atespace). The "ate-" prefix is reserved for system atespaces.
6ad7922 to
960d07f
Compare
This is part of solution for #21.
First incremental slice of the atespace design for actors — part of #21. An atespace is a mandatory tenant boundary that every actor belongs to. It's folded into the actor's identity and storage key (
actor:<atespace>:<id>), so list/get/delete within a tenant is a cheap key-prefix operation, and actors in different atespaces can reuse the same id without colliding.This PR adds atespace through the actor lifecycle end-to-end (proto → store → control API →
kubectl-ate).It doesn't touch DNS, snapshots, scheduling, or auth. Landing it on its own so the future changes are additive, and everything that isn't actor-CRUD is explicitly out of scope below.
This PR changes:
keyed actor:<atespace>:<id>-GetActor/DeleteActor/ListActorstake an atespace. Listing is a scoped SCANactor:<atespace>:*, or SCANactor:*for all atespaces.DNS-1123label. The syncer's dead-worker recovery is atespace-aware by addingWorker.actor_atespace.CreateAtespace / GetAtespace / ListAtespaces / DeleteAtespace.CreateActornow requires the atespace to exist first (FailedPreconditionotherwise);DeleteAtespaceonly removes an empty atespace.kubectl-ate:--atespace/-aon every actor subcommand (create/get/delete/resume/suspend/pause/logs).-A/--all-atespacesto list across all tenants.ATESPACEcolumn in the actor table, the existing namespace column is renamedTEMPLATE NSto disambiguate it from the atespace.Examples
Scope a listing to one tenant (-a is shorthand for --atespace):
Atespace Creation
Actor Creation
Get by atespaces
Resume & Suspend
Deletion
Scope / non-goals
Deferred to later atespace increments (intentionally not in this PR): cascade delete of a non-empty atespace (soft-delete + finalizers; for now DeleteAtespace only removes an empty atespace), DNS names, snapshot paths, template grants, the worker's own (system) atespace, and quota.