Colocate new public-plane tenants: onboarding region picker + plane->bucket storage mappings #3015

Description

@SeanWhelan

New tenants currently get gs://estuary-trial (GCS) as their default storage mapping while landing on aws-us-east-1-c1, so every new public-plane signup is cross-cloud from day one. Fragments are written AWS->GCS and read back GCS->AWS, and the resulting egress and request costs grow with every signup (cost detail tracked internally in estuary/sre#29). This issue is the activation half of stopping that growth: once sre#29 has created a colocated trial bucket per public data plane, new tenants are pointed at the bucket matching their plane.

This only touches new tenants. Existing tenants on gs://estuary-trial are a separate and much harder migration workstream, deliberately out of scope here.

Where things stand today

  • Provisioning happens in provision_tenant (crates/control-plane-api/src/directives/beta_onboard.rs), invoked by the betaOnboard directive handler in crates/agent/src/directives/beta_onboard.rs. It inserts two storage_mappings rows: tenant/ (stores: estuary-trial with prefix: collection-data/, plus a data_planes array) and recovery/tenant/ (stores only, no data_planes array).
  • The "default" data plane is just the first element of the mapping's data_planes array (crates/validation/src/lib.rs, walk_prefix). The public_planes CTE in provision_tenant forces ops/dp/public/aws-us-east-1-c1 first and excludes the two deprecated cronut planes.
  • There is no signup-time plane selection today. betaOnboard claims are requestedTenant + survey with deny_unknown_fields; the UI submits them from src/directives/shared.ts in estuary/ui.
  • data_planes.config (json) is parsed as stack::DataPlane (crates/data-plane-controller/src/shared/stack.rs) and already carries per-plane data_buckets: Vec<url::Url>. Today that field only drives IAM provisioning during data-plane provisioning; nothing reads it to choose a trial bucket.
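To make the current ordering rule concrete, here is a minimal std-only sketch of "default plane = first element of data_planes". It is not the actual SQL CTE or walk_prefix code. The function names and the deprecated-plane argument are hypothetical; only the ops/dp/public/ prefix and the aws-us-east-1-c1 plane name come from the description above.

```rust
/// The plane that is forced first today, per the description above.
const DEFAULT_PLANE: &str = "ops/dp/public/aws-us-east-1-c1";

/// Hypothetical mirror of the `public_planes` ordering: keep only public
/// planes, drop the deprecated ones, and put the default plane first.
fn order_public_planes(all: &[&str], deprecated: &[&str]) -> Vec<String> {
    let mut planes: Vec<String> = all
        .iter()
        .filter(|p| p.starts_with("ops/dp/public/"))
        .filter(|p| !deprecated.contains(*p))
        .map(|p| (*p).to_owned())
        .collect();
    // `sort_by_key` is stable and `false < true`, so the default plane moves
    // to the front while the remaining planes keep their relative order.
    planes.sort_by_key(|p| p.as_str() != DEFAULT_PLANE);
    planes
}

/// The tenant's "default" plane is simply the first element of the array.
fn default_plane(data_planes: &[String]) -> Option<&str> {
    data_planes.first().map(String::as_str)
}
```

Because the default is purely positional, a region picker only has to change which plane ends up at index 0.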

plane->bucket reference

We landed on a configurable field on the data-plane config rather than a junction table. Given the above, the proposal is an explicit optional field like trial_bucket: Option<url::Url> on stack::DataPlane, rather than overloading data_buckets with an ordering convention. The model round-trips through the Pulumi stack config, so adding the field needs to be coordinated with the infrastructure repo.
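As an illustration of the explicit-field approach, the sketch below uses a simplified stand-in for stack::DataPlane. It assumes plain String in place of url::Url and omits serde and every other field of the real struct. The accessor name and the bucket URLs in the usage are hypothetical.

```rust
/// Simplified stand-in for `stack::DataPlane` (assumption: the real type uses
/// `url::Url`, derives serde traits, and carries many more fields).
#[derive(Debug, Clone, Default)]
struct DataPlane {
    /// Existing field: today it only drives IAM provisioning.
    data_buckets: Vec<String>,
    /// Proposed field (name not final): the colocated trial bucket, if any.
    trial_bucket: Option<String>,
}

impl DataPlane {
    /// Hypothetical accessor. Only the dedicated field counts; there is
    /// deliberately no fallback such as "first entry of `data_buckets`".
    fn colocated_trial_bucket(&self) -> Option<&str> {
        self.trial_bucket.as_deref()
    }
}
```

With an Option, a plane whose bucket sre#29 has not created yet is simply None, which provisioning can treat as "not eligible" instead of guessing from data_buckets.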

Work

  • Config: add trial_bucket (or whatever we name it) to stack::DataPlane and the corresponding infrastructure model; populate it for public planes as sre#29 creates buckets
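  • Claims/directive: extend betaOnboard claims with an optional requested-plane field and validate it against ops/dp/public/ planes. Because the claims use deny_unknown_fields, no existing client can already be sending a stray field of that name, but the agent must accept the new field before the UI starts submitting it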
  • Onboarding UI: region/plane picker in estuary/ui (src/directives/shared.ts, src/directives/Onboard/*). The UI lists public planes only; raw bucket names never reach the frontend. The choice sets the tenant's default plane, i.e. the first element of the data_planes array
  • provision_tenant: resolve the colocated bucket for the chosen plane (it already queries data_planes, so reading the config field there is a small change) and write both the tenant/ and recovery/tenant/ stores against it. No GCS fallback store is needed, since a brand-new tenant has never written to GCS and has nothing to read back
  • Backward compatibility: if no plane is supplied, keep aws-us-east-1-c1 first as today, but resolve it to its colocated bucket rather than gs://estuary-trial
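The work items above can be sketched end to end as a pure function: choose the plane (falling back to aws-us-east-1-c1), check that it is public, resolve its colocated bucket, and plan both mappings. This is a rough std-only sketch, not the real provision_tenant, which runs SQL against data_planes. All type and function names, the error variants, and the plane-to-bucket map are assumptions. The collection-data/ store prefix is left out for brevity.

```rust
use std::collections::BTreeMap;

const PUBLIC_PREFIX: &str = "ops/dp/public/";
const DEFAULT_PLANE: &str = "ops/dp/public/aws-us-east-1-c1";

#[derive(Debug, PartialEq)]
enum ProvisionError {
    NotPublic(String),
    UnknownPlane(String),
    NoTrialBucket(String),
}

/// Hypothetical, flattened view of one `storage_mappings` row.
#[derive(Debug, PartialEq)]
struct StorageMapping {
    catalog_prefix: String,
    store: String,
    /// `None` for the recovery mapping, which has no `data_planes` array.
    data_planes: Option<Vec<String>>,
}

/// `trial_buckets` maps plane name -> configured trial bucket (if any).
fn plan_mappings(
    tenant: &str,
    requested_plane: Option<&str>,
    trial_buckets: &BTreeMap<String, Option<String>>,
) -> Result<Vec<StorageMapping>, ProvisionError> {
    // Backward compatibility: no requested plane means the current default.
    let chosen = requested_plane.unwrap_or(DEFAULT_PLANE);
    if !chosen.starts_with(PUBLIC_PREFIX) {
        return Err(ProvisionError::NotPublic(chosen.to_string()));
    }
    let bucket = trial_buckets
        .get(chosen)
        .ok_or_else(|| ProvisionError::UnknownPlane(chosen.to_string()))?
        .clone()
        .ok_or_else(|| ProvisionError::NoTrialBucket(chosen.to_string()))?;

    // The chosen plane goes first (it becomes the default); the other
    // public planes follow in map order.
    let mut data_planes = vec![chosen.to_string()];
    data_planes.extend(
        trial_buckets
            .keys()
            .filter(|p| p.as_str() != chosen && p.starts_with(PUBLIC_PREFIX))
            .cloned(),
    );

    // Both mappings point at the colocated bucket; no GCS fallback store.
    Ok(vec![
        StorageMapping {
            catalog_prefix: format!("{tenant}/"),
            store: bucket.clone(),
            data_planes: Some(data_planes),
        },
        StorageMapping {
            catalog_prefix: format!("recovery/{tenant}/"),
            store: bucket,
            data_planes: None,
        },
    ])
}
```

Failing with NoTrialBucket, rather than silently falling back to gs://estuary-trial, is one possible policy. It keeps the exit criterion checkable, but the issue does not say what should happen for a public plane that has no bucket yet.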

Dependencies

  • estuary/sre#29 (internal): colocated buckets and policies need to exist for the public planes, including bucket policy sign-off

Exit criteria

A new public-plane signup is colocated (compute and storage in the same cloud/region) from day one, with no GCS store in its mappings.

Caveat to note

The picker sets only the tenant's default plane. Tenants are not locked to one plane and can create tasks on other planes later. That's fine here because a new tenant starts clean, but it's the core reason migrating existing tenants is hard, and why that's split out.
