Artanis fleet-overseer automation: autonomous control loop orchestrating stress + healing + scaling + external-yield (on artanis-administrator-tick, approval-gated) #6321

> **Owner-directed strict-bug-override:** GitHub issues in this repo are normally strict, reproducible bugs only (`apps/openagents.com/AGENTS.md`). This issue is filed under explicit owner direction. Discussion/commentary belongs in the Product Promises Forum.

Child of #6316.

## Why

The continuous stress harness, external-wins admission, reliability hardening, and throughput tuning all need a continuous operator. The owner wants **Artanis** to be that autonomous overseer: watching fleet health, throughput, and external demand; orchestrating the stress load (start/scale/back-off); triggering heal/scale/quarantine within approval-gated authority; and reporting. This is an **extension of the existing `artanis-administrator-tick` + `artanis-approval-gates`**, not a new system. Today Artanis runs a bounded live autonomous loop fenced by typed schemas and approval gates, but it does **not** read GPU, replica, throughput, or demand state; the GPU sensor (`glm-pool-heartbeat.ts`) runs as a separate tick. This issue wires Artanis onto that sensor as the orchestrator.

## Scope

- **New `artanis-fleet-overseer-tick.ts`** exporting `runArtanisFleetOverseerTick(db, deps)` + a `...Scheduled` Effect wrapper, mirroring `runArtanisAdminTick`/`runArtanisAdminTickScheduled` exactly: env-gated by `ARTANIS_FLEET_OVERSEER_ENABLED`, self-bounded cadence, every outcome a D1 row in `artanis_fleet_overseer_decisions`, schema-invalid -> `blocked`. Registered as one new `observedEffect('ArtanisFleet.tick', ...)` in the Worker `scheduled` `Promise.all` beside `ArtanisAdmin.tick`.
- **Watch (assembleContext):** read the existing `glmPoolHeartbeatRoutingStateOracle` (per-replica health/warm/draining + the new live-headroom read surface), aggregate throughput/goodput from telemetry, and read external demand from the token-usage ledger. Do NOT build a new fleet collector -- the GLM heartbeat stays the sensor.
- **Decide:** reuse `artanisMindComplete` with a bounded `S.Union` action vocabulary validated by a typed schema.
- **Act -- authority split:**
  - **Autonomous (no spend, non-destructive):** start/scale/back-off the internal stress load (yield to external demand using the same external-wins signal); re-admit recovered / warm idle owned replicas; emit public-safe health/throughput reports.
  - **Owner-gated (pending `ArtanisApprovalGateRecord`, effective only with operator approval + receipt):** request paid scale-out (reuse `provider_call`/`deployment`); quarantine a replica (add a new `fleet_mutation` risky kind to `ArtanisRiskyActionKind` + `ARTANIS_RISKY_ACTION_KINDS` + `rollbackRequiredKinds`, with test + `INVARIANTS.md` update in the same change); any wallet spend/settlement (already gated, never widened).
- **Report:** fold a public-safe fleet health/throughput/goodput signal into the existing `artanis-health.ts` snapshot (new `ArtanisHealthSignalKind`) so stale/blocked fleet health structurally blocks overclaiming.
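A minimal sketch of the decide/act authority split described above, in plain TypeScript so it runs standalone. Everything named here (`OverseerAction`, `decodeAction`, `routeAction`, the action kinds, and the choice of `provider_call` over `deployment` for paid scale-out) is a hypothetical illustration, not the existing Artanis API; the real vocabulary would be the `S.Union` schema and the real gate would be an `ArtanisApprovalGateRecord`.

```typescript
// Hypothetical bounded action vocabulary (stand-in for the typed S.Union schema).
type OverseerAction =
  | { kind: "stress_scale"; targetConcurrency: number }
  | { kind: "stress_backoff" }
  | { kind: "readmit_replica"; replicaRef: string }
  | { kind: "quarantine_replica"; replicaRef: string }
  | { kind: "request_paid_scale_out"; replicas: number };

// Assumption: quarantine maps to the new `fleet_mutation` kind, paid scale-out to `provider_call`.
type RiskyKind = "fleet_mutation" | "provider_call";

type Outcome =
  | { status: "blocked"; reason: string }
  | { status: "dispatched"; action: OverseerAction }
  | { status: "pending_approval"; riskyKind: RiskyKind; action: OverseerAction };

const isNonNegInt = (v: unknown): v is number =>
  typeof v === "number" && Number.isInteger(v) && v >= 0;
const isRef = (v: unknown): v is string => typeof v === "string" && v.length > 0;

// Decode an untrusted mind proposal; anything outside the vocabulary returns null.
function decodeAction(raw: unknown): OverseerAction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const kind = r["kind"];
  if (kind === "stress_backoff") return { kind: "stress_backoff" };
  if (kind === "stress_scale") {
    const n = r["targetConcurrency"];
    return isNonNegInt(n) ? { kind: "stress_scale", targetConcurrency: n } : null;
  }
  if (kind === "readmit_replica") {
    const ref = r["replicaRef"];
    return isRef(ref) ? { kind: "readmit_replica", replicaRef: ref } : null;
  }
  if (kind === "quarantine_replica") {
    const ref = r["replicaRef"];
    return isRef(ref) ? { kind: "quarantine_replica", replicaRef: ref } : null;
  }
  if (kind === "request_paid_scale_out") {
    const n = r["replicas"];
    return isNonNegInt(n) && n > 0 ? { kind: "request_paid_scale_out", replicas: n } : null;
  }
  return null;
}

// Route a proposal: invalid -> blocked, risky -> pending gate, the rest -> autonomous dispatch.
function routeAction(raw: unknown): Outcome {
  const action = decodeAction(raw);
  if (action === null) return { status: "blocked", reason: "schema_invalid" };
  switch (action.kind) {
    case "quarantine_replica":
      return { status: "pending_approval", riskyKind: "fleet_mutation", action };
    case "request_paid_scale_out":
      return { status: "pending_approval", riskyKind: "provider_call", action };
    default:
      return { status: "dispatched", action };
  }
}
```

In this sketch every return value of `routeAction` would correspond to one row in `artanis_fleet_overseer_decisions`; only the `dispatched` branch touches the fleet without an operator, and an unknown or malformed proposal never reaches a dispatcher.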

## Acceptance (measurable)

- The overseer tick runs on the scheduled handler, env-gated, self-bounded, with every outcome a D1 row; a schema-invalid mind proposal yields a `blocked` row and dispatches nothing.
- The overseer autonomously starts, scales, and backs off the stress harness keyed on live external demand, with 0 external-request failures during a demand spike.
- A replica-quarantine or paid-scale-out proposal emits a pending approval gate and does NOT execute until `artanisApprovalGateEffective` (operator approval + receipt); wallet spend stays gated.
- A fleet health/throughput signal appears in the Artanis health snapshot and a stale/blocked signal blocks overclaiming.
- Public-safe: no raw origin URLs, IPs, bearer material, prompts, or wallet material in any projection.
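The "external wins" criterion above could be checked against a pure policy function like the following sketch. The names (`FleetSnapshot`, `stressTarget`, `reserveFraction`) and the reserve-then-subtract rule are assumptions for illustration; the issue does not specify the actual back-off formula or where these counters come from.

```typescript
// Hypothetical aggregate read from the heartbeat oracle and the token-usage ledger.
interface FleetSnapshot {
  healthyReplicas: number;    // replicas currently healthy and warm
  perReplicaCapacity: number; // assumed concurrent requests one replica can serve
  externalInFlight: number;   // live external demand
}

// Stress concurrency the overseer may run: total capacity, minus a reserved
// headroom fraction, minus external demand. Never negative, so a demand
// spike drives the stress load to zero rather than competing with it.
function stressTarget(s: FleetSnapshot, reserveFraction = 0.25): number {
  const capacity = Math.max(0, s.healthyReplicas) * Math.max(0, s.perReplicaCapacity);
  const reserve = Math.ceil(capacity * reserveFraction);
  const external = Math.max(0, s.externalInFlight);
  return Math.max(0, capacity - reserve - external);
}

const idle = stressTarget({ healthyReplicas: 4, perReplicaCapacity: 8, externalInFlight: 0 });
const busy = stressTarget({ healthyReplicas: 4, perReplicaCapacity: 8, externalInFlight: 10 });
const spike = stressTarget({ healthyReplicas: 4, perReplicaCapacity: 8, externalInFlight: 40 });
console.log(idle, busy, spike); // prints 24 14 0
```

Keeping the policy a pure function of the snapshot makes the acceptance check easy to unit-test: replay a recorded demand spike and assert the target reaches 0 before external requests would exceed remaining capacity.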

## Refs

- `docs/inference/2026-06-25-glm-fleet-max-throughput-stress-and-artanis-overseer.md` (§5)
- `apps/openagents.com/workers/api/src/artanis-administrator-tick.ts`, `artanis-approval-gates.ts`, `artanis-scheduled-runner.ts`, `artanis-health.ts`, `artanis-spend.ts`, `artanis-mind.ts`, `inference/glm-pool-heartbeat.ts`
- The continuous stress harness, external-wins admission, and reliability hardening issues; #6316

Strict-bug-override: filed as direction work under owner mandate, not a strict bug.

