Owner-directed strict-bug-override: GitHub issues in this repo are normally restricted to strict, reproducible bugs (apps/openagents.com/AGENTS.md). This issue is filed under explicit owner direction. Discussion and commentary belong in the Product Promises Forum.
Why
The continuous stress harness, external-wins admission, reliability hardening, and throughput tuning all need a continuous operator. The owner wants Artanis to be that autonomous overseer: watching fleet health, throughput, and external demand; orchestrating the stress load (start, scale, back off); triggering heal, scale, or quarantine within approval-gated authority; and reporting. This is an extension of the existing artanis-administrator-tick and artanis-approval-gates, not a new system. Today Artanis runs a bounded, live autonomous loop fenced by typed schemas and approval gates, but it does not read GPU, replica, throughput, or demand data; the GPU sensor (glm-pool-heartbeat.ts) runs as a separate tick. This issue wires Artanis onto that sensor as the orchestrator.
Scope
New artanis-fleet-overseer-tick.ts exporting runArtanisFleetOverseerTick(db, deps) plus a ...Scheduled Effect wrapper, mirroring runArtanisAdminTick/runArtanisAdminTickScheduled exactly: env-gated by ARTANIS_FLEET_OVERSEER_ENABLED, a self-bounded cadence, every outcome written as a D1 row in artanis_fleet_overseer_decisions, and a schema-invalid proposal recorded as blocked. It is registered as one new observedEffect('ArtanisFleet.tick', ...) in the Worker's scheduled-handler Promise.all, beside ArtanisAdmin.tick.
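The tick's control flow can be sketched as below, assuming a synchronous in-memory store in place of D1 and hypothetical deps (now, minIntervalMs, propose, validate, dispatch). The real runArtanisAdminTick signature, its Effect wrapper, and the columns of artanis_fleet_overseer_decisions are not given here, so every name except ARTANIS_FLEET_OVERSEER_ENABLED is illustrative, as is treating the string "true" as enabled.

```typescript
// Sketch only: DecisionStore stands in for the D1 table
// artanis_fleet_overseer_decisions; all field and dep names are hypothetical.
type Outcome = "disabled" | "skipped_cadence" | "blocked" | "dispatched";

interface DecisionRow {
  tickAt: number; // epoch ms when the tick ran
  outcome: Outcome;
  detail: string;
}

interface DecisionStore {
  insert(row: DecisionRow): void;
  // Time of the last tick that got past the env and cadence gates.
  lastEvaluatedAt(): number | undefined;
}

class MemoryDecisionStore implements DecisionStore {
  readonly rows: DecisionRow[] = [];
  insert(row: DecisionRow): void {
    this.rows.push(row);
  }
  lastEvaluatedAt(): number | undefined {
    for (let i = this.rows.length - 1; i >= 0; i--) {
      const row = this.rows[i];
      if (row && (row.outcome === "blocked" || row.outcome === "dispatched")) {
        return row.tickAt;
      }
    }
    return undefined;
  }
}

interface OverseerDeps {
  env: Record<string, string | undefined>;
  now: () => number;
  minIntervalMs: number; // self-bounded cadence
  propose: () => unknown; // untrusted mind output
  validate: (raw: unknown) => string | null; // action kind, or null if schema-invalid
  dispatch: (actionKind: string) => void;
}

function runFleetOverseerTickSketch(db: DecisionStore, deps: OverseerDeps): DecisionRow {
  const tickAt = deps.now();
  // Every outcome, including no-ops, is persisted before returning.
  const record = (outcome: Outcome, detail: string): DecisionRow => {
    const row: DecisionRow = { tickAt, outcome, detail };
    db.insert(row);
    return row;
  };
  if (deps.env["ARTANIS_FLEET_OVERSEER_ENABLED"] !== "true") {
    return record("disabled", "env gate off");
  }
  const last = db.lastEvaluatedAt();
  if (last !== undefined && tickAt - last < deps.minIntervalMs) {
    return record("skipped_cadence", "inside minimum interval");
  }
  const actionKind = deps.validate(deps.propose());
  if (actionKind === null) {
    // Schema-invalid proposal: write a blocked row and dispatch nothing.
    return record("blocked", "proposal failed schema validation");
  }
  deps.dispatch(actionKind);
  return record("dispatched", actionKind);
}
```

Measuring the cadence from the last evaluated tick (not the last row) keeps skipped ticks from pushing the next evaluation out indefinitely. Whether a disabled tick should write a row at all is a choice the real implementation would settle; this sketch records it.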
Watch (assembleContext): read the existing glmPoolHeartbeatRoutingStateOracle (per-replica health/warm/draining state plus the new live-headroom read surface), aggregate throughput/goodput from telemetry, and read external demand from the token-usage ledger. Do NOT build a new fleet collector; the GLM heartbeat stays the sensor.
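A minimal sketch of the Watch step's aggregation, assuming hypothetical shapes for the heartbeat oracle's per-replica state, telemetry samples, and token-usage ledger rows (none of these field names come from glm-pool-heartbeat.ts). It treats goodput as tokens from successful requests, and routable capacity as healthy, warm, non-draining replicas only.

```typescript
// Sketch only: these shapes are hypothetical stand-ins for the heartbeat
// oracle's per-replica state, telemetry samples, and token-usage ledger rows.
interface ReplicaState {
  replicaId: string;
  healthy: boolean;
  warm: boolean;
  draining: boolean;
  headroomTokensPerSec: number; // assumed shape of the live-headroom read
}

interface TelemetrySample {
  atMs: number;
  tokens: number;
  succeeded: boolean;
}

interface LedgerEntry {
  atMs: number;
  origin: "external" | "internal";
  tokens: number;
}

interface FleetContext {
  routableReplicas: number;
  headroomTokensPerSec: number;
  throughputTokensPerSec: number; // all tokens served in the window
  goodputTokensPerSec: number; // tokens from successful requests only
  externalDemandTokensPerSec: number;
}

function assembleFleetContextSketch(
  replicas: readonly ReplicaState[],
  samples: readonly TelemetrySample[],
  ledger: readonly LedgerEntry[],
  nowMs: number,
  windowMs: number,
): FleetContext {
  const windowSec = windowMs / 1000;
  const inWindow = (atMs: number): boolean => atMs > nowMs - windowMs && atMs <= nowMs;
  // Only healthy, warm, non-draining replicas count as routable capacity.
  const routable = replicas.filter((r) => r.healthy && r.warm && !r.draining);
  let served = 0;
  let good = 0;
  for (const s of samples) {
    if (!inWindow(s.atMs)) continue;
    served += s.tokens;
    if (s.succeeded) good += s.tokens;
  }
  let external = 0;
  for (const e of ledger) {
    if (inWindow(e.atMs) && e.origin === "external") external += e.tokens;
  }
  return {
    routableReplicas: routable.length,
    headroomTokensPerSec: routable.reduce((sum, r) => sum + r.headroomTokensPerSec, 0),
    throughputTokensPerSec: served / windowSec,
    goodputTokensPerSec: good / windowSec,
    externalDemandTokensPerSec: external / windowSec,
  };
}
```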
Decide: reuse artanisMindComplete with a bounded S.Union action vocabulary validated by a typed schema.
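Effect Schema's S.Union would express this vocabulary directly; the sketch below swaps in a hand-written decoder so it runs without dependencies. The action names and bounds (MAX_STRESS_RPS, MAX_SCALE_OUT_REPLICAS, the replicaId pattern) are hypothetical. The point it illustrates is that anything outside the closed vocabulary, or outside its bounds, decodes to null, which the tick would record as blocked.

```typescript
// Hypothetical closed action vocabulary; the real one would be an S.Union.
type FleetAction =
  | { kind: "noop" }
  | { kind: "stress_set_rate"; targetRps: number }
  | { kind: "stress_back_off" }
  | { kind: "readmit_replica"; replicaId: string }
  | { kind: "request_scale_out"; replicas: number }
  | { kind: "quarantine_replica"; replicaId: string };

const MAX_STRESS_RPS = 50; // hypothetical bound
const MAX_SCALE_OUT_REPLICAS = 4; // hypothetical bound
const REPLICA_ID = /^[a-z0-9-]{1,64}$/; // hypothetical id format

function isIntIn(v: unknown, lo: number, hi: number): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= lo && v <= hi;
}

function isReplicaId(v: unknown): v is string {
  return typeof v === "string" && REPLICA_ID.test(v);
}

// Returns a fresh object carrying only known fields, or null if invalid.
function decodeFleetAction(raw: unknown): FleetAction | null {
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const kind = o["kind"];
  const rps = o["targetRps"];
  const replicas = o["replicas"];
  const replicaId = o["replicaId"];
  switch (kind) {
    case "noop":
      return { kind: "noop" };
    case "stress_back_off":
      return { kind: "stress_back_off" };
    case "stress_set_rate":
      return isIntIn(rps, 0, MAX_STRESS_RPS) ? { kind: "stress_set_rate", targetRps: rps } : null;
    case "readmit_replica":
      return isReplicaId(replicaId) ? { kind: "readmit_replica", replicaId } : null;
    case "request_scale_out":
      return isIntIn(replicas, 1, MAX_SCALE_OUT_REPLICAS)
        ? { kind: "request_scale_out", replicas }
        : null;
    case "quarantine_replica":
      return isReplicaId(replicaId) ? { kind: "quarantine_replica", replicaId } : null;
    default:
      return null; // unknown kinds (e.g. anything wallet-related) are rejected
  }
}
```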
Act (authority split):
Autonomous (no spend, non-destructive): start, scale, or back off the internal stress load (yielding to external demand via the same external-wins signal); re-admit recovered or warm, idle owned replicas; emit public-safe health/throughput reports.
Owner-gated (creates a pending ArtanisApprovalGateRecord that takes effect only with operator approval + receipt): request paid scale-out (reuse provider_call/deployment); quarantine a replica (add a new fleet_mutation risky kind to ArtanisRiskyActionKind, ARTANIS_RISKY_ACTION_KINDS, and rollbackRequiredKinds, with a test and an INVARIANTS.md update in the same change); any wallet spend/settlement (already gated; never widened).
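The authority split can be sketched as a lookup from action to risky kind: anything unmapped runs autonomously, anything mapped waits for an effective gate. The gate shape, the action names, and gateEffective are hypothetical stand-ins for ArtanisApprovalGateRecord and artanisApprovalGateEffective, and mapping paid scale-out to provider_call (rather than deployment) is an assumption. Wallet spend is deliberately absent from the vocabulary, so it cannot be reached from this path.

```typescript
// Sketch only: the risky kinds mirror names in this issue (provider_call,
// deployment, fleet_mutation); the gate record shape is hypothetical.
type ActionKind =
  | "stress_set_rate"
  | "stress_back_off"
  | "readmit_replica"
  | "report"
  | "request_scale_out"
  | "quarantine_replica";

type RiskyKind = "provider_call" | "deployment" | "fleet_mutation";

interface ApprovalGate {
  actionKind: ActionKind;
  riskyKind: RiskyKind;
  status: "pending" | "approved" | "rejected";
  receiptId?: string;
}

// Actions absent from this map are autonomous (no spend, non-destructive).
// Mapping paid scale-out to provider_call rather than deployment is assumed.
const OWNER_GATED: Partial<Record<ActionKind, RiskyKind>> = {
  request_scale_out: "provider_call",
  quarantine_replica: "fleet_mutation",
};

// Effective only with operator approval AND a receipt.
function gateEffective(gate: ApprovalGate): boolean {
  return gate.status === "approved" && (gate.receiptId ?? "").length > 0;
}

type ActResult =
  | { kind: "executed" }
  | { kind: "awaiting_approval"; gate: ApprovalGate };

function actSketch(
  action: ActionKind,
  gate: ApprovalGate | undefined,
  execute: (action: ActionKind) => void,
): ActResult {
  const risky = OWNER_GATED[action];
  if (risky === undefined) {
    execute(action);
    return { kind: "executed" };
  }
  // A gate only counts if it was issued for this exact action.
  const matching = gate !== undefined && gate.actionKind === action ? gate : undefined;
  if (matching !== undefined && gateEffective(matching)) {
    execute(action);
    return { kind: "executed" };
  }
  // No effective gate: surface (or create) a pending record and execute nothing.
  return {
    kind: "awaiting_approval",
    gate: matching ?? { actionKind: action, riskyKind: risky, status: "pending" },
  };
}
```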
Report: fold a public-safe fleet health/throughput/goodput signal into the existing artanis-health.ts snapshot (as a new ArtanisHealthSignalKind) so that stale or blocked fleet health structurally blocks overclaiming.
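A minimal sketch of a public-safe projection plus the overclaim check, assuming hypothetical field names and a three-state freshness model; ArtanisHealthSignalKind's actual members are not shown in this issue. The projection copies an allowlist of numeric fields instead of spreading the raw observation, so origin URLs and bearer material cannot leak by accident.

```typescript
// Sketch only: field names and the fresh/stale/blocked states are
// hypothetical; ArtanisHealthSignalKind's real members are not shown here.
interface RawFleetObservation {
  observedAtMs: number;
  replicaOrigins: string[]; // raw origin URLs: never projected
  bearerToken: string; // secret material (for illustration): never projected
  routableReplicas: number;
  goodputTokensPerSec: number;
  blocked: boolean; // e.g. the heartbeat reported an error
}

interface PublicFleetSignal {
  kind: "fleet_throughput";
  state: "fresh" | "stale" | "blocked";
  observedAtMs: number;
  routableReplicas: number;
  goodputTokensPerSec: number;
}

// Allowlist projection: copy named numeric fields, never spread the raw object.
function projectFleetSignal(
  raw: RawFleetObservation,
  nowMs: number,
  maxAgeMs: number,
): PublicFleetSignal {
  const state: PublicFleetSignal["state"] = raw.blocked
    ? "blocked"
    : nowMs - raw.observedAtMs > maxAgeMs
      ? "stale"
      : "fresh";
  return {
    kind: "fleet_throughput",
    state,
    observedAtMs: raw.observedAtMs,
    routableReplicas: raw.routableReplicas,
    goodputTokensPerSec: Math.round(raw.goodputTokensPerSec),
  };
}

// A stale or blocked signal structurally forbids claiming the fleet is healthy.
function mayClaimFleetHealthy(signal: PublicFleetSignal): boolean {
  return signal.state === "fresh" && signal.routableReplicas > 0;
}
```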
Acceptance (measurable)
The overseer tick runs on the scheduled handler, env-gated, self-bounded, with every outcome a D1 row; a schema-invalid mind proposal yields a blocked row and dispatches nothing.
Artanis autonomously starts, scales, and backs off the stress harness keyed on live external demand, with 0 external-request failures during a demand spike.
A replica-quarantine or paid-scale-out proposal emits a pending approval gate and does NOT execute until artanisApprovalGateEffective (operator approval + receipt); wallet spend stays gated.
A fleet health/throughput signal appears in the Artanis health snapshot and a stale/blocked signal blocks overclaiming.
Public-safe: no raw origin URLs, IPs, bearer material, prompts, or wallet material in any projection.
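The demand-keyed stress criterion can be illustrated with a small rate controller, assuming a hypothetical total-capacity input in tokens per second and made-up constants. External demand is subtracted first, the stress rate drops in one step when the ceiling falls, and it grows only in small increments. This is one possible external-wins policy, not the harness's actual algorithm, and on its own it does not guarantee zero external failures.

```typescript
// Sketch only: the constants and the capacity input are hypothetical; the
// real external-wins signal and live-headroom surface may differ.
interface StressInputs {
  capacityTokensPerSec: number; // assumed total routable capacity
  externalDemandTokensPerSec: number; // from the token-usage ledger
  tokensPerStressRequest: number;
  currentRps: number;
}

type StressDecision = {
  kind: "start" | "scale" | "back_off" | "hold";
  targetRps: number;
};

const MAX_STRESS_RPS = 50;
const RESERVE_FRACTION = 0.2; // capacity held back for external bursts
const MAX_STEP_UP_RPS = 5; // ramp up gradually; back off in one step

function decideStressRate(i: StressInputs): StressDecision {
  if (i.tokensPerStressRequest <= 0) return { kind: "back_off", targetRps: 0 }; // fail closed
  // External demand is served first; stress only gets what is left over.
  const spare = i.capacityTokensPerSec * (1 - RESERVE_FRACTION) - i.externalDemandTokensPerSec;
  const ceiling = Math.max(0, Math.min(MAX_STRESS_RPS, Math.floor(spare / i.tokensPerStressRequest)));
  if (ceiling < i.currentRps) return { kind: "back_off", targetRps: ceiling };
  const target = Math.min(ceiling, i.currentRps + MAX_STEP_UP_RPS);
  if (target === i.currentRps) return { kind: "hold", targetRps: target };
  return { kind: i.currentRps === 0 ? "start" : "scale", targetRps: target };
}
```

With 10,000 tokens/s of capacity, 100 tokens per stress request, and 1,000 tokens/s of external demand, an idle harness starts at 5 rps and climbs by 5 per tick up to the 50 rps cap; if external demand jumps to 9,000 tokens/s, the next decision backs off to 0.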
Child of #6316.
Refs
docs/inference/2026-06-25-glm-fleet-max-throughput-stress-and-artanis-overseer.md (§5)
apps/openagents.com/workers/api/src/artanis-administrator-tick.ts, artanis-approval-gates.ts, artanis-scheduled-runner.ts, artanis-health.ts, artanis-spend.ts, artanis-mind.ts, inference/glm-pool-heartbeat.ts

Strict-bug-override: filed as direction work under owner mandate, not a strict bug.