Releases: wmcmahan/cycgraph
@cycgraph/orchestrator@0.2.0
Minor Changes
- 131e3d3: Architecture & API hygiene (Phase 6): tighten the public surface and close a status-resurrection hole.
  Status-transition guard (correctness). A shared guard now governs every status write (both the public set_status reducer and the internal lifecycle reducer). A run that has reached a terminal state (completed, failed, cancelled, timeout) can no longer be moved back to an active status. Previously a stray set_status, or a replayed _init on a recovered run, could flip failed→running and resurrect a dead run. Terminal→terminal transitions remain allowed for saga rollback (failed/timeout→cancelled). New exports: canTransitionStatus, isTerminalStatus, TERMINAL_STATUSES.
  Node-type executor registry. The 12-case dispatch switch in GraphRunner is replaced by a Record<NodeType, NodeExecutor> registry (runner/node-executors/registry.ts). Adding a node type is now a single registration that the compiler enforces is exhaustive, instead of shotgun edits across the runner. New exports: NODE_EXECUTORS, SUPPORTED_NODE_TYPES, getNodeExecutor, and the NodeExecutor type.
  Public API hygiene (BREAKING). Engine internals that were leaking through the root entry point are moved behind a new @cycgraph/orchestrator/internal subpath: internalReducer, StreamChannel, the filtrex condition internals (FILTREX_EXTRA_FUNCTIONS, FILTREX_COMPILE_OPTIONS, normalizeConditionExpression), and the low-level calculateBackoff/sleep helpers. They are no longer part of the semver contract; import them from @cycgraph/orchestrator/internal if you genuinely need them (first-party tooling only). The public condition evaluator evaluateCondition stays on the root. Wildcard export * of the reducers/helpers/conditions barrels is replaced with explicit named exports so the public surface is auditable.
  Dropped the phantom @cycgraph/context-engine peerDependency. The orchestrator integrates the context engine purely via an injected function type (ContextCompressor) and never imports the package, so the (optional) peer dependency was noise. Removed.
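The status-transition rule described above can be illustrated with a small sketch. This is not the library's canTransitionStatus: the function names, the active status names other than running, and the choice to allow every terminal→terminal move are assumptions made for illustration.

```typescript
// Illustrative sketch only: status names other than those quoted in the notes
// are assumptions, and the library's real guard may apply stricter rules.
type RunStatus =
  | 'pending' | 'running' | 'waiting'                  // assumed active statuses
  | 'completed' | 'failed' | 'cancelled' | 'timeout';  // terminal statuses from the notes

const TERMINAL_SKETCH: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'completed', 'failed', 'cancelled', 'timeout',
]);

function isTerminalSketch(status: RunStatus): boolean {
  return TERMINAL_SKETCH.has(status);
}

function canTransitionSketch(from: RunStatus, to: RunStatus): boolean {
  // A run that is still active may move to any status.
  if (!isTerminalSketch(from)) return true;
  // From a terminal state only another terminal state is reachable
  // (for example failed -> cancelled during saga rollback).
  return isTerminalSketch(to);
}
```

A reducer built on such a guard would drop or reject a set_status write when the check returns false, which is what prevents a replayed action from moving a failed run back to running.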
- 131e3d3: Budget integrity (Phase 3): make every LLM call count toward budgets and stop runaway spend mid-loop.
  Supervisor spend is now tracked. Supervisor routing calls previously recorded NO token_usage on their handoff/completion actions, so every iteration's tokens were invisible to the token budget, cost budget, per-node budget, and usage records. On a 10-iteration loop that hid 100K–1M+ tokens. Handoff and completion actions now carry token_usage + model, so supervisor spend flows through the normal _track_tokens/_track_cost path.
  Supervisor prompt memory is byte-capped. The supervisor prompt embedded the full memory blob with no size limit, so a loop that re-reads memory every iteration grew ~quadratically. It now uses the same MAX_MEMORY_PROMPT_BYTES (50KB) cap as agent prompts.
  Composite nodes stop spending mid-loop. Per-node and workflow budgets were only checked AFTER a composite node's aggregated action returned, so an evolution node ran its entire population × generations before the cap was even consulted. A new between-iteration budget guard (checkCompositeBudget) lets evolution and annealing stop early once accumulated token/cost spend crosses the node's budget or the remaining workflow budget. Evolution surfaces a {nodeId}_budget_stopped flag.
  Failed-attempt LLM spend is counted. A node that retries N times previously counted only the successful attempt's tokens. The agent executor now attaches best-effort partialUsage to AgentExecutionError/AgentTimeoutError, and the runner dispatches _track_tokens/_track_cost for each failed attempt, so a max_retries: 3 node can no longer hide up to ~4× its visible spend.
  Parallel task timeouts actually abort the LLM call. Evolution/voting/map passed executeParallel a per-task timeout signal that the callers ignored, wiring only the workflow signal, so a task_timeout_ms left the underlying streamText running in the background, burning uncounted tokens. The callers now combine both signals (combineAbortSignals), so a task timeout cancels the LLM call.
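The between-iteration budget guard can be pictured roughly as follows. This is a minimal sketch, not checkCompositeBudget itself: the Spend and BudgetLimits shapes, the function names, and the fixed per-generation spend are all hypothetical.

```typescript
// Hypothetical shapes: the real budget types are not shown in the notes.
interface Spend { tokens: number; costUsd: number; }
interface BudgetLimits { maxTokens?: number; maxCostUsd?: number; }

// True once accumulated spend has reached any configured limit.
function budgetExceededSketch(spent: Spend, limits: BudgetLimits[]): boolean {
  return limits.some(l =>
    (l.maxTokens !== undefined && spent.tokens >= l.maxTokens) ||
    (l.maxCostUsd !== undefined && spent.costUsd >= l.maxCostUsd));
}

// Simulates a composite loop that consults the guard BEFORE each generation
// instead of only after the whole population x generations has run.
function runGenerationsSketch(
  maxGenerations: number,
  perGeneration: Spend,          // assumed constant spend per generation
  nodeBudget: BudgetLimits,
  workflowRemaining: BudgetLimits,
): { completed: number; spent: Spend; budgetStopped: boolean } {
  const spent: Spend = { tokens: 0, costUsd: 0 };
  let completed = 0;
  let budgetStopped = false;
  for (let g = 0; g < maxGenerations; g++) {
    if (budgetExceededSketch(spent, [nodeBudget, workflowRemaining])) {
      budgetStopped = true;      // analogous to the {nodeId}_budget_stopped flag
      break;
    }
    spent.tokens += perGeneration.tokens;
    spent.costUsd += perGeneration.costUsd;
    completed++;
  }
  return { completed, spent, budgetStopped };
}
```

With a 2,500-token node budget and 1,000 tokens per generation, this loop stops after three generations rather than running all ten; the overshoot is bounded by one iteration's spend.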
- 131e3d3: Durability hardening (Phase 1): make crash recovery, idempotency, and multi-worker execution actually safe.
  Deterministic replay. Reducers now derive every timestamp (started_at, updated_at, approval deadlines, history entries) from action.metadata.timestamp instead of new Date(), so event-log replay reconstructs byte-identical state. applyHumanResponse logs its resume_from_human action durably (resumed runs previously lost the human decision). workflow_started carries a REPLAY_VERSION stamp that recovery checks for reducer-semantics drift.
  State hydration. New hydrateWorkflowState() (barrel-exported) runs at every load boundary: it coerces jsonb date strings back to Date, applies state_schema_version migrations, and refuses snapshots from a newer engine. Fixes the bug where a recovered HITL workflow compared new Date() >= waiting_timeout_at against a string (always false), so approval timeouts never fired after recovery.
  Authoritative event log. Appends are awaited behind a flush barrier before each state snapshot commits (events can no longer silently lag the snapshot they anchor). Duplicate (run_id, sequence_id) appends are rejected with the new EventSequenceConflictError instead of being silently dropped (Postgres) or duplicated (in-memory); the two implementations now match. Recovery validates the log is gap-free (EventLogCorruptionError on a lost append) and the worker reconciles event-log replay against the latest snapshot, resuming from whichever reflects more progress.
  Unified idempotency. One key space (node_id:iteration) checked before execution; a node whose action was applied before a crash (post-reduce/pre-advance window, detected via the snapshot's new _last_event_sequence_id high-water mark) is skipped on resume instead of re-executed. MemoryWriter now receives an idempotency_key (run_id:node_id:iteration) so reflection facts stop duplicating in long-term memory on retry/recovery.
  Durable queue + run fencing. New DrizzleWorkflowQueue (migration 0014, workflow_jobs table) with FOR UPDATE SKIP LOCKED atomic claims. Every claim bumps a claim_epoch on the run; createFencedRunnerOptions(job) builds fenced persistence/event-log writers that reject stale-epoch writes with the new StaleClaimError, so a reclaimed worker can no longer clobber the new claimant (split-brain). The worker emits job:claim_lost and leaves the job untouched. worker.stop() now hard-cancels runners past the grace period before releasing jobs, and shutdown-interrupted jobs stay active for visibility-timeout reclaim. InMemoryWorkflowQueue mirrors the epoch semantics for parity.
  New barrel exports: hydrateWorkflowState, CURRENT_STATE_SCHEMA_VERSION, REPLAY_VERSION, EventSequenceConflictError, StaleClaimError. New Postgres exports: DrizzleWorkflowQueue, createFencedRunnerOptions, DrizzlePersistenceProviderOptions, RunClaim, DrizzleEventLogWriterOptions.
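The epoch-fencing idea behind claim_epoch can be shown with an in-memory sketch. The class and error names below are hypothetical stand-ins, not the DrizzleWorkflowQueue or StaleClaimError implementations; the sketch only assumes that each claim increments a per-run counter and that writes carrying an older counter are rejected.

```typescript
// Hypothetical error type standing in for the library's StaleClaimError.
class StaleClaimSketchError extends Error {
  constructor(runId: string, held: number, current: number) {
    super(`stale claim on ${runId}: writer holds epoch ${held}, current is ${current}`);
    this.name = 'StaleClaimSketchError';
  }
}

// In-memory stand-in for a fenced persistence writer.
class FencedSnapshotStore {
  private readonly epochs = new Map<string, number>();
  private readonly snapshots = new Map<string, string>();

  // Every claim bumps the run's epoch and hands the new value to the claimant.
  claim(runId: string): number {
    const next = (this.epochs.get(runId) ?? 0) + 1;
    this.epochs.set(runId, next);
    return next;
  }

  // A write is accepted only if the writer still holds the latest epoch.
  save(runId: string, epoch: number, snapshot: string): void {
    const current = this.epochs.get(runId) ?? 0;
    if (epoch !== current) throw new StaleClaimSketchError(runId, epoch, current);
    this.snapshots.set(runId, snapshot);
  }

  load(runId: string): string | undefined {
    return this.snapshots.get(runId);
  }
}
```

In a database-backed version the epoch comparison would have to happen atomically with the write (for example inside the same statement or transaction); the sketch skips that concern because it is single-threaded.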
- 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better.
  @cycgraph/orchestrator: lesson provenance. Retrieved memory facts can carry an id (MemoryRetrievalResult.facts[].id, optional and non-breaking). When present, the runner records which facts were injected into each node's prompt in an append-only memory._lesson_provenance registry (same replay-safe pattern as the taint registry; invisible to node StateViews). Voting and evolution forward provenance from every sub-agent, so losing candidates count as trials too. New exports: getInjectedFactIds(state), getLessonProvenance(state), getLessonProvenanceRegistry(memory), plus the LessonProvenanceEntry/LessonProvenanceRegistry types. Known v1 limitation: supervisor-node retrieval is not provenance-tracked.
  @cycgraph/memory: outcome ledger, retention gate, gated retrieval. New OutcomeLedger interface + InMemoryOutcomeLedger (recordOutcome({ run_id, score, fact_ids }), per-fact trial stats, leave-one-out baselines). New evaluateRetention(store, ledger, policy) promotes candidate-tagged lessons that lift outcomes past promote_margin (tag rewritten to verified), soft-evicts harmful ones (invalidated_by: 'eval-gate:harmful'), and retires no-lift candidates at max_trials, including ones deadlocked on an empty leave-one-out baseline. New retrieveGatedLessons(store, options) fills the prompt budget verified-first with candidate exploration slots, selected in-progress-first via the ledger, with a rest_after_trials bench phase so fully-trialled candidates create the absence runs their baseline needs.
  Runnable adversarial demo at packages/evals/examples/eval-gated-learning/: three deliberately poisoned lessons crater a run and the gate evicts all three on outcome evidence alone, two runs after injection.
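One way to read "lift over a leave-one-out baseline" is sketched below. It assumes the baseline for a fact is the mean score of runs that did not inject that fact; that reading, and the helper names, are assumptions rather than the InMemoryOutcomeLedger implementation. Only the { run_id, score, fact_ids } record shape comes from the notes.

```typescript
// Record shape taken from recordOutcome({ run_id, score, fact_ids }).
interface OutcomeSketch { run_id: string; score: number; fact_ids: string[]; }

const meanOf = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Lift = mean score of runs that used the fact minus mean score of runs that
// did not. Returns null when either side is empty: with no "absence" runs the
// baseline is undefined and the candidate cannot be judged yet.
function liftForFact(factId: string, outcomes: OutcomeSketch[]): number | null {
  const trials = outcomes.filter(o => o.fact_ids.includes(factId)).map(o => o.score);
  const baseline = outcomes.filter(o => !o.fact_ids.includes(factId)).map(o => o.score);
  if (trials.length === 0 || baseline.length === 0) return null;
  return meanOf(trials) - meanOf(baseline);
}

const ledgerRuns: OutcomeSketch[] = [
  { run_id: 'r1', score: 0.75, fact_ids: ['a'] },
  { run_id: 'r2', score: 1.0, fact_ids: ['a', 'b'] },
  { run_id: 'r3', score: 0.5, fact_ids: [] },
  { run_id: 'r4', score: 0.25, fact_ids: ['b'] },
];
```

Here fact a has a lift of 0.5 and fact b a lift of 0, so under a margin rule only a would be a promotion candidate. The null case is the "empty leave-one-out baseline" deadlock that rest_after_trials is meant to break by benching a candidate so that absence runs accumulate.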
- 027be81: Fix: evolution_config.elite_count is now actually implemented (it was a no-op).
  The schema and validator advertised elite_count and rejected elite_count >= population_size, but the executor never used it. Every generation was bred entirely from scratch, so the per-generation best fitness could dip when a noisy generation produced worse candidates than the last.
  Elitism now works as documented: the top elite_count candidates of each generation are carried forward unchanged into the next generation's pool, neither re-generated nor re-scored. Two consequences:
  - Monotonic fitness. The best-so-far re-enters every subsequent pool, so the next generation's best is always ≥ the current one. ${node}_fitness_history never dips. (Set elite_count: 0 to opt o...
@cycgraph/orchestrator-postgres@0.2.0
Minor Changes
- 131e3d3: Durability hardening (Phase 1): make crash recovery, idempotency, and multi-worker execution actually safe. (Full notes for this changeset appear under @cycgraph/orchestrator@0.2.0 above.)
- 131e3d3: Performance & scale (Phase 5): cut the cost of the hot paths and add the knobs to keep a long/large run bounded.
  Tag-filtered fact retrieval is now an index lookup, not a table scan. FactFilter gained a tags field; the hierarchical retriever pushes the reflection-loop's tag filter into the store instead of paging the whole table and filtering client-side. The Postgres store resolves it via tags ?| array[...] backed by a new GIN index on memory_facts.tags (migration 0015) and now applies a deterministic ORDER BY valid_from DESC, id so LIMIT/OFFSET pagination is stable. The in-memory store honors the same tags filter (insertion-ordered, already stable). Run 0015_add_memory_facts_tags_gin before relying on tag retrieval at scale; on a large live table prefer CREATE INDEX CONCURRENTLY out-of-band.
  Evolution scores candidates in parallel (bounded by the existing max_concurrency) instead of one evaluator call at a time, so a generation now takes ~one evaluation's wall-clock, not N. It also stores per-candidate fitness summaries in ${node}_population (index/fitness/reasoning) rather than every candidate's full output (the winner's full output already lives in ${node}_winner), shrinking state and every checkpoint.
  Memory retrieval is bounded and batched. extractSubgraph gained a max_entities cap (default DEFAULT_MAX_SUBGRAPH_ENTITIES = 500) so a dense graph can't expand the BFS frontier near-exponentially, and it batch-fetches visited entities (getEntities) instead of one round-trip each.
  Sanitize-after-truncate in prompt building. Injection-sanitization is now the last transformation before memory/retrieved-memory is embedded, applied to exactly the bytes that reach the prompt (and to compressor output, which is now also byte-capped). Closes the window where truncating after sanitizing could leave a partial boundary artifact, and stops wasting sanitization on bytes that get dropped.
  Delta tracker no longer loses patches on a failed persist. computeDelta advances its baseline optimistically but stashes the prior baseline; the persistence coordinator calls the new rollback() if the write throws, so the next delta diffs against the last durably persisted state (no lost changes, no skipped version numbers).
  Auto-compaction is on by default. GraphRunnerOptions.compaction_interval now defaults to DEFAULT_COMPACTION_INTERVAL = 1000 (was 0/disabled) when an eventLog is wired, so a long run can't grow the event log without bound. Compaction is recovery-safe (checkpoint + loadEventsAfter). Set compaction_interval: 0 to retain full history and compact manually. The snapshot-resume idempotency rebuild is now checkpoint-aware: it loads only the tail after the latest checkpoint instead of the entire event history.
  New RateLimiter port. Inject GraphRunnerOptions.rateLimiter to pace LLM calls inside a provider's budget. It is awaited before every agent/supervisor/evaluator call at a single chokepoint (the implementation may delay to throttle or throw to reject; abortable; propagated into subgraphs). New exports: RateLimiter, RateLimitRequest, RateLimitCallKind.
  Per-server MCP concurrency limit. MCPConnectionManager accepts default_max_concurrent_calls, and MCPServerEntry gained max_concurrent_calls, bounding in-flight tool calls per server (via a FIFO semaphore) so a wide fan-out can't overwhelm one MCP server. Defaults to unlimited for compatibility.
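The delta-tracker fix above (advance the baseline optimistically, stash the prior one, restore it on a failed write) can be sketched as follows. The class is a hypothetical simplification: it diffs top-level keys by strict equality and ignores deletions, which the real tracker presumably handles differently.

```typescript
type FlatState = Record<string, unknown>;

// Hypothetical simplification of an optimistic delta tracker with rollback.
class DeltaTrackerSketch {
  private baseline: FlatState = {};
  private stashed: FlatState | null = null;

  // Returns the keys whose values differ from the baseline, then advances the
  // baseline optimistically while keeping the previous one for rollback.
  computeDelta(current: FlatState): FlatState {
    const delta: FlatState = {};
    for (const key of Object.keys(current)) {
      if (current[key] !== this.baseline[key]) delta[key] = current[key];
    }
    this.stashed = this.baseline;
    this.baseline = { ...current };
    return delta;
  }

  // Called when persisting the last delta failed: the next delta is then
  // computed against the last state that was durably written.
  rollback(): void {
    if (this.stashed !== null) {
      this.baseline = this.stashed;
      this.stashed = null;
    }
  }
}
```

If a persist of the delta for { a: 1, b: 2 } fails and rollback() is called, the following computeDelta({ a: 1, b: 2, c: 3 }) reports both b and c; without the rollback it would report only c and the change to b would be lost.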
- 131e3d3: Security hardening (Phase 2): close the gaps between the documented security model and what the code enforced.
  Architect publish is validated and gateable. architect_publish_workflow now runs GraphSchema.parse + validateGraph before persisting, so a prompt-injected or buggy agent can no longer publish an unvalidated executable graph (wildcard reads, unbounded fan-out, arbitrary tool wiring). New optional ArchitectToolDeps.canPublish gate lets the host require human approval / a privileged credential before any publish.
  MCP registry is re-validated at the trust boundary + SSRF guard. Both InMemoryMCPServerRegistry and DrizzleMCPServerRegistry now MCPServerEntrySchema.parse on save AND load, so the stdio command allowlist and URL checks are enforced for real, not just at compile time, closing a host-RCE path. Transport URLs (http/sse) are blocked from pointing at private / loopback / link-local / cloud-metadata addresses (SSRF). Escape hatch for local dev: CYCGRAPH_ALLOW_PRIVATE_MCP_URLS=true.
  Taint tracking holes fixed. (1) Standalone tool nodes now taint their MCP output; previously external data was written to memory untainted, defeating taint-aware routing. (2) Concurrent executions (voting/evolution/map) no longer cross-attribute taint: each resolveTools() gets its own collector, drained via drainTaintEntries(tools). (3) _taint_registry is now append-only through reducers, so a crafted update_memory: { _taint_registry: {} } can no longer clear taint to launder untrusted data as trusted.
  read_keys defaults to least privilege (BREAKING). Node read_keys now defaults to [] instead of ['*']. A node sees only goal/constraints plus the memory keys it explicitly lists, so state slicing is on by default. Nodes that read upstream outputs must declare them (e.g. read_keys: ['research_notes']). validateGraph warns on any node using ['*']. The architect prompt/schema emit explicit, scoped keys.
  Resource bounds (DoS guards). Added upper bounds to every fan-out/iteration knob: population_size ≤ 100, max_generations ≤ 100, max_concurrency ≤ 50, voter_agent_ids ≤ 50, supervisor/annealing max_iterations ≤ 1000. Subgraph nesting is capped at depth 32 (a chain of distinct subgraphs previously recursed to OOM), and subgraphs now inherit the parent's guardrails (toolResolver, factSanitizer, memoryWriter, modelResolver, etc.) instead of running with reduced guarantees.
  Reflection facts are sanitized + fail-closed. Fact content is injection-sanitized before persistence, closing a cross-run stored-injection channel (tainted content → distilled fact → retrieved into a future run's prompt). factSanitizer now FAILS CLOSED by default: a thrown sanitizer (downed PII service, buggy regex) drops the fact instead of persisting it unredacted. New GraphRunnerOptions.factSanitizerFailMode: 'drop' | 'pass' (default 'drop'); set 'pass' to restore the old fail-open behavior.
  New exports: ArchitectToolDeps.canPublish, GraphRunnerOptions.factSanitizerFailMode.
- First stable release: the "verified lessons" release. Workflows learn from every run (reflection → memory → retrieval), and lessons survive only if runs that used them verifiably scored better: lesson provenance in the runner, an outcome ledger, and a statistically-controlled retention gate (Welch inference, FDR control, sequential alp...
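For the SSRF guard in the security notes above, the address classes it names (private, loopback, link-local, cloud-metadata) map onto well-known IPv4 ranges. The check below is a hedged sketch of that classification only; the function name is hypothetical, and it deliberately omits hostname resolution, IPv6, and the CYCGRAPH_ALLOW_PRIVATE_MCP_URLS override, all of which a real guard would need to consider.

```typescript
// Hypothetical helper: classifies a dotted-quad IPv4 literal.
// Returns false for anything that is not an IPv4 literal (hostnames, IPv6),
// which a real guard would have to resolve or handle separately.
function isBlockedIPv4Literal(host: string): boolean {
  const parts = host.split('.');
  if (parts.length !== 4) return false;
  const octets = parts.map(p => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some(o => Number.isNaN(o) || o > 255)) return false;
  const [a, b] = octets;
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168) ||           // 192.168.0.0/16 private
    (a === 169 && b === 254)              // 169.254.0.0/16 link-local, incl. 169.254.169.254 metadata
  );
}
```

A literal-only check like this is not sufficient on its own: a public hostname can resolve to a private address, so a production guard would typically validate the resolved address at connection time as well.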
@cycgraph/memory@0.2.0
Minor Changes
- 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better. (Full notes for this changeset appear under @cycgraph/orchestrator@0.2.0 above.)
- 131e3d3: Performance & scale (Phase 5): cut the cost of the hot paths and add the knobs to keep a long/large run bounded. (Full notes for this changeset appear under @cycgraph/orchestrator-postgres@0.2.0 above.)
- d3641f2: Compound learning: reflection node type + MemoryWriter + tag-based retrieval.
  @cycgraph/orchestrator
  - New reflection node type that distills source_keys from workflow memory into atomic facts and persists them via an injected MemoryWriter. Two extractor variants:
    - rule_based: deterministic sentence-level extraction, no LLM call
    - llm: uses the new extractFactsExecutor primitive via a structured-output agent
  - New MemoryWriter adapter type on GraphRunnerOptions (mirrors MemoryRetriever).
  - New extractFactsExecutor primitive (sibling to evaluateQualityExecutor) for LLM-based fact distillation.
  - New memory_query directive on GraphNode: declares per-node retrieval (text / entity_ids / tags / max_facts). When set, the runner calls memoryRetriever before agent / supervisor prompt construction and renders results into a ## Relevant Memory section ahead of the workflow-state <data> block. Voting and evolution nodes propagate memory_query to synthetic sub-nodes automatically.
  - MemoryRetriever query type gained tags?: string[].
  - New errors: MemoryWriterMissingError (barrel-exported).
  - New types barrel-exported: MemoryWriter, MemoryWriterFact, MemoryWriterResult, FactExtractionResult, ReflectionConfig, MemoryQuery.
  @cycgraph/memory
  - SemanticFact.tags and MemoryQuery.tags fields (both default []).
  - New tag-only retrieval path in retrieveMemory(): list facts by tag, intersect tags, apply temporal validity, expand to themes and episodes. No embedding provider required.
  - Existing embedding and entity-based paths now also intersect with the tags filter.
  @cycgraph/orchestrator-postgres
  - New memory_facts.tags jsonb column (migration 0013_add_fact_tags).
  - DrizzleMemoryStore and DrizzleMemoryIndex row mappers updated to read/write tags.
-
First stable release — the "verified lessons" release. Workflows learn from every run (reflection → memory → retrieval), and lessons survive only if runs that used them verifiably scored better: lesson provenance in the runner, an outcome ledger, and a statistically-controlled retention gate (Welch inference, FDR control, sequential alpha-spending) with a shipping simulator to measure any policy's real detection and false-positive rates before trusting it. Guarded throughout by per-node budgets, taint tracking, least-privilege state slicing, and human-in-the-loop gates.
- 40787be: Statistically honest retention gate + validation simulator. The eval-gating gate's new default decision_rule: 'inference' replaces the point-estimate margin comparison with a Welch-style test on the lift vs the leave-one-out baseline (Student-t, Welch–Satterthwaite df), Benjamini–Hochberg FDR control across candidates per pass, and alpha-spending over doubling baseline brackets so repeated gating cannot inflate false positives (the peeking problem, measured at 25% false decisions before this control, 0–2% after). The legacy behavior remains one flag away (decision_rule: 'margin').
  New RetentionPolicy fields: promote_confidence / evict_confidence (default 0.9), noise_floor_sd, multiple_comparison ('bh' | 'none'), sequential_control ('doubling' | 'none'), and max_baseline_runs, the baseline-side stopping rule that retires candidates the bracket penalty has made undecidable (required alongside rest_after_trials, which freezes trials so max_trials alone can never fire). RetentionReport entries now carry an evidence object (lift, se, df, p_promote, p_evict, trials, baseline_runs, alpha_bracket); promoted entries changed from string[] to { fact_id, evidence? }[]. FactStats / OutcomeBaseline gain variance (breaking for third-party OutcomeLedger implementers). retrieveGatedLessons gains rest_after_trials: candidates bench after enough trials, freeing slots and creating the absence runs their baseline needs.
  New validation module: simulateGate() and gateOperatingCharacteristics() drive the real store/ledger/retriever/gate pipeline with synthetic lessons of known effect (deterministic, sub-second), so any policy's detection and false-positive rates can be measured before trusting it (runnable example: packages/evals/examples/gate-operating-characteristics/). New dependency-free statistics utilities exported: studentTCdf, welchLift, benjaminiHochberg, normalQuantile, requiredTrials, mulberry32, gaussian.
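For readers unfamiliar with the Benjamini–Hochberg control mentioned above, here is the textbook step-up procedure as a short sketch. It is not the package's benjaminiHochberg export, whose signature is not shown in these notes; the function name and the boolean-array return shape are assumptions.

```typescript
// Textbook Benjamini-Hochberg step-up procedure (hypothetical helper name).
// Given p-values and a target false discovery rate q, returns which
// hypotheses are rejected, in the original input order.
function bhRejectSketch(pValues: number[], q: number): boolean[] {
  const m = pValues.length;
  // Sort p-values ascending while remembering their original positions.
  const ranked = pValues.map((p, index) => ({ p, index })).sort((x, y) => x.p - y.p);
  // Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
  let cutoff = -1;
  for (let rank = 0; rank < m; rank++) {
    if (ranked[rank].p <= ((rank + 1) / m) * q) cutoff = rank;
  }
  // Reject every hypothesis at or below that rank, even if some of them
  // individually missed their own threshold (the "step-up" property).
  const rejected = new Array<boolean>(m).fill(false);
  for (let rank = 0; rank <= cutoff; rank++) rejected[ranked[rank].index] = true;
  return rejected;
}
```

With p-values [0.03, 0.04] and q = 0.05, the smaller value misses its own threshold of 0.025, but the larger one meets 0.05, so both are rejected. Applied per gating pass, this bounds the expected share of false promotions or evictions among the candidates decided in that pass.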
Patch Changes...
@cycgraph/context-engine@0.1.0
@cycgraph/orchestrator-postgres@1.0.0-beta.9
Patch Changes
- Updated dependencies [40787be]
- @cycgraph/memory@0.1.0-beta.6
@cycgraph/memory@0.1.0-beta.6
Minor Changes
- 40787be: Statistically honest retention gate + validation simulator. (Full notes for this changeset appear under @cycgraph/memory@0.2.0 above.)
@cycgraph/orchestrator@0.1.0-beta.8
Minor Changes
- 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better. (Full notes for this changeset appear under @cycgraph/orchestrator@0.2.0 above.)
Patch Changes
- 8f211cc: Migrate off Anthropic model IDs retiring 2026-06-15.
  DEFAULT_AGENT_MODEL is now claude-sonnet-4-6 (was claude-sonnet-4-20250514); ANTHROPIC_MODELS gains claude-opus-4-8 and claude-sonnet-4-6 while keeping the deprecated IDs so existing persisted agent configs still validate; the pricing table gains claude-opus-4-8 ($5/$25 per MTok) and keeps historical entries so cost replay of old runs stays correct. All examples, docs, and test fixtures updated to the new IDs.
@cycgraph/orchestrator@0.1.0-beta.7
Patch Changes
- 5617568: Upgrade OpenTelemetry to the current line and drop the protobufjs override (resolves a moderate advisory).
  The OTLP-HTTP and Prometheus exporters were on 0.217.0 and pulled protobufjs@8.0.x transitively (via @opentelemetry/otlp-transformer). A repo-wide protobufjs: ">=8.0.1" override pinned it to 8.0.3, which is inside the vulnerable range of GHSA-jggg-4jg4-v7c6 (DoS via recursive JSON descriptor expansion, >=8.0.0 <8.2.0). @opentelemetry/otlp-transformer@0.219.0 no longer depends on protobufjs at all, so bumping the exporters removes that dependency edge. The only remaining protobufjs is 7.6.3 via @grpc/proto-loader (the gRPC log exporter bundled in sdk-node), which is outside the advisory range. With the override removed, npm audit --omit=dev reports 0 vulnerabilities.
  Bumped: @opentelemetry/exporter-prometheus, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/sdk-node ^0.217.0 → ^0.219.0; @opentelemetry/resources, @opentelemetry/sdk-metrics ^2.7.1 → ^2.8.0. Removed the protobufjs entry from the root overrides.
@cycgraph/orchestrator@0.1.0-beta.6
Minor Changes
- 027be81: Fix: evolution_config.elite_count is now actually implemented (it was a no-op).
  The schema and validator advertised elite_count and rejected elite_count >= population_size, but the executor never used it. Every generation was bred entirely from scratch, so the per-generation best fitness could dip when a noisy generation produced worse candidates than the last.
  Elitism now works as documented: the top elite_count candidates of each generation are carried forward unchanged into the next generation's pool, neither re-generated nor re-scored. Two consequences:
  - Monotonic fitness. The best-so-far re-enters every subsequent pool, so the next generation's best is always ≥ the current one. ${node}_fitness_history never dips. (Set elite_count: 0 to opt out and restore the old all-fresh behavior.)
  - Fewer LLM calls. A carried elite occupies a population slot without a generation or evaluation call, so each generation after the first issues population_size - elite_count candidate calls instead of population_size.
  elite_count defaults to 1, so this changes default evolution behavior. The carried candidate is tagged is_elite: true in the ${node}_population summary. elite_count is internally clamped to population_size - 1 so at least one fresh candidate is always generated.
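The elitism behavior described in this entry (carry the top elite_count candidates forward unchanged, clamp to population_size - 1, tag carried candidates) can be sketched as a pure function. The candidate shape and function name are hypothetical; only elite_count, population_size, and the is_elite tag come from the notes.

```typescript
// Hypothetical candidate shape; only is_elite is named in the release notes.
interface CandidateSketch { output: string; fitness: number; is_elite?: boolean; }

// Builds the next generation's pool from the previous (already scored) pool
// and freshly generated, freshly scored candidates.
function nextPoolSketch(
  previous: CandidateSketch[],
  fresh: CandidateSketch[],
  eliteCount: number,
  populationSize: number,
): CandidateSketch[] {
  // Clamp so that at least one slot is always filled by a fresh candidate.
  const clamped = Math.max(0, Math.min(eliteCount, populationSize - 1));
  // Elites keep their existing fitness: no new generation or evaluation call.
  const elites = [...previous]
    .sort((x, y) => y.fitness - x.fitness)
    .slice(0, clamped)
    .map(c => ({ ...c, is_elite: true }));
  // Only populationSize - elites.length fresh candidates are needed.
  return [...elites, ...fresh.slice(0, populationSize - elites.length)];
}

const bestFitness = (pool: CandidateSketch[]): number =>
  Math.max(...pool.map(c => c.fitness));
```

Because the previous best always re-enters the pool when eliteCount is at least 1, the best fitness of the new pool cannot be lower than the previous best, which is the monotonicity property the notes describe.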