Releases: wmcmahan/cycgraph

@cycgraph/orchestrator@0.2.0

12 Jun 15:31

Minor Changes

  • 131e3d3: Architecture & API hygiene (Phase 6): tighten the public surface and close a status-resurrection hole.

    Status-transition guard (correctness). A shared guard now governs every status write (both the public set_status reducer and the internal lifecycle reducer). A run that has reached a terminal state (completed, failed, cancelled, timeout) can no longer be moved back to an active status. Previously a stray set_status, or a replayed _init on a recovered run, could flip failed→running and resurrect a dead run. Terminal→terminal transitions remain allowed for saga rollback (failed/timeout→cancelled). New exports: canTransitionStatus, isTerminalStatus, TERMINAL_STATUSES.
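
A minimal sketch of the guard rule described above, not the package's implementation. The terminal status names come from the note; the active names other than running, the `*Sketch` identifiers, and the choice to allow every terminal-to-terminal move are assumptions made for illustration.

```typescript
// Sketch only: "pending" and "waiting" are assumed active statuses.
type RunStatusSketch =
  | "pending" | "running" | "waiting"
  | "completed" | "failed" | "cancelled" | "timeout";

const TERMINAL_SKETCH: ReadonlySet<RunStatusSketch> = new Set<RunStatusSketch>([
  "completed", "failed", "cancelled", "timeout",
]);

function isTerminalSketch(status: RunStatusSketch): boolean {
  return TERMINAL_SKETCH.has(status);
}

function canTransitionSketch(from: RunStatusSketch, to: RunStatusSketch): boolean {
  // A run that is still active may move to any status.
  if (!isTerminalSketch(from)) return true;
  // A terminal run may only move to another terminal status (saga rollback);
  // it can never be resurrected into an active one.
  return isTerminalSketch(to);
}
```

Under these assumptions, `canTransitionSketch("failed", "running")` is rejected while `canTransitionSketch("failed", "cancelled")` is allowed.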

    Node-type executor registry. The 12-case dispatch switch in GraphRunner is replaced by a Record<NodeType, NodeExecutor> registry (runner/node-executors/registry.ts). Adding a node type is now a single registration that the compiler enforces is exhaustive, instead of shotgun edits across the runner. New exports: NODE_EXECUTORS, SUPPORTED_NODE_TYPES, getNodeExecutor, and the NodeExecutor type.
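
To illustrate why a `Record`-typed registry is exhaustive by construction, here is a small sketch. The three node kinds and every identifier are hypothetical; the real registry covers the package's own node types.

```typescript
// Hypothetical subset of node kinds, for illustration only.
type NodeKindSketch = "agent" | "tool" | "router";
type ExecutorSketch = (nodeId: string) => string;

// Record<NodeKindSketch, ...> forces one entry per kind: adding a member to
// the union without registering an executor is a compile-time error.
const EXECUTORS_SKETCH: Record<NodeKindSketch, ExecutorSketch> = {
  agent: (id) => `agent:${id}`,
  tool: (id) => `tool:${id}`,
  router: (id) => `router:${id}`,
};

const SUPPORTED_KINDS_SKETCH = Object.keys(EXECUTORS_SKETCH) as NodeKindSketch[];

function getExecutorSketch(kind: NodeKindSketch): ExecutorSketch {
  return EXECUTORS_SKETCH[kind];
}
```

Dispatch then becomes a lookup (`getExecutorSketch(node.kind)(node.id)`) rather than a switch that every new node type has to touch.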

    Public API hygiene (BREAKING). Engine internals that were leaking through the root entry point are moved behind a new @cycgraph/orchestrator/internal subpath: internalReducer, StreamChannel, the filtrex condition internals (FILTREX_EXTRA_FUNCTIONS, FILTREX_COMPILE_OPTIONS, normalizeConditionExpression), and the low-level calculateBackoff / sleep helpers. They are no longer part of the semver contract — import them from @cycgraph/orchestrator/internal if you genuinely need them (first-party tooling only). The public condition evaluator evaluateCondition stays on the root. Wildcard export * of the reducers/helpers/conditions barrels is replaced with explicit named exports so the public surface is auditable.

    Dropped the phantom @cycgraph/context-engine peerDependency. The orchestrator integrates the context engine purely via an injected function type (ContextCompressor) and never imports the package, so the (optional) peer dependency was noise. Removed.

  • 131e3d3: Budget integrity (Phase 3): make every LLM call count toward budgets and stop runaway spend mid-loop.

    Supervisor spend is now tracked. Supervisor routing calls previously recorded NO token_usage on their handoff/completion actions, so every iteration's tokens were invisible to the token budget, cost budget, per-node budget, and usage records — on a 10-iteration loop that hid 100K–1M+ tokens. Handoff and completion actions now carry token_usage + model, so supervisor spend flows through the normal _track_tokens/_track_cost path.

    Supervisor prompt memory is byte-capped. The supervisor prompt embedded the full memory blob with no size limit, so a loop that re-reads memory every iteration grew ~quadratically. It now uses the same MAX_MEMORY_PROMPT_BYTES (50KB) cap as agent prompts.

    Composite nodes stop spending mid-loop. Per-node and workflow budgets were only checked AFTER a composite node's aggregated action returned — an evolution node ran its entire population × generations before the cap was even consulted. A new between-iteration budget guard (checkCompositeBudget) lets evolution and annealing stop early once accumulated token/cost spend crosses the node's budget or the remaining workflow budget. Evolution surfaces a {nodeId}_budget_stopped flag.
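
The between-iteration check can be sketched as follows. This is a token-only simplification under assumed names (`BudgetSketch`, `runGenerationsSketch`); it is not the `checkCompositeBudget` implementation.

```typescript
interface BudgetSketch {
  nodeTokenBudget?: number;
  remainingWorkflowTokens?: number;
}

// True once accumulated spend has reached any configured limit.
function overBudgetSketch(spentTokens: number, budget: BudgetSketch): boolean {
  const limits = [budget.nodeTokenBudget, budget.remainingWorkflowTokens]
    .filter((limit): limit is number => limit !== undefined);
  return limits.some((limit) => spentTokens >= limit);
}

function runGenerationsSketch(
  maxGenerations: number,
  tokensPerGeneration: number,
  budget: BudgetSketch,
): { generationsRun: number; spentTokens: number; budgetStopped: boolean } {
  let spentTokens = 0;
  let generationsRun = 0;
  let budgetStopped = false;
  for (let g = 0; g < maxGenerations; g++) {
    // Consult the budget BEFORE each generation, not after the whole loop.
    if (overBudgetSketch(spentTokens, budget)) {
      budgetStopped = true;
      break;
    }
    spentTokens += tokensPerGeneration; // stand-in for running one generation
    generationsRun++;
  }
  return { generationsRun, spentTokens, budgetStopped };
}
```

With a 3,500-token node budget and 1,000 tokens per generation, this sketch stops after four generations instead of running all ten.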

    Failed-attempt LLM spend is counted. A node that retries N times previously counted only the successful attempt's tokens. The agent executor now attaches best-effort partialUsage to AgentExecutionError/AgentTimeoutError, and the runner dispatches _track_tokens/_track_cost for each failed attempt — so a max_retries: 3 node can no longer hide up to ~4× its visible spend.

    Parallel task timeouts actually abort the LLM call. Evolution/voting/map passed executeParallel a per-task timeout signal that the callers ignored, wiring only the workflow signal — so a task_timeout_ms left the underlying streamText running in the background, burning uncounted tokens. The callers now combine both signals (combineAbortSignals), so a task timeout cancels the LLM call.
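
A rough sketch of combining a workflow-level and a task-level signal, assuming a runtime with the standard `AbortController`. The helper name is hypothetical and is not claimed to match `combineAbortSignals`.

```typescript
// Returns a signal that aborts as soon as ANY of the input signals aborts.
function combineSignalsSketch(...signals: AbortSignal[]): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (signal.aborted) {
      // One input is already aborted: propagate immediately.
      controller.abort();
      break;
    }
    signal.addEventListener("abort", () => controller.abort(), { once: true });
  }
  return controller.signal;
}

const workflowControllerSketch = new AbortController();
const taskControllerSketch = new AbortController();
const combinedSketch = combineSignalsSketch(
  workflowControllerSketch.signal,
  taskControllerSketch.signal,
);
```

Passing `combinedSketch` to the LLM call means a task timeout (aborting `taskControllerSketch`) cancels the request even though the workflow signal is still live.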

  • 131e3d3: Durability hardening (Phase 1): make crash recovery, idempotency, and multi-worker execution actually safe.

    Deterministic replay. Reducers now derive every timestamp (started_at, updated_at, approval deadlines, history entries) from action.metadata.timestamp instead of new Date(), so event-log replay reconstructs byte-identical state. applyHumanResponse logs its resume_from_human action durably (resumed runs previously lost the human decision). workflow_started carries a REPLAY_VERSION stamp that recovery checks for reducer-semantics drift.
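
The idea can be sketched with a toy reducer. The state shape and the `node_completed` action are assumptions for illustration; only the rule "read the clock from the action, never from `new Date()`" is taken from the note.

```typescript
interface ActionSketch {
  type: "workflow_started" | "node_completed";
  metadata: { timestamp: string };
}

interface StateSketch {
  started_at: string | null;
  updated_at: string | null;
  history: string[];
}

const INITIAL_SKETCH: StateSketch = { started_at: null, updated_at: null, history: [] };

function reduceSketch(state: StateSketch, action: ActionSketch): StateSketch {
  // The only time source is the action itself, so replay is a pure function
  // of the event log.
  const at = action.metadata.timestamp;
  return {
    started_at: action.type === "workflow_started" ? at : state.started_at,
    updated_at: at,
    history: [...state.history, `${action.type}@${at}`],
  };
}

function replaySketch(actions: ActionSketch[]): StateSketch {
  return actions.reduce(reduceSketch, INITIAL_SKETCH);
}
```

Replaying the same log twice, at any wall-clock time, serializes to the same bytes.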

    State hydration. New hydrateWorkflowState() (barrel-exported) runs at every load boundary — coerces jsonb date strings back to Date, applies state_schema_version migrations, and refuses snapshots from a newer engine. Fixes the bug where a recovered HITL workflow compared new Date() >= waiting_timeout_at against a string (always false), so approval timeouts never fired after recovery.
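
The timeout bug is easy to reproduce in isolation. The sketch below uses a hypothetical one-field snapshot and a hypothetical `hydrateSketch` helper; the real `hydrateWorkflowState()` also handles schema migrations and version checks, which are omitted here.

```typescript
interface RawSnapshotSketch { waiting_timeout_at: string | Date | null }
interface HydratedSketch { waiting_timeout_at: Date | null }

// Coerce a jsonb date string back into a Date at the load boundary.
function hydrateSketch(raw: RawSnapshotSketch): HydratedSketch {
  const value = raw.waiting_timeout_at;
  if (value === null) return { waiting_timeout_at: null };
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) throw new Error(`invalid date: ${String(value)}`);
  return { waiting_timeout_at: date };
}

const nowSketch = new Date("2030-01-01T00:00:00.000Z");
const rawSketch: RawSnapshotSketch = { waiting_timeout_at: "2020-01-01T00:00:00.000Z" };

// Unhydrated: Date >= string converts the ISO string to NaN, so the
// comparison is false even though the deadline passed years ago.
const buggyExpired = nowSketch >= (rawSketch.waiting_timeout_at as unknown as Date);

// Hydrated: both sides are real timestamps.
const deadlineSketch = hydrateSketch(rawSketch).waiting_timeout_at;
const fixedExpired =
  deadlineSketch !== null && nowSketch.getTime() >= deadlineSketch.getTime();
```

Here `buggyExpired` is `false` and `fixedExpired` is `true`, which is the difference between an approval timeout that never fires and one that does.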

    Authoritative event log. Appends are awaited behind a flush barrier before each state snapshot commits (events can no longer silently lag the snapshot they anchor). Duplicate (run_id, sequence_id) appends are rejected with the new EventSequenceConflictError instead of being silently dropped (Postgres) or duplicated (in-memory) — the two implementations now match. Recovery validates the log is gap-free (EventLogCorruptionError on a lost append) and the worker reconciles event-log replay against the latest snapshot, resuming from whichever reflects more progress.
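
A toy in-memory log showing the two checks described above: rejecting a duplicate (run_id, sequence_id) and detecting a gap. Class, method, and error names are hypothetical, and the sketch assumes sequence ids increase by one.

```typescript
class EventLogSketch {
  private readonly runs = new Map<string, Set<number>>();

  // Rejects a duplicate (runId, sequenceId) instead of dropping or doubling it.
  append(runId: string, sequenceId: number): void {
    const seen = this.runs.get(runId) ?? new Set<number>();
    if (seen.has(sequenceId)) {
      const err = new Error(`duplicate event (${runId}, ${sequenceId})`);
      err.name = "SequenceConflictSketch";
      throw err;
    }
    seen.add(sequenceId);
    this.runs.set(runId, seen);
  }

  // Returns the first missing sequence id, or null when the log is gap-free.
  firstGap(runId: string): number | null {
    const ids = Array.from(this.runs.get(runId) ?? []).sort((a, b) => a - b);
    for (let i = 1; i < ids.length; i++) {
      const prev = ids[i - 1];
      const cur = ids[i];
      if (prev !== undefined && cur !== undefined && cur !== prev + 1) return prev + 1;
    }
    return null;
  }
}
```

A recovery routine built on this would refuse to replay when `firstGap` returns a number, since a lost append means the log no longer reflects what actually happened.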

    Unified idempotency. One key space (node_id:iteration) checked before execution; a node whose action was applied before a crash (post-reduce/pre-advance window, detected via the snapshot's new _last_event_sequence_id high-water mark) is skipped on resume instead of re-executed. MemoryWriter now receives an idempotency_key (run_id:node_id:iteration) so reflection facts stop duplicating in long-term memory on retry/recovery.

    Durable queue + run fencing. New DrizzleWorkflowQueue (migration 0014, workflow_jobs table) with FOR UPDATE SKIP LOCKED atomic claims. Every claim bumps a claim_epoch on the run; createFencedRunnerOptions(job) builds fenced persistence/event-log writers that reject stale-epoch writes with the new StaleClaimError — a reclaimed worker can no longer clobber the new claimant (split-brain). The worker emits job:claim_lost and leaves the job untouched. worker.stop() now hard-cancels runners past the grace period before releasing jobs, and shutdown-interrupted jobs stay active for visibility-timeout reclaim. InMemoryWorkflowQueue mirrors the epoch semantics for parity.
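
Epoch fencing can be sketched in a few lines. This is an in-memory illustration under assumed names; the real writers are built by createFencedRunnerOptions(job) and persist to Postgres.

```typescript
class FencedRunSketch {
  private claimEpoch = 0;
  private value = "";

  // Every claim bumps the epoch and hands the new value to the claimant.
  claim(): number {
    this.claimEpoch += 1;
    return this.claimEpoch;
  }

  // A write is accepted only from the holder of the current epoch.
  write(epoch: number, value: string): void {
    if (epoch !== this.claimEpoch) {
      const err = new Error(`stale claim: epoch ${epoch}, current ${this.claimEpoch}`);
      err.name = "StaleClaimSketch";
      throw err;
    }
    this.value = value;
  }

  read(): string {
    return this.value;
  }
}
```

If worker A claims the run, stalls, and worker B reclaims it, A's later write carries the old epoch and is rejected, so B's state is never clobbered.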

    New barrel exports: hydrateWorkflowState, CURRENT_STATE_SCHEMA_VERSION, REPLAY_VERSION, EventSequenceConflictError, StaleClaimError. New Postgres exports: DrizzleWorkflowQueue, createFencedRunnerOptions, DrizzlePersistenceProviderOptions, RunClaim, DrizzleEventLogWriterOptions.

  • 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better.

    @cycgraph/orchestrator — lesson provenance. Retrieved memory facts can carry an id (MemoryRetrievalResult.facts[].id, optional and non-breaking). When present, the runner records which facts were injected into each node's prompt in an append-only memory._lesson_provenance registry (same replay-safe pattern as the taint registry; invisible to node StateViews). Voting and evolution forward provenance from every sub-agent — losing candidates count as trials too. New exports: getInjectedFactIds(state), getLessonProvenance(state), getLessonProvenanceRegistry(memory), plus the LessonProvenanceEntry / LessonProvenanceRegistry types. Known v1 limitation: supervisor-node retrieval is not provenance-tracked.

    @cycgraph/memory — outcome ledger, retention gate, gated retrieval. New OutcomeLedger interface + InMemoryOutcomeLedger (recordOutcome({ run_id, score, fact_ids }), per-fact trial stats, leave-one-out baselines). New evaluateRetention(store, ledger, policy) promotes candidate-tagged lessons that lift outcomes past promote_margin (tag rewritten to verified), soft-evicts harmful ones (invalidated_by: 'eval-gate:harmful'), and retires no-lift candidates at max_trials — including ones deadlocked on an empty leave-one-out baseline. New retrieveGatedLessons(store, options) fills the prompt budget verified-first with candidate exploration slots, selected in-progress-first via the ledger, with a rest_after_trials bench phase so fully-trialled candidates create the absence runs their baseline needs.

    Runnable adversarial demo at packages/evals/examples/eval-gated-learning/: three deliberately poisoned lessons crater a run and the gate evicts all three on outcome evidence alone, two runs after injection.

  • 027be81: Fix: evolution_config.elite_count is now actually implemented (it was a no-op).

    The schema and validator advertised elite_count and rejected elite_count >= population_size, but the executor never used it — every generation was bred entirely from scratch, so the per-generation best fitness could dip when a noisy generation produced worse candidates than the last.

    Elitism now works as documented: the top elite_count candidates of each generation are carried forward unchanged into the next generation's pool — not re-generated and not re-scored. Two consequences:

    • Monotonic fitness. The best-so-far re-enters every subsequent pool, so the next generation's best is always ≥ the current one. ${node}_fitness_history never dips. (Set elite_count: 0 to opt o...

@cycgraph/orchestrator-postgres@0.2.0

12 Jun 15:31

Minor Changes

  • 131e3d3: Durability hardening (Phase 1): make crash recovery, idempotency, and multi-worker execution actually safe.

    Deterministic replay. Reducers now derive every timestamp (started_at, updated_at, approval deadlines, history entries) from action.metadata.timestamp instead of new Date(), so event-log replay reconstructs byte-identical state. applyHumanResponse logs its resume_from_human action durably (resumed runs previously lost the human decision). workflow_started carries a REPLAY_VERSION stamp that recovery checks for reducer-semantics drift.

    State hydration. New hydrateWorkflowState() (barrel-exported) runs at every load boundary — coerces jsonb date strings back to Date, applies state_schema_version migrations, and refuses snapshots from a newer engine. Fixes the bug where a recovered HITL workflow compared new Date() >= waiting_timeout_at against a string (always false), so approval timeouts never fired after recovery.

    Authoritative event log. Appends are awaited behind a flush barrier before each state snapshot commits (events can no longer silently lag the snapshot they anchor). Duplicate (run_id, sequence_id) appends are rejected with the new EventSequenceConflictError instead of being silently dropped (Postgres) or duplicated (in-memory) — the two implementations now match. Recovery validates the log is gap-free (EventLogCorruptionError on a lost append) and the worker reconciles event-log replay against the latest snapshot, resuming from whichever reflects more progress.

    Unified idempotency. One key space (node_id:iteration) checked before execution; a node whose action was applied before a crash (post-reduce/pre-advance window, detected via the snapshot's new _last_event_sequence_id high-water mark) is skipped on resume instead of re-executed. MemoryWriter now receives an idempotency_key (run_id:node_id:iteration) so reflection facts stop duplicating in long-term memory on retry/recovery.

    Durable queue + run fencing. New DrizzleWorkflowQueue (migration 0014, workflow_jobs table) with FOR UPDATE SKIP LOCKED atomic claims. Every claim bumps a claim_epoch on the run; createFencedRunnerOptions(job) builds fenced persistence/event-log writers that reject stale-epoch writes with the new StaleClaimError — a reclaimed worker can no longer clobber the new claimant (split-brain). The worker emits job:claim_lost and leaves the job untouched. worker.stop() now hard-cancels runners past the grace period before releasing jobs, and shutdown-interrupted jobs stay active for visibility-timeout reclaim. InMemoryWorkflowQueue mirrors the epoch semantics for parity.

    New barrel exports: hydrateWorkflowState, CURRENT_STATE_SCHEMA_VERSION, REPLAY_VERSION, EventSequenceConflictError, StaleClaimError. New Postgres exports: DrizzleWorkflowQueue, createFencedRunnerOptions, DrizzlePersistenceProviderOptions, RunClaim, DrizzleEventLogWriterOptions.

  • 131e3d3: Performance & scale (Phase 5): cut the cost of the hot paths and add the knobs to keep a long/large run bounded.

    Tag-filtered fact retrieval is now an index lookup, not a table scan. FactFilter gained a tags field; the hierarchical retriever pushes the reflection-loop's tag filter into the store instead of paging the whole table and filtering client-side. The Postgres store resolves it via tags ?| array[...] backed by a new GIN index on memory_facts.tags (migration 0015) and now applies a deterministic ORDER BY valid_from DESC, id so LIMIT/OFFSET pagination is stable. The in-memory store honors the same tags filter (insertion-ordered, already stable). Run 0015_add_memory_facts_tags_gin before relying on tag retrieval at scale — on a large live table prefer CREATE INDEX CONCURRENTLY out-of-band.

    Evolution scores candidates in parallel (bounded by the existing max_concurrency) instead of one evaluator call at a time — a generation now takes ~one evaluation's wall-clock, not N. It also stores per-candidate fitness summaries in ${node}_population (index/fitness/reasoning) rather than every candidate's full output (the winner's full output already lives in ${node}_winner), shrinking state and every checkpoint.

    Memory retrieval is bounded and batched. extractSubgraph gained a max_entities cap (default DEFAULT_MAX_SUBGRAPH_ENTITIES = 500) so a dense graph can't expand the BFS frontier near-exponentially, and it batch-fetches visited entities (getEntities) instead of one round-trip each.

    Sanitize-after-truncate in prompt building. Injection-sanitization is now the last transformation before memory/retrieved-memory is embedded — applied to exactly the bytes that reach the prompt (and to compressor output, which is now also byte-capped). Closes the window where truncating after sanitizing could leave a partial boundary artifact, and stops wasting sanitization on bytes that get dropped.

    Delta tracker no longer loses patches on a failed persist. computeDelta advances its baseline optimistically but stashes the prior baseline; the persistence coordinator calls the new rollback() if the write throws, so the next delta diffs against the last durably persisted state (no lost changes, no skipped version numbers).
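
A simplified sketch of the optimistic-baseline-plus-rollback pattern. It diffs flat records of primitives only, and the class is hypothetical; the method names mirror the note purely for readability.

```typescript
type FlatStateSketch = Record<string, string | number | boolean>;

class DeltaTrackerSketch {
  private baseline: FlatStateSketch = {};
  private previous: FlatStateSketch | null = null;

  // Returns the keys that changed, then optimistically advances the baseline
  // while stashing the prior one in case the persist fails.
  computeDelta(next: FlatStateSketch): FlatStateSketch {
    const delta: FlatStateSketch = {};
    for (const key of Object.keys(next)) {
      const value = next[key];
      if (value !== undefined && this.baseline[key] !== value) delta[key] = value;
    }
    this.previous = this.baseline;
    this.baseline = { ...next };
    return delta;
  }

  // Called when the write throws: restore the last durably persisted baseline.
  rollback(): void {
    if (this.previous !== null) {
      this.baseline = this.previous;
      this.previous = null;
    }
  }
}
```

After a failed persist and a `rollback()`, the next delta again contains the changes from the failed write, so nothing is lost.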

    Auto-compaction is on by default. GraphRunnerOptions.compaction_interval now defaults to DEFAULT_COMPACTION_INTERVAL = 1000 (was 0/disabled) when an eventLog is wired, so a long run can't grow the event log without bound. Compaction is recovery-safe (checkpoint + loadEventsAfter). Set compaction_interval: 0 to retain full history and compact manually. The snapshot-resume idempotency rebuild is now checkpoint-aware — it loads only the tail after the latest checkpoint instead of the entire event history.

    New RateLimiter port. Inject GraphRunnerOptions.rateLimiter to pace LLM calls inside a provider's budget — awaited before every agent/supervisor/evaluator call at a single chokepoint (the implementation may delay to throttle or throw to reject; abortable; propagated into subgraphs). New exports: RateLimiter, RateLimitRequest, RateLimitCallKind.

    Per-server MCP concurrency limit. MCPConnectionManager accepts default_max_concurrent_calls, and MCPServerEntry gained max_concurrent_calls, bounding in-flight tool calls per server (via a FIFO semaphore) so a wide fan-out can't overwhelm one MCP server. Defaults to unlimited for compatibility.
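
One common way to build a FIFO semaphore is shown below. It is a generic sketch with hypothetical names, not the MCPConnectionManager internals.

```typescript
class FifoSemaphoreSketch {
  private readonly limit: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Resolves immediately while a slot is free; otherwise queues in FIFO order.
  acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active += 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Hands the slot to the oldest waiter, or frees it when nobody is queued.
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // slot transfers directly, so `active` stays the same
    } else {
      this.active -= 1;
    }
  }

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }
}
```

A caller would wrap each tool call as `await sem.acquire(); try { ... } finally { sem.release(); }`, so a wide fan-out never exceeds the per-server limit.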

  • 131e3d3: Security hardening (Phase 2): close the gaps between the documented security model and what the code enforced.

    Architect publish is validated and gateable. architect_publish_workflow now runs GraphSchema.parse + validateGraph before persisting — a prompt-injected or buggy agent can no longer publish an unvalidated executable graph (wildcard reads, unbounded fan-out, arbitrary tool wiring). New optional ArchitectToolDeps.canPublish gate lets the host require human approval / a privileged credential before any publish.

    MCP registry is re-validated at the trust boundary + SSRF guard. Both InMemoryMCPServerRegistry and DrizzleMCPServerRegistry now MCPServerEntrySchema.parse on save AND load — the stdio command allowlist and URL checks are enforced for real, not just at compile time, closing a host-RCE path. Transport URLs (http/sse) are blocked from pointing at private / loopback / link-local / cloud-metadata addresses (SSRF). Escape hatch for local dev: CYCGRAPH_ALLOW_PRIVATE_MCP_URLS=true.
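
The address classes named above can be checked roughly as follows. This sketch only inspects literal hosts and a hypothetical function name is used; a production guard would also have to resolve DNS names and cover the full IPv6 ranges, which is out of scope here.

```typescript
// Returns true when the URL should be refused as an MCP transport target.
function isBlockedMcpUrlSketch(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return true; // unparseable URLs are refused
  }
  if (host === "localhost" || host === "[::1]") return true; // loopback

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false; // a DNS name: this sketch does not resolve it

  const a = Number(match[1]);
  const b = Number(match[2]);
  if (a === 127) return true;                        // loopback 127.0.0.0/8
  if (a === 10) return true;                         // private 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true;  // private 172.16.0.0/12
  if (a === 192 && b === 168) return true;           // private 192.168.0.0/16
  if (a === 169 && b === 254) return true;           // link-local, incl. cloud metadata
  return false;
}
```

The 169.254.0.0/16 check is the one that covers the usual cloud-metadata endpoint at 169.254.169.254.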

    Taint tracking holes fixed. (1) Standalone tool nodes now taint their MCP output — previously external data was written to memory untainted, defeating taint-aware routing. (2) Concurrent executions (voting/evolution/map) no longer cross-attribute taint: each resolveTools() gets its own collector, drained via drainTaintEntries(tools). (3) _taint_registry is now append-only through reducers — a crafted update_memory: { _taint_registry: {} } can no longer clear taint to launder untrusted data as trusted.
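
Point (3) amounts to merging instead of assigning. A sketch under assumed types follows; the entry shape is hypothetical, and whether the real reducer also forbids overwriting an existing entry is an assumption made here.

```typescript
type TaintRegistrySketch = Record<string, { source: string }>;

// Append-only merge: incoming entries can add keys but can never remove or
// replace an entry that is already recorded.
function mergeTaintSketch(
  current: TaintRegistrySketch,
  incoming: TaintRegistrySketch,
): TaintRegistrySketch {
  return { ...incoming, ...current };
}
```

An update that supplies an empty registry therefore leaves the existing taint intact rather than clearing it.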

    read_keys defaults to least privilege (BREAKING). Node read_keys now defaults to [] instead of ['*']. A node sees only goal/constraints plus the memory keys it explicitly lists — state slicing is on by default. Nodes that read upstream outputs must declare them (e.g. read_keys: ['research_notes']). validateGraph warns on any node using ['*']. The architect prompt/schema emit explicit, scoped keys.

    Resource bounds (DoS guards). Added upper bounds to every fan-out/iteration knob: population_size ≤ 100, max_generations ≤ 100, max_concurrency ≤ 50, voter_agent_ids ≤ 50, supervisor/annealing max_iterations ≤ 1000. Subgraph nesting is capped at depth 32 (a chain of distinct subgraphs previously recursed to OOM), and subgraphs now inherit the parent's guardrails (toolResolver, factSanitizer, memoryWriter, modelResolver, etc.) instead of running with reduced guarantees.

    Reflection facts are sanitized + fail-closed. Fact content is injection-sanitized before persistence, closing a cross-run stored-injection channel (tainted content → distilled fact → retrieved into a future run's prompt). factSanitizer now FAILS CLOSED by default: a thrown sanitizer (downed PII service, buggy regex) drops the fact instead of persisting it unredacted. New GraphRunnerOptions.factSanitizerFailMode: 'drop' | 'pass' (default 'drop'); set 'pass' to restore the old fail-open behavior.
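
The two fail modes can be sketched as a small wrapper. The function name and signature are hypothetical; only the 'drop' | 'pass' values and the 'drop' default come from the note.

```typescript
type FailModeSketch = "drop" | "pass";

// Returns the sanitized fact, or null when the fact must not be persisted.
function sanitizeFactSketch(
  content: string,
  sanitizer: (input: string) => string,
  failMode: FailModeSketch = "drop",
): string | null {
  try {
    return sanitizer(content);
  } catch {
    // Fail closed by default: a broken sanitizer drops the fact rather than
    // letting it through unredacted. "pass" restores fail-open behavior.
    return failMode === "pass" ? content : null;
  }
}
```

With the default mode, a sanitizer that throws yields `null`, and the caller skips persistence for that fact.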

    New exports: ArchitectToolDeps.canPublish, GraphRunnerOptions.factSanitizerFailMode.

  • First stable release — the "verified lessons" release. Workflows learn from every run (reflection → memory → retrieval), and lessons survive only if runs that used them verifiably scored better: lesson provenance in the runner, an outcome ledger, and a statistically-controlled retention gate (Welch inference, FDR control, sequential alp...

@cycgraph/memory@0.2.0

12 Jun 15:31

Minor Changes

  • 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better.

    @cycgraph/orchestrator — lesson provenance. Retrieved memory facts can carry an id (MemoryRetrievalResult.facts[].id, optional and non-breaking). When present, the runner records which facts were injected into each node's prompt in an append-only memory._lesson_provenance registry (same replay-safe pattern as the taint registry; invisible to node StateViews). Voting and evolution forward provenance from every sub-agent — losing candidates count as trials too. New exports: getInjectedFactIds(state), getLessonProvenance(state), getLessonProvenanceRegistry(memory), plus the LessonProvenanceEntry / LessonProvenanceRegistry types. Known v1 limitation: supervisor-node retrieval is not provenance-tracked.

    @cycgraph/memory — outcome ledger, retention gate, gated retrieval. New OutcomeLedger interface + InMemoryOutcomeLedger (recordOutcome({ run_id, score, fact_ids }), per-fact trial stats, leave-one-out baselines). New evaluateRetention(store, ledger, policy) promotes candidate-tagged lessons that lift outcomes past promote_margin (tag rewritten to verified), soft-evicts harmful ones (invalidated_by: 'eval-gate:harmful'), and retires no-lift candidates at max_trials — including ones deadlocked on an empty leave-one-out baseline. New retrieveGatedLessons(store, options) fills the prompt budget verified-first with candidate exploration slots, selected in-progress-first via the ledger, with a rest_after_trials bench phase so fully-trialled candidates create the absence runs their baseline needs.

    Runnable adversarial demo at packages/evals/examples/eval-gated-learning/: three deliberately poisoned lessons crater a run and the gate evicts all three on outcome evidence alone, two runs after injection.

  • 131e3d3: Performance & scale (Phase 5): cut the cost of the hot paths and add the knobs to keep a long/large run bounded.

    Tag-filtered fact retrieval is now an index lookup, not a table scan. FactFilter gained a tags field; the hierarchical retriever pushes the reflection-loop's tag filter into the store instead of paging the whole table and filtering client-side. The Postgres store resolves it via tags ?| array[...] backed by a new GIN index on memory_facts.tags (migration 0015) and now applies a deterministic ORDER BY valid_from DESC, id so LIMIT/OFFSET pagination is stable. The in-memory store honors the same tags filter (insertion-ordered, already stable). Run 0015_add_memory_facts_tags_gin before relying on tag retrieval at scale — on a large live table prefer CREATE INDEX CONCURRENTLY out-of-band.

    Evolution scores candidates in parallel (bounded by the existing max_concurrency) instead of one evaluator call at a time — a generation now takes ~one evaluation's wall-clock, not N. It also stores per-candidate fitness summaries in ${node}_population (index/fitness/reasoning) rather than every candidate's full output (the winner's full output already lives in ${node}_winner), shrinking state and every checkpoint.

    Memory retrieval is bounded and batched. extractSubgraph gained a max_entities cap (default DEFAULT_MAX_SUBGRAPH_ENTITIES = 500) so a dense graph can't expand the BFS frontier near-exponentially, and it batch-fetches visited entities (getEntities) instead of one round-trip each.

    Sanitize-after-truncate in prompt building. Injection-sanitization is now the last transformation before memory/retrieved-memory is embedded — applied to exactly the bytes that reach the prompt (and to compressor output, which is now also byte-capped). Closes the window where truncating after sanitizing could leave a partial boundary artifact, and stops wasting sanitization on bytes that get dropped.

    Delta tracker no longer loses patches on a failed persist. computeDelta advances its baseline optimistically but stashes the prior baseline; the persistence coordinator calls the new rollback() if the write throws, so the next delta diffs against the last durably persisted state (no lost changes, no skipped version numbers).

    Auto-compaction is on by default. GraphRunnerOptions.compaction_interval now defaults to DEFAULT_COMPACTION_INTERVAL = 1000 (was 0/disabled) when an eventLog is wired, so a long run can't grow the event log without bound. Compaction is recovery-safe (checkpoint + loadEventsAfter). Set compaction_interval: 0 to retain full history and compact manually. The snapshot-resume idempotency rebuild is now checkpoint-aware — it loads only the tail after the latest checkpoint instead of the entire event history.

    New RateLimiter port. Inject GraphRunnerOptions.rateLimiter to pace LLM calls inside a provider's budget — awaited before every agent/supervisor/evaluator call at a single chokepoint (the implementation may delay to throttle or throw to reject; abortable; propagated into subgraphs). New exports: RateLimiter, RateLimitRequest, RateLimitCallKind.

    Per-server MCP concurrency limit. MCPConnectionManager accepts default_max_concurrent_calls, and MCPServerEntry gained max_concurrent_calls, bounding in-flight tool calls per server (via a FIFO semaphore) so a wide fan-out can't overwhelm one MCP server. Defaults to unlimited for compatibility.

  • d3641f2: Compound learning: reflection node type + MemoryWriter + tag-based retrieval.

    @cycgraph/orchestrator

    • New reflection node type that distills source_keys from workflow memory into atomic facts and persists them via an injected MemoryWriter. Two extractor variants:
      • rule_based — deterministic sentence-level extraction, no LLM call
      • llm — uses the new extractFactsExecutor primitive via a structured-output agent
    • New MemoryWriter adapter type on GraphRunnerOptions (mirrors MemoryRetriever).
    • New extractFactsExecutor primitive (sibling to evaluateQualityExecutor) for LLM-based fact distillation.
    • New memory_query directive on GraphNode — declares per-node retrieval (text / entity_ids / tags / max_facts). When set, the runner calls memoryRetriever before agent / supervisor prompt construction and renders results into a ## Relevant Memory section ahead of the workflow-state <data> block. Voting and evolution nodes propagate memory_query to synthetic sub-nodes automatically.
    • MemoryRetriever query type gained tags?: string[].
    • New errors: MemoryWriterMissingError (barrel-exported).
    • New types barrel-exported: MemoryWriter, MemoryWriterFact, MemoryWriterResult, FactExtractionResult, ReflectionConfig, MemoryQuery.

    @cycgraph/memory

    • SemanticFact.tags and MemoryQuery.tags fields (both default []).
    • New tag-only retrieval path in retrieveMemory() — list facts by tag, intersect tags, apply temporal validity, expand to themes and episodes. No embedding provider required.
    • Existing embedding and entity-based paths now also intersect with the tags filter.

    @cycgraph/orchestrator-postgres

    • New memory_facts.tags jsonb column (migration 0013_add_fact_tags).
    • DrizzleMemoryStore and DrizzleMemoryIndex row mappers updated to read/write tags.
  • First stable release — the "verified lessons" release. Workflows learn from every run (reflection → memory → retrieval), and lessons survive only if runs that used them verifiably scored better: lesson provenance in the runner, an outcome ledger, and a statistically-controlled retention gate (Welch inference, FDR control, sequential alpha-spending) with a shipping simulator to measure any policy's real detection and false-positive rates before trusting it. Guarded throughout by per-node budgets, taint tracking, least-privilege state slicing, and human-in-the-loop gates.

  • 40787be: Statistically honest retention gate + validation simulator. The eval-gating gate's new default decision_rule: 'inference' replaces the point-estimate margin comparison with a Welch-style test on the lift vs the leave-one-out baseline (Student-t, Welch–Satterthwaite df), Benjamini–Hochberg FDR control across candidates per pass, and alpha-spending over doubling baseline brackets so repeated gating cannot inflate false positives (the peeking problem — measured at 25% false decisions before this control, 0–2% after). The legacy behavior remains one flag away (decision_rule: 'margin').
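
For readers unfamiliar with the two statistical pieces named above, here is a textbook sketch of the Welch statistic with Welch–Satterthwaite degrees of freedom, and of the Benjamini–Hochberg step-up procedure. These are generic formulas under hypothetical names; they are not the package's welchLift or benjaminiHochberg, whose signatures are not shown in these notes.

```typescript
interface SampleSketch { mean: number; variance: number; n: number }

// Welch t statistic and Welch–Satterthwaite degrees of freedom.
function welchSketch(treated: SampleSketch, baseline: SampleSketch): { t: number; df: number } {
  const a = treated.variance / treated.n;
  const b = baseline.variance / baseline.n;
  const t = (treated.mean - baseline.mean) / Math.sqrt(a + b);
  const df = (a + b) ** 2 / (a ** 2 / (treated.n - 1) + b ** 2 / (baseline.n - 1));
  return { t, df };
}

// Benjamini–Hochberg: reject the k smallest p-values, where k is the largest
// rank with p_(k) <= (k / m) * q. Returns one flag per input p-value.
function bhRejectSketch(pValues: number[], q: number): boolean[] {
  const m = pValues.length;
  const ranked = pValues
    .map((p, index) => ({ p, index }))
    .sort((x, y) => x.p - y.p);
  let cutoff = -1;
  ranked.forEach((entry, k) => {
    if (entry.p <= ((k + 1) / m) * q) cutoff = k;
  });
  const rejected: boolean[] = pValues.map(() => false);
  ranked.slice(0, cutoff + 1).forEach((entry) => {
    rejected[entry.index] = true;
  });
  return rejected;
}
```

For p-values 0.01, 0.04, 0.03, 0.20 at q = 0.1, the per-rank thresholds are 0.025, 0.05, 0.075, and 0.1, so the first three hypotheses are rejected and the last is not.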

    New RetentionPolicy fields: promote_confidence / evict_confidence (default 0.9), noise_floor_sd, multiple_comparison ('bh'|'none'), sequential_control ('doubling'|'none'), and max_baseline_runs — the baseline-side stopping rule that retires candidates the bracket penalty has made undecidable (required alongside rest_after_trials, which freezes trials so max_trials alone can never fire). RetentionReport entries now carry an evidence object (lift, se, df, p_promote, p_evict, trials, baseline_runs, alpha_bracket); promoted entries changed from string[] to { fact_id, evidence? }[]. FactStats/OutcomeBaseline gain variance (breaking for third-party OutcomeLedger implementers). retrieveGatedLessons gains rest_after_trials — candidates bench after enough trials, freeing slots and creating the absence runs their baseline needs.

    New validation module: simulateGate() and gateOperatingCharacteristics() drive the real store/ledger/retriever/gate pipeline with synthetic lessons of known effect — deterministic, sub-second — so any policy's detection and false-positive rates can be measured before trusting it (runnable example: packages/evals/examples/gate-operating-characteristics/). New dependency-free statistics utilities exported: studentTCdf, welchLift, benjaminiHochberg, normalQuantile, requiredTrials, mulberry32, gaussian.

Patch Changes...

@cycgraph/context-engine@0.1.0

12 Jun 15:31

@cycgraph/context-engine@0.1.0

@cycgraph/orchestrator-postgres@1.0.0-beta.9

12 Jun 15:08
18f1933

Patch Changes

  • Updated dependencies [40787be]
    • @cycgraph/memory@0.1.0-beta.6

@cycgraph/memory@0.1.0-beta.6

12 Jun 15:08
18f1933

Pre-release

Minor Changes

  • 40787be: Statistically honest retention gate + validation simulator. The eval-gating gate's new default decision_rule: 'inference' replaces the point-estimate margin comparison with a Welch-style test on the lift vs the leave-one-out baseline (Student-t, Welch–Satterthwaite df), Benjamini–Hochberg FDR control across candidates per pass, and alpha-spending over doubling baseline brackets so repeated gating cannot inflate false positives (the peeking problem — measured at 25% false decisions before this control, 0–2% after). The legacy behavior remains one flag away (decision_rule: 'margin').

    New RetentionPolicy fields: promote_confidence / evict_confidence (default 0.9), noise_floor_sd, multiple_comparison ('bh'|'none'), sequential_control ('doubling'|'none'), and max_baseline_runs — the baseline-side stopping rule that retires candidates the bracket penalty has made undecidable (required alongside rest_after_trials, which freezes trials so max_trials alone can never fire). RetentionReport entries now carry an evidence object (lift, se, df, p_promote, p_evict, trials, baseline_runs, alpha_bracket); promoted entries changed from string[] to { fact_id, evidence? }[]. FactStats/OutcomeBaseline gain variance (breaking for third-party OutcomeLedger implementers). retrieveGatedLessons gains rest_after_trials — candidates bench after enough trials, freeing slots and creating the absence runs their baseline needs.

    New validation module: simulateGate() and gateOperatingCharacteristics() drive the real store/ledger/retriever/gate pipeline with synthetic lessons of known effect — deterministic, sub-second — so any policy's detection and false-positive rates can be measured before trusting it (runnable example: packages/evals/examples/gate-operating-characteristics/). New dependency-free statistics utilities exported: studentTCdf, welchLift, benjaminiHochberg, normalQuantile, requiredTrials, mulberry32, gaussian.

@cycgraph/orchestrator@0.1.0-beta.8

11 Jun 21:25
50c25ec

Pre-release

Minor Changes

  • 8f211cc: Eval-gated learning ("verified lessons"): lessons are now retained only if runs that used them verifiably score better.

    @cycgraph/orchestrator — lesson provenance. Retrieved memory facts can carry an id (MemoryRetrievalResult.facts[].id, optional and non-breaking). When present, the runner records which facts were injected into each node's prompt in an append-only memory._lesson_provenance registry (same replay-safe pattern as the taint registry; invisible to node StateViews). Voting and evolution forward provenance from every sub-agent — losing candidates count as trials too. New exports: getInjectedFactIds(state), getLessonProvenance(state), getLessonProvenanceRegistry(memory), plus the LessonProvenanceEntry / LessonProvenanceRegistry types. Known v1 limitation: supervisor-node retrieval is not provenance-tracked.

    @cycgraph/memory — outcome ledger, retention gate, gated retrieval. New OutcomeLedger interface + InMemoryOutcomeLedger (recordOutcome({ run_id, score, fact_ids }), per-fact trial stats, leave-one-out baselines). New evaluateRetention(store, ledger, policy) promotes candidate-tagged lessons that lift outcomes past promote_margin (tag rewritten to verified), soft-evicts harmful ones (invalidated_by: 'eval-gate:harmful'), and retires no-lift candidates at max_trials — including ones deadlocked on an empty leave-one-out baseline. New retrieveGatedLessons(store, options) fills the prompt budget verified-first with candidate exploration slots, selected in-progress-first via the ledger, with a rest_after_trials bench phase so fully-trialled candidates create the absence runs their baseline needs.

    Runnable adversarial demo at packages/evals/examples/eval-gated-learning/: three deliberately poisoned lessons crater a run and the gate evicts all three on outcome evidence alone, two runs after injection.
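
To make the leave-one-out baseline concrete, here is a minimal sketch. It assumes the baseline for a fact is the mean score of the runs that did not inject it, which is how the "absence runs" wording above reads. The record shape mirrors the `recordOutcome({ run_id, score, fact_ids })` call named in this entry; `liftForFact` and `LiftEstimate` are hypothetical names and this is not the `InMemoryOutcomeLedger` implementation.

```typescript
// Hypothetical mini-ledger for illustration only.
interface Outcome {
  run_id: string;
  score: number;
  fact_ids: string[];
}

interface LiftEstimate {
  trials: number;        // runs that injected the fact
  baseline_runs: number; // runs that did not inject it
  lift: number | null;   // null when either side has no runs
}

const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;

function liftForFact(outcomes: Outcome[], factId: string): LiftEstimate {
  const withFact: number[] = [];
  const withoutFact: number[] = [];
  for (const o of outcomes) {
    (o.fact_ids.includes(factId) ? withFact : withoutFact).push(o.score);
  }
  // A fact that appears in every run has an empty baseline: no lift can be
  // estimated until some run leaves it out.
  const lift =
    withFact.length > 0 && withoutFact.length > 0 ? mean(withFact) - mean(withoutFact) : null;
  return { trials: withFact.length, baseline_runs: withoutFact.length, lift };
}

const outcomes: Outcome[] = [
  { run_id: "r1", score: 0.9, fact_ids: ["a", "b"] },
  { run_id: "r2", score: 0.5, fact_ids: ["b"] },
  { run_id: "r3", score: 0.7, fact_ids: ["a", "b"] },
];

const liftA = liftForFact(outcomes, "a"); // two trials, one baseline run
const liftB = liftForFact(outcomes, "b"); // injected everywhere: empty baseline
```

Fact `b` shows the deadlock described above: it was injected into every run, so its baseline is empty and its lift stays undecidable until a bench phase such as `rest_after_trials` produces runs without it.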

Patch Changes

  • 8f211cc: Migrate off Anthropic model IDs retiring 2026-06-15. DEFAULT_AGENT_MODEL is now claude-sonnet-4-6 (was claude-sonnet-4-20250514); ANTHROPIC_MODELS gains claude-opus-4-8 and claude-sonnet-4-6 while keeping the deprecated IDs so existing persisted agent configs still validate; the pricing table gains claude-opus-4-8 ($5/$25 per MTok) and keeps historical entries so cost replay of old runs stays correct. All examples, docs, and test fixtures updated to the new IDs.
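
As a rough illustration of why historical pricing entries matter for cost replay, the sketch below prices a call from a per-model table. The table shape, `PRICING_SKETCH`, and `costUsd` are hypothetical; only the claude-opus-4-8 rates come from this entry, and reading "$5/$25" as input/output per million tokens is an assumption.

```typescript
// Hypothetical pricing-table shape; not the package's actual table.
interface ModelPrice {
  inputPerMTok: number;  // USD per million input tokens (assumed to be the $5 figure)
  outputPerMTok: number; // USD per million output tokens (assumed to be the $25 figure)
}

const PRICING_SKETCH: Record<string, ModelPrice> = {
  "claude-opus-4-8": { inputPerMTok: 5, outputPerMTok: 25 },
};

function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICING_SKETCH[model];
  // Without an entry, an old run cannot be re-priced at all.
  if (!price) throw new Error(`no pricing entry for ${model}`);
  return (inputTokens / 1e6) * price.inputPerMTok + (outputTokens / 1e6) * price.outputPerMTok;
}

// 200k input tokens and 40k output tokens: 0.2 * 5 + 0.04 * 25 USD.
const exampleCost = costUsd("claude-opus-4-8", 200_000, 40_000);
```

A lookup that throws on a missing model is one way to see the point of keeping deprecated IDs in the table: dropping an entry would make every persisted run that used it impossible to replay.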

@cycgraph/orchestrator@0.1.0-beta.7

11 Jun 13:29
ffe231e

Pre-release

Patch Changes

  • 5617568: Upgrade OpenTelemetry to the current line and drop the protobufjs override (resolves a moderate advisory).

    The OTLP-HTTP and Prometheus exporters were on 0.217.0 and pulled protobufjs@8.0.x transitively (via @opentelemetry/otlp-transformer). A repo-wide protobufjs: ">=8.0.1" override pinned it to 8.0.3 — which is inside the vulnerable range of GHSA-jggg-4jg4-v7c6 (DoS via recursive JSON descriptor expansion, >=8.0.0 <8.2.0).

    @opentelemetry/otlp-transformer@0.219.0 no longer depends on protobufjs at all, so bumping the exporters removes that dependency edge. The only remaining protobufjs is 7.6.3 via @grpc/proto-loader (the gRPC log exporter bundled in sdk-node), which is outside the advisory range. With the override removed, npm audit --omit=dev reports 0 vulnerabilities.

    Bumped: @opentelemetry/exporter-prometheus, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/sdk-node ^0.217.0 → ^0.219.0; @opentelemetry/resources, @opentelemetry/sdk-metrics ^2.7.1 → ^2.8.0. Removed the protobufjs entry from the root overrides.
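
The range reasoning in this entry can be checked mechanically. The helper below is a dependency-free sketch with hypothetical names (`parseVersion`, `inAdvisoryRange`); it handles plain `major.minor.patch` strings only and ignores pre-release tags, so it is not a substitute for a real semver library.

```typescript
type Version = [number, number, number];

// Parses "major.minor.patch"; pre-release and build suffixes are not supported.
function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version string: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Advisory range quoted in the entry above: >=8.0.0 <8.2.0.
function inAdvisoryRange(v: string): boolean {
  const ver = parseVersion(v);
  return compareVersions(ver, [8, 0, 0]) >= 0 && compareVersions(ver, [8, 2, 0]) < 0;
}

const overridePinned = inAdvisoryRange("8.0.3"); // the version the override resolved to
const grpcLoader = inAdvisoryRange("7.6.3");     // the remaining copy via @grpc/proto-loader
```

This mirrors the argument in the text: the `>=8.0.1` override resolved to a version inside the range, while the remaining 7.x copy falls below it.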

@cycgraph/orchestrator@0.1.0-beta.6

11 Jun 01:06
576d680

Pre-release

Minor Changes

  • 027be81: Fix: evolution_config.elite_count is now actually implemented (it was a no-op).

    The schema and validator advertised elite_count and rejected elite_count >= population_size, but the executor never used it — every generation was bred entirely from scratch, so the per-generation best fitness could dip when a noisy generation produced worse candidates than the last.

    Elitism now works as documented: the top elite_count candidates of each generation are carried forward unchanged into the next generation's pool — not re-generated and not re-scored. Two consequences:

    • Monotonic fitness. The best-so-far re-enters every subsequent pool, so the next generation's best is always ≥ the current one. ${node}_fitness_history never dips. (Set elite_count: 0 to opt out and restore the old all-fresh behavior.)
    • Fewer LLM calls. A carried elite occupies a population slot without a generation or evaluation call, so each generation after the first issues population_size - elite_count candidate calls instead of population_size.

    elite_count defaults to 1, so this changes default evolution behavior. The carried candidate is tagged is_elite: true in the ${node}_population summary. elite_count is internally clamped to population_size - 1 so at least one fresh candidate is always generated.
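
The carry-forward rule described above can be sketched as follows. This is an illustration, not the executor's code: `nextGeneration`, the `Candidate` shape, and the `breed` callback (standing in for one generation plus evaluation call) are hypothetical. Only `elite_count`, `population_size`, the `is_elite` tag, and the clamp to `population_size - 1` come from this entry.

```typescript
interface Candidate {
  id: string;
  fitness: number;
  is_elite?: boolean;
}

function nextGeneration(
  previous: Candidate[],
  populationSize: number,
  eliteCount: number,
  breed: (slot: number) => Candidate, // hypothetical: one generation + evaluation call
): Candidate[] {
  // Clamp so at least one fresh candidate is always generated.
  const clamped = Math.max(0, Math.min(eliteCount, populationSize - 1));
  // Carry the top candidates forward unchanged: no re-generation, no re-scoring.
  const elites = [...previous]
    .sort((a, b) => b.fitness - a.fitness)
    .slice(0, clamped)
    .map((c) => ({ ...c, is_elite: true }));
  const fresh: Candidate[] = [];
  const freshSlots = populationSize - elites.length;
  for (let slot = 0; slot < freshSlots; slot++) fresh.push(breed(slot));
  return [...elites, ...fresh];
}

// Usage: a deliberately bad generation cannot lower the best fitness.
let breedCalls = 0;
const previousGen: Candidate[] = [
  { id: "a", fitness: 0.4 },
  { id: "b", fitness: 0.9 },
  { id: "c", fitness: 0.6 },
];
const nextGen = nextGeneration(previousGen, 3, 1, (slot) => {
  breedCalls++;
  return { id: `fresh-${slot}`, fitness: 0.1 };
});
const bestNext = Math.max(...nextGen.map((c) => c.fitness));
```

With `population_size` 3 and `elite_count` 1, the callback runs twice rather than three times, and the best fitness in the new pool stays at the carried elite's 0.9 even though every fresh candidate scored 0.1. On the first generation `previous` is empty, so no elites are carried and every slot is bred.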

@cycgraph/orchestrator-postgres@1.0.0-beta.8

11 Jun 21:25
50c25ec

Patch Changes

  • Updated dependencies [8f211cc]
    • @cycgraph/orchestrator@0.1.0-beta.8
    • @cycgraph/memory@0.1.0-beta.5