| status | implemented |
|---|---|
improve is the board-level reasoning and stabilization lane.
It is responsible for:
- interpreting blocked work
- detecting recurring failure patterns
- creating the right bounded follow-up tasks
- routing unclear cases to human attention
- scanning for merge conflicts, stale PRs, and review requests
- detecting post-merge CI regressions
Its default rule is:
tasks first, code second
Priority order:
- blocked tasks
- recent worker comments/results
- retained summaries for recent failures
- explicit `task-kind: improve` tasks
- open PR state (merge conflicts, stale PRs, review feedback)
Blocked tasks are classified into a small fixed set:
- `infra_tooling`
- `validation_failure`
- `flaky_test`
- `scope_policy`
- `parse_config`
- `verification_failure`
- `context_limit`
- `dependency_missing`
- `awaiting_input`
- `unknown`
Additionally, `pre_exec_rejected` is used by the goal watcher when a task fails pre-execution validation before any Kodo run begins.
`awaiting_input` is set when Kodo embeds a `<!-- cp:question: ... -->` HTML comment in its output, indicating it stopped to ask a clarifying question rather than proceeding with an assumption. The improve watcher scans for human replies every 8 cycles and re-queues the task automatically once a reply is detected.
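A minimal sketch of how the question marker could be detected in Kodo output. The function name `extract_question` and the exact whitespace tolerance are assumptions for illustration; only the `<!-- cp:question: ... -->` marker itself comes from the text above.

```python
import re
from typing import Optional

# Matches the marker described above. DOTALL lets the question span several lines.
QUESTION_MARKER = re.compile(r"<!--\s*cp:question:\s*(.*?)\s*-->", re.DOTALL)


def extract_question(output: str) -> Optional[str]:
    """Return the embedded clarifying question, or None if no marker is present.

    Hypothetical helper: the real watcher may parse the marker differently.
    """
    match = QUESTION_MARKER.search(output)
    return match.group(1) if match else None
```

A task whose output yields a non-`None` question would be classified as `awaiting_input` under this sketch.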
This vocabulary is shared across board comments, watcher logs, and retained summaries.
Improve uses a small internal triage result with:
- `classification`
- `certainty`
- `reason_summary`
- `recommended_action`
- `human_attention_required`
- optional bounded follow-up task spec
This keeps improve behavior explainable and testable.
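The triage fields and the fixed classification vocabulary could be modeled roughly as follows. This is a sketch, not the actual implementation: the class names, the field types (for example `certainty` as a float), and the fallback to `unknown` for unrecognized strings are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Classification(str, Enum):
    """The fixed blocked-task vocabulary listed above."""
    INFRA_TOOLING = "infra_tooling"
    VALIDATION_FAILURE = "validation_failure"
    FLAKY_TEST = "flaky_test"
    SCOPE_POLICY = "scope_policy"
    PARSE_CONFIG = "parse_config"
    VERIFICATION_FAILURE = "verification_failure"
    CONTEXT_LIMIT = "context_limit"
    DEPENDENCY_MISSING = "dependency_missing"
    AWAITING_INPUT = "awaiting_input"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class TriageResult:
    classification: Classification
    certainty: float  # assumed 0.0-1.0 score; the source does not state the type
    reason_summary: str
    recommended_action: str
    human_attention_required: bool
    follow_up: Optional[dict] = None  # optional bounded follow-up task spec


def parse_classification(raw: str) -> Classification:
    """Map a raw string onto the vocabulary, falling back to UNKNOWN (assumed behavior)."""
    try:
        return Classification(raw)
    except ValueError:
        return Classification.UNKNOWN
```

Keeping the vocabulary in one enum makes it straightforward to reuse the same strings in board comments, watcher logs, and retained summaries.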
- add a triage comment to a blocked task
- create one bounded follow-up `goal` task
- create one bounded follow-up `test` task
- create one explicit `improve` task when more analysis is warranted
- leave a task marked for human attention
- create `[Rebase]` tasks for PRs with merge conflicts
- create `[Revise]` tasks for PRs with `CHANGES_REQUESTED` reviews
- close stale PRs and requeue originating tasks to Backlog
- create regression tasks for post-merge CI failures
All scans run on the primary slot only when parallel_slots > 1.
| Scan | Frequency | What it does |
|---|---|---|
| Review revision | every 3 cycles | Detects CHANGES_REQUESTED reviews on open PRs; creates [Revise] tasks |
| Merge conflict | every 5 cycles | Checks mergeable==false; attempts rebase; creates [Rebase] task on failure |
| Post-merge regression | every 10 cycles | Checks merged PR CI status; creates regression tasks for failures |
| Feedback loop scan | every 15 cycles | Checks Done tasks' PR state on GitHub; auto-records merged/closed outcomes to state/proposal_feedback/; captures human rejections from Cancelled autonomy tasks (see below) |
| Stale PR TTL | every 20 cycles | Closes PRs older than stale_pr_days (default 7); requeues to Backlog |
| Workspace health | every 25 cycles | Verifies venv python per repo; attempts bootstrap repair on failure; creates [Workspace] task if repair fails |
| Stale autonomy scan | every 30 cycles | Cancels autonomy-proposed Backlog tasks older than 21 days whose signal is stale |
| Awaiting-input scan | every 8 cycles | Finds blocked awaiting_input tasks, checks for human reply, injects answer and re-queues |
| Priority rescore scan | every 45 cycles | Demotes backlog autonomy tasks with calibration acceptance <40% (adds signal_stale label); promotes those >75% to priority: high |
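One way to read the schedule above is as a modulo check on a cycle counter. The sketch below assumes exactly that; the scan keys, the `scans_due` function, and the treatment of non-primary slots as running no scans are illustrative assumptions rather than the watcher's actual code.

```python
from typing import List

# Frequencies copied from the table above; the key names are hypothetical.
SCAN_INTERVALS = {
    "review_revision": 3,
    "merge_conflict": 5,
    "awaiting_input": 8,
    "post_merge_regression": 10,
    "feedback_loop": 15,
    "stale_pr_ttl": 20,
    "workspace_health": 25,
    "stale_autonomy": 30,
    "priority_rescore": 45,
}


def scans_due(cycle: int, is_primary_slot: bool = True) -> List[str]:
    """Return the scans that fire on this cycle, assuming a simple modulo schedule."""
    if not is_primary_slot or cycle <= 0:
        return []  # assumed: secondary slots skip every scan
    return [name for name, every in SCAN_INTERVALS.items() if cycle % every == 0]
```

Under this reading, cycle 15 runs the review revision, merge conflict, and feedback loop scans together.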
Every 30 improve cycles, `handle_stale_autonomy_task_scan` scans Backlog tasks with `source: autonomy` that are older than 21 days. It cancels them with a `<!-- cp:stale-autonomy-scan -->` marker comment. If the underlying signal reappears (the proposer still finds the same issue in a newer snapshot), the task will be recreated fresh.
The marker prevents the feedback loop scan from treating these system-cancelled tasks as human rejections.
The feedback loop scan (Part B, every 15 cycles) also scans Cancelled tasks with source: autonomy. If a cancelled task does NOT carry the stale-autonomy-scan marker, it is treated as a human rejection:
- An `abandoned` feedback record is written to `state/proposal_feedback/<id>.json`.
- The task's `candidate_dedup_key` is registered in `ProposalRejectionStore` (`state/proposal_rejections.json`) for permanent suppression.
This "no" signal is indefinite — the proposer will never recreate the same candidate unless the rejection record is manually removed.
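The distinction between a system cancellation and a human rejection can be sketched as a marker check over a task's comments. The function signature and the plain-string comment representation are assumptions; the marker text and the `source: autonomy` condition come from the description above.

```python
from typing import Iterable

STALE_SCAN_MARKER = "<!-- cp:stale-autonomy-scan -->"


def is_human_rejection(source: str, state: str, comments: Iterable[str]) -> bool:
    """True when a cancelled autonomy task carries no stale-scan marker.

    Hypothetical helper: `comments` is assumed to be the raw comment bodies.
    """
    if state != "Cancelled" or source != "autonomy":
        return False
    return not any(STALE_SCAN_MARKER in body for body in comments)
```

Only tasks for which this returns `True` would lead to an `abandoned` record and a suppression entry.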
When a task fails with context_limit, the follow-up task includes a prior_progress: block extracted from the previous execution's summary. This lets Kodo continue from where it stopped rather than restarting from scratch.
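As an illustration only, a follow-up description carrying a `prior_progress:` block might be assembled like this. The source does not specify the block's layout, so the indentation scheme and the function name `build_follow_up_description` are assumptions.

```python
def build_follow_up_description(goal: str, previous_summary: str) -> str:
    """Append the previous run's summary as an indented prior_progress: block.

    The exact format used by the real watcher is not documented here;
    this layout is a guess for illustration.
    """
    indented = "\n".join(
        f"  {line}" for line in previous_summary.strip().splitlines()
    )
    return f"{goal.strip()}\n\nprior_progress:\n{indented}\n"
```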
The improve watcher tracks per-command validation outcomes. When a command has failed ≥30% of its last 10 runs, it is classified as flaky_test rather than validation_failure. The follow-up task targets stabilizing the test rather than retrying the original goal.
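The flaky-test rule can be sketched with a fixed-size window per command. The class name and the choice to require a full window of 10 runs before classifying as flaky are assumptions; the 30% threshold and the window size come from the paragraph above.

```python
from collections import deque
from typing import Deque, Dict

WINDOW = 10            # last 10 runs, per the text above
FLAKY_PERCENT = 30     # failed >= 30% of those runs


class CommandHistory:
    """Tracks pass/fail outcomes per validation command (hypothetical helper)."""

    def __init__(self) -> None:
        self._runs: Dict[str, Deque[bool]] = {}

    def record(self, command: str, passed: bool) -> None:
        # deque(maxlen=WINDOW) silently drops the oldest outcome when full.
        self._runs.setdefault(command, deque(maxlen=WINDOW)).append(passed)

    def classify_failure(self, command: str) -> str:
        runs = self._runs.get(command)
        # Assumption: with fewer than WINDOW runs there is not enough history.
        if not runs or len(runs) < WINDOW:
            return "validation_failure"
        failures = sum(1 for passed in runs if not passed)
        # Integer comparison avoids floating-point edge cases at exactly 30%.
        if failures * 100 >= FLAKY_PERCENT * len(runs):
            return "flaky_test"
        return "validation_failure"
```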
Every 15 improve cycles, `handle_feedback_loop_scan` checks Done tasks that have a
`pull_request_url` in their artifact but no `state/proposal_feedback/<id>.json` file.
It fetches the GitHub PR state and writes the feedback record automatically. This closes
the learning loop for PRs merged or closed by humans outside the review watcher.
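The selection step of this scan could look roughly like the following. The task dictionary shape (`id`, `state`, `artifact`) and the helper name are assumptions; fetching the PR state from GitHub is deliberately left out.

```python
from pathlib import Path


def needs_feedback_record(task: dict, feedback_dir: Path) -> bool:
    """True for a Done task with a PR URL but no feedback file yet.

    Hypothetical sketch: `task` is assumed to be a plain dict with
    "id", "state", and "artifact" keys.
    """
    if task.get("state") != "Done":
        return False
    artifact = task.get("artifact") or {}
    if not artifact.get("pull_request_url"):
        return False
    return not (feedback_dir / f"{task['id']}.json").exists()
```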
Every 25 improve cycles, `handle_workspace_health_check` runs a quick sanity check
(`python -c "import sys; sys.exit(0)"`) inside the venv of each repo with a `local_path`
configured. On failure it attempts `RepoEnvironmentBootstrapper.prepare()`. If bootstrap
also fails, a high-priority `[Workspace] Repair environment for <repo>` goal task is
created. This prevents a broken venv from causing every subsequent task to fail silently
with `dependency_missing` or `infra_tooling`.
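The check-repair-escalate flow can be sketched with the standard library. Here `repair` and `create_task` are hypothetical callables standing in for the bootstrap step and the board call, and the assumption that `repair` returns a boolean is not confirmed by the source.

```python
import subprocess
from typing import Callable


def venv_python_ok(python_path: str, timeout_s: float = 30.0) -> bool:
    """Run the same sanity command as above against a given interpreter."""
    try:
        result = subprocess.run(
            [python_path, "-c", "import sys; sys.exit(0)"],
            capture_output=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing or unrunnable interpreter counts as unhealthy.
        return False
    return result.returncode == 0


def check_workspace(
    python_path: str,
    repair: Callable[[], bool],       # stand-in for the bootstrap step
    create_task: Callable[[], None],  # stand-in for creating the [Workspace] task
) -> str:
    if venv_python_ok(python_path):
        return "healthy"
    if repair() and venv_python_ok(python_path):
        return "repaired"
    create_task()
    return "task_created"
```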
The stale-running reconciler uses per-kind timeouts rather than a single global TTL. When a task has been in Running state longer than its TTL, it is moved to Blocked for triage.
| Task kind | TTL |
|---|---|
| `goal` | 120 minutes |
| `test` | 45 minutes |
| `test_campaign` | 45 minutes |
| `improve` | 30 minutes |
| `improve_campaign` | 30 minutes |
| `fix_pr` | 45 minutes |
| (other / unknown) | 90 minutes |
TTL evaluation uses `issue.updated_at`. A task that was updated recently (e.g. still receiving comments from an active run) is left alone even if its wall-clock age exceeds the TTL. This prevents false-positive stale detection on long-running but active executions.
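A small sketch of the per-kind TTL check, measured from the last update rather than from task creation. The function name and the use of timezone-aware `datetime` values are assumptions; the TTL values mirror the table above.

```python
from datetime import datetime, timedelta, timezone

# TTLs in minutes, copied from the table above.
TTL_MINUTES = {
    "goal": 120,
    "test": 45,
    "test_campaign": 45,
    "improve": 30,
    "improve_campaign": 30,
    "fix_pr": 45,
}
DEFAULT_TTL_MINUTES = 90  # other / unknown kinds


def is_stale_running(task_kind: str, updated_at: datetime, now: datetime) -> bool:
    """True when a Running task has not been updated within its kind's TTL."""
    ttl = timedelta(minutes=TTL_MINUTES.get(task_kind, DEFAULT_TTL_MINUTES))
    return now - updated_at > ttl
```

Because the comparison uses the last update time, a task that keeps receiving comments never crosses the threshold in this sketch.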
When the escalation webhook fires (`should_escalate` is true, meaning ≥5 blocked events for the same classification in 24 hours), the improve watcher also creates a dedicated `[Systemic] Investigate recurring <classification> failures` task on the board with:
- `task-kind: improve`
- `source: improve`
- `urgency: high`
This ensures systemic issues produce actionable board work, not just a webhook notification. The task is deduplicated: if a [Systemic] task with the same title already exists on the board, no duplicate is created.
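Title-based deduplication of the systemic task might be sketched as follows. The function names and the dictionary used as a task spec are hypothetical; the title template and the three field values come from the text above.

```python
from typing import Iterable, Optional


def systemic_task_title(classification: str) -> str:
    return f"[Systemic] Investigate recurring {classification} failures"


def plan_systemic_task(
    classification: str, existing_titles: Iterable[str]
) -> Optional[dict]:
    """Return a task spec, or None when a task with the same title already exists."""
    title = systemic_task_title(classification)
    if title in set(existing_titles):
        return None
    # The spec shape is an assumption; only the values are from the source.
    return {
        "title": title,
        "task-kind": "improve",
        "source": "improve",
        "urgency": "high",
    }
```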
When the same classification appears in ≥5 blocked-triage events within 24 hours, an HTTP POST is sent to escalation.webhook_url (if configured). A cooldown prevents repeated POSTs for the same classification. This is a signal to the operator that a systemic issue exists.
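The threshold-plus-cooldown rule could be tracked in memory along these lines. The class name is hypothetical and the 24-hour cooldown default is a placeholder, since the source does not state the cooldown length; sending the actual HTTP POST is omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


class EscalationTracker:
    """Decides when a classification has recurred often enough to escalate."""

    def __init__(
        self,
        threshold: int = 5,
        window: timedelta = timedelta(hours=24),
        cooldown: timedelta = timedelta(hours=24),  # assumed; not given in the source
    ) -> None:
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self._events: Dict[str, List[datetime]] = {}
        self._last_fired: Dict[str, datetime] = {}

    def record(self, classification: str, at: datetime) -> bool:
        """Record one blocked-triage event; return True if escalation should fire."""
        recent = [t for t in self._events.get(classification, []) if at - t < self.window]
        recent.append(at)
        self._events[classification] = recent
        if len(recent) < self.threshold:
            return False
        last = self._last_fired.get(classification)
        if last is not None and at - last < self.cooldown:
            return False  # still cooling down for this classification
        self._last_fired[classification] = at
        return True
```

A caller would send the POST to `escalation.webhook_url` only when `record` returns `True` and a URL is configured.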
- recursive unblock storms
- vague task spray
- many similar children for the same failure pattern
- broad unsolicited repo rewrites
- duplicate avoidance for the same source task + handoff reason
- cap on follow-up tasks per improve cycle
- repeated-pattern heuristic that prefers one system-fix task over many scattered children
- no recursive unblock task generation for improve-generated failures
- stale-PR scan only closes PRs whose comment history confirms no prior close attempt
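Two of the guardrails above, duplicate avoidance per source task and handoff reason and the per-cycle cap, can be combined in a single gate. This is a sketch under assumptions: the class name is hypothetical, and the cap value is left as a required parameter because the source does not state it.

```python
from typing import Set, Tuple


class FollowUpGate:
    """Allows a follow-up only if it is new and the cycle cap is not reached."""

    def __init__(self, max_per_cycle: int) -> None:
        self.max_per_cycle = max_per_cycle
        self._seen: Set[Tuple[str, str]] = set()
        self._created_this_cycle = 0

    def start_cycle(self) -> None:
        # The dedup memory persists across cycles; only the counter resets.
        self._created_this_cycle = 0

    def allow(self, source_task_id: str, handoff_reason: str) -> bool:
        key = (source_task_id, handoff_reason)
        if key in self._seen:
            return False
        if self._created_this_cycle >= self.max_per_cycle:
            return False
        self._seen.add(key)
        self._created_this_cycle += 1
        return True
```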
Improve should leave tasks visibly human-facing when the next action cannot be automated confidently, especially for:
- auth or secret issues
- local environment/tooling setup failures
- unclear Plane permission problems
- repeated `unknown` failures without a stable classification
In these cases, improve comments make clear:
- the classification
- the reason summary
- whether a follow-up task was created
- whether human attention is required
- `propose` seeds new bounded board work when the system is otherwise quiet
- `goal` implements
- `test` verifies
- `improve` interprets, stabilizes, and generates next work
This keeps the board readable and avoids mirroring every internal Kodo sub-role on the board.