Unify skill catalog database source #517

Merged: kaizhou-lab merged 147 commits into main from fix/skill-management-db on Jun 25, 2026

Conversation

@kaizhou-lab

Contributor

Summary

  • Store user, builtin, extension, and cron skills in the unified skill catalog table.
  • Make /api/skills read from the database-backed catalog and remove the separate builtin-auto list path.
  • Update conversation and agent skill loading to consume the unified catalog source.
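As a rough illustration of what a single catalog source could look like, here is a minimal in-memory sketch. Everything in it is hypothetical: `SkillSource`, `SkillRow`, `SkillCatalog`, the `(name, source)` key, and the `enabled` flag are illustrative names that do not come from this PR, and the real catalog is a database table rather than a `Vec`.

```rust
// Hypothetical model of a unified skill catalog. All names are illustrative
// assumptions; the PR's actual schema and types are not shown on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkillSource {
    User,
    Builtin,
    Extension,
    Cron,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SkillRow {
    name: String,
    source: SkillSource,
    enabled: bool,
}

/// One table for every kind of skill, instead of a separate list per kind.
struct SkillCatalog {
    rows: Vec<SkillRow>,
}

impl SkillCatalog {
    fn new() -> Self {
        Self { rows: Vec::new() }
    }

    /// Insert or replace a row. Keying on (name, source) is an assumption.
    fn upsert(&mut self, row: SkillRow) {
        let existing = self
            .rows
            .iter_mut()
            .find(|r| r.name == row.name && r.source == row.source);
        match existing {
            Some(slot) => *slot = row,
            None => self.rows.push(row),
        }
    }

    /// What a catalog-backed listing endpoint would read: every row, any source.
    fn list(&self) -> Vec<&SkillRow> {
        self.rows.iter().collect()
    }

    /// What conversation or agent loading might consume: enabled skills only.
    fn enabled_names(&self) -> Vec<&str> {
        self.rows
            .iter()
            .filter(|r| r.enabled)
            .map(|r| r.name.as_str())
            .collect()
    }
}

/// Small fixture with one skill per source (made-up skill names).
fn sample_catalog() -> SkillCatalog {
    let mut catalog = SkillCatalog::new();
    let fixtures = [
        ("notes", SkillSource::User, true),
        ("search", SkillSource::Builtin, true),
        ("lint", SkillSource::Extension, false),
        ("digest", SkillSource::Cron, true),
    ];
    for (name, source, enabled) in fixtures {
        catalog.upsert(SkillRow { name: name.to_string(), source, enabled });
    }
    catalog
}

fn main() {
    let catalog = sample_catalog();
    for row in catalog.list() {
        println!("{:?} {} enabled={}", row.source, row.name, row.enabled);
    }
}
```

The point of the sketch is only that listing and loading read from the same rows, so there is no separate builtin list to keep in sync.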

Test Plan

  • just push -u origin fix/skill-management-db

zk added 30 commits June 15, 2026 17:57
The availability scheduler runs `try_connect_custom_agent` every 5
minutes for every agent, spawning a CLI subprocess and tearing it
down once the ACP handshake completes (or fails). For wrapper CLIs
that fork a long-lived grandchild — `npm exec openclaw --acp` is the
production case — cleanup was leaking the grandchild because:

* `kill_on_drop(true)` on the tokio Command only signals the direct
  child (the npm exec wrapper), not its grandchild.
* The probe relied on `drop(protocol)` for the success path and on
  no explicit cleanup for the handshake-fail path, so `proc.kill`
  was never called.
* `CliAgentProcess::kill` itself short-circuited and returned Ok the
  moment the leader exited within the grace period — so even when
  callers did invoke it, no group-wide SIGKILL was sent.

Result: dozens of zombie `openclaw-acp` processes accumulated per
day under the 5-minute scheduler.

Fix:

1. `CliAgentProcess::kill` now always issues a group-wide SIGKILL
   after the grace period, even when the leader has already exited.
   `force_kill` already maps ESRCH to success, so the sweep is
   idempotent for already-reaped trees.
2. `try_connect_custom_agent` calls `proc.kill` on every outcome
   (success, ACP failure, handshake timeout) by hoisting the spawn
   out of the inner future and running cleanup unconditionally
   after the timeout race resolves.
3. New regression test `probe_kills_grandchild_left_behind_by_wrapper`
   exercises the exact wrapper-grandchild shape from production and
   asserts the grandchild is reaped before the probe returns.
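The two behavioural changes in the fix above can be modelled without real processes. The following is a simplified sketch, not the actual `CliAgentProcess` code: the `ProcessGroup` trait, `FakeGroup`, `SignalError`, `kill_after_grace`, and `probe` are hypothetical stand-ins, and the real implementation is async and sends real signals.

```rust
use std::cell::Cell;

// Hypothetical error type: `NoSuchProcess` plays the role of ESRCH.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SignalError {
    NoSuchProcess,
    PermissionDenied,
}

// Hypothetical abstraction over a spawned process group.
trait ProcessGroup {
    fn leader_exited(&self) -> bool;
    /// Send SIGKILL to every process in the group.
    fn sigkill_group(&self) -> Result<(), SignalError>;
}

/// Group-wide kill that treats "nothing left to kill" as success,
/// which makes repeated sweeps idempotent.
fn force_kill(group: &impl ProcessGroup) -> Result<(), SignalError> {
    match group.sigkill_group() {
        Err(SignalError::NoSuchProcess) => Ok(()),
        other => other,
    }
}

/// After the grace period, always sweep the whole group.
/// The leaky version returned Ok here as soon as the leader had exited.
fn kill_after_grace(group: &impl ProcessGroup) -> Result<(), SignalError> {
    let _leader_exited = group.leader_exited(); // no longer short-circuits
    force_kill(group)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbeOutcome {
    Success,
    AcpFailure,
    HandshakeTimeout,
}

/// The spawn is hoisted out of the handshake, so cleanup runs on every outcome.
fn probe(group: &impl ProcessGroup, handshake: impl FnOnce() -> ProbeOutcome) -> ProbeOutcome {
    let outcome = handshake();
    let _ = kill_after_grace(group); // unconditional cleanup
    outcome
}

/// Fake group: the wrapper (leader) has already exited, the grandchild has not.
struct FakeGroup {
    leader_exited: bool,
    grandchild_alive: Cell<bool>,
    sweeps: Cell<u32>,
}

impl ProcessGroup for FakeGroup {
    fn leader_exited(&self) -> bool {
        self.leader_exited
    }

    fn sigkill_group(&self) -> Result<(), SignalError> {
        self.sweeps.set(self.sweeps.get() + 1);
        if self.grandchild_alive.replace(false) {
            Ok(())
        } else {
            Err(SignalError::NoSuchProcess)
        }
    }
}

fn wrapper_with_grandchild() -> FakeGroup {
    FakeGroup { leader_exited: true, grandchild_alive: Cell::new(true), sweeps: Cell::new(0) }
}

fn main() {
    let outcomes = [ProbeOutcome::Success, ProbeOutcome::AcpFailure, ProbeOutcome::HandshakeTimeout];
    for outcome in outcomes {
        let group = wrapper_with_grandchild();
        let got = probe(&group, || outcome);
        println!("{:?}: grandchild alive = {}", got, group.grandchild_alive.get());
    }
}
```

In this model the grandchild is gone after every probe outcome, and a second `kill_after_grace` on an already-swept group still returns `Ok(())`.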
zk added 25 commits June 22, 2026 15:24
…iders

Deepen the agent health probe from `initialize` to `session/new` so it
reflects real usability, not just protocol reachability. `initialize`
returns authMethods even for authorized agents, so it cannot distinguish
a signed-in agent from one that is "reachable but not signed in".

- custom_agent_probe: after `initialize`, open a throwaway `session/new`
  (no prompt); classify the outcome as Ok / Auth (ACP auth_required,
  JSON-RPC -32000) / Fail. Applies to both the custom and builtin-managed
  probe paths.
- api-types: add `TryConnectCustomAgentResponse::FailAuth` (tag `fail_auth`).
- availability: map FailAuth → offline + `auth_required` code; gate aionrs
  (built-in agent, no external CLI) availability on having at least one
  enabled model provider, mirroring AssistantService::resolve_default_agent_type
  — offline + `no_provider` otherwise.
- custom: accept test-on-save when the agent is reachable but auth_required
  (a valid agent the user just hasn't logged into yet).
- registry: add guidance for auth_required and no_provider error codes.
- The background scheduler shares run_probe, so periodic checks reflect the
  same session/new-based status.
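The classification and availability mapping described above might look roughly like the sketch below. Only the `-32000` code, the `auth_required` and `no_provider` codes, and the Ok / Auth / Fail split come from the commit message; `SessionNewResult`, `ProbeStatus`, `Availability`, and the function names are hypothetical, and leaving the error code empty for a plain `Fail` is an assumption.

```rust
// Hypothetical shape of what a throwaway `session/new` call can return.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionNewResult {
    Opened,
    RpcError { code: i64 },
    TransportError,
}

// Mirrors the Ok / Auth / Fail classification from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbeStatus {
    Ok,
    FailAuth,
    Fail,
}

/// JSON-RPC code the commit message associates with ACP auth_required.
const AUTH_REQUIRED_CODE: i64 = -32000;

fn classify(result: SessionNewResult) -> ProbeStatus {
    match result {
        SessionNewResult::Opened => ProbeStatus::Ok,
        SessionNewResult::RpcError { code } if code == AUTH_REQUIRED_CODE => ProbeStatus::FailAuth,
        SessionNewResult::RpcError { .. } | SessionNewResult::TransportError => ProbeStatus::Fail,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Availability {
    online: bool,
    code: Option<&'static str>,
}

/// FailAuth maps to offline + `auth_required`; the `None` code for Fail is an assumption.
fn availability_for_probe(status: ProbeStatus) -> Availability {
    match status {
        ProbeStatus::Ok => Availability { online: true, code: None },
        ProbeStatus::FailAuth => Availability { online: false, code: Some("auth_required") },
        ProbeStatus::Fail => Availability { online: false, code: None },
    }
}

/// The built-in agent has no external CLI to probe, so it is gated on providers.
fn aionrs_availability(enabled_providers: usize) -> Availability {
    if enabled_providers > 0 {
        Availability { online: true, code: None }
    } else {
        Availability { online: false, code: Some("no_provider") }
    }
}

/// Test-on-save accepts an agent that is reachable but not yet logged in.
fn accept_on_save(status: ProbeStatus) -> bool {
    matches!(status, ProbeStatus::Ok | ProbeStatus::FailAuth)
}

fn main() {
    let results = [
        SessionNewResult::Opened,
        SessionNewResult::RpcError { code: AUTH_REQUIRED_CODE },
        SessionNewResult::RpcError { code: -32601 },
        SessionNewResult::TransportError,
    ];
    for result in results {
        let status = classify(result);
        println!("{:?} -> {:?} -> {:?}", result, status, availability_for_probe(status));
    }
    println!("aionrs with no providers: {:?}", aionrs_availability(0));
}
```

Keeping `FailAuth` separate from `Fail` is what lets test-on-save accept a valid agent while the availability view still reports it as offline.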
…ficeAI/AionCore into feat/agent-connection-testing-phase2

* 'feat/agent-connection-testing-phase2' of github.com:iOfficeAI/AionCore:
…ackend

An assistant's agent_status was matched to its agent row by `backend`
only. aionrs (the built-in Rust agent) has a NULL `backend` and is keyed
by `agent_type` ("aionrs"), so every aionrs-backed assistant failed to
resolve a row and was mislabelled Missing/unavailable.

Match the agent row on `backend == effective_backend` OR
`agent_type.serde_name() == effective_backend`, so aionrs assistants
resolve to the real aionrs row and reflect its actual status.

Add a regression test covering an aionrs assistant (row with NULL backend,
agent_type Aionrs, Online) resolving to Online instead of Missing.
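The matching rule can be sketched as follows. This is a simplified model under stated assumptions: `AgentRow`, `AgentStatus`, the `Custom` variant, and `resolve_status` are hypothetical, and only the "aionrs" name, the NULL `backend`, and the `backend` OR `agent_type.serde_name()` comparison come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentType {
    Aionrs,
    Custom, // hypothetical second variant, for contrast
}

impl AgentType {
    /// Serialized name; "aionrs" is from the commit message, "custom" is assumed.
    fn serde_name(self) -> &'static str {
        match self {
            AgentType::Aionrs => "aionrs",
            AgentType::Custom => "custom",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AgentStatus {
    Online,
    Offline,
    Missing,
}

struct AgentRow {
    backend: Option<String>, // NULL for the built-in aionrs agent
    agent_type: AgentType,
    status: AgentStatus,
}

/// Match on `backend` OR on the agent type's serialized name.
/// Matching on `backend` alone can never find a row whose backend is NULL.
fn resolve_status(rows: &[AgentRow], effective_backend: &str) -> AgentStatus {
    rows.iter()
        .find(|row| {
            row.backend.as_deref() == Some(effective_backend)
                || row.agent_type.serde_name() == effective_backend
        })
        .map(|row| row.status)
        .unwrap_or(AgentStatus::Missing)
}

/// Fixture: one aionrs row with a NULL backend, one custom row with a made-up backend.
fn sample_rows() -> Vec<AgentRow> {
    vec![
        AgentRow { backend: None, agent_type: AgentType::Aionrs, status: AgentStatus::Online },
        AgentRow {
            backend: Some("example-cli".to_string()),
            agent_type: AgentType::Custom,
            status: AgentStatus::Offline,
        },
    ]
}

fn main() {
    let rows = sample_rows();
    for backend in ["aionrs", "example-cli", "unknown"] {
        println!("{} -> {:?}", backend, resolve_status(&rows, backend));
    }
}
```

With the fixture above, an assistant whose effective backend is "aionrs" resolves to the Online row instead of falling through to Missing.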
…-testing-phase2

# Conflicts:
#	crates/aionui-conversation/src/service.rs
Base automatically changed from feat/agent-connection-testing-phase2 to main June 25, 2026 09:27
# Conflicts:
#	crates/aionui-assistant/src/service.rs
#	crates/aionui-conversation/src/service.rs
#	crates/aionui-conversation/src/service_test.rs
#	crates/aionui-conversation/src/turn_orchestrator.rs
#	crates/aionui-cron/tests/service_integration.rs
#	crates/aionui-db/migrations/013_agent_connection_snapshot.sql
#	crates/aionui-db/src/lib.rs
#	crates/aionui-db/tests/cron_assistant_first_migration.rs
#	crates/aionui-team/src/test_utils.rs
#	crates/aionui-team/tests/session_service_integration.rs
@kaizhou-lab kaizhou-lab enabled auto-merge (squash) June 25, 2026 09:49
@kaizhou-lab kaizhou-lab merged commit 5f453b0 into main Jun 25, 2026
6 checks passed
@kaizhou-lab kaizhou-lab deleted the fix/skill-management-db branch June 25, 2026 09:50