[Planning][P3] Swarm P2P mesh — production hardening and multi-agent workflows #106

@EXboys

Summary

SkillLite includes a skilllite-swarm crate and a skilllite swarm --listen <ADDR> subcommand for running a P2P mesh with mDNS discovery, peer routing, and distributed task dispatch. The crate is wired into the main binary and documented as an entry point in docs/en/ENTRYPOINTS-AND-DOMAINS.md §5.

Current capabilities (from crate docs):

  • Discovery: mDNS service registration and browsing
  • Routing: Match a task's required_capabilities against local and neighbor capabilities
  • HTTP /task: Receive NodeTask, execute locally or forward to peer
  • Auth: SKILLLITE_SWARM_TOKEN Bearer gate
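The routing rule above (run locally if the node covers required_capabilities, otherwise forward to a neighbor that does) could be sketched roughly as follows. This is an illustration only, not the crate's actual code: the `Route` enum, `route_task`, and the `caps` helper are hypothetical names, and capabilities are assumed to be plain strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Where a task should run. Hypothetical type, not taken from skilllite-swarm.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// The local node offers every required capability.
    Local,
    /// Forward to the peer with this (hypothetical) peer id.
    Forward(String),
    /// Neither the local node nor any known peer covers the requirements.
    NoMatch,
}

/// Returns true when `offered` contains every entry of `required`.
fn covers(offered: &BTreeSet<String>, required: &BTreeSet<String>) -> bool {
    required.is_subset(offered)
}

/// Prefer local execution, then fall back to the first matching peer.
pub fn route_task(
    required: &BTreeSet<String>,
    local: &BTreeSet<String>,
    peers: &BTreeMap<String, BTreeSet<String>>,
) -> Route {
    if covers(local, required) {
        return Route::Local;
    }
    // BTreeMap iterates in key order, so the peer choice is deterministic.
    for (peer_id, peer_caps) in peers {
        if covers(peer_caps, required) {
            return Route::Forward(peer_id.clone());
        }
    }
    Route::NoMatch
}

/// Convenience helper for building a capability set from string literals.
pub fn caps(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}
```

With a local node offering `shell` and a peer `node-b` offering `python` and `gpu`, a task requiring `gpu` would be forwarded to `node-b`, while a task requiring an unknown capability yields `Route::NoMatch`. A real implementation would likely also weigh peer health and load, which this sketch leaves out.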

This issue tracks production hardening and multi-agent workflow features beyond the current MVP daemon.

Goals

| Area | Current state | Target |
| --- | --- | --- |
| Security | Bearer token on HTTP | TLS/mTLS option, token rotation, capability-scoped tokens |
| Reliability | Basic routing | Retry, timeout, dead-peer detection, graceful shutdown |
| Observability | Minimal | Structured logs, metrics hooks, task trace IDs across hops |
| Skill sync | NewSkill Gossip (basic) | Conflict resolution, version pinning, signed skill propagation |
| Agent integration | skilllite run --soul | Documented multi-node agent delegation patterns |
| Operations | Manual --listen | Systemd/launchd service templates, LAN vs loopback guidance |

Proposed milestones

M1 — Security & ops baseline

  • Document threat model for LAN vs 0.0.0.0 binding
  • Optional TLS termination or reverse-proxy guide
  • Health/readiness endpoint for orchestrators
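As a starting point for the LAN vs 0.0.0.0 threat model, the daemon could classify the --listen address before binding and refuse anything beyond loopback unless the operator opts in. The sketch below uses only the standard library. `BindExposure`, `classify_listen_addr`, `bind_allowed`, and the `allow_non_loopback` parameter are hypothetical names, and how the opt-in would be exposed (flag, environment variable, config) is left open.

```rust
use std::net::{AddrParseError, SocketAddr};

/// How widely a listen address exposes the daemon. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum BindExposure {
    /// 127.0.0.0/8 or ::1, reachable only from the same host.
    Loopback,
    /// 0.0.0.0 or ::, reachable on every interface.
    AllInterfaces,
    /// A concrete non-loopback address, for example a LAN interface.
    SpecificInterface,
}

/// Parse an `IP:PORT` string and classify its exposure.
/// Hostnames such as `localhost:7700` are not resolved and yield an error.
pub fn classify_listen_addr(addr: &str) -> Result<BindExposure, AddrParseError> {
    let parsed: SocketAddr = addr.parse()?;
    let ip = parsed.ip();
    Ok(if ip.is_loopback() {
        BindExposure::Loopback
    } else if ip.is_unspecified() {
        BindExposure::AllInterfaces
    } else {
        BindExposure::SpecificInterface
    })
}

/// Loopback is always allowed; anything wider needs an explicit opt-in.
/// Unparseable addresses are rejected.
pub fn bind_allowed(addr: &str, allow_non_loopback: bool) -> bool {
    match classify_listen_addr(addr) {
        Ok(BindExposure::Loopback) => true,
        Ok(_) => allow_non_loopback,
        Err(_) => false,
    }
}
```

Under these assumptions, `bind_allowed("0.0.0.0:7700", false)` is rejected while `bind_allowed("127.0.0.1:7700", false)` passes, which keeps loopback-only as the default. The port number is illustrative.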

M2 — Routing reliability

  • Task timeout and retry policy
  • Peer health checks + automatic de-registration
  • Idempotent task delivery (task_id dedup)
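One possible shape for the retry and dedup items, as a minimal sketch under stated assumptions: task_id is a string, duplicates only need to be suppressed within a fixed time window, and retries use capped exponential backoff. `TaskDedup`, `first_seen`, and `backoff_delay` are hypothetical names, and the backoff policy is one option among several, not a decision recorded in this issue.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers recently seen task ids so a retried delivery is not executed twice.
pub struct TaskDedup {
    ttl: Duration,
    seen: HashMap<String, Instant>,
}

impl TaskDedup {
    pub fn new(ttl: Duration) -> Self {
        TaskDedup { ttl, seen: HashMap::new() }
    }

    /// Returns true if `task_id` has not been seen within `ttl` (the caller
    /// should execute the task) and false for a duplicate delivery.
    /// `now` is passed in so the logic can be tested without sleeping.
    pub fn first_seen(&mut self, task_id: &str, now: Instant) -> bool {
        let ttl = self.ttl;
        // Drop entries older than the window before checking.
        self.seen
            .retain(|_, seen_at| now.saturating_duration_since(*seen_at) < ttl);
        if self.seen.contains_key(task_id) {
            return false;
        }
        self.seen.insert(task_id.to_string(), now);
        true
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // checked_mul returns None on overflow; treat that as "use the cap".
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}
```

A forwarding node would call `backoff_delay` between attempts, and the receiving node would call `first_seen` before executing, so a retry that arrives after the original actually succeeded is acknowledged without running again. An in-memory map does not survive restarts; whether that is acceptable is a design question for M2.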

M3 — Multi-agent workflows

  • Example: split planning vs execution across nodes
  • Integration with agent loop swarm events (swarm_started, etc.)
  • CLI skilllite swarm status --json for Desktop/monitoring

Acceptance criteria

  • Production deployment guide (EN + zh) with security checklist
  • At least one end-to-end multi-node tutorial (2+ machines or containers)
  • skilllite swarm status --json command for external monitoring
  • No breaking change to default loopback-only behavior without explicit opt-in

Non-goals

  • Cloud-hosted centralized orchestrator (Swarm stays P2P-first)
  • Replacing MCP as the primary IDE integration path
