Autonomy Fleet Triage

Failure-mode repository, event triage engine, and shift handoff workflow for autonomous delivery robot operations.

Autonomy Fleet Triage converts raw robot events into operator-ready mitigation guidance and engineering-ready case records. It keeps failure modes as data, scores incoming events against known symptoms and subsystems, promotes safety-sensitive incidents, assigns SLA targets, and renders a static shift dashboard for operations review.

Features

SQLite-backed failure-mode repository with owners, runbook links, symptoms, likely causes, and mitigation steps
JSONL robot event ingestion for shift replay or operational logs
Transparent triage engine using symptom overlap, subsystem match, free-text evidence, and severity promotion rules
SLA assignment for low, medium, high, and critical cases
CLI for initializing the repository, ingesting events, triaging a full shift, triaging a single live event, and rendering a dashboard
Static HTML operations dashboard for shift review and cross-functional handoff
Idempotent case upserts so rerunning a shift replay refreshes existing cases instead of creating duplicates
Tests for repository round trips, triage matching, escalation routing, and safety-sensitive battery handling
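
The idempotent upsert behavior listed above can be sketched with SQLite's ON CONFLICT clause. This is a minimal illustration, not the code in repository.py: the cases table, its columns, and the upsert_case helper are hypothetical, and the sketch assumes one case per event_id and SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema: the real tables in repository.py may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cases (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id TEXT NOT NULL UNIQUE,
    severity TEXT NOT NULL,
    score REAL NOT NULL
)
"""

# A second insert for the same event_id updates the row instead of adding one.
UPSERT = """
INSERT INTO cases (event_id, severity, score)
VALUES (?, ?, ?)
ON CONFLICT(event_id) DO UPDATE SET
    severity = excluded.severity,
    score = excluded.score
"""


def upsert_case(conn, event_id, severity, score):
    """Insert a case, or refresh the existing row for the same event_id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (event_id, severity, score))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_case(conn, "EVT-1001", "medium", 0.72)
upsert_case(conn, "EVT-1001", "high", 0.99)  # replayed shift: same event again
rows = conn.execute("SELECT event_id, severity, score FROM cases").fetchall()
print(rows)
```

Keying the upsert on the event identifier is what makes a shift replay safe to rerun: the second call refreshes severity and score rather than creating a duplicate case.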

Operational Workflow

Initialize the SQLite repository with reviewed failure modes and runbook links.
Ingest JSONL robot events from a shift replay or live operations export.
Score each event against known failure modes using subsystem, symptom, and text evidence.
Promote severity for blocked missions, unsafe stops, weak network signals, and low battery conditions.
Store idempotent case records with escalation team, SLA, evidence, and recommended mitigation.
Render a dashboard that gives Operations and Engineering the same case context.
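
The scoring step in this workflow could look roughly like the sketch below. It combines the three evidence sources named above (symptom overlap, subsystem match, and free-text evidence) into a single score. The FailureMode fields, the score_event function, and the 0.6 / 0.25 / 0.15 weights are assumptions for illustration; the actual logic lives in triage.py and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """Hypothetical failure-mode record; the real model is in models.py."""
    name: str
    subsystem: str
    symptoms: set[str]
    keywords: set[str] = field(default_factory=set)


def score_event(mode, subsystem, symptoms, notes):
    """Return a score in [0, 1]. The weights are illustrative assumptions."""
    # Fraction of the failure mode's known symptoms seen in the event.
    overlap = len(mode.symptoms & symptoms) / len(mode.symptoms) if mode.symptoms else 0.0
    # Exact subsystem match counts as full evidence.
    subsystem_match = 1.0 if mode.subsystem == subsystem else 0.0
    # Fraction of the failure mode's keywords found in the operator notes.
    text = notes.lower()
    hits = sum(1 for keyword in mode.keywords if keyword in text)
    text_evidence = hits / len(mode.keywords) if mode.keywords else 0.0
    return round(0.6 * overlap + 0.25 * subsystem_match + 0.15 * text_evidence, 2)


handoff = FailureMode(
    name="Network handoff degradation during live mission",
    subsystem="connectivity",
    symptoms={"telemetry_gap", "low_rssi", "command_timeout"},
    keywords={"video froze", "delayed"},
)
full = score_event(
    handoff,
    "connectivity",
    {"telemetry_gap", "low_rssi", "command_timeout"},
    "Video froze during handoff and operator saw delayed command acknowledgment",
)
partial = score_event(handoff, "planner", {"low_rssi"}, "")
print(full, partial)
```

Keeping each component as a separate, named term is one way to make the score traceable: an operator can see which evidence contributed and by how much.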

Quick Start

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"

autonomy-triage init
autonomy-triage ingest
autonomy-triage triage
autonomy-triage dashboard
python -m pytest -q

Open reports/dashboard.html after running the dashboard command.

Live Event Example

autonomy-triage triage-one \
  --event-id EVT-LIVE-001 \
  --robot-id DL-104 \
  --timestamp 2026-05-08T22:41:00+00:00 \
  --location "San Francisco / Mission Bay" \
  --subsystem connectivity \
  --symptoms telemetry_gap,low_rssi,command_timeout \
  --notes "Video froze during handoff and operator saw delayed command acknowledgment" \
  --network-rssi -87 \
  --battery-pct 44 \
  --mission-state blocked
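
For shift replay, the same event would be one line in a JSONL file. The ingest schema is not documented in this README, so the sketch below assumes the keys simply mirror the triage-one flags; the parse_event helper and the REQUIRED_FIELDS list are hypothetical.

```python
import json

# Assumed minimum set of keys, inferred from the triage-one flags.
REQUIRED_FIELDS = ("event_id", "robot_id", "timestamp", "subsystem", "symptoms")


def parse_event(raw_line):
    """Parse one JSONL line and check that the assumed required keys exist."""
    event = json.loads(raw_line)
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        raise ValueError(f"event is missing fields: {missing}")
    return event


# One JSONL record equivalent to the live event above (key names are assumptions).
sample_line = json.dumps({
    "event_id": "EVT-LIVE-001",
    "robot_id": "DL-104",
    "timestamp": "2026-05-08T22:41:00+00:00",
    "location": "San Francisco / Mission Bay",
    "subsystem": "connectivity",
    "symptoms": ["telemetry_gap", "low_rssi", "command_timeout"],
    "notes": "Video froze during handoff and operator saw delayed command acknowledgment",
    "network_rssi": -87,
    "battery_pct": 44,
    "mission_state": "blocked",
})
event = parse_event(sample_line)
print(event["robot_id"], event["symptoms"])
```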

Project Structure

src/autonomy_fleet_triage/
  cli.py          command-line workflow
  dashboard.py    static HTML dashboard renderer
  models.py       dataclasses and severity model
  repository.py   SQLite schema, seed, ingest, case storage
  triage.py       matching, severity promotion, SLA logic
data/
  seed_failure_modes.json
  sample_events.jsonl
docs/
  architecture.md
  runbooks.md
tests/

Architecture Decisions

Explainable triage first: live support needs traceable evidence more than opaque automation.
Failure modes as data: Operations and Engineering can review the repository without editing Python code.
Safety-aware escalation: blocked missions, unsafe stops, low battery, and weak network signals promote severity.
Local-first architecture: SQLite plus static HTML keeps the workflow reproducible and easy to run during interviews.
Automation-ready output: every triage case includes severity, SLA, evidence, escalation team, and recommended actions.
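
As an illustration of the safety-aware escalation decision, the sketch below raises severity one level for each triggered rule and then looks up an SLA target. The thresholds (20% battery, -85 dBm RSSI), the "unsafe_stop" mission-state value, the SLA minutes, and the promote function are all assumptions for the example, not values taken from triage.py.

```python
SEVERITIES = ["low", "medium", "high", "critical"]

# Assumed thresholds and SLA targets, for illustration only.
LOW_BATTERY_PCT = 20
WEAK_RSSI_DBM = -85
SLA_MINUTES = {"low": 480, "medium": 120, "high": 30, "critical": 10}


def promote(base, mission_state, battery_pct, network_rssi):
    """Raise severity one level per triggered safety rule, capped at critical."""
    level = SEVERITIES.index(base)
    reasons = []
    if mission_state in {"blocked", "unsafe_stop"}:
        level += 1
        reasons.append(f"mission_state={mission_state}")
    if battery_pct < LOW_BATTERY_PCT:
        level += 1
        reasons.append(f"battery_pct={battery_pct}")
    if network_rssi <= WEAK_RSSI_DBM:
        level += 1
        reasons.append(f"network_rssi={network_rssi}")
    severity = SEVERITIES[min(level, len(SEVERITIES) - 1)]
    return severity, SLA_MINUTES[severity], reasons


severity, sla_minutes, reasons = promote("low", "blocked", 44, -87)
print(severity, sla_minutes, reasons)
```

Returning the reasons alongside the severity keeps the promotion explainable, in line with the first decision above.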

Example Output

case=1 event=EVT-1001 robot=DL-042 severity=high score=0.99 mode=Localization drift near dense curbside pickup
case=2 event=EVT-1002 robot=DL-017 severity=high score=0.89 mode=Network handoff degradation during live mission
case=3 event=EVT-1003 robot=DL-088 severity=high score=0.99 mode=Planner regression after autonomy software release
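
For downstream automation, lines in this format can be parsed with a small regular expression. The sketch below assumes the field order shown above and that mode is always the last field, since it contains spaces; parse_case_line is a hypothetical helper, not part of the package.

```python
import re

# Assumes the key order shown in the example output, with mode last.
LINE_RE = re.compile(
    r"case=(?P<case>\d+) event=(?P<event>\S+) robot=(?P<robot>\S+) "
    r"severity=(?P<severity>\w+) score=(?P<score>[\d.]+) mode=(?P<mode>.+)"
)


def parse_case_line(line):
    """Turn one triage output line into a dict with typed case and score."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized case line: {line!r}")
    fields = match.groupdict()
    fields["case"] = int(fields["case"])
    fields["score"] = float(fields["score"])
    return fields


parsed = parse_case_line(
    "case=2 event=EVT-1002 robot=DL-017 severity=high score=0.89 "
    "mode=Network handoff degradation during live mission"
)
print(parsed["robot"], parsed["score"], parsed["mode"])
```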

Support Surface

Initial debugging and troubleshooting through the triage-one command.
Failure-mode repository ownership through data/seed_failure_modes.json.
Cross-functional escalation to Robot Operations, Hardware Operations, Platform Engineering, or Autonomy Software.
Self-service documentation through runbook links and case evidence.
Workflow improvement through stored case history and recurring pattern review.
