advisor: surface learning evidence gaps #53
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0d6be8103
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    "outcomes": sum(1 for row in rows if row.get("outcome")),
    "reward_labels": sum(1 for row in rows if row.get("reward_label")),
    "lineages": sum(1 for row in rows if store.get_lineage(row.get("run_id")) is not None),
    "trajectories": len(store.list_trajectories()),
Track trajectory coverage by run_id
trajectories is currently computed as the total number of trajectory records, but _evidence_blocking_reasons compares it against total run count. If one run has multiple trajectories while another has none, these counts can still match and missing_trajectories will not be raised, so the evidence surface can report readiness even though some runs are missing trajectory provenance. This should count runs with at least one trajectory (or otherwise evaluate coverage per run) instead of raw trajectory rows.
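A minimal sketch of the per-run coverage count this comment suggests. It assumes each trajectory record is a mapping that carries the `run_id` it belongs to; that field and the helper name `count_runs_with_trajectory` are hypothetical and not taken from the PR.

```python
from typing import Any, Iterable, Mapping


def count_runs_with_trajectory(
    rows: Iterable[Mapping[str, Any]],
    trajectories: Iterable[Mapping[str, Any]],
) -> int:
    """Count runs that have at least one trajectory record."""
    # Assumption: every trajectory record exposes the owning run under "run_id".
    covered = {
        t.get("run_id") for t in trajectories if t.get("run_id") is not None
    }
    # A run is covered only if its own run_id appears in the covered set.
    return sum(1 for row in rows if row.get("run_id") in covered)


# Two runs, two trajectory rows, but both rows belong to run "a".
rows = [{"run_id": "a"}, {"run_id": "b"}]
trajectories = [{"run_id": "a"}, {"run_id": "a"}]

raw_count = len(trajectories)  # equals len(rows), so the gap is hidden
covered_runs = count_runs_with_trajectory(rows, trajectories)
print(raw_count, covered_runs)
```

With this shape, comparing the covered-run count against `len(rows)` would flag run "b" as missing a trajectory, whereas the raw row count would not.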
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3e4a3fb7d
    "runs": len(rows),
    "outcomes": sum(1 for row in rows if row.get("outcome")),
    "reward_labels": sum(1 for row in rows if row.get("reward_label")),
    "lineages": sum(1 for row in rows if store.get_lineage(row.get("run_id")) is not None),
Batch lineage evidence lookup for activity snapshots
The lineage count in build_advisor_evidence_snapshot performs store.get_lineage(...) once per run, and each get_lineage call opens its own SQLite query/connection path. Because both /v1/operator/advisor-activity and the auto-refreshing /dashboard/advisor-activity route build this snapshot on every request, larger run histories will cause many DB round-trips per page load and can materially degrade dashboard latency (and DB contention). This should be replaced with a single bulk lineage lookup (for example, one query that returns the run_ids present in run_lineages).
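One possible shape for the bulk lookup, sketched with the standard-library `sqlite3` module. The table name `run_lineages` comes from the comment above; the `run_id` column name, the helper `fetch_run_ids_with_lineage`, and the chunk size are assumptions made for illustration and may not match the real store.

```python
import sqlite3
from typing import Iterable, Optional, Set


def fetch_run_ids_with_lineage(
    conn: sqlite3.Connection,
    run_ids: Iterable[Optional[str]],
    chunk_size: int = 500,
) -> Set[str]:
    """Return the subset of run_ids that have a row in run_lineages."""
    ids = [r for r in run_ids if r is not None]
    found: Set[str] = set()
    # Chunk the IN (...) list to stay under SQLite's bound-parameter limit.
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        query = (
            "SELECT DISTINCT run_id FROM run_lineages "
            f"WHERE run_id IN ({placeholders})"
        )
        found.update(row[0] for row in conn.execute(query, chunk))
    return found


# Demo against an in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_lineages (run_id TEXT, parent_run_id TEXT)")
conn.executemany(
    "INSERT INTO run_lineages VALUES (?, ?)",
    [("a", None), ("c", "a")],
)

rows = [{"run_id": "a"}, {"run_id": "b"}, {"run_id": "c"}]
with_lineage = fetch_run_ids_with_lineage(conn, (r.get("run_id") for r in rows))
lineage_count = sum(1 for r in rows if r.get("run_id") in with_lineage)
print(lineage_count)
```

The snapshot would then issue one query per chunk of run IDs instead of one per run, which keeps the number of round-trips roughly constant for typical history sizes.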
Summary
docs/plans/2026-04-24-final-multiturn-real-learning-completion.md.
Testing
- python -m pytest tests/agent/advisor/test_dashboard.py tests/agent/advisor/test_api.py tests/agent/advisor/test_cli.py -q
- ruff check .
- python -m pytest tests/agent/advisor -q
- git diff --check
Advisor service remained stopped:
port_8765=not_listening