advisor: surface learning evidence gaps #53

Merged
sarvesh1327 merged 3 commits into main from phase9-evidence-surfaces on Apr 24, 2026
@sarvesh1327
Owner

Summary

  • Start Phase 9 from docs/plans/2026-04-24-final-multiturn-real-learning-completion.md.
  • Add dashboard evidence badges per run for trajectory/reward/lineage presence.
  • Add machine-readable DB and artifact evidence counts to dashboard snapshots and validation gate output.
  • Add profile artifact evidence for active checkpoint id, adapter file presence, training manifest, and backend manifest.
  • Surface the same evidence through HTTP dashboard/activity routes and CLI validation/activity commands.
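The per-run evidence badges described above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code merged in this PR: the helper name `evidence_badges`, the badge keys, and the idea of passing precomputed sets of run ids are all hypothetical; only the `run_id` and `reward_label` row keys appear in the reviewed diff.

```python
from typing import Any, Mapping


def evidence_badges(
    row: Mapping[str, Any],
    trajectory_run_ids: set[str],
    lineage_run_ids: set[str],
) -> dict[str, bool]:
    """Return presence flags for one run (hypothetical helper, not the PR's code)."""
    run_id = row.get("run_id")
    return {
        # A run has trajectory evidence if at least one trajectory references it.
        "trajectory": run_id in trajectory_run_ids,
        # A missing or empty reward label counts as absent evidence.
        "reward": bool(row.get("reward_label")),
        # A run has lineage evidence if a lineage record exists for it.
        "lineage": run_id in lineage_run_ids,
    }


# Illustrative data only: run-b has lineage but no trajectory or reward label.
rows = [
    {"run_id": "run-a", "reward_label": "positive"},
    {"run_id": "run-b", "reward_label": None},
]
badges = {
    row["run_id"]: evidence_badges(row, {"run-a"}, {"run-a", "run-b"})
    for row in rows
}
```

Keeping the badges as plain booleans would let the same dictionary feed both a rendered dashboard and the machine-readable snapshot.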

Testing

  • python -m pytest tests/agent/advisor/test_dashboard.py tests/agent/advisor/test_api.py tests/agent/advisor/test_cli.py -q
  • ruff check .
  • python -m pytest tests/agent/advisor -q
  • git diff --check

Advisor service remained stopped: port_8765=not_listening

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0d6be8103

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/advisor/product/dashboard.py Outdated
"outcomes": sum(1 for row in rows if row.get("outcome")),
"reward_labels": sum(1 for row in rows if row.get("reward_label")),
"lineages": sum(1 for row in rows if store.get_lineage(row.get("run_id")) is not None),
"trajectories": len(store.list_trajectories()),

P1: Track trajectory coverage by run_id

trajectories is currently computed as the total number of trajectory records, but _evidence_blocking_reasons compares it against total run count. If one run has multiple trajectories while another has none, these counts can still match and missing_trajectories will not be raised, so the evidence surface can report readiness even though some runs are missing trajectory provenance. This should count runs with at least one trajectory (or otherwise evaluate coverage per run) instead of raw trajectory rows.

Useful? React with 👍 / 👎.
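The failure mode in this comment, and the per-run count it recommends, can be sketched as below. This is a minimal illustration, assuming each trajectory record is a mapping with a `run_id` key; the helper name `count_runs_with_trajectories` is hypothetical and the real shape of `store.list_trajectories()` results is not shown in the PR.

```python
from typing import Any, Iterable, Mapping


def count_runs_with_trajectories(
    rows: Iterable[Mapping[str, Any]],
    trajectories: Iterable[Mapping[str, Any]],
) -> int:
    """Count distinct runs that have at least one trajectory (hypothetical helper)."""
    # Assumption: every trajectory record exposes the run it belongs to as "run_id".
    covered = {t.get("run_id") for t in trajectories if t.get("run_id")}
    run_ids = {row.get("run_id") for row in rows if row.get("run_id")}
    return len(run_ids & covered)


rows = [{"run_id": "run-a"}, {"run_id": "run-b"}]
# Two trajectories for run-a and none for run-b.
trajectories = [{"run_id": "run-a"}, {"run_id": "run-a"}]

raw_count = len(trajectories)  # matches len(rows), so the gap is hidden
covered_count = count_runs_with_trajectories(rows, trajectories)  # exposes the gap
```

Here the raw trajectory count equals the run count even though `run-b` has no trajectory, while the per-run count falls short of the run count and would let a `missing_trajectories` reason be raised.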

@sarvesh1327 merged commit 839c484 into main on Apr 24, 2026
1 check passed
@sarvesh1327 deleted the phase9-evidence-surfaces branch on April 24, 2026 16:37

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3e4a3fb7d

"runs": len(rows),
"outcomes": sum(1 for row in rows if row.get("outcome")),
"reward_labels": sum(1 for row in rows if row.get("reward_label")),
"lineages": sum(1 for row in rows if store.get_lineage(row.get("run_id")) is not None),

P2: Batch lineage evidence lookup for activity snapshots

The lineage count in build_advisor_evidence_snapshot performs store.get_lineage(...) once per run, and each get_lineage call opens its own SQLite query/connection path. Because both /v1/operator/advisor-activity and the auto-refreshing /dashboard/advisor-activity route build this snapshot on every request, larger run histories will cause many DB round-trips per page load and can materially degrade dashboard latency (and DB contention). This should be replaced with a single bulk lineage lookup (for example, one query that returns the run_ids present in run_lineages).

Useful? React with 👍 / 👎.
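The single bulk lookup suggested in this comment could look roughly like the sketch below, using the standard-library `sqlite3` module. The `run_lineages` table name comes from the review comment; the `run_id` and `parent_run_id` columns, the helper name `lineage_run_ids`, and the in-memory database are assumptions made for illustration, not the store's actual schema or API.

```python
import sqlite3


def lineage_run_ids(conn: sqlite3.Connection) -> set[str]:
    """Fetch every run_id that has a lineage row in one query (hypothetical helper)."""
    # Assumption: run_lineages has a run_id column identifying the run.
    cursor = conn.execute("SELECT DISTINCT run_id FROM run_lineages")
    return {run_id for (run_id,) in cursor}


# Illustrative in-memory database standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE run_lineages (run_id TEXT NOT NULL, parent_run_id TEXT)")
conn.executemany(
    "INSERT INTO run_lineages (run_id, parent_run_id) VALUES (?, ?)",
    [("run-a", None), ("run-c", "run-a")],
)

rows = [{"run_id": "run-a"}, {"run_id": "run-b"}, {"run_id": "run-c"}]

# One round-trip, then set membership per run instead of one query per run.
present = lineage_run_ids(conn)
lineage_count = sum(1 for row in rows if row.get("run_id") in present)
```

With this shape the number of queries stays constant as run history grows, which is the property the comment is asking for on the auto-refreshing routes.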
