🏥 Repository Health Dashboard

# 🏥 Daily Health Check — 2026-06-26

**Status:** 🔴 3 critical · 🟡 2 warnings · 🔵 1 info
**Since yesterday:** 🆕 3 new · ✅ 0 resolved · 📌 3 unchanged

> 📌 **Maintainer action needed:** please pin this issue as the canonical health dashboard and unpin/close any stale duplicate.

---

## 🆕 New Findings (3)

> These appeared since the last health check (2026-06-25).

### 🔴 [P1] Evaluation `build-validator` job failed: GitHub.Copilot.SDK build error

- **Fingerprint:** `pipeline:evaluation:build-validator:build-skill-validator:failure`
- **First seen:** 2026-06-26
- **Details:** The `build-validator` job in the `evaluation` workflow failed on `main` at 2026-06-25T14:02Z while evaluating PR #818. The `Build skill-validator` step reported multiple `CS0234` errors: `The type or namespace name 'SDK' does not exist in the namespace 'GitHub.Copilot'`. Affected files: `AgentRunner.cs`, `Judge.cs`, `LlmSession.cs`, `LocalSessionFsHandler.cs`, `PairwiseJudge.cs`. The failure occurred against CLI 1.0.65. **Note:** Today's scheduled run (00:55 UTC) succeeded on the latest `main` commit, which suggests that a subsequent commit fixed the incompatibility (PR #832, merged 2026-06-25T22:35Z).
- **Links:** [Failed run (build-validator, 5 min)](https://github.com/dotnet/skills/actions/runs/28175748796) · [Today's successful scheduled run](https://github.com/dotnet/skills/actions/runs/28210145148) · [evaluation.yml](https://github.com/dotnet/skills/blob/main/.github/workflows/evaluation.yml)
- **Suggested action:** Verify that `GitHub.Copilot.SDK` namespace usage in the affected files (`AgentRunner.cs`, `Judge.cs`, `LlmSession.cs`, `LocalSessionFsHandler.cs`, `PairwiseJudge.cs`) is compatible with the pinned CLI version. Check whether PR #803 (restore 15K cap) introduced a reference to SDK APIs not yet available in CLI 1.0.65.

---

### 🔴 [P5] Evaluation failure rate critical: ~33% across all branches in last 24h

- **Fingerprint:** `pipeline:evaluation:failure-rate:critical`
- **First seen:** 2026-06-26
- **Details:** Of 4 evaluation runs in the last 24h, 1 failed, 2 succeeded, 1 was cancelled. Failure rate (excl. cancelled) = **33.3%** — above the 30% Critical threshold. The single failure was run #4823 (`Evaluate PR #818 @ fda2336`, `build-validator` job, build error — same as P1 above). The failure rate is driven entirely by this one build-error run; the scheduled run and PR #833 evaluation both succeeded.
- **Breakdown:** 2 successes (scheduled + PR #833), 1 failure (PR #818), 1 cancelled (issue_comment triggered)
- **Links:** [Failed run #4823](https://github.com/dotnet/skills/actions/runs/28175748796) · [Successful PR #833 run](https://github.com/dotnet/skills/actions/runs/28205424418) · [Successful scheduled run](https://github.com/dotnet/skills/actions/runs/28210145148)
- **Suggested action:** Correlated with P1 above — fixing the SDK build compatibility issue should resolve this metric.
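
The failure-rate arithmetic above can be reproduced with a short sketch. The function name `failure_rate` and the conclusion labels are illustrative assumptions, not the health check's actual implementation; the run counts and the 30% threshold are taken from this finding.

```python
from collections import Counter

CRITICAL_THRESHOLD = 0.30  # 30% Critical threshold quoted in this finding


def failure_rate(conclusions):
    """Return failures / (failures + successes), ignoring cancelled runs."""
    counts = Counter(conclusions)
    decided = counts["failure"] + counts["success"]
    if decided == 0:
        return 0.0  # no completed runs, so there is nothing to rate
    return counts["failure"] / decided


# The four runs from the last 24h, as listed in the breakdown above.
runs = ["success", "success", "failure", "cancelled"]
rate = failure_rate(runs)
is_critical = rate > CRITICAL_THRESHOLD
print(f"{rate:.1%} critical={is_critical}")  # prints "33.3% critical=True"
```

With only three completed runs, a single failure is enough to cross the threshold, which is why this metric tracks P1 so closely.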

---

### 🟡 [P2] Evaluation cancelled after 102 min on main (issue_comment trigger)

- **Fingerprint:** `pipeline:evaluation:evaluate:timeout`
- **First seen:** 2026-06-26
- **Details:** An `issue_comment`-triggered evaluation run ("Add eval coverage for dotnet-test/filter-syntax") was cancelled on `main` at 2026-06-25T08:34Z after running for ~102 minutes. No job failed; the run was cancelled externally, most likely evicted from its concurrency group when a newer run started.
- **Links:** [Cancelled run #4804](https://github.com/dotnet/skills/actions/runs/28157545142)
- **Suggested action:** The 102-min runtime is consistent with the chronic duration issue (P3/`resource:eval-duration:critical`, runs taking 99–130 min). Reducing eval runtime, for example by parallelizing `dotnet-test` plugin evaluation, would also reduce concurrency-group cancellations.

---

## 🔍 Investigation Results

> Deep investigations are dispatched for new critical/warning findings.
> The [grooming workflow](../workflows/devops-health-groom.md) links results ~3 hours after this run.

| Finding | Severity | Investigation | First Seen | Result |
|---------|----------|---------------|------------|--------|
| Evaluation avg duration critical (110+ min) | 🔴 Critical | ✅ Done | 2026-06-03 | [The `evaluate (dotnet-test)` job is the sole critical bottleneck at **100 minutes**, driven by 175 sequential eval scenarios; the regression traces directly to PR #707 (merged June 1) which added polyglot scenarios to three skills.](https://github.com/dotnet/skills/issues/695#issuecomment-4609202357) |
| Evaluation build-validator failed (GitHub.Copilot.SDK CS0234) | 🔴 Critical | 🔄 Dispatched | 2026-06-26 | [⏳ Investigation dispatched — results arriving shortly...](https://github.com/dotnet/skills/actions/runs/28217288553) |
| Evaluation failure rate critical (33% in 24h) | 🔴 Critical | 🔄 Dispatched | 2026-06-26 | [⏳ Investigation dispatched — results arriving shortly...](https://github.com/dotnet/skills/actions/runs/28217288553) |

---

## ✅ Resolved Since Yesterday (0)

> No findings resolved since the last health check (2026-06-25).

---

## 📌 Existing Findings (3)

> These have been present since before today. Sorted by severity then age.

<details>
<summary>🔴 Critical — Evaluation average duration critical (~99–130 min avg, threshold: 55 min) · first seen 2026-06-03 · 12 occurrences</summary>

- **Fingerprint:** `resource:eval-duration:critical`
- **First seen:** 2026-06-03 · Occurrences: 12 (chronic — 3+ weeks)
- **Details:** The 14-day average for substantial scheduled evaluation runs remains well above the 55-min critical threshold. Today's scheduled run took **99 minutes**, down from the recent 121–130 min average but still 1.8× the threshold. Yesterday's run took 85 min, which suggests run-to-run variance.
- **7-day summary (schedule, main):** Today: 99 min ✅ success; yesterday: 85 min ✅; Jun 23–24: 2 failures. Avg ~118 min (est.)
- **Root cause (from investigation):** The `evaluate (dotnet-test)` job is the bottleneck, driven by 175 sequential eval scenarios added by PR #707 (merged 2026-06-01).
- **Links:** [Today's run (99 min, success)](https://github.com/dotnet/skills/actions/runs/28210145148) · [Investigation result](https://github.com/dotnet/skills/issues/695#issuecomment-4609202357) · [evaluation.yml](https://github.com/dotnet/skills/blob/main/.github/workflows/evaluation.yml)
- **Suggested action:** Parallelize eval scenarios in `evaluate (dotnet-test)` — split 175 sequential scenarios across parallel jobs, or introduce a fast-path for PR-triggered evaluations vs. scheduled full runs.
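
One way to split the scenarios across parallel jobs is a round-robin shard assignment, sketched below. This is a minimal illustration under stated assumptions: the function name `shard_round_robin`, the placeholder scenario names, and the shard count of 5 are all hypothetical and do not reflect how `evaluation.yml` or the skill-validator is actually structured.

```python
def shard_round_robin(scenarios, shard_count):
    """Distribute scenarios across shard_count buckets in round-robin order."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for index, scenario in enumerate(scenarios):
        shards[index % shard_count].append(scenario)
    return shards


# Placeholder names; the real scenario identifiers are not listed in this report.
scenarios = [f"scenario-{n:03d}" for n in range(175)]
shards = shard_round_robin(scenarios, shard_count=5)  # 5 is an arbitrary example
print([len(shard) for shard in shards])  # prints [35, 35, 35, 35, 35]
```

Round-robin balances shards by scenario count only. If scenario runtimes vary widely, assigning by measured duration would likely balance wall-clock time better.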

</details>

<details>
<summary>🟡 Warning — Orphan plugin: dotnet-experimental not listed in marketplace.json · first seen 2026-05-14 · 31 occurrences</summary>

- **Fingerprint:** `infra:orphan-plugin:dotnet-experimental`
- **First seen:** 2026-05-14 · Occurrences: 31 (chronic — 6+ weeks)
- **Details:** `plugins/dotnet-experimental/` exists on disk with a valid `plugin.json` and skills (`exp-mock-usage-analysis`, `exp-test-maintainability`, `exp-simd-vectorization`), but it has no entry in `.github/plugin/marketplace.json` (which lists 14 plugins). As a result, consumers cannot discover the plugin.
- **Links:** [marketplace.json](https://github.com/dotnet/skills/blob/main/.github/plugin/marketplace.json) · [plugins/dotnet-experimental/](https://github.com/dotnet/skills/tree/main/plugins/dotnet-experimental)
- **Suggested action:** Either add `{ "name": "dotnet-experimental", "source": "./plugins/dotnet-experimental", "description": "..." }` to `marketplace.json` when ready to publish, or remove the directory if not intended for publication.
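
The orphan check described above amounts to a set difference between plugin directories and marketplace entries. The sketch below assumes that `marketplace.json` holds its entries in a top-level `plugins` array with a `name` field per entry; that key, the helper name `find_orphan_plugins`, and the abbreviated sample data are assumptions for illustration, not the real file contents.

```python
import json


def find_orphan_plugins(plugin_dirs, marketplace):
    """Return plugin directory names that have no matching marketplace entry."""
    listed = {entry["name"] for entry in marketplace.get("plugins", [])}
    return sorted(set(plugin_dirs) - listed)


# Abbreviated, hypothetical marketplace content for illustration only.
marketplace = json.loads("""
{
  "plugins": [
    {"name": "dotnet-test", "source": "./plugins/dotnet-test", "description": "..."}
  ]
}
""")
plugin_dirs = ["dotnet-test", "dotnet-experimental"]  # directory names under plugins/
orphans = find_orphan_plugins(plugin_dirs, marketplace)
print(orphans)  # prints ['dotnet-experimental']
```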

</details>

<details>
<summary>🔵 Info — evaluation.yml uses --verdict-warn-only mode · first seen 2026-05-16 · 29 occurrences</summary>

- **Fingerprint:** `infra:verdict-warn-only`
- **First seen:** 2026-05-16 · Occurrences: 29 (intentional configuration)
- **Details:** `evaluation.yml` passes `--verdict-warn-only` to the skill-validator, treating skill validation failures as warnings rather than hard failures. This is intentional.
- **Link:** [evaluation.yml](https://github.com/dotnet/skills/blob/main/.github/workflows/evaluation.yml)

</details>

---

## 📊 Trends (7-day)

| Metric | Today | 7d Avg | Δ | Trend |
|--------|-------|--------|---|-------|
| Eval duration — schedule/main (min) | 99 | ~118 | -19 | ✅ |
| Eval success rate — main schedule (7d) | 100% (1/1) | ~71% | +29 pp | ✅ |
| Eval success rate — all branches (24h) | 67% (2/3) | 100% | -33 pp | ⚠️ |
| Eval scheduled cancellation rate (24h) | 0% (0/1) | 0% | 0 pp | ➡️ |
| Workflow failure rate — main (24h) | ~3% (1 failure) | ~0% | +3 pp | ↗️ |
| Compute hours/day | ~3.5h | ~2.0h | +1.5h | ↗️ |

> ⚠️ **Eval pipeline degraded today:** 1 evaluation build failure + 1 cancellation on `main` in 24h.
> ✅ **Scheduled eval succeeded** (99 min, latest main commit). The build failure was on an older PR commit and appears resolved.
> ⚠️ **P1 + P5 are correlated** — the SDK build error in PR #818's skill-validator code drove both findings. The fix (PR #832, merged 22:35 UTC) appears to have resolved the incompatibility.
> ⚠️ **Eval duration** remains a chronic concern (118 min est. 7d avg). Concurrency cancellations will continue until runtime is reduced.
> ⚠️ Skipped I5 check (Pages deployment): GitHub Pages API not accessible via available tools.
> ℹ️ I3 (validate-skills): No `validate-skills.yml` workflow found; check not applicable.
> ℹ️ I6 (unpinned actions): All workflow action references use first-party (`actions/*`) or SHA-pinned third-party actions; no unpinned third-party actions detected.

---

🤖 Generated by DevOps Health Check agentic workflow · [Run #28217288553](https://github.com/dotnet/skills/actions/runs/28217288553) · 2026-06-26T04:34 UTC

> Generated by [DevOps Daily Health Check](https://github.com/dotnet/skills/actions/runs/28217288553) · 1.2K AIC · ⊞ 36.5K · [◷](https://github.com/search?q=repo%3Adotnet%2Fskills+is%3Aissue+%22gh-aw-workflow-call-id%3A+dotnet%2Fskills%2Fdevops-health-check%22&type=issues)
