From 29da5bbd3f9502903118c6075011b9f595e07a11 Mon Sep 17 00:00:00 2001
From: 1008covingtonlane <42551186+1008covingtonlane@users.noreply.github.com>
Date: Sat, 27 Jun 2026 20:22:48 -0400
Subject: [PATCH 1/2] SystemDrive Free Space TSG: GMACache connectivity pointer + at-scale Tier 1 fan-out

Follow-up to the merged TSG, addressing two reviewer asks:

- Network engineer: name the connectivity to restore for a backed-up GMACache
  (outbound HTTPS/443 to the Azure Arc and Azure Local endpoints, the firewall
  requirements doc, and azcmagent show), so it is actionable without a second TSG.
- Partner/SI: add an at-scale note that fans the Tier 1 reclamation across all
  nodes in one Invoke-Command (WinSxS example), with the Tier 1b
  update-in-progress caveat.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 ...bleshooting-Test-SystemDrive-Free-Space.md | 22 ++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
index 82ac1c96..1a9e93dc 100644
--- a/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
+++ b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
@@ -273,6 +273,21 @@ Remove-Item C:\Windows\Temp\* -Recurse -Force -ErrorAction SilentlyContinue
 Remove-Item $env:TEMP\* -Recurse -Force -ErrorAction SilentlyContinue
 ```
 
+**Run Tier 1 across all nodes at once.** For at-scale deployment prep, fan the Tier 1
+reclamation out to every machine in the cluster with a single `Invoke-Command` instead of
+repeating it node by node. For example, the WinSxS component cleanup (Tier 1a, the largest
+safe win):
+
+```powershell
+Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
+    Dism.exe /Online /Cleanup-Image /StartComponentCleanup
+} | Out-Null
+```
+
+Wrap any of the Tier 1 a-d commands the same way. Do not fan out the Windows Update cache
+clear (Tier 1b) while a solution update or upgrade is in progress, because it briefly stops
+the `wuauserv` and BITS services on every node.
+
 #### Tier 2: diagnostic logs (reclaim with care)
 
 Large event logs such as `Microsoft-Windows-FailoverClustering%4Diagnostic` and
@@ -312,7 +327,12 @@ or updates, and it does not fix the underlying cause.
 - **`C:\GMACache` (monitoring agent cache).** A large `GMACache`, especially
   `GMACache\TelemetryCache`, usually means the machine cannot upload telemetry to
   Azure, so the data backs up on disk. The fix is to restore outbound connectivity
-  and the Arc connection so the cache drains on its own. Do not delete the cache to
+  and the Arc connection so the cache drains on its own. Concretely, that means
+  restoring the node's outbound HTTPS (TCP 443) to the Azure Arc and Azure Local
+  service endpoints (see the [Azure Local firewall and outbound connectivity
+  requirements](https://learn.microsoft.com/azure/azure-local/concepts/firewall-requirements))
+  and confirming the Arc agent is connected (`azcmagent show` reports
+  `Agent Status: Connected`). Do not delete the cache to
   free space; that loses buffered data, and the folder simply refills while
   connectivity is broken.
 - **`C:\Observability`, `C:\NugetStore`, `C:\ImageComposition`, `C:\CloudContent`,

From 6d02763ab2a2a46a7f400f5fb6ce93f587d2f50b Mon Sep 17 00:00:00 2001
From: 1008covingtonlane <42551186+1008covingtonlane@users.noreply.github.com>
Date: Sat, 27 Jun 2026 21:20:27 -0400
Subject: [PATCH 2/2] SystemDrive TSG: address review on the fan-out and the Arc status check

- Tier 1 fan-out: add -ThrottleLimit 2 so the IO/CPU-intensive cleanup does not
  spike every node at once, and return a per-node {Node, ExitCode} object
  (0 = succeeded) instead of discarding output, so operators can verify completion.
- GMACache Arc check: use the JSON output per repo convention,
  (azcmagent show -j | ConvertFrom-Json).status, instead of parsing the
  human-readable "Agent Status: Connected" text. The azcmagent JSON key is
  `status` (CLI reference; display name "Agent Status"), not connectionStatus.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../Troubleshooting-Test-SystemDrive-Free-Space.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
index 1a9e93dc..3755146e 100644
--- a/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
+++ b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
@@ -279,9 +279,13 @@ repeating it node by node. For example, the WinSxS component cleanup (Tier 1a, t
 safe win):
 
 ```powershell
-Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
+# -ThrottleLimit caps how many nodes run this IO/CPU-intensive cleanup at once, so the
+# cluster does not spike all at once; the returned per-node exit code confirms success
+# (0 = succeeded). Raise the throttle only if the cluster has headroom.
+Invoke-Command -ComputerName (Get-ClusterNode).Name -ThrottleLimit 2 -ScriptBlock {
     Dism.exe /Online /Cleanup-Image /StartComponentCleanup
-} | Out-Null
+    [pscustomobject]@{ Node = $env:COMPUTERNAME; ExitCode = $LASTEXITCODE }
+} | Sort-Object Node | Format-Table -AutoSize
 ```
 
 Wrap any of the Tier 1 a-d commands the same way. Do not fan out the Windows Update cache
@@ -331,8 +335,8 @@ or updates, and it does not fix the underlying cause.
   restoring the node's outbound HTTPS (TCP 443) to the Azure Arc and Azure Local
   service endpoints (see the [Azure Local firewall and outbound connectivity
   requirements](https://learn.microsoft.com/azure/azure-local/concepts/firewall-requirements))
-  and confirming the Arc agent is connected (`azcmagent show` reports
-  `Agent Status: Connected`). Do not delete the cache to
+  and confirming the Arc agent is connected (`(azcmagent show -j | ConvertFrom-Json).status`
+  returns `Connected`). Do not delete the cache to
   free space; that loses buffered data, and the folder simply refills while
   connectivity is broken.
 - **`C:\Observability`, `C:\NugetStore`, `C:\ImageComposition`, `C:\CloudContent`,