Add context compaction and CI quota exhaustion case studies #28
travisbreaks wants to merge 4 commits into
33e69f2 to a379cc9
Thanks for this additional contribution — the failure modes you're documenting here (context compaction state loss, CI resource exhaustion) are genuinely valuable and under-documented.
- Add <br> tags between metadata fields in case study overviews for proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025 for CI quota exhaustion, September 2025 through March 2026 for context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study overlap in context compaction doc, with a new section explaining how the two are complementary (acute single-event vs. chronic recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary public record (no blog posts exist for these; they are documented from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add <br> tags between metadata fields in case study overviews for proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025 for CI quota exhaustion, September 2025 through March 2026 for context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study overlap in context compaction doc, with a new section explaining how the two are complementary (acute single-event vs. chronic recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary public record (no blog posts exist for these; they are documented from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>
0421e48 to b484713
Pull request overview
Adds two new operator-reported Claude Code production case studies and links them into the repo’s “Autonomous Agent Failures” index and relevant failure-mode pages.
Changes:
- Added two new case study documents covering (1) context compaction state loss and (2) CI minutes exhaustion via fix-push-fail loops.
- Linked the new case studies from the README and added short “Real-World Examples” entries under relevant failure-mode docs.
- Cross-referenced the new case studies from plan-generation, tool-use, and verification/termination failure-mode pages.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/failure-modes/verification-termination.md | Adds a new “Real-World Example” entry referencing the context compaction case study. |
| docs/failure-modes/tool-use.md | Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (tool selection framing). |
| docs/failure-modes/plan-generation.md | Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (planning/loop framing). |
| docs/case-studies/claude-code-context-compaction.md | New case study documenting context compaction dropping operational state and safety constraints over long sessions. |
| docs/case-studies/claude-code-ci-quota-exhaustion.md | New case study documenting CI minutes exhaustion via iterative push/CI verification loops. |
| README.md | Adds two new bullets under “Autonomous Agent Failures” linking to the new case studies. |
**Agent**: Claude Code (CLI tool, Claude Opus model)<br>
**Operator**: Solo developer, private monorepo with 12+ projects<br>
Inconsistent description of the product: the Incident Overview says Claude Code is a “CLI tool”, but the narrative attributes “context compaction” to the VS Code extension. Please clarify which surface is involved (VS Code extension vs CLI), or describe it as “Claude Code (VS Code extension + CLI)” so the case study isn’t internally contradictory.
- [Context Compaction Drops Operational State](docs/case-studies/claude-code-context-compaction.md) - Claude Code's context window compression silently discards safety constraints and task state mid-session, causing the agent to resume with confidence on incomplete information.
- [CI Quota Exhaustion via Fix-Push-Fail Loop](docs/case-studies/claude-code-ci-quota-exhaustion.md) - Agent used remote CI as its linting tool, pushing 8+ incremental commits that burned 2,000/2,000 GitHub Actions minutes and locked out the entire organization.
PR description says these are case studies from 2024–2025, but the added case study date range includes March 2026. Please reconcile the PR description with the actual dates in the linked case studies (either update the description year range or adjust the case study framing).
Address Copilot inline review feedback (PR vectara#28, 2026-04-14): the Agent metadata said "CLI tool" but the narrative explicitly references the VS Code extension performing context compaction. Updated metadata to "VS Code extension" to match the narrative. The CI quota exhaustion case is surface-agnostic and stays as-is. Co-Authored-By: Tadao <tadao@travisfixes.com>
Two new first-person operator case studies from production Claude Code use:

1. Context Compaction Drops Operational State: documents how lossy context window summarization silently discards safety constraints and task state, with a near-miss on data exposure when "do not deploy" context was lost.
2. CI Quota Exhaustion via Fix-Push-Fail Loop: documents an agent using remote CI as its linting tool (8+ incremental pushes) instead of local equivalents, burning 2,000/2,000 GitHub Actions minutes and locking out the entire organization.

Both cases include README entries under Autonomous Agent Failures and example entries in the relevant failure mode files (verification-termination, plan-generation, tool-use).

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>
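For readers skimming the thread, the fix-push-fail loop in the second case can be illustrated with a small guard. This is only a sketch under stated assumptions and is not taken from the case study or from Claude Code: `PushBudget`, `push_if_clean`, and the cap of two pushes are hypothetical names and values chosen for illustration. The idea is to run cheap local checks first and to spend a bounded budget of remote pushes, so remote CI is never the agent's linter.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PushBudget:
    """Hypothetical cap on how many pushes one task may spend on remote CI."""

    max_pushes: int
    used: int = 0

    def try_spend(self) -> bool:
        # Refuse once the cap is reached so the loop cannot run unbounded.
        if self.used >= self.max_pushes:
            return False
        self.used += 1
        return True


def push_if_clean(
    local_checks: Sequence[Callable[[], bool]],
    push: Callable[[], None],
    budget: PushBudget,
) -> str:
    """Run local checks first; push only if they pass and budget remains."""
    # all() stops at the first failing check, so later checks are skipped.
    if not all(check() for check in local_checks):
        return "blocked: local checks failed"
    if not budget.try_spend():
        return "blocked: push budget exhausted, escalate to a human"
    push()
    return "pushed"


if __name__ == "__main__":
    budget = PushBudget(max_pushes=2)
    lint_ok = lambda: True  # stand-in for a real local linter invocation
    for _ in range(3):
        print(push_if_clean([lint_ok], lambda: None, budget))
```

In practice each entry in `local_checks` would wrap whatever local linter or test command the repository already uses. A failed local check costs no CI minutes, and an exhausted budget hands control back to the operator instead of triggering another push.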
Adds the "When Agents Fail" blog post URL to the Source metadata and a References section in both case studies, as requested by @ofermend in PR review. Co-Authored-By: Tadao <tadao@travisfixes.com>
8b824b6 to 1e3d134
Addressed Copilot's two inline notes (1e3d134): metadata in the context-compaction case now says VS Code extension; PR description year range updated to 2024-2026. @ofermend, the rest of your review was addressed in 47b451a and e19851e (br tags, dates, OpenClaw section, blog link, References). Sibling #27 merged yesterday. Rebased on current main.