Add context compaction and CI quota exhaustion case studies by travisbreaks · Pull Request #28 · vectara/awesome-agent-failures

travisbreaks · 2026-03-17T05:44:29Z

Summary

Adds two new first-person operator case studies from production Claude Code use (2024-2026):
Context Compaction Drops Operational State: lossy context window summarization silently discards safety constraints mid-session, with a near-miss on data exposure
CI Quota Exhaustion via Fix-Push-Fail Loop: agent uses remote CI as its linting tool instead of local equivalents, burning all 2,000 GitHub Actions free-tier minutes in one session
Adds README entries under "Autonomous Agent Failures" for both cases
Adds example entries to three failure mode files: verification-termination.md, plan-generation.md, tool-use.md
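To make the compaction failure concrete, here is a minimal Python sketch of how a "summarize the oldest turns" strategy can drop a standing instruction. It also shows one way to avoid that: carry flagged messages through compaction verbatim. This is not Claude Code's actual compaction algorithm, which this PR does not describe. `Message`, `pinned`, `summarize`, `naive_compact`, and `pinned_compact` are hypothetical names, and the summarizer is a placeholder that keeps no detail at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    text: str
    pinned: bool = False  # hypothetical flag marking a standing constraint


def summarize(messages: list[Message]) -> Message:
    # Placeholder for a lossy summarizer: it records only how many messages
    # were folded in, so every detail (constraints included) is lost.
    return Message(f"[summary of {len(messages)} earlier messages]")


def split_history(history: list[Message], keep_recent: int) -> tuple[list[Message], list[Message]]:
    # Everything before the last `keep_recent` messages is eligible for compaction.
    cut = max(len(history) - keep_recent, 0)
    return history[:cut], history[cut:]


def naive_compact(history: list[Message], keep_recent: int) -> list[Message]:
    # Fold all older messages into one summary, constraints included.
    old, recent = split_history(history, keep_recent)
    return ([summarize(old)] if old else []) + recent


def pinned_compact(history: list[Message], keep_recent: int) -> list[Message]:
    # Same as naive_compact, but pinned messages survive verbatim.
    old, recent = split_history(history, keep_recent)
    pinned = [m for m in old if m.pinned]
    rest = [m for m in old if not m.pinned]
    return pinned + ([summarize(rest)] if rest else []) + recent


history = [
    Message("Do not deploy this branch.", pinned=True),
    Message("Refactor the config loader."),
    Message("Run the test suite."),
    Message("Fix the failing test."),
]
print([m.text for m in naive_compact(history, keep_recent=2)])
print([m.text for m in pinned_compact(history, keep_recent=2)])
```

After naive compaction, the "Do not deploy" line is gone and nothing in the remaining context signals that it ever existed. That is the silent part of the failure mode. The pinned variant trades a little context budget for keeping such constraints intact.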

Test plan

Verify both case study files follow the existing template format (Incident Overview, What Happened, Root Cause Analysis, Impact, Mitigation, Lessons Learned)
Verify README entries link correctly to the new case study files
Verify failure mode file entries link correctly to the new case studies
Verify no broken internal links
Verify content is consistent with the repo's style and tone
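The structural checks in this test plan can be automated. The minimal Python sketch below makes several assumptions. It uses the two case-study file names from this PR. It treats a required section as present if its name appears in any Markdown heading, at any level. It only checks inline `[text](target)` links, so reference-style links and `#fragment` anchors are not validated. `check_repo` and the other names are hypothetical helpers, not part of the repo. The style-and-tone check still needs a human.

```python
import re
from pathlib import Path

# Case-study files added by this PR.
CASE_STUDIES = [
    "claude-code-context-compaction.md",
    "claude-code-ci-quota-exhaustion.md",
]

# Template sections named in the test plan.
REQUIRED_SECTIONS = [
    "Incident Overview",
    "What Happened",
    "Root Cause Analysis",
    "Impact",
    "Mitigation",
    "Lessons Learned",
]

# Inline Markdown links: [text](target). Reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets with a URL scheme (https:, mailto:, ...) are treated as external.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def missing_sections(md_text: str) -> list[str]:
    """Return required sections that do not appear in any heading."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in md_text.splitlines()
        if line.startswith("#")
    ]
    return [s for s in REQUIRED_SECTIONS if not any(s.lower() in h for h in headings)]


def broken_links(md_path: Path) -> list[str]:
    """Return relative link targets in md_path whose files do not exist."""
    broken = []
    for target in LINK_RE.findall(md_path.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        file_part = target.split("#", 1)[0]  # drop any "#section" fragment
        if not (md_path.parent / file_part).exists():
            broken.append(target)
    return broken


def check_repo(repo: Path) -> list[str]:
    """Run both checks and return human-readable problem descriptions."""
    problems = []
    case_dir = repo / "docs" / "case-studies"
    for name in CASE_STUDIES:
        case = case_dir / name
        if not case.exists():
            problems.append(f"{case}: file not found")
            continue
        for section in missing_sections(case.read_text(encoding="utf-8")):
            problems.append(f"{case}: missing section {section!r}")
    for md in sorted(repo.rglob("*.md")):
        for target in broken_links(md):
            problems.append(f"{md}: broken link {target}")
    return problems
```

Run from the repo root, `check_repo(Path("."))` returns an empty list when every automated item passes.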

🤖 Generated with Claude Code

ofermend · 2026-04-02T14:48:35Z

Thanks for this additional contribution: the failure modes you're documenting here (context compaction state loss, CI resource exhaustion) are genuinely valuable and under-documented.

As with the other PR, can you please document this in your personal blog and link to it? Since these are first-person accounts from a private repo, we're accepting your blog post as the external source in this case as well. Of course, if other external sources document similar failure modes, that would be better.
As with the other PR, please use <br> tags between metadata fields so that the case study renders well as markdown (see the other case studies), and align the field structure with theirs.
The "context compaction" case overlaps with the existing "OpenClaw email deletion" case study, although yours adds the angle of state loss happening gradually mid-session (vs. OpenClaw's single-task failure), and the "near-miss" framing is distinct. Do you think we can merge them to avoid duplication (for example, by adding these nuanced risks to the OpenClaw case study)? If not, please explicitly acknowledge the OpenClaw precedent and clarify what new insight this case adds.
If there is a more specific date you can reference (as opposed to just "2025"), that would be great.
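For context on the `<br>` request: in CommonMark, consecutive lines of text form a single paragraph, and a plain line break inside a paragraph renders as a space, so `**Agent**: value` and `**Operator**: value` on adjacent lines run together. A trailing `<br>` (or two trailing spaces, or a backslash) forces a hard line break. The minimal Python sketch below appends `<br>` to every bold metadata field line except the last. It assumes fields are written as `**Field**: value` at the start of a line, as in the diff Copilot quotes. `add_br_tags` is a hypothetical helper, not a repo tool.

```python
import re

# A metadata field line such as "**Agent**: Claude Code (...)".
FIELD_RE = re.compile(r"^\*\*[^*]+\*\*:")


def add_br_tags(block: str) -> str:
    """Append <br> to every metadata field line except the last one."""
    lines = block.splitlines()
    field_rows = [i for i, line in enumerate(lines) if FIELD_RE.match(line)]
    for i in field_rows[:-1]:  # "between" fields: leave the final field alone
        stripped = lines[i].rstrip()
        if not stripped.endswith("<br>"):
            lines[i] = stripped + "<br>"
    return "\n".join(lines)


overview = "**Agent**: Claude Code\n**Operator**: Solo developer\n**Date**: October 2025"
print(add_br_tags(overview))
```

The function is idempotent, so running it over a file that already has the tags changes nothing.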

- Add <br> tags between metadata fields in case study overviews for proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025 for CI quota exhaustion, September 2025 through March 2026 for context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study overlap in context compaction doc, with a new section explaining how the two are complementary (acute single-event vs. chronic recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary public record (no blog posts exist for these; they are documented from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add <br> tags between metadata fields in case study overviews for proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025 for CI quota exhaustion, September 2025 through March 2026 for context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study overlap in context compaction doc, with a new section explaining how the two are complementary (acute single-event vs. chronic recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary public record (no blog posts exist for these; they are documented from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>

Copilot

Pull request overview

Adds two new operator-reported Claude Code production case studies and links them into the repo’s “Autonomous Agent Failures” index and relevant failure-mode pages.

Changes:

Added two new case study documents covering (1) context compaction state loss and (2) CI minutes exhaustion via fix-push-fail loops.
Linked the new case studies from the README and added short “Real-World Examples” entries under relevant failure-mode docs.
Cross-referenced the new case studies from plan-generation, tool-use, and verification/termination failure-mode pages.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


File	Description
docs/failure-modes/verification-termination.md	Adds a new “Real-World Example” entry referencing the context compaction case study.
docs/failure-modes/tool-use.md	Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (tool selection framing).
docs/failure-modes/plan-generation.md	Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (planning/loop framing).
docs/case-studies/claude-code-context-compaction.md	New case study documenting context compaction dropping operational state and safety constraints over long sessions.
docs/case-studies/claude-code-ci-quota-exhaustion.md	New case study documenting CI minutes exhaustion via iterative push/CI verification loops.
README.md	Adds two new bullets under “Autonomous Agent Failures” linking to the new case studies.


Copilot · 2026-04-14T17:14:13Z

+**Agent**: Claude Code (CLI tool, Claude Opus model)<br>
+**Operator**: Solo developer, private monorepo with 12+ projects<br>


Inconsistent description of the product: the Incident Overview says Claude Code is a “CLI tool”, but the narrative attributes “context compaction” to the VS Code extension. Please clarify which surface is involved (VS Code extension vs CLI), or describe it as “Claude Code (VS Code extension + CLI)” so the case study isn’t internally contradictory.

Copilot · 2026-04-14T17:14:14Z

+- [Context Compaction Drops Operational State](docs/case-studies/claude-code-context-compaction.md) - Claude Code's context window compression silently discards safety constraints and task state mid-session, causing the agent to resume with confidence on incomplete information.
+- [CI Quota Exhaustion via Fix-Push-Fail Loop](docs/case-studies/claude-code-ci-quota-exhaustion.md) - Agent used remote CI as its linting tool, pushing 8+ incremental commits that burned 2,000/2,000 GitHub Actions minutes and locked out the entire organization.


PR description says these are case studies from 2024–2025, but the added case study date range includes March 2026. Please reconcile the PR description with the actual dates in the linked case studies (either update the description year range or adjust the case study framing).

Address Copilot inline review feedback (PR vectara#28, 2026-04-14): the Agent metadata said "CLI tool" but the narrative explicitly references the VS Code extension performing context compaction. Updated metadata to "VS Code extension" to match the narrative. The CI quota exhaustion case is surface-agnostic and stays as-is. Co-Authored-By: Tadao <tadao@travisfixes.com>

Two new first-person operator case studies from production Claude Code use:

1. Context Compaction Drops Operational State: documents how lossy context window summarization silently discards safety constraints and task state, with a near-miss on data exposure when "do not deploy" context was lost.
2. CI Quota Exhaustion via Fix-Push-Fail Loop: documents an agent using remote CI as its linting tool (8+ incremental pushes) instead of local equivalents, burning 2,000/2,000 GitHub Actions minutes and locking out the entire organization.

Both cases include README entries under Autonomous Agent Failures and example entries in the relevant failure mode files (verification-termination, plan-generation, tool-use).

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>
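The 2,000-minute figure bounds the cost of each loop iteration. Assuming each push triggered the workflow once, 8 pushes exhausting the quota means an average of 2,000 / 8 = 250 billed minutes per push, and less if there were more pushes. The Python sketch below shows how per-push cost adds up under GitHub's long-standing billing rules for included minutes on private repositories: each job is rounded up to a whole minute, then multiplied by 1 for Linux, 2 for Windows, and 10 for macOS runners. Treat these rules as an assumption and check GitHub's current billing documentation. The job list in the example is hypothetical; the case study does not describe its workflow.

```python
import math

# Assumed per-OS multipliers for included minutes (verify against current docs).
MULTIPLIERS = {"linux": 1, "windows": 2, "macos": 10}


def billed_minutes(jobs: list[tuple[str, float]]) -> int:
    """Billed minutes for one workflow run.

    Each job's wall-clock minutes are rounded up to a whole minute and then
    multiplied by the multiplier for its runner OS.
    """
    return sum(math.ceil(minutes) * MULTIPLIERS[os_name] for os_name, minutes in jobs)


def pushes_to_exhaust(quota: int, minutes_per_push: int) -> int:
    """Number of pushes (one workflow run each) that use up the quota."""
    return math.ceil(quota / minutes_per_push)


# Hypothetical workflow: lint and tests on Linux, tests on macOS.
run = [("linux", 2.3), ("linux", 11.5), ("macos", 14.2)]
per_push = billed_minutes(run)
print(per_push, pushes_to_exhaust(2000, per_push))  # prints: 165 13
```

Running the same lint and test commands locally before pushing consumes none of these minutes. That is why the case study frames remote CI as the wrong verification tool for a tight fix-and-retry loop.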



Adds the "When Agents Fail" blog post URL to the Source metadata and a References section in both case studies, as requested by @ofermend in PR review. Co-Authored-By: Tadao <tadao@travisfixes.com>


travisbreaks · 2026-05-01T21:29:27Z

Addressed Copilot's two inline notes (1e3d134): metadata in the context-compaction case now says VS Code extension; PR description year range updated to 2024-2026. @ofermend, the rest of your review was addressed in 47b451a and e19851e (br tags, dates, OpenClaw section, blog link, References). Sibling #27 merged yesterday. Rebased on current main.

travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 33e69f2 to a379cc9 on March 19, 2026 20:25

travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 0421e48 to b484713 on April 5, 2026 01:03

ofermend requested a review from Copilot April 14, 2026 17:11

Copilot started reviewing on behalf of ofermend April 14, 2026 17:11

Copilot AI reviewed Apr 14, 2026

travisbreaks and others added 4 commits May 1, 2026 16:00

fix: add blog link and References section per maintainer request

e19851e

travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 8b824b6 to 1e3d134 on May 1, 2026 21:23

travisbreaks wants to merge 4 commits into vectara:main from travisbreaks:feature/case-3-and-5-operator-failures


3 participants
