
Add context compaction and CI quota exhaustion case studies #28

Open
travisbreaks wants to merge 4 commits into vectara:main from travisbreaks:feature/case-3-and-5-operator-failures


@travisbreaks travisbreaks commented Mar 17, 2026


Summary

  • Adds two new first-person operator case studies from production Claude Code use (2024-2026)
  • Context Compaction Drops Operational State: lossy context window summarization silently discards safety constraints mid-session, with a near-miss on data exposure
  • CI Quota Exhaustion via Fix-Push-Fail Loop: agent uses remote CI as its linting tool instead of local equivalents, burning all 2,000 GitHub Actions free-tier minutes in one session
  • Adds README entries under "Autonomous Agent Failures" for both cases
  • Adds example entries to three failure mode files: verification-termination.md, plan-generation.md, tool-use.md
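
The fix-push-fail loop in the second case study can be contrasted with a local-first gate on pushes. The following is a minimal sketch, not the mitigation from the case study itself: `PushBudget`, `should_push`, the 100-minute session cap, and the 40-minute per-run estimate are all hypothetical. Only the 2,000-minute free-tier figure comes from this PR.

```python
from dataclasses import dataclass


@dataclass
class PushBudget:
    """Remote CI minutes an agent session may still spend (hypothetical helper)."""
    remaining_minutes: int
    est_minutes_per_run: int  # assumed cost of one CI run; not from the case study

    def can_afford_run(self) -> bool:
        return self.remaining_minutes >= self.est_minutes_per_run

    def record_run(self) -> None:
        self.remaining_minutes -= self.est_minutes_per_run


def should_push(local_checks_passed: bool, budget: PushBudget) -> bool:
    """Allow a push only when local checks pass and the budget covers one CI run."""
    if not local_checks_passed:
        return False  # keep fixing locally instead of asking remote CI
    if not budget.can_afford_run():
        return False  # budget exhausted: stop and escalate to the operator
    budget.record_run()
    return True


# A session capped at 100 of the 2,000 free-tier minutes (the cap is an assumption).
budget = PushBudget(remaining_minutes=100, est_minutes_per_run=40)
decisions = [should_push(ok, budget) for ok in (False, True, True, True)]
print(decisions, budget.remaining_minutes)
```

The design choice here is that the budget is decremented before the push happens, so a session that keeps failing remotely runs out of pushes long before it can drain an organization-wide quota.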

Test plan

  • Verify both case study files follow the existing template format (Incident Overview, What Happened, Root Cause Analysis, Impact, Mitigation, Lessons Learned)
  • Verify README entries link correctly to the new case study files
  • Verify failure mode file entries link correctly to the new case studies
  • Verify no broken internal links
  • Verify content is consistent with the repo's style and tone
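
The link checks in the test plan could be automated along these lines. This is a rough sketch under stated assumptions: `broken_relative_links` is a hypothetical helper, it only handles inline `[text](target)` links (not reference-style links), and the demo builds a throwaway directory that mimics the paths named in this PR rather than reading the real repo.

```python
import re
import tempfile
from pathlib import Path

# Matches inline markdown links like [text](target); reference-style links
# are not handled by this sketch.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_relative_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    broken = []
    for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue  # external links and in-page anchors are not checked here
        path_part = target.split("#", 1)[0]  # ignore any trailing #fragment
        if not (md_file.parent / path_part).exists():
            broken.append(target)
    return broken


# Demo: one case study file exists, the other is deliberately missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    case_dir = root / "docs" / "case-studies"
    case_dir.mkdir(parents=True)
    (case_dir / "claude-code-context-compaction.md").write_text("# stub\n", encoding="utf-8")
    readme = root / "README.md"
    readme.write_text(
        "- [Context Compaction](docs/case-studies/claude-code-context-compaction.md)\n"
        "- [CI Quota](docs/case-studies/claude-code-ci-quota-exhaustion.md)\n",
        encoding="utf-8",
    )
    result = broken_relative_links(readme)

print(result)
```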

🤖 Generated with Claude Code

@travisbreaks travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 33e69f2 to a379cc9 Compare March 19, 2026 20:25

ofermend commented Apr 2, 2026


Thanks for this additional contribution — the failure modes you're documenting here (context compaction state loss, CI resource exhaustion) are genuinely valuable and under-documented.

  1. As with the other PR, can you please document this on your personal blog and link to it? Since these are first-person accounts from a private repo, we're accepting your blog post as the external source in this case as well. If other external sources document similar failure modes, citing those would be even better.

  2. As with the other PR, please use `<br>` tags between metadata fields so that the case study renders well as markdown (see other case studies), and align with the field structure in other case studies.

  3. The "context compaction" case overlaps with the existing "OpenClaw email deletion case study", although yours adds the angle of state loss happening gradually mid-session (vs. OpenClaw's single-task failure), and the "near-miss" framing is distinct. Do you think we can merge them to avoid duplication (for example, by adding these additional risks to the OpenClaw case study)? If not, please explicitly acknowledge the OpenClaw precedent and clarify what new insight this case adds.

  4. If there is a more specific date you can reference that would be great (as opposed to just "2025")
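
The gradual mid-session state loss discussed in point 3 can be illustrated with a toy model. This sketch is not how Claude Code performs compaction: the `Session` class, its `pinned` list, and the truncation-based `compact` method are hypothetical stand-ins. The point is only that a constraint stored in the summarizable history can disappear, while one stored outside it cannot.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy agent session: pinned constraints live outside the compactable history."""
    pinned: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def compact(self, keep_last: int) -> None:
        """Lossy stand-in for summarization: keep only the newest messages (keep_last >= 1)."""
        self.history = self.history[-keep_last:]

    def context(self) -> list[str]:
        """Pinned constraints always precede whatever history survived compaction."""
        return [f"[CONSTRAINT] {c}" for c in self.pinned] + self.history


session = Session()
session.pinned.append("do not deploy")
session.history += ["do not deploy", "edit config", "run tests", "fix lint"]

session.compact(keep_last=2)
ctx = session.context()
print(ctx)
```

After compaction the in-history copy of "do not deploy" is gone, but the pinned copy is still the first item the agent sees.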

travisbreaks added a commit to travisbreaks/awesome-agent-failures that referenced this pull request Apr 5, 2026
- Add <br> tags between metadata fields in case study overviews for
  proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025
  for CI quota exhaustion, September 2025 through March 2026 for
  context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study
  overlap in context compaction doc, with a new section explaining
  how the two are complementary (acute single-event vs. chronic
  recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary
  public record (no blog posts exist for these; they are documented
  from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@travisbreaks travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 0421e48 to b484713 Compare April 5, 2026 01:03
@ofermend ofermend requested a review from Copilot April 14, 2026 17:11

Copilot AI left a comment


Pull request overview

Adds two new operator-reported Claude Code production case studies and links them into the repo’s “Autonomous Agent Failures” index and relevant failure-mode pages.

Changes:

  • Added two new case study documents covering (1) context compaction state loss and (2) CI minutes exhaustion via fix-push-fail loops.
  • Linked the new case studies from the README and added short “Real-World Examples” entries under relevant failure-mode docs.
  • Cross-referenced the new case studies from plan-generation, tool-use, and verification/termination failure-mode pages.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `docs/failure-modes/verification-termination.md` | Adds a new “Real-World Example” entry referencing the context compaction case study. |
| `docs/failure-modes/tool-use.md` | Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (tool selection framing). |
| `docs/failure-modes/plan-generation.md` | Adds a new “Real-World Example” entry referencing the CI quota exhaustion case study (planning/loop framing). |
| `docs/case-studies/claude-code-context-compaction.md` | New case study documenting context compaction dropping operational state and safety constraints over long sessions. |
| `docs/case-studies/claude-code-ci-quota-exhaustion.md` | New case study documenting CI minutes exhaustion via iterative push/CI verification loops. |
| `README.md` | Adds two new bullets under “Autonomous Agent Failures” linking to the new case studies. |


Comment on lines +5 to +6
**Agent**: Claude Code (CLI tool, Claude Opus model)<br>
**Operator**: Solo developer, private monorepo with 12+ projects<br>

Copilot AI Apr 14, 2026


Inconsistent description of the product: the Incident Overview says Claude Code is a “CLI tool”, but the narrative attributes “context compaction” to the VS Code extension. Please clarify which surface is involved (VS Code extension vs CLI), or describe it as “Claude Code (VS Code extension + CLI)” so the case study isn’t internally contradictory.

Comment thread README.md
Comment on lines +71 to +72
- [Context Compaction Drops Operational State](docs/case-studies/claude-code-context-compaction.md) - Claude Code's context window compression silently discards safety constraints and task state mid-session, causing the agent to resume with confidence on incomplete information.
- [CI Quota Exhaustion via Fix-Push-Fail Loop](docs/case-studies/claude-code-ci-quota-exhaustion.md) - Agent used remote CI as its linting tool, pushing 8+ incremental commits that burned 2,000/2,000 GitHub Actions minutes and locked out the entire organization.

Copilot AI Apr 14, 2026


PR description says these are case studies from 2024–2025, but the added case study date range includes March 2026. Please reconcile the PR description with the actual dates in the linked case studies (either update the description year range or adjust the case study framing).

travisbreaks added a commit to travisbreaks/awesome-agent-failures that referenced this pull request May 1, 2026
Address Copilot inline review feedback (PR vectara#28, 2026-04-14):
the Agent metadata said "CLI tool" but the narrative explicitly
references the VS Code extension performing context compaction.
Updated metadata to "VS Code extension" to match the narrative.

The CI quota exhaustion case is surface-agnostic and stays as-is.

Co-Authored-By: Tadao <tadao@travisfixes.com>
travisbreaks and others added 4 commits May 1, 2026 16:00
Two new first-person operator case studies from production Claude Code use:

1. Context Compaction Drops Operational State: documents how lossy context
   window summarization silently discards safety constraints and task state,
   with a near-miss on data exposure when "do not deploy" context was lost.

2. CI Quota Exhaustion via Fix-Push-Fail Loop: documents an agent using
   remote CI as its linting tool (8+ incremental pushes) instead of local
   equivalents, burning 2,000/2,000 GitHub Actions minutes and locking out
   the entire organization.

Both cases include README entries under Autonomous Agent Failures and
example entries in the relevant failure mode files (verification-termination,
plan-generation, tool-use).

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>
- Add <br> tags between metadata fields in case study overviews for
  proper markdown rendering (matching existing case study format)
- Update dates from generic "2025" to specific months: October 2025
  for CI quota exhaustion, September 2025 through March 2026 for
  context compaction (recurring)
- Add explicit acknowledgment of OpenClaw email deletion case study
  overlap in context compaction doc, with a new section explaining
  how the two are complementary (acute single-event vs. chronic
  recurring state loss)
- Update failure-mode entry headings with specific dates
- Source fields note first-person operational accounts as primary
  public record (no blog posts exist for these; they are documented
  from private repo usage)

Co-Authored-By: Claude Opus 4.6 (1M context) <tadao@travisfixes.com>
Adds the "When Agents Fail" blog post URL to the Source metadata and
a References section in both case studies, as requested by @ofermend
in PR review.

Co-Authored-By: Tadao <tadao@travisfixes.com>
Address Copilot inline review feedback (PR vectara#28, 2026-04-14):
the Agent metadata said "CLI tool" but the narrative explicitly
references the VS Code extension performing context compaction.
Updated metadata to "VS Code extension" to match the narrative.

The CI quota exhaustion case is surface-agnostic and stays as-is.

Co-Authored-By: Tadao <tadao@travisfixes.com>
@travisbreaks travisbreaks force-pushed the feature/case-3-and-5-operator-failures branch from 8b824b6 to 1e3d134 Compare May 1, 2026 21:23
@travisbreaks


Addressed Copilot's two inline notes (1e3d134): metadata in the context-compaction case now says VS Code extension; PR description year range updated to 2024-2026. @ofermend, the rest of your review was addressed in 47b451a and e19851e (br tags, dates, OpenClaw section, blog link, References). Sibling #27 merged yesterday. Rebased on current main.

