feat: self-healing -- oompa monitors its own logs for recurring errors and creates fix issues #166

## Summary

Add a self-healing capability where oompa periodically inspects its own structured logs for recurring errors, classifies them, checks for existing duplicate issues, and creates new issues when it detects novel problems. Issues are created **without** the `good-for-ai` label -- they require human triage before oompa attempts to fix them.

## Motivation

Several bugs have gone undetected for days because oompa silently retries failing operations without escalating:

- **Corrupted git worktree** on hypershift caused a review retry loop for 5 days, burning ~$2,000-4,000 in agent invocations (issue #164)
- **Stale review cursor** caused the same reviews to be re-processed every 2 minutes indefinitely
- **Git amend failures** repeated every cycle without any escalation

Oompa already logs these errors via slog, but nobody reads the logs proactively. Oompa should watch its own logs and raise issues when it detects persistent problems.

## Proposed Design

### New role: `self-check`

A new role that runs on a schedule (e.g., every 30 minutes or hourly) and:

1. **Scans recent logs** for error/warn-level entries using `journalctl --user -u oompa.service --since "1h ago"` or the in-memory event ring buffer
2. **Groups errors by pattern** -- same error message on the same project/PR repeating N times is one problem, not N problems
3. **Classifies severity**:
   - **Critical**: same error repeating every poll cycle (retry loop) -- e.g., git corruption, cursor not advancing
   - **Warning**: intermittent errors (happen sometimes but not every cycle)
   - **Info**: one-off errors that self-resolved
4. **Checks for duplicates** -- before creating an issue, search existing open issues on `qinqon/oompa` for the same error pattern. Use the error message signature (first line, stripped of variable parts like SHAs and timestamps) as the search key
5. **Creates an issue** (without `good-for-ai` label) with:
   - Error pattern and frequency
   - Affected project/PR
   - Sample log lines
   - Suggested severity
   - Duration of the problem (first occurrence → latest occurrence)
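Step 2 above (grouping by pattern) can be sketched in Go as follows. This is a minimal illustration, not oompa's implementation: `LogEntry`, `Group`, `GroupEntries`, and `sampleEntries` are hypothetical names, and the sketch assumes log entries have already been parsed into structs.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// LogEntry is a hypothetical, already-parsed log record; oompa's real
// log schema may differ.
type LogEntry struct {
	Time    time.Time
	Level   string // "ERROR" or "WARN"
	Message string
	Project string
	PR      int
}

// groupKey identifies one problem: same message on the same project/PR.
type groupKey struct {
	Message string
	Project string
	PR      int
}

// Group summarizes all occurrences of one problem.
type Group struct {
	Key       groupKey
	Count     int
	FirstSeen time.Time
	LastSeen  time.Time
}

// GroupEntries collapses repeated entries into one Group per key and
// drops groups seen fewer than minRepeat times.
func GroupEntries(entries []LogEntry, minRepeat int) []Group {
	byKey := make(map[groupKey]*Group)
	for _, e := range entries {
		k := groupKey{Message: e.Message, Project: e.Project, PR: e.PR}
		g, ok := byKey[k]
		if !ok {
			byKey[k] = &Group{Key: k, Count: 1, FirstSeen: e.Time, LastSeen: e.Time}
			continue
		}
		g.Count++
		if e.Time.Before(g.FirstSeen) {
			g.FirstSeen = e.Time
		}
		if e.Time.After(g.LastSeen) {
			g.LastSeen = e.Time
		}
	}
	var out []Group
	for _, g := range byKey {
		if g.Count >= minRepeat {
			out = append(out, *g)
		}
	}
	// Most frequent problems first.
	sort.Slice(out, func(i, j int) bool { return out[i].Count > out[j].Count })
	return out
}

// sampleEntries builds three identical errors two minutes apart plus
// one unrelated one-off warning.
func sampleEntries() []LogEntry {
	base := time.Date(2026, 5, 11, 5, 51, 30, 0, time.UTC)
	var entries []LogEntry
	for i := 0; i < 3; i++ {
		entries = append(entries, LogEntry{
			Time: base.Add(time.Duration(i) * 2 * time.Minute), Level: "ERROR",
			Message: "failed to amend commit", Project: "openshift/hypershift", PR: 8365,
		})
	}
	return append(entries, LogEntry{
		Time: base, Level: "WARN", Message: "push failed",
		Project: "openshift/hypershift", PR: 8365,
	})
}

func main() {
	for _, g := range GroupEntries(sampleEntries(), 3) {
		fmt.Printf("%q x%d over %s\n", g.Key.Message, g.Count, g.LastSeen.Sub(g.FirstSeen))
	}
}
```

With `minRepeat` set to 3, the one-off warning is dropped and the three amend failures are reported as a single problem with its first and latest occurrence.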

### What it should detect

| Pattern | Example | Severity |
|---------|---------|----------|
| Same error every cycle | `failed to amend commit` on every poll | Critical |
| Agent invoked but no cursor advance | Review loop on stale reviews | Critical |
| Worktree operation failures | `git worktree add` failing repeatedly | Critical |
| Cost anomaly | Single PR consuming >$10 in a session | Critical |
| Intermittent API errors | GitHub API rate limit or 5xx | Warning |
| Agent timeouts | Agent killed after timeout | Warning |
| One-off failures | Single failed push that succeeds on retry | Info (ignore) |
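One possible way to turn the table into code is sketched below. `Stats` and `Classify` are hypothetical names, and the rule that "every cycle" means at least 90% of poll cycles is an assumed tolerance, not something this proposal specifies.

```go
package main

import "fmt"

// Severity mirrors the three levels used in the table above.
type Severity string

const (
	Critical Severity = "Critical"
	Warning  Severity = "Warning"
	Info     Severity = "Info"
)

// Stats is a hypothetical per-pattern summary for one lookback window.
type Stats struct {
	Count   int     // occurrences of the pattern in the window
	Cycles  int     // poll cycles that ran in the same window
	CostUSD float64 // agent spend attributed to the PR in this session
}

// Classify applies the table's rules in order of priority.
func Classify(s Stats, minRepeat int, costThreshold float64) Severity {
	switch {
	case s.CostUSD > costThreshold:
		return Critical // cost anomaly
	case s.Count < minRepeat:
		return Info // one-off, or too rare to count as a pattern
	case s.Cycles > 0 && s.Count*10 >= s.Cycles*9:
		// Assumed tolerance: failing in at least 90% of cycles
		// is treated as "every cycle", i.e. a retry loop.
		return Critical
	default:
		return Warning // repeats, but not on every cycle
	}
}

func main() {
	fmt.Println(Classify(Stats{Count: 30, Cycles: 30}, 3, 10.0))               // Critical
	fmt.Println(Classify(Stats{Count: 5, Cycles: 30}, 3, 10.0))                // Warning
	fmt.Println(Classify(Stats{Count: 1, Cycles: 30}, 3, 10.0))                // Info
	fmt.Println(Classify(Stats{Count: 1, Cycles: 30, CostUSD: 12.5}, 3, 10.0)) // Critical
}
```

The "agent invoked but no cursor advance" row is not covered here, since detecting it needs cursor state that plain error counts do not carry.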

### Config

```yaml
# In the oompa project config or as a global setting
self-check:
  schedule: "*/30 * * * *"  # every 30 minutes, or "hourly"
  lookback: 1h              # how far back to scan logs
  min-repeat: 3             # minimum repetitions to consider it a pattern
  cost-threshold: 10.0      # alert if single PR costs more than this
```
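On the Go side, the section could map to a struct along these lines. This is a sketch under assumptions: `SelfCheckConfig`, its `yaml` tags, and `Validate` are hypothetical, and the actual YAML decoding (which needs a third-party library) is left out so the example stays within the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SelfCheckConfig mirrors the YAML keys above. The struct and its tags
// are an assumption about how oompa might load the section.
type SelfCheckConfig struct {
	Schedule      string  `yaml:"schedule"`
	Lookback      string  `yaml:"lookback"`
	MinRepeat     int     `yaml:"min-repeat"`
	CostThreshold float64 `yaml:"cost-threshold"`
}

// DefaultSelfCheckConfig returns the values shown in the YAML example.
func DefaultSelfCheckConfig() SelfCheckConfig {
	return SelfCheckConfig{
		Schedule:      "*/30 * * * *",
		Lookback:      "1h",
		MinRepeat:     3,
		CostThreshold: 10.0,
	}
}

// Validate checks the fields and returns the parsed lookback window.
// The cron expression is only checked for presence, not syntax.
func (c SelfCheckConfig) Validate() (time.Duration, error) {
	if c.Schedule == "" {
		return 0, errors.New("self-check.schedule must be set")
	}
	lookback, err := time.ParseDuration(c.Lookback)
	if err != nil {
		return 0, fmt.Errorf("self-check.lookback: %w", err)
	}
	if lookback <= 0 {
		return 0, errors.New("self-check.lookback must be positive")
	}
	if c.MinRepeat < 1 {
		return 0, errors.New("self-check.min-repeat must be at least 1")
	}
	if c.CostThreshold <= 0 {
		return 0, errors.New("self-check.cost-threshold must be positive")
	}
	return lookback, nil
}

func main() {
	lookback, err := DefaultSelfCheckConfig().Validate()
	fmt.Println(lookback, err)

	bad := DefaultSelfCheckConfig()
	bad.Lookback = "soon"
	_, err = bad.Validate()
	fmt.Println(err != nil)
}
```

Keeping `lookback` as a string in the struct and parsing it in `Validate` lets a value such as `1h` be written in the config without a custom decoder.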

### Issue format

````markdown
## Self-check: recurring error detected

**Pattern:** `failed to amend commit` on openshift/hypershift PR #8365
**Severity:** Critical (repeating every poll cycle)
**First seen:** 2026-05-06T14:00:42Z
**Latest:** 2026-05-11T06:24:04Z (5 days)
**Occurrences:** 3,600+ times
**Estimated cost:** ~$2,000 (agent invoked each cycle at ~$0.60)

### Sample log entries

```
May 11 05:51:30 ERROR failed to amend commit pr=8365 error="git commit --amend: exit status 1 (stderr: error: invalid object...)"
May 11 05:53:30 ERROR failed to amend commit pr=8365 error="git commit --amend: exit status 1 (stderr: error: invalid object...)"
```

### Suggested action

This error indicates a corrupted git worktree. The worktree at `/tmp/oompa-work/openshift/hypershift/worktrees/...` may need to be deleted and recreated.
````
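A rough sketch of rendering this body from collected data is shown below. `Report`, `RenderIssueBody`, and `sampleReport` are hypothetical names; the estimated cost and suggested action sections are omitted because the proposal does not say how they would be derived.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Report holds the fields shown in the issue format above. The type is
// hypothetical and not taken from oompa's code.
type Report struct {
	Pattern     string
	Project     string
	PR          int
	Severity    string
	FirstSeen   time.Time
	LastSeen    time.Time
	Occurrences int
	Samples     []string
}

// RenderIssueBody formats a Report as a Markdown issue body.
func RenderIssueBody(r Report) string {
	fence := strings.Repeat("`", 3) // Markdown code fence
	var b strings.Builder
	b.WriteString("## Self-check: recurring error detected\n\n")
	fmt.Fprintf(&b, "**Pattern:** `%s` on %s PR #%d\n", r.Pattern, r.Project, r.PR)
	fmt.Fprintf(&b, "**Severity:** %s\n", r.Severity)
	fmt.Fprintf(&b, "**First seen:** %s\n", r.FirstSeen.UTC().Format(time.RFC3339))
	fmt.Fprintf(&b, "**Latest:** %s\n", r.LastSeen.UTC().Format(time.RFC3339))
	fmt.Fprintf(&b, "**Occurrences:** %d\n\n", r.Occurrences)
	fmt.Fprintf(&b, "### Sample log entries\n\n%s\n", fence)
	for _, s := range r.Samples {
		b.WriteString(s + "\n")
	}
	b.WriteString(fence + "\n")
	return b.String()
}

// sampleReport reuses the values from the example issue above.
func sampleReport() Report {
	return Report{
		Pattern:     "failed to amend commit",
		Project:     "openshift/hypershift",
		PR:          8365,
		Severity:    "Critical (repeating every poll cycle)",
		FirstSeen:   time.Date(2026, 5, 6, 14, 0, 42, 0, time.UTC),
		LastSeen:    time.Date(2026, 5, 11, 6, 24, 4, 0, time.UTC),
		Occurrences: 3600,
		Samples: []string{
			`May 11 05:51:30 ERROR failed to amend commit pr=8365 error="git commit --amend: exit status 1 (stderr: error: invalid object...)"`,
		},
	}
}

func main() {
	fmt.Print(RenderIssueBody(sampleReport()))
}
```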

### Why no `good-for-ai` label?

Self-detected issues should be triaged by a human before oompa attempts to fix them. Reasons:

- The fix might require infrastructure changes (deleting worktrees, restarting the service)
- Some errors might be expected/transient and don't need a code fix
- Oompa shouldn't create an infinite loop of self-fixing (detect error → create issue → pick up issue → fail → detect error...)
- A human can add `good-for-ai` after reviewing, if the fix is appropriate for autonomous implementation

### Duplicate detection

Before creating an issue, search for duplicates:

```bash
gh search issues --repo qinqon/oompa "self-check: failed to amend commit" --state open
```

If a matching open issue exists:
- Add a comment with the latest occurrence count and time range
- Don't create a duplicate
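The search key could be derived roughly as follows. This is a sketch, assuming RFC 3339 timestamps and lowercase hex SHAs; `Signature` and `SearchArgs` are hypothetical helpers, the `<ts>` and `<sha>` placeholders are arbitrary, and the sample messages in `main` are made up for illustration. Note that the search only finds earlier issues if the same key is also written into the issue title or body when the issue is created.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// RFC 3339 timestamps such as 2026-05-11T06:24:04Z.
	timestampRE = regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})`)
	// Lowercase hex runs of 7 to 40 characters, the usual lengths of
	// abbreviated and full Git SHAs. Long decimal numbers match too.
	shaRE = regexp.MustCompile(`\b[0-9a-f]{7,40}\b`)
)

// Signature keeps the first line of msg and replaces variable parts
// with placeholders, so repeated errors map to the same key.
func Signature(msg string) string {
	first, _, _ := strings.Cut(msg, "\n")
	s := timestampRE.ReplaceAllString(first, "<ts>")
	s = shaRE.ReplaceAllString(s, "<sha>")
	return strings.Join(strings.Fields(s), " ") // collapse whitespace
}

// SearchArgs builds the argument list for the gh command shown above.
func SearchArgs(repo, signature string) []string {
	return []string{
		"search", "issues",
		"--repo", repo,
		"--state", "open",
		"self-check: " + signature,
	}
}

func main() {
	// Made-up messages that differ only in SHA and timestamp.
	a := Signature("failed to amend commit: invalid object 3f2a9c1d4e at 2026-05-11T05:51:30Z")
	b := Signature("failed to amend commit: invalid object 9b8c7d6e5f at 2026-05-11T05:53:30Z")
	fmt.Println(a == b)
	fmt.Println(a)
	fmt.Println("gh", strings.Join(SearchArgs("qinqon/oompa", a), " "))
}
```

The PR number is deliberately left in the signature, since the design treats the same error on a different PR as a different problem.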

### Safety constraints

- **Never create issues with `good-for-ai` label** -- humans decide what to auto-fix
- **Rate limit issue creation** -- max 1 issue per error pattern per day
- **Don't alert on known-transient errors** -- configurable ignore patterns (e.g., "context canceled" during shutdown)
- **Don't self-reference** -- if the self-check itself errors, log it but don't create an issue about it (prevents infinite loops)
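The four constraints above could be enforced by a small gate in front of issue creation. The sketch below is illustrative only: `Gate`, `NewDailyGate`, `Allow`, and `SafeLabels` are hypothetical names, the `source` argument assumes each error can be attributed to the role that logged it, and the in-memory map would not survive a restart.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Gate applies the safety constraints before an issue is filed.
// It is not safe for concurrent use.
type Gate struct {
	ignore      []string             // substrings of known-transient errors
	minInterval time.Duration        // minimum gap between issues per pattern
	lastFiled   map[string]time.Time // signature -> when an issue was last filed
}

// NewDailyGate allows at most one issue per pattern per 24 hours.
func NewDailyGate(ignore []string) *Gate {
	return &Gate{ignore: ignore, minInterval: 24 * time.Hour, lastFiled: map[string]time.Time{}}
}

// Allow reports whether an issue for signature may be filed at now.
// source names the role that logged the error. A true result is
// recorded, so later calls within the interval are refused.
func (g *Gate) Allow(signature, source string, now time.Time) bool {
	if source == "self-check" {
		return false // never report the self-check's own failures
	}
	for _, p := range g.ignore {
		if strings.Contains(signature, p) {
			return false // known-transient error
		}
	}
	if last, ok := g.lastFiled[signature]; ok && now.Sub(last) < g.minInterval {
		return false // rate limited
	}
	g.lastFiled[signature] = now
	return true
}

// SafeLabels drops good-for-ai from a label set before issue creation.
func SafeLabels(labels []string) []string {
	out := make([]string, 0, len(labels))
	for _, l := range labels {
		if l != "good-for-ai" {
			out = append(out, l)
		}
	}
	return out
}

// demoStart is an arbitrary fixed time used by the example.
var demoStart = time.Date(2026, 5, 11, 6, 0, 0, 0, time.UTC)

func main() {
	g := NewDailyGate([]string{"context canceled"})
	fmt.Println(g.Allow("failed to amend commit", "review", demoStart))                // true
	fmt.Println(g.Allow("failed to amend commit", "review", demoStart.Add(time.Hour))) // false
	fmt.Println(g.Allow("context canceled", "review", demoStart))                      // false
	fmt.Println(SafeLabels([]string{"bug", "good-for-ai"}))                            // [bug]
}
```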

## Implementation sketch

### Log source options

1. **journalctl** -- parse slog output from systemd journal. Works but requires text parsing
2. **Event ring buffer** -- use the existing event system. Only captures events that are emitted, might miss raw error logs
3. **Dedicated error log** -- add a separate error collector that captures all error/warn slog entries in a structured buffer. Most reliable

Option 3 is recommended -- add a `slog.Handler` wrapper that captures error- and warn-level entries into a ring buffer, alongside the existing text handler.
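A minimal sketch of such a wrapper, using only the standard `log/slog` package (Go 1.21+), might look like this. `CaptureHandler`, `ring`, and `demoLogger` are hypothetical names, and the buffer size of 2 is chosen only to make the eviction visible.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"os"
	"sync"
)

// ring is a fixed-size buffer of the most recent records. It is shared
// by all handlers derived through WithAttrs and WithGroup.
type ring struct {
	mu   sync.Mutex
	buf  []slog.Record
	next int
	full bool
}

func (r *ring) add(rec slog.Record) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = rec
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// snapshot returns a copy of the buffered records, oldest first.
func (r *ring) snapshot() []slog.Record {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]slog.Record(nil), r.buf[:r.next]...)
	}
	out := append([]slog.Record(nil), r.buf[r.next:]...)
	return append(out, r.buf[:r.next]...)
}

// CaptureHandler forwards records to the wrapped handler and keeps a
// copy of WARN-and-above records. Attributes added via Logger.With
// reach the wrapped handler but are not stored in the captured copy.
type CaptureHandler struct {
	inner slog.Handler
	ring  *ring
}

func NewCaptureHandler(inner slog.Handler, size int) *CaptureHandler {
	if size < 1 {
		size = 1
	}
	return &CaptureHandler{inner: inner, ring: &ring{buf: make([]slog.Record, size)}}
}

func (h *CaptureHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return l >= slog.LevelWarn || h.inner.Enabled(ctx, l)
}

func (h *CaptureHandler) Handle(ctx context.Context, rec slog.Record) error {
	if rec.Level >= slog.LevelWarn {
		h.ring.add(rec.Clone())
	}
	if h.inner.Enabled(ctx, rec.Level) {
		return h.inner.Handle(ctx, rec)
	}
	return nil
}

func (h *CaptureHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &CaptureHandler{inner: h.inner.WithAttrs(attrs), ring: h.ring}
}

func (h *CaptureHandler) WithGroup(name string) slog.Handler {
	return &CaptureHandler{inner: h.inner.WithGroup(name), ring: h.ring}
}

// Records returns the captured records, oldest first.
func (h *CaptureHandler) Records() []slog.Record { return h.ring.snapshot() }

// demoLogger wraps a text handler on stderr with a two-slot buffer.
func demoLogger() (*slog.Logger, *CaptureHandler) {
	h := NewCaptureHandler(slog.NewTextHandler(os.Stderr, nil), 2)
	return slog.New(h), h
}

func main() {
	logger, h := demoLogger()
	logger.Info("poll started") // forwarded, not captured
	logger.Error("failed to amend commit", "pr", 8365)
	logger.Warn("agent timeout", "pr", 8365)
	logger.Error("failed to push", "pr", 8365) // evicts the oldest entry
	for _, r := range h.Records() {
		fmt.Println(r.Level, r.Message)
	}
}
```

Because the wrapper delegates to the existing handler, normal log output is unchanged; the self-check role would read `Records()` instead of parsing journal text.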

## Non-goals

- This is NOT about fixing bugs automatically -- it's about detecting and reporting them
- This is NOT a replacement for monitoring/alerting -- it's a lightweight self-awareness feature
- This does NOT modify oompa's behavior when errors are detected -- it only creates issues
