Safeguard CI artifact downloads against disk space exhaustion #183

Description

@qinqon

Problem

Issue #177 (merged as PR #178) improves CI analysis by instructing the agent to download build logs and artifacts. However, for Prow-based projects (hypershift, kubernetes-nmstate, openshift/release), CI artifacts stored in GCS can be massive:

| Prow artifact type | Typical size |
|---|---|
| Build logs | 1-10 MB |
| JUnit XML | 100 KB - 1 MB |
| Must-gather (management cluster) | 100-500 MB |
| Must-gather (hosted cluster) | 100-300 MB |
| Full Prow job artifacts | 200 MB - 1 GB+ |

Oompa runs on a VM where `/tmp` is a 16 GB tmpfs (RAM-backed). Oompa worktrees already consume 3.6 GB. If the agent downloads Prow artifacts for multiple failing checks across multiple projects, it could consume 4+ GB in a single poll cycle. Because tmpfs is backed by RAM, filling it can exhaust memory and crash the system.

Risk scenario

3 Prow projects x 3 failing checks x 500 MB must-gather = 4.5 GB per cycle. Without cleanup, artifacts accumulate.

Proposed Safeguards

1. Only download logs, not full artifacts by default

The prompt should instruct the agent to:

  • Use `gh run view --log-failed` for GitHub Actions (tiny, ~8 KB)
  • Use `gh api .../jobs/ID/logs` for specific job logs (< 2 MB)
  • NOT download full Prow artifacts (must-gather, dump archives, screenshots)
  • Only download JUnit XML and build-log.txt for Prow jobs (< 10 MB total)

2. Download budget per investigation

Cap total downloads per check investigation at 100 MB. If an artifact exceeds the budget, skip it with a log warning.
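A minimal sketch of how such a budget could be tracked, assuming artifact sizes are known before the download starts (for example from an object listing). The `downloadBudget` type and `ErrOverBudget` are hypothetical names for illustration, not existing Oompa code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOverBudget is returned when a download would exceed the cap.
// Hypothetical name, not part of the existing code base.
var ErrOverBudget = errors.New("download would exceed budget")

// downloadBudget tracks the bytes reserved during one check investigation.
type downloadBudget struct {
	limit int64 // maximum total bytes allowed
	used  int64 // bytes reserved so far
}

// reserve records size bytes against the budget. If the artifact does not
// fit, it returns ErrOverBudget and leaves the budget unchanged.
func (b *downloadBudget) reserve(size int64) error {
	if size < 0 || b.used+size > b.limit {
		return ErrOverBudget
	}
	b.used += size
	return nil
}

func main() {
	const mb = 1 << 20
	budget := &downloadBudget{limit: 100 * mb} // 100 MB cap from the proposal

	artifacts := []struct {
		name string
		size int64
	}{
		{"build-log.txt", 10 * mb},
		{"must-gather.tar", 500 * mb},
		{"junit.xml", 1 * mb},
	}
	for _, a := range artifacts {
		if err := budget.reserve(a.size); err != nil {
			fmt.Printf("warning: skipping %s: %v\n", a.name, err)
			continue
		}
		fmt.Printf("ok to download %s\n", a.name)
	}
}
```

Reserving before downloading means an oversized artifact is never written to disk, rather than being detected after the fact.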

3. Cleanup after each investigation

Delete all downloaded artifacts immediately after the agent completes its analysis. Don't accumulate across cycles.
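One way to make the cleanup hard to forget is to scope the download directory to the investigation itself. The sketch below assumes a hypothetical helper `withScratchDir`; it is an illustration of the pattern, not the project's implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withScratchDir creates a temporary directory, hands it to investigate, and
// removes the directory afterwards, even when investigate returns an error.
// Hypothetical helper for illustration.
func withScratchDir(investigate func(dir string) error) (err error) {
	dir, err := os.MkdirTemp("", "ci-artifacts-")
	if err != nil {
		return err
	}
	defer func() {
		// Report a cleanup failure only if the investigation itself succeeded.
		if rmErr := os.RemoveAll(dir); rmErr != nil && err == nil {
			err = rmErr
		}
	}()
	return investigate(dir)
}

// dirExists reports whether path exists and is a directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

func main() {
	var seen string
	err := withScratchDir(func(dir string) error {
		seen = dir
		// Stand-in for a real download plus analysis step.
		return os.WriteFile(filepath.Join(dir, "build-log.txt"), []byte("FAIL"), 0o600)
	})
	fmt.Println("error:", err)
	fmt.Println("directory still exists:", dirExists(seen))
}
```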

4. Disk space check before downloading

Before any artifact download, check available disk space. If below 2 GB available, skip downloads entirely and fall back to analyzing the `CheckRun.Output` text only (current behavior).
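The check could look roughly like the following, assuming a Linux host and using `syscall.Statfs` instead of shelling out to `df`. The names `availableBytes`, `shouldDownload`, and `minFreeBytes` are hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// minFreeBytes is the 2 GB threshold from the proposal.
const minFreeBytes = 2 << 30

// availableBytes reports the space available to unprivileged users on the
// filesystem that contains path. It relies on statfs(2), so it is Unix-only.
func availableBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// shouldDownload fails closed: if the free space cannot be determined, or it
// is below the threshold, downloads are skipped.
func shouldDownload(avail uint64, err error) bool {
	return err == nil && avail >= minFreeBytes
}

func main() {
	avail, err := availableBytes("/tmp")
	if !shouldDownload(avail, err) {
		fmt.Printf("warning: skipping artifact downloads (available=%d bytes, err=%v)\n", avail, err)
		return
	}
	fmt.Printf("ok: %d bytes available\n", avail)
}
```

Treating a failed check as "do not download" keeps the fallback to `CheckRun.Output` as the default whenever the state of the disk is unknown.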

5. Configurable artifact download toggle

Add a config option to control artifact downloading:
```yaml
prs:
  - watch: [6252]
    download-ci-artifacts: false # default: false for safety
```

Only enable for projects where you trust the artifact sizes.

6. GitHub Actions vs Prow distinction

| CI system | Download approach | Risk |
|---|---|---|
| GitHub Actions | `gh run view --log-failed` | Very low (KB) |
| GitHub Actions artifacts | `gh run download` | Low (usually 0, some repos a few MB) |
| Prow build-log.txt | `gcloud storage cp .../build-log.txt` | Low (1-10 MB) |
| Prow JUnit XML | `gcloud storage cp .../junit*.xml` | Low (< 1 MB) |
| Prow must-gather | `gcloud storage cp .../must-gather.tar` | HIGH (100-500 MB) -- never download |
| Prow full artifacts | `gcloud storage cp -r .../artifacts/` | VERY HIGH (1 GB+) -- never download |
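The low-risk Prow rows can be expressed as an allowlist on the object's base name, so that anything not explicitly listed is refused. This is a sketch under that assumption; `allowedProwArtifact` and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// allowedPatterns lists the base-name globs treated as safe to fetch.
// Everything else (must-gather, dumps, whole directories) is refused.
var allowedPatterns = []string{"build-log.txt", "junit*.xml"}

// allowedProwArtifact reports whether the object at objectPath matches one of
// the allowed patterns. Only the final path element is considered.
func allowedProwArtifact(objectPath string) bool {
	base := path.Base(objectPath)
	for _, pattern := range allowedPatterns {
		if ok, err := path.Match(pattern, base); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// Example object paths are made up for illustration.
	for _, p := range []string{
		"gs://example-bucket/job/123/build-log.txt",
		"gs://example-bucket/job/123/artifacts/junit_operator.xml",
		"gs://example-bucket/job/123/artifacts/must-gather.tar",
		"gs://example-bucket/job/123/artifacts/",
	} {
		fmt.Printf("%-60s allowed=%v\n", p, allowedProwArtifact(p))
	}
}
```

An allowlist is the safer default here: a new, unexpectedly large artifact type is skipped until someone adds it deliberately.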

Current State

The merged PR #178 updated the CI analysis prompt to mention `gh run download` and artifact inspection. The prompt doesn't currently have safeguards against downloading massive artifacts. The agent uses its own judgment about what to download, which may lead to downloading must-gather archives on Prow-based projects.

Implementation

Prompt changes (`pkg/agent/prompt.go`)

Add explicit size constraints and prohibitions to the CI analysis prompt:

```
IMPORTANT: Disk space is limited. When downloading CI artifacts:
- NEVER download must-gather archives, dump tarballs, or full artifact directories
- Only download: build-log.txt, junit*.xml, and specific small log files
- If a single file is > 50 MB, skip it
- Clean up all downloaded files when your analysis is complete
- Prefer gh run view --log-failed over gh run download (much smaller)
```

Runtime safeguards (`pkg/agent/ci.go`)

  • Check available disk space (`df`) before any CI investigation that may download artifacts
  • Set a download timeout
  • Clean up `/tmp/ci-artifacts` after each investigation
  • Log a warning if available disk space drops below the 2 GB threshold
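The download timeout from the list above could be enforced with a context deadline, as in this sketch. It assumes the actual fetch honours context cancellation (for example by running the CLI through `exec.CommandContext`); `runWithTimeout`, `exampleTimeout`, and the two stand-in fetchers are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// exampleTimeout is deliberately tiny so the demo finishes quickly; a real
// download limit would be much larger.
const exampleTimeout = 50 * time.Millisecond

// runWithTimeout runs fetch with a context that is cancelled once limit has
// elapsed. fetch is responsible for stopping when ctx is done.
func runWithTimeout(limit time.Duration, fetch func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return fetch(ctx)
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// blockUntilCancelled stands in for a download that never finishes.
func blockUntilCancelled(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// instantFetch stands in for a download that completes immediately.
func instantFetch(ctx context.Context) error {
	return nil
}

func main() {
	err := runWithTimeout(exampleTimeout, blockUntilCancelled)
	fmt.Println("slow fetch timed out:", timedOut(err))
	fmt.Println("fast fetch error:", runWithTimeout(exampleTimeout, instantFetch))
}
```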

Config changes

  • Add `download-ci-artifacts` boolean to PRsRoleConfig (default: false)
  • When false, the prompt tells the agent not to download artifacts at all
  • When true, the prompt allows downloads with the size constraints above
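A sketch of how the toggle might select the prompt text. Only the field proposed in this issue is shown; the real `PRsRoleConfig` has other fields, and the YAML tag, the constant names, and the wording of the no-download prompt are assumptions.

```go
package main

import "fmt"

// PRsRoleConfig shows only the proposed field. The zero value (false) is the
// safe default, so configs that omit the key do not download anything.
type PRsRoleConfig struct {
	DownloadCIArtifacts bool `yaml:"download-ci-artifacts"` // tag name assumed
}

// noDownloadPrompt is illustrative wording for the disabled case.
const noDownloadPrompt = "Do NOT download CI artifacts. Analyze the CheckRun output text only."

// constrainedPrompt repeats the size constraints proposed for the prompt.
const constrainedPrompt = `IMPORTANT: Disk space is limited. When downloading CI artifacts:
- NEVER download must-gather archives, dump tarballs, or full artifact directories
- Only download: build-log.txt, junit*.xml, and specific small log files
- If a single file is > 50 MB, skip it
- Clean up all downloaded files when your analysis is complete
- Prefer gh run view --log-failed over gh run download (much smaller)`

// artifactPromptSection returns the prompt fragment matching the toggle.
func artifactPromptSection(cfg PRsRoleConfig) string {
	if cfg.DownloadCIArtifacts {
		return constrainedPrompt
	}
	return noDownloadPrompt
}

func main() {
	fmt.Println(artifactPromptSection(PRsRoleConfig{}))
	fmt.Println("---")
	fmt.Println(artifactPromptSection(PRsRoleConfig{DownloadCIArtifacts: true}))
}
```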
