15 changes: 14 additions & 1 deletion hyperfleet/docs/release/README.md
@@ -1,7 +1,7 @@
---
Status: Active
Owner: HyperFleet Team
Last Updated: 2026-05-11
Last Updated: 2026-06-22
---

# Release Documentation
@@ -17,6 +17,19 @@ Last Updated: 2026-05-11
| [ADR 0014](../../adrs/0014-konflux-build-and-release.md) | Decision record for adopting Konflux |
| [ADR 0016](../../adrs/0016-helm-oci-distribution.md) | Decision record for Helm OCI distribution |

## Operations

The `operations/` subdirectory contains engineer-facing operational docs: what to read *during* a release or *when something fails*.

| Document | Purpose |
|----------|---------|
| [Release Runbook](./operations/release-runbook.md) | Copy-paste command sequence for RC → GA, fix cycle, and hotfix |
| [Pipeline Anatomy](./operations/pipeline-anatomy.md) | Reading a Konflux PipelineRun, the build-vs-release distinction, where to look in the UI |
| [Debugging](./operations/debugging.md) | Failure-mode runbook organized by symptom |
| [Configuration Map](./operations/configuration-map.md) | Every release-related config file across the six repos — what it does, who reviews it |
| [Notifications](./operations/notifications.md) | Slack `#hyperfleet-e2e-status`, Pyxis, GitHub PR checks, and Prow status signals |
| [Support](./operations/support.md) | Slack channels, JIRA queues, Konflux UI, escalation contacts |

## Prow Test and Release

The `test-release/` subdirectory contains Prow-specific docs for CI job setup and E2E testing infrastructure.
133 changes: 133 additions & 0 deletions hyperfleet/docs/release/operations/configuration-map.md
@@ -0,0 +1,133 @@
---
Status: Active
Owner: HyperFleet Team
Last Updated: 2026-06-22
---

# Configuration Map

> **Audience:** HyperFleet engineers who need to find, read, or change a release-related config file. Tells you which repo holds what, what it does, and who reviews changes.

The HyperFleet Konflux setup is split across six repos. This page is the index. For the WHY behind the design, see [Konflux Release Pipeline Design](../konflux-release-pipeline-design.md) and [ADR 0014](../../../adrs/0014-konflux-build-and-release.md).

---

## At a glance

```mermaid
flowchart LR
subgraph github["GitHub: openshift-hyperfleet"]
api["hyperfleet-api"]
sen["hyperfleet-sentinel"]
adp["hyperfleet-adapter"]
rel["hyperfleet-release"]
end
subgraph gitlab["GitLab: releng"]
krd["konflux-release-data"]
end
subgraph os["GitHub: openshift"]
ocr["openshift/release"]
end
api -- ".tekton/*" --> kf["Konflux<br/>kflux-prd-rh02"]
sen -- ".tekton/*" --> kf
adp -- ".tekton/*" --> kf
krd -- RPA, EC policy, tenant --> kf
kf --> quay["Quay.io"]
quay --> ocr
rel -- RC E2E trigger --> ocr
```

---

## `konflux-release-data` (GitLab)

URL: <https://gitlab.cee.redhat.com/releng/konflux-release-data>. This is the GitOps source of truth for everything the Konflux platform applies to our tenant. Changes go via MR; ArgoCD syncs them onto `kflux-prd-rh02`. CI runs `tox` — see the repo's `CLAUDE.md` for the lint/test matrix.

| File | Purpose |
|------|---------|
| `config/kflux-prd-rh02.0fk9.p1/service/ReleasePlanAdmission/hyperfleet/hyperfleet.yaml` | RPA for the three component images. Maps Snapshots to Quay paths, applies tags, references the Slack webhook secret. |
| `config/kflux-prd-rh02.0fk9.p1/service/ReleasePlanAdmission/hyperfleet/hyperfleet-charts.yaml` | RPA for Helm chart OCI releases. Uses `push-to-external-registry`; targets `…/hyperfleet-api-chart`. |
| `constraints/service/hyperfleet.yaml` | JSON-schema constraint that validates our RPAs (origin, policy, registry URL prefix, pipeline source, service account). |
| `config/kflux-prd-rh02.0fk9.p1/service/EnterpriseContractPolicy/registry-hyperfleet-chart-prod.yaml` | EC policy for chart releases. Derived from `app-interface-standard`; excludes container-only checks. |
| `tenants-config/cluster/kflux-prd-rh02/tenants/hyperfleet-tenant/` | Tenant namespace, RBAC, Application (`hyperfleet`), three Components, ReleasePlan. Source files only — never edit `auto-generated/`. |
| `CODEOWNERS` | Approval routing. HyperFleet paths require team approval. |

The container RPA uses policy `app-interface-standard`; the chart RPA uses `registry-hyperfleet-chart-prod`. Both auto-release (`block-releases: false`). Service account for releases is `release-app-interface-prod`.

---

## Component repos: `hyperfleet-api`, `hyperfleet-sentinel`, `hyperfleet-adapter`

Each repo has the same shape. Replace `<svc>` with `api`, `sentinel`, or `adapter`.

| File | Triggers on | Notes |
|------|-------------|-------|
| `.tekton/hyperfleet-<svc>-push.yaml` | Merge to `main` | Builds with `APP_VERSION=0.0.0-dev` default. Powers nightly. |
| `.tekton/hyperfleet-<svc>-tag.yaml` | Push of a semver tag (`vX.Y.Z` or `vX.Y.Z-rcN`) | CEL match in PaC annotation — see [Pipeline Anatomy](./pipeline-anatomy.md#what-triggers-what) for the exact pattern. `extract-version` task strips `refs/tags/v` → injects `APP_VERSION`. |
| `.tekton/hyperfleet-<svc>-chart-push.yaml` | Merge to `main` (chart path) | Builds and releases the component's Helm chart alongside the image, via the chart RPA. |
| `Dockerfile` | — | Contract: `ARG APP_VERSION="0.0.0-dev"` and `LABEL version="${APP_VERSION}"`. The label is what the RPA's `{{ labels.version }}` template reads. |

If you change the CEL regex or the Dockerfile `APP_VERSION` contract, you change the release flow. See [Pipeline Anatomy](./pipeline-anatomy.md) for the version chain.
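
To make the version chain concrete, here is a minimal Python sketch of the stripping step described for `extract-version`. It is not the task's actual implementation. The function name and the error handling are assumptions for illustration; only the `refs/tags/v` prefix and the `0.0.0-dev` default come from the table above.

```python
TAG_REF_PREFIX = "refs/tags/v"      # prefix the tag pipeline strips
DEFAULT_APP_VERSION = "0.0.0-dev"   # Dockerfile default used by push builds


def app_version_from_ref(ref: str) -> str:
    """Return the APP_VERSION implied by a tag ref (hypothetical helper).

    "refs/tags/v0.3.0-rc1" becomes "0.3.0-rc1". Anything that is not a
    v-prefixed tag ref is rejected, because the tag pipeline has no
    version to inject in that case.
    """
    if not ref.startswith(TAG_REF_PREFIX):
        raise ValueError(f"not a v-prefixed tag ref: {ref!r}")
    return ref[len(TAG_REF_PREFIX):]


if __name__ == "__main__":
    print(app_version_from_ref("refs/tags/v0.3.0-rc1"))
```

The returned string is what ends up in the image's `version` label, which in turn is what the RPA's `{{ labels.version }}` template reads.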

---

## Helm chart artifacts

Each component repo ships its own Helm chart through its `.tekton/hyperfleet-<svc>-chart-push.yaml` pipeline, and the chart artifact is released alongside the image via the `hyperfleet-charts.yaml` RPA.

For the chart distribution design see [Helm OCI Distribution Design](../helm-oci-distribution-design.md) and [ADR 0016](../../../adrs/0016-helm-oci-distribution.md).

---

## `hyperfleet-release`

The release coordination repo. Holds the manifest that pins which component versions form a release candidate or GA.

| File | Purpose |
|------|---------|
| `RELEASE_MANIFEST.yaml` | Pinned component versions for the current release. Schema: `release`, `e2e_ref`, `components.{hyperfleet-api,hyperfleet-sentinel,hyperfleet-adapter}`. |
| `scripts/trigger-rc-e2e.sh` | Reads the manifest, verifies each image exists in Quay, calls Gangway to start the Prow RC E2E job. |
| `scripts/README.md` | Prerequisites and dry-run usage for the trigger script. |
| `RELEASE` | Top-level release status / notes (per-release). |

The manifest is the source of truth for *which combination of images is under test* — Konflux Snapshots are the build source of truth.
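
As a rough illustration of the manifest schema, the sketch below checks an already-parsed manifest (for example, the dict a YAML loader would return) for the keys listed in the table. The function name is hypothetical, and the shape of each component value is not specified here, so the check looks only at key names.

```python
from typing import List

REQUIRED_TOP_LEVEL = ("release", "e2e_ref", "components")
REQUIRED_COMPONENTS = ("hyperfleet-api", "hyperfleet-sentinel", "hyperfleet-adapter")


def manifest_problems(manifest: dict) -> List[str]:
    """Return human-readable problems; an empty list means the keys look right."""
    problems = [f"missing top-level key: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in manifest]
    components = manifest.get("components")
    if not isinstance(components, dict):
        if "components" in manifest:
            problems.append("components must be a mapping")
        return problems
    # Component keys must match exactly; report both gaps and strays.
    problems += [f"missing component: {name}"
                 for name in REQUIRED_COMPONENTS if name not in components]
    problems += [f"unexpected component: {name}"
                 for name in components if name not in REQUIRED_COMPONENTS]
    return problems
```

Reporting unexpected keys as well as missing ones makes a misspelled component name show up as a pair of problems, which is usually enough to spot the typo.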

---

## `openshift/release` (Prow)

| Path | Purpose |
|------|---------|
| `ci-operator/config/openshift-hyperfleet/` | ci-operator configs per repo (PR presubmits, nightly E2E, RC E2E). |
| `ci-operator/jobs/openshift-hyperfleet/` | Generated Prow job YAML (regenerated from configs). |
| `ci-operator/step-registry/hyperfleet/` | Reusable HyperFleet steps (commitlint, risk-scorer). |
| `ci-operator/step-registry/openshift-hyperfleet/chart-deployment/` | Chart deployment step for E2E. |

For how to add a new E2E job or modify the Gangway trigger flow, see [Add Hyperfleet E2E CI Job in Prow](../test-release/add-hyperfleet-e2e-ci-job-in-prow.md) and [Trigger HyperFleet E2E Jobs via Gangway API](../test-release/trigger-e2e-jobs-via-gangway.md).

---

## Konflux cluster

Things that exist on the cluster, not in a repo. The repos above declare them via GitOps.

| Resource | Location |
|----------|----------|
| Konflux UI | <https://konflux-ui.apps.kflux-prd-rh02.0fk9.p1.openshiftapps.com/> |
| Tenant namespace | `hyperfleet-tenant` on `kflux-prd-rh02` |
| Application | `hyperfleet` (single app, three Components) |
| Slack webhook secret | `hyperfleet-slack-webhook-notification-secret` (key `webhook-url`) in `rhtap-releng-tenant`. Created and rotated by RelEng. |
| Pyxis secret | Shared RelEng secret in `rhtap-releng-tenant`. |

---

## Who reviews what

| Area | Reviewer source |
|------|-----------------|
| `konflux-release-data` HyperFleet paths | `CODEOWNERS` in that repo — HyperFleet team |
| Component repo `.tekton/` and `Dockerfile` | Repo `CODEOWNERS` / `OWNERS` |
| `openshift/release` HyperFleet configs | `OWNERS` under `ci-operator/config/openshift-hyperfleet/` |
| Cluster-side resources (secrets in `rhtap-releng-tenant`) | RelEng — coordinate via `#forum-konflux-release` |

For escalation contacts and Slack channels see [Support](./support.md).
158 changes: 158 additions & 0 deletions hyperfleet/docs/release/operations/debugging.md
@@ -0,0 +1,158 @@
---
Status: Active
Owner: HyperFleet Team
Last Updated: 2026-06-22
---

# Debugging: When the Release Pipeline Breaks

> **Audience:** HyperFleet engineers who pushed a tag, merged a PR, or triggered an RC E2E and something didn't happen. Organized by symptom.

Each entry: what you see → where to look → what to check → who to ping. For the happy-path mental model, read [Pipeline Anatomy](./pipeline-anatomy.md) first. For where each config lives, see [Configuration Map](./configuration-map.md).

---

## I pushed a tag and no PipelineRun started

**Where to look**

- Konflux UI → Applications → `hyperfleet` → Pipeline runs (filter by component).
- In the component repo on GitHub, the **Actions** tab is NOT where PaC reports. Check the commit status instead (the green check or yellow dot next to the commit).

**What to check**

1. **The tag actually pushed.** `git push origin v0.3.0-rc1` is one tag; `git push --tags` pushes everything. Verify with `git ls-remote --tags origin | grep v0.3.0-rc1`.
2. **The CEL regex matches your tag.** Open `.tekton/hyperfleet-<svc>-tag.yaml` and look for the `on-cel-expression` annotation. It must match `^refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-rc[0-9]+)?$`. A tag like `v0.3` or `release-0.3-rc1` won't match.
3. **The PaC GitHub App is installed on the org.** Visit <https://github.com/organizations/openshift-hyperfleet/settings/installations> and confirm "Red Hat Konflux" has access to the component repo.
4. **The Component is wired in `konflux-release-data`.** `tenants-config/cluster/kflux-prd-rh02/tenants/hyperfleet-tenant/applications/hyperfleet/components/<svc>.yaml` should have `build.appstudio.openshift.io/request: configure-pac`.
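
For check 2, a quick way to test a tag name before pushing is to run it against the same pattern locally. The sketch below uses Python's `re` module, which is a different engine from the one CEL uses, but the pattern relies only on constructs both support. The function name is hypothetical, and the `on-cel-expression` annotation in your repo remains the authority on what actually triggers.

```python
import re

# Pattern quoted in the checklist above.
TAG_REF_PATTERN = re.compile(r"^refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-rc[0-9]+)?$")


def tag_triggers_pipeline(tag: str) -> bool:
    """Return True if pushing `tag` yields a ref that the pattern matches."""
    # fullmatch rejects trailing characters such as a stray newline.
    return TAG_REF_PATTERN.fullmatch(f"refs/tags/{tag}") is not None


if __name__ == "__main__":
    for candidate in ("v0.3.0", "v0.3.0-rc1", "v0.3", "release-0.3-rc1"):
        verdict = "match" if tag_triggers_pipeline(candidate) else "no match"
        print(f"{candidate}: {verdict}")
```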

**Who to ping:** PaC install issues — `#forum-konflux-release` on Red Hat Slack.

---

## A build PipelineRun failed

**Where to look**

- The failing task in the Konflux UI → **Logs** tab.

**What to check**

1. **Is it advisory or blocking?** See the task table in [Pipeline Anatomy](./pipeline-anatomy.md#the-build-dag). Under `app-interface-standard`, scan tasks (Snyk, Clair, ClamAV, shell/unicode SAST) are advisory: they surface findings but don't block the release. A failing `sast-snyk-check` can still leave the build pipeline marked as failed in the UI even when the EC verdict treats the finding as advisory, so read the actual error.
2. **`build-container` failed → Dockerfile issue.** Local reproduce: `buildah build --build-arg APP_VERSION=0.3.0-rc1 .` from a clean checkout of the tag.
3. **`extract-version` failed.** The task expects `target_branch` to start with `refs/tags/v`. A non-tag trigger or a malformed tag breaks it.
4. **`prefetch-dependencies` failed.** Cachi2 — non-hermetic today, so most failures are transient. Re-run the PipelineRun from the UI.
5. **Snyk SARIF inspection.** Pull the SARIF from the task results; the UI columns mislead. See the spike note `Verifying Snyk SAST Outputs` for the skopeo-based extraction.

**Who to ping:** Konflux build infra — `#konflux-users`. EC policy questions — `#forum-conforma`.

---

## Build green but image not in Quay

The classic. The build PipelineRun and the release PipelineRun are separate. Build green ≠ image pushed.

**Where to look**

- Konflux UI → **Releases** tab (NOT pipeline runs). There should be a Release CR created from the Snapshot.
- If the Release exists but is incomplete, click into it and inspect the managed PipelineRun (`rh-push-to-external-registry`).

**What to check**

1. **Snapshot created.** Build run → Snapshots → the Snapshot for this commit.
2. **EC verdict.** Snapshot → Enterprise Contract result. Failed EC → release will not auto-fire. Read the violation messages; if they're advisory-tagged, they shouldn't block (`app-interface-standard` excludes most).
3. **Release CR exists.** If no Release was created, auto-release is probably off. Check that the ReleasePlan carries the label `release.appstudio.openshift.io/auto-release: 'true'`.
4. **`rh-push-to-external-registry` ran and failed.** Open the managed PipelineRun; most failures are missing secrets (see next section).
5. **`skopeo list-tags` empty 60s after release run finishes.** Almost always a Quay propagation delay — wait two minutes and retry.
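
The wait-and-retry in check 5 can be scripted. The sketch below is one possible shape, not an existing HyperFleet tool: `tag_present` parses the JSON document that `skopeo list-tags` prints (an object with a `Tags` array), and `wait_for_tag` polls a caller-supplied `fetch` function. The function names, attempt count, and delay are illustrative assumptions.

```python
import json
import time
from typing import Callable


def tag_present(list_tags_json: str, tag: str) -> bool:
    """Check the JSON output of `skopeo list-tags` for `tag`."""
    return tag in json.loads(list_tags_json).get("Tags", [])


def wait_for_tag(fetch: Callable[[], str], tag: str, attempts: int = 5,
                 delay_seconds: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `fetch` until `tag` appears or the attempts run out.

    With the defaults, the total wait is up to two minutes
    (four sleeps of 30 seconds between five attempts).
    """
    for attempt in range(attempts):
        if tag_present(fetch(), tag):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

In practice `fetch` would wrap something like `subprocess.run(["skopeo", "list-tags", "docker://<your-repo>"], capture_output=True, text=True, check=True).stdout`. Passing it in as a parameter keeps the polling logic testable without network access.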

**Who to ping:** RPA / managed pipeline behavior — `#forum-konflux-release`.

---

## Release pipeline failed

Once the build Snapshot passes EC, `rh-push-to-external-registry` runs. The common failures:

| Symptom in the managed PipelineRun | Likely cause | Fix |
|------------------------------------|--------------|-----|
| `verify-access` task failed | RPA / ReleasePlan mismatch, or the service account lost Quay push access | Check `serviceAccountName: release-app-interface-prod` in the RPA matches what's provisioned. Ping RelEng. |
| `collect-data` task failed | RPA `data.mapping` references a component not in the Snapshot | Confirm the Application's Components and the RPA mapping list are aligned |
| `push-snapshot` task failed | Quay push auth failed (`konflux-release-service-access-management-token` rotated) | RelEng — `#forum-konflux-release` |
| `create-pyxis-image` task failed | Pyxis secret missing or expired in `rhtap-releng-tenant` | RelEng |
| `slack-notification` task failed | `hyperfleet-slack-webhook-notification-secret` missing, or webhook URL revoked | See [Notifications](./notifications.md#rotating-the-slack-webhook). The release itself usually still completes — the notification is in the `finally` block. |
| EC violation in the release run (different from build EC) | Policy `app-interface-standard` constraint we don't meet | Inspect the violation. If it reflects a real engineering gap, fix it. If it's an exemption candidate, file an exception in `konflux-release-data/exceptions/`. |

---

## Wrong version baked into the image

Symptom: `skopeo inspect …:0.3.0-rc1 | jq -r '.Labels.version'` returns `0.0.0-dev` or empty.

**Where to look**

- The build run's `extract-version` task → **Logs**.
- The build run's `build-container` task → **Logs** (look for `--build-arg APP_VERSION=…`).

**What to check**

1. The `extract-version` output is the version string with the `v` stripped. If it printed `0.0.0-dev`, the trigger was a `push` event on main, not a tag — the run name will be `…-on-push-…`. The build did what it was told.
2. If the trigger was the tag and `extract-version` printed correctly but the LABEL is `0.0.0-dev`: the Dockerfile lost its `ARG APP_VERSION` or its `LABEL version="${APP_VERSION}"`. Re-check the Dockerfile contract — see [Configuration Map: component repos](./configuration-map.md#component-repos-hyperfleet-api-hyperfleet-sentinel-hyperfleet-adapter).
3. Multi-stage Dockerfiles: `ARG APP_VERSION` must be **redeclared in every stage** that references it. Missing redeclaration is the #1 cause.
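
The redeclaration rule in check 3 can be linted mechanically. The following is a minimal sketch, assuming a Dockerfile without line continuations or heredocs: it reports every build stage that references `APP_VERSION` without an `ARG APP_VERSION` earlier in that same stage. The function name is hypothetical and this is not part of any existing pipeline task.

```python
import re
from typing import List

FROM_LINE = re.compile(r"^\s*FROM\s", re.IGNORECASE)
ARG_DECL = re.compile(r"^\s*(?i:ARG)\s+APP_VERSION\b")
ARG_REF = re.compile(r"\$\{?APP_VERSION\b")


def stages_missing_arg(dockerfile_text: str) -> List[int]:
    """Return 0-based indexes of stages that use APP_VERSION undeclared."""
    missing: List[int] = []
    stage = -1           # -1 means "before the first FROM"
    declared = False
    for line in dockerfile_text.splitlines():
        if FROM_LINE.match(line):
            stage += 1
            declared = False   # ARG scope ends at each new stage
            continue
        if stage < 0 or line.lstrip().startswith("#"):
            continue           # skip global ARGs and comments
        if ARG_DECL.match(line):
            declared = True
        elif ARG_REF.search(line) and not declared and stage not in missing:
            missing.append(stage)
    return missing
```

An empty result means every stage that uses the variable also declares it. A non-empty result points at the stage where the `LABEL` or build command would see an empty value.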

---

## RC E2E didn't trigger after I tagged hyperfleet-release

**Where to look**

- `hyperfleet-release/scripts/trigger-rc-e2e.sh` — run with `--dry-run` first.
- The Gangway HTTP response from the script's log output.
- The Prow dashboard for our job.

**What to check**

1. **Manifest images exist in Quay.** The script verifies each component image at the version pinned in `RELEASE_MANIFEST.yaml`. If any image is missing, the script bails before calling Gangway. `skopeo list-tags` to confirm.
2. **`oc` token from app.ci is fresh.** The Gangway call needs a token; if it 401s, refresh from <https://oauth-openshift.apps.ci.l2s4.p1.openshiftapps.com/oauth/token/request>.
3. **Job config drift in `openshift/release`.** If the periodic / triggered job name doesn't match what the script sends, Gangway accepts the call but no job appears. Compare against the latest ci-operator config under `ci-operator/config/openshift-hyperfleet/`.
4. **GitHub Action not yet wired.** The interim workflow is manual via the script — there's no auto-trigger on tag push yet (see [HYPERFLEET-1038](https://redhat.atlassian.net/browse/HYPERFLEET-1038)).

For full mechanics see [Trigger HyperFleet E2E Jobs via Gangway API](../test-release/trigger-e2e-jobs-via-gangway.md).

---

## RC E2E ran but pulled the wrong image tags

**Where to look**

- The Prow job → environment variables → `*_IMAGE_TAG`, `*_IMAGE_REPO`.

**What to check**

1. **`RELEASE_MANIFEST.yaml` was current when the script ran.** The script reads the file at invocation time; if you committed the manifest after pushing the tag, the previous version was used.
2. **The script strips the `v` prefix.** Quay tags have no `v`; the manifest entries do. The script handles the conversion — if you bypass the script and call Gangway directly, you have to do this yourself.
3. **Manifest typo.** Component keys must be `hyperfleet-api`, `hyperfleet-sentinel`, `hyperfleet-adapter` exactly.
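
If you do bypass the script, the `v`-prefix conversion from check 2 is small enough to sketch. The example assumes each component entry in the manifest is a plain version string, which is an assumption about the manifest layout, and the function name is hypothetical.

```python
from typing import Dict


def quay_tags(components: Dict[str, str]) -> Dict[str, str]:
    """Map each component to its Quay tag by dropping one leading 'v'."""
    return {
        name: version[1:] if version.startswith("v") else version
        for name, version in components.items()
    }


if __name__ == "__main__":
    pinned = {"hyperfleet-api": "v0.3.0-rc1"}  # illustrative value only
    print(quay_tags(pinned))
```

Versions that already lack the prefix pass through unchanged, so running the conversion twice is harmless.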

---

## Enterprise Contract violation

EC runs in two places: in the build PipelineRun's `verify-enterprise-contract` task, and again in the release pipeline. The build-side verdict gates the Snapshot; the release-side verdict gates the push.

**What to check**

1. **Policy applied.** Container RPA uses `app-interface-standard`; chart RPA uses `registry-hyperfleet-chart-prod`. Mismatch → wrong rules applied → spurious failures.
2. **RPA `origin` matches the tenant.** `origin: hyperfleet-tenant`. Required for the constraint to bind.
3. **Constraint regex.** `constraints/service/hyperfleet.yaml` enforces the Quay URL prefix. A typo in the RPA's `mapping.components[].repositories[].url` will trip the constraint at MR-validation time (`tox -e test`), not at runtime.
4. **Genuine policy gap.** If the violation is real and unfixable in the short term, file an exception under `konflux-release-data/exceptions/` with rationale.

---

## Escalation

- General Konflux platform — `#konflux-users` (Red Hat Slack)
- Release pipeline (RPA, managed pipelines) — `#forum-konflux-release`
- Enterprise Contract — `#forum-conforma`
- Hyperfleet release coordination — `#hyperfleet-e2e-status`, then page the Release Owner
- File a Konflux support ticket: project `KFLUXSPRT` on JIRA

See [Support](./support.md) for the full list with links.