fix: avoid SSA caBundle conflict installing opentelemetry-operator over pre-existing CRDs #23

Merged
prathamesh-sonpatki merged 4 commits into main from fix/otel-operator-crd-cainjector-conflict
Jun 1, 2026

@karthikeyangs9

Problem

Customer install of opentelemetry-operator fails on a cluster that already has the OTel CRDs (left by a prior cert-manager-based install). cert-manager-cainjector owns .spec.conversion.webhook.clientConfig.caBundle via server-side apply and re-injects it continuously. Helm v4 applies CRDs with SSA but without --force-conflicts, so the install aborts:

Error: conflict occurred while applying object /opentelemetrycollectors.opentelemetry.io ...:
 conflict with "cert-manager-cainjector" ... .spec.conversion.webhook.clientConfig.caBundle

adopt_otel_crds can't fix this — it can't win a race against an active controller that re-asserts the field.
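To confirm which field managers are recorded on a CRD before installing, its managedFields can be inspected. The following is a minimal sketch, not part of this PR: the helper name list_field_managers is hypothetical, and it is a line-based heuristic that reads YAML on stdin rather than a real YAML parser.

```shell
# Hypothetical helper (not from the PR): list the distinct field managers
# found in a CRD's YAML. Expects the output of something like
#   kubectl get crd opentelemetrycollectors.opentelemetry.io \
#     -o yaml --show-managed-fields
# on stdin. Heuristic: matches lines of the form "manager: <name>".
list_field_managers() {
  awk '$1 == "manager:" && NF == 2 { print $2 }' | sort -u
}
```

If cert-manager-cainjector appears in the output with operation Apply, a plain server-side apply of the same field from another manager can be expected to conflict, as in the error above.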

Why the kube-prometheus pattern doesn't work here

kube-prometheus-stack ships CRDs in the chart's crds/ dir, so helm show crds + --skip-crds works. The opentelemetry-operator chart ships CRDs as templates gated by crds.create, so helm show crds is empty and --skip-crds is a no-op. (Confirmed: helm template --set crds.create=false renders 0 CRDs.)
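One way to reproduce that check is to count the CRD documents in each command's output. This is a sketch under assumptions: count_crd_docs is a hypothetical helper, and the chart reference in the usage comments assumes the repo was added under the alias open-telemetry.

```shell
# Hypothetical helper: count CustomResourceDefinition documents in a
# manifest stream read from stdin.
count_crd_docs() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so its exit status is deliberately ignored.
  grep -c '^kind: CustomResourceDefinition[ ]*$' || true
}

# Usage (needs helm and network access; not executed here):
#   helm show crds open-telemetry/opentelemetry-operator | count_crd_docs
#   helm template otel open-telemetry/opentelemetry-operator \
#     --set crds.create=false | count_crd_docs
```

Per the description above, both commands should report no CRDs for this chart, which is why the crds/-directory approach has nothing to act on.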

Fix

In install_operator, when OTel CRDs already exist:

  1. Render the CRDs with helm template --include-crds (filtered to CRD docs).
  2. Force-apply them out-of-band (kubectl apply --server-side --force-conflicts) — upgrades schema to the chart version and takes field ownership.
  3. Install Helm with --set crds.create=false so Helm never re-applies the CRDs.
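The three steps can be sketched roughly as follows. This is not the script's actual code: the function names, the release name, the namespace, the chart reference, and the use of helm upgrade --install are all assumptions made for illustration.

```shell
# Illustrative defaults (assumptions, not taken from the PR).
RELEASE="${RELEASE:-opentelemetry-operator}"
NAMESPACE="${NAMESPACE:-opentelemetry-operator-system}"
CHART="${CHART:-open-telemetry/opentelemetry-operator}"

# True if any *.opentelemetry.io CRD already exists in the cluster.
otel_crds_exist() {
  kubectl get crd -o name 2>/dev/null | grep -q '\.opentelemetry\.io$'
}

# Keep only CustomResourceDefinition documents from a multi-document stream.
filter_crds() {
  awk '
    function emit() {
      if (is_crd) printf "---\n%s", doc
      doc = ""
      is_crd = 0
    }
    /^---[ \t]*$/ { emit(); next }
    /^kind: CustomResourceDefinition[ \t]*$/ { is_crd = 1 }
    { doc = doc $0 "\n" }
    END { emit() }
  '
}

install_operator_sketch() {
  if otel_crds_exist; then
    # 1. Render the chart and keep only the CRD documents.
    rendered=$(helm template "$RELEASE" "$CHART" \
      --namespace "$NAMESPACE" --include-crds) || return 1
    crds=$(printf '%s\n' "$rendered" | filter_crds)
    [ -n "$crds" ] || { echo "no CRDs rendered" >&2; return 1; }

    # 2. Force-apply out-of-band: upgrades the schema and takes ownership
    #    of conflicting fields such as caBundle.
    printf '%s\n' "$crds" |
      kubectl apply --server-side --force-conflicts -f - || return 1

    # 3. Install with CRD templates disabled so Helm never re-applies them.
    helm upgrade --install "$RELEASE" "$CHART" \
      --namespace "$NAMESPACE" --set crds.create=false
  else
    helm upgrade --install "$RELEASE" "$CHART" --namespace "$NAMESPACE"
  fi
}
```

Keeping the force-apply outside Helm means the conflict override applies only to the CRDs, while the rest of the release is still installed with Helm's normal conflict handling.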

Verification

Live in kind (Helm v4.1.3 + cert-manager, cainjector owning caBundle):

Script            Result
unpatched (HEAD)  reproduces the exact cert-manager-cainjector caBundle conflict
patched           operator installs cleanly: deployment 1/1 Ready, all 4 CRDs intact
  • Unit: tests/test_install_operator.py 4/4; adopt_otel/adopt_prom 8/8 each; bats unit.bats 27/27
  • Added operator-crd-conflict integration mode; CRD-conflict CI legs bumped to Helm v4 so the SSA codepath is exercised

karthikeyangs9 and others added 4 commits June 1, 2026 12:57
…er pre-existing CRDs

When OTel CRDs already exist and cert-manager-cainjector owns
.spec.conversion.webhook.clientConfig.caBundle (a prior cert-manager-based
install), Helm v4's server-side apply fails:

  conflict with "cert-manager-cainjector" ... .spec.conversion.webhook.clientConfig.caBundle

The opentelemetry-operator chart ships CRDs as templates gated by crds.create
(NOT in the chart's crds/ dir), so the kube-prometheus-stack approach
(helm show crds + --skip-crds) is a no-op here. Instead, render CRDs with
helm template --include-crds, force-apply them out-of-band (upgrades schema and
takes field ownership from cainjector), and install with --set crds.create=false
so Helm never re-applies — and never re-conflicts on — the CRDs.

Verified live in kind (Helm v4 + cert-manager): the unpatched script reproduces
the exact cainjector caBundle conflict; the patched script installs cleanly
(operator 1/1 Ready, all CRDs intact).

- install_operator: crds.create=false + out-of-band CRD render/force-apply
- tests/test_install_operator.py: unit coverage for the codepath
- tests/integration/kind-e2e.sh: operator-crd-conflict mode
- CI: add operator-crd-conflict to matrix; run CRD-conflict legs on Helm v4

Address code review on the cainjector caBundle conflict fix:

- The out-of-band CRD apply ran as a `helm template | awk | kubectl apply
  | grep || true` pipeline. The script uses `set -e` but not `pipefail`,
  so only the trailing `grep ... || true` exit status was checked — a
  failed render or failed apply was masked and the install still proceeded
  to crds.create=false, leaving stale/missing CRDs with no error. Capture
  each stage to a variable and check it explicitly, aborting via log_error
  on render failure, empty CRD set, or apply failure.

- Broaden CRD detection from the single opentelemetrycollectors CRD to any
  *.opentelemetry.io CRD, so a partial pre-existing set still triggers the
  mitigation.

- Correct the stale test module docstring (the fix uses crds.create=false,
  not --skip-crds).
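The masking described in the first bullet can be reproduced without a cluster. In this sketch, render and apply are stand-ins for the helm template and kubectl apply stages, and the grep pattern is a placeholder; none of these names come from the script itself.

```shell
# Stand-ins for the real stages (assumptions for illustration only).
render() { echo "render failed" >&2; return 1; }  # simulates a failing render
apply() { cat >/dev/null; }                       # simulates `kubectl apply -f -`

# Masked: without pipefail the pipeline's status is that of its last
# command, and `|| true` discards even that, so this always succeeds.
masked() {
  render 2>/dev/null | apply | grep -v "unchanged" || true
}

# Explicit: capture each stage and check it before moving on.
checked() {
  out=$(render 2>/dev/null) || { echo "CRD render failed" >&2; return 1; }
  [ -n "$out" ] || { echo "no CRDs rendered" >&2; return 1; }
  printf '%s\n' "$out" | apply || { echo "CRD apply failed" >&2; return 1; }
}
```

Calling masked returns 0 even though render failed, whereas checked returns 1 at the first failing stage. Capturing to variables also works in shells where pipefail is unavailable.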

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The mock helm previously emitted clean `---`-separated docs with no
`# Source:` comments, so the awk CRD-isolation filter was never tested
against the shape `helm template --include-crds` actually produces.

- Make the mock emit Helm-realistic output: each doc prefixed with a
  `# Source:` comment, no leading `---` on the first doc, two CRDs plus a
  Deployment.
- Capture the manifest stream the out-of-band force-apply receives on
  stdin (mock kubectl) and return it from run_install.
- Add test_force_apply_receives_crds_only: asserts both CRDs are applied
  and the Deployment is filtered out, validating the awk filter end-to-end.
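A shell-only sketch of the same idea is shown below; the repository's real test lives in tests/test_install_operator.py and may look quite different. All names here (mock_helm_template, mock_kubectl_apply, kinds_in, run_with_mocks, CAPTURE) are hypothetical, and the manifest contents and `# Source:` paths are made up.

```shell
# Where the mock kubectl stores what it receives on stdin.
CAPTURE="${CAPTURE:-${TMPDIR:-/tmp}/applied-crds.yaml}"

# Mock helm: output shaped as the commit message describes, with a
# `# Source:` comment per document and no leading `---` on the first one.
mock_helm_template() {
  cat <<'EOF'
# Source: chart/templates/crd-collectors.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: opentelemetrycollectors.opentelemetry.io
---
# Source: chart/templates/crd-instrumentations.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: instrumentations.opentelemetry.io
---
# Source: chart/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opentelemetry-operator
EOF
}

# Mock kubectl: record the manifest stream instead of applying it.
mock_kubectl_apply() { cat > "$CAPTURE"; }

# List the top-level `kind:` of every document in a captured file.
kinds_in() { awk '/^kind: / { print $2 }' "$1"; }

# Run the filter under test (passed as a command) between the two mocks.
run_with_mocks() { mock_helm_template | "$@" | mock_kubectl_apply; }
```

Running run_with_mocks with the filter under test and then kinds_in "$CAPTURE" shows which documents would have reached the force-apply; passing cat as the filter gives the unfiltered baseline.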

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The operator-crd-conflict integration test seeded CRDs from the operator
repo's `main` branch, whose schema can drift from the chart version the
script actually installs. Pin to v0.129.1 — the appVersion of chart
OPERATOR_VERSION=0.92.1 — so the seeded CRDs are the exact schema the
mitigation upgrades from. Comment notes both must be bumped together.
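One way to keep the two values in step is to derive the operator tag from the chart metadata rather than hard-coding it. This is only a sketch: chart_app_version and operator_tag_for are hypothetical helpers, and the assumption that the operator's git tag is the appVersion with a leading v is inferred from the pairing above, not confirmed by the chart.

```shell
# Hypothetical helper: extract appVersion from Chart.yaml-style text on
# stdin, for example the output of
#   helm show chart open-telemetry/opentelemetry-operator --version 0.92.1
chart_app_version() {
  awk '$1 == "appVersion:" { print $2; exit }' | tr -d "\"'"
}

# Hypothetical helper: turn that appVersion into a v-prefixed git tag,
# whether or not the appVersion already starts with "v".
operator_tag_for() {
  v=$(chart_app_version)
  [ -n "$v" ] || return 1
  printf 'v%s\n' "${v#v}"
}
```

A CI check could compare the derived tag with the pinned one and fail when they diverge, so a chart bump cannot silently leave the seeded CRDs on an older schema.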

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@prathamesh-sonpatki prathamesh-sonpatki merged commit 8e18339 into main Jun 1, 2026
9 checks passed
@prathamesh-sonpatki prathamesh-sonpatki deleted the fix/otel-operator-crd-cainjector-conflict branch June 1, 2026 08:39