OCPCLOUD-2899: Support converting taints between MAPI and CAPI #567

Open
nrb wants to merge 4 commits into openshift:main from nrb:OCPCLOUD-2899
Conversation

@nrb

@nrb nrb commented May 20, 2026

Contributor

This work relies on having CAPI v1.12, though it doesn't look like it has been bumped yet, possibly because the upstream CAPA bump (kubernetes-sigs/cluster-api-provider-aws#5857) is still in progress.

Summary by CodeRabbit

  • New Features

    • Taints are now properly converted and synchronized bidirectionally between machine configurations, with correct effect mapping and propagation handling.
  • Tests

    • Added comprehensive test coverage for taint conversion, including single and multiple taints with proper effect mapping scenarios.
  • Chores

    • Removed unused cluster-api dependency directive from module configuration.

nrb added 4 commits May 20, 2026 14:56
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot

openshift-ci-robot commented May 20, 2026


@nrb: This pull request references OCPCLOUD-2899 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

This work relies on having CAPI v1.12, though it doesn't look like it has been bumped yet, possibly because the upstream CAPA bump (kubernetes-sigs/cluster-api-provider-aws#5857) is still in progress.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 20, 2026
@coderabbitai

coderabbitai Bot commented May 20, 2026


Walkthrough

This PR implements symmetric taint conversion between CAPI and MAPI Machine representations. It removes a module replace directive, adds bidirectional conversion helpers, removes spec.taints from unsupported validation, implements CAPI↔MAPI machine conversion with tests, extends MachineSet conversion, and normalizes the fuzzer to align with conversion expectations.
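
The bidirectional helpers named in this walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MAPITaint` and `CAPITaint` are hypothetical stand-ins for `corev1.Taint` and the CAPI `MachineTaint`, and `mapiToCAPI` / `capiToMAPI` are illustrative names. Returning nil for empty input is an assumption, chosen to match the fuzzer normalization described in the changes.

```go
package main

import "fmt"

// TaintEffect and Propagation are simplified stand-ins (assumptions),
// not the real Kubernetes or Cluster API types.
type TaintEffect string
type Propagation string

const (
	NoSchedule        TaintEffect = "NoSchedule"
	PreferNoSchedule  TaintEffect = "PreferNoSchedule"
	NoExecute         TaintEffect = "NoExecute"
	PropagationAlways Propagation = "Always"
)

// MAPITaint stands in for corev1.Taint, which has no propagation field.
type MAPITaint struct {
	Key    string
	Value  string
	Effect TaintEffect
}

// CAPITaint stands in for the CAPI MachineTaint.
type CAPITaint struct {
	Key         string
	Value       string
	Effect      TaintEffect
	Propagation Propagation
}

// mapiToCAPI copies key, value, and effect, and sets Propagation to Always,
// as the walkthrough describes for the MAPI to CAPI direction.
func mapiToCAPI(in []MAPITaint) []CAPITaint {
	if len(in) == 0 {
		return nil // assumption: empty input converts to nil
	}
	out := make([]CAPITaint, 0, len(in))
	for _, t := range in {
		out = append(out, CAPITaint{
			Key: t.Key, Value: t.Value, Effect: t.Effect,
			Propagation: PropagationAlways,
		})
	}
	return out
}

// capiToMAPI copies key, value, and effect. The propagation setting has no
// MAPI equivalent, so it is not carried over in this sketch.
func capiToMAPI(in []CAPITaint) []MAPITaint {
	if len(in) == 0 {
		return nil
	}
	out := make([]MAPITaint, 0, len(in))
	for _, t := range in {
		out = append(out, MAPITaint{Key: t.Key, Value: t.Value, Effect: t.Effect})
	}
	return out
}

func main() {
	mapi := []MAPITaint{{Key: "dedicated", Value: "edge", Effect: NoSchedule}}
	capi := mapiToCAPI(mapi)
	fmt.Printf("CAPI: %+v\n", capi)
	fmt.Printf("MAPI: %+v\n", capiToMAPI(capi))
}
```

Because the MAPI side carries no propagation information, a round trip starting from MAPI is lossless in this model, while a round trip starting from a CAPI taint with a non-Always propagation is not.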

Changes

Bidirectional Taint Conversion

| Layer | File(s) | Summary |
|---|---|---|
| Module dependency cleanup | go.mod | Removes the sigs.k8s.io/cluster-api v1.11.5 replace directive. |
| Taint conversion helpers and validation framework | pkg/conversion/capi2mapi/common.go, pkg/conversion/mapi2capi/util.go | Adds convertCAPITaintsToMAPITaints helper to convert CAPI MachineTaint to MAPI corev1.Taint, adds convertMAPITaintsToCAPITaints helper to convert MAPI corev1.Taint to CAPI MachineTaint with Propagation set to Always, and removes spec.taints from unsupported machine fields validation. |
| CAPI to MAPI machine taint conversion | pkg/conversion/capi2mapi/machine.go, pkg/conversion/capi2mapi/machine_test.go | Implements fromCAPIMachineToMAPIMachine to populate MAPI MachineSpec.Taints from CAPI taints, replacing a prior TODO, and adds tests for zero taints and multi-taint conversion. |
| Test callback infrastructure for assertions | pkg/conversion/mapi2capi/machine_test.go, pkg/conversion/mapi2capi/machineset_test.go | Updates machine and machineset test harnesses to support optional per-case assertion callbacks that validate converted CAPI objects, enabling taint field and propagation verification. |
| MAPI to CAPI machine and machineset taint conversion | pkg/conversion/mapi2capi/machine.go, pkg/conversion/mapi2capi/machine_test.go, pkg/conversion/mapi2capi/machineset_test.go | Implements fromMAPIMachineToCAPIMachine to populate CAPI MachineSpec.Taints and adds tests for all three taint effects (NoSchedule, NoExecute, PreferNoSchedule) with Propagation Always, and verifies MachineSet template taint conversion. |
| Fuzzer taint normalization | pkg/conversion/test/fuzz/fuzz.go | Normalizes fuzzed CAPI Machine taints: empty slices become nil, and each taint's Propagation is set to MachineTaintPropagationAlways to align with conversion expectations. |
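
The fuzzer normalization in the last row can be illustrated with a short sketch. The types here are hypothetical stand-ins for the CAPI `MachineTaint` and its propagation enum, and `normalizeTaints` is an illustrative name, not the function in pkg/conversion/test/fuzz/fuzz.go. The point is that round-trip fuzz tests can only compare equal if the fuzzed input already has the shape the conversion produces.

```go
package main

import "fmt"

// TaintPropagation and MachineTaint are simplified stand-ins (assumptions)
// for the CAPI types referenced in the table.
type TaintPropagation string

const TaintPropagationAlways TaintPropagation = "Always"

type MachineTaint struct {
	Key         string
	Effect      string
	Propagation TaintPropagation
}

// normalizeTaints applies the two normalizations described above:
// an empty slice becomes nil, and every taint's Propagation is set to Always.
func normalizeTaints(taints []MachineTaint) []MachineTaint {
	if len(taints) == 0 {
		return nil
	}
	out := make([]MachineTaint, len(taints))
	for i, t := range taints {
		t.Propagation = TaintPropagationAlways // t is a copy; the input is not mutated
		out[i] = t
	}
	return out
}

func main() {
	fuzzed := []MachineTaint{{Key: "a", Effect: "NoSchedule", Propagation: "OnInitialization"}}
	fmt.Printf("%+v\n", normalizeTaints(fuzzed))
	fmt.Println(normalizeTaints([]MachineTaint{}) == nil)
}
```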

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.55% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Microshift Test Compatibility | ⚠️ Warning | New e2e tests reference MachineAPIMigration feature gate without MicroShift protection: machine_migration_test.go and machineset_migration_test.go use unsupported FeatureGate resources. | Add [Skipped:MicroShift] or [apigroup:machine.openshift.io] labels to feature-gated tests, as FeatureGate resources are unsupported on MicroShift. |

✅ Passed checks (10 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding support for converting taints between MAPI and CAPI, which is the primary objective of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All new and modified Ginkgo test names use stable, descriptive static strings. No dynamic content, generated values, timestamps, or identifiers that could change between runs were detected. |
| Test Structure And Quality | ✅ Passed | Tests follow Ginkgo best practices: single-responsibility It blocks, meaningful test names, good assertions, no cluster operations needing timeouts, and consistent patterns with existing code. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | New tests are unit tests in pkg/conversion/ testing taint conversion logic, and the e2e test validates VAP enforcement only, not taint scheduling behavior requiring multiple nodes. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only internal taint conversion utilities (pkg/conversion/*) for machine/node properties, not pod scheduling constraints. No deployments, pod specs, or scheduling rules are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | No OTE Binary Stdout Contract violations detected. All modified files contain only conversion logic and tests with no process-level stdout writes, klog misconfiguration, or unsafe I/O. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | New Ginkgo tests use only string literals and Kubernetes taint effects; no IPv4 addresses, IP parsing, external connectivity, or disconnected environment assumptions detected. |


@openshift-ci

openshift-ci Bot commented May 20, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign racheljpg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/conversion/capi2mapi/machine.go`:
- Around line 66-67: Detect and reject any CAPI taint with Propagation ==
OnInitialization instead of silently converting it to an Always taint: add
validation before or inside convertCAPITaintsToMAPITaints (or create
validateCAPITaintPropagation called from the caller that currently passes
capiMachine.Spec.Taints) to iterate capiMachine.Spec.Taints, if any
taint.Propagation == OnInitialization return an explicit error naming the
offending taint (key/effect/propagation) so the caller can fail conversion; do
not map OnInitialization to an Always taint or drop propagation silently. Ensure
the conversion call site propagates this error up instead of continuing.
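
A minimal sketch of the validation this prompt describes, under stated assumptions: `capiTaint` and `propagation` are hypothetical stand-ins for the CAPI types, and `validateCAPITaintPropagation` borrows the name suggested in the prompt without claiming to match any code in the PR. It returns a plain `error` for the first offending taint rather than accumulating a `field.ErrorList`, and it treats an unset propagation as acceptable, which is an assumption.

```go
package main

import "fmt"

// propagation and capiTaint are simplified stand-ins (assumptions)
// for the CAPI MachineTaintPropagation and MachineTaint types.
type propagation string

const (
	propagationAlways           propagation = "Always"
	propagationOnInitialization propagation = "OnInitialization"
)

type capiTaint struct {
	Key         string
	Effect      string
	Propagation propagation
}

// validateCAPITaintPropagation rejects any taint whose propagation cannot be
// represented on the MAPI side. An empty value is allowed here (assumption).
func validateCAPITaintPropagation(taints []capiTaint) error {
	for i, t := range taints {
		if t.Propagation != "" && t.Propagation != propagationAlways {
			return fmt.Errorf(
				"spec.taints[%d] (key=%q, effect=%q): unsupported propagation %q, only %q can be converted to MAPI",
				i, t.Key, t.Effect, t.Propagation, propagationAlways,
			)
		}
	}
	return nil
}

func main() {
	taints := []capiTaint{
		{Key: "dedicated", Effect: "NoSchedule", Propagation: propagationAlways},
		{Key: "setup", Effect: "NoExecute", Propagation: propagationOnInitialization},
	}
	// The caller fails the conversion instead of silently dropping propagation.
	if err := validateCAPITaintPropagation(taints); err != nil {
		fmt.Println("conversion rejected:", err)
	}
}
```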
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 96e16cba-e3d1-43a4-99c5-0355bcc1384f

📥 Commits

Reviewing files that changed from the base of the PR and between c585235 and 00d7943.

⛔ Files ignored due to path filters (24)
  • go.sum is excluded by !**/*.sum
  • go.work is excluded by !**/*.work
  • go.work.sum is excluded by !**/*.sum
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta1/cluster_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta1/conversion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta1/machine_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta1/zz_generated.conversion.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta1/zz_generated.openapi.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/cluster_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/clusterclass_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/common_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/machine_phase_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/machine_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/machinehealthcheck_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/machineset_types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/v1beta1_condition_consts.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/zz_generated.openapi.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
  • vendor/sigs.k8s.io/cluster-api/feature/feature.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/internal/contract/controlplane.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/util/conditions/sort.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/util/util.go is excluded by !**/vendor/**, !vendor/**
  • vendor/sigs.k8s.io/cluster-api/util/version/version.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (9)
  • go.mod
  • pkg/conversion/capi2mapi/common.go
  • pkg/conversion/capi2mapi/machine.go
  • pkg/conversion/capi2mapi/machine_test.go
  • pkg/conversion/mapi2capi/machine.go
  • pkg/conversion/mapi2capi/machine_test.go
  • pkg/conversion/mapi2capi/machineset_test.go
  • pkg/conversion/mapi2capi/util.go
  • pkg/conversion/test/fuzz/fuzz.go
💤 Files with no reviewable changes (1)
  • go.mod

Comment on lines +66 to 67
Taints: convertCAPITaintsToMAPITaints(capiMachine.Spec.Taints),
// ProviderSpec: // ProviderSpec MUST NOT be populated here. It is added later by higher level fuctions.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Cluster API v1.12.3, is MachineTaintPropagationOnInitialization semantically distinct from MachineTaintPropagationAlways for Machine.spec.taints reconciliation?

💡 Result:

Yes, in Cluster API v1.12.3, MachineTaintPropagationOnInitialization and MachineTaintPropagationAlways are semantically distinct regarding the reconciliation of taints on a node [1]. The distinction lies in how the Cluster API controller manages the lifecycle of the taint after its initial application [1]:

  • MachineTaintPropagationAlways: The controller continuously reconciles the taint [1]. This ensures that the taint remains on the node if it is currently defined on the Machine [1]. If the taint is manually removed from the node, the controller will re-add it during the next reconciliation loop [1]. Similarly, if the taint is removed from the Machine spec, the controller will remove it from the node [1].
  • MachineTaintPropagationOnInitialization: The controller ensures the taint is applied to the node only once during the initialization phase [1]. After this initial application, the controller ceases to manage the taint [1]. Consequently, if the taint is removed from the node, it will not be re-added by the controller, and if the taint is removed from the Machine spec, no action is taken on the node [1].

This behavior allows for different operational patterns, such as using OnInitialization for taints that are intended to be manually managed or removed after a one-time setup task (e.g., preparing a node with specific drivers) [1].
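
The difference described in this answer can be made concrete with a toy model. This is only a sketch of the semantics as stated above, not Cluster API's controller code: `Node`, `Taint`, `reconcile`, and the bookkeeping fields are all hypothetical, and taints are reduced to their keys.

```go
package main

import "fmt"

type Propagation string

const (
	Always           Propagation = "Always"
	OnInitialization Propagation = "OnInitialization"
)

// Taint is a hypothetical, key-only stand-in for a Machine taint.
type Taint struct {
	Key         string
	Propagation Propagation
}

// Node is a toy node: the taint keys currently present, whether the one-time
// initialization pass has run, and which keys the controller still manages.
type Node struct {
	Taints      map[string]bool
	Initialized bool
	Managed     map[string]bool
}

func newNode() *Node {
	return &Node{Taints: map[string]bool{}, Managed: map[string]bool{}}
}

// reconcile models one controller pass over the Machine's desired taints.
func reconcile(n *Node, desired []Taint) {
	wantAlways := map[string]bool{}
	for _, t := range desired {
		switch t.Propagation {
		case Always:
			wantAlways[t.Key] = true
			n.Taints[t.Key] = true // re-added on every pass
		case OnInitialization:
			if !n.Initialized {
				n.Taints[t.Key] = true // applied once, then left alone
			}
		}
	}
	// Always taints dropped from the Machine spec are removed from the node.
	for k := range n.Managed {
		if !wantAlways[k] {
			delete(n.Taints, k)
			delete(n.Managed, k)
		}
	}
	for k := range wantAlways {
		n.Managed[k] = true
	}
	n.Initialized = true
}

func main() {
	n := newNode()
	desired := []Taint{{Key: "gpu", Propagation: Always}, {Key: "setup", Propagation: OnInitialization}}
	reconcile(n, desired)

	// Someone removes both taints from the node by hand.
	delete(n.Taints, "gpu")
	delete(n.Taints, "setup")
	reconcile(n, desired)
	fmt.Println("gpu present:", n.Taints["gpu"], "setup present:", n.Taints["setup"])
}
```

In this model, converting an OnInitialization taint to MAPI and back would turn it into an Always taint, which changes how the node is managed after the first pass.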

Citations:


🏁 Script executed:

# First, let's examine the file and the specific lines mentioned
fd 'machine.go' | grep 'conversion/capi2mapi'

Repository: openshift/cluster-capi-operator

Length of output: 111


🏁 Script executed:

# Once we find the file, let's view the lines in context
rg -A 15 -B 10 'convertCAPITaintsToMAPITaints' pkg/conversion/capi2mapi/machine.go

Repository: openshift/cluster-capi-operator

Length of output: 1542


🏁 Script executed:

# Let's find and examine the convertCAPITaintsToMAPITaints function definition
rg -A 20 'func convertCAPITaintsToMAPITaints' --type go

Repository: openshift/cluster-capi-operator

Length of output: 1371


🏁 Script executed:

# Check the MAPI Machine and Taint type definitions
fd 'types\.go|machine\.go' | xargs grep -l 'type Machine struct' | head -3

Repository: openshift/cluster-capi-operator

Length of output: 270


🏁 Script executed:

# Check if MAPI taint type has a Propagation field
rg 'type.*Taint.*struct' --type go -A 10 | head -50

Repository: openshift/cluster-capi-operator

Length of output: 5339


🏁 Script executed:

# View the full MachineTaint struct definition
rg -A 30 'type MachineTaint struct' vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/common_types.go

Repository: openshift/cluster-capi-operator

Length of output: 1974


🏁 Script executed:

# Also check what values MachineTaintPropagation can have
rg 'MachineTaintPropagation' vendor/sigs.k8s.io/cluster-api/api/core/v1beta2/ -A 2 -B 2

Repository: openshift/cluster-capi-operator

Length of output: 3154


🏁 Script executed:

# Verify the location of the conversion call and its error handling context
cat -n pkg/conversion/capi2mapi/machine.go | sed -n '50,85p'

Repository: openshift/cluster-capi-operator

Length of output: 2370


Reject CAPI taints with OnInitialization propagation that MAPI cannot represent.

The conversion at line 66 silently drops taint propagation information, transforming OnInitialization taints into continuously reconciled Always taints. MAPI/Kubernetes taints lack propagation semantics; CAPI's OnInitialization (apply once, then stop managing) and Always (continuously reconcile) are fundamentally different behaviors.

Validate taint propagation early and surface an error for unsupported values:

Proposed fix
 func fromCAPIMachineToMAPIMachine(capiMachine *clusterv1.Machine, additionalMachineAPIMetadataLabels, additionalMachineAPIMetadataAnnotations map[string]string) (*mapiv1beta1.Machine, field.ErrorList) {
 	errs := field.ErrorList{}
+	for i, t := range capiMachine.Spec.Taints {
+		if t.Propagation != "" && t.Propagation != clusterv1.MachineTaintPropagationAlways {
+			errs = append(errs, field.Invalid(
+				field.NewPath("spec", "taints").Index(i).Child("propagation"),
+				t.Propagation,
+				"only Always taint propagation is supported for MAPI conversion",
+			))
+		}
+	}
 
 	lifecycleHooks, capiMachineNonHookAnnotations := convertCAPILifecycleHookAnnotationsToMAPILifecycleHooksAndAnnotations(capiMachine.Annotations)

@openshift-ci

openshift-ci Bot commented May 20, 2026

Contributor

@nrb: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tthvo

tthvo commented May 26, 2026

Member

This work relies on having CAPI v1.12, though it doesn't look like it has been bumped yet, possibly because the upstream CAPA bump (kubernetes-sigs/cluster-api-provider-aws#5857) is still in progress.

👋 May I ask some questions:

  • Do you think we also need to enable the MachineTaintPropagation feature gate for the CAPI controller? I see it is false by default 🤔
  • Will this taints field also be available in the v1beta1 API? I see upstream kubernetes-sigs/cluster-api@4c81f12 brought it to v1beta1, so it will probably arrive in the next sync.

I'm asking because the installer still uses v1beta1 to generate CAPI machines and we need to set node taints for the edge machine pool. Or I guess since v1beta2 is now the storage version (right?), the installer should start using it instead 👀

@nrb

nrb commented Jun 12, 2026

Contributor Author

@tthvo

Do you think we also need to enable the MachineTaintPropagation feature gate for the CAPI controller? I see it is false by default 🤔

Yeah, we'll need to turn that on.

Will this taints field also be available in the v1beta1 API? I see upstream kubernetes-sigs/cluster-api@4c81f12 brought it to v1beta1, so it will probably arrive in the next sync.

It looks like that backport was made against the 1.13 series, not 1.12? I don't see any mention of a git backport in kubernetes-sigs/cluster-api#13455.

IMO we should try to move to v1beta2 if it's not too painful...but given the delay on CAPA, it might be.

@tthvo

tthvo commented Jun 12, 2026

Member

It looks like that backport was made against the 1.13 series, not 1.12? I don't see any mention of a git backport in kubernetes-sigs/cluster-api#13455.

IMO we should try to move to v1beta2 if it's not too painful...but given the delay on CAPA, it might be.

Arghh, sorry, I didn't look at the PR date... 🤦 Anyway, +1 to using v1beta2! On the installer side, we did move to generating CAPI manifests with v1beta2 for workers after all. There are a few more places for the control plane, which will eventually move too. Thanks!

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
