NE-2386: IngressController API for AWS NLB security group selection #2037

Open
aswinsuryan wants to merge 2 commits into openshift:master from aswinsuryan:NE-2386-aws-nlb-security-group-selection

@aswinsuryan

This enhancement enables administrators to specify custom (Bring Your Own) security groups for AWS Network Load Balancers on IngressControllers. A new securityGroups field is added to AWSNetworkLoadBalancerParameters, which the Cluster
Ingress Operator translates into the service.beta.kubernetes.io/aws-load-balancer-security-groups Service annotation for the Cloud Controller Manager. This targets ROSA as the primary deployment type and builds on the CCM-level BYO security
group support added in upstream cloud-provider-aws#1379.
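
The translation from the `securityGroups` field to the Service annotation might be sketched as follows. This is an illustration in Python for brevity (the operator itself is written in Go), not the proposed implementation: the function name `security_group_annotations`, the ID validation pattern, and the comma-separated value format are assumptions.

```python
import re

# Annotation key named in the PR description.
SG_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-security-groups"

# Assumption: security group IDs are "sg-" plus 8 or 17 lowercase hex characters.
_SG_ID = re.compile(r"^sg-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def security_group_annotations(security_groups):
    """Return the Service annotations for a list of BYO security group IDs.

    An empty list yields no annotation, so the Service is left unchanged.
    Hypothetical helper, for illustration only.
    """
    unique = []
    for sg in security_groups:
        if not _SG_ID.match(sg):
            raise ValueError(f"invalid security group ID: {sg!r}")
        if sg not in unique:  # drop duplicates while keeping the user's order
            unique.append(sg)
    if not unique:
        return {}
    # Assumption: the annotation value is a comma-separated list of IDs.
    return {SG_ANNOTATION: ",".join(unique)}
```

Returning an empty mapping for an empty list keeps the "field not set" case distinct from the "field set" case, which matters for the removal behavior discussed further down the thread.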

Add enhancement proposal for specifying custom (BYO) security groups
on AWS NLB IngressControllers via a new securityGroups field on
AWSNetworkLoadBalancerParameters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Aswin Suryanarayanan <asuryana@redhat.com>
@openshift-ci-robot added the jira/valid-reference label ("Indicates that this PR references a valid Jira ticket of any type.") Jun 11, 2026
@openshift-ci Bot added the do-not-merge/work-in-progress label ("Indicates that a PR should not merge because it is a work in progress.") Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Jun 11, 2026

@aswinsuryan: This pull request references NE-2386 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign miciah for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aswinsuryan marked this pull request as ready for review June 11, 2026 17:18
@openshift-ci Bot removed the do-not-merge/work-in-progress label Jun 11, 2026
@openshift-ci Bot requested review from Miciah and candita June 11, 2026 17:18
@mtulio

mtulio commented Jun 12, 2026

Contributor

cc @mfbonfigli

@mtulio left a comment

Contributor

Overall looks good! I have some questions and comments about CCM behavior with managed SG.

existing `AWSServiceLBNetworkSecurityGroup` feature gate is for the
managed security group feature. The CCM-side BYO SG support (PR
#1379) may land ungated via a CCM rebase. Should the IngressController
API field use `AWSServiceLBNetworkSecurityGroup`, a new dedicated

Contributor

I believe we should not use the `AWSServiceLBNetworkSecurityGroup` gate: it has already been promoted to GA in 4.22 for self-managed clusters, and work is under way to promote it to GA on HyperShift in 5.0, so it may not behave as expected as a gate for the IngressController field.

cc-ing colleagues to add more thoughts @mfbonfigli @JoelSpeed @elmiko

Author

Agreed. I have removed the feature gate requirement entirely for the IngressController API field. Instead of using AWSServiceLBNetworkSecurityGroup or creating a new IngressController-specific gate, the Ingress Operator will use runtime validation by checking the CCM cloud-config for NLBSecurityGroupMode = Managed.

setting is required for the BYO SG annotation to be honored — without
it, the annotation is silently ignored. How this value gets set
depends on the feature gate decision (see
[Open Questions](#open-questions)).

Contributor

This is a good point. AFAICT we don't have an interface for checking the cloud-config other than reading the ConfigMap. Do you think this would be a good fit for a validation webhook, or even for a check at IngressController startup, that verifies `NLBSecurityGroupMode` is `Managed` in the cloud-config and rejects the object if BYO SG is set but not enabled in the cluster?
FYI, this configuration is enforced by default for all self-managed clusters starting in 4.22, so we don't expect users to change it.

Author

Changed to validate during IngressController reconciliation by checking the cloud-provider-config ConfigMap for NLBSecurityGroupMode = Managed. If not enabled, the operator sets Degraded=True and doesn't apply the annotation.
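
A reconciliation-time check along these lines could look roughly like the sketch below. It is written in Python for brevity (the operator itself is written in Go) and makes several assumptions: that the cloud config stored in the ConfigMap is INI-style text, that `NLBSecurityGroupMode` lives under a `[Global]` section, and that the caller has already read the config text from the ConfigMap. The names `nlb_managed_sg_enabled` and `reconcile_decision` and the returned dictionary shape are hypothetical.

```python
import configparser


def nlb_managed_sg_enabled(cloud_config: str) -> bool:
    """Report whether the cloud config sets NLBSecurityGroupMode = Managed.

    Assumption: the setting sits in the [Global] section of an INI-style file.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    try:
        parser.read_string(cloud_config)
    except configparser.Error:
        # Unparseable config is treated as "not enabled".
        return False
    mode = parser.get("Global", "NLBSecurityGroupMode", fallback="")
    return mode.strip() == "Managed"


def reconcile_decision(security_groups, cloud_config):
    """Decide whether to apply the BYO SG annotation and whether to degrade."""
    if not security_groups:
        # Field not set: nothing to apply, nothing to report.
        return {"apply_annotation": False, "degraded": False}
    if not nlb_managed_sg_enabled(cloud_config):
        # Field set but managed SG mode is off: report Degraded=True
        # and do not apply the annotation.
        return {"apply_annotation": False, "degraded": True}
    return {"apply_annotation": True, "degraded": False}
```

Keeping the parsing separate from the decision makes it easy to unit-test the three outcomes (field unset, field set with mode disabled, field set with mode enabled) without a cluster.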

[Open Questions](#open-questions)).
4. The master node IAM role must have the
`elasticloadbalancing:SetSecurityGroups` permission (added in
[openshift/installer#10512](https://github.com/openshift/installer/pull/10512)).

Contributor

It is also about to be added to the ROSA managed policy; Jira card: TBD

**Note:** If the NLB was initially created without any security group (before
this feature was available), the NLB must be recreated to support security
groups. This is an AWS platform limitation. The administrator must delete and
recreate the IngressController, which causes downtime.

Contributor

> recreate the IngressController

Considering that the Service is bound to the load balancer and the IngressController manages the Service: if the Service is deleted, do we expect the IngressController to recreate it and update DNS for the new LB automatically, without needing to recreate the IngressController itself?

Author

Yes, the IngressController automatically recreates the Service with all annotations (including securityGroups), and DNS is updated in the same reconciliation loop.

security groups are not deleted (the user retains ownership). The
exact CCM behavior for transitioning back to a managed security group
after BYO removal is not documented upstream and should be verified
during implementation.

Contributor

If I followed correctly, when the BYO SG annotation is removed from an existing Service/NLB that was previously created with BYO SG, a new managed SG is created and attached to the NLB, leaving the BYO SG resource unattached from the NLB:
https://github.com/mfbonfigli/cloud-provider-aws/blob/31a27a5f9ac61ad68f9b4d0a8da765ff060245d3/pkg/providers/v1/aws.go#L2276-L2304

There is also an e2e test for it: https://github.com/kubernetes/cloud-provider-aws/blob/c34d66ed717aaffe3319f149c26e28163e206a3e/tests/e2e/loadbalancer.go#L548

cc @mfbonfigli

Comment thread enhancements/ingress/aws-nlb-security-group-selection.md Outdated
Comment thread enhancements/ingress/aws-nlb-security-group-selection.md Outdated
Comment thread enhancements/ingress/aws-nlb-security-group-selection.md Outdated

6. The administrator must delete the IngressController and recreate it
to provision a new NLB with security group support. This causes
downtime for routes served by that IngressController.

Contributor

As asked above, do we have a practice or guidance on whether to delete the Service directly or the IngressController? If deleting the Service is supported, could that reduce disruption?

Author

Good question! I've added guidance recommending Service deletion over IngressController deletion.

`NLBSecurityGroupMode = Managed`
(e.g., the CCCMO has not been updated to set it).

2. The administrator adds `securityGroups` to the IngressController spec.

Contributor

Considering that the CCM managed SG is a cluster-scoped configuration set through cloud-config, do you think we can give the user early feedback by preventing the IngressController object from adding or updating `securityGroups` when managed SG is not enabled in the cluster?

Author

Added validation to provide early feedback when `securityGroups` is specified but managed SG mode isn't enabled.

Additional changes based on review feedback:
- Clarify CCM behavior when removing BYO security groups
- Add guidance on Service vs IngressController deletion
- Update Prerequisites to reflect automatic CCCMO enablement

Signed-off-by: Aswin Suryanarayanan <asuryana@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Contributor

@aswinsuryan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 8612d52 | link | true | `/test markdownlint` |
