NE-2386: IngressController API for AWS NLB security group selection #2037
aswinsuryan wants to merge 2 commits.
Add enhancement proposal for specifying custom (BYO) security groups on AWS NLB IngressControllers via a new `securityGroups` field on `AWSNetworkLoadBalancerParameters`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Aswin Suryanarayanan <asuryana@redhat.com>
Skipping CI for Draft Pull Request.
@aswinsuryan: This pull request references NE-2386, which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.

cc @mfbonfigli
mtulio left a comment:
Overall looks good! I have some questions and comments about CCM behavior with managed SG.
> existing `AWSServiceLBNetworkSecurityGroup` feature gate is for the managed security group feature. The CCM-side BYO SG support (PR #1379) may land ungated via a CCM rebase. Should the IngressController API field use `AWSServiceLBNetworkSecurityGroup`, a new dedicated
I believe we should not use the `AWSServiceLBNetworkSecurityGroup` gate: it has already been promoted to GA in 4.22 for self-managed clusters, and work is underway to promote it to GA on HyperShift in 5.0, so it may not behave as expected as a gate for the IngressController field.
cc-ing colleagues for more thoughts: @mfbonfigli @JoelSpeed @elmiko
Agreed. I have removed the feature gate requirement for the IngressController API field entirely. Instead of using `AWSServiceLBNetworkSecurityGroup` or creating a new IngressController-specific gate, the Ingress Operator will validate at runtime by checking the CCM cloud-config for `NLBSecurityGroupMode = Managed`.
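A minimal sketch of that runtime check follows. It assumes the cloud-config is INI-style text (as stored in the `cloud-provider-config` ConfigMap) with `NLBSecurityGroupMode` under a `[Global]` section; the helper name `nlbSecurityGroupModeManaged` is hypothetical, and a real operator would more likely reuse an existing INI/gcfg parser than hand-roll one.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nlbSecurityGroupModeManaged is a hypothetical helper. ASSUMPTION: the
// cloud-config is INI-style text with NLBSecurityGroupMode in a [Global]
// section. It reports whether that key is set to "Managed".
func nlbSecurityGroupModeManaged(cloudConfig string) bool {
	section := ""
	scanner := bufio.NewScanner(strings.NewReader(cloudConfig))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Skip blank lines and INI-style comments.
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		// Remember which [section] we are in.
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.TrimSpace(line[1 : len(line)-1])
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found || !strings.EqualFold(section, "Global") {
			continue
		}
		// Section and key names are matched case-insensitively; the value
		// must be exactly "Managed".
		if strings.EqualFold(strings.TrimSpace(key), "NLBSecurityGroupMode") {
			return strings.TrimSpace(value) == "Managed"
		}
	}
	return false
}

func main() {
	cfg := "[Global]\nNLBSecurityGroupMode = Managed\n"
	fmt.Println("managed SG mode enabled:", nlbSecurityGroupModeManaged(cfg))
}
```

Returning `false` when the key is absent errs on the side of not applying the BYO annotation, which matches the concern that the CCM would otherwise silently ignore it.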
> setting is required for the BYO SG annotation to be honored; without it, the annotation is silently ignored. How this value gets set depends on the feature gate decision (see [Open Questions](#open-questions)).
This is a good point. AFAICT we don't have an interface for checking the cloud-config other than reading the ConfigMap. Do you think that would be a good fit for a validation webhook, or even for IngressController startup: check whether `NLBSecurityGroupMode` is `Managed` in the cloud-config, and reject the object if BYO SG is set but not enabled in the cluster?
FYI, this configuration is enforced by default for all self-managed clusters starting in 4.22, so we don't expect users to change it.
Changed to validate during IngressController reconciliation by checking the `cloud-provider-config` ConfigMap for `NLBSecurityGroupMode = Managed`. If it is not enabled, the operator sets `Degraded=True` and doesn't apply the annotation.
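The reconciliation-time decision described here can be sketched as a pure function. The `degradedCondition` type, the `checkSecurityGroupsAllowed` name, and the reason string are hypothetical simplifications rather than the Ingress Operator's actual types; `managedMode` would come from a cloud-config check like the one discussed in this thread.

```go
package main

import "fmt"

// degradedCondition is a hypothetical, simplified stand-in for the
// operator's Degraded status condition.
type degradedCondition struct {
	Status  bool
	Reason  string
	Message string
}

// checkSecurityGroupsAllowed sketches the reconciliation check: when
// securityGroups is set but NLBSecurityGroupMode is not Managed, report
// Degraded=True and do not apply the security-groups annotation.
func checkSecurityGroupsAllowed(securityGroups []string, managedMode bool) (applyAnnotation bool, cond degradedCondition) {
	if len(securityGroups) == 0 {
		// Nothing requested, so there is nothing to validate.
		return false, degradedCondition{}
	}
	if !managedMode {
		return false, degradedCondition{
			Status:  true,
			Reason:  "NLBSecurityGroupModeNotManaged", // hypothetical reason string
			Message: "securityGroups is set, but NLBSecurityGroupMode = Managed is not configured in the cloud-provider-config ConfigMap",
		}
	}
	return true, degradedCondition{}
}

func main() {
	apply, cond := checkSecurityGroupsAllowed([]string{"sg-0123456789abcdef0"}, false)
	fmt.Printf("apply annotation: %v, Degraded=%v (%s)\n", apply, cond.Status, cond.Reason)
}
```

Keeping the decision separate from the Kubernetes client calls makes it easy to unit-test both the "not enabled" and "enabled" paths.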
> 4. The master node IAM role must have the `elasticloadbalancing:SetSecurityGroups` permission (added in [openshift/installer#10512](https://github.com/openshift/installer/pull/10512)).
This permission is also about to be added to the ROSA managed policy (Jira card: TBD).
> **Note:** If the NLB was initially created without any security group (before this feature was available), the NLB must be recreated to support security groups. This is an AWS platform limitation. The administrator must delete and recreate the IngressController, which causes downtime.
> recreate the IngressController

The Service is bound to the load balancer, and the IngressController manages the Service. If the Service is deleted, do we expect the IngressController to recreate it and automatically update DNS with the new LB, without needing to recreate the IngressController?
Yes, the IngressController automatically recreates the Service with all annotations (including `securityGroups`), and DNS is updated in the same reconciliation loop.
> security groups are not deleted (the user retains ownership). The exact CCM behavior for transitioning back to a managed security group after BYO removal is not documented upstream and should be verified during implementation.
If I followed correctly, when the BYO SG annotation is removed from an existing Service/NLB that was previously created with BYO SGs, the CCM creates a managed SG and attaches it to the NLB, leaving the BYO SG resource unattached to the NLB:
https://github.com/mfbonfigli/cloud-provider-aws/blob/31a27a5f9ac61ad68f9b4d0a8da765ff060245d3/pkg/providers/v1/aws.go#L2276-L2304
There is also an e2e test for it: https://github.com/kubernetes/cloud-provider-aws/blob/c34d66ed717aaffe3319f149c26e28163e206a3e/tests/e2e/loadbalancer.go#L548
cc @mfbonfigli
> 6. The administrator must delete the IngressController and recreate it to provision a new NLB with security group support. This causes downtime for routes served by that IngressController.
As asked above, do we have a practice or guidance on whether to delete the Service directly or the IC? If deleting the Service is supported, could that reduce disruption?
Good question! I've added guidance recommending Service deletion over IngressController deletion.
> `NLBSecurityGroupMode = Managed`
> (e.g., the CCCMO has not been updated to set it).

> 2. The administrator adds `securityGroups` to the IngressController spec.
Since CCM-managed SG mode is a cluster-scoped setting configured through the cloud-config, do you think we can give the user early feedback by preventing the IngressController object from adding or updating `securityGroups` when managed SG mode is not enabled in the cluster?
Added validation to provide early feedback when `securityGroups` is specified but managed SG mode isn't enabled.
Additional changes based on review feedback:
- Clarify CCM behavior when removing BYO security groups
- Add guidance on Service vs IngressController deletion
- Update Prerequisites to reflect automatic CCCMO enablement

Signed-off-by: Aswin Suryanarayanan <asuryana@redhat.com>
@aswinsuryan: The following test failed.
This enhancement enables administrators to specify custom (Bring Your Own) security groups for AWS Network Load Balancers on IngressControllers. A new `securityGroups` field is added to `AWSNetworkLoadBalancerParameters`, which the Cluster Ingress Operator translates into the `service.beta.kubernetes.io/aws-load-balancer-security-groups` Service annotation for the Cloud Controller Manager. This targets ROSA as the primary deployment type and builds on the CCM-level BYO security group support added in upstream cloud-provider-aws#1379.
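The field-to-annotation translation can be sketched roughly as follows, assuming the field holds a list of security group IDs and that the CCM reads the annotation as a comma-separated list of IDs. The helper `securityGroupsAnnotation` and the sample IDs are hypothetical placeholders, not the operator's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// awsLBSecurityGroupsAnnotation is the Service annotation named in the
// proposal; the CCM reads BYO security group IDs from it.
const awsLBSecurityGroupsAnnotation = "service.beta.kubernetes.io/aws-load-balancer-security-groups"

// securityGroupsAnnotation is a hypothetical helper that renders the
// IngressController's securityGroups list as a comma-separated annotation
// value. Blank entries are skipped; ok is false when no IDs remain, meaning
// the annotation should not be set.
func securityGroupsAnnotation(securityGroups []string) (value string, ok bool) {
	ids := make([]string, 0, len(securityGroups))
	for _, sg := range securityGroups {
		if id := strings.TrimSpace(sg); id != "" {
			ids = append(ids, id)
		}
	}
	if len(ids) == 0 {
		return "", false
	}
	return strings.Join(ids, ","), true
}

func main() {
	annotations := map[string]string{}
	// Placeholder IDs for illustration only.
	if v, ok := securityGroupsAnnotation([]string{"sg-0123456789abcdef0", "sg-0fedcba9876543210"}); ok {
		annotations[awsLBSecurityGroupsAnnotation] = v
	}
	fmt.Println(annotations[awsLBSecurityGroupsAnnotation])
}
```

Returning `ok=false` for an empty list lets the caller leave the annotation off entirely rather than writing an empty value the CCM would have to interpret.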