apply annotation tags to NLB & listeners on update #1447

Open
alimx07 wants to merge 1 commit into kubernetes:master from alimx07:fix/nlb-update-tags

Conversation

@alimx07

@alimx07 alimx07 commented May 15, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags was only applied to the NLB at creation time (via CreateLoadBalancer input tags). Annotation changes never propagated on later reconciles, and removed tags were never cleaned up.

This reconciles additional tags on every ensureLoadBalancerv2 update across the load balancer, all surviving listeners, and target groups, adding or updating tags to match the annotation and removing keys the controller previously applied but no longer wants.

Two tag keys are involved:

  • service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: the existing user annotation, now propagated on update.
  • kubernetes.io/cloud-controller/managed-tags: a new marker tag recording which keys the controller owns, so removed keys are reconciled away while out-of-band and system tags are preserved.
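
A minimal sketch of how the two keys could relate, assuming the annotation value is a comma-separated list of `key=value` pairs and that the marker stores the owned keys as a sorted, comma-separated list. The helper names `parseAdditionalTags` and `markerValue` are hypothetical, and the marker encoding is an assumption rather than something the PR description states.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// managedTagsKey is the marker tag key named in the PR description.
const managedTagsKey = "kubernetes.io/cloud-controller/managed-tags"

// parseAdditionalTags (hypothetical helper) turns an annotation value such as
// "env=prod,team=eng" into a map. Entries without '=' get an empty value.
func parseAdditionalTags(annotation string) map[string]string {
	tags := map[string]string{}
	for _, pair := range strings.Split(annotation, ",") {
		key, value, _ := strings.Cut(strings.TrimSpace(pair), "=")
		key = strings.TrimSpace(key)
		if key == "" {
			continue // skip empty entries such as a trailing comma
		}
		tags[key] = strings.TrimSpace(value)
	}
	return tags
}

// markerValue encodes the owned keys as a sorted, comma-separated list.
// This encoding is an assumption; the PR may store the keys differently.
func markerValue(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return strings.Join(keys, ",")
}

func main() {
	tags := parseAdditionalTags("team=squadA, env=stage,group=tmp")
	fmt.Println(tags["env"], "|", managedTagsKey, "=", markerValue(tags))
}
```

Sorting the keys keeps the marker value stable across reconciles, so an unchanged annotation does not look like drift.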

Which issue(s) this PR fixes:

Fixes #1334

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Propagate `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags` to the NLB, its listeners, and target groups on update; previously tags were only applied to the load balancer at creation. Tag removals are now reconciled, and a `kubernetes.io/cloud-controller/managed-tags` marker preserves tags added out-of-band.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 15, 2026
@k8s-ci-robot

Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yue9944882 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from dims and elmiko May 15, 2026 16:53
@k8s-ci-robot

Contributor

Hi @alimx07. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 15, 2026
@alimx07

alimx07 commented May 15, 2026

Author

/cc @mtulio @kmala. Could you take a look at this if you have time? Thanks in advance!

@mtulio mtulio left a comment

Contributor

Thanks! This is an initial review; I will take a deeper look later. It would be nice to have an e2e case for this flow.

Comment thread pkg/providers/v1/aws_loadbalancer.go Outdated
Comment on lines +401 to +409
				err := c.addLoadBalancerTagsv2(ctx, aws.ToString(listener.ListenerArn), tags)
				if err != nil {
					return nil, err
				}
			}
		}
	}

	if err := c.addLoadBalancerTagsv2(ctx, aws.ToString(loadBalancer.LoadBalancerArn), tags); err != nil {

Contributor

I think we'll need some guardrails to prevent user-provided tags (set through ServiceAnnotationLoadBalancerAdditionalTags) from replacing controller-managed/private tags such as Name, kubernetes.io/cluster/<cluster-id>, etc. (which else?). Maybe in aws_validations?

Author

Good point, addressed. I think the controller-managed tags are kubernetes.io/cluster/<cluster-id>, KubernetesCluster, and kubernetes.io/service-name. All three are now guarded in the validation, but I couldn't find a Name tag being set anywhere in the NLB path. Could you point me to where you're seeing it, or to any other tags worth guarding, so I can take a look?
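
A minimal sketch of the guard described above, assuming the three keys named in this comment are the full reserved set. The function name `validateAdditionalTags` and the error text are hypothetical; the PR's own validation may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedTagKeys and reservedTagPrefix cover the controller-managed keys named
// in the discussion. The exact set guarded by the PR may differ.
var reservedTagKeys = map[string]bool{
	"KubernetesCluster":          true,
	"kubernetes.io/service-name": true,
}

// reservedTagPrefix matches kubernetes.io/cluster/<cluster-id> for any cluster ID.
const reservedTagPrefix = "kubernetes.io/cluster/"

// validateAdditionalTags (hypothetical name) rejects user-supplied tags whose
// keys would overwrite controller-managed tags.
func validateAdditionalTags(tags map[string]string) error {
	for key := range tags {
		if reservedTagKeys[key] || strings.HasPrefix(key, reservedTagPrefix) {
			return fmt.Errorf("additional tag key %q is reserved for the controller", key)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateAdditionalTags(map[string]string{"env": "prod"}))
	fmt.Println(validateAdditionalTags(map[string]string{"kubernetes.io/cluster/demo": "owned"}))
}
```

Rejecting the annotation up front, rather than silently dropping reserved keys, makes the conflict visible to the user.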

Contributor

Thanks for restricting those.
You are correct, Name is not added; only the LB name is set, which is different. I don't have a strong opinion about restricting Name, since lookup may also use LB names. cc @kmala

@alimx07 alimx07 force-pushed the fix/nlb-update-tags branch 2 times, most recently from e4e212b to a10c415 Compare May 16, 2026 13:23
@joshuakguo

Contributor

Preface: I'm trying to understand more about cloud providers, so this is a question just for my own understanding.

Is it the case that, for tag updates, we only propagate new tags on reconcile and do not remove tags that were dropped from the annotation?

@alimx07

alimx07 commented May 18, 2026

Author

> preface: trying to understand more about cloud providers, a question just for my own understanding
>
> is the case for tag updates that we only care to propagate new tags on reconcile but not remove removed tags on reconcile?

Hi @joshuakguo, actually we should; this is missing. I also noticed that the previous version (EnsureLoadBalancer) does not do this either. It may need clarification from a maintainer, but the simplest approach that comes to mind is to call DescribeTags, diff the result against the desired tags, and remove the extra ones. Thanks for pointing this out.

@kmala

kmala commented May 19, 2026

Member

> preface: trying to understand more about cloud providers, a question just for my own understanding
> is the case for tag updates that we only care to propagate new tags on reconcile but not remove removed tags on reconcile?
>
> Hi @joshuakguo, Actually we should, this is missing right. I also noticed this was not found in the previous version on EnsureLoadBalancer. May need clarification from a maintainer, but what comes to my mind now as the simplest way is doing describeTags then diff and call remove on the extra ones. Thanks for mentioning this.

I think the reason is that we don't have a way to know who owns the tags on the LB, since these are just annotation values.

@alimx07

alimx07 commented May 19, 2026

Author

@kmala, could we borrow from three-way merge and introduce a new annotation (e.g. something/last-applied-tags) as a form of ownership record that we can diff against?

@alimx07

alimx07 commented May 29, 2026

Author

Hey @kmala @mtulio, are there any updates here when you have time?
Thanks in advance!

Comment thread pkg/providers/v1/aws.go Outdated
Comment thread tests/e2e/loadbalancer.go
return &elbv2.DeleteListenerOutput{}, nil
}

func TestEnsureLoadBalancerv2_TaggingLifecycle(t *testing.T) {

Contributor

Thanks for setting up those tests. Since you are defining a function to test EnsureLoadBalancerv2, I'd suggest:

  • Use the table-driven test pattern, like the other unit tests in this file, for easier maintenance :)
  • Make it more generic, so other scenarios that validate annotations through EnsureLoadBalancerv2 can be added as cases, reusing all the pre-setup (EC2 mocks, etc.)
  • Use t.Context() instead of context.TODO()

Author

Okay @mtulio, the table-driven pattern and t.Context() make sense; I will update both.

Just to clarify the generic part: the goal is a table-driven TestEnsureLoadBalancerv2_annotations* where each case exercises a different annotation and asserts what it produces, so tests like this one could simply be cases in the new generic test?

@mtulio mtulio Jun 1, 2026

Contributor

> on the generic part, the goal is a table-driven TestEnsureLoadBalancerv2_annotations* where each case exercises a different annotation and asserts what it produces,

Yeah, I think that will pave a good path for maintaining the annotation tests and increasing coverage. Thanks.

Author

@mtulio, I have done it. Do you have any further comments on the test now?

Comment thread pkg/providers/v1/aws.go

// addLoadBalancerTagsv2 tags a single ELBv2 resource (LB, listener, target group, etc.).
// https://github.com/aws/aws-sdk-go-v2/blob/dc2d13fa6f1db25f1c6d804567e1ecfcdff4f040/service/elasticloadbalancingv2/api_op_AddTags.go#L14
func (c *Cloud) addLoadBalancerTagsv2(ctx context.Context, resourceARN string, requested map[string]string) error {

Contributor

We need to skip the API calls when no tags are requested to be added.

Comment thread pkg/providers/v1/aws_loadbalancer.go Outdated
for port, protocols := range actual {
	for protocol, listener := range protocols {
		if _, ok := frontEndPorts[port][protocol]; ok {
			err := c.addLoadBalancerTagsv2(ctx, aws.ToString(listener.ListenerArn), tags)

Contributor

Is it possible to reduce the API calls here from N AddTags calls to 1, or fewer, by:

  1. checking whether the tags need updating, and
  2. sending a list of ARNs (LB + listeners) instead of one call per ARN?

@alimx07 alimx07 May 30, 2026

Author

It seems that even though AddTags takes a slice of ARNs, this error is returned when trying to tag more than one resource:

An error occurred (ValidationError) when calling the AddTags operation: Only one resource can be tagged at a time

So this is a limitation of the AWS API for now.

But you are right, we could save API calls by first calling DescribeTags, diffing the result against the desired tags, and calling AddTags per listener only if needed. That would be either 1 or N+1 calls.
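
A minimal sketch of the describe-then-diff idea, under stated assumptions: `tagAPI` is a hypothetical stand-in for the ELBv2 tagging calls (it is not the AWS SDK interface), `syncTags` is an illustrative name, and `fakeAPI` exists only to exercise the logic in memory. It also skips all calls when no tags are requested.

```go
package main

import "fmt"

// tagAPI is a hypothetical abstraction over the tagging calls, not the SDK's API.
type tagAPI interface {
	DescribeTags(arns []string) (map[string]map[string]string, error)
	AddTags(arn string, tags map[string]string) error
}

// syncTags adds the requested tags only to resources that are missing them or
// hold a different value. It returns the number of AddTags calls made.
func syncTags(api tagAPI, arns []string, requested map[string]string) (int, error) {
	if len(requested) == 0 || len(arns) == 0 {
		return 0, nil // nothing requested: no API calls at all
	}
	current, err := api.DescribeTags(arns) // one call for all resources
	if err != nil {
		return 0, err
	}
	calls := 0
	for _, arn := range arns {
		missing := map[string]string{}
		for k, v := range requested {
			if got, ok := current[arn][k]; !ok || got != v {
				missing[k] = v
			}
		}
		if len(missing) == 0 {
			continue // resource already matches: skip AddTags
		}
		if err := api.AddTags(arn, missing); err != nil {
			return calls, err
		}
		calls++
	}
	return calls, nil
}

// fakeAPI is an in-memory implementation used only to exercise the sketch.
type fakeAPI struct{ tags map[string]map[string]string }

func (f *fakeAPI) DescribeTags(arns []string) (map[string]map[string]string, error) {
	out := map[string]map[string]string{}
	for _, arn := range arns {
		out[arn] = f.tags[arn]
	}
	return out, nil
}

func (f *fakeAPI) AddTags(arn string, tags map[string]string) error {
	if f.tags[arn] == nil {
		f.tags[arn] = map[string]string{}
	}
	for k, v := range tags {
		f.tags[arn][k] = v
	}
	return nil
}

func main() {
	api := &fakeAPI{tags: map[string]map[string]string{"lb": {"env": "prod"}}}
	calls, err := syncTags(api, []string{"lb", "listener-1"}, map[string]string{"env": "prod"})
	fmt.Println(calls, err)
}
```

In the steady state this costs a single DescribeTags call; a real implementation would also need to respect any per-call limit on the number of ARNs.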

@alimx07 alimx07 force-pushed the fix/nlb-update-tags branch from a10c415 to d4f63bb Compare June 2, 2026 19:00
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 5, 2026
Comment thread pkg/providers/v1/aws.go Outdated
Comment thread pkg/providers/v1/aws_loadbalancer.go Outdated
}
}

// Reconcile tags on the LB and all surviving listeners. Newly created

Contributor

I feel this block is not in the best place, or at least ensureLoadBalancerResourceTagsv2 is not.
This is the end of the big "sync mappings" block inside the existing LB synchronization.

Considering the problem below, I think this could be added at the end of the ensureLoadBalancerv2 function, using the "reconciliation pattern".

Question 1) Tag removal support
How does the annotation handle removed tags? Given the following scenario:
t1) user updates a svc with tags env=prod,team=eng
t2) user removes the tag team=eng: will the team tag be leaked?
t3) user removes the tag annotation: will the env tag be leaked?

How could we prevent that "leak"?

Question 2) Tagging TGs
Do we need to tag the target groups, since they may be part of the "load balancer resources"?

Author

For tag removal support: the current approach does not handle it, so tags will leak, which is the same behavior as the old ensureLoadBalancer. The reason is that we have no way to know who owns the tags on the LB (e.g. user, controller, Terraform), so a naive describe + remove will not work.

We could borrow from three-way merge and introduce a new controller-managed annotation (e.g. something/last-applied-tags) as a form of ownership record that we can diff against. If we are okay with that, I can go ahead with it.

For TGs, the tags are obtained through the reconcile loop, so you're right. I missed that and was focused only on listeners. With the current approach, this should be a minimal amount of work. Nice pushback :)

Contributor

> we could borrow from 3 way merge and introduce new controller managed annotation (e.g. something/last-applied-tags) as some sort of ownership we can diff on it, if we are okay with that, I could go on with it.

Yep, that's a good idea. Before that, I would check how AWS LBC handles existing BYO tags, so the UX here does not diverge from it. (I skimmed the documentation and didn't get much insight.)

Having said that, exploring the idea of managing BYO tags, it would indeed be interesting to have a controller-managed tag that tracks which tags were added (through a value containing the tag keys?).

For example:

When the annotation is added: service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: team=squadA,env=stage,group=tmp
the resource will have these tags, managed by the controller:

  • kubernetes.io/cloud-provider/managed-tags: team,env,group
  • team: squadA
  • env: stage
  • group: tmp

When the annotation is updated to service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: team=squadA,env=prod (removing group, new value for env):

the tag reconciliation will detect drift between the annotation and the managed tag, remove the stale tag group, and update the value of env, resulting in these resource tags:

  • kubernetes.io/cloud-provider/managed-tags: team,env
  • team: squadA
  • env: prod

When the annotation is removed from the service, the tag reconciliation must detect that the managed tag exists on the resource while the annotation is absent, and remove every tag added by the controller as well as the managed tag itself.
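
The three cases in this example can be sketched as a single planning function. This is an illustration under stated assumptions, not the PR's implementation: `planTags` is a hypothetical name, the marker key follows the spelling used in this comment (the PR description spells it slightly differently), and the marker value is assumed to be a sorted, comma-separated key list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// markerKey follows the example in this comment; treat the exact name as an assumption.
const markerKey = "kubernetes.io/cloud-provider/managed-tags"

// planTags (hypothetical helper) compares the desired annotation tags with the
// tags currently on a resource and returns what to add or update and what to remove.
func planTags(desired, current map[string]string) (add map[string]string, remove []string) {
	add = map[string]string{}
	for k, v := range desired {
		if got, ok := current[k]; !ok || got != v {
			add[k] = v // new key or changed value
		}
	}
	// Only keys recorded in the marker may be removed; anything else was added
	// out-of-band and is left alone.
	for _, k := range strings.Split(current[markerKey], ",") {
		_, wanted := desired[k]
		_, present := current[k]
		if k != "" && !wanted && present {
			remove = append(remove, k)
		}
	}
	if len(desired) == 0 {
		// Annotation removed: drop the marker along with the owned tags.
		if _, ok := current[markerKey]; ok {
			remove = append(remove, markerKey)
		}
		sort.Strings(remove)
		return add, remove
	}
	keys := make([]string, 0, len(desired))
	for k := range desired {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	if marker := strings.Join(keys, ","); current[markerKey] != marker {
		add[markerKey] = marker // keep the ownership record in sync
	}
	sort.Strings(remove)
	return add, remove
}

func main() {
	current := map[string]string{
		markerKey: "team,env,group",
		"team":    "squadA", "env": "stage", "group": "tmp",
		"owner": "alice", // added out-of-band, not listed in the marker
	}
	add, remove := planTags(map[string]string{"team": "squadA", "env": "prod"}, current)
	fmt.Println(add, remove)
}
```

Because removals are driven only by the marker, the out-of-band `owner` tag never appears in the removal list.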

Author

Hi @mtulio, sorry for the late reply.

AWS LBC treats itself as the source of truth for tags. On each reconcile it removes any tag not in its desired set (annotation tags + --default-tags + its own tracking tags), except AWS system tags (aws:*) and anything listed in --external-managed-tags (code, docs).

Maybe I am wrong, but for now I went with the managed marker for the tag propagation feature. It could be a weird UX, but matching LBC here by making CCM authoritative seems like a breaking change.
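
For contrast, here is a minimal sketch of the authoritative model as this comment describes it. It is not AWS LBC's code; `authoritativeRemovals` and its parameters are hypothetical names, and the tag values are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// authoritativeRemovals sketches the "controller is the source of truth" model:
// every tag outside the desired set is removed, except AWS system tags (aws:*)
// and keys listed as externally managed.
func authoritativeRemovals(current, desired map[string]string, externalManaged []string) []string {
	external := map[string]bool{}
	for _, k := range externalManaged {
		external[k] = true
	}
	var remove []string
	for k := range current {
		if _, wanted := desired[k]; wanted {
			continue
		}
		if strings.HasPrefix(k, "aws:") || external[k] {
			continue // system or externally managed: never touched
		}
		remove = append(remove, k)
	}
	sort.Strings(remove)
	return remove
}

func main() {
	current := map[string]string{
		"env":                           "prod",
		"owner":                         "alice", // added out-of-band
		"cost-center":                   "42",    // declared as externally managed
		"aws:cloudformation:stack-name": "demo",
	}
	desired := map[string]string{"env": "prod"}
	fmt.Println(authoritativeRemovals(current, desired, []string{"cost-center"}))
}
```

Under this model the out-of-band `owner` tag is removed unless it is declared as externally managed, which is why adopting it in CCM would change behavior for existing users.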

Comment thread pkg/providers/v1/aws.go Outdated
Comment thread tests/e2e/loadbalancer.go Outdated
cfg.svc = updated

framework.Logf("verifying updated tag e2e-tag=two on LB and listeners")
framework.ExpectNoError(verifyResourceTags(cfg.ctx, elbClient, arns, map[string]string{"e2e-tag": "two"}))

Contributor

I believe you need to refresh listenerARNs after updating the svc. If the controller ends up recreating the listener, you will be checking the old listener (if it has not been deleted), which will still have the old tag values.

Author

Okay, I will do it.

Just a question out of curiosity: in the reconcile loop, listeners are only deleted or created when the port mappings change, while this test only updates some tags. So why could a recreation happen here?

Comment thread tests/e2e/loadbalancer.go
Comment thread pkg/providers/v1/aws.go Outdated
@alimx07 alimx07 force-pushed the fix/nlb-update-tags branch from d4f63bb to 4055028 Compare June 12, 2026 23:17
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2026
@alimx07 alimx07 force-pushed the fix/nlb-update-tags branch from 4055028 to 0e01ab6 Compare June 21, 2026 07:36
@kubernetes-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cheftako for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubernetes-prow kubernetes-prow Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 21, 2026
Previously, ServiceAnnotationLoadBalancerAdditionalTags was only
applied to the LB resource at creation time via CreateLoadBalancer
input tags. Listeners received no tags on subsequent reconcile calls.

Introduce addLoadBalancerTagsv2 which calls AddTags per-ARN.
On every ensureLoadBalancerv2 update pass, tag all surviving
listeners individually, then tag the LB itself, so annotation tag
changes propagate to both resources.

Add validateServiceAnnotationAdditionalTags func which
validates that additional tags do not override controller-managed tags

Add unit tests covering create→tag and multi-listener
modify+delete lifecycle

Add e2e-test covering propagation of additional tags on creation and update
@alimx07 alimx07 force-pushed the fix/nlb-update-tags branch from 0e01ab6 to 32f882c Compare June 21, 2026 07:43
@mtulio

mtulio commented Jun 22, 2026

Contributor

/ok-to-test

@kubernetes-prow kubernetes-prow Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 22, 2026
@kubernetes-prow

@alimx07: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-e2e 32f882c link true /test pull-cloud-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Tags are not applied to NLB post creation

5 participants