DNS plugin fails on GKE Container-Optimized OS nodes with "cannot allocate memory" #2250

@chetanatole

Describe the bug
The Retina agent fails to start on GKE nodes running Container-Optimized OS (COS) when the DNS plugin is enabled. The failure occurs while attaching an eBPF program to a socket, which fails with a "cannot allocate memory" (ENOMEM) error.

To Reproduce
Steps to reproduce the behavior:

  1. Install Retina with the Hubble control plane:
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina-hubble \
  --version $VERSION \
  --namespace kube-system \
  --set namespace=kube-system \
  --set operator.enabled=true \
  --set operator.repository=ghcr.io/microsoft/retina/retina-operator \
  --set operator.tag=$VERSION \
  --set agent.enabled=true \
  --set agent.repository=ghcr.io/microsoft/retina/retina-agent \
  --set agent.tag=$VERSION \
  --set agent.init.enabled=true \
  --set agent.init.repository=ghcr.io/microsoft/retina/retina-init \
  --set agent.init.tag=$VERSION \
  --set logLevel=info \
  --set hubble.tls.enabled=false \
  --set hubble.relay.tls.server.enabled=false \
  --set hubble.tls.auto.enabled=false \
  --set hubble.tls.auto.method=cronJob \
  --set hubble.tls.auto.certValidityDuration=1 \
  --set hubble.tls.auto.schedule="*/10 * * * *" \
  --set "hubble.metrics.enabled={dns:sourceContext=pod;destinationContext=pod;,drop:sourceContext=pod;destinationContext=pod;,tcp:sourceContext=pod;destinationContext=pod;,flow:sourceContext=pod;destinationContext=pod;labelsContext=source_ip\,destination_ip}"
  2. Check the agent pods:
k get pod -l k8s-app=retina -n kube-system

NAME                 READY   STATUS             RESTARTS      AGE
retina-agent-2qgsz   0/1     CrashLoopBackOff   1 (10s ago)   17s
retina-agent-4tckz   0/1     Error              1 (12s ago)   17s
retina-agent-bhvbp   0/1     Error              1 (9s ago)    17s
  3. Check the agent pod logs:
ts=2026-04-27T12:58:57.965Z level=info caller=packetparser/packetparser_linux.go:343 msg="Starting packet parser"
ts=2026-04-27T12:58:57.965Z level=info caller=packetparser/packetparser_linux.go:345 msg="setting up enricher since pod level is enabled"
ts=2026-04-27T12:58:57.965Z level=warn caller=packetparser/packetparser_linux.go:350 msg="retina enricher is not initialized"
ts=2026-04-27T12:58:57.965Z level=info caller=packetparser/packetparser_linux.go:383 msg="Skipping attaching bpf program to default interface of k8s Node in node namespace"
ts=2026-04-27T12:58:57.965Z level=info caller=packetparser/packetparser_linux.go:769 msg="Started packet parser"
ts=2026-04-27T12:58:57.979Z level=fatal caller=pluginmanager/cells_linux.go:64 msg="failed to start plugin manager" subsys=pluginmanager module=pluginmanager error="failed to reconcile plugin dns: failed to init plugin: failed to attach BPF to socket: cannot allocate memory" errorVerbose="cannot allocate memory\nfailed to attach BPF to socket\ngithub.com/microsoft/retina/pkg/plugin/dns.(*dns).Init\n\t/go/src/github.com/microsoft/retina/pkg/plugin/dns/dns_linux.go:87\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Reconcile\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:112\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:166\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.newPluginManager.func1.1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/cells_linux.go:62\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1771\nfailed to init plugin\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Reconcile\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:113\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:166\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.newPluginManager.func1.1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/cells_linux.go:62\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1771\nfailed to reconcile plugin dns\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.(*PluginManager).Start\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/pluginmanager.go:170\ngithub.com/microsoft/retina/pkg/managers/pluginmanager.newPluginManager.func1.1\n\t/go/src/github.com/microsoft/retina/pkg/managers/pluginmanager/cells_linux.go:62\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1771"
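
To check that the crashing agents line up with the COS nodes, the pods can be matched to their nodes and each node's OS image listed. This is a sketch using standard kubectl commands with the label selector and namespace from the steps above; it needs a live cluster, and the exact OS image strings are only what GKE typically reports, not something confirmed in this report.

```shell
# Show which node each Retina agent pod is scheduled on (NODE column).
kubectl get pod -l k8s-app=retina -n kube-system -o wide

# List every node with its OS image. GKE COS nodes typically report
# "Container-Optimized OS from Google"; Ubuntu nodes report "Ubuntu ...".
kubectl get nodes -o custom-columns=NODE:.metadata.name,OS_IMAGE:.status.nodeInfo.osImage

# Print the fatal line from the last crashed container of each agent pod.
for pod in $(kubectl get pod -l k8s-app=retina -n kube-system -o name); do
  echo "== $pod"
  kubectl logs -n kube-system "$pod" --previous | grep 'level=fatal'
done
```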

Expected behavior
The Retina agent should start successfully with the DNS plugin enabled.

Platform:

  • OS: Container-Optimized OS (cos_containerd)
  • Machine type: n2-standard-4
  • Kubernetes Version: v1.34.6-gke.1307000
  • Host: GKE
  • Retina Version: v1.2.0

Additional context

  • The issue does not occur on Ubuntu-based GKE nodes.
  • Disabling the DNS plugin works around the issue:
enabledPlugin_linux: '["linuxutil","packetforward","packetparser","dropreason"]'
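
A minimal sketch of applying this workaround to the release installed in step 1. It assumes `enabledPlugin_linux` is a top-level value of the `retina-hubble` chart (as the snippet above suggests) and that `$VERSION` is still set; the file name `retina-no-dns.yaml` is arbitrary. The value is passed through a values file because `--set` treats commas as separators, so the JSON list would otherwise need escaping.

```shell
# Values override that drops "dns" from the Linux plugin list
# (assumes enabledPlugin_linux is a top-level chart value).
cat > retina-no-dns.yaml <<'EOF'
enabledPlugin_linux: '["linuxutil","packetforward","packetparser","dropreason"]'
EOF

# Re-apply the release, keeping every value set during the original install.
helm upgrade retina oci://ghcr.io/microsoft/retina/charts/retina-hubble \
  --version "$VERSION" \
  --namespace kube-system \
  --reuse-values \
  -f retina-no-dns.yaml
```

With `--reuse-values`, the Hubble and TLS settings from the original install are kept and only the plugin list changes.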
