Releases · kubescape/node-agent

02 Jun 07:09

github-actions

v0.3.132

2038a1b

Release v0.3.132 Latest


Summary

This PR addresses the critical eBPF agent deadlock and secondary OOM crashes under high load.

Key Improvements

Deadlock Resolution:
- Decoupled RBCache mutation from notifier channel sends by introducing an internal notificationQueue (capacity 10,000). This breaks the circular dependency in which a writer held the RBCache.mutex write lock while blocked on a channel send, while the readers were themselves blocked on RLock, and so eliminates the deadlock.
FIFO Ordered Notifications:
- Notifications are queued inside the write lock using a fast, non-blocking select write. This guarantees that notifications are queued in the exact chronological order of cache mutations, preserving strict FIFO delivery to downstream consumers (like ContainerWatcher.startRunningContainers) and preventing race conditions/state desynchronization.
- The background queue processor first tries a non-blocking send, which avoids the channel allocation and timer overhead of a timeout on the common path. Only when a notifier is not ready does it fall back to a 100ms timeout, so a slow or stalled notifier is isolated without blocking healthy ones.
Event Queue OOM Protection:
- Capped the unbounded OrderedEventQueue at maxBufferSize (100,000 events) and added event dropping.
- Dropped events are cleanly returned to the memory pool using event.Release() to prevent memory leaks and cgroup OOM kills under high system loads.
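A rough sketch of the capped queue with drop-and-release, under stated assumptions: `BoundedQueue` and `Event` are hypothetical types, the timestamp ordering of the real OrderedEventQueue is left out, and dropping the incoming event (rather than the oldest one) is a guess.

```go
package main

import "fmt"

// Event is a hypothetical pooled event; Release stands in for returning
// the event to its memory pool.
type Event struct {
	ID       int
	released bool
}

func (e *Event) Release() { e.released = true }

// BoundedQueue holds at most maxSize events and counts what it drops.
type BoundedQueue struct {
	events  []*Event
	maxSize int
	dropped int
}

func NewBoundedQueue(maxSize int) *BoundedQueue {
	return &BoundedQueue{maxSize: maxSize}
}

// Push appends the event if there is room. Otherwise the event is released
// back to the pool and counted as dropped, so memory stays bounded.
func (q *BoundedQueue) Push(e *Event) bool {
	if len(q.events) >= q.maxSize {
		e.Release()
		q.dropped++
		return false
	}
	q.events = append(q.events, e)
	return true
}

func main() {
	q := NewBoundedQueue(2) // the release notes cap the real queue at 100,000
	for i := 1; i <= 3; i++ {
		fmt.Println(i, q.Push(&Event{ID: i}))
	}
	fmt.Println("dropped:", q.dropped)
}
```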


28 May 14:33

github-actions

v0.3.129

eb5d48e

Release v0.3.129

Summary

Phase 1 — Traces, logs, drop counters

New `pkg/otelsetup` package: `InitProviders` wires up TracerProvider, LoggerProvider, and MeterProvider over OTLP gRPC; injects ARMO `X-API-Key` / `X-Customer-GUID` auth headers when the endpoint matches `otel.armosec.io`; returns no-op providers when no endpoint is configured
Container profile lifecycle tracing: `ProfileLifecycleTracker` maintains one long-running span per container learning period (bounded at 10k entries with LRU eviction), recording `profile.entry.saved`, `learning.completed`, `learning.terminated`, and eviction events
Alert log records: `EmitAlertLogRecord` emits structured OTEL log records for every fired rule and malware detection; includes 60s/1000-entry dedup LRU to avoid flooding on hot rules
eBPF drop counters: `node_agent.ebpf.events_dropped.total` incremented in container watcher and event handler factory drop paths, labelled by `reason`
Slow-eval spans: rule evaluations exceeding `OTEL_SLOW_EVAL_THRESHOLD_MS` emit a `rule.evaluate` span
Ring-buffer log processor: 7500-entry ring buffer retains recent log records; flush endpoint activates automatically when KS_LOGGER_LEVEL=debug
sbommanager: attaches `otelgrpc.NewClientHandler()` for automatic trace propagation

Phase 2 — Replace Prometheus metrics with OTEL SDK

New `pkg/metricsmanager/otel/`: full `MetricsManager` interface backed by OTEL SDK; attribute-set caching on all hot paths (2× faster, 10× less memory vs Prometheus on the histogram path)
Collapsed eBPF counters: 17 individual per-event-type counters → single `node_agent.ebpf.events.total{event_type}`
Prometheus scrape mode: `OTEL_METRICS_EXPORTER=prometheus` installs an OTEL→Prometheus bridge and starts `:8080/metrics` listener
`rule.ID` standardisation: all metric call sites now use the stable rule ID (e.g. `R1001`) instead of the display name; malware alerts use constant `"malware"` to bound cardinality
`docs/metrics-migration.md`: full mapping of old Prometheus names → new OTEL names with dashboard update checklist
A/B benchmarks: hard gate passes — OTEL allocs/op ≤ Prometheus allocs/op, ns/op ≤ 1.1× Prometheus on `BenchmarkReportRuleEvaluationTime`

New env vars

| Variable | Default | Purpose |
| --- | --- | --- |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | (none) | Base OTLP gRPC endpoint |
| `OTEL_METRICS_EXPORTER` | (none) | Set to `prometheus` to enable scrape endpoint on `:8080/metrics` |
| `OTEL_SLOW_EVAL_THRESHOLD_MS` | 0 (disabled) | Threshold for slow-eval spans |
| `OTEL_DEBUG_PORT` | 6060 | Debug listener port |

`OTEL_COLLECTOR_SVC` is now deprecated (superseded by `OTEL_EXPORTER_OTLP_ENDPOINT`).

Breaking change

Metric names changed. See `docs/metrics-migration.md` for the full mapping and dashboard update checklist.

Test plan

`go build ./...` — passes
`go test ./pkg/otelsetup/... ./pkg/metricsmanager/...` — all pass
A/B benchmark: OTEL `ReportRuleEvaluationTime` ~95 ns/op / 32 B / 2 allocs vs Prometheus ~200 ns/op / 336 B / 2 allocs — gate passes
`ProfileLifecycleTracker` and `RingBufferLogProcessor` unit tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Provider-based OpenTelemetry init, OTEL-backed metrics manager replacing prior Prometheus path; expanded metrics (events, rules, SBOM, alerts), gRPC instrumentation, profile lifecycle spans, alert deduplication and suppression reporting
Documentation
- Expanded OTEL configuration reference, runtime notes, and Prometheus→OTEL migration guide
Tests
- New unit tests and benchmarks for OTEL, lifecycle tracking, thresholds, and metrics


27 May 17:17

github-actions

v0.3.128

a9bde81

Release v0.3.128

Closes #825. Updates the inspektor-gadget dependency to the forked version that includes the fix for concurrent map writes.


27 May 13:20

github-actions

v0.3.127

0a63d5e

Release v0.3.127

Summary

This PR introduces a container meta-context so that rules can be written once and fire on any containerized workload — both Kubernetes pods and ECS/standalone containers — without duplicating tags.

What changed

pkg/contextdetection/types.go: Added Container EventSourceContext = "container" alongside the existing Kubernetes, Host, and Standalone constants.
pkg/rulemanager/rulepolicy.go: Updated RuleAppliesToContext to treat context:container as a meta-context that matches when the agent is running in either a kubernetes or a container context.

How the meta-context works

context:container  →  matches kubernetes OR container runtime
context:kubernetes →  matches kubernetes only  (unchanged)
context:host       →  matches host only         (unchanged)
context:standalone →  matches standalone only   (unchanged)

A rule tagged only context:container will fire on Kubernetes AND on ECS/standalone container nodes. A rule that needs Kubernetes-specific behaviour still uses context:kubernetes exclusively.

Backward compatibility

Rules that should fire on both Kubernetes and ECS nodes should carry both tags:

tags:
  - context:container    # fires on any containerised workload (new agents)
  - context:kubernetes   # still fires on old node-agents that don't know "container"

Old node-agents that don't recognise container will skip that tag and match context:kubernetes as before, so no existing rule behaviour is broken.

Logic walk-through

isContainerContext := currentContext == Kubernetes || currentContext == Container

for _, tag := range rule.Tags {
    ctx := strings.TrimPrefix(tag, "context:")    // assumed: ctx is the context name carried by the tag
    if ctx == string(currentContext)              { return true } // exact match (all contexts)
    if ctx == "container" && isContainerContext   { return true } // meta-context match
}

No-context rules continue to default to Kubernetes-only (unchanged).
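The matching rules above can be put together into a small runnable sketch. It is not the code from `pkg/rulemanager/rulepolicy.go`: the function signature is simplified, the string values of the `Kubernetes`, `Host`, and `Standalone` constants are assumed, and tag parsing by the `context:` prefix is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

type EventSourceContext string

const (
	Kubernetes EventSourceContext = "kubernetes" // value assumed
	Host       EventSourceContext = "host"       // value assumed
	Standalone EventSourceContext = "standalone" // value assumed
	Container  EventSourceContext = "container"
)

// ruleAppliesToContext reports whether a rule with the given tags should
// fire for an agent running in the current context.
func ruleAppliesToContext(tags []string, current EventSourceContext) bool {
	isContainerContext := current == Kubernetes || current == Container
	hasContextTag := false

	for _, tag := range tags {
		if !strings.HasPrefix(tag, "context:") {
			continue // not a context tag
		}
		hasContextTag = true
		ctx := strings.TrimPrefix(tag, "context:")
		if ctx == string(current) {
			return true // exact match (all contexts)
		}
		if ctx == string(Container) && isContainerContext {
			return true // meta-context match
		}
	}
	// Rules without any context tag default to Kubernetes-only.
	return !hasContextTag && current == Kubernetes
}

func main() {
	rule := []string{"context:container"}
	fmt.Println(ruleAppliesToContext(rule, Kubernetes))
	fmt.Println(ruleAppliesToContext(rule, Container))
	fmt.Println(ruleAppliesToContext(rule, Host))
}
```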

Summary by CodeRabbit

New Features
- Added support for ECS (Amazon Elastic Container Service) context detection.
- Runtime alerts now include ECS-specific metadata (cluster, task, launch type, availability zone).
- Improved rule matching to support Container context across Kubernetes, Standalone, Container, and ECS environments.
Chores
- Updated Go module dependencies.


26 May 17:18

github-actions

v0.3.124

cbfefb2

Release v0.3.124

Summary

Bumps github.com/cilium/cilium v1.17.14 → v1.17.15 (fixes GHSA-gj49-89wh-h4gj High)
Bumps golang.org/x/crypto v0.50.0 → v0.52.0 (fixes GO-2026-5005 through GO-2026-5033)
Bumps golang.org/x/net v0.53.0 → v0.55.0 (fixes GO-2026-5024 through GO-2026-5030)
Bumps golang.org/x/sys v0.43.0 → v0.45.0 (fixes GO-2026-5024)

Not yet fixable

GHSA-x744-4wpc-v9h2 (High) affects github.com/moby/moby and github.com/docker/docker — the fixed version (v29.3.1) is not yet published to the Go module proxy. Will address in a follow-up once it becomes available.

Test plan

CI passes (build + tests)
No new dependency conflicts introduced (go mod tidy clean)

🤖 Generated with Claude Code


26 May 08:19

github-actions

v0.3.122

ff24606

Release v0.3.122

Bumps github.com/containerd/containerd from 1.7.30 to 1.7.32.

Release notes

Sourced from github.com/containerd/containerd's releases.

containerd 1.7.32

Welcome to the v1.7.32 release of containerd!

The thirty-second patch release for containerd 1.7 contains various fixes and updates including a security patch.

containerd

CVE-2026-46680

Allow hosts.toml to contain only root-level fields without an explicit [host] section (#10028)

Fix handling of out-of-range USER values in OCI spec to avoid unexpected username/group lookups (#13450)

Apply hardening to block AF_ALG in default socket policy (#13406)

Support both "volatile" and "fsync=volatile" mount options for volatile snapshotter (#13299)

Set AppArmor abi conditionally to support versions < 3.0 (#13273)

Please try out the release binaries and report any issues at https://github.com/containerd/containerd/issues.

Maksym Pavlenko

Chris Henzie

Derek McGowan

Paweł Gronowski

Samuel Karp

Wei Fu

Brad Davidson

Brian Goff

LEI WANG

Phil Estes

bc87d865c Prepare release notes for v1.7.32

oci: return explicit error for out-of-range USER values (#13450)

503f47946 oci: return explicit error for out-of-range USER values

seccomp: Block AF_ALG in default socket policy (#13406)

e55b747d3 seccomp: Block AF_ALG in default socket policy

4627a65f8 seccomp: Document socket rule scope and socketcall limitation

Fix issue with empty host tree in hosts.toml (#10028)

24007441d Fix error parsing hosts.toml without any host tree

Support both styles of volatile mount option (#13299)

940733149 Support both styles of volatile mount option

apparmor: Set abi conditionally (#13273)

2b732c892 apparmor: Set abi conditionally

Add GitHub Action for k8s node e2e tests (#13258)

0db1e143a Add GitHub Action for k8s node e2e tests

Update release process after 1.7 (#13236)

3223a75c2 Update for latest updates to release tool

... (truncated)

Commits

180a7b7 Merge pull request #13452 from samuelkarp/prepare-1.7.32
bc87d86 Prepare release notes for v1.7.32
6a05ddd Merge pull request #13450 from samuelkarp/oci-withuser-errrange-1.7
9c3d01b Merge pull request #13406 from k8s-infra-cherrypick-robot/cherry-pick-13327-t...
e55b747 seccomp: Block AF_ALG in default socket policy
4627a65 seccomp: Document socket rule scope and socketcall limitation
33d9e24 Merge pull request #10028 from brandond/fix-hosts-toml
503f479 oci: return explicit error for out-of-range USER values
4393e22 Merge pull request #13299 from chrishenzie/release/1.7-volatile
9407331 Support both styles of volatile mount option
Additional commits viewable in compare view


19 May 15:42

github-actions

v0.3.119

bf71679

Release v0.3.119

Summary

Timeout: Increase default HTTP client timeout from 5s to 30s. Both node-agent and synchronizer had the same 5s limit, causing a race where the synchronizer's ReadTimeout could close the connection before responding, producing spurious context deadline exceeded (Client.Timeout exceeded while awaiting headers) errors — no high CPU required to trigger this.
Mutex stall: eventsStorageMutex was held for the entire HTTP round-trip (up to 5s on timeout), blocking all ReportEvent/handleNetworkEvent/handleDnsEvent goroutines. Fixed by snapshotting under the lock and releasing before the HTTP call. A new Entities map with entity structs copied by value is sufficient — the clearing loop never mutates Inbound/Outbound maps in place, so the old maps become exclusively owned by the snapshot once the lock is released.
Empty-stream skip: sendNetworkEvent logged "skipping" for empty streams but still sent the HTTP request. Added the missing return nil.
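The mutex fix and the empty-stream skip follow a common pattern: copy what you need under the lock, release it, then do the slow I/O. A minimal sketch, with hypothetical `Store`, `Entity`, `Record`, and `Flush` names and a simplified clearing step (the store is reset to a fresh map rather than replacing the inner maps per entity):

```go
package main

import (
	"fmt"
	"sync"
)

// Entity is a hypothetical stand-in for a per-workload network record.
type Entity struct {
	Inbound  map[string]int
	Outbound map[string]int
}

type Store struct {
	mu       sync.Mutex
	entities map[string]Entity
}

func NewStore() *Store { return &Store{entities: make(map[string]Entity)} }

// Record is the hot path that must not wait on network I/O.
func (s *Store) Record(name, peer string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entities[name]
	if !ok {
		e = Entity{Inbound: map[string]int{}, Outbound: map[string]int{}}
		s.entities[name] = e
	}
	e.Inbound[peer]++
}

// snapshot copies the entity structs by value and resets the store. The old
// inner maps are never mutated again, so the snapshot owns them exclusively.
func (s *Store) snapshot() map[string]Entity {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := make(map[string]Entity, len(s.entities))
	for name, e := range s.entities {
		snap[name] = e
	}
	s.entities = make(map[string]Entity)
	return snap
}

// Flush sends a snapshot. The lock is already released when send runs, and
// nothing is sent when there is nothing to report.
func (s *Store) Flush(send func(map[string]Entity) error) error {
	snap := s.snapshot()
	if len(snap) == 0 {
		return nil // empty stream: skip the request entirely
	}
	return send(snap)
}

func main() {
	s := NewStore()
	s.Record("pod-a", "10.0.0.1")
	err := s.Flush(func(snap map[string]Entity) error {
		s.Record("pod-b", "10.0.0.2") // would deadlock if Flush still held the lock
		fmt.Println("sending", len(snap), "entities")
		return nil
	})
	fmt.Println("err:", err)
}
```

Here `send` stands in for the HTTP round-trip to the synchronizer; a slow or timed-out call delays only the flusher, not the goroutines recording events.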

Deploy note: the timeout fix requires the matching change in kubescape/synchronizer (raise ReadTimeout to 30s) to be deployed together.

Test plan

Verify no context deadline exceeded errors in node-agent logs after deploying both PRs together
Confirm empty network stream intervals no longer generate HTTP requests to the synchronizer
Confirm network/DNS event recording is not stalled during a synchronizer outage

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- HTTP request timeout increased from 5 to 30 seconds for improved stability.
Performance
- Network event processing now skips unnecessary operations when no events are present.
- Improved concurrency handling in network event storage and transmission.


18 May 16:12

github-actions

v0.3.115

5cf899c

Release v0.3.115

Summary by CodeRabbit

Bug Fixes
- Prevents partial or non-terminal profiles from replacing cached entries; entries remain pending until a terminal status (Completed/TooLarge) and are retried safely.
- Refresh now preserves existing cache when a fetched profile is not terminal.
New Features
- Cache now accepts and promotes entries immediately after a profile reaches Completed, reducing delay.
Tests
- Updated tests and fixtures to reflect the refined completion-status behavior.
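The cache behaviour summarised above can be illustrated with a short sketch. All names here are hypothetical (`ProfileCache`, `Refresh`, the `Learning` status and the status string values); only the Completed/TooLarge terminal statuses come from the notes.

```go
package main

import "fmt"

type Status string

const (
	Learning  Status = "Learning" // hypothetical non-terminal status
	Completed Status = "Completed"
	TooLarge  Status = "TooLarge"
)

type Profile struct {
	Name   string
	Status Status
}

func isTerminal(s Status) bool { return s == Completed || s == TooLarge }

// ProfileCache keeps the last terminal profile per name and remembers which
// names still need a retry.
type ProfileCache struct {
	entries map[string]Profile
	pending map[string]bool
}

func NewProfileCache() *ProfileCache {
	return &ProfileCache{entries: map[string]Profile{}, pending: map[string]bool{}}
}

// Refresh stores a fetched profile only when its status is terminal.
// Otherwise the existing entry is preserved and the name stays pending.
func (c *ProfileCache) Refresh(p Profile) bool {
	if !isTerminal(p.Status) {
		c.pending[p.Name] = true
		return false
	}
	c.entries[p.Name] = p
	delete(c.pending, p.Name)
	return true
}

func main() {
	c := NewProfileCache()
	fmt.Println(c.Refresh(Profile{Name: "nginx", Status: Completed}))
	fmt.Println(c.Refresh(Profile{Name: "nginx", Status: Learning}))
	fmt.Println(c.entries["nginx"].Status, c.pending["nginx"])
}
```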


12 May 12:22

github-actions

v0.3.113

cc59fa0

Release v0.3.113

Summary by CodeRabbit

Bug Fixes
- Improved validation messaging for rules with missing profile configurations. Consolidated multiple individual error logs into a single aggregated warning message for clearer feedback and reduced log noise.


06 May 15:44

github-actions

v0.3.112

2d768cb

Release v0.3.112

Summary by CodeRabbit

Chores
- Service discovery now supports the API_URL environment variable for dynamic endpoint configuration, defaulting to api.armosec.io when unset.
