fix(controlplane): move counters storage to agent's memory #789

@egnees

Problem

An agent's memory budget is enforced by the arenas in agent->block_allocator, set up at agent_attach time from memory_limit. Allocations through agent->memory_context are bounded by it.
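To make "bounded by it" concrete, here is a minimal toy model of a budgeted memory context. It is only a sketch: `toy_memory_context`, `toy_memory_balloc` and `toy_memory_bfree` are hypothetical names invented for illustration, and the real context is arena-backed through agent->block_allocator rather than malloc-backed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a budgeted memory context; not the project's type. */
struct toy_memory_context {
    size_t limit; /* budget in bytes, playing the role of memory_limit */
    size_t used;  /* bytes currently charged to this context */
};

void toy_memory_context_init(struct toy_memory_context *ctx, size_t limit) {
    assert(ctx != NULL);
    ctx->limit = limit;
    ctx->used = 0;
}

/* Returns NULL when the request would push the context past its limit. */
void *toy_memory_balloc(struct toy_memory_context *ctx, size_t size) {
    assert(ctx != NULL);
    if (size == 0 || size > ctx->limit - ctx->used)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        ctx->used += size; /* charge only successful allocations */
    return p;
}

/* The caller passes the size back, so no per-block header is needed. */
void toy_memory_bfree(struct toy_memory_context *ctx, void *p, size_t size) {
    assert(ctx != NULL && size <= ctx->used);
    free(p);
    ctx->used -= size;
}
```

The point of the model is that the limit only protects against allocations that are actually routed through the context. Anything allocated from a different context is never charged to `used`.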

A module's counter_registry correctly lives in agent memory, but the counter_storage spawned for that module does not, and neither do the per-worker pages that dominate its size. In module_ectx_create, both the counter_storage struct and the pages it allocates via cp_config->counter_storage_allocator come from cp_config->memory_context, the shared controlplane zone.

The number of counters and their sizes are declared by the module itself, so a single agent can spawn arbitrarily many pages in shared memory, bypassing its own limit.

Impact

agent->memory_limit does not actually bound an agent's footprint. A misbehaving or simply counter-heavy module can exhaust the controlplane zone and starve other agents, even when its own limit is set conservatively.

Function, pipeline, chain, and device counters are out of scope: they describe shared configuration owned by the controlplane and should remain in cp_config->memory_context.

Suggested direction

Allocate the module's counter_storage and its pages from the owning agent's memory context, e.g. via a per-agent counter_storage_allocator bound to agent->memory_context and used in place of cp_config->counter_storage_allocator on the module ectx path.
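A rough sketch of that direction follows, using a toy model instead of the real types. Everything here (`toy_ctx`, `toy_counter_allocator`, `toy_counter_storage_spawn` and so on) is hypothetical and only mirrors the roles named above. The actual counter_storage API, page layout and allocator signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names are hypothetical; they only mirror the roles in this issue. */
struct toy_ctx { size_t limit; size_t used; };          /* budgeted context */
struct toy_counter_allocator { struct toy_ctx *ctx; };  /* bound to one context */

struct toy_counter_storage {
    struct toy_counter_allocator *allocator;
    size_t page_count;
    size_t page_size;
    void **pages;
};

void *toy_alloc(struct toy_ctx *ctx, size_t size) {
    if (size == 0 || size > ctx->limit - ctx->used)
        return NULL; /* request would exceed the context's budget */
    void *p = malloc(size);
    if (p != NULL)
        ctx->used += size;
    return p;
}

void toy_free(struct toy_ctx *ctx, void *p, size_t size) {
    if (p == NULL)
        return;
    free(p);
    ctx->used -= size;
}

/* Releases the pages, the page table and the struct back to the same context. */
void toy_counter_storage_free(struct toy_counter_storage *s) {
    if (s == NULL)
        return;
    struct toy_ctx *ctx = s->allocator->ctx;
    if (s->pages != NULL) {
        for (size_t i = 0; i < s->page_count; i++)
            toy_free(ctx, s->pages[i], s->page_size);
        toy_free(ctx, s->pages, s->page_count * sizeof(void *));
    }
    toy_free(ctx, s, sizeof(*s));
}

/* The struct and every page are charged to the allocator's context. */
struct toy_counter_storage *toy_counter_storage_spawn(
        struct toy_counter_allocator *a, size_t page_count, size_t page_size) {
    assert(a != NULL && a->ctx != NULL);
    struct toy_counter_storage *s = toy_alloc(a->ctx, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->allocator = a;
    s->page_count = page_count;
    s->page_size = page_size;
    s->pages = toy_alloc(a->ctx, page_count * sizeof(void *));
    if (s->pages == NULL) {
        toy_counter_storage_free(s);
        return NULL;
    }
    for (size_t i = 0; i < page_count; i++)
        s->pages[i] = NULL; /* lets the cleanup path skip unallocated slots */
    for (size_t i = 0; i < page_count; i++) {
        s->pages[i] = toy_alloc(a->ctx, page_size);
        if (s->pages[i] == NULL) { /* agent budget exhausted: roll back */
            toy_counter_storage_free(s);
            return NULL;
        }
    }
    return s;
}
```

In this model, binding the allocator to the agent's context means a counter-heavy module fails against the agent's own budget, and the shared context is never charged. Whether a failed spawn should fail module_ectx_create outright is a design decision this sketch does not settle.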
