Improve Site Explorer BMC scraping throughput #2934

Description

@williampnvidia

Site Explorer periodically scrapes BMC state for managed hosts. At large sites, the current scraping throughput can make the reported state significantly stale.

In one large deployment, Site Explorer currently scrapes about 90 BMCs concurrently, and a full batch takes roughly 4 minutes. That is about 22 BMCs per minute, so a site with a few thousand managed endpoints can see full refresh times well beyond an hour (roughly 90 minutes at 2,000 endpoints). At larger scales, refresh time could grow to many hours.

We should improve Site Explorer performance so BMC state is refreshed much more quickly. Ideally, large deployments should get updated data within about 5 minutes when BMCs and network paths are healthy.
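
To make the numbers above concrete, here is a rough back-of-the-envelope sketch. It assumes a simple batch model (every batch takes the same wall-clock time and batches run back to back), which may not match how Site Explorer actually schedules work. The function names are hypothetical and are not part of Site Explorer.

```python
import math


def full_refresh_minutes(num_bmcs: int, concurrency: int, batch_minutes: float) -> float:
    """Estimate full-site refresh time, assuming fixed-duration, back-to-back batches."""
    batches = math.ceil(num_bmcs / concurrency)
    return batches * batch_minutes


def required_concurrency(num_bmcs: int, target_minutes: float, batch_minutes: float) -> int:
    """Smallest concurrency that meets the target if batch duration stays unchanged."""
    batches_allowed = math.floor(target_minutes / batch_minutes)
    if batches_allowed < 1:
        raise ValueError("target is shorter than one batch; per-BMC scrape time must drop")
    return math.ceil(num_bmcs / batches_allowed)


if __name__ == "__main__":
    # 90 concurrent scrapes and 4-minute batches are the figures quoted in this issue.
    for n in (2000, 3000, 10000):
        print(n, full_refresh_minutes(n, 90, 4.0), required_concurrency(n, 5.0, 4.0))
```

Under this model, a 5-minute target leaves room for only one 4-minute batch, so raising concurrency alone would require scraping nearly every BMC at once. That suggests per-BMC scrape time probably has to come down as well.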

Potential areas to investigate:

  • Measure how far we can safely increase BMC scraping concurrency.
  • Improve per-BMC scrape efficiency, including use of more efficient Redfish APIs where available.
  • Review timeout behavior so slow, unreachable, or disconnected BMCs do not stall progress for healthy BMCs.
  • Add or improve metrics needed to understand scrape latency, timeout rates, and throughput bottlenecks.
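
As one illustration of the concurrency and timeout items above, the following is a minimal sketch of a bounded-concurrency scraper with a per-BMC deadline. It is not Site Explorer's implementation. `scrape_all`, `scrape_one`, `ScrapeResult`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever Redfish calls a real scrape would make.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class ScrapeResult:
    bmc: str
    ok: bool
    seconds: float
    error: str = ""


async def scrape_one(bmc, fetch, timeout_s, sem):
    """Scrape one BMC under a deadline so a stuck BMC frees its slot quickly."""
    async with sem:
        start = time.monotonic()
        try:
            await asyncio.wait_for(fetch(bmc), timeout=timeout_s)
            return ScrapeResult(bmc, True, time.monotonic() - start)
        except asyncio.TimeoutError:
            return ScrapeResult(bmc, False, time.monotonic() - start, "timeout")
        except Exception as exc:  # record the failure instead of aborting the run
            return ScrapeResult(bmc, False, time.monotonic() - start, repr(exc))


async def scrape_all(bmcs, fetch, concurrency, timeout_s):
    """Run scrapes with at most `concurrency` in flight; results keep input order."""
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(scrape_one(b, fetch, timeout_s, sem) for b in bmcs))


async def fake_fetch(bmc: str) -> None:
    """Hypothetical stand-in for a real scrape: 'slow-*' BMCs hang, others are fast."""
    await asyncio.sleep(10.0 if bmc.startswith("slow") else 0.01)


if __name__ == "__main__":
    hosts = [f"ok-{i}" for i in range(20)] + ["slow-0", "slow-1"]
    for result in asyncio.run(scrape_all(hosts, fake_fetch, concurrency=4, timeout_s=0.1)):
        print(result)
```

Because each scrape holds a slot for at most `timeout_s`, an unreachable BMC delays the others by a bounded amount rather than stalling a whole batch. The recorded `seconds` and `error` fields are the kind of raw data that latency and timeout-rate metrics could be built from.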

Acceptance criteria:

  • Site Explorer has a documented performance profile for large deployments.
  • The implementation reduces full-site BMC refresh time substantially versus the current behavior.
  • Slow or unreachable BMCs do not disproportionately block scraping of healthy BMCs.
  • Follow-up issues are filed for any larger architectural work discovered during investigation.
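
A documented performance profile could start from a small summary of per-scrape measurements. The sketch below is one possible shape for that summary, not an existing Site Explorer metric set. `summarize` and `percentile` are hypothetical helpers, and the nearest-rank percentile method is an assumption chosen for simplicity.

```python
import math
from typing import Optional, Sequence


def percentile(sorted_values: Sequence[float], pct: int) -> Optional[float]:
    """Nearest-rank percentile of an already sorted sequence (None if empty)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_s: Sequence[float], timeouts: int, wall_clock_s: float) -> dict:
    """Summarize one scrape cycle: latency percentiles, timeout rate, throughput."""
    ordered = sorted(latencies_s)
    attempted = len(ordered) + timeouts
    return {
        "attempted": attempted,
        "timeout_rate": timeouts / attempted if attempted else 0.0,
        "p50_s": percentile(ordered, 50),
        "p95_s": percentile(ordered, 95),
        "p99_s": percentile(ordered, 99),
        # Successful scrapes per minute of wall-clock time.
        "bmcs_per_minute": len(ordered) * 60 / wall_clock_s if wall_clock_s > 0 else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0], timeouts=2, wall_clock_s=60.0))
```

Comparing these summaries before and after a change is one way to check whether full-site refresh time has actually dropped, and whether timeouts are what dominates the tail.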

Related to #1864

Metadata

Projects

Status
In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions