PerfTools/Perfetto: in-process Perfetto tracing service#51271
felicepantaleo wants to merge 2 commits into
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51271/49862 ERROR: Build errors found during clang-tidy run. |
|
Demonstration video |
|
test parameters:
|
|
@cmsbuild please test |
|
type ngt |
|
@cmsbuild code-checks |
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51271/49863 ERROR: Build errors found during clang-tidy run. |
|
@cmsbuild please test with cms-sw/cmsdist#10668 |
|
@cmsbuild code-checks with cms-sw/cmsdist#10668 |
|
@cmsbuild code-checks with cms.week0.PR_596340/56.1 |
|
@cmsbuild code-checks with cms.week0_PR_596340/56.1 |
|
code-checks with cms.week0.PR_3f29859a/100.0-cced86a6d5071160d38b54fd5b3ba33d |
I could maintain it until it becomes stable... |
|
+1 Size: This PR adds an extra 28KB to repository Comparison Summary:
|
…hook
Add a process-wide, dependency-free CachingAllocatorMonitor interface that the CachingAllocator notifies on every allocate/free and on usage changes (live, cached and requested bytes, per device). It is a no-op unless a monitor is installed (a single atomic-pointer load on the hot path), so it costs nothing when unused. PerfTools/Perfetto installs one to attribute device-memory traffic to the responsible module.
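A minimal sketch of the kind of hook described in this commit message, assuming a process-wide atomic pointer that is null when no monitor is installed. Every name here (`AllocatorMonitorSketch`, `installMonitor`, `notifyAllocate`, `notifyFree`, `CountingMonitor`, `SKETCH_UNLIKELY`) is hypothetical and is not taken from the PR; the real CachingAllocatorMonitor interface may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Branch hint: treat "a monitor is installed" as the rare case.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical interface; names and signatures are illustrative only.
class AllocatorMonitorSketch {
public:
  virtual ~AllocatorMonitorSketch() = default;
  virtual void onAllocate(int device, std::size_t bytes) = 0;
  virtual void onFree(int device, std::size_t bytes) = 0;
};

// Process-wide slot; nullptr means "no monitor installed".
std::atomic<AllocatorMonitorSketch*> g_monitor{nullptr};

// Install a monitor (or clear it with nullptr); returns the previous one.
AllocatorMonitorSketch* installMonitor(AllocatorMonitorSketch* monitor) {
  return g_monitor.exchange(monitor, std::memory_order_acq_rel);
}

// Hot path: one atomic load plus a branch hinted as not taken.
void notifyAllocate(int device, std::size_t bytes) {
  AllocatorMonitorSketch* m = g_monitor.load(std::memory_order_acquire);
  if (SKETCH_UNLIKELY(m != nullptr)) {
    m->onAllocate(device, bytes);
  }
}

void notifyFree(int device, std::size_t bytes) {
  AllocatorMonitorSketch* m = g_monitor.load(std::memory_order_acquire);
  if (SKETCH_UNLIKELY(m != nullptr)) {
    m->onFree(device, bytes);
  }
}

// Example monitor that tracks live bytes (single-threaded use only).
struct CountingMonitor : AllocatorMonitorSketch {
  std::size_t liveBytes = 0;
  void onAllocate(int, std::size_t bytes) override { liveBytes += bytes; }
  void onFree(int, std::size_t bytes) override { liveBytes -= bytes; }
};
```

With no monitor installed, `notifyAllocate` and `notifyFree` reduce to the load and the untaken branch. A tracing service would call `installMonitor(&monitor)` at start-up and `installMonitor(nullptr)` before the monitor is destroyed.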
1e2728d to 4a97f81
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51271/49897
|
|
I ran PhaseIITiming with and without --procModifiers alpaka. You can find them in the folder perfetto-PR-51271. I marked the always-compiled allocator hook (a single atomic-pointer load) as a predicted-not-taken branch. Marking that branch |
|
@cmsbuild please test |
|
I can confirm that there is no impact on the Run 3 HLT performance.
|
|
Just FYI - adding the hooks to the caching allocator clashes with the (old, by now) plan to move it outside of CMSSW. But we can figure out how to handle things if and when we actually get to do it. |
|
+1 Size: This PR adds an extra 36KB to repository The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison Summary:
|
|
+heterogeneous |
Add PerfettoTraceService, an EDM service that records a .pftrace (openable at https://ui.perfetto.dev) of a cmsRun job:
- module / acquire / EventSetup / source / cleanup slices on per-(stream, thread) lanes under each edm::stream, so concurrent and ExternalWork modules nest correctly without overlap;
- a global "Throughput (events/s)" counter and per-stream run/lumi/event counters;
- optional Alpaka caching-allocator tracing: alloc/free attributed to the module, plus live/cached/requested device-memory counters;
- optional CUDA kernel tracing via CUPTI: real device-side timing, registers, static/dynamic shared memory, per-thread and total local memory, estimated occupancy and the correlation id linking back to the host launch;
- optional CPU (RAPL) and GPU (NVML) power counter tracks at a configurable rate;
- tier-B per-function macros and a module filter for focused, low-overhead runs.

A catch2 regression test (test/testPerfettoTrace.cpp) records a trace and asserts the track/lane/counter structure, so a future perfetto SDK or framework change that silently drops a feature fails the build.

The Perfetto SDK comes from the `perfetto` CMSSW external (<use name="perfetto"/>, #include <perfetto.h>) rather than vendored into the release.
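The per-(stream, thread) lanes mentioned above can be pictured with a small sketch: if every (stream, thread) pair gets its own lane, all slices on one lane are opened and closed by the same thread, so they nest strictly. This is only an illustration under that assumption; `LaneRegistrySketch` and `laneFor` are hypothetical names, and the actual service may assign tracks differently.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical registry mapping (stream, thread) to a lane index.
class LaneRegistrySketch {
public:
  // Returns a stable lane index for the pair; a pair seen for the
  // first time gets the next free index within its stream.
  unsigned laneFor(unsigned stream, std::thread::id tid) {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto key = std::make_pair(stream, tid);
    const auto it = lanes_.find(key);
    if (it != lanes_.end()) {
      return it->second;
    }
    const unsigned lane = nextLane_[stream]++;  // starts at 0 per stream
    lanes_.emplace(key, lane);
    return lane;
  }

private:
  std::mutex mutex_;
  std::map<std::pair<unsigned, std::thread::id>, unsigned> lanes_;
  std::map<unsigned, unsigned> nextLane_;
};
```

Under this scheme an ExternalWork module whose `acquire()` and `produce()` run on different threads would land on two lanes of the same stream, rather than producing mis-paired begin/end events on one lane.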
|
@cmsbuild please test |
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51271/49927
|
|
@cmsbuild please test |
|
+1 Size: This PR adds an extra 20KB to repository Comparison Summary:
|
Adds PerfTools/Perfetto, an EDM service (PerfettoTraceService) that records an in-process Perfetto (https://perfetto.dev) trace (.pftrace) of a cmsRun job, openable by drag-and-drop at https://perfetto.web.cern.ch, entirely client-side, together with a small dependency-free monitor hook in HeterogeneousCore/AlpakaInterface that the Alpaka caching allocator uses to report device-memory traffic.
What it records:
edm::stream, so independent modules running concurrently within a stream, and an ExternalWork module's acquire()/produce() running on different threads, nest correctly without overlapping or mis-paired slices;
CMS_PERFETTO_FUNC()/CMS_PERFETTO_SCOPE() macros for optional intra-module instrumentation, and a traceModules filter for focused, low-overhead runs.
Everything beyond the per-stream slices and counters is opt-in and off by default: with the optional features disabled the per-allocation cost is a single relaxed atomic load, and disabled trace categories cost only a predicated load.
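To illustrate how scope-based instrumentation combined with a module filter can stay cheap when disabled, here is a hedged sketch of an RAII slice guard. It is not the implementation behind CMS_PERFETTO_SCOPE() or traceModules: `TraceSinkSketch`, `ScopedSliceSketch`, the "B:"/"E:" event strings, and the rule that an empty filter accepts every module are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical trace sink; single-threaded for simplicity.
struct TraceSinkSketch {
  std::atomic<bool> enabled{false};
  std::unordered_set<std::string> traceModules;  // assumed: empty = all modules
  std::vector<std::string> events;               // "B:<name>" / "E:<name>"

  // Disabled tracing costs one relaxed load and an early return.
  bool accepts(const std::string& module) const {
    if (!enabled.load(std::memory_order_relaxed)) {
      return false;
    }
    return traceModules.empty() || traceModules.count(module) != 0;
  }
};

// RAII guard: emits a begin event on construction and the matching
// end event on destruction, but only if the sink accepted the module.
class ScopedSliceSketch {
public:
  ScopedSliceSketch(TraceSinkSketch& sink, const std::string& module, std::string name)
      : sink_(sink.accepts(module) ? &sink : nullptr), name_(std::move(name)) {
    if (sink_ != nullptr) {
      sink_->events.push_back("B:" + name_);
    }
  }
  ~ScopedSliceSketch() {
    if (sink_ != nullptr) {
      sink_->events.push_back("E:" + name_);
    }
  }
  ScopedSliceSketch(const ScopedSliceSketch&) = delete;
  ScopedSliceSketch& operator=(const ScopedSliceSketch&) = delete;

private:
  TraceSinkSketch* sink_;  // nullptr when this slice is filtered out
  std::string name_;
};
```

Because the accept decision is taken once in the constructor, a slice that begins is always closed, even if the filter or the enabled flag changes while the scope is active.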
Usage:
cmsDriver.py … --customise PerfTools/Perfetto/customisePerfetto.customise, or add the service directly; see PerfTools/Perfetto/README.md
@rovere @makortel @fwyzard