WIP: Execution benchmark #1634
….com:llnl/conduit into task/siramok/06_22_26/real_node_backed_tests
…/siramok/06_24_26/execution_benchmark
// TODO: Do we care to have bespoke timing in the case that conduit was built
// without caliper?
Probably not, but we want to use other performance metrics. Caliper doesn't tell us everything we would like to know. Maybe we could explore some of these tools:
- LC Hosted tool for GPU profile vis: https://lc.llnl.gov/perfetto/
- nsys: Conduit Device Support Ongoing Development #1614 (comment)
- rocprof for AMD
- Hatchet: https://github.com/LLNL/hatchet
- Thicket: https://github.com/llnl/thicket
I think ultimately some combination of tools will be able to give us more insightful metrics than just using Caliper. Caliper is still useful as a baseline since we have it set up for easy use in Conduit, but it may be useful to look at expanding our Caliper options since it can do much more than we use it for: https://github.com/Alpine-DAV/ascent/wiki/Caliper-How-To
I removed this TODO.
It looks like Caliper can be configured to output a file that Hatchet can consume. Thicket appears to build on Hatchet, so it consumes the same data. I've changed the Caliper config to output this file, but I think the file currently gets overwritten for each benchmark combination, so I need to investigate a fix.
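One way to avoid the overwrite is to derive a distinct output file name from the labels that identify each benchmark combination. The sketch below is only an illustration: `make_profile_filename` is a hypothetical helper, not part of Conduit or Caliper, and how the resulting name gets handed to the Caliper config is left out because it depends on the config in use.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not a Conduit or Caliper API): builds a distinct
// profile file name from the labels that identify one benchmark combination.
std::string make_profile_filename(const std::string &prefix,
                                  const std::vector<std::string> &labels,
                                  const std::string &extension = ".json")
{
    // An empty prefix would produce names that start with '_'.
    assert(!prefix.empty());

    std::string name = prefix;
    for (const std::string &label : labels)
    {
        name += '_';
        for (char c : label)
        {
            // Keep letters and digits; replace anything else with '-'
            // so that '_' only ever separates labels.
            const unsigned char uc = static_cast<unsigned char>(c);
            name += std::isalnum(uc) ? c : '-';
        }
    }
    return name + extension;
}
```

For example, calling it with the prefix `bench` and the labels `uniform_to_explicit` and `device` yields `bench_uniform-to-explicit_device.json`, so each combination writes its own file as long as its labels differ.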
CONDUIT_ANNOTATE_MARK_FUNCTION;
benchmark::exec(coordset_uniform_to_explicit,
                BENCHMARK_NUM_WARMUP_ITERATIONS,
                BENCHMARK_NUM_ITERATIONS,
                {benchmark::FillMode::None});
benchmark::exec(coordset_rectilinear_to_explicit,
                BENCHMARK_NUM_WARMUP_ITERATIONS,
                BENCHMARK_NUM_ITERATIONS,
                {benchmark::FillMode::None});
I don't understand; where are the execution options set? If we want to run the ported algorithms with different configurations, we need to set the execution options and then iteratively loop over the available execution options for execution location, output location, fallback location, and sync strategy. We will also need to experiment with having source data start on the host versus device. For these coordset transforms, we create destination data so we don't have to worry about where destination data starts.
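A rough sketch of the sweep described above, assuming placeholder types: `Location`, `SyncStrategy`, `BenchmarkConfig`, `enumerate_configs`, and `for_each_config` are all hypothetical names and do not reflect the actual Conduit execution options API. It builds every combination of execution location, output location, fallback location, sync strategy, and source data location, then hands each one to a callback that would run the benchmark.

```cpp
#include <cassert>
#include <vector>

// Placeholder enums; the real execution options may use different names
// and may offer more values than these.
enum class Location { Host, Device };
enum class SyncStrategy { Eager, Deferred };

// One benchmark configuration (hypothetical layout).
struct BenchmarkConfig
{
    Location exec_location;
    Location output_location;
    Location fallback_location;
    SyncStrategy sync_strategy;
    Location source_location; // where the source data starts
};

// Builds the cartesian product of all option values.
std::vector<BenchmarkConfig>
enumerate_configs(const std::vector<Location> &locations,
                  const std::vector<SyncStrategy> &syncs)
{
    // Empty option lists would silently produce no configurations.
    assert(!locations.empty() && !syncs.empty());

    std::vector<BenchmarkConfig> configs;
    for (Location exec : locations)
        for (Location output : locations)
            for (Location fallback : locations)
                for (SyncStrategy sync : syncs)
                    for (Location source : locations)
                        configs.push_back({exec, output, fallback, sync, source});
    return configs;
}

// Invokes `run` once per configuration; `run` is where a benchmark would
// set the execution options and time the algorithm.
template <typename Fn>
void for_each_config(const std::vector<BenchmarkConfig> &configs, Fn &&run)
{
    for (const BenchmarkConfig &config : configs)
        run(config);
}
```

With two locations and two sync strategies this produces 2 x 2 x 2 x 2 x 2 = 32 configurations per benchmark, so the benchmark may want to skip combinations that are not meaningful for a given algorithm.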
Not implemented yet, but on the todo list.
This has been added for execution location. However, it revealed a problem that we will need to discuss before deciding how to proceed:
Essentially, it's not enough for t_blueprint_mesh_transform_benchmark.cpp to be compiled as a CUDA/HIP TU. We also have to compile conduit_blueprint_mesh.cpp that way to get device support in the coordset transforms.
The problem is that attempting to do so on the entire file is currently not possible. NVCC throws several compilation errors due to the polyhedral_* functions (which have not been ported).
There are at least two solutions that I can think of, but both are (potentially) problematic for different reasons.
Co-authored-by: Justin Privitera <35237779+JustinPrivitera@users.noreply.github.com>
The purpose of this PR is to create (some initial vision of) benchmarking infrastructure that is flexible enough to test arbitrary execution:: code. This should help us quantify performance improvements as we port existing portions of the codebase to use execution::. We discussed comparing performance across different execution policies, scaling to different data sizes and data types, and measuring data transfer overhead (host <--> device vs. same <--> same).
Currently has: