add latency benchmark mode #48
Open
RongLei-intel wants to merge 3 commits into
Contributor (Author):
@lviner Could you help review this PR? We need this feature to measure the latency of collectives. Thanks!
Collaborator:
Hi @RongLei-intel, thank you for your contribution. Please note that @lviner is no longer with Intel. We maintain two copies of this repository: an internal one (where CI runs) and this public repo. Updates are typically made internally first, then merged here. For internal PRs, we require each to be linked to a JIRA ticket. Could you please open a JIRA ticket and provide more details about the use case for this feature? If you have access to the internal repo, you can copy your commits there. If not, let me know and I can assist with copying them over. Thank you!
This pull request introduces a new latency benchmark mode to the HCCL demo tool, allowing users to measure communication latency per iteration rather than overall throughput. The changes include new CLI options, updates to documentation, and logic to select the appropriate benchmarking method based on user input.
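To illustrate the distinction between per-iteration latency and overall throughput, here is a minimal Python sketch. It is not the PR's C++ code: `op` is a hypothetical stand-in for one collective call followed by device synchronization, and the function signatures, the warmup default, and the reporting units are assumptions made for illustration.

```python
import time
from typing import Callable


def benchmark_latency(op: Callable[[], None], iterations: int, warmup: int = 5) -> float:
    """Return the average wall-clock latency of one op() call, in microseconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):
        op()  # warmup calls are excluded from the measurement
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        op()  # each iteration is timed on its own
        total += time.perf_counter() - start
    return total / iterations * 1e6


def benchmark_throughput(op: Callable[[], None], bytes_per_call: int,
                         iterations: int, warmup: int = 5) -> float:
    """Return bytes per second measured over the whole loop."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):
        op()
    start = time.perf_counter()
    for _ in range(iterations):
        op()  # only the loop as a whole is timed
    elapsed = time.perf_counter() - start
    return bytes_per_call * iterations / elapsed
```

The latency variant starts and stops the timer around every call, so it reports the cost of a single operation; the throughput variant times the loop as a whole, which lets back-to-back calls overlap on hardware that pipelines them.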
Latency Benchmark Feature:
- Added `--latency_benchmark` to `run_hccl_demo.py` and documented its usage in `README.md`, enabling users to run tests that measure latency per iteration.
- Updated `run_hccl_demo.py` to set the `HCCL_DEMO_LATENCY_BENCHMARK` environment variable when latency benchmarking is enabled.

Benchmark Logic and Reporting:
- Added a `benchmark_latency` function in `hccl_demo.cpp` to measure and report average latency per iteration, including warmup and correctness validation.
- Updated `hccl_demo.cpp` to select between throughput and latency benchmarking based on the new environment variable.
- Updated `hccl_demo.cpp` to distinguish between latency and throughput benchmarks in the summary output.

Documentation:
- Documented the latency benchmark mode in `README.md`, clarifying its purpose and how to enable it.
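The flag-to-environment-variable wiring described above might look roughly like the following sketch. Only the names `--latency_benchmark` and `HCCL_DEMO_LATENCY_BENCHMARK` come from the PR summary; the use of `argparse`, the helper names, and the value `"1"` are assumptions, and the actual `run_hccl_demo.py` may be structured differently.

```python
import argparse
from typing import Dict, Mapping


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the new option."""
    parser = argparse.ArgumentParser(description="HCCL demo launcher (sketch)")
    parser.add_argument(
        "--latency_benchmark",
        action="store_true",
        help="measure latency per iteration instead of overall throughput",
    )
    return parser


def apply_benchmark_mode(args: argparse.Namespace, env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the latency switch set when requested."""
    new_env = dict(env)
    if args.latency_benchmark:
        # Assumption: the demo binary treats "1" as "enabled".
        new_env["HCCL_DEMO_LATENCY_BENCHMARK"] = "1"
    return new_env


def latency_mode_enabled(env: Mapping[str, str]) -> bool:
    """Mirror of the check the benchmark side would perform (assumed semantics)."""
    return env.get("HCCL_DEMO_LATENCY_BENCHMARK", "0") == "1"
```

A launcher could pass the returned mapping as the `env` argument of `subprocess.run` when starting the demo binary, which keeps the throughput path as the default when the flag is absent.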