Tests - Add LTP scripts to run module-level numerical tests #79
Pull Request Overview
This PR adds LTP (Long-Term Performance) scripts to run module-level numerical tests across different hardware platforms, specifically targeting NVIDIA H200 and AMD MI300X GPUs. The scripts automate the collection of numerical test statistics and enable comparison between platforms to ensure computational consistency.
- Scripts to execute and collect numerical test statistics on NVIDIA H200 and AMD MI300X platforms
- Automated comparison functionality to analyze numerical differences between platforms
- Support for running tests on multiple modules including attention, embedding, MLP, and others
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| run_numerical_tests_nvidia_h200_1n8g.sh | Script to run numerical tests on NVIDIA H200 platform with platform-specific environment setup |
| run_numerical_tests_amd_mi300x_1n8g.sh | Script to run numerical tests on AMD MI300X platform with ROCm and RCCL configurations |
| run_numerical_tests_platform_similarity.sh | Comparison script to analyze numerical similarity between NVIDIA H200 and AMD MI300X results |
Comments suppressed due to low confidence (1)
tests/test_utils/ltp_scripts/run_numerical_tests_nvidia_h200_1n8g.sh:4
- The version v1.1.4 for the grouped_gemm package may not exist. Please verify that this specific version tag exists in the repository before using it in the installation command.
pip install git+https://github.com/fanshiqing/grouped_gemm@v1.1.4
    mkdir -p ${result_dir}/${1}/module_mean_and_std
    for name in ${file_names}
    do
        for x in {0..19}
Should we add a configuration option for the number of runs instead of hard-coding the bound (e.g. 19)?
run_numerical_tests() {
    # Get raw module test results
    for x in {0..19}
For every hard-coded 19, should we use a parameter instead?
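One way to act on this suggestion is to read the run count from an environment variable with a default. The following is only a sketch: `NUM_RUNS` and the helper `run_indices` are hypothetical names that do not appear in the PR.

```shell
# NUM_RUNS is a hypothetical knob, not part of the PR. It defaults to 20,
# which matches the original {0..19} range.
NUM_RUNS="${NUM_RUNS:-20}"

# Print the run indices 0 .. n-1, one per line.
run_indices() {
    local n="${1:-$NUM_RUNS}"
    local i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

# Usage: stands in for `for x in {0..19}` in each loop.
for x in $(run_indices); do
    : # run the module test for iteration ${x} here
done
```

With this shape, a shorter smoke run could be requested as `NUM_RUNS=3 bash <script>` without editing the loops.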
    # Calculate module mean and std
    file_names=$(find ${result_dir}/${1}/module_test -type f -printf "%f\n" | sort | uniq)
    mkdir -p ${result_dir}/${1}/module_mean_and_std
    for name in ${file_names}
    do
        for x in {0..19}
        do
            echo "${result_dir}/${1}/module_test/${x}/${name}" >> ${result_dir}/${1}/module_mean_and_std/input_list.txt
        done
        python \
            tests/numerical_tests/utils/module_mean_and_std.py \
            --input-list ${result_dir}/${1}/module_mean_and_std/input_list.txt \
            --output-mean-file ${result_dir}/${1}/module_mean_and_std/${name}.mean.pt \
            --output-std-file ${result_dir}/${1}/module_mean_and_std/${name}.std.pt
        rm ${result_dir}/${1}/module_mean_and_std/input_list.txt
    done
Why not do the loop directly in a Python function? That would also avoid duplicating code between the AMD and NVIDIA shell scripts.
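Short of moving the loop into Python, the duplication could also be reduced by keeping the list-building step in one file that both platform scripts source. This is a sketch of that alternative, not what the reviewer proposed: the file name `common.sh` and the function `write_input_list` are hypothetical, and only the path layout is taken from the hunk above.

```shell
# common.sh (hypothetical file name), sourced by both platform scripts.

# Write one input path per run for a given module and result file.
# Arguments: result_dir, module, file name, number of runs, output list.
write_input_list() {
    local result_dir="$1"
    local module="$2"
    local name="$3"
    local num_runs="$4"
    local out="$5"
    local x=0
    : > "$out"   # truncate first so stale entries are never reused
    while [ "$x" -lt "$num_runs" ]; do
        echo "${result_dir}/${module}/module_test/${x}/${name}" >> "$out"
        x=$((x + 1))
    done
}
```

Each platform script would then source the shared file and call `write_input_list` with its own `result_dir`, leaving only the environment setup platform-specific.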
    # Calculate intra-module similarity
    mkdir -p ${result_dir}/${1}/module_similarity
    for name in ${file_names}
    do
        for x in {0..19}
        do
            for y in {0..19}

run_numerical_tests attention
run_numerical_tests bda
run_numerical_tests embedding
run_numerical_tests layer_norm
run_numerical_tests logits
run_numerical_tests mlp
run_numerical_tests rope
Could we add the script first and move these lines to the corresponding PR?
    sleep 10
}

result_dir="./numerical_test_results/nvidia_h200"
Will there be any issues if two runs use the same directory? Maybe add a commit hash to the path.
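A minimal sketch of the commit-hash idea, assuming the script is run from inside a git checkout. The helper name `make_result_dir` is hypothetical, and the fallback value `unknown` is an illustrative choice.

```shell
# Build a per-commit result directory so runs from different commits
# do not write into the same path. Falls back to "unknown" when the
# commit cannot be determined (e.g. outside a git checkout).
make_result_dir() {
    local base="$1"
    local commit
    commit=$(git rev-parse --short HEAD 2>/dev/null) || commit="unknown"
    echo "${base}/${commit}"
}

result_dir=$(make_result_dir "./numerical_test_results/nvidia_h200")
```

Two runs at the same commit would still share a directory under this scheme, so a timestamp or run ID might need to be appended as well if that case matters.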
        python \
            tests/numerical_tests/utils/module_similarity.py \
            --stats-a ${stats_dir_a}/${1}/module_mean_and_std/${name} \
            --stats-b ${stats_dir_b}/${1}/module_mean_and_std/${name} \
            --output-file ${result_dir}/${1}/module_similarity/${name}.json
What happens if there's a mismatch? There seems to be no assert in the code.
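One possible way to surface mismatches at the shell level is to count failing comparisons and exit non-zero at the end. This sketch assumes the comparison command exits non-zero on a mismatch, which the PR does not confirm for `module_similarity.py`. The names `check` and `failures` are hypothetical.

```shell
# Run a command and record whether it failed, so one mismatch does not
# hide the others. Assumes the command signals a mismatch via a non-zero
# exit status (an assumption, not confirmed by the PR).
failures=0
check() {
    if ! "$@"; then
        echo "MISMATCH: $*" >&2
        failures=$((failures + 1))
    fi
}

# Stand-in commands for illustration; in the script, `check` would wrap
# the `python tests/numerical_tests/utils/module_similarity.py ...` call.
check true
check false

# At the end of the script, fail the job if anything mismatched:
# [ "$failures" -eq 0 ] || exit 1
```

Counting failures rather than aborting on the first one keeps the per-module JSON outputs complete, which may be more useful when comparing platforms.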
Marking as stale. No activity in 60 days.
Add LTP scripts to run module-level numerical tests. Including