Tests - Add basic utilities for module-level numerical tests #75
yzygitzh wants to merge 2 commits into
Pull Request Overview
This PR adds basic utilities for module-level numerical tests to enable consistent testing and comparison of module behavior across different configurations. The utilities provide infrastructure for running numerical tests, capturing module statistics, and comparing results between different test runs.
Key changes include:
- Test infrastructure with base classes and configuration utilities
- Statistical analysis tools for computing means, standard deviations, and similarity metrics
- Environment variable management to preserve user NVTE settings during testing
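One way the "preserve user NVTE settings" behavior could look is a snapshot-and-restore context manager. This is only a sketch under assumptions: the helper name `preserve_env` and the `NVTE_` prefix filter are hypothetical and are not taken from the PR's `Utils` class.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_env(prefix="NVTE_"):
    """Snapshot variables starting with `prefix` and restore them on exit.

    Hypothetical helper: the PR's actual Utils class may do this differently.
    """
    saved = {k: v for k, v in os.environ.items() if k.startswith(prefix)}
    try:
        yield saved
    finally:
        # Drop anything the test body added or changed under the prefix...
        for key in [k for k in os.environ if k.startswith(prefix)]:
            del os.environ[key]
        # ...then put the user's original values back.
        os.environ.update(saved)
```

Wrapping a test body in `with preserve_env():` would leave the caller's settings untouched even if the test overrides or adds variables under that prefix.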
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/numerical_tests/modules/conftest.py | Pytest configuration with result directory option and cleanup fixtures |
| tests/numerical_tests/modules/test_module.py | Base test class for module-level numerical tests with distributed setup and result saving |
| tests/numerical_tests/modules/test_utilities.py | Enhanced Utils class that preserves NVTE environment variables during testing |
| tests/numerical_tests/utils/module_mean_and_std.py | Statistical computation utility using Welford's algorithm for streaming mean/std calculation |
| tests/numerical_tests/utils/module_similarity.py | Cosine similarity calculation tool for comparing tensor statistics between test runs |
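
For readers unfamiliar with the two techniques named in the table, here is a minimal pure-Python sketch of Welford's streaming mean/std update and of cosine similarity. It is illustrative only: the names `RunningStats` and `cosine_similarity` are hypothetical, the PR's utilities operate on tensors rather than Python floats, and the choice of population (not sample) standard deviation is an assumption.

```python
import math


class RunningStats:
    """Welford's algorithm: one pass, no need to keep the samples around."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the *updated* mean, which is what keeps the update stable.
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population standard deviation (assumption; divide by count - 1
        # instead for the sample estimate).
        if self.count == 0:
            return 0.0
        return math.sqrt(self.m2 / self.count)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Feeding values one at a time through `update` gives the same mean and standard deviation as a two-pass computation, which is why this form suits statistics accumulated across many test runs.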
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
with open(args.output_file, 'w') as f:
    json.dump(comparison_result, f, indent=2)

if __name__ == '__main__':
torch.save(mean_result, args.output_mean_file)
torch.save(std_result, args.output_std_file)

if __name__ == '__main__':
def pytest_addoption(parser):
    parser.addoption(
Why do we need to add a required option to run the tests?
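For illustration, one alternative to a required option is an optional one with a fallback. This is a sketch under assumptions, not the PR's code: the option name `--result-dir` and the helper `resolve_result_dir` are hypothetical.

```python
import tempfile


def pytest_addoption(parser):
    # Hypothetical option name; optional, so plain `pytest` still runs.
    parser.addoption(
        "--result-dir",
        action="store",
        default=None,
        help="Directory for numerical test results (default: a temp dir).",
    )


def resolve_result_dir(option_value):
    """Return the user-supplied directory, or create a temporary one."""
    if option_value:
        return option_value
    return tempfile.mkdtemp(prefix="numerical_tests_")
```

A fixture could then call `resolve_result_dir(request.config.getoption("--result-dir"))`, so the flag is only needed when results must land in a specific place.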
bf16=config.bf16,
use_distributed_optimizer=True,
lr=1e-3,
clip_grad=0.0
Why are Adam, lr, etc. hard-coded?
seed = 42
torch.manual_seed(seed)
Do you also need to call torch.cuda.manual_seed_all, and seed Python's random module and NumPy as well?
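A sketch of what seeding every generator mentioned here might look like. The helper name `seed_everything` is hypothetical, and the optional imports are an assumption made so the snippet also runs where NumPy or PyTorch is not installed.

```python
import random


def seed_everything(seed):
    """Seed Python, NumPy, and PyTorch (CPU and all CUDA devices)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        # Safe to call without CUDA; it is silently ignored in that case.
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Calling `seed_everything(42)` at the start of setup would make the Python, NumPy, and PyTorch generators start from the same state on every run.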
model_parallel_cuda_manual_seed(seed)

def teardown_method(self, method):
    Utils.destroy_model_parallel()
torch.save(mean_result, args.output_mean_file)
torch.save(std_result, args.output_std_file)
Why not return the value directly?
Marking as stale. No activity in 60 days.
Adds basic utilities for module-level numerical tests, in tests/numerical_tests folder. Including: