Skip to content

Tests - Add basic utilities for module-level numerical tests#75

Open
yzygitzh wants to merge 2 commits into
devfrom
ziyue/pr-numerical-test-base
Open

Tests - Add basic utilities for module-level numerical tests#75
yzygitzh wants to merge 2 commits into
devfrom
ziyue/pr-numerical-test-base

Conversation

@yzygitzh

Copy link
Copy Markdown
Contributor

Adds basic utilities for module-level numerical tests, in tests/numerical_tests folder. Including:

  • modules/conftest.py: common arguments and fixtures for test classes.
  • modules/test_module.py: base class to run module-level numerical tests and dump results.
  • modules/test_utilities.py: revised Utils class from unit tests, to respect NVTE environmental variables from users.
  • utils/module_mean_and_std.py: calculate mean and std of several module test trials.
  • utils/module_similarity: calculate cosine similarity of two module test stats. The stats can be either raw value, mean or std.

@yzygitzh yzygitzh requested a review from a team as a code owner July 26, 2025 07:59
@yzygitzh yzygitzh added the CI/CD label Jul 26, 2025
@yzygitzh yzygitzh changed the title Test - Add basic utilities for module-level numerical tests Tests - Add basic utilities for module-level numerical tests Jul 26, 2025
@yzygitzh yzygitzh requested a review from Copilot July 28, 2025 03:11

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds basic utilities for module-level numerical tests to enable consistent testing and comparison of module behavior across different configurations. The utilities provide infrastructure for running numerical tests, capturing module statistics, and comparing results between different test runs.

Key changes include:

  • Test infrastructure with base classes and configuration utilities
  • Statistical analysis tools for computing means, standard deviations, and similarity metrics
  • Environment variable management to preserve user NVTE settings during testing

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/numerical_tests/modules/conftest.py Pytest configuration with result directory option and cleanup fixtures
tests/numerical_tests/modules/test_module.py Base test class for module-level numerical tests with distributed setup and result saving
tests/numerical_tests/modules/test_utilities.py Enhanced Utils class that preserves NVTE environment variables during testing
tests/numerical_tests/utils/module_mean_and_std.py Statistical computation utility using Welford's algorithm for streaming mean/std calculation
tests/numerical_tests/utils/module_similarity.py Cosine similarity calculation tool for comparing tensor statistics between test runs

Comment thread tests/numerical_tests/utils/module_similarity.py Outdated
Comment thread tests/numerical_tests/utils/module_similarity.py
Comment thread tests/numerical_tests/utils/module_similarity.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
with open(args.output_file, 'w') as f:
json.dump(comparison_result, f, indent=2)

if __name__ == '__main__':

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do you use it?

torch.save(mean_result, args.output_mean_file)
torch.save(std_result, args.output_std_file)

if __name__ == '__main__':

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need main?



def pytest_addoption(parser):
parser.addoption(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add required option to run the tests

bf16=config.bf16,
use_distributed_optimizer=True,
lr=1e-3,
clip_grad=0.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there hard code for adam, lr, etc.

@cp5555 cp5555 requested a review from abuccts July 28, 2025 21:58
Comment on lines +28 to +29
seed = 42
torch.manual_seed(seed)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to set torch.cuda.manual_seed_all? and seed in python/numpy as well

model_parallel_cuda_manual_seed(seed)

def teardown_method(self, method):
Utils.destroy_model_parallel()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

destroy distributed as well?

Comment on lines +65 to +66
torch.save(mean_result, args.output_mean_file)
torch.save(std_result, args.output_std_file)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not return the value directly?

@github-actions

Copy link
Copy Markdown

Marking as stale. No activity in 60 days.

@github-actions github-actions Bot added the stale label Sep 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants