Add performance profiling as a standard zstash test by forsyth2 · Pull Request #427 · E3SM-Project/zstash

forsyth2 · 2026-02-26T02:26:54Z

Summary

#414 introduced some early performance profiling for zstash, but these changes were not ultimately included in main. However, performance is becoming an ever more important feature of zstash, as that work showed. Other issues relating to performance: #402/#424, #249.

Objectives:

Test zstash runtimes on various combinations of variables:
- create, update, extract
- extract sequential vs parallel mode (especially relevant for Add Python 3.14 support #402/Fix Python 3.14 compatibility: AbsoluteLinkError, pickling errors, config propagation, HPSS hang, and print_all_contents deadlock in extract.py and parallel.py #424)
- --hpss as "none", an HPSS path, or a Globus path

Select one: This pull request is...

a bug fix: increment the patch version
a small improvement: increment the minor version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

Small Change

To merge, I will use "Squash and merge". That is, this change should be a single commit.
Logic: I have visually inspected the entire pull request myself.
Pre-commit checks: All the pre-commits checks have passed.

forsyth2 · 2026-02-26T02:45:37Z

Action items:

Actually run these scripts and confirm visualizations and workflows are working properly
Run this code on main and on Add Python 3.14 support #402 and on Fix Python 3.14 compatibility: AbsoluteLinkError, pickling errors, config propagation, HPSS hang, and print_all_contents deadlock in extract.py and parallel.py #424 to compare the effects of parallelism changes needed for Python 3.14

forsyth2 · 2026-04-02T22:35:38Z

Results from running on 2026-04-02:

Notes:

Globus runtimes are definitely pretty variable. Notice the two wall clock times for zstash create with Globus differ substantially for instance.
Next time we run this, we can run against these results to get some comparison between runs.
If those look ok, we can probably merge this and try to do weekly performance testing on zstash's main branch.

Setup

cd ~/ez/zstash
git status
# On branch force-fork-parallel
# nothing to commit, working tree clean
git checkout add-performance-profiling
git log --oneline | head -n 10
# 9d6d15f Fixes made comparing pr402 and pr424
# a6cbc12 Update parameters
# 7467b68 Improve plots
# def78ac Fixes to generate plots
# ee61184 Add ability to configure hpss options
# c3ead41 Add regression testing
# 8662955 Send output to web server
# 8c3c555 Apply changes from Claude
# ee2dda7 Add performance profiling as a standard zstash test
# c14b8ee Add AGENTS.md (#423)

# Good, matches https://github.com/E3SM-Project/zstash/pull/427/commits
nersc_conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-pr427-performance-profile-20260402
conda activate zstash-pr427-performance-profile-20260402
pre-commit run --all-files
python -m pip install .
cd tests/performance
emacs generate_performance_data.bash # Edit parameters
git diff # Check diff
./generate_performance_data.bash
# ~2-3 hours to run, note there is the manual step to paste an auth code

# [SUCCESS] All tests completed. Results saved to: /pscratch/sd/f/forsyth/zstash_performance/performance_20260402/results.csv
# [INFO] Now edit IO paths and run: python visualize_performance.py

emacs visualize_performance.py # Edit parameters
git diff # Check diff
pre-commit run --all-files
git add -A
python visualize_performance.py
# Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance__20260402_pr427.png
#   Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance__20260402_pr427.png

forsyth2 · 2026-06-03T01:08:00Z

Rebased off the latest main.

Latest commit 1cb3eb8 is Claude's resolution of:

Looking at our 2026-04-14 profiling, the single-branch plots look ok, but the branch comparison plots have some issues:

the "baseline" bars appear to the left of the "current" bars. It would make more sense for them to be to the right. Currently, a taller bar on the right is good, but visually it looks like an increase in runtime!

For the "Extract: Sequential vs Parallel" plot, let's look at an example: "create: build/ update: run/" has 12 bars -- 3 groups of 4, a blue group for "No HPSS", an orange group for "Direct HPSS", and a green group for "Globus". Now let's dive into one of those groups, say the blue group. The 4 bars are "sequential current, sequential baseline, parallel current, parallel baseline". Based on the legend, we'd expect those to be differentiated by "solid blue, transparent hatched blue, solid double hatched blue, transparent blue that is somehow both double hatched for parallel and single hatched for baseline". What we actually see is "solid blue, transparent blue, solid double hatched blue, transparent double hatched blue"

For reference, previous plots:

2026-04-14 profiling -- this was to profile main during the testing period for E3SM Unified 1.13.0
2026-04-06 profiling -- this was to profile the refactoring of Refactor to resolve tars not deleting when --non-blocking is set #416.

Remaining action items:

Visual inspection of latest commit
Rerun plotting code on already existing profiling data
Try to automate keeping a history -- i.e., we should have a log of previous performance data, so we can review changes over time.

forsyth2 · 2026-06-03T19:12:23Z

Rerun plotting code on already existing profiling data

Setup

nersc_conda # Activate conda
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-pr427-20260603
conda activate zstash-pr427-20260603
pre-commit run --all-files

# Make changes to tests/performance/visualize_performance.py

python -m pip install .
cd tests/performance/
python visualize_performance.py
# Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603.png
#   Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603.png
# Figure 2 (baseline comparison) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603_vs_baseline.png
#   Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_vs_baseline.png

Fig1: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603.png

Fig2: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_vs_baseline.png

✅ "baseline" bars are now to to the left of the "current" bars
✅ the 4 bars of each hpss type in the final extract plot of Fig2 are now distinguishable.

chengzhuzhang · 2026-06-03T19:26:17Z

@forsyth2 would it possible to do the performance profiling for zstash check ? There have been complaints about zstash check being slow. Would be useful to profile its performance and find potential solution to speed it up.

Copilot

Pull request overview

Adds a new, documented performance profiling workflow under tests/performance/ to
measure and visualize zstash runtime across operation modes and HPSS settings,
and updates the dev Conda environment to support the visualization tooling.

Changes:

Add a Perlmutter-oriented performance data generation script that records
zstash create/update/extract runtimes into a CSV.
Add a Python visualization script to plot performance results and optionally
compare against a baseline run.
Document the performance profiling workflow and link it from the main tests
README; add plotting/data dependencies to the dev Conda environment.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`tests/README.md`	Adds a pointer to the new performance profiling docs.
`tests/performance/README.md`	Documents how to generate and visualize performance CSVs.
`tests/performance/generate_performance_data.bash`	Generates performance CSV by running a matrix of `zstash` operations/modes.
`tests/performance/visualize_performance.py`	Plots performance results and optional current-vs-baseline comparisons.
`conda/dev.yml`	Adds `matplotlib-base`, `pandas`, and `numpy` to support visualization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

forsyth2 · 2026-06-03T20:22:15Z

@chengzhuzhang

There have been complaints about zstash check being slow.

I'm assuming you mean new/recent reports? I know we resolved the high-res team's issues by making more use of the --tars option.

would it possible to do the performance profiling for zstash check ?

I think that's good idea, but can we do it as a separate pull request? I think this PR is pretty close to merging, and I'd like to have something on main. Furthermore, debugging zstash check's slowness may give indicators of how best to profile it.

For what it's worth, Claude's assessment:

Looking at the code, adding zstash check would be a moderately sized addition — not trivial, but not a full rewrite either. Here's my assessment:

Arguments for a separate PR:

The current PR already has a coherent, complete scope (create/update/extract profiling with full visualization). Shipping it now lets you get feedback and merge a working system before extending it.
check has meaningfully different semantics — it validates archive integrity rather than measuring transfer/compression performance, so it may warrant its own visualization considerations (e.g., does it even make sense to compare check across HPSS modes the same way?).
If something goes wrong with the check implementation, it won't block merging the core profiling work.

Arguments for doing it now:

The plumbing is all there. Adding check to the bash script is straightforward: a run_check() function, a record_result call, and hooking it into the per-test loop after extract. The Python side just needs check added to OP_ORDER/OP_TITLES/OP_DIR_COL (or a new extract-style handler if it operates on the full archive).
You'd avoid a second round of review setup, CI runs, and README updates for what is mechanically similar work.

My recommendation: separate PR. The current PR is clean and self-contained. check deserves a moment to think through what its x-axis and grouping should be — it runs on an existing archive, so it's more like extract than create, but it doesn't use workers in the same way. Rushing that design into this PR risks muddying both the code and the review. Merge what you have, then open a focused check PR once you've decided how it fits into the visualization layout.

forsyth2 · 2026-06-03T20:28:34Z

There's also backwards compatibility concerns with the existing performance records, which is another point in favor of handling check in a new PR.

chengzhuzhang · 2026-06-03T20:29:29Z

@forsyth2 the recent performance improvement was made for zstash update. We can do something similar (a separate PR) as what has been done with this PR #420. To profile zstash check, and see if there is other potential bottleneck, or ways to speed up.

forsyth2 · 2026-06-03T20:36:02Z

the recent performance improvement was made for zstash update.

Yes, correct. zstash update required a performance update (#420) and zstash check just needed the use of --tars (#410 (comment))

We can do something similar (a separate PR) as what has been done with this PR #420.

In any case, yes, we should pursue this. I'll plan to get this PR in and then separately look into zstash check's performance (and begin developing its profiling mechanism in the process). Thanks!

forsyth2 · 2026-06-03T22:00:14Z

I created #446 to track the work on zstash check.

forsyth2 · 2026-06-03T23:57:52Z

Running latest changes

Process

Edit tests/performance/generate_performance_data.bash:

-unique_id=performance_20260402
-environment_commands="source /global/common/software/e3sm/anaconda_envs/test_e3sm_unified_1.13.0rc5_pm-cpu.sh"
+unique_id=performance_20260603
+environment_commands="source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh"

cd tests/performance/
./generate_performance_data.bash

This took between 1 and 2 hours. About 10 minutes in, it asks for the Globus auth code. At the end of its run, it prints:

[SUCCESS] All tests completed. Results saved to: /pscratch/sd/f/forsyth/zstash_performance/performance_20260603/results.csv
[SUCCESS] Results copied to: /global/homes/f/forsyth/zstash_performance_records/performance_20260603_results.csv
[INFO] Now edit IO paths and run: python visualize_performance.py

Edit tests/performance/visualize_performance.py:

 # The results to show in Fig. 1
 RESULTS_CSV: str = (
-    "/pscratch/sd/f/forsyth/zstash_performance/performance_20260414/results.csv"
+    "/pscratch/sd/f/forsyth/zstash_performance/performance_20260603/results.csv"
 )
 
 # The results to compare against in Fig. 2.
 # Set to None to skip Fig. 2.
 BASELINE_RESULTS_CSV: Optional[str] = (
-    "/pscratch/sd/f/forsyth/zstash_performance/performance_20260402/results.csv"
+    "/pscratch/sd/f/forsyth/zstash_performance/performance_20260414/results.csv"
 )
 
 # Output path for the saved figures.
 # Set to None to display interactively instead of saving.
 OUTPUT_PATH: Optional[str] = (
-    "/global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603.png"
+    "/global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603_run2.png"
 )

python visualize_performance.py

This runs very fast. At the end of its run, it prints:

Figure 1 (overview) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603_run2.png
  Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_run2.png
Figure 2 (baseline comparison) saved to: /global/cfs/cdirs/e3sm/www/forsyth/zstash_performance/performance_pr427_20260603_run2_vs_baseline.png
  Accessible at: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_run2_vs_baseline.png

Output format

The output format continues to look as expected:

✅ "baseline" bars are now to to the left of the "current" bars
✅ the 4 bars of each hpss type in the final extract plot of Fig2 are now distinguishable.

The results themselves

We're comparing to 2026-04-14, so we're seeing the performance changes of these commits (plus random variation):

Fig1: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_run2.png

Fig2: https://portal.nersc.gov/cfs/e3sm/forsyth/zstash_performance/performance_pr427_20260603_run2_vs_baseline.png

forsyth2

In addition to my latest test run, I've now done a high-level visual inspection of this largely-AI generated diff.

forsyth2 · 2026-06-04T15:33:46Z

+        # --- EXTRACT (sequential=1 worker, parallel=2 workers) ---
+        for num_workers in 1 2; do
+            extract_log="${log_dir}extract_${hpss_label}_${num_workers}workers.log"
+            extract_dir="${mode_dir}extract_${num_workers}workers/"


Just a note for implementing the performance profiling piece of #446

For this PR, we run extract twice on the same create-update sequence. This can be done simply by using a different extraction dir each time. So, for profiling check, we just need a third dir. Certainly no need to rerun the create-update sequence.

(There's still the question of how to fit check into the existing PDF, which is already fairly busy).

forsyth2

I think this is ready to merge. Further additions are planned (notably #446 for zstash check, and maybe also some sort of averages plot built on many records over time).

@chengzhuzhang @TonyB9000 I imagine I'll be the primary user of this for-developers feature, but do either of you want to review it before I merge it?

forsyth2 · 2026-06-04T16:11:53Z

Maybe also some sort of averages plot built on many records over time

Created #449

TonyB9000 · 2026-06-04T16:30:04Z

Hi @forsyth2 The work looks great. I admit to finding it hard to understand certain aspects of the plots (build, run, init). I get the "before" (hashed color) and "after" (latest branch). I think of zstash operations involving disk i/o, (reads, writes), hashing of contents for integrity, and network-transfers.

Come to think of it, strong (crypto-strength) hashing (MD5, SHA) is quite slow, as the intent is to thwart deliberate content manipulation (faking signatures in financial transactions, etc). For zstash, (and tarfiles in general), we only "hash" to detect simple corruption (bad bit-reads, etc). For that purpose, one could use a simple CRC (cyclic redundancy check) which is far faster as a "hashing" function.

forsyth2 · 2026-06-04T16:41:51Z

Thanks Tony!

certain aspects of the plots (build, run, init).

Those are really just meant to test directories of 3 different types: build/ has lots of small files, init/ has a few large files, and run/ is somewhere in between. The actual contents are irrelevant; I just wanted to be able to test on different combos of both file size and file count.

For that purpose, one could use a simple CRC (cyclic redundancy check) which is far faster as a "hashing" function.

That's an interesting suggestion. This is worth considering further!

forsyth2 · 2026-06-04T17:04:35Z

For that purpose, one could use a simple CRC (cyclic redundancy check) which is far faster as a "hashing" function.

@TonyB9000 When I ask Copilot, it says:

Based on this codebase, switching MD5 to CRC would likely give only a moderate end-to-end speedup, not a dramatic one.

zstash computes MD5 in hot paths (zstash/hpss_utils.py, zstash/extract.py) while already streaming file bytes.
So hashing adds CPU, but total time is still heavily I/O-bound (disk/HPSS/tar read-write).
Realistically, I’d expect:
- hashing step itself: CRC could be several times faster than MD5
- overall create/extract/check runtime: often more like single-digit to low double-digit % improvement (workload-dependent)

Implementation difficulty: moderate.

Why moderate (not trivial):

MD5 is embedded in schema and types (files.md5, tars.md5, settings.py row classes, SQL creation code).
Verification logic assumes MD5 (extract.py mismatch checks, tests that reference md5/md5sum).
Backward compatibility matters: existing archives/index DBs store MD5.
You’d likely need algorithm metadata + dual-read/verify support (old MD5, new CRC) or a migration strategy.

So: possible and worthwhile if CPU hashing is a bottleneck, but not a guaranteed large global speedup.
If you want low-risk gains, first benchmark create/check with and without tar-level hashing (--no_tars_md5) to measure how much of your runtime is hash-related today.

TonyB9000 · 2026-06-04T17:11:10Z

@forsyth2 I concur. Usually, disk i/o swamps CPU considerations. So if hashing currently accounts for 3% of overall processing, reducing it to 1% won;t give you much improvement. And if many related systems expect to see an MD5 hash, it could be a hassle replacing it with something else.

forsyth2 self-assigned this Feb 26, 2026

forsyth2 added the Testing Files in `tests` modified label Feb 26, 2026

This was referenced Feb 26, 2026

Add Python 3.14 support #402

Open

Add Python 3.14 support #428

Draft

forsyth2 mentioned this pull request Apr 3, 2026

Refactor to resolve tars not deleting when --non-blocking is set #416

Merged

14 tasks

forsyth2 mentioned this pull request Jun 2, 2026

Refactor for better maintainability #445

Draft

16 tasks

forsyth2 added 12 commits June 2, 2026 17:02

Add performance profiling as a standard zstash test

53e5dc3

Apply changes from Claude

35a69d2

Send output to web server

7302b54

Add regression testing

648f7e6

Add ability to configure hpss options

d9cf130

Fixes to generate plots

e6364cc

Improve plots

e415ec9

Update parameters

10539e0

Fixes made comparing pr402 and pr424

a8206ee

Updates for 2060402 profiling

85f093d

Enable changing environment

1768a8f

Fix issues in comparison plots

1cb3eb8

forsyth2 force-pushed the add-performance-profiling branch from 2a3d7dc to 1cb3eb8 Compare June 3, 2026 01:02

Update paths

5112331

Document process in README

4ff7f43

forsyth2 requested a review from Copilot June 3, 2026 20:14

Copilot started reviewing on behalf of forsyth2 June 3, 2026 20:14 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Address comments

258c90c

forsyth2 force-pushed the add-performance-profiling branch from c04a73e to 258c90c Compare June 3, 2026 21:47

forsyth2 mentioned this pull request Jun 3, 2026

[Feature]: Improve efficiency of zstash check #446

Open

Update IO paths

2fab8b2

forsyth2 marked this pull request as ready for review June 4, 2026 15:28

forsyth2 commented Jun 4, 2026

View reviewed changes

Address comments

01beff5

forsyth2 commented Jun 4, 2026

View reviewed changes

forsyth2 mentioned this pull request Jun 4, 2026

[Feature]: Add averages plot to performance profiling #449

Open

forsyth2 added the semver: new feature New feature (will increment minor version) label Jun 4, 2026

forsyth2 mentioned this pull request Jun 11, 2026

Profile zstash check #450

Draft

7 tasks

Conversation

forsyth2 commented Feb 26, 2026

Summary

Small Change

Uh oh!

forsyth2 commented Feb 26, 2026

Uh oh!

forsyth2 commented Apr 2, 2026

Uh oh!

forsyth2 commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

forsyth2 commented Jun 3, 2026

Uh oh!

chengzhuzhang commented Jun 3, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

forsyth2 commented Jun 3, 2026

Uh oh!

forsyth2 commented Jun 3, 2026

Uh oh!

chengzhuzhang commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

forsyth2 commented Jun 3, 2026

Uh oh!

forsyth2 commented Jun 3, 2026

Uh oh!

forsyth2 commented Jun 3, 2026

Running latest changes

Process

Output format

The results themselves

Uh oh!

forsyth2 left a comment

Choose a reason for hiding this comment

Uh oh!

forsyth2 Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

forsyth2 left a comment

Choose a reason for hiding this comment

Uh oh!

forsyth2 commented Jun 4, 2026

Uh oh!

TonyB9000 commented Jun 4, 2026

Uh oh!

forsyth2 commented Jun 4, 2026

Uh oh!

forsyth2 commented Jun 4, 2026

Uh oh!

TonyB9000 commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

forsyth2 commented Jun 3, 2026 •

edited

Loading

chengzhuzhang commented Jun 3, 2026 •

edited

Loading

TonyB9000 commented Jun 4, 2026 •

edited

Loading