
Parallel notebook execution via multiprocessing spawn pool #299

Merged
rnikutta merged 2 commits into master from nb_testing_parallel on Feb 25, 2026


@rnikutta

Member

Add -w/--workers CLI flag and workers notebook variable to run up to N notebooks simultaneously. Uses spawn context to avoid inheriting Jupyter's asyncio event loop into worker processes. Results stream in as each notebook finishes. Summary cell reports total CPU time, wall-clock time, and speedup — all derived from the returned testresults object. Default is workers=4.
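
For readers unfamiliar with the pattern, here is a minimal sketch of a spawn-context pool that yields results as tasks finish, plus a helper that derives the summary numbers. It is not the code in this PR: `run_one`, `run_parallel`, and `summarize` are hypothetical names, and the notebook execution is simulated with a short sleep. "Total time" is taken here to be the sum of per-notebook durations, which is an assumption about how the PR defines total CPU time.

```python
import multiprocessing as mp
import time


def run_one(name):
    """Hypothetical stand-in for executing one notebook; returns (name, seconds)."""
    start = time.perf_counter()
    time.sleep(0.1)  # placeholder for the real notebook execution
    return name, time.perf_counter() - start


def run_parallel(names, workers=4):
    """Run run_one over names in a spawn pool, yielding results as they finish."""
    # "spawn" starts each worker as a fresh interpreter instead of forking
    # the parent, so the parent's state (e.g. an event loop) is not inherited.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=workers) as pool:
        # imap_unordered hands back each result as soon as its task is done,
        # regardless of submission order.
        yield from pool.imap_unordered(run_one, names)


def summarize(results, wall_seconds):
    """Derive summed task time, wall-clock time, and their ratio (speedup)."""
    total = sum(seconds for _, seconds in results)
    return {"total": total, "wall": wall_seconds, "speedup": total / wall_seconds}


# Usage sketch (in a script, spawn needs the entry point to be guarded):
#
# if __name__ == "__main__":
#     start = time.perf_counter()
#     results = []
#     for name, seconds in run_parallel(["a.ipynb", "b.ipynb"], workers=2):
#         print(f"{name}: {seconds:.2f}s")
#         results.append((name, seconds))
#     print(summarize(results, time.perf_counter() - start))
```

With spawn, the worker function has to be importable by the child process, so when driving this from a notebook it would need to live in a module rather than in a cell.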


@rnikutta

Member Author

@jacquesalice could you please review this PR? (Not urgent.) Run tests/testnotebooks.ipynb on gp13 with a few notebook directories specified in include. Everything should behave as before, but the notebooks can now run in several parallel processes (workers=[number]). Each test's output is printed only after that test finishes (otherwise the outputs of concurrent tests would be interleaved). Thanks!
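
The delayed printing can be illustrated with a small sketch: each task writes into its own buffer, and the caller prints the buffer in one piece once the task is done. This is only an illustration of the idea, not necessarily how the PR does it; `run_captured` and `fake_test` are hypothetical names.

```python
import contextlib
import io


def run_captured(func, *args):
    """Call func(*args) with stdout captured; return (result, captured_text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args)
    return result, buffer.getvalue()


def fake_test(name):
    # Hypothetical stand-in for one notebook test that prints as it goes.
    print(f"testing {name}")
    print("PASS")
    return True


ok, text = run_captured(fake_test, "a.ipynb")
# The whole report for this test is emitted at once, so it cannot be
# interleaved with lines from another worker.
print(text, end="")
```

Note that `redirect_stdout` only captures writes that go through Python's `sys.stdout`; output written directly by subprocesses would need separate handling.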

@jacquesalice jacquesalice left a comment

Member


@rnikutta this worked great! I tested it out by comparing the run speed of the following example with the old code vs this new code:

nbs = get_nbs(paths=paths,
              include=('/**/02_GettingStartedWithDataLab.ipynb',
                       '/**/ExploringSMASHDR2.ipynb',
                       '/**/DwarfGalaxiesInDelveDr2.ipynb',
                       '/**/GNIRS_DQS_DataAccessAtDataLab.ipynb',
                       '/**/LargeScaleStructureSdssCosmicSlime.ipynb',
                       '/**/ExtremelyMetalPoorStarsInTheLMC.ipynb'
                      ),
              exclude=exclude)

testresults = run(nbs, workers=workers)

The old code took 6min 19s to run, while the new code took only 3min 45s! PR Approved :-)
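
As a quick check of what those two timings imply, the arithmetic can be worked out as below. The number of workers used for this run is not stated in the thread, so the sketch compares only the two wall-clock times; `to_seconds` is a helper introduced here for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xmin Ys' timing to seconds."""
    return 60 * minutes + seconds


old = to_seconds(6, 19)  # 379 s with the old (serial) code
new = to_seconds(3, 45)  # 225 s with the new (parallel) code

speedup = old / new
saved = old - new

print(f"speedup: {speedup:.2f}x, saved: {saved // 60}min {saved % 60}s")
# prints: speedup: 1.68x, saved: 2min 34s
```

This is a wall-clock comparison between two runs, which is a different quantity from the speedup reported in the summary cell (total time across notebooks divided by wall-clock time of a single run).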

@rnikutta

Member Author

Great, many thanks @jacquesalice, I'll merge.

@rnikutta rnikutta merged commit a39125d into master Feb 25, 2026