Parallel notebook execution via multiprocessing spawn pool #299
Add a -w/--workers CLI flag and a workers notebook variable to run up to N notebooks simultaneously. Worker processes are started with the spawn context so that they do not inherit Jupyter's asyncio event loop. Results stream in as each notebook finishes. A summary cell reports total CPU time, wall-clock time, and speedup, all derived from the returned testresults object. The default is workers=4.
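A minimal, standalone sketch of the pattern described above. The names run_notebook, run_parallel, and summarize are hypothetical and are not the PR's actual code, and run_notebook only burns a little CPU instead of executing a real notebook. The sketch assumes that "total CPU time" means the sum of the per-notebook run times, with speedup being that sum divided by the wall-clock time.

```python
import argparse
import multiprocessing as mp
import time


def run_notebook(path):
    """Hypothetical stand-in for executing one notebook.

    Returns (path, ok, seconds). A real runner would execute the
    notebook here; this sketch only does some placeholder CPU work.
    """
    start = time.perf_counter()
    total = sum(i * i for i in range(200_000))  # placeholder work
    elapsed = time.perf_counter() - start
    return path, total > 0, elapsed


def summarize(durations, wall):
    """Return (cpu_total, wall, speedup), assuming cpu_total = sum of durations."""
    cpu_total = sum(durations)
    speedup = cpu_total / wall if wall > 0 else float("nan")
    return cpu_total, wall, speedup


def run_parallel(paths, workers=4):
    """Run notebooks in a spawn-context pool, yielding results as they finish."""
    # "spawn" starts fresh interpreters instead of forking, so the workers
    # do not inherit state such as the parent's running asyncio event loop.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as its task completes,
        # so results stream in rather than arriving all at the end.
        for result in pool.imap_unordered(run_notebook, paths):
            yield result


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-w", "--workers", type=int, default=4,
                        help="number of notebooks to run simultaneously")
    args = parser.parse_args()

    nbs = [f"nb{i}.ipynb" for i in range(6)]  # hypothetical notebook paths
    t0 = time.perf_counter()
    durations = []
    for path, ok, seconds in run_parallel(nbs, workers=args.workers):
        print(f"{path}: {'PASS' if ok else 'FAIL'} ({seconds:.2f} s)")
        durations.append(seconds)
    cpu, wall, speedup = summarize(durations, time.perf_counter() - t0)
    print(f"total CPU time {cpu:.2f} s, wall-clock {wall:.2f} s, "
          f"speedup {speedup:.2f}x")
```

With spawn, each child re-imports the main module, so the worker function must be importable at module level and the driver code must sit under the if __name__ == "__main__": guard. For the same reason, a worker function defined directly in a notebook cell generally cannot be used with a spawn pool. It needs to live in an importable module.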
@jacquesalice, could you please review this PR? (Not urgent.) Run tests/testnotebooks.ipynb on gp13 with a few notebook directories specified in i
jacquesalice left a comment:
@rnikutta, this worked great! I tested it by comparing the run time of the following example under the old code and the new code:
```python
nbs = get_nbs(paths=paths,
              include=('/**/02_GettingStartedWithDataLab.ipynb',
                       '/**/ExploringSMASHDR2.ipynb',
                       '/**/DwarfGalaxiesInDelveDr2.ipynb',
                       '/**/GNIRS_DQS_DataAccessAtDataLab.ipynb',
                       '/**/LargeScaleStructureSdssCosmicSlime.ipynb',
                       '/**/ExtremelyMetalPoorStarsInTheLMC.ipynb'
                       ),
              exclude=exclude)
testresults = run(nbs, workers=workers)
```
The old code took 6 min 19 s to run, while the new code took only 3 min 45 s! PR approved :-)
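For reference, the reported timings can be turned into a speedup factor with a few lines of arithmetic. This only restates the two numbers above. The to_seconds helper is introduced here for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return 60 * minutes + seconds


old = to_seconds(6, 19)  # 379 s with the old sequential code
new = to_seconds(3, 45)  # 225 s with the new parallel code
speedup = old / new
print(f"speedup: {speedup:.2f}x, time saved: {old - new} s")
# → speedup: 1.68x, time saved: 154 s
```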
Great, many thanks @jacquesalice, I'll merge.