swarm: guard empty-partition reshape in _pack_array_to_data_format (parallel read_timestep crash) #221
Open
lmoresi wants to merge 1 commit into
A rank that owns no local particles has an N=0 array. numpy then cannot infer the -1 component dimension in array_data.reshape(shape[0], -1) and raises "cannot reshape array of size 0 into shape (0,newaxis)". This crashed parallel read_timestep of a vector field (e.g. velocity) at np>=2 when a rank had an empty partition, killing the run at checkpoint load. The fix computes the component count explicitly from the trailing dims when the array is empty.

Underworld development team with AI support from Claude Code
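The failure condition can be checked directly in numpy. This is a minimal sketch assuming only numpy; `reshape_with_inference` is a hypothetical helper, not part of Underworld. `reshape(N, -1)` divides the array size by the product of the known dimensions, so inference fails exactly when that product is 0 (here, N == 0), while an empty array with a non-zero leading dimension still reshapes fine.

```python
import numpy as np


def reshape_with_inference(shape, leading):
    """Hypothetical helper: try reshape(leading, -1) on a zero-filled array of
    the given shape and return the resulting shape, or None if numpy raises."""
    a = np.zeros(shape)
    try:
        return a.reshape(leading, -1).shape
    except ValueError:
        # numpy reports e.g. "cannot reshape array of size 0 into shape (0,newaxis)"
        return None


# Empty partition of a 2-component vector field: N = 0, so -1 is undetermined.
print(reshape_with_inference((0, 2), 0))  # None
# Empty array but non-zero N: -1 resolves to 0 / 5 = 0 and the reshape succeeds.
print(reshape_with_inference((5, 0), 5))  # (5, 0)
# Ordinary non-empty vector field.
print(reshape_with_inference((3, 2), 3))  # (3, 2)
```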
Pull request overview
Fixes a parallel SwarmVariable.read_timestep crash when a rank owns zero local particles by avoiding NumPy’s reshape(..., -1) inference on size-0 arrays inside SwarmVariable._pack_array_to_data_format.
Changes:
- Add an empty-array guard in SwarmVariable._pack_array_to_data_format to compute the component count explicitly before reshaping.
- Prevent parallel checkpoint loads from failing on empty-partition ranks during routed read_timestep.
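To see how an empty partition arises, a serial sketch can deal particle rows out to more "ranks" than there are particles. This is an illustration under loud assumptions: `np.array_split` stands in for the swarm's real parallel decomposition, the data are made up, and `guarded_reshape` is a hypothetical stand-in for the patched method, not Underworld's API.

```python
import numpy as np


def guarded_reshape(array_data):
    # Hypothetical mirror of the patched logic: explicit component count
    # only for empty arrays, -1 inference otherwise.
    if array_data.size == 0:
        ncomp = int(np.prod(array_data.shape[1:])) if array_data.ndim > 1 else 1
        return array_data.reshape(array_data.shape[0], ncomp)
    return array_data.reshape(array_data.shape[0], -1)


# Hypothetical data: 3 particles carrying a 2-component velocity.
velocity = np.arange(6.0).reshape(3, 2)

# Split rows across 4 "ranks"; with only 3 particles, the last rank gets none.
partitions = np.array_split(velocity, 4, axis=0)
packed = [guarded_reshape(p) for p in partitions]

for rank, p in enumerate(packed):
    print(rank, p.shape)
# 0 (1, 2)
# 1 (1, 2)
# 2 (1, 2)
# 3 (0, 2)

# Every rank, including the empty one, keeps the same component count,
# so the pieces reassemble into the original field.
assert np.array_equal(np.concatenate(packed, axis=0), velocity)
```

Without the guard, rank 3 would call reshape(0, -1) on its (0, 2) slice and raise.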
Comment on lines +936 to 939

    if array_data.size == 0:
        ncomp = int(np.prod(array_data.shape[1:])) if array_data.ndim > 1 else 1
        return array_data.reshape(array_data.shape[0], ncomp)
    return array_data.reshape(array_data.shape[0], -1)
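Pulled out of the class, the same guard can be written as a small free function, which makes its shape contract easy to check: every input becomes (N, ncomp), with a 1-D scalar array mapping to (N, 1). This is a minimal sketch; `pack_components` is a hypothetical name, and the real method may do more than this reshape.

```python
import numpy as np


def pack_components(array_data: np.ndarray) -> np.ndarray:
    """Hypothetical stand-alone version of the guarded reshape: collapse all
    trailing dims into one component axis, giving shape (N, ncomp)."""
    if array_data.size == 0:
        # Empty partition: -1 cannot be inferred when N == 0, so compute the
        # component count from the trailing dims (1 for a 1-D scalar array).
        ncomp = int(np.prod(array_data.shape[1:])) if array_data.ndim > 1 else 1
        return array_data.reshape(array_data.shape[0], ncomp)
    # Non-empty: let numpy infer the component count as before.
    return array_data.reshape(array_data.shape[0], -1)


if __name__ == "__main__":
    print(pack_components(np.zeros((0, 2))).shape)     # (0, 2)
    print(pack_components(np.zeros((0,))).shape)       # (0, 1)
    print(pack_components(np.zeros((0, 3, 3))).shape)  # (0, 9)
    print(pack_components(np.ones((4, 3))).shape)      # (4, 3)
```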
Comment on lines +932 to +936

    # infer the -1 component dimension ("cannot reshape array of size 0 into
    # shape (0,newaxis)"). This bites a rank that owns no local particles
    # during a parallel read_timestep. Compute the component count from the
    # trailing dims explicitly.
    if array_data.size == 0:
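One way to confirm that the guard only changes behaviour for empty arrays is to compare it against plain reshape(N, -1) over a handful of shapes. This sketch assumes the hypothetical `guarded_reshape` below faithfully reproduces the patched logic; the shape list is illustrative, not taken from the Underworld test suite.

```python
import numpy as np


def guarded_reshape(a):
    # Hypothetical reproduction of the guarded reshape from the patch.
    if a.size == 0:
        ncomp = int(np.prod(a.shape[1:])) if a.ndim > 1 else 1
        return a.reshape(a.shape[0], ncomp)
    return a.reshape(a.shape[0], -1)


def check_shape(shape):
    """Return the packed shape, asserting the contract along the way."""
    a = np.arange(int(np.prod(shape)), dtype=float).reshape(shape)
    packed = guarded_reshape(a)
    # Contract: N rows and prod(trailing dims) columns (the product of () is 1).
    assert packed.shape == (shape[0], int(np.prod(shape[1:])))
    if a.size > 0:
        # Non-empty path: identical to the original -1 inference.
        assert np.array_equal(packed, a.reshape(shape[0], -1))
    return packed.shape


shapes = [(4,), (4, 3), (4, 3, 3), (0,), (0, 3), (0, 3, 3)]
print([check_shape(s) for s in shapes])
# [(4, 1), (4, 3), (4, 9), (0, 1), (0, 3), (0, 9)]
```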
Problem
On a rank that owns no local particles, SwarmVariable._pack_array_to_data_format does array_data.reshape(array_data.shape[0], -1) on a size-0 array. Because the leading dimension is 0, numpy cannot infer the -1 component dimension and raises:

    cannot reshape array of size 0 into shape (0,newaxis)

This crashes parallel read_timestep of a vector field (e.g. P2 velocity) at np>=2 whenever a rank is assigned an empty partition, killing the run at checkpoint load.

Fix
When the array is empty, compute the component count explicitly from the trailing dims (prod(shape[1:]), or 1 for a 1-D array) instead of relying on -1 inference. The non-empty path is unchanged.

Validation
Parallel (np=4/5) read_timestep of velocity from a serial checkpoint now succeeds: the crash reproduced before the fix and is gone after it. The fix has been used throughout the parallel adaptive-convection runs.

Underworld development team with AI support from Claude Code