perf(graphs): vectorize PlanarAreaWeights with shoelace area #1191
frazane wants to merge 8 commits into
Conversation
has_neg = np.add.reduceat((turn < 0).astype(np.intp), starts) > 0
flagged = np.flatnonzero(has_pos & has_neg)

# Exact ConvexHull fallback for the rare degenerate regions (matches the old code).
Why do you still need this? What are the rare degenerate regions?
Sometimes the data (as in our case, see #690) makes it numerically hard for Qhull to compute Voronoi regions, so we need to set a large joggle to make it work. When we do, some regions are "degenerate" in the sense that they have near-collinear or near-duplicate vertices and similar artifacts. The old way of computing the areas handles these cases automatically, so we use it as a fallback, since the shoelace algorithm does not.
Hey @frazane, nice one! Out of curiosity, how were you profiling the graph creation?
Hi @anaprietonem, nothing fancy, just
Description
`PlanarAreaWeights` (and `MaskedPlanarAreaWeights`, which inherits from it) computed each node's planar Voronoi-cell area in a Python loop that constructed one `scipy.spatial.ConvexHull` per node. This replaces that loop with a vectorised shoelace-formula computation over all Voronoi regions at once, with an exact `ConvexHull` fallback for the rare degenerate regions where the shoelace formula would be wrong. The public behaviour of both attributes is unchanged.

What problem does this change solve?
Performance, not a bugfix or feature. The per-node `ConvexHull` loop is the dominant cost when generating stretched-grid / LAM graphs that use planar area weights.

Measured on a stretched lam=10 graph (cutout of a 1 km regional dataset + a global N320 one; 1.69M data nodes), comparing `main` and this branch on the same Voronoi tessellation so that only the area extraction (`PlanarAreaWeights.compute_area_weights`, the area step) differs: about 60x faster on the part this PR changes. The surrounding `Voronoi` build (~60 s) and everything else are untouched; this area step was the single biggest cost in data-node generation for such graphs.

The output is numerically equivalent to the previous `ConvexHull` result: across all 1.69M cells the maximum relative difference is 1.1e-08 (float32-epsilon level), and the degenerate cells are matched exactly by the fallback. It is not bit-identical: the shoelace and `ConvexHull` float computations differ at ~1e-11, so a small fraction of cells land on the far side of a float32 rounding boundary. This is below the precision of the stored (float32) weights.

What issue or task does this change relate to?
No tracking issue; found while profiling `anemoi-graphs create` on stretched-grid recipes.

Additional notes
How it works:
- Shoelace formula: the area of each region is 1/2 |sum_i (x_i*y_{i+1} - x_{i+1}*y_i)|, evaluated for all regions at once via flat indexing and `np.add.reduceat`. SciPy returns 2-D region vertices in boundary order, so for a convex cell this equals `ConvexHull(...).volume`.
- Degenerate fallback: a Voronoi tessellation built with the Qhull joggle option (`QJ`, needed for numerically hard grids) can return degenerate regions carrying interior or near-collinear vertices, where a plain shoelace under-counts the area. Such regions are detected (a convex polygon's consecutive edge-turn cross products all share one sign, so a region with both signs is non-convex) and recomputed exactly with `ConvexHull`, matching the previous behaviour. On the lam=10 graph above this affected 5 of 1.69M regions; detection false positives are harmless, as a clean cell simply takes the exact path.

Tests: a clean-equivalence test (areas match the per-node `ConvexHull` reference on a clustered patch resembling a regional cutout) and a degenerate-fallback test (injects an interior vertex into a real Voronoi region and asserts the exact hull area is recovered; verified to fail by ~18% without the fallback). The full `graphs` test suite passes, aside from pre-existing failures that only need the optional `h3` dependency.

This change was developed in tandem with AI coding agents.
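For reviewers who want to see the two steps described under "How it works" in one place, here is a minimal, self-contained sketch. It is not the code in this PR: the function name `region_areas`, its arguments `vertices` and `regions`, and the toy data are hypothetical, and it assumes every region is a list of at least three vertex indices given in boundary order. It shows a flat-indexed shoelace sum reduced per region with `np.add.reduceat`, the mixed-sign turn test for non-convex regions, and the per-region `ConvexHull` fallback.

```python
import numpy as np
from scipy.spatial import ConvexHull


def region_areas(vertices, regions):
    """Sketch: polygon areas for many regions at once (hypothetical helper).

    vertices: (n, 2) float array of 2-D points.
    regions:  list of index lists into `vertices`, each in boundary order
              and with at least 3 entries (assumption of this sketch).
    Returns (areas, flagged), where `flagged` holds the indices of regions
    that were recomputed with ConvexHull.
    """
    lengths = np.array([len(r) for r in regions], dtype=np.intp)
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    flat = np.concatenate([np.asarray(r, dtype=np.intp) for r in regions])

    # For each flat entry: which region it belongs to and its position there.
    region_id = np.repeat(np.arange(len(regions)), lengths)
    pos = np.arange(flat.size) - starts[region_id]

    # Cyclic "next" and "next-next" vertices within the same region.
    nxt = flat[starts[region_id] + (pos + 1) % lengths[region_id]]
    nxt2 = flat[starts[region_id] + (pos + 2) % lengths[region_id]]
    p, q, r = vertices[flat], vertices[nxt], vertices[nxt2]

    # Shoelace: 1/2 |sum_i (x_i*y_{i+1} - x_{i+1}*y_i)|, summed per region.
    cross = p[:, 0] * q[:, 1] - q[:, 0] * p[:, 1]
    areas = 0.5 * np.abs(np.add.reduceat(cross, starts))

    # Edge-turn cross products (q - p) x (r - q); a convex polygon has one sign.
    turn = (q[:, 0] - p[:, 0]) * (r[:, 1] - q[:, 1]) \
        - (q[:, 1] - p[:, 1]) * (r[:, 0] - q[:, 0])
    has_pos = np.add.reduceat((turn > 0).astype(np.intp), starts) > 0
    has_neg = np.add.reduceat((turn < 0).astype(np.intp), starts) > 0
    flagged = np.flatnonzero(has_pos & has_neg)

    # Exact fallback: in 2-D, ConvexHull(...).volume is the enclosed area.
    for i in flagged:
        areas[i] = ConvexHull(vertices[regions[i]]).volume
    return areas, flagged


# Toy data: two unit squares, plus the first square with an interior
# vertex (index 4) injected into its boundary list.
vertices = np.array(
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0], [2.0, 1.0]]
)
regions = [[0, 1, 2, 3], [1, 5, 6, 2], [0, 1, 4, 2, 3]]
areas, flagged = region_areas(vertices, regions)
```

In this toy case a plain shoelace sum over the third region gives 0.75 instead of 1.0, because the injected interior vertex cuts a notch into the polygon. The turn test flags only that region, so the two clean squares never reach the Python-level `ConvexHull` loop.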
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.