A kernel-independent FMM without interaction lists.
The public package is built around explicit kernel/operator objects, a geometry-agnostic tree/skeletonization core, and GPU-friendly batched applies through PyTorch and PyKeOps when available.
This repository implements the method from Anna Yesypenko, Chao Chen, and Per-Gunnar Martinsson, "A simplified fast multipole method based on strong recursive skeletonization," Journal of Computational Physics 524 (2025), Article 113707. DOI: 10.1016/j.jcp.2024.113707
skelFMM is a research implementation of a novel kernel-independent fast
multipole method (FMM) for efficiently evaluating discrete kernel interactions
with given source distributions. This method introduces a
simplified approach that eliminates the need for explicit interaction lists by
leveraging near-neighbor computations at each level of an adaptive tree
structure. The algorithm is well-suited for parallelization on modern hardware
and supports a wide range of kernels.
Unlike traditional FMM implementations, skelFMM simplifies data structures
by operating exclusively on the near-neighbor list rather than the interaction
list. This makes the implementation lightweight and efficient while retaining
full compatibility with adaptive quad-tree and octree structures. The method
also introduces novel translation operators to handle adaptive point
distributions effectively.
The current package exposes that method through explicit operator objects, a geometry-agnostic skeletonization core, and GPU-friendly batched apply paths through PyTorch and PyKeOps when available.
For contribution and development guidelines, see CONTRIBUTING.md.
If you use skelFMM in your research, please cite:
@article{yesypenko2025simplified,
title={A simplified fast multipole method based on strong recursive skeletonization},
author={Yesypenko, Anna and Chen, Chao and Martinsson, Per-Gunnar},
journal={Journal of Computational Physics},
volume={524},
pages={113707},
year={2025},
publisher={Elsevier}
}For most users, the fastest way to get oriented is:
python -m pip install -e .
python -m skelfmm.examples.example_laplace_curvy_annulus
python -m skelfmm.examples.example_helmholtz_cube
python -m skelfmm.examples.example_sphere_adjoint_double_layer_kernel
python -m skelfmm.examples.driver_single_layer_kernels --geom square --rank_or_tol 1e-5 --N 1000000For direct use from Python:
import numpy as np
from skelfmm import SkelFMM, operators
points = np.random.default_rng(0).random((100_000, 2))
fmm = SkelFMM(points, operators.LAPLACE_2D, tol=1e-5, leaf_size=200)
random = np.random.RandomState(1)
x = random.randn(points.shape[0])
y = fmm.apply(x)Use fmm.build() if you want to time setup explicitly. Otherwise the first
apply() builds the compressed operator automatically and later applies reuse
that build.
If you are starting fresh, begin with:
SkelFMMfor building and applying an operatorskelfmm.operatorsfor the built-in kernelspython -m skelfmm.examples.example_laplace_curvy_annuluspython -m skelfmm.examples.driver_single_layer_kernels --help
For local development on CPU:
conda env create -f environment.yml
conda activate skelfmmenvFor CUDA machines:
conda env create -f environment-gpu.yml
conda activate skelfmmenv-gpuBoth environments install:
- the local
skelfmmpackage in editable mode - the
simpletreedependency torchpykeops
The CPU environment uses a CPU-only PyTorch wheel. The GPU environment uses the CUDA 12.1 PyTorch wheel index and is the recommended starting point for NVIDIA machines. It also installs the CUDA runtime and NVRTC wheels that PyKeOps needs for GPU JIT compilation.
For a plain editable install without Conda:
python -m pip install -e .[dev]The public package code now lives in skelfmm/.
skelfmm/skel_fmm.py: public fast-apply wrapperskelfmm/recursive_skel.py: core recursive skeletonization implementationskelfmm/batched_fmm.py: batched torch implementationskelfmm/operators.py: explicit kernel/operator objectsskelfmm/util.py: scalar kernels and skeletonization helpersskelfmm/util_batched.py: batched kernel helpersskelfmm/examples/: packaged examples and convergence studies
The mainline package path is the single-layer FMM/apply workflow together with the packaged sphere kernel convergence studies.
The package expects explicit operator objects rather than raw kernel callables.
The main built-ins live in skelfmm.operators, for example:
operators.LAPLACE_2Doperators.LAPLACE_3Doperators.HELMHOLTZ_2Doperators.HELMHOLTZ_3Doperators.LAPLACE_DOUBLE_LAYER_3Doperators.LAPLACE_ADJOINT_DOUBLE_LAYER_3Doperators.HELMHOLTZ_DOUBLE_LAYER_3Doperators.HELMHOLTZ_ADJOINT_DOUBLE_LAYER_3D
The Laplace and Helmholtz double-layer examples use the same operator infrastructure, with geometry passed separately from the operator object.
Run the package and mainline example tests with:
python -m pytest -q tests/test_operators.py tests/test_util.py tests/test_fmm.py tests/test_batched.py tests/test_examples.py tests/test_pykeops_cuda.pyThe example modules live in skelfmm/examples/.
skelfmm/examples/driver_single_layer_kernels.py: single-layer example across the built-in point geometries; use--kappa 0for Laplace and nonzero--kappafor Helmholtz. The CLI defaults to single precision and runs at the requestedN.skelfmm/examples/example_laplace_curvy_annulus.py: minimal 1M-point Laplace example on a curvy annulusskelfmm/examples/example_helmholtz_cube.py: minimal 1M-point Helmholtz example on a cube surface withkappa = 5skelfmm/examples/example_sphere_adjoint_double_layer_kernel.py: directSkelFMMexample for the sphere double-layer kernel and its adjoint, including an adjoint-identity check and 1M-scale matvec timingsskelfmm/examples/convergence_laplace_single_layer_sphere.py: Laplace single-layer sphere convergence studyskelfmm/examples/convergence_helmholtz_single_layer_sphere.py: Helmholtz single-layer sphere convergence studyskelfmm/examples/convergence_laplace_double_layer_sphere.py: Laplace double-layer kernel convergence on the sphere panel hierarchyskelfmm/examples/convergence_helmholtz_double_layer_sphere.py: Helmholtz double-layer kernel convergence on the sphere panel hierarchy
Run them from the repo root:
python -m skelfmm.examples.driver_single_layer_kernels --geom square --rank_or_tol 1e-5 --N 1000000
python -m skelfmm.examples.driver_single_layer_kernels --geom cube --rank_or_tol 1e-5 --N 1000000 --kappa 30.0
python -m skelfmm.examples.example_laplace_curvy_annulus
python -m skelfmm.examples.example_helmholtz_cube
python -m skelfmm.examples.convergence_laplace_single_layer_sphere
python -m skelfmm.examples.convergence_helmholtz_single_layer_sphere
python -m skelfmm.examples.convergence_laplace_double_layer_sphere
python -m skelfmm.examples.convergence_helmholtz_double_layer_sphereExample timing output reports Tapply_ms on CUDA machines and Tapply_s on
CPU-only runs.
To list the built-in point geometries for the CLI example:
python -m skelfmm.examples.driver_single_layer_kernels --helpThe mainline sphere convergence scripts use a fixed benchmark near the
1.3M-panel level rather than a conditional scaling path.
The notebooks live at the repo root:
sphere_double_layer_kernel_adjoint_demo.ipynb: self-contained mainline double-layer kernel notebook on the sphere panel hierarchy, including an adjoint checkskeletonization_matvec_curvy_annulus_demo.ipynb: skeletonization and matvec on a curvy annulus
Launch Jupyter from the repo root so imports resolve cleanly:
jupyter notebookFor the sphere geometry construction in the example layer, this repository credits the FLAM geometry utilities as an inspiration: klho/FLAM geom.
The package is intended to remain GPU-first through the torch batched path.
When PyKeOps provides a lazy kernel implementation, the batched code prefers it.
When a PyKeOps kernel is unavailable, the code falls back automatically to the
explicit torch batched kernel. The torch kernels use a larger chunking safety
factor than the PyKeOps kernels because they materialize dense pairwise blocks.
At import time, skelfmm also performs a best-effort CUDA environment setup
for PyKeOps by populating CUDA_PATH / CUDA_HOME and related search paths
from the active torch/CUDA installation.
On a machine without CUDA, pykeops falls back to CPU.