michelin/TorchSOM

torchsom: The Reference PyTorch Library for Self-Organizing Maps

GPU-accelerated Self-Organizing Maps in PyTorch with a scikit-learn API, rich visualization, and clustering, from dimensionality reduction to Just-In-Time Learning.

⭐ If you find torchsom valuable, please consider starring this repository ⭐


Overview

Self-Organizing Maps (SOMs) remain highly relevant in modern machine learning due to their interpretability, topology preservation, and computational efficiency. They are widely used in energy systems, biology, IoT, environmental science, and industrial applications.

Despite their utility, the Python SOM ecosystem is fragmented: existing implementations are often outdated or unmaintained, and they lack GPU acceleration or integration with modern deep learning frameworks.

torchsom addresses these gaps as a reference PyTorch library for SOMs, providing:

  • GPU-accelerated training via PyTorch CUDA backend
  • Advanced clustering (K-Means, GMM, HDBSCAN) on the SOM latent space
  • A scikit-learn-style API for ease of use and extensibility
  • Rich visualization tools for both rectangular and hexagonal topologies
  • Just-In-Time Learning (JITL) for supervised regression and classification

This library accompanies the paper: torchsom: The Reference PyTorch Library for Self-Organizing Maps (Berthier et al., 2025). If you use torchsom in academic or industrial work, please cite both the paper and the software (see Citation).

Key Results

Benchmarked against MiniSom on synthetic datasets (240–16,000 samples, 4–300 features) with identical hyperparameters:

Metric | Improvement
Training speed | Up to 99% faster (GPU) and 77–98% faster (CPU)
Topographic Error | 34–81% lower (better topology preservation)
Quantization Error | Comparable fidelity across all configurations

Hardware: Intel Xeon Platinum 8370C (CPU), NVIDIA Tesla T4 (GPU). See the paper for full benchmark tables.

Reproducing the JMLR benchmarks. All scripts, configurations, and the exact MiniSom pin (v2.3.5 / 65b6ba6) used to produce these numbers are released under benchmark/ — see benchmark/README.md for a step-by-step walkthrough. Two annotated tags pin the version of record: jmlr-submission-v1 (original October 2025 submission) and jmlr-revision-v1 (accepted revised version). git checkout <tag> reproduces the corresponding Table 2.


How It Works

A SOM is an unsupervised neural network that maps high-dimensional data onto a low-dimensional grid (typically 2D) while preserving topological relationships. At each training step, the Best Matching Unit (BMU), the neuron whose weight vector is closest to the input, is identified, and its weights, along with those of its grid neighbors, are updated:

$$\mathbf{w}_{ij}(t+1) = \mathbf{w}_{ij}(t) + \alpha(t) \cdot h_{ij}(t) \cdot \bigl(\mathbf{x} - \mathbf{w}_{ij}(t)\bigr)$$

where $\alpha(t)$ is the learning rate, $h_{ij}(t)$ is a neighborhood function (e.g., Gaussian) centered on the BMU, and $\mathbf{x} \in \mathbb{R}^k$ is the input vector. The BMU is found by:

$$\text{BMU} = \underset{i,j}{\arg\min}\, \lVert \mathbf{x} - \mathbf{w}_{ij} \rVert_2$$
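
The two formulas above can be sketched as one training step. This is a minimal illustration in plain Python (not PyTorch, and not torchsom's implementation) so that it runs without dependencies. The names find_bmu and som_step, the nested-list weight layout, and the Gaussian width sigma are assumptions made for this example.

```python
import math

def find_bmu(weights, x):
    """Return the grid index (i, j) of the neuron whose weights are closest to x."""
    best, best_dist = None, math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = math.dist(w, x)  # Euclidean (L2) distance
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best

def som_step(weights, x, alpha, sigma):
    """Apply w <- w + alpha * h * (x - w) in place, with a Gaussian h centered on the BMU."""
    bi, bj = find_bmu(weights, x)  # BMU is fixed before any weight changes
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            grid_sq_dist = (i - bi) ** 2 + (j - bj) ** 2
            h = math.exp(-grid_sq_dist / (2 * sigma ** 2))  # equals 1 at the BMU
            row[j] = [wk + alpha * h * (xk - wk) for wk, xk in zip(w, x)]
    return bi, bj

# A 2x2 map over 2-dimensional inputs
weights = [[[0.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.9, 0.9]]]
bmu = som_step(weights, x=[1.0, 1.0], alpha=0.5, sigma=1.0)
```

Every neuron moves toward the input, but the step shrinks with grid distance from the BMU. In practice both alpha and sigma are decayed over training, which is what the time index t in the update rule expresses.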

Training quality is assessed via Quantization Error (representation fidelity) and Topographic Error (topology preservation). See the documentation for the full mathematical background.
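
As a rough illustration of the two metrics, the sketch below uses their common textbook definitions: Quantization Error as the mean distance between each sample and its BMU's weights, and Topographic Error as the fraction of samples whose best and second-best matching units are not grid neighbors. The function names and the 8-neighborhood adjacency test on a rectangular grid are assumptions for this example; torchsom's exact definitions are given in its documentation.

```python
import math

def ranked_neurons(weights, x):
    """Grid coordinates sorted by the Euclidean distance of their weights to x."""
    coords = [(i, j) for i, row in enumerate(weights) for j in range(len(row))]
    return sorted(coords, key=lambda c: math.dist(weights[c[0]][c[1]], x))

def quantization_error(weights, data):
    """Mean distance between each sample and the weights of its BMU."""
    total = 0.0
    for x in data:
        i, j = ranked_neurons(weights, x)[0]
        total += math.dist(weights[i][j], x)
    return total / len(data)

def topographic_error(weights, data):
    """Fraction of samples whose first and second BMUs are not adjacent on the grid."""
    errors = 0
    for x in data:
        (i1, j1), (i2, j2) = ranked_neurons(weights, x)[:2]
        if max(abs(i1 - i2), abs(j1 - j2)) > 1:  # assumed 8-neighborhood adjacency
            errors += 1
    return errors / len(data)

# A deliberately badly ordered 1x3 map over 1-dimensional inputs
weights = [[[0.0], [10.0], [1.0]]]
data = [[0.4], [9.0]]
qe = quantization_error(weights, data)
te = topographic_error(weights, data)
```

In this toy map the neurons with similar weights (0.0 and 1.0) sit at opposite ends of the grid, so the first sample counts as a topographic error even though its quantization error is small.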


Why torchsom?

torchsom MiniSom SimpSOM SOMPY somoclu som-pbc
Framework PyTorch NumPy NumPy NumPy C++/CUDA NumPy
GPU Acceleration ✅ CUDA ✅ CuPy/CUML ✅ CUDA
API Design scikit-learn Custom Custom MATLAB Custom Custom
Maintenance ✅ Active ✅ Active ⚠️ Minimal ⚠️ Minimal ⚠️ Minimal
Documentation ✅ Rich ⚠️ Basic ⚠️ Basic ⚠️ Basic
Test Coverage ✅ 90% ~53% Minimal
Visualization ✅ Advanced Moderate Moderate Basic Basic
Clustering ✅ Advanced
JITL Support ✅ Built-in
SOM Variants PBC, Growing*, Hierarchical* PBC PBC PBC

* Work in progress

Just-In-Time Learning (JITL): Given an online query, JITL collects relevant samples by topology and distance to form a local buffer. A lightweight local model is then trained on this buffer, enabling efficient supervised learning (regression or classification).
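
The JITL idea can be sketched as follows. This is a simplified plain-Python illustration, not torchsom's built-in JITL: the helper names, the rule "same BMU or an adjacent neuron" for the topology filter, the k-nearest cut for the distance filter, and the mean predictor standing in for the lightweight local model are all assumptions.

```python
import math

def bmu(weights, x):
    """Grid index of the neuron whose weights are closest to x."""
    coords = [(i, j) for i, row in enumerate(weights) for j in range(len(row))]
    return min(coords, key=lambda c: math.dist(weights[c[0]][c[1]], x))

def build_buffer(weights, X, query, k):
    """Return indices of up to k training samples relevant to the query.

    Topology: keep samples whose BMU is the query's BMU or a grid neighbor of it.
    Distance: among those, keep the k samples closest to the query.
    """
    qi, qj = bmu(weights, query)
    candidates = []
    for idx, x in enumerate(X):
        i, j = bmu(weights, x)
        if max(abs(i - qi), abs(j - qj)) <= 1:
            candidates.append(idx)
    candidates.sort(key=lambda idx: math.dist(X[idx], query))
    return candidates[:k]

def predict_local_mean(weights, X, y, query, k=5):
    """Stand-in local model: average the targets of the buffered samples."""
    buffer = build_buffer(weights, X, query, k)
    if not buffer:
        raise ValueError("no training sample maps near the query's BMU")
    return sum(y[idx] for idx in buffer) / len(buffer)

# A trained 1x3 map is assumed here; the weights are hard-coded for illustration
weights = [[[0.0], [5.0], [10.0]]]
X = [[0.1], [0.2], [9.8], [10.1]]
y = [1.0, 3.0, 100.0, 200.0]
prediction = predict_local_mean(weights, X, y, query=[0.0])
```

Only the two samples mapped near the query's BMU enter the buffer, so the distant samples with large targets do not influence the prediction. A real local model would typically be a small regressor or classifier fitted on the buffer instead of a mean.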


Quick Start

import torch
from torchsom.core import SOM
from torchsom.visualization import SOMVisualizer

# 10x10 map over 3-dimensional inputs, trained for 50 epochs
som = SOM(x=10, y=10, num_features=3, epochs=50)

# Synthetic data: 1,000 samples with 3 features each
X = torch.randn(1000, 3)
som.initialize_weights(data=X, mode="pca")  # PCA-based weight initialization
q_errors, t_errors = som.fit(data=X)  # quantization and topographic errors from training

# Plot the training errors, the hit map, and the distance map
visualizer = SOMVisualizer(som=som)
visualizer.plot_training_errors(
    quantization_errors=q_errors, topographic_errors=t_errors
)
visualizer.plot_hit_map(data=X, batch_size=256)
visualizer.plot_distance_map(
    distance_metric=som.distance_fn_name,
    neighborhood_order=som.neighborhood_order,
    scaling="sum",
)

Tutorials

Explore our collection of Jupyter notebooks:

Notebook | Task | Dataset
iris.ipynb | Multiclass classification | Iris
wine.ipynb | Multiclass classification | Wine
boston_housing.ipynb | Regression | Boston Housing
energy_efficiency.ipynb | Multi-output regression | Energy Efficiency
clustering.ipynb | Clustering analysis | Synthetic blobs

Visualization Gallery

  • D-Matrix (U-Matrix): inter-neuron distances
  • Hit Map: BMU activation frequency
  • Mean Map: target value distribution
  • Component Planes: feature-wise weight distribution (Component Plane 1)
  • Classification Map: dominant class per neuron
  • HDBSCAN Cluster Map: cluster assignment
  • Component Planes: another feature dimension (Component Plane 2)
  • K-Means Elbow: optimal cluster selection
  • Cluster Quality Metrics: algorithm comparison
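
To make the first two gallery entries concrete, the sketch below shows what a hit map and a distance map compute. It is a plain-Python illustration under stated assumptions, not the SOMVisualizer code: the function names are hypothetical, the grid is rectangular, neighbors are the 4-connected cells, and distances are averaged (the Quick Start call exposes neighborhood_order and scaling options that this sketch does not model).

```python
import math

def hit_map(weights, data):
    """Count how many samples select each neuron as their BMU."""
    rows, cols = len(weights), len(weights[0])
    hits = [[0] * cols for _ in range(rows)]
    for x in data:
        i, j = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: math.dist(weights[rc[0]][rc[1]], x),
        )
        hits[i][j] += 1
    return hits

def distance_map(weights):
    """Mean Euclidean distance from each neuron's weights to its 4-connected neighbors."""
    rows, cols = len(weights), len(weights[0])
    dmap = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = [
                math.dist(weights[i][j], weights[ni][nj])
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= ni < rows and 0 <= nj < cols  # skip cells outside the grid
            ]
            dmap[i][j] = sum(dists) / len(dists) if dists else 0.0
    return dmap

# A 1x3 map over 1-dimensional inputs
weights = [[[0.0], [1.0], [5.0]]]
hits = hit_map(weights, data=[[0.1], [0.2], [4.0]])
dmap = distance_map(weights)
```

High values in the distance map mark boundaries between groups of similar neurons, while the hit map shows which neurons the data actually activates.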

Installation

This project uses uv for fast, reproducible dependency management.

From PyPI

uv add torchsom

With optional FAISS acceleration for BMU search:

uv add "torchsom[faiss]"   # quotes keep shells such as zsh from expanding the brackets

Development Setup

git clone https://github.com/michelin/TorchSOM.git
cd TorchSOM
uv sync --all-extras      # creates .venv and installs everything

All Make targets use uv run so the correct environment is always activated:

make help                  # see all available commands
make cov                   # run tests with coverage
make check                 # lint / type-check
make fix                   # auto-format
make docs                  # build documentation

Documentation

Comprehensive documentation is available at opensource.michelin.io/TorchSOM, including:

  • Getting Started: installation, quick start, SOM concepts
  • User Guide: visualization, architecture, benchmarks
  • API Reference: core, utils, visualization, configs
  • Additional Resources: FAQ, troubleshooting, changelog

Citation

If you use torchsom in your academic, research, or industrial work, please cite both the paper and the software:

@misc{berthier2025torchsom,
    title={torchsom: The Reference PyTorch Library for Self-Organizing Maps},
    author={Berthier, Louis and Shokry, Ahmed and Moreaud, Maxime
            and Ramelet, Guillaume and Moulines, Eric},
    year={2025},
    eprint={2510.11147},
    archivePrefix={arXiv},
    primaryClass={stat.ML},
    note={Preprint submitted to Journal of Machine Learning Research},
    url={https://arxiv.org/abs/2510.11147}
}

@software{berthier2025torchsom_software,
    author={Berthier, Louis},
    title={torchsom: The Reference PyTorch Library for Self-Organizing Maps},
    year={2025},
    version={1.1.1},
    url={https://github.com/michelin/TorchSOM},
    note={Documentation available at \url{https://opensource.michelin.io/TorchSOM/}}
}

For more details, see the CITATION file.


Contributing

We welcome contributions from the community! See our Contributing Guide and Code of Conduct for details.


Acknowledgments


License

torchsom is licensed under the Apache License 2.0. See the LICENSE file for details.


Related Work and References

Foundational Literature

Related Software

  • MiniSom: Minimalistic Python SOM
  • SimpSOM: Simple Self-Organizing Maps
  • SOMPY: Python SOM library
  • somoclu: Massively Parallel Self-Organizing Maps
  • som-pbc: SOM with periodic boundary conditions
  • SOM Toolbox: MATLAB implementation

About

TorchSOM is a PyTorch-based library for training Self-Organizing Maps (SOMs), unsupervised models that can be used for clustering, dimensionality reduction, and data visualization. It is designed to be scalable and user-friendly.
