Flow-Matching BDT

A small library for training flow-matching models. It focuses on efficient algorithms for tabular learning, such as histogram-based gradient-boosted decision trees, but works with any scikit-learn-compatible regressor.

Installation

pip install flowmatching-bdt

Quick Start

from sklearn.datasets import make_moons
from flowmatching_bdt import FlowMatchingBDT

data, _ = make_moons(n_samples=500, noise=0.1, random_state=0)
model = FlowMatchingBDT()

# train the model
model.fit(data)

# generate new samples
samples = model.predict(num_samples=500)

Conditional Generation

import numpy as np
from sklearn.datasets import make_moons
from flowmatching_bdt import FlowMatchingBDT

data, labels = make_moons(n_samples=500, noise=0.1, random_state=42)
model = FlowMatchingBDT()

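# train the model with the class labels as conditions
model.fit(data, conditions=labels)

# generate 500 new samples conditioned on label 1
conditions = np.ones(500)
samples = model.predict(num_samples=500, conditions=conditions)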

How It Works

Flow matching trains a model to predict a velocity field that transports samples from a simple source distribution (e.g. Gaussian noise) to the data distribution. This implementation:

1. Discretises the flow into n_flow_steps time steps.
2. Trains one regressor per step to predict the velocity field.
3. At inference, integrates the learned field using Euler steps to generate new samples.

Gradient-boosted trees can learn this velocity field in place of a neural network, and they are typically faster to train on tabular data.

Useful Resources

Introduction to Flow Matching — Tor Fjelde, Emilie Mathieu, Vincent Dutordoir
Generating Tabular Data with XGBoost — Alexia Jolicoeur-Martineau

Citation

This repository started as a reproduction of the following paper:

@inproceedings{jolicoeur2024generating,
  title={Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees},
  author={Jolicoeur-Martineau, Alexia and Fatras, Kilian and Kachman, Tal},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  pages={1288--1296},
  year={2024},
  organization={PMLR}
}

Acknowledgements

This repository is heavily inspired by, and borrows parts from, lucidrains (project structure) and torch-cfm.
