Add support for gpu computations by Joon-Klaps · Pull Request #15 · Joon-Klaps/rapidtrees

Joon-Klaps · 2026-06-04T12:02:17Z

No description provided.

codspeed-hq · 2026-06-04T12:04:24Z

Merging this PR will improve performance by 17.04%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 2 improved benchmarks
✅ 14 untouched benchmarks

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Memory	`rf_diverse`	1,034.3 KB	846.9 KB	+22.14%
⚡	Memory	`rf_similar`	164.6 KB	146.7 KB	+12.15%


Comparing GPU-implmentation (b05906d) with master (b46f8a9)

codecov · 2026-06-04T12:16:26Z

Codecov Report

❌ Patch coverage is 41.92950% with 313 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.54%. Comparing base (b46f8a9) to head (b05906d).

Files with missing lines	Patch %	Lines
src/gpu.rs	23.89%	309 Missing ⚠️
src/lib.rs	0.00%	3 Missing ⚠️
src/distances.rs	96.96%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master      #15       +/-   ##
===========================================
- Coverage   96.64%   86.54%   -10.10%     
===========================================
  Files           6        8        +2     
  Lines        2501     2950      +449     
===========================================
+ Hits         2417     2553      +136     
- Misses         84      397      +313

Flag	Coverage Δ
rust	`86.54% <41.92%> (-10.10%)`	⬇️


Joon-Klaps · 2026-06-05T12:34:06Z

Dropping the GPU (wgpu) path; keeping the fast CPU path

I explored GPU acceleration for the pairwise distance matrix (branch GPU-implmentation, wgpu → Vulkan/Metal). I'm not merging it: the payoff doesn't justify the amount of code, the additional layer of complexity, and the additional parameter.

Result (Tesla P100 vs 8-core CPU, RF): only 1.2–2.5× on large jobs, a slowdown on small ones, and the gain shrinks as taxa grow. RF is a memory-bandwidth-bound, near-zero-arithmetic kernel — a poor GPU fit — and our CPU path (bit-packed popcount + rayon, upper-triangle only) is already near-optimal, so there's little to win back.
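To illustrate why this kernel does so little arithmetic per byte read, here is a minimal sketch of a bit-packed popcount RF computation. It assumes each tree is stored as a bitset over interned bipartition IDs, so the RF distance is the popcount of the XOR of two bitsets. The names `TreeBits`, `rf_distance`, and `rf_matrix` are hypothetical and are not taken from the rapidtrees source. The outer loop is written serially here; the real CPU path parallelizes with rayon.

```rust
/// One tree encoded as a bitset over interned bipartition IDs:
/// bit `i` is set when the tree contains bipartition `i`.
/// (Hypothetical layout, assumed for this sketch.)
type TreeBits = Vec<u64>;

/// RF distance = size of the symmetric difference of the two bipartition
/// sets = popcount(a XOR b), summed word by word.
fn rf_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "trees must share one bipartition index");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Compute only the upper triangle (i < j) and mirror it, since the
/// distance is symmetric and the diagonal is zero.
fn rf_matrix(trees: &[TreeBits]) -> Vec<Vec<u32>> {
    let n = trees.len();
    let mut d = vec![vec![0u32; n]; n];
    for i in 0..n {
        for j in (i + 1)..n {
            let dist = rf_distance(&trees[i], &trees[j]);
            d[i][j] = dist;
            d[j][i] = dist;
        }
    }
    d
}
```

Each pair costs one XOR and one popcount per 64-bit word, so throughput is set by how fast the bitsets can be streamed from memory rather than by compute.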

Why not worth it:

Modest, size-dependent speedup only.
Reliability: wgpu's "device lost" (hit via the driver watchdog on long kernels) panics and aborts the whole process — uncatchable from Python, so we can't guarantee a graceful CPU fallback for treetracer users.
Portability: behavior varies by GPU/driver; most of the effort went into Vulkan loader/ICD plumbing, not algorithm work.
The CPU path has none of these hazards and is already very fast; the GPU payoff only becomes apparent (10 s to 5 s) when there are >5k taxa and >5k trees.

The effort wasn't wasted. Debugging the GPU's memory limits surfaced a real CPU-side win: snapshot construction now interns bipartitions incrementally in bounded chunks instead of holding every tree's raw bitsets at once — peak construction memory drops ~15× (e.g. 4.4 GB → ~0.3 GB at 2000×5000), large runs that used to OOM now complete, results byte-identical. Plus some CI speedups. These land in a separate GPU-free PR.
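The chunked interning idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the follow-up PR: `Bipartition`, `Interner`, and `build_snapshot` are hypothetical names, and the input is modelled as an iterator that yields one tree's raw bipartition bitsets at a time. At most `chunk_size` trees' raw bitsets are alive at once; after each chunk only the compact ID lists and the interner remain.

```rust
use std::collections::HashMap;

/// A bipartition as a bit-packed taxon set (hypothetical representation).
type Bipartition = Vec<u64>;

/// Maps each distinct bipartition to a small integer ID.
#[derive(Default)]
struct Interner {
    ids: HashMap<Bipartition, u32>,
}

impl Interner {
    /// Return the existing ID for `bp`, or assign the next free one.
    fn intern(&mut self, bp: Bipartition) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(bp).or_insert(next)
    }
}

/// Build per-tree sorted ID lists while materializing the raw bitsets of
/// at most `chunk_size` trees at any moment.
fn build_snapshot<I>(trees: I, chunk_size: usize) -> (Interner, Vec<Vec<u32>>)
where
    I: IntoIterator<Item = Vec<Bipartition>>,
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut interner = Interner::default();
    let mut snapshot = Vec::new();
    let mut iter = trees.into_iter();
    loop {
        // Raw bitsets for this chunk only; they are consumed below and
        // freed before the next chunk is pulled from the iterator.
        let chunk: Vec<Vec<Bipartition>> = iter.by_ref().take(chunk_size).collect();
        if chunk.is_empty() {
            break;
        }
        for tree in chunk {
            let mut ids: Vec<u32> = tree.into_iter().map(|bp| interner.intern(bp)).collect();
            ids.sort_unstable();
            ids.dedup();
            snapshot.push(ids);
        }
    }
    (interner, snapshot)
}
```

Because IDs are assigned in tree order regardless of chunk boundaries, the output does not depend on `chunk_size` in this sketch, which is in line with the byte-identical results noted above.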

Can revisit this if we are consistently hitting time limits instead of memory limits.

Joon-Klaps added 2 commits June 4, 2026 13:49

Patch ci wheels for faster setups

b7aaaab

adding the gpu code

e04fbbc

Joon-Klaps added 2 commits June 4, 2026 14:07

patch ci pixi version

c230604

There are more pixi versions

e270ad6

Joon-Klaps added 11 commits June 4, 2026 14:39

allow gpu to be accessible from cli as well & pixi patch

ca84268

add extra R cross validate test to speed up ci?

0a416d7

Adding pip as dep in pixi

90f011a

undo failed pixi speedups

d2b869a

minimize code & remove env vars

0e7ce59

patch memory_time_benhmark to allow different iterations and std estimates

3c63930

reduce max trees to 5000

9b9928e

implement warning if no gpu found & don't keep snapshot stored in mem for perpetuity.

5880ec9

Include Vulkan for HPC in pixi

cfbe184

adding more debug messages to detect gpus.

65ac3dd

Further trial and error to improve GPU

b05906d

Joon-Klaps closed this Jun 5, 2026

Joon-Klaps wants to merge 15 commits into master from GPU-implmentation