Add support for gpu computations#15

Closed
Joon-Klaps wants to merge 15 commits into master from GPU-implmentation

Conversation

@Joon-Klaps

Owner

No description provided.

@codspeed-hq

codspeed-hq Bot commented Jun 4, 2026

Merging this PR will improve performance by 17.04%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 2 improved benchmarks
✅ 14 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Memory | rf_diverse | 1,034.3 KB | 846.9 KB | +22.14% |
| Memory | rf_similar | 164.6 KB | 146.7 KB | +12.15% |

Comparing GPU-implmentation (b05906d) with master (b46f8a9)

@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 41.92950% with 313 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.54%. Comparing base (b46f8a9) to head (b05906d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/gpu.rs | 23.89% | 309 Missing ⚠️ |
| src/lib.rs | 0.00% | 3 Missing ⚠️ |
| src/distances.rs | 96.96% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           master      #15       +/-   ##
===========================================
- Coverage   96.64%   86.54%   -10.10%     
===========================================
  Files           6        8        +2     
  Lines        2501     2950      +449     
===========================================
+ Hits         2417     2553      +136     
- Misses         84      397      +313     

| Flag | Coverage Δ |
|---|---|
| rust | 86.54% <41.92%> (-10.10%) ⬇️ |

@Joon-Klaps

Owner Author

Dropping the GPU (wgpu) path; keeping the fast CPU path

I explored GPU acceleration for the pairwise distance matrix (branch GPU-implmentation, wgpu → Vulkan/Metal). I'm not merging it: the payoff doesn't justify the amount of code, the additional layer of complexity, and the additional parameter.

Result (Tesla P100 vs 8-core CPU, RF): only 1.2–2.5× on large jobs, a slowdown on small ones, and the gain shrinks as taxa grow. RF is a memory-bandwidth-bound, near-zero-arithmetic kernel — a poor GPU fit — and our CPU path (bit-packed popcount + rayon, upper-triangle only) is already near-optimal, so there's little to win back.
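For readers unfamiliar with the CPU path described above, here is a minimal sequential sketch of the idea, not the project's actual code. It assumes each tree is already encoded as a bitset over globally interned bipartition ids, so the RF distance is the popcount of the XOR of two bitsets. The names `rf_distance` and `rf_upper_triangle` are hypothetical, and the rayon parallelism is left out to keep the sketch dependency-free.

```rust
/// RF distance between two trees, each encoded as a bitset over interned
/// bipartition ids (bit k set = the tree contains bipartition k).
/// Assumption: both slices have the same length.
/// The distance is the size of the symmetric difference: popcount(a XOR b).
fn rf_distance(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Compute only the upper triangle (i < j) of the pairwise matrix,
/// stored row by row in a flat vector of length n * (n - 1) / 2.
fn rf_upper_triangle(trees: &[Vec<u64>]) -> Vec<u32> {
    let n = trees.len();
    let mut out = Vec::with_capacity(n * n.saturating_sub(1) / 2);
    for i in 0..n {
        for j in (i + 1)..n {
            out.push(rf_distance(&trees[i], &trees[j]));
        }
    }
    out
}
```

The inner loop does one XOR and one popcount per 64 bipartitions, which is why the kernel is dominated by memory traffic rather than arithmetic. In a parallel version the rows of the outer loop could presumably be distributed across threads, since each pair is independent.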

Why not worth it:

  • Modest, size-dependent speedup only.
  • Reliability: wgpu's "device lost" (hit via the driver watchdog on long kernels) panics and aborts the whole process — uncatchable from Python, so we can't guarantee a graceful CPU fallback for treetracer users.
  • Portability: behavior varies by GPU/driver; most of the effort went into Vulkan loader/ICD plumbing, not algorithm work.
  • The CPU path has none of these hazards and is already very fast. The GPU payoff only becomes apparent (10 s down to 5 s) with more than 5k taxa and more than 5k trees.

The effort wasn't wasted. Debugging the GPU's memory limits surfaced a real CPU-side win: snapshot construction now interns bipartitions incrementally in bounded chunks instead of holding every tree's raw bitsets at once. Peak construction memory drops ~15× (e.g. 4.4 GB → ~0.3 GB at 2000×5000), large runs that used to OOM now complete, and results are byte-identical. There are also some CI speedups. All of this lands in a separate GPU-free PR.
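The chunked interning idea can be sketched roughly as follows. This is an illustration under assumptions, not the code that will land: `Interner`, `build_snapshot`, and `chunk_size` are hypothetical names, each tree is assumed to arrive as a list of raw bipartition bitsets (`Vec<u64>`), and the sketch is sequential. The point is that at most `chunk_size` trees' raw bitsets are alive at any time; after interning, a tree is kept only as a sorted list of small ids.

```rust
use std::collections::HashMap;

/// Hypothetical interner: maps each distinct bipartition bitset to a small id.
struct Interner {
    ids: HashMap<Vec<u64>, u32>,
}

impl Interner {
    fn new() -> Self {
        Interner { ids: HashMap::new() }
    }

    /// Return the existing id for this bipartition, or assign the next free one.
    fn intern(&mut self, bipartition: Vec<u64>) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(bipartition).or_insert(next)
    }
}

/// Consume trees lazily, at most `chunk_size` at a time. Raw bitsets are
/// dropped as soon as their chunk has been interned.
fn build_snapshot<I>(trees: I, chunk_size: usize) -> (Interner, Vec<Vec<u32>>)
where
    I: IntoIterator<Item = Vec<Vec<u64>>>,
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut interner = Interner::new();
    let mut per_tree_ids = Vec::new();
    let mut chunk = Vec::with_capacity(chunk_size);
    let mut iter = trees.into_iter();
    loop {
        // Pull the next bounded chunk of raw trees from the source.
        chunk.extend(iter.by_ref().take(chunk_size));
        if chunk.is_empty() {
            break;
        }
        // Intern every bipartition, keeping only sorted ids per tree.
        for tree in chunk.drain(..) {
            let mut ids: Vec<u32> =
                tree.into_iter().map(|b| interner.intern(b)).collect();
            ids.sort_unstable();
            per_tree_ids.push(ids);
        }
    }
    (interner, per_tree_ids)
}
```

Because ids are assigned in first-seen order regardless of where chunk boundaries fall, the output of this sketch does not depend on `chunk_size`, which is consistent with the byte-identical results reported above.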

We can revisit this if we are consistently hitting time limits instead of memory limits.

@Joon-Klaps Joon-Klaps closed this Jun 5, 2026