Add GPU support #108

Open
hakkelt wants to merge 1 commit into JuliaDSP:master from hakkelt:add-gpu-implementation

hakkelt commented Apr 21, 2026

This PR introduces GPU acceleration to the library. The implementation is designed to be a drop-in solution, with no breaking changes to the existing API. Thanks to KernelAbstractions (https://juliagpu.github.io/KernelAbstractions.jl/stable/), the implementation is fully backend-agnostic.

In addition, the PR brings three further changes:

  • It introduces a benchmarking script for the CPU (benchmark/benchmark.jl) and another that compares CPU and GPU speed (benchmark/gpu_benchmark.jl).
  • It adds a GitHub CI action (.github/workflows/benchmark.yml) that runs the CPU benchmarking script, compares the results with those of the master branch, and posts the comparison as a job summary report using AirspeedVelocity.jl.
  • It moves the test dependencies to a separate Project.toml file in the test folder using the workspace feature introduced in Julia 1.12, as this is now the recommended approach (https://pkgdocs.julialang.org/v1/creating-packages/#Recommended-approach:-Using-workspaces-with-test/Project.toml).

The testing included verification (the GPU implementation's output matches the CPU functions' output) and benchmarking (the CPU implementation didn't get slower, and the GPU implementation provides significant speedup for large arrays).
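
The output-matching check can be pictured as an elementwise comparison within a floating-point tolerance, since a GPU kernel may order its arithmetic differently from a CPU loop. The following is only a Python sketch of that idea, not the PR's test code: both Haar-step functions, the helper `outputs_match`, and the tolerances are illustrative assumptions standing in for the CPU and GPU code paths.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step_reference(x):
    """One Haar analysis step with an explicit loop (stand-in for the CPU path)."""
    half = len(x) // 2
    approx = [0.0] * half
    detail = [0.0] * half
    for i in range(half):
        approx[i] = (x[2 * i] + x[2 * i + 1]) / SQRT2
        detail[i] = (x[2 * i] - x[2 * i + 1]) / SQRT2
    return approx + detail

def haar_step_alternative(x):
    """Same step with a different operation order (stand-in for the GPU path)."""
    evens, odds = x[0::2], x[1::2]
    scale = 1.0 / SQRT2  # multiply by the reciprocal instead of dividing
    approx = [(e + o) * scale for e, o in zip(evens, odds)]
    detail = [(e - o) * scale for e, o in zip(evens, odds)]
    return approx + detail

def outputs_match(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """True if both outputs have equal length and agree elementwise within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol) for p, q in zip(a, b)
    )

# Hypothetical test signal; any even-length input works.
signal = [float((7 * i) % 11) for i in range(16)]
print(outputs_match(haar_step_reference(signal), haar_step_alternative(signal)))
```

An exact `==` comparison would be too strict for this kind of check, because the two paths may round differently in the last bits even when both are correct.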

Disclaimer: The code is mostly generated by GitHub Copilot.

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 99.84000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.64%. Comparing base (6d382a1) to head (054142c).

Files with missing lines    Patch %    Lines
src/mod/Transforms.jl       80.00%     1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
+ Coverage   95.27%   96.64%   +1.37%     
==========================================
  Files           8       12       +4     
  Lines        1459     2055     +596     
==========================================
+ Hits         1390     1986     +596     
  Misses         69       69              

hakkelt force-pushed the add-gpu-implementation branch from d470b88 to 11d373b (April 21, 2026 19:28)
hakkelt force-pushed the add-gpu-implementation branch from 11d373b to 054142c (April 21, 2026 20:16)
hakkelt marked this pull request as ready for review (April 21, 2026 20:37)

hakkelt commented Apr 21, 2026

The CPU benchmarks are posted here: https://github.com/JuliaDSP/Wavelets.jl/actions/runs/24744254291?pr=108. The results are a bit noisy, as expected on CI servers, but no significant regression is detected.

I ran the GPU benchmarks locally, with the following results:

Wavelets.jl GPU benchmark
Backend: CUDA

==============================================================================
1D Filter DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                    1.878        0.166      11.31x
2^18                    4.627        0.163      28.37x
2^20                   17.633        0.244      72.16x

==============================================================================
1D Filter IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                    3.329        0.260      12.82x
2^18                    9.058        0.383      23.63x
2^20                   35.451        0.664      53.43x

==============================================================================
1D Filter WPT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    4.778        0.646       7.39x
2^16                   20.077        1.905      10.54x
2^18                   84.281        6.376      13.22x

==============================================================================
1D Filter IWPT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    6.920        0.705       9.82x
2^16                   29.985        1.165      25.75x
2^18                  129.083        3.520      36.67x

==============================================================================
2D Filter DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
512x512                 9.686        0.183      52.87x
1024x1024              49.757        0.527      94.33x
2048x2048             208.208        1.641     126.85x

==============================================================================
2D Filter IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
512x512                15.799        0.312      50.71x
1024x1024              76.399        0.884      86.46x
2048x2048             310.145        2.993     103.61x

==============================================================================
3D Filter DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
32^3                    1.771        0.138      12.80x
64^3                   13.156        0.251      52.50x
128^3                 313.733        1.161     270.22x

==============================================================================
3D Filter IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
32^3                    2.956        0.190      15.55x
64^3                   22.172        0.430      51.59x
128^3                 551.068        2.258     244.01x

==============================================================================
1D Lifting DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                    0.197        0.540       0.36x
2^18                    0.541        0.536       1.01x
2^20                    3.056        0.540       5.66x

==============================================================================
1D Lifting IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                    0.123        0.548       0.22x
2^18                    0.488        0.548       0.89x
2^20                    2.790        0.667       4.18x

==============================================================================
1D Lifting WPT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                    9.994        2.353       4.25x
2^18                   73.857        7.534       9.80x
2^20                  177.003       13.182      13.43x

==============================================================================
1D Lifting IWPT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^16                   10.325        2.425       4.26x
2^18                   44.074        4.189      10.52x
2^20                  186.721       14.191      13.16x

==============================================================================
1D MODWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    4.969        0.137      36.38x
2^16                   18.735        0.143     131.24x
2^18                   80.447        0.320     251.15x

==============================================================================
1D IMODWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    7.636        0.091      84.25x
2^16                   17.584        0.158     111.59x
2^18                   73.600        0.478     153.90x

==============================================================================
1D MODWT Step
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    0.667        0.046      14.45x
2^16                    2.643        0.054      49.25x
2^18                   10.633        0.103     103.58x

==============================================================================
1D IMODWT Step
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
2^14                    1.022        0.054      18.94x
2^16                    3.760        0.053      70.79x
2^18                   10.377        0.098     106.06x

==============================================================================
2D Lifting DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
128x128                 0.142        0.422       0.34x
1024x1024              32.929        0.627      52.54x
4096x4096            5123.286        9.603     533.48x

==============================================================================
2D Lifting IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
128x128                 0.187        0.410       0.46x
1024x1024              32.396        0.798      40.59x
4096x4096            4237.213        8.568     494.53x

==============================================================================
3D Lifting DWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
32^3                    0.832        0.370       2.25x
128^3                 504.394        1.167     432.14x
512^3               86224.944      115.496     746.56x

==============================================================================
3D Lifting IDWT
==============================================================================
Size                 CPU (ms)     GPU (ms)    Speedup
32^3                    1.166        0.534       2.18x
128^3                 497.403        1.276     389.73x
512^3               87540.194      110.549     791.86x

==============================================================================
Summary
==============================================================================
1D filter dwt     	 avg:  37.28x, min:  11.31x, max:  72.16x
1D filter idwt    	 avg:  29.96x, min:  12.82x, max:  53.43x
1D filter iwpt    	 avg:  24.08x, min:   9.82x, max:  36.67x
1D filter wpt     	 avg:  10.38x, min:   7.39x, max:  13.22x
1D imodwt         	 avg: 116.58x, min:  84.25x, max: 153.90x
1D imodwt step    	 avg:  65.26x, min:  18.94x, max: 106.06x
1D lifting dwt    	 avg:   2.34x, min:   0.36x, max:   5.66x
1D lifting idwt   	 avg:   1.77x, min:   0.22x, max:   4.18x
1D lifting iwpt   	 avg:   9.31x, min:   4.26x, max:  13.16x
1D lifting wpt    	 avg:   9.16x, min:   4.25x, max:  13.43x
1D modwt          	 avg: 139.59x, min:  36.38x, max: 251.15x
1D modwt step     	 avg:  55.76x, min:  14.45x, max: 103.58x
2D filter dwt     	 avg:  91.35x, min:  52.87x, max: 126.85x
2D filter idwt    	 avg:  80.26x, min:  50.71x, max: 103.61x
2D lifting dwt    	 avg: 195.45x, min:   0.34x, max: 533.48x
2D lifting idwt   	 avg: 178.52x, min:   0.46x, max: 494.53x
3D filter dwt     	 avg: 111.84x, min:  12.80x, max: 270.22x
3D filter idwt    	 avg: 103.72x, min:  15.55x, max: 244.01x
3D lifting dwt    	 avg: 393.65x, min:   2.25x, max: 746.56x
3D lifting idwt   	 avg: 394.59x, min:   2.18x, max: 791.86x
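
A note on reading the summary: the avg column appears to be the arithmetic mean of the three per-size speedups in each table, so it is dominated by the largest size. Here is a minimal Python sketch (not part of the PR; the values are copied from the tables above, and the helper name `summarize` is made up for illustration) that reproduces a few rows and adds the median for comparison:

```python
from statistics import mean, median

# Per-size speedups copied from the tables above (smallest to largest size).
speedups = {
    "1D filter dwt": [11.31, 28.37, 72.16],
    "1D lifting dwt": [0.36, 1.01, 5.66],
    "2D lifting dwt": [0.34, 52.54, 533.48],
}

def summarize(values):
    """Return (avg, min, max, median), with avg and median rounded to two decimals."""
    return (
        round(mean(values), 2),
        min(values),
        max(values),
        round(median(values), 2),
    )

for name, values in speedups.items():
    avg, lo, hi, med = summarize(values)
    print(
        f"{name:<16} avg: {avg:7.2f}x, min: {lo:7.2f}x, "
        f"max: {hi:7.2f}x, median: {med:7.2f}x"
    )
```

For the 2D lifting DWT, for example, the mean is 195.45x while the middle size gives 52.54x and the smallest size is slower on the GPU (0.34x), so the per-size tables are more informative than the averages. Rows recomputed this way can differ from the summary in the last digit, because the script presumably averages unrounded speedups.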

hakkelt commented May 14, 2026

Hi @gummif,
Could you give some guidance on whether there is anything I can do to help get this PR accepted?
