University of Pennsylvania, CIS 5650: GPU Programming and Architecture, Project 1 - Flocking
Figure 1: Boids Simulation using Coherent Grids
- Yuntian Ke
- LinkedIn, etc.
- Tested on: Windows 11, Intel Core Ultra 9 275HX @ 2.70GHz 32GB, RTX 5070 Ti 30160MB
Figure 2: Coherent Grids with 50000 Boids
Figure 2: Uniform Grids with 20000 Boids
Figure 2: Coherent Grids with 10000 Boids
To enable this setting, you can just set "GridLoopingOptimization==1".
I turned off Vertical Sync and recorded the FPS after it became quite stable.
1. For each implementation, how does changing the number of boids affect performance? Why do you think this is?
I tested this with boid counts ranging from 5,000 to 1,000,000.
🔵 Naive FPS
- Trend: FPS drops sharply, and when the number of boids is very large, it approaches 0.
- Reason: The naive method checks every pair of boids to update the velocity, so its complexity is $O(N^2)$. As $N$ grows, the computation it needs increases quadratically.
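To make the quadratic growth concrete, here is a minimal CPU-side sketch of the all-pairs neighbor test. It is an illustration and not the project's CUDA kernel: `Vec3`, `naiveNeighborChecks`, and the `radius` parameter are hypothetical names, and in the GPU version the outer loop would correspond to one thread per boid.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal 3D vector type for this sketch.
struct Vec3 { float x, y, z; };

// For every boid, test every other boid against `radius`.
// Fills neighborCount[i] and returns the number of distance checks,
// which is always N * (N - 1) regardless of how the boids are placed.
std::size_t naiveNeighborChecks(const std::vector<Vec3>& pos, float radius,
                                std::vector<int>& neighborCount) {
    const std::size_t n = pos.size();
    neighborCount.assign(n, 0);
    std::size_t checks = 0;
    for (std::size_t i = 0; i < n; ++i) {        // one GPU thread per boid in a CUDA version
        for (std::size_t j = 0; j < n; ++j) {    // every thread still walks all N boids
            if (i == j) continue;
            ++checks;
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            if (dx * dx + dy * dy + dz * dz < radius * radius) {
                ++neighborCount[i];
            }
        }
    }
    return checks;
}
```

Doubling the number of boids roughly quadruples the returned check count, which matches the sharp FPS drop described above.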
🔴 Uniform Grid FPS
- Trend: FPS is much higher than with the naive method, but it also decreases as the number of boids increases.
- Reason: Since it only checks nearby cells instead of all pairs of boids as the naive method does, performance improves and FPS is higher. However, its cache efficiency is still poor, because neighbor positions and velocities are read through indices that are scattered in memory, so when $N$ becomes very large, FPS still drops quickly.
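The cell lookup behind the uniform grid can be sketched as follows. This is a hedged CPU-side illustration, not the project's actual code: the function names, the x-fastest flattening order, and the choice to skip out-of-range cells rather than wrap around are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Flatten a 3D cell coordinate into a 1D index (x varies fastest; an assumption).
int cellIndex1D(int x, int y, int z, int res) {
    return x + y * res + z * res * res;
}

// Map one position component to a cell coordinate, clamped to [0, res - 1].
int cellCoord(float p, float gridMin, float cellWidth, int res) {
    const int c = static_cast<int>(std::floor((p - gridMin) / cellWidth));
    return std::max(0, std::min(c, res - 1));
}

// Collect the 1D indices of the up-to-27 cells surrounding (cx, cy, cz).
// Cells outside the grid are skipped in this sketch.
std::vector<int> neighborCells(int cx, int cy, int cz, int res) {
    std::vector<int> cells;
    for (int dz = -1; dz <= 1; ++dz) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                const int x = cx + dx, y = cy + dy, z = cz + dz;
                if (x < 0 || y < 0 || z < 0 || x >= res || y >= res || z >= res) continue;
                cells.push_back(cellIndex1D(x, y, z, res));
            }
        }
    }
    return cells;
}
```

Each boid then only tests the boids registered in these cells, so the work per boid depends on local density rather than on $N$.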
🟡 Coherent Grid FPS
- Trend: Best performance overall. It has the highest FPS of the three methods, though FPS still decreases as the number of boids increases.
- Reason: By sorting particle data by cell index, memory access becomes coherent. This improves GPU cache and memory throughput, so even with many boids, lookups are efficient. The scaling is closer to linear than quadratic, so it outperforms the other methods.
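The reordering step described above can be sketched on the CPU as below. This is only an illustration under assumptions: `sortedBoidOrder` and `gatherCoherent` are hypothetical names, and on the GPU a key-value sort such as `thrust::sort_by_key` followed by a gather kernel could play the same role.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return boid indices ordered by the cell each boid falls in.
// stable_sort keeps the original order among boids in the same cell.
std::vector<int> sortedBoidOrder(const std::vector<int>& cellOfBoid) {
    std::vector<int> order(cellOfBoid.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return cellOfBoid[a] < cellOfBoid[b]; });
    return order;
}

// Copy per-boid data (positions, velocities, ...) into sorted order so that
// boids sharing a cell end up next to each other in memory.
template <typename T>
std::vector<T> gatherCoherent(const std::vector<int>& order, const std::vector<T>& data) {
    std::vector<T> out(data.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        out[i] = data[order[i]];
    }
    return out;
}
```

After this gather, a neighbor search over one cell reads a contiguous range of the reordered arrays instead of following one scattered index per boid.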
Figure 4: Left: N=50000, Right: N=10000
2. For each implementation, how does changing the block count and block size affect performance? Why do you think this is?
I test this using
3. For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
Yes. I observed a clear performance improvement with the coherent uniform grid compared to both the naive method and the non-coherent uniform grid. The FPS stayed much higher as the number of boids increased.
This outcome was expected. The main difference is that the coherent grid sorts boids by cell index, so neighbor data is stored contiguously in memory. That improves memory coalescing and cache utilization on the GPU. Since the simulation kernels are memory-bound (they spend most of their time loading neighbor positions and velocities), more efficient memory access leads directly to higher performance.
Figure 5: check 8 vs 27 neighboring cells
4. Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not?
Short answer: yes. With either the uniform grid or the coherent grid, checking 27 neighboring cells gives higher performance. Although the number of cells to check increases, the volume that has to be searched actually decreases. For example, with "cellWidth = R" in the 27-neighboring-cell case, the volume it checks every time is
