diff --git a/README.md b/README.md
index ee39093..e3a0b36 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,73 @@
 **University of Pennsylvania, CIS 5650: GPU Programming and Architecture, Project 1 - Flocking**
-* (TODO) YOUR NAME HERE
-  * (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Marcus Hedlund
+  * [LinkedIn](https://www.linkedin.com/in/marcushedlund/)
+* Tested on: Windows 11, Intel Core Ultra 9 185H @ 2.5 GHz 16GB, NVIDIA GeForce RTX 4070 Laptop GPU 8GB (Personal Computer)
-### (TODO: Your README)
+# CUDA Boids Flocking Simulation
+|![10000 Boids simulation demo](images/10000BoidGif.gif)|
+|:--:|
+|10000 Boids simulation using naive approach|
-Include screenshots, analysis, etc. (Remember, this is public, so don't put
-anything here that you don't want to share with the world.)
+# Overview
+
+In this project I implemented a 3D flocking simulation in CUDA based on the Reynolds Boids algorithm. The simulation models flocking behavior through particles called boids (bird-oid) using three rules:
+
+* Cohesion - move toward the center of nearby boids
+* Separation - avoid getting too close to nearby boids
+* Alignment - try to match the velocity of nearby boids
+
+A boid updates its velocity at each timestep by applying these rules to the other boids within a rule-specific distance (we'll call the largest of these distances the neighborhood distance). This means that two boids can only influence each other if they are within a neighborhood distance of one another.
+
+I developed the algorithm in three progressively optimized implementations:
+1. Naive Approach - every boid checks whether every other boid is within its neighborhood distance
+2. Uniform Spatial Grid - boids are preprocessed into spatial grid cells so each boid only has to check its own cell and the neighboring cells
+3. Uniform Spatial Grid with Semi-Coherent Memory Access (Coherent Grid) - extends the uniform spatial grid by ensuring position and velocity data for boids in the same cell are stored contiguously in memory.
+
+|![10000 Boids simulation demo](images/10000BoidGif.gif)| ![25000 Boids simulation demo](images/25000BoidGif.gif) | ![100000 Boids simulation demo](images/100000BoidGif.gif) |
+|:--:|:--:|:--:|
+| *10000 Boids simulation using naive approach* | *25000 Boids simulation using uniform grid* | *100000 Boids simulation using coherent grid* |
+
+# Performance Analysis
+
+### Collecting Data
+
+I collected frame rate data during the first 12 seconds of simulation. The readings from the first two seconds were discarded because they were often abnormally high during the simulation's startup, and the remaining 10 values were averaged to obtain the final result. The data was then logged to a CSV and processed in the [data](https://github.com/mhedlund7/Project1-CUDA-Flocking/tree/main/data) folder for analysis.
+
+### Varying Boid Count
+|![Block Size 128: Average FPS vs Number of Boids](images/AverageFPSvsNumBoids1.png)|![Block Size 256: Average FPS vs Number of Boids](images/AverageFPSvsNumBoids2.png)|![Block Size 1024: Average FPS vs Number of Boids](images/AverageFPSvsNumBoids3.png)|
+|:--:|:--:|:--:|
+| *Block Size 128* | *Block Size 256* | *Block Size 1024* |
+
+As we would expect, these graphs show that increasing the number of boids in the simulation decreases the average FPS. This makes sense for two reasons: every additional boid needs its velocity recomputed every timestep, and each boid will likely have more boids in its neighborhood, which increases the per-boid computation time as well.
+
+Additionally, we can see that the uniform and coherent grid implementations significantly outperform the naive approach.
For example, at 250,000 boids and a block size of 128, the naive simulation drops to around 2 FPS, while the uniform grid holds about 177 FPS and the coherent grid reaches about 744 FPS. We also see that at high boid counts the coherent grid greatly outperforms the uniform grid, while at low boid counts their frame rates are roughly the same. This is the outcome I expected, but I did not expect the difference to be so large. The increase does make sense, though: reading position and velocity data from contiguous memory has much better locality, which greatly increases cache hit rates compared to the scattered accesses in the plain uniform grid.
+
+### Varying Block Size
+|![5000 Boids: Average FPS vs BlockSize](images/AverageFPSvsBlockSize1.png)|![100000 Boids: Average FPS vs BlockSize](images/AverageFPSvsBlockSize2.png)|![500000 Boids: Average FPS vs BlockSize](images/AverageFPSvsBlockSize3.png)|
+|:--:|:--:|:--:|
+| *5000 Boids* | *100000 Boids* | *500000 Boids* |
+
+For the uniform and coherent grid implementations, changing the block size had no noticeable impact on performance. The naive implementation behaved the same way at larger boid counts, but in the logged data it slowed down at a block size of 1024 with few boids (roughly 670 FPS versus about 1100 FPS at 5000 boids), plausibly because 5000 boids fill only five blocks of that size and leave most of the GPU's multiprocessors idle. The overall lack of sensitivity was surprising to me, but I think it is likely because we're computing enough boids that the GPU is already saturated and has enough warps to hide memory latency, so increasing the block size doesn't improve performance further. The block size only ends up changing how the threads are grouped without actually increasing parallelism.
+
+### Neighborhood Size Comparison
+
+|![Comparing Frame Rates of Checking 8 vs 27 neighbors](images/NumNeighborsChecked.png)|
+|:--:|
+|Comparing Frame Rates of Checking 8 vs 27 neighbors|
+
+I also compared two strategies: using a grid width of twice the neighborhood distance and checking the 8 grid cells surrounding the boid, versus using a grid width of exactly the neighborhood distance and checking the 27 cells surrounding the boid.
+Surprisingly, the graph shows that checking the 27 surrounding cells worked better.
I initially expected that using more, smaller cells would increase the overall computation. However, the 27 cells form a 3x3x3 cube with a side length of 3 neighborhood distances (a volume of 27 cubic neighborhood distances), whereas the 8 cells form a 2x2x2 cube with a side length of 4 neighborhood distances (a volume of 64). The 8-cell version therefore searches a larger region of the simulation space and has to check more boids that lie outside the current boid's neighborhood distance, which takes longer.
+
+### Optimized Grid Looping
+
+I additionally implemented an optimized grid-looping strategy:
+* The grid width can be set to a scalar multiple of the neighborhood distance, instead of being hardcoded along with the number of neighbor cells to check
+* The algorithm only checks cells that lie at least partly within the boid's neighborhood sphere, which avoids checking full cubes of space whose corner cells might be entirely outside of the boid's neighborhood
+
+|![Comparing Frame Rates of Optimized Grid-Looping with Varying Grid Widths](images/GridLoopingOptimization.png)|
+|:--:|
+|Comparing Frame Rates of Optimized Grid-Looping with Varying Grid Widths|
+
+From the graph we can see that the optimized grid looping with a grid width of 1 neighborhood distance worked the best. This is likely because, as the grid width decreases further, the larger number of cells we have to iterate through adds enough computation to outweigh the reduced search area.
diff --git a/data/data.csv b/data/data.csv
new file mode 100644
index 0000000..e819f09
--- /dev/null
+++ b/data/data.csv
@@ -0,0 +1,166 @@
+// Note average FPS was incorrectly divided by 12 instead of 10.
Processed data fixes this mistake and formats the data better for a csv +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:919.507, Data: 1109.8, 1104.86, 1098.96, 1098.88, 1102.65, 1099.17, 1115.87, 1094.06, 1103, 1106.84, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:515.862, Data: 623.93, 614.013, 616.236, 619.358, 621.802, 617.958, 618.582, 619.277, 619.135, 620.055, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:134.537, Data: 148.118, 152.355, 154.78, 159.057, 164.832, 162.381, 164.55, 168.616, 169.906, 169.85, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:40.3107, Data: 47.5212, 47.1917, 47.664, 48.3129, 49.3864, 48.5303, 48.746, 48.6992, 48.6929, 48.9838, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:10.4849, Data: 12.2577, 12.5642, 12.4388, 12.7291, 12.5423, 12.7202, 12.6675, 12.4264, 12.7009, 12.7715, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1.69007, Data: 1.95682, 2.03793, 2.05672, 2.07253, 2.042, 2.04581, 1.93993, 1.99542, 2.09031, 2.04341, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.432583, Data: 0.514608, 0.524682, 0.526058, 0.52036, 0.523293, 0.514972, 0.518819, 0.516847, 0.511484, 0.519869, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1780.11, 
Data: 2077.87, 2124.38, 2140.62, 2099.89, 2181.36, 2235.33, 2124.64, 2121.63, 2130.38, 2125.21, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1748.85, Data: 2090.78, 2093.95, 2106.73, 2104.2, 2132.95, 2132.95, 2071.85, 2068.68, 2088.67, 2095.4, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1325.22, Data: 1648.82, 1642.87, 1663.27, 1654.77, 1584.29, 1557.23, 1575.12, 1503.46, 1524.85, 1547.93, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:802.206, Data: 869.139, 916.385, 921.852, 940.928, 995.588, 991.987, 1036.81, 1041.32, 960.219, 952.245, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:428.457, Data: 508.81, 478.183, 477.24, 500.727, 519.893, 530.958, 539.919, 525.331, 524.632, 535.791, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:148.066, Data: 174.465, 182.122, 175.201, 176.602, 178.191, 176.1, 178.821, 177.707, 177.725, 179.856, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:44.0612, Data: 55.0846, 54.0375, 52.6078, 52.5181, 53.0859, 53.2199, 53.3629, 52.0353, 51.183, 51.5992, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1745.6, Data: 2208.32, 2200.79, 2136.05, 2162.66, 2183.34, 2020.76, 2316.82, 1891.87, 1947.81, 1878.74, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, 
Average FPS:1651.35, Data: 2329.6, 1921.03, 1958.29, 1997.87, 1941.4, 1952.14, 1926.82, 1896.55, 1971.22, 1921.29, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1715.83, Data: 1906.2, 1886.64, 1983.66, 1948.22, 1896.54, 2032.85, 2300.91, 2295.53, 2225.54, 2113.82, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1677.07, Data: 2034.44, 2096.67, 2006.22, 2041.95, 1987.91, 1992.8, 2004.38, 1962.13, 1998.29, 2000.06, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1311.44, Data: 1601.94, 1593.17, 1564.12, 1587.7, 1570.64, 1570.77, 1565.17, 1545.15, 1575.57, 1563, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:620.41, Data: 728.677, 722.393, 740.913, 737.652, 756.51, 750.81, 743.234, 752.43, 767.461, 744.836, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:275.051, Data: 302.284, 328.172, 327.869, 327.811, 327.127, 331.612, 338.873, 336.837, 340.92, 339.105, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:91.012, Data: 104.41, 109.535, 110.188, 109.922, 110.523, 109.993, 109.998, 110.516, 109.556, 107.502, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:2000000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:26.4164, Data: 29.8735, 30.4449, 30.7009, 31.3575, 32.1299, 32.4932, 32.5359, 32.5703, 32.4751, 32.4158, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 0, 
Grid Width Scale: 2, Average FPS:12.1433, Data: 14.0264, 14.0909, 14.2572, 14.5596, 14.8143, 14.7995, 14.7737, 14.8069, 14.8116, 14.7793, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:2000000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:2.62929, Data: 2.96731, 3.02818, 3.12581, 3.21074, 3.21358, 3.21316, 3.19532, 3.20256, 3.19963, 3.19512, + +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1594.66, Data: 1853.69, 1894.72, 1905.51, 1801.37, 1936.74, 1922.93, 1957.94, 1916.77, 1993.32, 1952.91, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1804.75, Data: 2239.89, 2284.91, 2220.67, 2197.56, 2185.78, 2148.63, 2163.95, 2107.25, 2042.8, 2065.55, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1756.16, Data: 2107.29, 2144.35, 2205.39, 2138.8, 2123.33, 2123.68, 2111.97, 2049.77, 2037.41, 2031.95, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1485.32, Data: 1744.97, 1740.94, 1751.89, 1689.4, 1758.16, 1724.81, 1773.77, 1870.37, 1904.88, 1864.63, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1317.19, Data: 1611.96, 1565.29, 1587.87, 1593.71, 1582.98, 1590.49, 1585.94, 1571.29, 1563.61, 1553.13, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:618.286, Data: 731.059, 754.817, 731.922, 735.818, 752.196, 741.111, 759.227, 721.616, 761.031, 730.637, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 256, 
Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:268.744, Data: 308.752, 318.835, 321.446, 323.449, 322.127, 322.565, 329.151, 324.553, 326.937, 327.119, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:86.8753, Data: 102.794, 103.589, 104.744, 104.999, 106.224, 105.313, 104.264, 103.285, 103.925, 103.367, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:20000000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.292421, Data: 0.332413, 0.352362, 0.352183, 0.353812, 0.354139, 0.353595, 0.354087, 0.353896, 0.354636, 0.347923, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1696.96, Data: 1996.82, 1942.92, 2020.97, 2025.69, 2033.06, 2035.95, 2072.92, 2093.64, 2109.69, 2031.85, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1808.99, Data: 2118, 2092.98, 2167.61, 2135.87, 2105.71, 2212.96, 2258.6, 2174.22, 2280.35, 2161.61, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1567.23, Data: 1901.88, 1983.48, 1912.42, 1834.77, 1879.43, 1926.9, 1932.9, 1781.44, 1842.62, 1810.86, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1422.6, Data: 1723.7, 1701.88, 1697.04, 1694.89, 1726.65, 1733.95, 1696.85, 1701.63, 1692.47, 1702.17, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1286.15, Data: 1598.24, 1547.6, 1494.69, 1481.87, 1574.76, 1545.72, 1540.78, 1553.92, 1546.6, 1549.63, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid 
Count:250000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:645.399, Data: 738.576, 745.688, 783.743, 779.883, 777.335, 775.751, 795.839, 784.956, 783.756, 779.259, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:274.305, Data: 304.268, 322.523, 321.45, 326.054, 332.386, 333.352, 336.305, 338.926, 338.098, 338.292, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:88.8813, Data: 98.9619, 103.549, 105.546, 106.951, 107.799, 108.304, 108.951, 108.929, 107.999, 109.585, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:2000000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:26.0657, Data: 29.5693, 30.3291, 30.6002, 31.0233, 31.6785, 31.9086, 31.9161, 31.9229, 31.9077, 31.9331, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1863.46, Data: 2280.48, 2184.12, 2190.67, 2251.89, 2229.38, 2225.8, 2259.84, 2281.45, 2227.21, 2230.67, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1822.8, Data: 2213.85, 2257.6, 2214.33, 2246.76, 2189.22, 2154.49, 2162.25, 2132.88, 2135.64, 2166.59, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1791.98, Data: 2152.94, 2065.67, 2149.83, 2176.01, 2132.55, 2148.46, 2186.48, 2160.7, 2147.78, 2183.34, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1510.49, Data: 1797.07, 1791.55, 1813.87, 1834.97, 1826.39, 1823.55, 1828.94, 1797.48, 1798.91, 1813.14, +Visualize: 0, Uniform 
Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1296.18, Data: 1582.96, 1554.57, 1589.86, 1554.82, 1554.78, 1540.91, 1539.72, 1548.72, 1537.65, 1550.18, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:647.645, Data: 754.023, 760.751, 761.649, 770.555, 773.441, 786.147, 787.83, 772.747, 801.208, 803.39, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:267.031, Data: 300.708, 317.285, 319.83, 323.745, 324.911, 328.946, 321.887, 319.705, 325.777, 321.576, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:83.1176, Data: 97.2266, 99.8487, 99.7886, 99.6893, 100.998, 100.73, 100.195, 99.1731, 100.277, 99.4834, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:2000000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:23.9344, Data: 27.1068, 27.675, 27.9388, 28.5352, 28.7898, 29.6516, 29.4209, 29.4696, 29.2728, 29.352, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1822.38, Data: 1911.97, 2095.81, 2224.5, 2218.96, 2287.28, 2255.69, 2295.51, 2271.47, 2141.03, 2166.35, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1625.5, Data: 2243.81, 2265.32, 1891.83, 1906.66, 1951.47, 1845.72, 1852.76, 1905.78, 1816.31, 1826.37, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1510.85, Data: 1908.83, 1812.26, 1800.12, 1831.78, 1822.53, 1809.14, 1805.82, 1783.23, 1797.07, 
1759.38, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1626.79, Data: 1978.46, 1962.87, 1982.51, 1965.5, 1963.34, 1944.91, 1941.96, 1961.65, 1907.38, 1912.87, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1273.03, Data: 1551.95, 1567.43, 1540.78, 1541.92, 1519.52, 1545.46, 1503.8, 1534.35, 1505.44, 1465.74, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:621.847, Data: 714.933, 731.054, 716.379, 735.893, 741.983, 768.98, 764.974, 764.118, 760.063, 763.786, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:258.55, Data: 284.994, 287.301, 295.657, 308.935, 311.899, 323.616, 322.925, 322.5, 321.51, 323.262, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:84.0163, Data: 98.2385, 99.6133, 101.481, 101.201, 100.755, 100.783, 100.656, 101.572, 101.923, 101.973, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:2000000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:24.9008, Data: 28.082, 28.5576, 28.7664, 28.8247, 29.8837, 30.4189, 30.9117, 31.0751, 31.1418, 31.1473, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:5000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:921.973, Data: 1082.74, 1090.05, 1090.12, 1083.07, 1115.89, 1124.81, 1114.78, 1118.72, 1121.66, 1121.83, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:10000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:526.124, Data: 629.244, 632.135, 630.593, 631.557, 631.257, 631.498, 632.14, 
630.145, 632.945, 631.973, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:25000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:145.778, Data: 172.147, 162.901, 170.941, 175.592, 178.304, 178.194, 178.954, 175.728, 177.619, 178.962, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:40.4481, Data: 46.3548, 47.3977, 48.1934, 48.7693, 48.9839, 49.3091, 48.787, 49.2916, 49.3059, 48.9841, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:100000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:10.4278, Data: 12.2488, 12.6021, 12.5333, 12.5288, 12.5656, 12.609, 12.4778, 12.5725, 12.4805, 12.5154, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:250000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1.71076, Data: 2.05106, 2.05523, 2.0516, 2.06039, 2.07269, 2.05321, 2.04433, 2.05202, 2.03205, 2.05655, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:500000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.430889, Data: 0.523279, 0.522647, 0.500199, 0.516197, 0.518434, 0.515223, 0.516985, 0.520278, 0.512097, 0.525336, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:5000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:916.846, Data: 1097.54, 1099.46, 1103.25, 1098.8, 1099.38, 1108.6, 1093.74, 1103.04, 1109.99, 1088.35, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:10000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:485.069, Data: 583.02, 581.929, 584.453, 583.926, 577.101, 580.826, 581.182, 583.056, 585.971, 579.365, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:25000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:147.302, Data: 170.083, 173.958, 176.819, 175.642, 
177.96, 181.999, 177.305, 178.417, 178.979, 176.456, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:38.4509, Data: 44.8851, 44.6897, 45.5926, 46.8329, 46.5963, 46.7622, 46.3921, 46.8515, 46.213, 46.5952, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:39.324, Data: 46.7472, 45.1565, 48.2483, 47.4793, 47.4075, 47.3493, 47.8154, 46.892, 47.4421, 47.351, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:100000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:10.5032, Data: 12.23, 12.5457, 12.581, 12.4251, 12.8286, 12.5873, 12.7958, 12.6084, 12.6164, 12.8195, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:250000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1.70363, Data: 2.02673, 1.97155, 1.98797, 2.05779, 2.07261, 2.07376, 2.05508, 2.06145, 2.06775, 2.06886, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:500000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.434654, Data: 0.523365, 0.526024, 0.522407, 0.526864, 0.517421, 0.521357, 0.523765, 0.521845, 0.520419, 0.512376, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:5000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:560.455, Data: 672.789, 671.352, 668.659, 675.69, 671.554, 675.439, 677.908, 671.323, 669.496, 671.252, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:10000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:304.98, Data: 369.978, 361.085, 365.576, 366.759, 366.83, 366.754, 363.801, 365.509, 363.474, 369.991, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:25000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:128.037, Data: 
149.705, 151.135, 152.512, 154.409, 155.8, 152.224, 158.956, 150.293, 156.034, 155.374, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:33.0318, Data: 38.4245, 38.5967, 39.462, 39.8591, 40.0364, 39.9981, 40.2485, 39.9693, 39.8459, 39.9413, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:100000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:10.2463, Data: 11.9107, 12.2756, 12.2755, 12.4296, 12.4188, 12.387, 12.2511, 12.3968, 12.3887, 12.2221, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:250000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1.69728, Data: 1.99547, 2.03462, 2.0435, 2.04844, 2.05452, 2.02696, 2.0422, 2.03615, 2.03831, 2.04718, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:500000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.419625, Data: 0.500432, 0.504942, 0.507526, 0.501575, 0.503762, 0.503793, 0.505402, 0.503419, 0.504233, 0.50042, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:5000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:843.579, Data: 1007.88, 1009.03, 1022.92, 1008.89, 1005.67, 1014.68, 1013.81, 1012.3, 1013.48, 1014.28, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:10000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:474.288, Data: 569.885, 565.818, 567.097, 570.151, 571.253, 572.927, 567.29, 569.302, 568.444, 569.284, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:25000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:123.685, Data: 143.796, 144.272, 149.685, 149.499, 148.498, 150.478, 149.669, 148.796, 149.564, 149.961, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:50000, Block Size: 512, Grid Looping Optimization: 0, Grid Width 
Scale: 2, Average FPS:39.198, Data: 44.9368, 45.5107, 45.8552, 46.9578, 48.3569, 47.8196, 47.344, 48.0342, 47.7437, 47.8169, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:100000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:10.1111, Data: 12.049, 11.9834, 12.1233, 12.2167, 12.0312, 12.1721, 12.3551, 11.9419, 12.2617, 12.1984, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:250000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1.71866, Data: 2.02584, 2.0528, 2.06442, 2.09415, 2.07748, 2.06726, 2.02688, 2.07249, 2.06598, 2.07659, +Visualize: 0, Uniform Grid: 0, Coherent_Grid: 0, Boid Count:500000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:0.429971, Data: 0.519183, 0.519409, 0.516075, 0.519777, 0.520327, 0.513647, 0.515789, 0.514795, 0.513245, 0.507399, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1752.71, Data: 2008.51, 2088.94, 2078.96, 2037.31, 2194.68, 2140.31, 2129.36, 2136.22, 2162.35, 2055.83, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1481.87, Data: 2056.57, 2006.45, 1839.61, 1832.49, 1844.32, 1711.77, 1645.67, 1648.96, 1593.89, 1602.65, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1246.88, Data: 1541.85, 1496.7, 1585.65, 1531.34, 1513.45, 1493.26, 1453.34, 1469.47, 1443.89, 1433.63, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:820.156, Data: 889.846, 950.688, 936.338, 970.629, 1040.77, 1002.37, 1061.85, 1039.35, 992.687, 957.341, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 64, Grid Looping 
Optimization: 0, Grid Width Scale: 2, Average FPS:430.574, Data: 508.555, 495.055, 490.971, 520.994, 529.871, 531.345, 537.468, 510.161, 516.641, 525.824, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:141.568, Data: 160.8, 171.596, 170.906, 165.94, 176.097, 172.06, 168.569, 171.28, 169.728, 171.846, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 64, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:41.5868, Data: 51.4946, 50.4189, 48.5386, 48.42, 49.2647, 50.8396, 51.3825, 50.6829, 49.3189, 48.6809, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1769.02, Data: 2202.6, 2187.68, 2159.09, 2142.94, 2166.55, 2175.98, 2192.39, 2008.93, 1994.33, 1997.73, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1598.83, Data: 1893.6, 1881.26, 1903.47, 1977.65, 1943.98, 1920.39, 1899.79, 1925.33, 1867.63, 1972.88, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1272.58, Data: 1563.59, 1613.81, 1559.34, 1573.86, 1535.79, 1522.93, 1498.42, 1477.64, 1468.7, 1456.86, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:793.6, Data: 913.284, 910.828, 932.486, 927.385, 963.999, 969.524, 993.141, 1036.86, 952.986, 922.7, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:431.664, Data: 506.234, 501.82, 487.467, 518.338, 531.456, 539.243, 530.951, 517.739, 516.781, 529.937, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 256, 
Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:142.536, Data: 168.142, 174.222, 170.93, 172.676, 174.757, 170.335, 172.575, 168.929, 168.425, 169.447, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 256, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:42.0235, Data: 52.3793, 51.3526, 49.548, 49.07, 49.6148, 51.0952, 51.8327, 50.721, 49.5314, 49.1369, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1794.12, Data: 2197.23, 2174.55, 2202.33, 2258.16, 2204.81, 2300.67, 2026.8, 2056.64, 2152.92, 1955.28, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1624.92, Data: 1997.5, 1959.52, 1941.09, 1948.3, 1981.4, 1974.68, 1903.97, 1937.21, 1923.46, 1931.88, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1394.7, Data: 1807.4, 1757.39, 1695.5, 1667.98, 1674.9, 1618.87, 1644.53, 1615, 1651.82, 1603.04, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:812.434, Data: 904.85, 945, 951.374, 976.405, 977.741, 1012.81, 1026.7, 1046, 951.853, 956.47, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:465.069, Data: 542.99, 536.84, 539.643, 561.629, 580.711, 584.982, 563.533, 559.306, 556.992, 554.208, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 512, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:148.414, Data: 167.314, 179.696, 177.434, 182.122, 181.89, 181.47, 177.289, 178.404, 176.099, 179.249, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 512, 
Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:43.3654, Data: 53.902, 53.1774, 51.4822, 51.4608, 52.095, 52.8461, 52.7639, 51.6695, 50.3985, 50.5893, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1658.25, Data: 2063, 2081.52, 2051.75, 1868.61, 1893.76, 1970.36, 2043.96, 2095.76, 2026.43, 1803.83, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1593.41, Data: 1945.14, 1923.87, 1900.74, 1930.87, 1966.28, 1908.95, 1878.36, 1890.35, 1936.92, 1839.48, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:1224.37, Data: 1534.99, 1555.2, 1584.17, 1511.47, 1475.87, 1398.25, 1378.84, 1455.81, 1379.66, 1418.16, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:831.794, Data: 922.799, 947.612, 981.898, 981.239, 1060.11, 1030, 1082.32, 1049.98, 983.323, 942.251, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:469.687, Data: 547.872, 542.373, 544.866, 558.801, 572.968, 590.353, 576.379, 571.37, 571.876, 559.386, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:146.387, Data: 169.197, 177.699, 174.453, 177.266, 178.889, 175.919, 173.829, 174.936, 176.852, 177.603, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 1024, Grid Looping Optimization: 0, Grid Width Scale: 2, Average FPS:43.1287, Data: 53.2788, 52.3166, 50.2446, 50.1261, 51.557, 52.8952, 53.1776, 52.2252, 50.8776, 50.8454, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid 
Count:5000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:1647.59, Data: 2064.12, 1997.56, 1998.88, 1961.96, 2013.62, 1909.99, 1987.84, 1964.51, 1971.64, 1900.99, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:1631.87, Data: 1980.58, 1998.6, 1989.01, 2017.86, 2021.95, 1936.28, 1972.9, 1888.47, 1916.96, 1859.82, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:1042.53, Data: 1246.94, 1238.49, 1217.92, 1251.77, 1226.42, 1265.97, 1256.17, 1262.86, 1273.2, 1270.57, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:933.12, Data: 1112.63, 1190.27, 1116.59, 1157.47, 1152.88, 1109.25, 1075.79, 1074.25, 1094.62, 1113.69, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:624.077, Data: 755.697, 739.284, 762.223, 751.963, 765.816, 748.89, 739.526, 751.781, 751.785, 721.952, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:243.592, Data: 295.664, 294.338, 287.665, 291.495, 291.791, 292.631, 297.375, 288.756, 287.736, 295.66, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 0, Grid Width Scale: 1, Average FPS:80.9717, Data: 92.7929, 89.076, 93.3632, 93.6474, 93.2163, 97.8329, 101.398, 103.175, 102.957, 104.202, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1410.77, Data: 1580.23, 1498.88, 1624.24, 1718.15, 1656.35, 1671.4, 1799.71, 1823.12, 1837.87, 1719.29, +Visualize: 0, Uniform Grid: 1, 
Coherent_Grid: 0, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1508.87, Data: 1818.21, 1755.38, 1875.32, 1931.31, 1866.33, 1878.9, 1817.68, 1711.82, 1684.52, 1766.96, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:899.555, Data: 1003.52, 1080.33, 1097.12, 1096.21, 1058.5, 1073.81, 1088.17, 1106.53, 1101.99, 1088.47, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1049.15, Data: 1239.96, 1271.58, 1265.9, 1280.72, 1262.61, 1286.28, 1238.79, 1245.9, 1260.33, 1237.77, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:708.604, Data: 837.648, 850.319, 871.457, 859.434, 867.992, 837.796, 857.611, 841.301, 835.22, 844.464, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:297.103, Data: 354.628, 357.524, 353.958, 355.195, 360.813, 360.917, 354.336, 356.293, 354.832, 356.744, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 0, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:101.009, Data: 113.725, 115.364, 115.029, 113.89, 120.876, 124.235, 126.767, 125.897, 128.429, 127.898, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:1562.49, Data: 1945.18, 1930.1, 1937.6, 1920.75, 1849.36, 1809.26, 1860.33, 1810.44, 1866.37, 1820.43, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:1737.98, Data: 2129.45, 2081.8, 2025.59, 2048.86, 2163.36, 2058.3, 2055.82, 2093.76, 2093.24, 2105.58, +Visualize: 0, 
Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:1770.8, Data: 2157.07, 2157.38, 2151.63, 2222.32, 2183.87, 2029.62, 2118.85, 2036.29, 2064.64, 2127.94, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:1401.71, Data: 1725.35, 1763.73, 1706.22, 1717.65, 1755.26, 1717.08, 1706.6, 1627.14, 1543.56, 1557.89, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:1183.84, Data: 1393.9, 1425.39, 1455.71, 1391.67, 1421.17, 1429.27, 1446.89, 1367.77, 1448.98, 1425.29, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:633.522, Data: 731.461, 768.785, 763.837, 736.466, 753.693, 771.373, 763.438, 763.076, 780.309, 769.832, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:262.541, Data: 300.153, 307.366, 318.149, 322.144, 322.02, 318.553, 301.608, 308.675, 323.055, 328.771, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 2, Average FPS:87.1464, Data: 101.172, 102.966, 102.883, 104.53, 105.92, 106.637, 106.287, 104.727, 104.979, 105.656, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1658.79, Data: 2148.31, 2176.8, 2184.98, 2026.48, 1868.73, 1877.73, 1942.71, 1894.85, 1894.88, 1889.96, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1401.61, Data: 1821.95, 1671.46, 1744.39, 1678.7, 1629.98, 1686.24, 1642.9, 1605.05, 1630.03, 
1708.58, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:934.663, Data: 1225.32, 1117.07, 1077.73, 1038.07, 1077.25, 1134.69, 1144.44, 1119.94, 1154.91, 1126.54, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1623.72, Data: 2061.98, 1961.94, 1935.83, 1935.89, 1967.08, 1922.22, 1925.8, 1968.55, 1894.36, 1910.98, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:1466.59, Data: 1788.96, 1815.55, 1792.25, 1820.19, 1770.24, 1712.65, 1702.34, 1746.35, 1716.79, 1733.79, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:750.279, Data: 897.896, 929.903, 929.273, 918.624, 934.053, 891.846, 884.248, 870.763, 876.849, 869.898, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:418.567, Data: 492.117, 492.985, 491.774, 501.403, 502.96, 503.7, 503.851, 516.292, 504.739, 512.98, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 1, Average FPS:162.53, Data: 192.14, 196.439, 195.147, 197.438, 195.751, 198.965, 193.865, 196.149, 190.497, 193.965, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:1512.63, Data: 1946.86, 1941.71, 1959.11, 1953.95, 1977.64, 1875.81, 1527.21, 1661.54, 1708.86, 1598.87, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:1423.88, Data: 2049.93, 2029.54, 1732.47, 1552.96, 1627.47, 1588.65, 
1654.86, 1639.93, 1580.18, 1630.54, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:1603.81, Data: 2056.49, 1909.65, 1887.19, 1962.37, 1861.74, 1878.92, 1894.94, 1949.33, 1907.43, 1937.73, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:1524.43, Data: 1889.83, 1899.35, 1873.59, 1848.31, 1804.71, 1770.72, 1759.71, 1828.22, 1775.33, 1843.43, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:1153.35, Data: 1375.03, 1393.58, 1389.58, 1358.13, 1380.67, 1381.09, 1337.1, 1382.87, 1417.97, 1424.23, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:630.807, Data: 732.28, 747.785, 742.916, 756.202, 774.585, 750.381, 768.996, 772.009, 754.891, 769.639, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:306.825, Data: 367.718, 370.372, 369.766, 365.02, 372.499, 368.871, 368.587, 363.57, 368.171, 367.324, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.5, Average FPS:133.507, Data: 158.68, 161.242, 160.848, 162.315, 159.064, 160.947, 160.46, 160.518, 160.5, 157.508, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:5000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:1121.63, Data: 1369.51, 1369.34, 1356.78, 1352.31, 1378.82, 1305.42, 1347.4, 1335.79, 1310.25, 1333.99, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:10000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:1069.66, Data: 1262.18, 
1281.02, 1290.59, 1324.52, 1271.49, 1277.34, 1267.45, 1301.41, 1272.28, 1287.6, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:25000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:901.533, Data: 994.476, 1019.63, 1061.62, 1074.57, 1086.94, 1093.19, 1086.55, 1136.46, 1122.77, 1142.18, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:50000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:482.969, Data: 579.774, 588.421, 590.29, 573.331, 571.796, 570.414, 569.144, 577.465, 576.557, 598.436, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:100000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:305.397, Data: 345.137, 356.999, 366.632, 373.802, 387.23, 377.198, 365.719, 363.891, 363.154, 365.003, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:250000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:158.186, Data: 186.614, 190.376, 192.728, 190.275, 187.104, 190.579, 190.993, 188.732, 192.553, 188.281, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:500000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:106.738, Data: 122.334, 127.633, 128.776, 129.897, 128.13, 127.648, 128.527, 129.771, 130.378, 127.762, +Visualize: 0, Uniform Grid: 1, Coherent_Grid: 1, Boid Count:1000000, Block Size: 128, Grid Looping Optimization: 1, Grid Width Scale: 0.25, Average FPS:55.3591, Data: 62.7094, 64.3149, 65.0831, 66.0031, 66.9893, 67.0656, 68.0941, 67.8468, 68.3214, 67.8821, diff --git a/data/processed_data.csv b/data/processed_data.csv new file mode 100644 index 0000000..f0c22b4 --- /dev/null +++ b/data/processed_data.csv @@ -0,0 +1,165 @@ +Visualize,Uniform Grid,Coherent Grid,Boid Count,Block Size,Grid Looping Optimization,Grid Width Scale,Average FPS +0,0,0,5000,128,0,2,1103.4084 +0,0,0,10000,128,0,2,619.0344 +0,0,0,25000,128,0,2,161.4444 
+0,0,0,50000,128,0,2,48.37284 +0,0,0,100000,128,0,2,12.58188 +0,0,0,250000,128,0,2,2.028084 +0,0,0,500000,128,0,2,0.5190996 +0,1,0,5000,128,0,2,2136.132 +0,1,0,10000,128,0,2,2098.62 +0,1,0,25000,128,0,2,1590.264 +0,1,0,50000,128,0,2,962.6472 +0,1,0,100000,128,0,2,514.1484 +0,1,0,250000,128,0,2,177.6792 +0,1,0,500000,128,0,2,52.87344 +0,1,1,5000,128,0,2,2094.72 +0,1,1,10000,128,0,2,1981.62 +0,1,1,25000,128,0,2,2058.996 +0,1,1,50000,128,0,2,2012.484 +0,1,1,100000,128,0,2,1573.728 +0,1,1,250000,128,0,2,744.492 +0,1,1,500000,128,0,2,330.0612 +0,1,1,1000000,128,0,2,109.2144 +0,1,1,2000000,128,0,2,31.69968 +0,1,0,1000000,128,0,2,14.57196 +0,1,0,2000000,128,0,2,3.155148 +0,1,1,5000,256,0,2,1913.592 +0,1,1,10000,256,0,2,2165.7 +0,1,1,25000,256,0,2,2107.392 +0,1,1,50000,256,0,2,1782.384 +0,1,1,100000,256,0,2,1580.628 +0,1,1,250000,256,0,2,741.9432 +0,1,1,500000,256,0,2,322.4928 +0,1,1,1000000,256,0,2,104.25036 +0,1,1,20000000,256,0,2,0.3509052 +0,1,1,5000,512,0,2,2036.352 +0,1,1,10000,512,0,2,2170.788 +0,1,1,25000,512,0,2,1880.676 +0,1,1,50000,512,0,2,1707.12 +0,1,1,100000,512,0,2,1543.38 +0,1,1,250000,512,0,2,774.4788 +0,1,1,500000,512,0,2,329.166 +0,1,1,1000000,512,0,2,106.65756 +0,1,1,2000000,512,0,2,31.27884 +0,1,1,5000,1024,0,2,2236.152 +0,1,1,10000,1024,0,2,2187.36 +0,1,1,25000,1024,0,2,2150.376 +0,1,1,50000,1024,0,2,1812.588 +0,1,1,100000,1024,0,2,1555.416 +0,1,1,250000,1024,0,2,777.174 +0,1,1,500000,1024,0,2,320.4372 +0,1,1,1000000,1024,0,2,99.74112 +0,1,1,2000000,1024,0,2,28.72128 +0,1,1,5000,64,0,2,2186.856 +0,1,1,10000,64,0,2,1950.6 +0,1,1,25000,64,0,2,1813.02 +0,1,1,50000,64,0,2,1952.148 +0,1,1,100000,64,0,2,1527.636 +0,1,1,250000,64,0,2,746.2164 +0,1,1,500000,64,0,2,310.26 +0,1,1,1000000,64,0,2,100.81956 +0,1,1,2000000,64,0,2,29.88096 +0,0,0,5000,64,0,2,1106.3676 +0,0,0,10000,64,0,2,631.3488 +0,0,0,25000,64,0,2,174.9336 +0,0,0,50000,64,0,2,48.53772 +0,0,0,100000,64,0,2,12.51336 +0,0,0,250000,64,0,2,2.052912 +0,0,0,500000,64,0,2,0.5170668 
+0,0,0,5000,256,0,2,1100.2152 +0,0,0,10000,256,0,2,582.0828 +0,0,0,25000,256,0,2,176.7624 +0,0,0,50000,256,0,2,46.14108 +0,0,0,50000,256,0,2,47.1888 +0,0,0,100000,256,0,2,12.60384 +0,0,0,250000,256,0,2,2.044356 +0,0,0,500000,256,0,2,0.5215848 +0,0,0,5000,1024,0,2,672.546 +0,0,0,10000,1024,0,2,365.976 +0,0,0,25000,1024,0,2,153.6444 +0,0,0,50000,1024,0,2,39.63816 +0,0,0,100000,1024,0,2,12.29556 +0,0,0,250000,1024,0,2,2.036736 +0,0,0,500000,1024,0,2,0.50355 +0,0,0,5000,512,0,2,1012.2948 +0,0,0,10000,512,0,2,569.1456 +0,0,0,25000,512,0,2,148.422 +0,0,0,50000,512,0,2,47.0376 +0,0,0,100000,512,0,2,12.13332 +0,0,0,250000,512,0,2,2.062392 +0,0,0,500000,512,0,2,0.5159652 +0,1,0,5000,64,0,2,2103.252 +0,1,0,10000,64,0,2,1778.244 +0,1,0,25000,64,0,2,1496.256 +0,1,0,50000,64,0,2,984.1872 +0,1,0,100000,64,0,2,516.6888 +0,1,0,250000,64,0,2,169.8816 +0,1,0,500000,64,0,2,49.90416 +0,1,0,5000,256,0,2,2122.824 +0,1,0,10000,256,0,2,1918.596 +0,1,0,25000,256,0,2,1527.096 +0,1,0,50000,256,0,2,952.32 +0,1,0,100000,256,0,2,517.9968 +0,1,0,250000,256,0,2,171.0432 +0,1,0,500000,256,0,2,50.4282 +0,1,0,5000,512,0,2,2152.944 +0,1,0,10000,512,0,2,1949.904 +0,1,0,25000,512,0,2,1673.64 +0,1,0,50000,512,0,2,974.9208 +0,1,0,100000,512,0,2,558.0828 +0,1,0,250000,512,0,2,178.0968 +0,1,0,500000,512,0,2,52.03848 +0,1,0,5000,1024,0,2,1989.9 +0,1,0,10000,1024,0,2,1912.092 +0,1,0,25000,1024,0,2,1469.244 +0,1,0,50000,1024,0,2,998.1528 +0,1,0,100000,1024,0,2,563.6244 +0,1,0,250000,1024,0,2,175.6644 +0,1,0,500000,1024,0,2,51.75444 +0,1,0,5000,128,0,1,1977.108 +0,1,0,10000,128,0,1,1958.244 +0,1,0,25000,128,0,1,1251.036 +0,1,0,50000,128,0,1,1119.744 +0,1,0,100000,128,0,1,748.8924 +0,1,0,250000,128,0,1,292.3104 +0,1,0,500000,128,0,1,97.16604 +0,1,0,5000,128,1,1,1692.924 +0,1,0,10000,128,1,1,1810.644 +0,1,0,25000,128,1,1,1079.466 +0,1,0,50000,128,1,1,1258.98 +0,1,0,100000,128,1,1,850.3248 +0,1,0,250000,128,1,1,356.5236 +0,1,0,500000,128,1,1,121.2108 +0,1,1,5000,128,1,2,1874.988 +0,1,1,10000,128,1,2,2085.576 
+0,1,1,25000,128,1,2,2124.96 +0,1,1,50000,128,1,2,1682.052 +0,1,1,100000,128,1,2,1420.608 +0,1,1,250000,128,1,2,760.2264 +0,1,1,500000,128,1,2,315.0492 +0,1,1,1000000,128,1,2,104.57568 +0,1,1,5000,128,1,1,1990.548 +0,1,1,10000,128,1,1,1681.932 +0,1,1,25000,128,1,1,1121.5956 +0,1,1,50000,128,1,1,1948.464 +0,1,1,100000,128,1,1,1759.908 +0,1,1,250000,128,1,1,900.3348 +0,1,1,500000,128,1,1,502.2804 +0,1,1,1000000,128,1,1,195.036 +0,1,1,5000,128,1,0.5,1815.156 +0,1,1,10000,128,1,0.5,1708.656 +0,1,1,25000,128,1,0.5,1924.572 +0,1,1,50000,128,1,0.5,1829.316 +0,1,1,100000,128,1,0.5,1384.02 +0,1,1,250000,128,1,0.5,756.9684 +0,1,1,500000,128,1,0.5,368.19 +0,1,1,1000000,128,1,0.5,160.2084 +0,1,1,5000,128,1,0.25,1345.956 +0,1,1,10000,128,1,0.25,1283.592 +0,1,1,25000,128,1,0.25,1081.8396 +0,1,1,50000,128,1,0.25,579.5628 +0,1,1,100000,128,1,0.25,366.4764 +0,1,1,250000,128,1,0.25,189.8232 +0,1,1,500000,128,1,0.25,128.0856 +0,1,1,1000000,128,1,0.25,66.43092 \ No newline at end of file diff --git a/images/100000BoidGif.gif b/images/100000BoidGif.gif new file mode 100644 index 0000000..17f0ab9 Binary files /dev/null and b/images/100000BoidGif.gif differ diff --git a/images/10000BoidGif.gif b/images/10000BoidGif.gif new file mode 100644 index 0000000..f38b98b Binary files /dev/null and b/images/10000BoidGif.gif differ diff --git a/images/10000BoidScreenshot.png b/images/10000BoidScreenshot.png new file mode 100644 index 0000000..2ee994f Binary files /dev/null and b/images/10000BoidScreenshot.png differ diff --git a/images/150000BoidGif.gif b/images/150000BoidGif.gif new file mode 100644 index 0000000..68cb7b6 Binary files /dev/null and b/images/150000BoidGif.gif differ diff --git a/images/25000BoidGif.gif b/images/25000BoidGif.gif new file mode 100644 index 0000000..ec87fd2 Binary files /dev/null and b/images/25000BoidGif.gif differ diff --git a/images/5000BoidGif.gif b/images/5000BoidGif.gif new file mode 100644 index 0000000..2acab23 Binary files /dev/null and 
b/images/5000BoidGif.gif differ diff --git a/images/AverageFPSvsBlockSize1.png b/images/AverageFPSvsBlockSize1.png new file mode 100644 index 0000000..0ceada0 Binary files /dev/null and b/images/AverageFPSvsBlockSize1.png differ diff --git a/images/AverageFPSvsBlockSize2.png b/images/AverageFPSvsBlockSize2.png new file mode 100644 index 0000000..16ae79f Binary files /dev/null and b/images/AverageFPSvsBlockSize2.png differ diff --git a/images/AverageFPSvsBlockSize3.png b/images/AverageFPSvsBlockSize3.png new file mode 100644 index 0000000..ab9b819 Binary files /dev/null and b/images/AverageFPSvsBlockSize3.png differ diff --git a/images/AverageFPSvsNumBoids1.png b/images/AverageFPSvsNumBoids1.png new file mode 100644 index 0000000..ddcbe89 Binary files /dev/null and b/images/AverageFPSvsNumBoids1.png differ diff --git a/images/AverageFPSvsNumBoids2.png b/images/AverageFPSvsNumBoids2.png new file mode 100644 index 0000000..8bb4df5 Binary files /dev/null and b/images/AverageFPSvsNumBoids2.png differ diff --git a/images/AverageFPSvsNumBoids3.png b/images/AverageFPSvsNumBoids3.png new file mode 100644 index 0000000..2af565a Binary files /dev/null and b/images/AverageFPSvsNumBoids3.png differ diff --git a/images/GridLoopingOptimization.png b/images/GridLoopingOptimization.png new file mode 100644 index 0000000..ed09906 Binary files /dev/null and b/images/GridLoopingOptimization.png differ diff --git a/images/NumNeighborsChecked.png b/images/NumNeighborsChecked.png new file mode 100644 index 0000000..cdef6ef Binary files /dev/null and b/images/NumNeighborsChecked.png differ diff --git a/src/kernel.cu b/src/kernel.cu index 7149917..dd15e02 100644 --- a/src/kernel.cu +++ b/src/kernel.cu @@ -61,6 +61,13 @@ void checkCUDAError(const char *msg, int line = -1) { #define maxSpeed 1.0f +// Toggles the grid looping optimization on and off +#define GRID_LOOPING_OPTIMIZATION 0 +// Controls the width of the grid cells in the uniform grid. 
1 is the neighborhood distance and corresponds to checking 27 surrounding cells +// and 2 is double the neighborhood distance and corresponds to checking 8 surrounding cells. Can adjust to different levels along with the grid +// looping to create smaller cells, but also check less area outside the distance radius for each rule +#define GRID_WIDTH_SCALE 2.0f + /*! Size of the starting area in simulation space. */ #define scene_scale 100.0f @@ -96,10 +103,15 @@ int *dev_gridCellEndIndices; // to this cell? // TODO-2.3 - consider what additional buffers you might need to reshuffle // the position and velocity data to be coherent within cells. +glm::vec3* dev_pos2; + // LOOK-2.1 - Grid parameters based on simulation parameters. // These are automatically computed for you in Boids::initSimulation int gridCellCount; int gridSideCount; + +// Set to double neighborhood distance (10) to check 8 surrounding +// cells, set to neighborhood distance (5) to check 27 surrounding cells float gridCellWidth; float gridInverseCellWidth; glm::vec3 gridMinimum; @@ -167,7 +179,7 @@ void Boids::initSimulation(int N) { checkCUDAErrorWithLine("kernGenerateRandomPosArray failed!"); // LOOK-2.1 computing grid params - gridCellWidth = 2.0f * std::max(std::max(rule1Distance, rule2Distance), rule3Distance); + gridCellWidth = GRID_WIDTH_SCALE * std::max(std::max(rule1Distance, rule2Distance), rule3Distance); int halfSideCount = (int)(scene_scale / gridCellWidth) + 1; gridSideCount = 2 * halfSideCount; @@ -179,6 +191,21 @@ void Boids::initSimulation(int N) { gridMinimum.z -= halfGridWidth; // TODO-2.1 TODO-2.3 - Allocate additional buffers here. 
+ cudaMalloc((void**)&dev_particleArrayIndices, N * sizeof(int)); + checkCUDAErrorWithLine("cudaMalloc dev_particleArrayIndices failed!"); + + cudaMalloc((void**)&dev_particleGridIndices, N * sizeof(int)); + checkCUDAErrorWithLine("cudaMalloc dev_particleGridIndices failed!"); + + cudaMalloc((void**)&dev_gridCellStartIndices, gridCellCount * sizeof(int)); + checkCUDAErrorWithLine("cudaMalloc dev_gridCellStartIndices failed!"); + + cudaMalloc((void**)&dev_gridCellEndIndices, gridCellCount * sizeof(int)); + checkCUDAErrorWithLine("cudaMalloc dev_gridCellEndIndices failed!"); + + cudaMalloc((void**)&dev_pos2, N * sizeof(glm::vec3)); + checkCUDAErrorWithLine("cudaMalloc dev_pos2 failed!"); + cudaDeviceSynchronize(); } @@ -243,7 +270,46 @@ __device__ glm::vec3 computeVelocityChange(int N, int iSelf, const glm::vec3 *po // Rule 1: boids fly towards their local perceived center of mass, which excludes themselves // Rule 2: boids try to stay a distance d away from each other // Rule 3: boids try to match the speed of surrounding boids - return glm::vec3(0.0f, 0.0f, 0.0f); + glm::vec3 perceivedCenter = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule1Neighbors = 0; + + glm::vec3 c = glm::vec3(0.0f, 0.0f, 0.0f); + + glm::vec3 perceivedVelocity = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule3Neighbors = 0; + + for (int b = 0; b < N; b++) { + if (b != iSelf) { + float dist = glm::distance(pos[iSelf], pos[b]); + // Rule 1 cohesion + if (dist < rule1Distance) { + perceivedCenter += pos[b]; + numRule1Neighbors++; + } + + // Rule 2 separation + if (dist < rule2Distance) { + c -= pos[b] - pos[iSelf]; + } + + // Rule 3 alignment + if (dist < rule3Distance) { + perceivedVelocity += vel[b]; + numRule3Neighbors++; + } + } + } + if (numRule1Neighbors > 0) { + perceivedCenter /= numRule1Neighbors; + } + if (numRule3Neighbors > 0) { + perceivedVelocity /= numRule3Neighbors; + } + + glm::vec3 v1 = (perceivedCenter - pos[iSelf]) * rule1Scale; + glm::vec3 v2 = c * rule2Scale; + glm::vec3 v3 = 
perceivedVelocity * rule3Scale; + return v1 + v2 + v3; } /** @@ -254,7 +320,23 @@ __global__ void kernUpdateVelocityBruteForce(int N, glm::vec3 *pos, glm::vec3 *vel1, glm::vec3 *vel2) { // Compute a new velocity based on pos and vel1 // Clamp the speed - // Record the new velocity into vel2. Question: why NOT vel1? + // Record the new velocity into vel2. Question: why NOT vel1? -- other threads computing velocity for + // other boids will use vel1 in their calculations so the new velocity should be saved to a different buffer + int index = threadIdx.x + (blockIdx.x * blockDim.x); + if (index >= N) { + return; + } + + glm::vec3 dv = computeVelocityChange(N, index, pos, vel1); + glm::vec3 finalVel = vel1[index] + dv; + + // clamp + float speed = glm::length(finalVel); + if (speed > maxSpeed) { + finalVel = glm::normalize(finalVel) * maxSpeed; + } + + vel2[index] = finalVel; } /** @@ -299,6 +381,14 @@ __global__ void kernComputeIndices(int N, int gridResolution, // - Label each boid with the index of its grid cell. // - Set up a parallel array of integer indices as pointers to the actual // boid data in pos and vel1/vel2 + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= N) { + return; + } + glm::ivec3 gridIdx3D = glm::ivec3((pos[index] - gridMin) * inverseCellWidth); + int gridIdx1D = gridIndex3Dto1D(gridIdx3D.x, gridIdx3D.y, gridIdx3D.z, gridResolution); + indices[index] = index; + gridIndices[index] = gridIdx1D; } // LOOK-2.1 Consider how this could be useful for indicating that a cell @@ -316,6 +406,18 @@ __global__ void kernIdentifyCellStartEnd(int N, int *particleGridIndices, // Identify the start point of each cell in the gridIndices array. // This is basically a parallel unrolling of a loop that goes // "this index doesn't match the one before it, must be a new cell!" + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= N) { + return; + } + int gridIdx = particleGridIndices[index]; + int prevGridIdx = (index > 0) ? 
particleGridIndices[index - 1] : -1; + if (gridIdx != prevGridIdx) { + gridCellStartIndices[gridIdx] = index; + if (prevGridIdx >= 0) { + gridCellEndIndices[prevGridIdx] = index; + } + } } __global__ void kernUpdateVelNeighborSearchScattered( @@ -332,6 +434,128 @@ __global__ void kernUpdateVelNeighborSearchScattered( // - Access each boid in the cell and compute velocity change from // the boids rules, if this boid is within the neighborhood distance. // - Clamp the speed change before putting the new speed in vel2 + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= N) { + return; + } + + // Set up velocity calculation variables + glm::vec3 perceivedCenter = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule1Neighbors = 0; + glm::vec3 c = glm::vec3(0.0f, 0.0f, 0.0f); + glm::vec3 perceivedVelocity = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule3Neighbors = 0; + + // Calculate bounding cube of grid cells + float neighborhoodDist = imax(imax(rule1Distance, rule2Distance), rule3Distance); + glm::vec3 neighborhoodMinPos = pos[index] - glm::vec3(neighborhoodDist); + glm::vec3 neighborhoodMaxPos = pos[index] + glm::vec3(neighborhoodDist); + glm::ivec3 nieghborhoodMinIdx3D = glm::ivec3(glm::floor((neighborhoodMinPos - gridMin) * inverseCellWidth)); + glm::ivec3 nieghborhoodMaxIdx3D = glm::ivec3(glm::ceil((neighborhoodMaxPos - gridMin) * inverseCellWidth)); + + // Iterate through all potential neighbor cells in a cube + for (int x = nieghborhoodMinIdx3D.x; x < nieghborhoodMaxIdx3D.x; x += 1) { + for (int y = nieghborhoodMinIdx3D.y; y < nieghborhoodMaxIdx3D.y; y += 1) { + for (int z = nieghborhoodMinIdx3D.z; z < nieghborhoodMaxIdx3D.z; z += 1) { + glm::ivec3 neighborGridIdx3D = glm::ivec3(x, y, z); + // Check if neighborGridIdx3D is within bounds + if (neighborGridIdx3D.x < 0 || neighborGridIdx3D.x >= gridResolution || + neighborGridIdx3D.y < 0 || neighborGridIdx3D.y >= gridResolution || + neighborGridIdx3D.z < 0 || neighborGridIdx3D.z >= gridResolution) { + continue; 
+ } + + // If using Grid-Looping optimization, Check if any part of the cell is within the neighborhood distance + if (GRID_LOOPING_OPTIMIZATION) { + glm::vec3 gridCellMin = gridMin + glm::vec3(neighborGridIdx3D) * cellWidth; + glm::vec3 gridCellMax = gridCellMin + glm::vec3(cellWidth); + // Compute closest point in grid cell to boid position + glm::vec3 closest = glm::clamp(pos[index], gridCellMin, gridCellMax); + // See if distance to closest point is within neighborhood distance + float dist = glm::distance(closest, pos[index]); + if (dist > neighborhoodDist) { + continue; + } + } + + // Check its start and end indices and that it has at least one boid + int neighborGridIdx1D = gridIndex3Dto1D(neighborGridIdx3D.x, neighborGridIdx3D.y, neighborGridIdx3D.z, gridResolution); + int startIdx = gridCellStartIndices[neighborGridIdx1D]; + int endIdx = gridCellEndIndices[neighborGridIdx1D]; + if (startIdx == -1 || endIdx == -1) { + continue; + } + + // for each boid in the cell compute velocity change + for (int i = startIdx; i < endIdx; i++) { + int b = particleArrayIndices[i]; + if (b == index) { + continue; + } + float dist = glm::distance(pos[index], pos[b]); + if (b != index) { + if (dist < rule1Distance) { + perceivedCenter += pos[b]; + numRule1Neighbors++; + } + if (dist < rule2Distance) { + c -= pos[b] - pos[index]; + } + if (dist < rule3Distance) { + perceivedVelocity += vel1[b]; + numRule3Neighbors++; + } + } + } + } + } + } + // Finish the velocity calculation + if (numRule1Neighbors > 0) { + perceivedCenter /= numRule1Neighbors; + } + if (numRule3Neighbors > 0) { + perceivedVelocity /= numRule3Neighbors; + } + glm::vec3 v1 = (perceivedCenter - pos[index]) * rule1Scale; + glm::vec3 v2 = c * rule2Scale; + glm::vec3 v3 = perceivedVelocity * rule3Scale; + + glm::vec3 finalVel = vel1[index] + v1 + v2 + v3; + + // clamp + float speed = glm::length(finalVel); + if (speed > maxSpeed) { + finalVel = glm::normalize(finalVel) * maxSpeed; + } + + vel2[index] = finalVel; 
+} + +__global__ void kernReshufflePosVel(int N, int * particleArrayIndices, + glm::vec3 *pos, glm::vec3 *pos2, glm::vec3 *vel1, glm::vec3 *vel2) { + // Uses particleArrayIndices to reshuffle pos and vel1 into pos2 and vel2 so + // that in a single cell they are contiguous in memory + int index2 = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index2 >= N) { + return; + } + int index1 = particleArrayIndices[index2]; + pos2[index2] = pos[index1]; + vel2[index2] = vel1[index1]; +} + +__global__ void kernUnshufflePosVel(int N, int* particleArrayIndices, + glm::vec3* pos, glm::vec3* pos2, glm::vec3* vel1, glm::vec3* vel2) { + // Uses particleArrayIndices to scatter the sorted pos2 and vel1 back into pos and vel2 (the inverse + // of kernReshufflePosVel) so that they are back in their original order (not contiguous in memory) + int index2 = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index2 >= N) { + return; + } + int index1 = particleArrayIndices[index2]; + pos[index1] = pos2[index2]; + vel2[index1] = vel1[index2]; } __global__ void kernUpdateVelNeighborSearchCoherent( @@ -350,7 +574,107 @@ __global__ void kernUpdateVelNeighborSearchCoherent( // checked in to maximize the memory benefits of reordering the boids data. // - Access each boid in the cell and compute velocity change from // the boids rules, if this boid is within the neighborhood distance. 
- // - Clamp the speed change before putting the new speed in vel2 + // - Clamp the speed change before putting the new speed in vel2 + + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index >= N) { + return; + } + + // Set up velocity calculation variables + glm::vec3 perceivedCenter = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule1Neighbors = 0; + glm::vec3 c = glm::vec3(0.0f, 0.0f, 0.0f); + glm::vec3 perceivedVelocity = glm::vec3(0.0f, 0.0f, 0.0f); + int numRule3Neighbors = 0; + + // Calculate bounding cube of grid cells + float neighborhoodDist = imax(imax(rule1Distance, rule2Distance), rule3Distance); + glm::vec3 neighborhoodMinPos = pos[index] - glm::vec3(neighborhoodDist); + glm::vec3 neighborhoodMaxPos = pos[index] + glm::vec3(neighborhoodDist); + glm::ivec3 neighborhoodMinIdx3D = glm::ivec3(glm::floor((neighborhoodMinPos - gridMin) * inverseCellWidth)); + glm::ivec3 neighborhoodMaxIdx3D = glm::ivec3(glm::ceil((neighborhoodMaxPos - gridMin) * inverseCellWidth)); + + // Iterate through all potential neighbor cells, iterate in z, y, x order because when calculating the 1D index x changes + // index by 1, y changes index by gridResolution, and z changes index by gridResolution^2 (so having x in innermost for loop + // maximizes memory coherence) + for (int z = neighborhoodMinIdx3D.z; z < neighborhoodMaxIdx3D.z; z++) { + for (int y = neighborhoodMinIdx3D.y; y < neighborhoodMaxIdx3D.y; y++) { + for (int x = neighborhoodMinIdx3D.x; x < neighborhoodMaxIdx3D.x; x++) { + glm::ivec3 neighborGridIdx3D = glm::ivec3(x, y, z); + // Check if neighborGridIdx3D is within bounds + if (neighborGridIdx3D.x < 0 || neighborGridIdx3D.x >= gridResolution || + neighborGridIdx3D.y < 0 || neighborGridIdx3D.y >= gridResolution || + neighborGridIdx3D.z < 0 || neighborGridIdx3D.z >= gridResolution) { + continue; + } + + // If using Grid-Looping optimization, check if any part of the cell is within the neighborhood distance + if (GRID_LOOPING_OPTIMIZATION) { + glm::vec3 gridCellMin = 
gridMin + glm::vec3(neighborGridIdx3D) * cellWidth; + glm::vec3 gridCellMax = gridCellMin + glm::vec3(cellWidth); + // Compute closest point in grid cell to boid position + glm::vec3 closest = glm::clamp(pos[index], gridCellMin, gridCellMax); + // See if distance to closest point is within neighborhood distance + float dist = glm::distance(closest, pos[index]); + if (dist > neighborhoodDist) { + continue; + } + } + + // Check its start and end indices and that it has at least one boid + int neighborGridIdx1D = gridIndex3Dto1D(neighborGridIdx3D.x, neighborGridIdx3D.y, neighborGridIdx3D.z, gridResolution); + int startIdx = gridCellStartIndices[neighborGridIdx1D]; + int endIdx = gridCellEndIndices[neighborGridIdx1D]; + if (startIdx == -1 || endIdx == -1) { + continue; + } + + // for each boid in the cell compute velocity change, startIdx and endIdx now refer directly to pos and vel1 + for (int i = startIdx; i < endIdx; i++) { + int b = i; + if (b == index) { + continue; + } + float dist = glm::distance(pos[index], pos[b]); + if (b != index) { + if (dist < rule1Distance) { + perceivedCenter += pos[b]; + numRule1Neighbors++; + } + if (dist < rule2Distance) { + c -= pos[b] - pos[index]; + } + if (dist < rule3Distance) { + perceivedVelocity += vel1[b]; + numRule3Neighbors++; + } + } + } + } + } + } + // Finish the velocity calculation + if (numRule1Neighbors > 0) { + perceivedCenter /= numRule1Neighbors; + } + if (numRule3Neighbors > 0) { + perceivedVelocity /= numRule3Neighbors; + } + glm::vec3 v1 = (perceivedCenter - pos[index]) * rule1Scale; + glm::vec3 v2 = c * rule2Scale; + glm::vec3 v3 = perceivedVelocity * rule3Scale; + + glm::vec3 finalVel = vel1[index] + v1 + v2 + v3; + + // clamp + float speed = glm::length(finalVel); + if (speed > maxSpeed) { + finalVel = glm::normalize(finalVel) * maxSpeed; + } + + vel2[index] = finalVel; + } /** @@ -359,6 +683,16 @@ __global__ void kernUpdateVelNeighborSearchCoherent( void Boids::stepSimulationNaive(float dt) { // TODO-1.2 - 
use the kernels you wrote to step the simulation forward in time. // TODO-1.2 ping-pong the velocity buffers + dim3 blocksPerGrid((numObjects + blockSize - 1) / blockSize); + kernUpdateVelocityBruteForce<<<blocksPerGrid, blockSize>>>(numObjects, dev_pos, dev_vel1, dev_vel2); + checkCUDAErrorWithLine("kernUpdateVelocityBruteForce failed!"); + kernUpdatePos<<<blocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel2); + checkCUDAErrorWithLine("kernUpdatePos failed!"); + + // ping-pong velocities + glm::vec3* dev_oldVel1 = dev_vel1; + dev_vel1 = dev_vel2; + dev_vel2 = dev_oldVel1; } void Boids::stepSimulationScatteredGrid(float dt) { @@ -374,6 +708,42 @@ void Boids::stepSimulationScatteredGrid(float dt) { // - Perform velocity updates using neighbor search // - Update positions // - Ping-pong buffers as needed + dim3 boidBlocksPerGrid((numObjects + blockSize - 1) / blockSize); + kernComputeIndices<<<boidBlocksPerGrid, blockSize>>>( + numObjects, gridSideCount, gridMinimum, gridInverseCellWidth, + dev_pos, dev_particleArrayIndices, dev_particleGridIndices); + checkCUDAErrorWithLine("kernComputeIndices failed!"); + + dev_thrust_particleArrayIndices = thrust::device_ptr<int>(dev_particleArrayIndices); + dev_thrust_particleGridIndices = thrust::device_ptr<int>(dev_particleGridIndices); + thrust::sort_by_key(dev_thrust_particleGridIndices, dev_thrust_particleGridIndices + numObjects, dev_thrust_particleArrayIndices); + + dim3 cellBlocksPerGrid((gridCellCount + blockSize - 1) / blockSize); + kernResetIntBuffer<<<cellBlocksPerGrid, blockSize>>>(gridCellCount, dev_gridCellStartIndices, -1); + checkCUDAErrorWithLine("kernResetIntBuffer failed!"); + kernResetIntBuffer<<<cellBlocksPerGrid, blockSize>>>(gridCellCount, dev_gridCellEndIndices, -1); + checkCUDAErrorWithLine("kernResetIntBuffer failed!"); + + kernIdentifyCellStartEnd<<<boidBlocksPerGrid, blockSize>>>( + numObjects, dev_particleGridIndices, + dev_gridCellStartIndices, dev_gridCellEndIndices); + checkCUDAErrorWithLine("kernIdentifyCellStartEnd failed!"); + + kernUpdateVelNeighborSearchScattered<<<boidBlocksPerGrid, blockSize>>>( + numObjects, gridSideCount, gridMinimum, + gridInverseCellWidth, gridCellWidth, + 
dev_gridCellStartIndices, dev_gridCellEndIndices, + dev_particleArrayIndices, + dev_pos, dev_vel1, dev_vel2); + checkCUDAErrorWithLine("kernUpdateVelNeighborSearchScattered failed!"); + + kernUpdatePos<<<boidBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos, dev_vel2); + checkCUDAErrorWithLine("kernUpdatePos failed!"); + + // ping-pong velocities + glm::vec3* dev_oldVel1 = dev_vel1; + dev_vel1 = dev_vel2; + dev_vel2 = dev_oldVel1; } void Boids::stepSimulationCoherentGrid(float dt) { @@ -392,6 +762,60 @@ void Boids::stepSimulationCoherentGrid(float dt) { // - Perform velocity updates using neighbor search // - Update positions // - Ping-pong buffers as needed. THIS MAY BE DIFFERENT FROM BEFORE. + + // Buffer management procedure: + // Start with the current velocities and positions in vel 1 and pos 1 + // Reshuffle the velocities and positions into vel 2 and pos 2 + // Compute the updated velocities into vel 1 and the updated positions in place in pos 2 (both still in the reshuffled order) + // Unshuffle the velocities and positions back into vel 2 and pos 1 + // Ping-pong the velocity buffers so that updated velocities are in vel 1 (updated positions are already in pos 1) + + dim3 boidBlocksPerGrid((numObjects + blockSize - 1) / blockSize); + kernComputeIndices<<<boidBlocksPerGrid, blockSize>>>( + numObjects, gridSideCount, gridMinimum, gridInverseCellWidth, + dev_pos, dev_particleArrayIndices, dev_particleGridIndices); + checkCUDAErrorWithLine("kernComputeIndices failed!"); + + dev_thrust_particleArrayIndices = thrust::device_ptr<int>(dev_particleArrayIndices); + dev_thrust_particleGridIndices = thrust::device_ptr<int>(dev_particleGridIndices); + thrust::sort_by_key(dev_thrust_particleGridIndices, dev_thrust_particleGridIndices + numObjects, dev_thrust_particleArrayIndices); + + dim3 cellBlocksPerGrid((gridCellCount + blockSize - 1) / blockSize); + kernResetIntBuffer<<<cellBlocksPerGrid, blockSize>>>(gridCellCount, dev_gridCellStartIndices, -1); + checkCUDAErrorWithLine("kernResetIntBuffer failed!"); + kernResetIntBuffer<<<cellBlocksPerGrid, blockSize>>>(gridCellCount, dev_gridCellEndIndices, -1); + 
checkCUDAErrorWithLine("kernResetIntBuffer failed!"); + + kernIdentifyCellStartEnd<<<boidBlocksPerGrid, blockSize>>>( + numObjects, dev_particleGridIndices, + dev_gridCellStartIndices, dev_gridCellEndIndices); + checkCUDAErrorWithLine("kernIdentifyCellStartEnd failed!"); + + // Key difference from the scattered grid: reorder pos and vel so boids in the same cell are contiguous + kernReshufflePosVel<<<boidBlocksPerGrid, blockSize>>>( + numObjects, dev_particleArrayIndices, + dev_pos, dev_pos2, dev_vel1, dev_vel2); + checkCUDAErrorWithLine("kernReshufflePosVel failed!"); + + kernUpdateVelNeighborSearchCoherent<<<boidBlocksPerGrid, blockSize>>>( + numObjects, gridSideCount, gridMinimum, + gridInverseCellWidth, gridCellWidth, + dev_gridCellStartIndices, dev_gridCellEndIndices, + dev_pos2, dev_vel2, dev_vel1); + checkCUDAErrorWithLine("kernUpdateVelNeighborSearchCoherent failed!"); + + kernUpdatePos<<<boidBlocksPerGrid, blockSize>>>(numObjects, dt, dev_pos2, dev_vel1); + checkCUDAErrorWithLine("kernUpdatePos failed!"); + + kernUnshufflePosVel<<<boidBlocksPerGrid, blockSize>>>( + numObjects, dev_particleArrayIndices, + dev_pos, dev_pos2, dev_vel1, dev_vel2); + checkCUDAErrorWithLine("kernUnshufflePosVel failed!"); + + // ping-pong velocities (updated positions were already written back to dev_pos by the unshuffle) + glm::vec3* dev_oldVel1 = dev_vel1; + dev_vel1 = dev_vel2; + dev_vel2 = dev_oldVel1; } void Boids::endSimulation() { @@ -400,6 +824,12 @@ void Boids::endSimulation() { cudaFree(dev_pos); // TODO-2.1 TODO-2.3 - Free any additional buffers here. 
+ cudaFree(dev_particleArrayIndices); + cudaFree(dev_particleGridIndices); + cudaFree(dev_gridCellStartIndices); + cudaFree(dev_gridCellEndIndices); + + cudaFree(dev_pos2); } void Boids::unitTest() { @@ -465,3 +895,17 @@ void Boids::unitTest() { checkCUDAErrorWithLine("cudaFree failed!"); return; } + +// Getter functions for data collection + +int Boids::getBlockSize() { + return blockSize; +} + +int Boids::getGridLoopingOptimization() { + return GRID_LOOPING_OPTIMIZATION; +} + +float Boids::getGridWidthScale() { + return GRID_WIDTH_SCALE; +} \ No newline at end of file diff --git a/src/kernel.h b/src/kernel.h index a38b64d..4f213ac 100644 --- a/src/kernel.h +++ b/src/kernel.h @@ -6,6 +6,9 @@ namespace Boids { void stepSimulationScatteredGrid(float dt); void stepSimulationCoherentGrid(float dt); void copyBoidsToVBO(float *vbodptr_positions, float *vbodptr_velocities); + int getBlockSize(); + int getGridLoopingOptimization(); + float getGridWidthScale(); void endSimulation(); void unitTest(); diff --git a/src/main.cpp b/src/main.cpp index 9c917c0..5b04af5 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -17,19 +17,58 @@ #include #include +#include <fstream> + // ================ // Configuration // ================ // LOOK-2.1 LOOK-2.3 - toggles for UNIFORM_GRID and COHERENT_GRID -#define VISUALIZE 1 +#define VISUALIZE 0 #define UNIFORM_GRID 0 -#define COHERENT_GRID 0 +#define COHERENT_GRID 0 + +// Set to 1 to write fps collection data to a log file +// fps is sampled 12 times; the first two samples are discarded and the remaining 10 are averaged +#define DATA_COLLECTION 0 // LOOK-1.2 - change this to adjust particle count in the simulation const int N_FOR_VIS = 5000; const float DT = 0.2f; + +// Helper function to write a data collection log +void writeDataCollectionRow(const double data[], int size) { + const char* dataPath = "data.csv"; + std::ofstream dataFile(dataPath, std::ios::app); + if (!dataFile) { + return; + } + const int block_size = Boids::getBlockSize(); + const int 
grid_looping_optimization = Boids::getGridLoopingOptimization(); + const float grid_width_scale = Boids::getGridWidthScale(); + + dataFile << "Visualize: " << (VISUALIZE ? 1 : 0) << ", " + << "Uniform Grid: " << (UNIFORM_GRID ? 1 : 0) << ", " + << "Coherent_Grid: " << (COHERENT_GRID ? 1 : 0) << ", " + << "Boid Count:" << N_FOR_VIS << ", " + << "Block Size: " << block_size << ", " + << "Grid Looping Optimization: " << grid_looping_optimization << ", " + << "Grid Width Scale: " << grid_width_scale; + + double sum = 0.0; + for (int i = 2; i < size; i++) { + sum += data[i]; + } + dataFile << ", Average FPS:" << (sum /(size - 2)) << ", Data: "; + for (int i = 2; i < size; i++) { + dataFile << data[i] << ", "; + } + dataFile << std::endl; +} + + + /** * C main function. */ @@ -226,6 +265,12 @@ void initShaders(GLuint * program) { double timebase = 0; int frame = 0; + #if DATA_COLLECTION + double fpsData[12]; // Take 12 data points, ignore the first two + int sampleCount = 0; + bool wroteData = false; + #endif + Boids::unitTest(); // LOOK-1.2 We run some basic example code to make sure // your CUDA development setup is ready to go. @@ -239,8 +284,22 @@ void initShaders(GLuint * program) { fps = frame / (time - timebase); timebase = time; frame = 0; + #if DATA_COLLECTION + if (!wroteData) { + if (sampleCount < 12) { + fpsData[sampleCount] = fps; + sampleCount++; + } + else { + writeDataCollectionRow(fpsData, sampleCount); + wroteData = true; + std::cout << "Collected Data" << std::endl; + } + } + #endif } + runCUDA(); std::ostringstream ss;