$ make
mpicc -g -O2 flux.c -c
mpicc -g -O2 main.c -c
mpicc -g -O2 lim.c -c
mpicc -g -O2 smooth.c -c
mpicc flux.o main.o lim.o smooth.o -lm -o taubench
To change the compiler and compiler flags, override MPICC and CFLAGS on the make command line:
$ make 'MPICC = mpicc.openmpi' 'CFLAGS = -O3 -mtune=native -march=native -flto'
mpicc.openmpi -O3 -mtune=native -march=native -flto flux.c -c
mpicc.openmpi -O3 -mtune=native -march=native -flto main.c -c
mpicc.openmpi -O3 -mtune=native -march=native -flto lim.c -c
mpicc.openmpi -O3 -mtune=native -march=native -flto smooth.c -c
mpicc.openmpi flux.o main.o lim.o smooth.o -lm -o taubench
Tau supports both vector and cache-colored grids; you might also need to adapt the compiler directives in nodep.inc and expand.inc.
The benchmark itself takes two flags: -n, the grid size (number of points) per process, and -s, the number of pseudo steps. In the example below, the -n that comes before ./taubench is mpiexec's own process count:
$ mpiexec -n 2 ./taubench -n 100000 -s 10
This is TauBench.
Evaluating kernels - please be patient.
..........
- kernel_1_0 : 0.315 secs - 4470.269 mflops
- kernel_1_1 : 0.131 secs - 1666.135 mflops
- kernel_2_1 : 0.210 secs - 4733.000 mflops
- kernel_2_2 : 0.186 secs - 6560.788 mflops
- kernel_2_3 : 0.089 secs - 2759.613 mflops
- kernel_2_4 : 0.069 secs - 7992.762 mflops
- kernel_3_0 : 0.344 secs - 7952.712 mflops
total : 1.311 secs - 9658.964 mflops
points : 100000
steps : 10
procs : 2
comp : 1.302 secs
comm : 0.009 secs
comm ratio : 0.007
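Running on one node, with Open MPI restricted to its shared-memory (sm) and self (loopback) transports. No process count is passed to mpiexec, so it starts its default number of processes, four in this run: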
$ mpiexec --mca btl sm,self ./taubench -n 100000 -s 10
This is TauBench.
Evaluating kernels - please be patient.
..........
- kernel_1_0 : 0.312 secs - 4509.797 mflops
- kernel_1_1 : 0.139 secs - 1564.283 mflops
- kernel_2_1 : 0.164 secs - 6071.223 mflops
- kernel_2_2 : 0.156 secs - 7844.050 mflops
- kernel_2_3 : 0.072 secs - 3411.729 mflops
- kernel_2_4 : 0.080 secs - 6946.915 mflops
- kernel_3_0 : 0.157 secs - 17392.511 mflops
total : 1.106 secs - 22890.682 mflops
points : 100000
steps : 10
procs : 4
comp : 1.095 secs
comm : 0.011 secs
comm ratio : 0.010