With the --power flag I saw extremely slow decode speeds, around 0.11 tok/s.
I tried a deliberately naive fix: a warmup period, so that the average speed is measured more reliably, plus a guard against outliers. That seemed to solve the issue.
This was on an M5 Max (128 GB), running:
./ds4 -c 30000 --power 50
ds4: metal graph token pos=96 encode=6.150 ms execute=3499.725 ms read=0.010 ms total=3505.885 ms logits=1
ds4: metal graph token pos=97 encode=5.487 ms execute=1018.332 ms read=0.008 ms total=1023.827 ms logits=1
ds4: metal graph token pos=98 encode=5.848 ms execute=1023.379 ms read=0.008 ms total=1029.235 ms logits=1
ds4: prefill: 25.39 t/s, generation: 0.24 t/s
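For readers who want a concrete picture of the warmup-plus-outlier-guard idea mentioned above, here is a minimal sketch in C. It is not the code from this change: the type `speed_avg`, the functions `speed_avg_init`, `speed_avg_add` and `speed_avg_tok_per_sec`, and the parameters `warmup` and `outlier_mult` are all hypothetical names chosen for illustration. It assumes the per-token time in milliseconds is fed in one sample at a time.

```c
#include <assert.h>

/* Hypothetical helper, not taken from ds4: a running average of
 * per-token times that skips a warmup period and ignores outliers. */
typedef struct {
    int warmup_left;      /* samples still to be discarded as warmup */
    double outlier_mult;  /* reject samples above outlier_mult * mean */
    double mean_ms;       /* running mean of accepted samples, in ms */
    long count;           /* number of accepted samples */
} speed_avg;

void speed_avg_init(speed_avg *s, int warmup, double outlier_mult) {
    assert(warmup >= 0 && outlier_mult > 1.0);
    s->warmup_left = warmup;
    s->outlier_mult = outlier_mult;
    s->mean_ms = 0.0;
    s->count = 0;
}

/* Feed one per-token time. Returns 1 if the sample was accepted,
 * 0 if it was discarded (warmup or outlier). */
int speed_avg_add(speed_avg *s, double ms) {
    if (s->warmup_left > 0) {
        s->warmup_left--;
        return 0;
    }
    /* The guard only applies once there is a mean to compare against. */
    if (s->count > 0 && ms > s->outlier_mult * s->mean_ms) return 0;
    s->count++;
    s->mean_ms += (ms - s->mean_ms) / (double)s->count; /* incremental mean */
    return 1;
}

/* Average speed in tokens per second, or 0 if nothing was accepted yet. */
double speed_avg_tok_per_sec(const speed_avg *s) {
    return (s->count > 0 && s->mean_ms > 0.0) ? 1000.0 / s->mean_ms : 0.0;
}
```

Fed with the three `total=` values from the log above and a warmup of one sample, this would drop the 3505.885 ms token and average only the two that follow. One caveat of such a simple guard: if the speed drops permanently, every later sample is rejected as an outlier, so a real implementation would probably need a way to reset or decay the mean.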
#464
^ for your reference @antirez