CUDA: scale q8->f16 cache reserve on >=112 GiB cards (fixes session OOM on large models) #472

Open

slackarea wants to merge 1 commit into antirez:main from vcnngr:fix-cuda-q8f16-reserve

Commits on Jun 28, 2026

CUDA: scale q8->f16 cache reserve on >=112 GiB cards

slackarea and claude committed