
CUDA: scale q8->f16 cache reserve on >=112 GiB cards (fixes session OOM on large models) #472

Open
slackarea wants to merge 1 commit into antirez:main from vcnngr:fix-cuda-q8f16-reserve

Commits

Commits on Jun 28, 2026