CUDA: scale q8->f16 cache reserve on >=112 GiB cards (fixes session OOM on large models)#472

Open

slackarea wants to merge 1 commit into antirez:main from vcnngr:fix-cuda-q8f16-reserve