With #14 we introduced a direct dependency on the nvidia-gpu-operator.
Dependabot flagged the dependency for a potential vulnerability:
The goal of this issue is to discuss how to proceed.
I looked at the source code: the operator checks the nvidia gpu-operator cluster policy to determine whether the driver is enabled.
Here are some of the calls:
While checking the cluster policy tells us how the gpu-operator is configured, it does not guarantee that the driver is actually installed. An error during installation could leave the driver absent even though the policy says it is enabled.
Question: Should we verify if the driver is enabled by inspecting the current status of the node?
For instance, files like /proc/driver/nvidia/version can give us insight into the current state. As a bonus, we would also remove the direct dependency on the nvidia gpu-operator.
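To make the proposal concrete, here is a minimal sketch of such a node-level check. It is written in Python for illustration only and is not taken from the operator's code. The function name `nvidia_driver_loaded` is hypothetical, and the sketch assumes the check runs somewhere the node's /proc/driver/nvidia/version is readable, which would need to be confirmed for our deployment.

```python
from pathlib import Path

# Path mentioned above; assumed to be present only while the NVIDIA
# kernel module is loaded on the node.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def nvidia_driver_loaded(version_file: Path = NVIDIA_VERSION_FILE) -> bool:
    """Return True if the version file can be read and is not empty.

    Hypothetical helper: a missing, unreadable, or empty file is treated
    as "driver not loaded".
    """
    try:
        return bool(version_file.read_text().strip())
    except OSError:
        # Covers a missing file as well as permission problems.
        return False


if __name__ == "__main__":
    print("driver loaded:", nvidia_driver_loaded())
```

A check like this reflects what is actually present on the node rather than what the cluster policy requests, so a failed driver installation would show up as `False`.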