Support for NVSwitch in Shared NVSwitch Virtualization Model #24

Description

@LandonTClipp

I'm running a Kubernetes cluster with Kata Containers on HGX H100 servers. I want to run fabricmanager in the Shared NVSwitch Virtualization Model, in which a dedicated service VM runs fabricmanager to set up the NVSwitch partitions. One way to do this would be a Kata DaemonSet that runs only fabricmanager and has the NVSwitch devices passed through to it. However, this requires k8s to know about the NVSwitch devices. Are there any plans to support this, and would you be open to contributions that add this support?
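Before k8s can hand NVSwitch devices to a Kata pod, something on the node has to discover them. The Python sketch below is a hypothetical illustration of that discovery step only. `find_nvswitch_devices` and `PciDevice` are made-up names that are not part of any existing plugin. The sketch assumes that NVSwitches enumerate under `/sys/bus/pci/devices` with NVIDIA's vendor ID `0x10de` and PCI class `0x0680xx` ("bridge, other"). It also assumes that, as with other VFIO passthrough, each switch must be bound to `vfio-pci` before Kata can attach it to the service VM.

```python
import os
from typing import List, NamedTuple, Optional

NVIDIA_VENDOR_ID = "0x10de"
# Assumption: NVSwitch chips enumerate as PCI class 0x0680xx ("Bridge: other").
NVSWITCH_CLASS_PREFIX = "0x0680"


class PciDevice(NamedTuple):
    bdf: str               # PCI address, e.g. "0000:05:00.0"
    device_id: str         # PCI device ID as read from sysfs, e.g. "0x22a3"
    driver: Optional[str]  # currently bound kernel driver, None if unbound


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def find_nvswitch_devices(sysfs_root: str = "/sys/bus/pci/devices") -> List[PciDevice]:
    """Hypothetical helper: list NVIDIA PCI functions that look like NVSwitches."""
    if not os.path.isdir(sysfs_root):
        return []  # not Linux, or sysfs not mounted
    found = []
    for bdf in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, bdf)
        try:
            vendor = _read(os.path.join(dev_dir, "vendor"))
            pci_class = _read(os.path.join(dev_dir, "class"))
            device_id = _read(os.path.join(dev_dir, "device"))
        except OSError:
            continue  # skip entries missing the standard sysfs attributes
        if vendor.lower() != NVIDIA_VENDOR_ID:
            continue
        if not pci_class.lower().startswith(NVSWITCH_CLASS_PREFIX):
            continue  # e.g. GPUs are class 0x0302xx, not bridges
        # The "driver" entry is a symlink to the bound driver's sysfs directory.
        driver_link = os.path.join(dev_dir, "driver")
        driver = os.path.basename(os.readlink(driver_link)) if os.path.islink(driver_link) else None
        found.append(PciDevice(bdf, device_id, driver))
    return found


if __name__ == "__main__":
    for dev in find_nvswitch_devices():
        # Assumption: passthrough to a Kata VM needs the device bound to vfio-pci.
        if dev.driver == "vfio-pci":
            status = "ready for passthrough"
        else:
            status = f"bound to {dev.driver}" if dev.driver else "unbound"
        print(f"{dev.bdf} device={dev.device_id} {status}")
```

A real device plugin would go further and advertise these addresses as an extended resource so that a DaemonSet pod could request them. The sketch deliberately stops at discovery because that is the piece this issue says k8s currently lacks.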

This request ties in with the CoCo (Confidential Containers) work already released in GPU Operator (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers-deploy.html). It also relates to another recently opened GPU Operator PR that allows further configuration of fabricmanager.

CC @zvonkok @fidencio
