Add Go SDK for GPUs #688

Closed
pteranodan wants to merge 6 commits into NVIDIA:device-api from pteranodan:add-go-sdk-v2

@pteranodan
Contributor

Summary

This PR introduces a Go SDK for the NVIDIA Device API, enabling Kubernetes-native patterns (Informers, Listers, controller-runtime) over a node-local gRPC connection via generated clients.

It also slightly refactors the internal protobuf representations in the api module to ensure seamless compatibility with the Kubernetes ecosystem without changing the core of the API.

Key Changes

1. API Module Refactor (api/)

Refactored the API layer to support stricter interface compliance and standard client machinery:

  • Object Model Alignment:
    • Formalizes metadata by moving fields into dedicated ObjectMeta and ListMeta message types to satisfy metav1 interface requirements without needing custom conversions.
    • Adds resource_version fields to enable the optimistic concurrency and Watch semantics that Kubernetes caching layers require.
  • Request Options:
    • Formalizes request options with new GetOptions and ListOptions message types to satisfy standard Kubernetes client APIs.
  • Go Type & Conversion Layer:
    • Introduces native Go types to serve as the internal object model, decoupling logic from the Protobuf wire format.
    • Code Generation: All helper generation (e.g., deepcopy methods) is handled by a centralized code-generator library via make code-gen.

2. Go Client SDK (client-go/)

Adds a generated client that provides the standard client-go experience over a node-local gRPC connection via Unix domain sockets (UDS).

  • Components: Generated Clientset (and Fakes), Listers, and Informers.
  • Patterns: Enables users to drive logic using standard Informer caches and Reconciler loops.
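
For readers less familiar with the Informer/Lister pattern, the following is a rough, standard-library-only sketch of the idea: seed a local cache from a List call, keep it current by applying Watch events, and serve reads from the cache. It does not use the generated client from this PR; the GPU, Event, and Cache types are hypothetical names chosen for the example, and a channel stands in for the gRPC watch stream.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// EventType mirrors the usual watch event kinds.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// GPU is a hypothetical resource used only for illustration.
type GPU struct {
	Name    string
	Healthy bool
}

// Event is a single change delivered by a watch stream.
type Event struct {
	Type   EventType
	Object GPU
}

// Cache is a tiny stand-in for an informer's local store plus its lister.
type Cache struct {
	mu    sync.RWMutex
	items map[string]GPU
}

func NewCache() *Cache { return &Cache{items: map[string]GPU{}} }

// Replace seeds the cache from an initial List result.
func (c *Cache) Replace(list []GPU) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items = make(map[string]GPU, len(list))
	for _, g := range list {
		c.items[g.Name] = g
	}
}

// Run applies watch events to the cache until the channel is closed.
func (c *Cache) Run(events <-chan Event) {
	for ev := range events {
		c.mu.Lock()
		switch ev.Type {
		case Added, Modified:
			c.items[ev.Object.Name] = ev.Object
		case Deleted:
			delete(c.items, ev.Object.Name)
		}
		c.mu.Unlock()
	}
}

// List returns cached objects sorted by name, without a server round trip.
func (c *Cache) List() []GPU {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]GPU, 0, len(c.items))
	for _, g := range c.items {
		out = append(out, g)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	c := NewCache()
	c.Replace([]GPU{{Name: "gpu-0", Healthy: true}, {Name: "gpu-1", Healthy: true}})

	// A buffered channel stands in for the server's watch stream.
	events := make(chan Event, 2)
	events <- Event{Type: Modified, Object: GPU{Name: "gpu-1", Healthy: false}}
	events <- Event{Type: Deleted, Object: GPU{Name: "gpu-0"}}
	close(events)
	c.Run(events)

	fmt.Println(c.List())
}
```

In a real controller the reconcile loop would read from such a cache instead of calling the server on every pass, which is the main reason to prefer Informers over repeated List calls.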

3. Code Generation (code-generator)

Introduces a dedicated directory for managing all code generation:

  • Bash Library: Provides kube_codegen.sh, a reusable library for orchestrating generation of Go bindings, gRPC interfaces, helpers, and clients (and fakes).
  • Custom Client Generator: Contains a modified client-gen which handles gRPC service mapping to the standard K8s client interfaces.

4. Documentation & Examples

  • Patterns: Contains examples demonstrating use cases and functionality of the NVIDIA Device API Go client.
  • Testing: Includes a fake-server to mock the Device API for local development/verification.
  • Guides: Adds or updates the README.md, DEVELOPMENT.md, and CONFIGURATION.md files.

Future Extensibility

  • Expanding Capability: Adding write operations (e.g., Update, Patch) to existing resources only requires updating the // +genclient tags in the API module and implementing the corresponding service methods on the server.
    • This also requires a one-time implementation of the corresponding methods in the modified client-gen machinery.
  • Onboarding New Types: Defining new resources in the api module with standard markers automatically generates the full stack: DeepCopy methods, Proto-to-Go conversions, Typed Clients (and Fakes), Informers, and Listers.
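
As a rough illustration of what a marked-up type might look like, the sketch below uses the marker syntax from upstream Kubernetes client-gen and deepcopy-gen (+genclient, +genclient:onlyVerbs, +k8s:deepcopy-gen:interfaces). Whether the modified client-gen in this PR accepts exactly these tags is an assumption, and the GPU, GPUSpec, and ObjectMeta types are hypothetical stand-ins defined locally so the example compiles on its own. The DeepCopy method is written by hand here to show the kind of helper the generator would normally emit.

```go
package main

import "fmt"

// ObjectMeta is a local stand-in so the sketch compiles without dependencies.
type ObjectMeta struct {
	Name            string
	ResourceVersion string
}

// GPUSpec is a hypothetical spec with a single illustrative field.
type GPUSpec struct {
	UUID string
}

// The markers below follow upstream client-gen conventions. Under that
// convention, enabling a write verb would mean extending the onlyVerbs list,
// for example to get,list,watch,update.

// +genclient
// +genclient:onlyVerbs=get,list,watch
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// GPU is a hypothetical resource type used only for illustration.
type GPU struct {
	ObjectMeta
	Spec GPUSpec
}

// DeepCopy is hand-written for this sketch; in a generated setup it would
// live in a zz_generated.deepcopy.go file.
func (in *GPU) DeepCopy() *GPU {
	if in == nil {
		return nil
	}
	out := new(GPU)
	*out = *in // all fields are plain values, so a shallow copy is a deep copy
	return out
}

func main() {
	orig := &GPU{ObjectMeta: ObjectMeta{Name: "gpu-0"}, Spec: GPUSpec{UUID: "GPU-example"}}
	cp := orig.DeepCopy()
	cp.Spec.UUID = "changed"
	fmt.Println(orig.Spec.UUID, cp.Spec.UUID)
}
```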

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change (Internal Protobuf refactor)
  • 📚 Documentation
  • 🔧 Refactoring
  • 🔨 Build/CI

Component(s) Affected

  • Core Services
  • Documentation/CI
  • Other: Protobuf definitions / Examples

Testing

  • Tests pass locally
  • Manual testing completed:
    • Verified all 3 examples locally with fake server.
  • No breaking changes (or documented)

Checklist

  • Self-review completed
  • Documentation updated (if needed)
  • Ready for review

Review Guidance

This PR includes a significant amount of generated code which is collapsed by default in the diff (via .gitattributes).

Files to Skip:

  • api/**/zz_generated.*.go (DeepCopy, Conversions, etc.)
  • *.pb.go (Protobuf bindings)
  • client-go/{client,informers,listers}/** (Standard K8s client machinery)

Focus Areas:

  • Core Logic: api/ (Go types), client-go/nvgrpc/ (Connection logic), and code-generator/
  • Example Implementations: client-go/examples/

@copy-pr-bot

copy-pr-bot Bot commented Jan 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ArangoGutierrez
Contributor

/ok-to-test 68292a4

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread code-generator/cmd/client-gen/generators/util/tags.go Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 98 out of 102 changed files in this pull request and generated no new comments.

Contributor

@ArangoGutierrez left a comment

LGTM

Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 98 out of 102 changed files in this pull request and generated 2 comments.

Comment thread code-generator/cmd/client-gen/generators/generator_for_type.go Outdated
Comment thread code-generator/cmd/client-gen/generators/generator_for_type.go Outdated
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
@pteranodan force-pushed the add-go-sdk-v2 branch 9 times, most recently from 866111e to 2c8bdb0 (January 14, 2026 20:09)
Comment thread .github/workflows/device-api-ci.yml
Signed-off-by: Dan Huenecke <dhuenecke@nvidia.com>
@pteranodan
Contributor Author

Closing this PR in favor of #692 to avoid fighting the Git/GitHub UI, since the squashed version didn't share enough common history with this branch.

3 participants