Commit ab4dddb
feat: device-api-server with NVML provider, client SDK, and gRPC services
Implement a Kubernetes-style device API server with the following components:
Core server (cmd/device-api-server):
- gRPC-based controlplane API server using apiserver-runtime patterns
- GPU service with full CRUD + Watch + UpdateStatus via storage.Interface
- Server lifecycle management with graceful shutdown
- Health and metrics endpoints with gRPC reflection
Storage (pkg/storage):
- In-memory storage.Interface implementation with watch support
- Configurable watch channel buffer sizes and event drop metrics
- Factory pattern for storage backend selection
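The watch buffering and drop accounting described above could look roughly like the following. This is a minimal sketch and not the code in pkg/storage: the `Event` and `Broadcaster` types and the `Publish`, `Watch`, and `Dropped` names are hypothetical, and the drop count is a plain atomic counter standing in for whatever metric the server actually exports.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Event is a hypothetical watch event; the real event type is not shown in this commit message.
type Event struct {
	Type string // for example "ADDED", "MODIFIED", "DELETED"
	Key  string
}

// Broadcaster fans events out to watchers over buffered channels.
type Broadcaster struct {
	mu       sync.Mutex
	bufSize  int // per-watcher channel capacity (the configurable part)
	watchers map[int]chan Event
	nextID   int
	dropped  atomic.Int64 // stand-in for an event drop metric
}

func NewBroadcaster(bufSize int) *Broadcaster {
	return &Broadcaster{bufSize: bufSize, watchers: make(map[int]chan Event)}
}

// Watch registers a watcher and returns its channel plus a cancel function
// that unregisters the watcher and closes the channel.
func (b *Broadcaster) Watch() (<-chan Event, func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.nextID
	b.nextID++
	ch := make(chan Event, b.bufSize)
	b.watchers[id] = ch
	return ch, func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		if c, ok := b.watchers[id]; ok {
			delete(b.watchers, id)
			close(c)
		}
	}
}

// Publish never blocks: if a watcher's buffer is full, the event is
// dropped for that watcher and the drop counter is incremented.
func (b *Broadcaster) Publish(ev Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.watchers {
		select {
		case ch <- ev:
		default:
			b.dropped.Add(1)
		}
	}
}

// Dropped reports how many events were dropped across all watchers.
func (b *Broadcaster) Dropped() int64 { return b.dropped.Load() }

func main() {
	b := NewBroadcaster(2)
	ch, cancel := b.Watch()
	defer cancel()
	// Three events into a buffer of two, with no reader draining it.
	for i := 0; i < 3; i++ {
		b.Publish(Event{Type: "ADDED", Key: fmt.Sprintf("gpu-%d", i)})
	}
	fmt.Println("buffered:", len(ch), "dropped:", b.Dropped()) // buffered: 2 dropped: 1
}
```

Dropping on a full buffer keeps one slow watcher from stalling writers, at the cost of that watcher missing events. That trade-off is presumably why the buffer size is configurable and drops are counted.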
NVML provider (cmd/nvml-provider):
- Sidecar that enumerates GPUs via NVML and registers them with the API
- XID error event monitoring with health condition updates
- Reconciliation loop with configurable intervals
- Environment-based driver root configuration (NVIDIA_DRIVER_ROOT)
Client SDK (pkg/client-go):
- Typed gRPC client with Get/List/Watch/Create/Update/UpdateStatus/Delete
- Fake client for testing with k8s.io/client-go/testing integration
- Informers and listers following Kubernetes client-go conventions
- Clientset pattern for versioned API access
Code generator (code-generator):
- Fork of k8s.io/code-generator/cmd/client-gen for gRPC backends
- Generates typed clients, fake clients, and expansion interfaces
- Full UpdateStatus template with proper gRPC implementation
- Integrated into hack/update-codegen.sh pipeline
Proto API (api/proto/device/v1alpha1):
- GPU resource with spec (UUID) and status (conditions, recommendedAction)
- Standard CRUD + Watch + UpdateStatus RPCs
- K8s-style request/response patterns with options
Security hardening:
- gRPC message size and stream limits
- Server error detail scrubbing for client responses
- Unix socket path validation and restrictive permissions
- Localhost-only enforcement for insecure credentials
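Two of the checks above, localhost-only enforcement for insecure credentials and Unix socket path validation, can be illustrated with the standard library alone. The function names and the exact rules (absolute and already-clean socket paths, loopback IP or the literal `localhost`) are assumptions made for this sketch; the commit's actual validation may be stricter or differ in detail.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"path/filepath"
	"strings"
)

// checkInsecureTarget allows plaintext connections only to Unix sockets
// and loopback addresses. Hypothetical helper, not the commit's API.
func checkInsecureTarget(target string) error {
	if strings.HasPrefix(target, "unix://") {
		return checkSocketPath(strings.TrimPrefix(target, "unix://"))
	}
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		return fmt.Errorf("invalid target %q: %w", target, err)
	}
	if host == "localhost" {
		return nil
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
		return nil
	}
	return fmt.Errorf("insecure credentials refused for non-local host %q", host)
}

// checkSocketPath requires an absolute path that is already in clean
// form, which rejects ".." segments and duplicate separators.
func checkSocketPath(p string) error {
	if !filepath.IsAbs(p) {
		return errors.New("socket path must be absolute")
	}
	if filepath.Clean(p) != p {
		return errors.New("socket path must be clean")
	}
	return nil
}

func main() {
	targets := []string{
		"127.0.0.1:50051",
		"[::1]:50051",
		"10.0.0.5:50051",
		"unix:///var/run/device-api.sock",
		"unix:///var/run/../etc/device-api.sock",
	}
	for _, t := range targets {
		fmt.Printf("%-42s -> %v\n", t, checkInsecureTarget(t))
	}
}
```

Restrictive socket permissions would be applied separately once the listener exists, for example with `os.Chmod` on the socket file. The mode to use is not stated in the commit message.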
Deployment (deployments/helm):
- Helm chart with configurable storage, gRPC, health, and metrics
- Static manifests with versioned image references
- Dockerfile with pinned base images
Testing:
- Unit tests for storage, services, providers, and utilities
- Integration tests for client-go with full gRPC stack
- Shared testutil with bufconn gRPC test helpers
- Fake client examples for consumer testing patterns
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
1 parent 6bf52de commit ab4dddb
174 files changed
Lines changed: 28098 additions & 3202 deletions
File tree
- .github
- ISSUE_TEMPLATE
- headers
- workflows
- api
- device/v1alpha1
- proto/device/v1alpha1
- cmd
- device-api-server
- nvml-provider
- code-generator
- cmd/client-gen
- args
- generators
- fake
- scheme
- util
- types
- demos
- deployments
- container
- helm
- device-api-server
- templates
- static
- docs
- api
- design
- operations
- examples
- client
- controller
- fake-client
- fake-server
- watch
- hack
- internal/generated/device/v1alpha1
- pkg
- client-go
- client/versioned
- fake
- scheme
- typed/device/v1alpha1
- fake
- informers/externalversions
- device
- v1alpha1
- internalinterfaces
- listers/device/v1alpha1
- controlplane/apiserver
- api
- metrics
- options
- grpc
- registry
- grpc/client
- providers/nvml
- services/device/v1alpha1
- storage
- memory
- storagebackend
- options
- testutil
- util
- net
- testutils
- validation
- verflag
- version
- test/integration/client-go/device/v1alpha1