NVIDIA AI Blueprint: Nsight Copilot

GPU development demands deep expertise across CUDA, parallel computing, and performance optimization. Developers frequently context-switch between documentation, code examples, and best practices scattered across multiple sources. Traditional code assistants lack the specialized knowledge required for high-performance GPU programming.

This blueprint deploys Nsight Copilot on DGX Spark — a self-hosted backend for the Nsight Copilot Visual Studio Code extension that delivers expert-level, contextually aware answers to complex CUDA challenges, generates optimized CUDA snippets and kernels from natural language descriptions, and grounds every response in authoritative CUDA documentation via retrieval-augmented generation. Developers can run the backend locally on DGX Spark and connect the IDE extension without sending prompts or code to an external service. Benchmarked using the ComputeEval framework for assessing CUDA-related task proficiency.

Third-Party Software Notice This project will download and install additional third-party open source software projects. Please review the license terms of these open source projects before use.

Architecture Diagram

Key Features

Expert CUDA-Aware Chat — Multi-turn conversational AI with OpenAI-compatible streaming that delivers expert-level answers to complex CUDA challenges — from architectural best practices to deep-dive conceptual explanations.
CUDA Code Generation and Autocompletion — Generate complex, optimized CUDA snippets and kernels from natural language descriptions. Real-time inline code completions powered by the nvidia/CUDA-autocomplete model provide low-latency suggestions with minimal time-to-first-token.
Interactive Code Transformation — Directly modify and optimize CUDA code in the editor — refactoring for efficiency, converting PyTorch operations into optimized CUDA kernels, and ensuring compatibility with NVIDIA technologies.
CUDA Knowledge Retrieval (RAG) — Retrieval-augmented generation powered by Bodhi Tree RAG surfaces relevant documentation, code examples, and best practices from an authoritative CUDA knowledge corpus — including CUDA Toolkit documentation, programming guides, and optimization references.
Supported IDE Clients — Visual Studio Code and compatible forks; see Use the Blueprint for the full list and connection points.

Software Components

NVIDIA Models

gpt-oss-120b NIM — LLM for chat and RAG-augmented code generation
nvidia/CUDA-autocomplete — Specialized model for real-time CUDA code completion
llama-nemotron-rerank-1b-v2 NIM — Reranking model for retrieval relevance

Models

BAAI/bge-m3 — Embedding model for CUDA knowledge corpus

Infrastructure

vLLM — High-performance model serving
LiteLLM — Unified model routing proxy
FastAPI — Async web framework with SSE streaming

Minimum System Requirements

Hardware Requirements

NVIDIA DGX Spark
At least 200 GB of free disk space for Docker images, model weights, caches, and vector database data

OS Requirements

Ubuntu 22.04+

Software Requirements

Docker with Compose v2
NVIDIA Container Toolkit

Get Started

The recommended way to get started is to deploy the blueprint with Docker Compose on a DGX Spark. For details, refer to Deploy with Docker Compose.

Use the Blueprint

After deployment, connect a client to the local backend running on the DGX Spark. The primary client is the Nsight Copilot Visual Studio Code extension; VS Code-compatible forks may also work. For the supported clients and step-by-step setup, see Connect from IDE.

The default offline Compose deployment serves prompts and code locally through containers on the DGX Spark. NGC access is used for image and model downloads; do not override local model endpoints to external services unless your deployment policy allows code or prompt data to leave the machine.

Contributing

This project is not currently open to external code contributions. We welcome bug reports, feature requests, and feedback — please file an issue.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI concerns here.

License

This NVIDIA AI Blueprint is licensed under the Apache License, Version 2.0. This project will download and install additional third-party open source software projects and containers. Review the license terms of these open source projects before use.

Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

GOVERNING TERMS: This blueprint uses the following components, which are governed by the terms listed below:

Nsight Copilot

Use of Night Copilot is governed by NVIDIA Technology Access Terms of Use and the CUDA content is governed by the License Agreement for NVIDIA Software Development Kits and CUDA Toolkit Supplement to Software License Agreement for NVIDIA Software Development Kits.

gpt-oss-120b & llama-nemotron-rerank-1b-v2 NIM Containers

Use of gpt-oss-120b & llama-nemotron-rerank-1b-v2 NIM containers is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products.

gpt-oss-120b, bge-m3, llama-nemotron-rerank-1b-v2 & cuda-autocomplete models

Use of gpt-oss-120b, bge-m3, llama-nemotron-rerank-1b-v2 & cuda-autocomplete models is governed by the NVIDIA Open Model License Agreement.

ADDITIONAL INFORMATION

gpt-oss-120b model is licensed under Apache License, Version 2.0.

llama-nemotron-rerank-1b-v2 is licensed under Llama 3.2 Community Model License Agreement. Built with Llama.

CUDA Autocomplete model is based on Qwen2.5-Coder-7B model, which is licensed under Apache License, Version 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
deploy		deploy
docs		docs
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
THIRD-PARTY.txt		THIRD-PARTY.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NVIDIA AI Blueprint: Nsight Copilot

Architecture Diagram

Key Features

Software Components

NVIDIA Models

Models

Infrastructure

Minimum System Requirements

Hardware Requirements

OS Requirements

Software Requirements

Get Started

Use the Blueprint

Contributing

Ethical Considerations

License

Terms of Use

Nsight Copilot

gpt-oss-120b & llama-nemotron-rerank-1b-v2 NIM Containers

gpt-oss-120b, bge-m3, llama-nemotron-rerank-1b-v2 & cuda-autocomplete models

ADDITIONAL INFORMATION

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NVIDIA AI Blueprint: Nsight Copilot

Architecture Diagram

Key Features

Software Components

NVIDIA Models

Models

Infrastructure

Minimum System Requirements

Hardware Requirements

OS Requirements

Software Requirements

Get Started

Use the Blueprint

Contributing

Ethical Considerations

License

Terms of Use

Nsight Copilot

gpt-oss-120b & llama-nemotron-rerank-1b-v2 NIM Containers

gpt-oss-120b, bge-m3, llama-nemotron-rerank-1b-v2 & cuda-autocomplete models

ADDITIONAL INFORMATION

About

Resources

License

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages