This repository is a collection of accelerated platform best practices, reference architectures, example use cases, reference implementations, and various other assets on Google Cloud.
An accelerated platform uses specialized hardware components, or accelerators, such as GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), to significantly speed up computationally intensive tasks such as data analysis, machine learning, artificial intelligence, and graphics rendering. The platform offloads demanding workloads from traditional CPUs to dedicated accelerators, which perform parallel calculations much faster. This provides the performance that high-performance computing workloads require.
Note: The Cloud Workstations (CWS) Platform is currently in beta and is still under active development.
The Cloud Workstations (CWS) Platform is a core, best-practices implementation of fully managed workstation environments built to meet the needs of security-sensitive enterprises. It strengthens the security of workstation environments while speeding up onboarding and improving productivity.
The GKE Base Platform is an implementation of a foundational platform built on GKE. It incorporates best practices and provides a core environment optimized for running accelerated workloads, offering a streamlined way to use GKE as the primary runtime.
- Inference reference architecture
- Inference reference implementation
- Online inference with GPUs
- Online inference with TPUs
- Benchmarking online inference performance on Google Kubernetes Engine (GKE)
- Batch inference with GPUs
- Offline batch inference with GPUs
- Intelligent inference scheduling quickstart using llm-d
- Intelligent inference scheduling using llm-d with GPU on Google Kubernetes Engine (GKE)
- Inference reference implementation
- LLM Inference Optimization: Achieving faster Pod Startup with Google Cloud Storage
- Optimizing GKE Workloads with Custom Compute Classes
The Playground AI/ML Platform on GKE is a quick-start implementation of the platform. Use it to familiarize yourself with the GKE architecture and to understand the concepts covered in the use cases.
- Scalable and Distributed LLM Inference on GKE with vLLM
- Retrieval Augmented Generation (RAG) pipeline
For more information about contributing to this repository, see CONTRIBUTING.