shen-shanshan/cs-self-learning

Computer Science Self-Learning Notes

📌 Overview

This repository archives the notes and materials from my computer science self-learning journey. I currently focus on LLM/VLM inference engines and GPU/NPU computing, so I have gathered many technical blogs for AI infra beginners and MLSys papers for researchers.

🔍 Contents:

Feel free to star this repository! 😊

📚 Learning Notes

🧱 Basic Knowledge

🤖 AI

🚀 Backend & Big Data

🛠️ Tools

🔗 Others

📚 Technical Blogs

📖 Basic Knowledge

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| The Illustrated Transformer | Transformer | @Jay Alammar | Detailed explanation of how the Transformer works | ⭐️⭐️⭐️⭐️⭐️ | |
| The Illustrated GPT-2 (Visualizing Transformer Language Models) | Transformer | @Jay Alammar | Transformer inference process | ⭐️⭐️⭐️⭐️⭐️ | |
| 图文详解 LLM inference:KV Cache | KV Cache | @季叶 | | ⭐️⭐️⭐️ | |
| Mixture of Experts Explained | MoE | @HuggingFace Blog | MoE survey | ⭐️⭐️⭐️⭐️ | |
| MoE 并行负载均衡:EPLB 的深度解析与可视化 | MoE | @kaiyuan | | ⭐️⭐️⭐️ | |
| LLM 推理并行优化的必备知识 | Parallel Strategy | @kaiyuan | | | |
| 分布式推理优化思路 | Parallel Strategy | @kaiyuan | | | |
| The Ultra-Scale Playbook: Training LLMs on GPU Clusters | Parallel Strategy | @HuggingFace Blog | | | |
| 图解大模型计算加速系列:分离式推理架构 1,从 DistServe 谈起 | PD Disaggregation | @猛猿 | Detailed explanation of PD disaggregation | ⭐️⭐️⭐️⭐️ | |
| 图解大模型计算加速系列:分离式推理架构 2,模糊分离与合并边界的 chunked-prefills | Schedule | @猛猿 | | ⭐️⭐️⭐️⭐️ | |
| LLM 推理提速:Attention 与 FFN 分离方案解析 | AF Disaggregation | @kaiyuan | Detailed explanation of AF disaggregation | ⭐️⭐️⭐️ | |
| Step-3 AF 分离推理系统 vs Deepseek EP 推理系统,谁更好? | AF Disaggregation | @不归牛顿管的熊猫 | Pros and cons of AF disaggregation versus large-scale EP | ⭐️⭐️ | |
| Step-3 推理系统:从 PD 分离到 AF 分离(AFD) | AF Disaggregation | @Yibo Zhu | Informal notes from a Step-3 author | ⭐️⭐️ | |
| GPU 内存(显存)的理解与基本使用 | Hardware | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |

📖 Dive into vLLM

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| Inside vLLM: Anatomy of a High-Throughput LLM Inference System | Overview | @vLLM Blog | Comprehensive walkthrough of vLLM | ⭐️⭐️⭐️⭐️⭐️ | |
| vLLM V1 整体流程|从请求到算子执行 | Architecture | @SSS不知-道 | vLLM inference flow | ⭐️⭐️⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 1:整体流程 | Architecture | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 2:Executor-Workers 架构 | Architecture | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 3:KV Cache 初始化 | KV Cache | @猛猿 | | ⭐️⭐️⭐️ | |
| 图解 vLLM V1 系列 4:加载模型权重 | Model | @猛猿 | | ⭐️⭐️ | |
| vLLM 模型权重加载:使用 setattr | Model | @风之魔术师 | | ⭐️⭐️ | |
| ColumnParallelLinear 和 RowParallelLinear | Model | @风之魔术师 | | ⭐️⭐️ | |
| 图解 vLLM V1 系列 5:调度器策略 | Scheduler | @猛猿 | | ⭐️⭐️⭐️⭐️ | |
| Introducing vLLM Hardware Plugin, Best Practice from Ascend NPU | Platform | @The Ascend Team on vLLM | vLLM hardware plugin mechanism | ⭐️⭐️⭐️ | |
| vLLM 算力多样性|Platform 插件与 CustomOp | Platform | @SSS不知-道 | | ⭐️⭐️⭐️⭐️ | |
| vLLM 算子开发流程:“保姆级”详细记录 | Kernel | @DefTruth | | ⭐️⭐️⭐️⭐️⭐️ | |
| Introduction to torch.compile and How It Works with vLLM | Graph | @vLLM Blog | | ⭐️⭐️ | |
| vLLM 为什么没在 Prefill 阶段支持 Cuda Graph? | Graph | @kaiyuan | | ⭐️⭐️⭐️ | |
| Piecewise CUDA Graph:面向变长 Prefill 的分段图捕获与自动算子融合 | Graph | @注意力机制不集中 | | ⭐️⭐️⭐️ | |
| vLLM torch.compile Integration | Graph | @Jiangyun Zhu | How to write custom passes | ⭐️⭐️⭐️ | |
| vLLM 显存管理详解 | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| Shared Memory IPC Caching: Accelerating Data Transfer in LLM Inference Systems | IPC | @vLLM Blog | | ⭐️⭐️⭐️ | |
| vLLM 结构化输出|Guided Decoding (V0) | Guided Decoding | @SSS不知-道 | | ⭐️⭐️⭐️ | |
| vLLM 结构化输出|Guided Decoding (V1) | Guided Decoding | @SSS不知-道 | | ⭐️⭐️⭐️ | |
| vLLM DP 特性与演进方案分析 | Parallel Strategy | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| LLM 推理数据并行负载均衡(DPLB)浅析 | Parallel Strategy | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM PD 分离方案浅析 | PD Disaggregation | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM PD 分离 KV Cache 传递机制详解与演进分析 | PD Disaggregation | @kaiyuan | | ⭐️⭐️⭐️ | |
| vLLM 多模态推理|ViT 性能优化 | Multi-Modal | @SSS不知-道 | | ⭐️⭐️⭐️⭐️ | |
| vLLM 多模态推理|卷积计算加速 | Multi-Modal | @SSS不知-道 | | ⭐️⭐️ | |
| vLLM 多模态 Cache 缓存机制分析 | Multi-Modal | @黑白 | | ⭐️⭐️ | |

📖 Dive into PyTorch

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| PyTorch 显存管理介绍与源码解析(一) | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| PyTorch 显存可视化与 Snapshot 数据分析 | Memory | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |
| A guide on good usage of non_blocking and pin_memory() in PyTorch | Data Transfer | @Vincent Moens | | ⭐️⭐️⭐️⭐️⭐️ | |

📖 CUDA Programming

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| CUDA 内核优化策略 | Performance | @Zhang | | ⭐️⭐️⭐️ | |
| 从啥也不会到 CUDA GEMM 优化 | Performance | @猛猿 | | ⭐️⭐️⭐️⭐️ | |

📖 Communication

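| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| NCCL: Collective Operations | Collective Communication | @NVIDIA Developer | Common collective communication operations | ⭐️⭐️⭐️⭐️ | |
| 一文读懂|RDMA 原理 | Network | @Linux内核库 | | ⭐️⭐️⭐️⭐️ | |
| PyTorch 中基于 CUDA IPC 的进程间 Tensor 共享简介 | IPC | @kaiyuan | | ⭐️⭐️⭐️ | |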

📖 Multi-Modality

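| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| 多模态技术梳理:ViT 系列 | ViT | @姜富春 | Survey of ViT research | ⭐️⭐️⭐️ | |
| LLaVA 系列模型结构详解 | ViT | @Zhang | | ⭐️⭐️⭐️ | |
| 文生图模型之 Stable Diffusion | Diffusion | @小小将 | | ⭐️⭐️⭐️⭐️ | |
| DiT 推理加速综述: Caching | Diffusion | @DefTruth | | ⭐️⭐️⭐️⭐️ | |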

📖 Dive into Qwen

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| 多模态技术梳理:Qwen-VL 系列 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| Qwen2-VL 源码解读:从准备一条样本到模型生成全流程图解 | VL | @姜富春 | | ⭐️⭐️⭐️⭐️⭐️ | |
| 万字长文图解 Qwen2.5-VL 实现细节 | VL | @猛猿 | | ⭐️⭐️⭐️⭐️⭐️ | |
| Qwen3-VL 解剖 | VL | @Plunck | | ⭐️⭐️⭐️ | |
| Qwen3-Omni:统一多模态大模型再升级 | Omni | @tomsheep | | | |
| Gated Attention:Qwen3-Next 背后的门控机制 | Attention | @Plunck | | | |
| Qwen3.5 解剖 | | @Plunck | | | |

📖 Dive into DeepSeek

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| DeepSeek 技术解读(1)- 彻底理解 MLA(Multi-Head Latent Attention) | Attention | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| DeepSeek 技术解读(2)- MTP(Multi-Token Prediction)的前世今生 | Decoding | @姜富春 | | ⭐️⭐️⭐️⭐️ | |
| DeepSeek 技术解读(3)- MoE 的演进之路 | MoE | @姜富春 | | ⭐️⭐️⭐️⭐️ | |

📖 Development

| Title | Category | Author | Note | Rec | Read |
| --- | --- | --- | --- | --- | --- |
| LLM Inference 高效 Debug 方法汇总 | Debug | @CarryPls | | ⭐️⭐️ | |
| 推理性能优化:GPU/NPU Profiling 阅读引导 | Profiling | @kaiyuan | | ⭐️⭐️⭐️⭐️ | |

📚 Papers

Refer to "How to Read a Paper" for a practical and efficient three-pass method for reading papers.

📚 Learning Projects

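| Project | Category | Author/Organization | About |
| --- | --- | --- | --- |
| llm-action | MLSys | @liguodongiot | This project shares the technical principles behind large models along with hands-on experience (large-model engineering and real-world application deployment). |
| awesomeMLSys | MLSys | @GPU MODE | An ML Systems Onboarding list. |
| InfraTech | MLSys | @CalvinXKY | Shares AI infra knowledge and coding exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more. |
| AI-Infra-from-Zero-to-Hero | MLSys | @HuaizhengZhang | 🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials. |
| resource-stream | CUDA | @GPU MODE | GPU programming related news and material links. |
| BasicCUDA | CUDA | @CalvinXKY | A tutorial for CUDA & PyTorch. |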

©️ Citation

@misc{cs-self-learning@2023,
  title  = {cs-self-learning},
  url    = {https://github.com/shen-shanshan/cs-self-learning},
  note   = {Open-source software available at https://github.com/shen-shanshan/cs-self-learning},
  author = {shen-shanshan},
  year   = {2023}
}

📜 License

MIT License. More details are available here.
