This repository documents my implementations for the assignments of the CMU course 11-868: Large Language Model Systems.
- Course Homepage: https://llmsystem.github.io/llmsystem2025spring/
- Homework Tutorial: https://llmsystem.github.io/llmsystemhomework/
- My blog notes: https://blog.mindorigin.top/AI/llmsys
Below are notes and fixes I collected while completing the assignments. They may help with debugging or with avoiding common pitfalls.
- HW1 (`autodiff`): Crucial Gradient Initialization

  In the `autodiff` part of HW1, it is essential to initialize all gradients to zero. Failing to do so can cause all backward tests in HW3 to fail unexpectedly.
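To illustrate why zero initialization matters, here is a minimal toy sketch of reverse-mode accumulation. It is not the HW1 code: the `Var` class and the `topo_order` and `backward` helpers are hypothetical names made up for this example. Gradients are summed with `+=` across every path that reaches a variable, so each accumulator has to start at exactly zero.

```python
from collections import defaultdict


class Var:
    """Toy scalar node (hypothetical, not the HW1 API)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_var, local_gradient_of_self_wrt_parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def topo_order(root):
    """Return nodes so that every node appears before its parents."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(root)
    return order[::-1]


def backward(root):
    # defaultdict(float) makes every gradient start at 0.0, so the
    # accumulation below adds to a known value instead of leftovers.
    grads = defaultdict(float)
    grads[id(root)] = 1.0
    for v in topo_order(root):
        for parent, local in v.parents:
            grads[id(parent)] += grads[id(v)] * local
    return grads


x = Var(3.0)
y = x * x + x  # dy/dx = 2x + 1
g = backward(y)
print(g[id(x)])  # 7.0
```

Because `x` is reached through three edges here, a nonzero starting value in its accumulator would shift the final gradient by that amount.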
- HW2: Correcting CUDA Grid Dimensions

  In HW2, a small but important modification to the grid dimensions in the CUDA kernel is recommended for correctness. Specifically, swap the order of `m` and `p` when calculating `gridDims`.

  Original:

      dim3 gridDims((m + threadsPerBlock - 1) / threadsPerBlock, (p + threadsPerBlock - 1) / threadsPerBlock, batch);

  Modified:

      dim3 gridDims((p + threadsPerBlock - 1) / threadsPerBlock, (m + threadsPerBlock - 1) / threadsPerBlock, batch);
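The effect of the swap can be checked with a little arithmetic. The sketch below assumes (this is not stated above) that the kernel derives the output column from the x index and the output row from the y index, for an output of `m` rows by `p` columns. The `ceil_div` and `covered` helpers and the sizes are hypothetical and chosen only for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling division, same as (a + b - 1) / b in the launch code."""
    return (a + b - 1) // b


def covered(grid_x, grid_y, threads_per_block, m, p):
    """Count output elements reached by at least one thread.

    Assumption: col = x index (valid when col < p),
                row = y index (valid when row < m).
    """
    cols = min(grid_x * threads_per_block, p)
    rows = min(grid_y * threads_per_block, m)
    return rows * cols


m, p, tpb = 64, 16, 16  # hypothetical non-square output

# Original ordering: x sized from m, y sized from p.
original = covered(ceil_div(m, tpb), ceil_div(p, tpb), tpb, m, p)
# Modified ordering: x sized from p, y sized from m.
modified = covered(ceil_div(p, tpb), ceil_div(m, tpb), tpb, m, p)

print(original, modified, m * p)  # 256 1024 1024
```

Under that assumption the original ordering leaves most rows unwritten whenever `m` is larger than `p`, while square shapes (`m == p`) are covered either way, which may be why the problem only appears on some test cases.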
- HW4 (`cuda_kernel_ops`): Replacing `pycuda.autoinit`

  The `import pycuda.autoinit` in HW4 can be intrusive: it creates its own CUDA context as a side effect of the import, which can clash with the context PyTorch manages. It's better to replace it with a more compatible PyTorch-based initialization method.

  - Remove the line:

        import pycuda.autoinit

  - Add the following snippet to ensure proper CUDA initialization via PyTorch:

        import torch

        if torch.cuda.is_available():
            # Creating a tensor on the GPU lets PyTorch handle CUDA
            # initialization, which is less intrusive than pycuda.autoinit.
            _ = torch.tensor([1.0]).cuda()

  - Remember to remove any redundant `import torch` statements later in the file.
- HW6: Python Version for Conda Environment

  For HW6, the required environment uses Python 3.10, not 3.9. When creating the Conda environment, be sure to specify the correct version:

      conda create --name your_env_name python=3.10
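As an optional safeguard, a script can check the interpreter version up front so that a wrongly created environment is noticed early. This is only a sketch: `REQUIRED` and `version_ok` are hypothetical names, not part of the HW6 starter code.

```python
import sys

REQUIRED = (3, 10)  # major, minor version noted above for HW6


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the major.minor version matches the required one."""
    return tuple(version_info[:2]) == required


if version_ok():
    print("Python version matches 3.10")
else:
    found = ".".join(str(part) for part in sys.version_info[:2])
    print(f"Expected Python 3.10 but found {found}; recreate the Conda environment")
```

The check only prints a message here; raising an error instead would stop the script before any version-specific failure shows up later.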