This guide provides a structured learning path to understand S4 (Structured State Space Sequence Model) and Mamba from fundamentals to implementation. Follow these steps sequentially for the best learning experience.
Goal: Understand the basics and motivation behind state space models
- 📖 Start with A Visual Guide to Mamba and State Space Models
- Focus on Part 1: The Problem with Transformers
- This explains why we need alternatives to Transformers and introduces S4
Goal: Master the fundamental concepts of State Space Models
- 🎥 Watch State Space Models (SSMs) and Mamba by Serrano.Academy
- Provides excellent visualizations and examples of SSMs
- 📘 Continue with Part 2: The State Space Model
- Builds on the theoretical foundation
- 💻 Study The Annotated S4
- Note: Since the original repo is outdated, use the s4_in_depth.ipynb notebook in this repository
- For Chinese readers, I have translated the original author's The Annotated S4 into Chinese and placed it in the cn folder for easier reading.
Goal: Understand how Mamba builds upon and improves SSMs
- 📖 Read Part 3: Visual Guide to Mamba
- Explains how Mamba extends and improves upon S4
Goal: Get practical experience with Mamba
- 💻 Study The Annotated Mamba
- Detailed implementation walkthrough
- Includes code examples and explanations
Goal: Explore practical applications and stay updated
- 📚 Explore Awesome-Mamba-Collection
- Comprehensive overview of Mamba applications
- Covers NLP, time series analysis, and more
- 🔍 Additional Resources:
💡 Tip: Follow this path sequentially for the best learning experience. Each section builds upon the knowledge from previous sections.
This repository contains a PyTorch implementation of the S4 (Structured State Space Sequence Model) for sequence modeling and classification tasks. The implementation focuses on the MNIST dataset with two main tasks:
- MNIST Sequence Modeling: Predict the next pixel value given the preceding pixels (784 pixels, each with 256 possible values)
- MNIST Classification: Predict the digit class using the sequence model (784 pixels => 10 classes)
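
To make the two task shapes concrete, here is a minimal sketch of how a 28x28 image can be turned into a 784-step sequence. The synthetic image, the variable names, and the choice to shift the input by one step with a leading zero are illustrative assumptions, not necessarily what train_s4.py does.

```python
# Hypothetical 28x28 grayscale image with intensities in 0..255
# (a stand-in for an MNIST digit; real data would come from the dataset).
HEIGHT, WIDTH = 28, 28
image = [[(row * WIDTH + col) % 256 for col in range(WIDTH)] for row in range(HEIGHT)]

# Flatten row by row into a single sequence of 784 pixels.
sequence = [pixel for row in image for pixel in row]

# Sequence modeling (assumed setup): at step t the model sees pixels 0..t-1
# and predicts pixel t, so the input is the sequence shifted right by one.
inputs = [0] + sequence[:-1]
targets = sequence  # each target is one of 256 classes

# Classification: the whole 784-step sequence maps to one of 10 digit labels.
NUM_PIXEL_VALUES = 256
NUM_DIGIT_CLASSES = 10

print(len(inputs), len(targets))  # both have 784 steps
```

Under this setup the sequence modeling head produces 256 scores per step, while the classification head produces 10 scores for the whole sequence.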
- SSM Kernel: Basic implementation of State Space Model kernel
- S4 Kernel: Advanced implementation with HiPPO-based initialization
- Sequence Processing: Layer normalization, SSM/S4 layer, and MLP components
- Model Architecture: Stacked sequence model with configurable parameters
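
As a rough illustration of the HiPPO-based initialization mentioned above, the sketch below builds the HiPPO-LegS state matrix in plain Python. It assumes the repository follows the construction used in The Annotated S4; the function name `make_hippo` and the nested-list representation are choices made for this example only.

```python
import math


def make_hippo(n):
    """Return the n x n HiPPO-LegS matrix as nested lists (illustrative helper).

    Entry (i, j) is:
      -sqrt(2i + 1) * sqrt(2j + 1)  below the diagonal (i > j)
      -(i + 1)                      on the diagonal
       0                            above the diagonal
    """
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i > j:
                matrix[i][j] = -math.sqrt(2 * i + 1) * math.sqrt(2 * j + 1)
            elif i == j:
                matrix[i][j] = -(i + 1.0)
    return matrix


A = make_hippo(4)
for row in A:
    print(["%.3f" % value for value in row])
```

The matrix is lower triangular with a negative diagonal. In S4 it initializes the state matrix of the SSM rather than a random matrix.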
# Run S4 model for both sequence modeling / classification case
python train_s4.py
Notes
- Model uses reduced dimensions for demonstration
- Supports both convolutional (CNN) and recurrent (RNN) modes
For detailed implementation, see train_s4.py.
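
The CNN and RNN modes noted above compute the same output in two ways. The following sketch demonstrates that equivalence on a toy SSM. It assumes a diagonal state matrix so that the bilinear discretization reduces to elementwise arithmetic, and it uses a direct convolution instead of an FFT; all names and values are illustrative and not taken from train_s4.py.

```python
def discretize(a, b, step):
    """Bilinear discretization of a diagonal SSM (elementwise, no matrix inverse)."""
    a_bar = [(1 + step / 2 * a_i) / (1 - step / 2 * a_i) for a_i in a]
    b_bar = [step * b_i / (1 - step / 2 * a_i) for a_i, b_i in zip(a, b)]
    return a_bar, b_bar


def run_recurrent(a_bar, b_bar, c, u):
    """RNN mode: x_k = A_bar * x_{k-1} + B_bar * u_k, y_k = C . x_k."""
    state = [0.0] * len(a_bar)
    outputs = []
    for u_k in u:
        state = [ab * x + bb * u_k for ab, bb, x in zip(a_bar, b_bar, state)]
        outputs.append(sum(c_i * x for c_i, x in zip(c, state)))
    return outputs


def ssm_kernel(a_bar, b_bar, c, length):
    """Convolution kernel K_l = sum_i C_i * A_bar_i**l * B_bar_i for l < length."""
    return [
        sum(c_i * ab ** l * bb for c_i, ab, bb in zip(c, a_bar, b_bar))
        for l in range(length)
    ]


def run_convolution(kernel, u):
    """CNN mode: causal convolution y_k = sum_{j<=k} K_j * u_{k-j}."""
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1))
        for k in range(len(u))
    ]


# Toy parameters (arbitrary values chosen for the example).
a = [-1.0, -2.0, -3.0]
b = [1.0, 1.0, 1.0]
c = [0.5, -0.25, 0.125]
u = [1.0, 0.0, 2.0, -1.0, 0.5]

a_bar, b_bar = discretize(a, b, step=0.1)
y_rnn = run_recurrent(a_bar, b_bar, c, u)
y_cnn = run_convolution(ssm_kernel(a_bar, b_bar, c, len(u)), u)

# Both modes should agree up to floating-point error.
print(max(abs(p - q) for p, q in zip(y_rnn, y_cnn)))
```

The recurrent form processes one step at a time with a fixed-size state, which suits step-by-step generation. The convolutional form processes the whole sequence at once, which suits parallel training.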