A fully custom, end-to-end neural network implementation in modern C++ (C++20) that follows the Google C++ Style Guide. The project features a handwritten matrix engine, multithreaded operations, backpropagation from scratch, and a full training pipeline capable of learning the MNIST handwritten digit dataset.
- Overview
- Key Features
- Tech Stack & Engineering
- Performance Optimizations
- Getting Started
- Dataset
- Lessons Learned
- License
This project is a complete neural network engine built entirely from scratch without any machine learning frameworks (TensorFlow, PyTorch) or external linear algebra libraries (Eigen, BLAS).
Everything, from memory management and matrix multiplication to Softmax and Backpropagation, is manually implemented. The goal was to bridge the gap between theoretical deep learning concepts and high-performance C++ systems engineering.
- Efficient Storage: Row-major 1D contiguous memory allocation for cache locality.
- Arithmetic: Full support for scalar, vector, and matrix operations including Broadcasting.
- Concurrency: Multithreaded matrix multiplication using std::jthread.
- Manipulation: Optimized row shuffling, slicing, and transposition.
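To make the storage and broadcasting points concrete, here is a minimal sketch of a row-major matrix backed by one contiguous buffer. The `Matrix` struct and the `AddRowBroadcast` and `Transpose` helpers are hypothetical names chosen for illustration; the project's actual class may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type (not the project's real class).
struct Matrix {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> data;  // Row-major: element (r, c) lives at r * cols + c.

  Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
      : rows(r), cols(c), data(r * c, fill) {}

  float& At(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  float At(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Adds a 1 x cols row vector to every row of `m` (row-wise broadcasting),
// which is how a bias vector is typically applied to a batch of activations.
inline Matrix AddRowBroadcast(const Matrix& m, const Matrix& row) {
  assert(row.rows == 1 && row.cols == m.cols);
  Matrix out = m;
  for (std::size_t r = 0; r < m.rows; ++r) {
    for (std::size_t c = 0; c < m.cols; ++c) {
      out.At(r, c) += row.At(0, c);
    }
  }
  return out;
}

// Returns a new matrix with rows and columns swapped.
inline Matrix Transpose(const Matrix& m) {
  Matrix out(m.cols, m.rows);
  for (std::size_t r = 0; r < m.rows; ++r) {
    for (std::size_t c = 0; c < m.cols; ++c) {
      out.At(c, r) = m.At(r, c);
    }
  }
  return out;
}
```

Because each row occupies a contiguous run of the buffer, iterating over the column index in the inner loop walks memory sequentially.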
- Layers: Fully connected linear (Dense) layers.
- Activation: ReLU (Rectified Linear Unit) for hidden layers.
- Output: Softmax activation for probability distribution.
- Loss Function: Categorical Cross-Entropy with full gradient implementation.
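The output and loss pieces listed above can be sketched for a single sample as follows. This is an illustration under assumptions, not the project's code: the function names are hypothetical, labels are assumed to be class indices (equivalent to one-hot targets), and the max-subtraction trick for numerical stability is a common choice that the project may or may not use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// ReLU applied element-wise: max(0, x).
inline std::vector<float> Relu(std::vector<float> x) {
  for (float& v : x) v = std::max(v, 0.0f);
  return x;
}

// Softmax over one row of logits. Subtracting the max logit first keeps
// std::exp from overflowing without changing the result.
inline std::vector<float> Softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - max_logit);
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;
  return probs;
}

// Categorical cross-entropy for one sample: -log(p[label]).
// The small epsilon guards against log(0).
inline float CrossEntropy(const std::vector<float>& probs, std::size_t label) {
  assert(label < probs.size());
  constexpr float kEpsilon = 1e-12f;
  return -std::log(std::max(probs[label], kEpsilon));
}

// Gradient of the loss with respect to the logits when softmax feeds
// cross-entropy: probs - one_hot(label).
inline std::vector<float> SoftmaxCrossEntropyGrad(std::vector<float> probs,
                                                  std::size_t label) {
  assert(label < probs.size());
  probs[label] -= 1.0f;
  return probs;
}
```

The combined gradient `probs - one_hot` is the reason softmax and cross-entropy are usually differentiated together rather than separately.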
- Mini-batch Gradient Descent: Implemented with epoch-based shuffling.
- Metrics: Real-time logging of Training vs. Testing accuracy/loss.
- Inference: Fast forward-pass evaluation for testing.
- Persistent Model Weights: Efficient model saving & loading without re-training.
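One way to picture epoch-based shuffling with mini-batches is to shuffle a list of sample indices once per epoch and cut it into fixed-size chunks. The sketch below does that; `MakeBatches` is a hypothetical helper, and shuffling indices (rather than rows in place, as the matrix engine's row shuffling suggests) is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the sample indices 0..num_samples-1 in random order, split into
// batches of `batch_size`. The final batch may be smaller.
inline std::vector<std::vector<std::size_t>> MakeBatches(
    std::size_t num_samples, std::size_t batch_size, std::mt19937& rng) {
  assert(batch_size > 0);
  std::vector<std::size_t> order(num_samples);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::shuffle(order.begin(), order.end(), rng);

  std::vector<std::vector<std::size_t>> batches;
  for (std::size_t start = 0; start < num_samples; start += batch_size) {
    const std::size_t end = std::min(start + batch_size, num_samples);
    const auto first = order.begin() + static_cast<std::ptrdiff_t>(start);
    const auto last = order.begin() + static_cast<std::ptrdiff_t>(end);
    batches.emplace_back(first, last);
  }
  return batches;
}
```

Calling this once at the start of every epoch with the same generator gives each epoch a different batch composition while still visiting every sample exactly once.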
This project adheres to Google C++ Style principles and Modern C++ practices:
- C++20: Utilizes modern features (concepts, auto, etc.).
- Abseil (absl): Used for robust error handling (Check()) and logging, strictly avoiding C++ exceptions.
- Google Test (GTest): Comprehensive unit testing for the matrix engine and network components.
- CMake & vcpkg: Professional build system handling dependencies and cross-platform compilation.
- GitHub Actions: Automated cross-platform builds and testing.
- Multithreaded Matrix Multiplication: The engine analyzes matrix dimensions and dynamically dispatches threads to parallelize dot products, significantly reducing training time on large matrices.
- Cache-Friendly Memory Layout: Data is stored in flattened 1D arrays. During multiplication, the right-hand matrix is transposed to access memory sequentially, minimizing cache misses.
- Batching: Training is performed in batches rather than single-item updates.
- CMake (3.20+)
- C++ Compiler supporting C++20 (GCC, Clang, or MSVC)
- vcpkg for dependency management (Abseil, GTest)
git clone https://github.com/makifcevik/mnist_digit_recognition.git
cd mnist_digit_recognition
# Configure the project (replace with your actual vcpkg path)
cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE="C:/PATH_TO_vcpkg/scripts/buildsystems/vcpkg.cmake"
# Build the project
cmake --build build --config Release
Before running the executable, download the MNIST dataset using the provided zero-dependency script:
python ./scripts/download_mnist.py
This automatically downloads and extracts the binary files into a local data/ directory.
To run the application (runs the inference mode with a pretrained model by default):
cd build/src/Release
./mnist_digit_recognition.exe
The executable is located under build/src/Release (or build/src/Debug) because Visual Studio builds create per-configuration subfolders. If you are on a different system, it should be directly under the build folder.
The project uses MNIST, consisting of 60,000 training images and 10,000 test images.
- Input: 28x28 grayscale images (flattened to 784 features).
- Output: 10 classes (Digits 0-9).
The engine automatically handles both standard naming conventions (hyphens or dots) for the binary .idx1 and .idx3 files.
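For readers unfamiliar with the IDX format, the header layout is simple: two zero bytes, one byte for the element type (0x08 means unsigned byte), one byte for the number of dimensions, then one big-endian 32-bit size per dimension. The sketch below parses that header from an in-memory buffer. `IdxHeader` and `ParseIdxHeader` are hypothetical names, and returning `std::optional` instead of throwing is an assumption made to match the project's no-exceptions policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a 32-bit big-endian unsigned integer from four consecutive bytes.
inline std::uint32_t ReadBigEndian32(const std::uint8_t* p) {
  assert(p != nullptr);
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

struct IdxHeader {
  std::uint8_t element_type;        // 0x08 for the unsigned bytes used by MNIST.
  std::vector<std::uint32_t> dims;  // e.g. {60000, 28, 28} for training images.
};

// Parses the IDX header at the start of `bytes`. Returns std::nullopt if the
// buffer is too short or the two leading bytes are not zero.
inline std::optional<IdxHeader> ParseIdxHeader(
    const std::vector<std::uint8_t>& bytes) {
  if (bytes.size() < 4 || bytes[0] != 0 || bytes[1] != 0) return std::nullopt;
  const std::size_t num_dims = bytes[3];
  if (bytes.size() < 4 + 4 * num_dims) return std::nullopt;

  IdxHeader header{bytes[2], {}};
  for (std::size_t d = 0; d < num_dims; ++d) {
    header.dims.push_back(ReadBigEndian32(bytes.data() + 4 + 4 * d));
  }
  return header;
}
```

An .idx3 image file yields three dimensions (count, rows, columns), while an .idx1 label file yields one (count); the pixel or label bytes follow immediately after the header.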
- Math: Implementing a matrix class and a neural network from scratch clarified the math behind neural networks.
- Machine Learning: Building a neural network without external libraries deepened my understanding of Machine Learning.
- Concurrency Cost: I learned that threading isn't "free." Managing thread overhead vs. workload size was crucial for actual speedups.
- Memory Matters: Cache locality optimizations (transposition before multiplication) resulted in an 8x performance gain, highlighting the importance of hardware-aware programming.
- Tooling: Integrating GTest and Abseil taught me how to structure a project for maintainability, not just functionality.
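The concurrency lesson can be illustrated with a small heuristic that picks a thread count from the amount of work. This is a hypothetical sketch: the function name, the multiply-add work estimate, and the `min_work_per_thread` threshold are all assumptions, and a sensible threshold has to be measured on the target machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Chooses how many threads to use for an (m x k) * (k x n) multiplication.
// `hardware_threads` would typically come from
// std::thread::hardware_concurrency(), which may return 0 when unknown.
inline std::size_t ChooseThreadCount(std::size_t m, std::size_t k,
                                     std::size_t n,
                                     std::size_t hardware_threads,
                                     std::size_t min_work_per_thread) {
  assert(min_work_per_thread > 0);
  const std::size_t work = m * k * n;  // Number of multiply-add operations.
  std::size_t threads = work / min_work_per_thread;
  // Never exceed the core count or the number of output rows to split.
  threads = std::min(threads, hardware_threads);
  threads = std::min(threads, m);
  // Small workloads (or an unknown core count) fall back to one thread.
  return std::max(threads, std::size_t{1});
}
```

With a threshold of one million operations per thread, a 2 x 2 by 2 x 2 product stays on one thread, while a 1000 x 784 by 784 x 128 product on an 8-core machine uses all 8.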
Distributed under the MIT License.