ezautorres/Advanced-Deep-Learning-CIMAT


Advanced Deep Learning – CIMAT (Fall 2024)

Python Jupyter PyTorch TensorFlow PINNs Neural Networks Transformers

Author: Ezau Faridh Torres Torres
Advisor: Dr. Mariano Rivera Meraz
Course: Advanced Deep Learning
Institution: CIMAT – Centro de Investigación en Matemáticas
Term: Fall 2024

A comprehensive exploration of modern deep learning architectures, including transformers, recurrent networks, and diffusion models, together with adaptation techniques such as LoRA and state-exchange attention, applied to time series forecasting and physical system modeling. Each assignment targets a specific modeling challenge and showcases techniques such as attention, transfer learning, and diffusion processes, in domains ranging from financial forecasting to PDE-based simulations.


Repository Structure

Each assignment comprises the following elements:

  • Jupyter Notebooks implementing the model and training pipeline.
  • Supporting scripts or utility functions if needed.

Technical Stack

Developed in Python 3.11 using:

  • Deep learning: TensorFlow, PyTorch
  • Time series & sequence models: LSTM, Transformer, Seq2Seq
  • Visualization: matplotlib, seaborn
  • Utilities: numpy, pandas, scikit-learn, yfinance

Some assignments also use additional libraries and modules such as keras, scipy, or PyTorch's torch.nn.functional.


Overview of Assignments

Each assignment is summarized below, with its primary objective:

Assignment 1 – Extreme Learning Machines

Implementation and comparison of a multilayer perceptron (MLP), a standard extreme learning machine (ELM), and a binary-weight ELM for emotion classification using VQ-VAE encoded inputs. The study evaluates the impact of different regularization strategies (none, Ridge, Lasso, ElasticNet) on the ELM’s output layer, using 12×12 integer matrices as input representations of facial expressions.

 Assignment 1
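The ELM idea described above (a fixed random hidden layer with a regularized linear output layer) can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's code: the function names `fit_elm_ridge` and `predict_elm`, the tanh activation, the hidden size, the ridge strength, and the synthetic two-class data are all assumptions. Only the Ridge case is shown because it has a closed-form solution; Lasso and ElasticNet would need an iterative solver.

```python
import numpy as np

def fit_elm_ridge(X, y_onehot, n_hidden=128, ridge=1e-2, binary=False, seed=0):
    """Fit an ELM: hidden weights are random and never trained;
    only the output weights are solved for, in closed form."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))
    if binary:
        W = np.sign(W)  # binary-weight variant: hidden weights in {-1, +1}
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W / np.sqrt(d) + b)  # random hidden features
    # Ridge solution: beta = (H^T H + ridge * I)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y_onehot)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = np.tanh(X @ W / np.sqrt(X.shape[1]) + b)
    return np.argmax(H @ beta, axis=1)

# Synthetic stand-in for flattened 12x12 integer code matrices (144 features).
rng = np.random.default_rng(1)
codes = np.vstack([rng.integers(0, 8, (200, 144)),
                   rng.integers(8, 16, (200, 144))])
labels = np.repeat([0, 1], 200)
X = codes / 15.0            # rescale integer codes to [0, 1]
Y = np.eye(2)[labels]       # one-hot targets

W, b, beta = fit_elm_ridge(X, Y)
train_acc = float(np.mean(predict_elm(X, W, b, beta) == labels))
```

Passing `binary=True` restricts the hidden weights to two values while leaving the output-layer solve unchanged, which is one plausible reading of the binary-weight ELM.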

Assignment 2 – Seq2Seq Prediction for Cryptocurrency Time Series

Implementation of a sequence-to-sequence (Seq2Seq) model with attention and teacher forcing to predict future values in multivariate time series of cryptocurrency prices. Using data from 7 cryptocurrencies (including Bitcoin) over 100 hourly intervals, the model forecasts the final segment of each series. Historical data is fetched from Yahoo Finance via yfinance and normalized with MinMax scaling. The model is trained for 300 epochs with LSTM layers of 1000 hidden units.

 Assignment 2
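The data preparation described above (MinMax scaling, then holding out the final segment as the forecasting target) might look roughly like the following. This is a sketch under assumptions: the prices are a synthetic random walk rather than data downloaded from Yahoo Finance, and the helper names and the `horizon=20` split are hypothetical.

```python
import numpy as np

def minmax_scale(series):
    """Scale each column (one asset) to [0, 1]; return the bounds for inversion."""
    lo = series.min(axis=0, keepdims=True)
    hi = series.max(axis=0, keepdims=True)
    return (series - lo) / (hi - lo), lo, hi

def split_history_target(scaled, horizon):
    """Encoder input is everything but the last `horizon` steps; the rest is the target."""
    return scaled[:-horizon], scaled[-horizon:]

# Synthetic stand-in: 100 hourly steps for 7 assets.
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.standard_normal((100, 7)), axis=0)

scaled, lo, hi = minmax_scale(prices)
history, target = split_history_target(scaled, horizon=20)

# Predictions in scaled space are mapped back to prices the same way.
restored = target * (hi - lo) + lo
```

For simplicity the scaling bounds here come from the full window, which includes the target segment; a stricter evaluation would compute them from `history` only.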

Assignment 3 – Transformer Encoder for Time Series Forecasting

Implementation of a Transformer encoder model in TensorFlow to predict future cryptocurrency prices from past sequences. The model is built from scratch using MultiHeadAttention layers, with three encoding blocks and a latent dimension of 64. Two versions are explored: one using logarithmic normalization, and another using MinMax scaling. The model’s performance is compared against a naive baseline (no change in price) and the Seq2Seq architecture from Assignment 2.

 Assignment 3
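The naive baseline mentioned above (no change in price) is simple enough to write out. The sketch below repeats the last observed value over the forecast horizon and scores it with mean absolute error; the function names, the choice of MAE as the metric, and the toy numbers are assumptions for illustration.

```python
import numpy as np

def naive_forecast(history, horizon):
    """Repeat the last observed row for every future step (no change in price)."""
    return np.repeat(history[-1:], horizon, axis=0)

def mae(pred, target):
    """Mean absolute error over all steps and series."""
    return float(np.mean(np.abs(pred - target)))

# Toy example with two series and a two-step horizon.
history = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]])
future = np.array([[4.0, 11.0], [5.0, 13.0]])

baseline = naive_forecast(history, horizon=2)
baseline_mae = mae(baseline, future)  # 1.25 for this toy data
```

A learned model is only useful on this task if its error falls below `baseline_mae` computed on the same held-out segment.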

Assignment 4 – Transfer Learning and LoRA for Currency Forecasting

Extension of the Transformer model from Assignment 3 to a new dataset including exchange rates for multiple currencies and daily oil prices. Three strategies are compared: (1) training a new model from scratch, (2) full fine-tuning of the pretrained transformer, and (3) fine-tuning using Low-Rank Adaptation (LoRA) on affine layers. Multiple LoRA ranks are tested to evaluate efficiency vs. performance trade-offs.

 Assignment 4
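To make the efficiency trade-off concrete, here is a minimal NumPy sketch of LoRA applied to a single affine layer, following the formulation of Hu et al. (2021): the pretrained weights stay frozen and only a low-rank update is trainable. The class name `LoRAAffine`, the 64×64 layer size, the `alpha` scaling value, and the initialization constants are assumptions; the actual assignment works with TensorFlow layers.

```python
import numpy as np

class LoRAAffine:
    """Affine layer y = x W + b with a trainable low-rank update A B."""

    def __init__(self, W, b, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = W.shape
        self.W, self.b = W, b                      # frozen pretrained parameters
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))           # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.W + self.b + self.scale * (x @ self.A) @ self.B

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
b = np.zeros(64)
x = rng.standard_normal((5, 64))

full_params = W.size + b.size                      # 4160 for full fine-tuning
lora_params = {r: LoRAAffine(W, b, rank=r).trainable_params() for r in (1, 2, 4, 8)}
layer = LoRAAffine(W, b, rank=4)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and the trainable parameter count grows linearly with the rank.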

Assignment 5 – Diffusion Model with Transformer Sampler

Adaptation of a DDIM-based diffusion model by replacing the original U-Net architecture with a custom Transformer for the reverse sampling process. The architecture uses 12 attention heads across 8 layers, maintaining all other parameters from the original implementation (e.g., number of diffusion steps, embedding size, learning rate). The model is trained for 50 epochs due to high computational cost.

 Assignment 5a
 Assignment 5b
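For orientation, one deterministic DDIM reverse step can be written as below. In the assignment the noise estimate comes from the Transformer; in this sketch the true noise is passed in instead, so the arithmetic can be checked. The function name `ddim_step` and the cumulative noise-schedule values are assumptions.

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0).

    abar_t and abar_prev are the cumulative products of (1 - beta)
    at the current and the previous (less noisy) timestep.
    """
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    # Re-noise that estimate to the previous noise level.
    x_prev = np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred
    return x_prev, x0_pred

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))       # stand-in for clean data
eps = rng.standard_normal((4, 8))      # noise used in the forward process
abar_t, abar_prev = 0.5, 0.8

x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
x_prev, x0_pred = ddim_step(x_t, eps, abar_t, abar_prev)
```

With a perfect noise estimate, `x0_pred` recovers `x0` exactly; swapping the U-Net for a Transformer changes only how `eps_pred` is produced, not this update rule.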

Final Project – State-Exchange Attention (SEA) for Physics Transformers

Investigation and replication of the SEA architecture proposed by Esmati et al. (2024), which integrates a novel State-Exchange Attention mechanism into transformer-based models for simulating PDE-governed physical systems. The SEA module enables dynamic cross-field communication between state variables such as velocity, pressure, and volume fraction, effectively reducing autoregressive rollout error. The full ViT-SEA framework achieves up to 91% error reduction compared to state-of-the-art baselines, demonstrating its capacity to capture complex spatiotemporal dynamics in computational fluid dynamics scenarios.
For further details, see the original publication by Esmati et al. (2024).
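The notion of cross-field communication can be illustrated with plain cross-attention, where tokens of one state variable query tokens of another. The sketch below is a generic single-head cross-attention in NumPy and is not the SEA module from the paper; the function names, token counts, embedding size, and random projection matrices are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_field_attention(query_field, context_field, Wq, Wk, Wv):
    """Tokens of one field attend to tokens of another field."""
    Q = query_field @ Wq
    K = context_field @ Wk
    V = context_field @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return weights @ V, weights

rng = np.random.default_rng(0)
d = 16
velocity_tokens = rng.standard_normal((10, d))   # hypothetical velocity embeddings
pressure_tokens = rng.standard_normal((12, d))   # hypothetical pressure embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Velocity tokens gather information from the pressure field.
exchanged, weights = cross_field_attention(velocity_tokens, pressure_tokens, Wq, Wk, Wv)
```

Each row of `weights` sums to one, so every velocity token receives a convex combination of projected pressure tokens.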


Learning Outcomes

  • Built custom deep learning models for time series forecasting and generative modeling.
  • Gained hands-on experience with transformer architectures, LSTMs, and ELMs.
  • Explored fine-tuning strategies including full transfer learning and LoRA.
  • Adapted diffusion models and autoregressive frameworks to novel architectures.
  • Analyzed and visualized model performance across financial and physical domains.

References

  • Esmati, S., Gholami, A., & Mahoney, M. W. (2024). State Exchange Attention for Physics Transformers. arXiv:2403.04603.
    https://arxiv.org/abs/2403.04603

  • Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
    https://arxiv.org/abs/2106.09685

