TheElementsMath
diff --git a/‎00-chapter.Rmd‎
Lines changed: 90 additions & 0 deletions b/‎00-chapter.Rmd‎
diff --git a/‎01-IntroductionAndMotivation.Rmd‎
Lines changed: 155 additions & 0 deletions b/‎01-IntroductionAndMotivation.Rmd‎
@@ -0,0 +1,90 @@
+# Table of Symbols {-}
+
+### Important Symbols and Where to Find Them: {-}
+
+| **Symbol** | **Typical Meaning** | **Reference**|
+|:-------|:--------------------|:-------|
+| *a, b, c, α, β, γ* | Scalars (lowercase) | |
+| **x, y, z** | Vectors (bold lowercase) | Section 1.1, 2.0|
+| **A, B, C** | Matrices (bold uppercase) | |
+| \( x^\top, A^\top \) | Transpose of a vector or matrix | |
+| \( A^{-1} \) | Inverse of a matrix | |
+| \( \langle x, y \rangle \) | Inner product of \(x\) and \(y\) | |
+| \( x^\top y \) | Dot product of \(x\) and \(y\) | Section 2.0 |
+| \( B = (b_1, b_2, b_3) \) | Ordered tuple |
+| \( B = [b_1, b_2, b_3] \) | Matrix of column vectors stacked horizontally |
+| \( B = \{b_1, b_2, b_3\} \) | Set of vectors (unordered) |
+| \( \mathbb{Z}, \mathbb{N} \) | Integers and natural numbers |
+| \( \mathbb{R}, \mathbb{C} \) | Real and complex numbers |
+| \( \mathbb{R}^n \) | \(n\)-dimensional vector space of reals |
+| \( \forall x \) | Universal quantifier (“for all \(x\)”) |
+| \( \exists x \) | Existential quantifier (“there exists \(x\)”) |
+| \( a := b \) | \(a\) is defined as \(b\) |
+| \( a =: b \) | \(b\) is defined as \(a\) |
+| \( a \propto b \) | \(a\) is proportional to \(b\) (\(a = \text{constant} \cdot b\)) |
+| \( g \circ f \) | Function composition (“\(g\) after \(f\)”) |
+| \( \Leftrightarrow \) | If and only if |
+| \( \Rightarrow \) | Implies |
+| \( A, C \) | Sets |
+| \( a \in A \) | \(a\) is an element of \(A\) |
+| \( \emptyset \) | Empty set |
+| \( A \setminus B \) | Elements in \(A\) but not in \(B\) |
+| \( D \) | Number of dimensions (\(d = 1, \dots, D\)) | Chapter 8 |
+| \( N \) | Number of data points (\(n = 1, \dots, N\)) | Chapter 8 |
+| \( I_m \) | Identity matrix of size \(m \times m\) |
+| \( 0_{m,n} \) | Matrix of zeros of size \(m \times n\) |
+| \( 1_{m,n} \) | Matrix of ones of size \(m \times n\) |
+| \( e_i \) | Standard (canonical) basis vector (1 in the \(i\)-th position) |
+| `dim` | Dimensionality of a vector space |
+| `rk(A)` | Rank of matrix \(A\) |
+| `Im(Φ)` | Image of a linear mapping \(Φ\) |
+| `ker(Φ)` | Kernel (null space) of \(Φ\) |
+| `span[b₁]` | Span (generating set) of \(b_1\) |
+| `tr(A)` | Trace of \(A\) |
+| `det(A)` | Determinant of \(A\) |
+| \( \lvert \cdot \rvert \) | Absolute value or determinant (depending on context) |
+| \( \| \cdot \| \) | Norm (Euclidean unless stated otherwise) |
+| \( \lambda \) | Eigenvalue or Lagrange multiplier |
+| \( E_\lambda \) | Eigenspace corresponding to eigenvalue \( \lambda \) |
+| \( x \perp y \) | \(x\) and \(y\) are orthogonal |
+| \( V \) | Vector space |
+| \( V^\perp \) | Orthogonal complement of \(V\) |
+| \( \sum_{n=1}^N x_n \) | Sum: \(x_1 + \dots + x_N\) |
+| \( \prod_{n=1}^N x_n \) | Product: \(x_1 \cdot \dots \cdot x_N\) |
+| \( \theta \) | Parameter vector |
+| \( \frac{\partial f}{\partial x} \) | Partial derivative of \(f\) with respect to \(x\) |
+| \( \frac{df}{dx} \) | Total derivative of \(f\) with respect to \(x\) |
+| \( \nabla \) | Gradient |
+| \( f^* = \min_x f(x) \) | Minimum value of \(f\) |
+| \( x^* \in \arg\min_x f(x) \) | Value \(x^*\) that minimizes \(f\) |
+| \( \mathcal{L} \) | Lagrangian |
+| \( \mathcal{L} \) | Negative log-likelihood |
+| \( \binom{n}{k} \) | Binomial coefficient (\(n\) choose \(k\)) |
+| \( V_X[x] \) | Variance of \(x\) with respect to the random variable \(X\) |
+| \( E_X[x] \) | Expectation of \(x\) with respect to the random variable \(X\) |
+| \( \mathrm{Cov}_{X,Y}[x, y] \) | Covariance between \(x\) and \(y\) |
+| \( X \perp\!\!\!\perp Y \mid Z \) | \(X\) is conditionally independent of \(Y\) given \(Z\) |
+| \( X \sim p \) | Random variable \(X\) is distributed according to \(p\) |
+| \( \mathcal{N}(\mu, \Sigma) \) | Gaussian distribution with mean \(\mu\) and covariance \(\Sigma\) |
+| \( \mathrm{Ber}(\mu) \) | Bernoulli distribution with parameter \(\mu\) |
+| \( \mathrm{Bin}(N, \mu) \) | Binomial distribution with parameters \(N, \mu\) |
+| \( \mathrm{Beta}(\alpha, \beta) \) | Beta distribution with parameters \(\alpha, \beta\) |
+
+---
+
+### Table of Abbreviations and Acronyms {-}
+
+| Acronym | Meaning |
+|:---------|:---------|
+| e.g. | *Exempli gratia* (Latin: “for example”) |
+| GMM | Gaussian mixture model |
+| i.e. | *Id est* (Latin: “this means”) |
+| i.i.d. | Independent, identically distributed |
+| MAP | Maximum a posteriori |
+| MLE | Maximum likelihood estimation/estimator |
+| ONB | Orthonormal basis |
+| PCA | Principal component analysis |
+| PPCA | Probabilistic principal component analysis |
+| REF | Row-echelon form |
+| SPD | Symmetric, positive definite |
+| SVM | Support vector machine |
@@ -0,0 +1,155 @@
+# Introduction and Motivation
+
+Machine learning focuses on designing algorithms that automatically extract meaningful information from data.  A concise and widely accepted definition of machine learning comes from Tom Mitchell (1997):  
+
+> “A computer program is said to learn from experience $E$ with respect to 
+> some class of tasks $T$ and performance measure $P$, if its performance 
+> at tasks in $T$, as measured by $P$, improves with experience $E$.”
+
+
+## Finding Words for Intuitions
+
+
+<div class="definition">
+**Machine learning**  is the study and development of algorithms that improve automatically through experience and data, without being explicitly programmed for each task.
+</div>
+
+Machine learning combines **data**, **models**, and **learning methods** to identify patterns and make predictions or decisions, ideally generalizing well to new, unseen situations.  Data is the foundation: machine learning aims to discover useful patterns in data without relying heavily on domain expertise.
+
+<div class="definition">**Data** are pieces of information collected to describe, measure, or analyze phenomena. 
+</div>
+
+In practice, data is represented numerically, often as **vectors**, $\mathbf{x} = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_D \end{bmatrix}$, where $D$ is the number of dimensions.  Models describe how data is generated or how inputs map to outputs.
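
As a minimal sketch of this numerical representation, the Python snippet below stores a small, made-up dataset as a list of vectors. The feature choice (floor area and room count), the values, and the helper name `feature_means` are illustrative assumptions, not taken from the textbook; $N$ and $D$ follow the table of symbols.

```python
# Hypothetical dataset: three houses, each described by D = 2 features
# (floor area in square metres, number of rooms). The values are made up.
dataset = [
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 5.0],
]

N = len(dataset)      # number of data points
D = len(dataset[0])   # number of dimensions (features) per data point


def feature_means(data):
    """Return the mean of each feature (column) across all data points."""
    n = len(data)
    return [sum(row[d] for row in data) / n for d in range(len(data[0]))]


means = feature_means(dataset)
print(f"N = {N}, D = {D}, feature means = {means}")
```

Each inner list plays the role of one vector $\mathbf{x}$, and stacking the $N$ vectors gives a table with $N$ rows and $D$ columns.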
+
+<div class="definition">A **model** is a learned representation that maps inputs to outputs based on patterns found in data.
+</div>
+
+A model *learns* when its performance improves after processing data.  Good models generalize to new, unseen data.
+
+<div class="definition">**Learning** is the process of using data to automatically improve a model’s ability to perform a task.
+</div>  
+  
+The goal is not just to fit the training data, but to perform well on new examples.
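
The difference between fitting the training data and performing well on new examples can be illustrated with a toy comparison. The sketch below assumes made-up data following $y = 2x$ and two hypothetical predictors: `memorizer`, which only stores the training pairs, and `doubling_rule`, which captures the underlying pattern.

```python
def mean_squared_error(predict, examples):
    """Average squared difference between predictions and true outputs."""
    return sum((predict(x) - y) ** 2 for x, y in examples) / len(examples)


# Made-up (input, output) pairs that follow y = 2x.
train_examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
test_examples = [(4.0, 8.0), (5.0, 10.0)]  # held out, never seen in training

lookup = dict(train_examples)


def memorizer(x):
    """Return the stored output for a seen input, and 0.0 for anything else."""
    return lookup.get(x, 0.0)


def doubling_rule(x):
    """Apply the pattern underlying the data: the output is twice the input."""
    return 2.0 * x


memorizer_train_error = mean_squared_error(memorizer, train_examples)
memorizer_test_error = mean_squared_error(memorizer, test_examples)
rule_train_error = mean_squared_error(doubling_rule, train_examples)
rule_test_error = mean_squared_error(doubling_rule, test_examples)
```

Here the memorizer has zero error on the training pairs but a large error on the held-out pairs, while the doubling rule has zero error on both.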
+
+
+
+Formally, you can think of an algorithm as a mapping from inputs to outputs, where each step is precise, unambiguous, and executable by a computer.
+
+<div class="definition">
+An **algorithm** is a finite sequence of well-defined instructions or steps designed to solve a specific problem or perform a computation.
+</div>
+
+
+In the context of machine learning, an algorithm provides a systematic procedure for processing data — either to make predictions (as in a predictive algorithm) or to adjust model parameters (as in a training algorithm).  In this way, machine learning involves two overlapping meanings of “algorithm”:  
+
+1. A **predictor** that makes predictions based on data.  
+2. A **training procedure** that updates the predictor’s parameters to improve future performance.
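
The two meanings can be separated in code. The following is a minimal sketch, assuming a one-parameter linear predictor and plain gradient descent on the mean squared error; the function names, the learning rate, the step count, and the data are hypothetical choices made for illustration.

```python
def predict(theta, x):
    """Predictor: map an input x to an output using the parameter theta."""
    return theta * x


def mean_squared_error(theta, data):
    """Performance measure: average squared prediction error on the data."""
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)


def train(data, theta=0.0, learning_rate=0.05, steps=200):
    """Training procedure: adjust theta by gradient descent on the error."""
    for _ in range(steps):
        # Derivative of the mean squared error with respect to theta.
        gradient = sum(2.0 * (predict(theta, x) - y) * x for x, y in data) / len(data)
        theta -= learning_rate * gradient
    return theta


# Made-up (input, output) pairs that follow y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

error_before = mean_squared_error(0.0, data)
theta = train(data)
error_after = mean_squared_error(theta, data)
print(f"theta = {theta:.4f}, error before = {error_before:.4f}, error after = {error_after:.6f}")
```

With this data, the training procedure moves `theta` from 0 toward 2 and the predictor's error decreases, which matches Mitchell's notion of performance improving with experience.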
+
+Understanding the **mathematical foundations** behind data, models, and learning helps us build, interpret, and improve machine learning systems — and recognize their assumptions and limitations.
+
+
+
+## Two Ways to Read This Book
+
+
+There are two main strategies for learning the mathematics that underpins machine learning.
+The **bottom-up approach** builds understanding from fundamental mathematical concepts toward more advanced ideas. This method provides a solid conceptual foundation, ensuring that each new idea rests on well-understood principles. However, for many learners, this approach can feel slow or disconnected from practical motivation, since the relevance of abstract concepts may not be immediately clear.
+
+In contrast, the **top-down approach** begins with real-world problems and drills down to the mathematics required to solve them. This goal-driven strategy keeps motivation high and helps learners understand why each concept matters. The drawback, however, is that the underlying mathematical ideas can remain fragile—readers may learn to use tools effectively without fully grasping their theoretical basis.
+
+*Mathematics for Machine Learning* is designed to support both approaches, foundational (Part I) and applied (Part II), so readers can move freely between mathematics and machine learning.
+
+This book is a companion to the textbook *Mathematics for Machine Learning*.  It takes a more foundational approach, designed to fill in any gaps a reader might have.  In particular, we aim to provide more examples in a less theoretical way.  Whether you read top-down or bottom-up, this book will support your learning.
+
+---
+
+### Part I: Mathematical Foundations
+
+Part I develops the mathematical tools that support all major ML methods — the four pillars of machine learning:
+
+1. **Regression**  
+2. **Dimensionality Reduction**  
+3. **Density Estimation**  
+4. **Classification**
+
+
+<p align="center">
+<img src="Figure1.1MML.png" alt="The Foundations and Four Pillars of Machine Learning" width="400">
+</p>
+
+Part I covers:
+
+- **Linear Algebra (Ch. 2):** Vectors, matrices, and their relationships.  
+- **Analytic Geometry (Ch. 3):** Similarity and distance between vectors.  
+- **Matrix Decomposition (Ch. 4):** Interpreting and simplifying data.  
+- **Vector Calculus (Ch. 5):** Gradients and differentiation.  
+- **Probability Theory (Ch. 6):** Quantifying uncertainty and noise.  
+- **Optimization (Ch. 7):** Finding parameters that maximize performance.
+
+---
+
+### Part II: Machine Learning Applications
+
+Part II applies the math from Part I to the four pillars:
+
+- **Ch. 8 — Foundations of ML:** Data, models, and learning; designing robust experiments.  
+- **Ch. 9 — Regression:** Predicting continuous outcomes using linear and Bayesian approaches.  
+- **Ch. 10 — Dimensionality Reduction:** Compressing high-dimensional data (e.g., PCA).  
+- **Ch. 11 — Density Estimation:** Modeling data distributions (e.g., Gaussian mixtures).  
+- **Ch. 12 — Classification:** Assigning discrete labels (e.g., support vector machines).
+
+---
+
+### Learning Path
+
+Readers are encouraged to mix **bottom-up** and **top-down** learning:
+
+- Build foundational skills when needed.  
+- Explore applications that connect math to real machine learning systems.  
+
+This modular structure makes the book suitable for both **mathematical learners** and **practitioners** aiming to deepen their theoretical understanding.
+
+
+
+
+
+## Exercises and Feedback
+
+While *Mathematics for Machine Learning* provides some examples and exercises, this book is built to support those who want to practice particular skills or build their knowledge in a particular area.  We have added a number of exercises, examples, and videos to aid your understanding of the material.
+
+
+
+
+### Exercises {.unnumbered .unlisted}
+
+
+
+
+<div class="exercise">
+Discuss the ideas of data, models, and learning.  How are they related?
+
+<div style="text-align: right;">
+[Solution](https://youtu.be/YSlIPuSSvaI)
+</div> 
+</div>
+
+
+<div class="exercise">
+In machine learning, how are data typically represented?
+
+<div style="text-align: right;">
+[Solution](https://youtu.be/-39cITqPMdk)
+</div> 
+</div>
+
+
+
+
+<div class="exercise">
+What is meant by learning in the context of a model?
+
+<div style="text-align: right;">
+[Solution](https://youtu.be/4HR3Kkb2Ieg)
+</div> 
+</div>
+