diff --git a/README.txt b/README.txt
index b3bb1ba..eaedc22 100644
--- a/README.txt
+++ b/README.txt
@@ -8,4 +8,22 @@
 * `make install` to install dependencies.
 * `make server` to start the development server - you will need to Ctrl-C and restart if there are CSS changes since they require re-compilation.
-* `make build` to output the static HTML to the `_site/` folder.
\ No newline at end of file
+* `make build` to output the static HTML to the `_site/` folder.
+
+You may use the image `ghcr.io/cs357/textbook-devel` as the development environment.
+
+For example, run this to create a container with the Ruby dependencies
+
+```
+podman run \
+    --volume $PWD:/srv/jekyll \
+    --workdir /srv/jekyll \
+    --rm \
+    --interactive \
+    --tty \
+    --publish 4000:4000 \
+    --publish 35729:35729 \
+    ghcr.io/cs357/textbook-devel \
+    /bin/bash
+    sh -c 'make install && bundle exec jekyll server --verbose --incremental --host 0.0.0.0 --livereload --watch'
+```
diff --git a/notes/condition.md b/notes/condition.md
index 945a00c..2b25d1b 100644
--- a/notes/condition.md
+++ b/notes/condition.md
@@ -5,6 +5,10 @@ sort: 11
 author:
   - CS 357 Course Staff
 changelog:
+  - name: Samuel Grayson
+    netid: grayson5
+    date: 2025-10-22
+    message: Add response to student question, cond(A) = 1.
   - name: Arnav Aggarwal
     netid: arnava4
@@ -194,7 +198,65 @@ $$
 \end{align}
 $$
-Hence, this means that orthogonal matrices have optimal conditioning.
+Hence, this means that orthogonal matrices have optimal conditioning. The converse is also true.
+
+**Proof** +
+
+
+First, I will prove three lemmas (all norms here are 2-norms):
+
+1. The transpose doesn't change the matrix norm. Proof:
+
+   $$
+   \begin{aligned}[t]
+   \| \mathbf{A} \| & = \max_{\mathbf{x}} \frac{\| \mathbf{A} \mathbf{x} \|}{\| \mathbf{x} \|} \\
+   & = \max_{\mathbf{x}, \mathbf{y}} \frac{| \mathbf{y}^\mathsf{T} \mathbf{A} \mathbf{x} |}{\| \mathbf{y} \| \| \mathbf{x} \|} \\
+   & = \max_{\mathbf{x}, \mathbf{y}} \frac{| \left( \mathbf{y}^\mathsf{T} \mathbf{A} \mathbf{x} \right)^\mathsf{T} |}{\| \mathbf{y} \| \| \mathbf{x} \|} \\
+   & = \max_{\mathbf{x}, \mathbf{y}} \frac{| \mathbf{x}^\mathsf{T} \mathbf{A}^\mathsf{T} \mathbf{y} |}{\| \mathbf{y} \| \| \mathbf{x} \|} \\
+   & = \max_{\mathbf{y}} \frac{\| \mathbf{A}^\mathsf{T} \mathbf{y} \|}{\| \mathbf{y} \|} \\
+   & = \|\mathbf{A}^\mathsf{T}\| \\
+   \end{aligned}
+   $$
+
+2. The transpose of the inverse is the inverse of the transpose. Proof: $$\mathbf{I} = \mathbf{I}^\mathsf{T} = (\mathbf{A}^{-1} \mathbf{A})^\mathsf{T} = \mathbf{A}^\mathsf{T} (\mathbf{A}^{-1})^\mathsf{T}$$. By the definition of the inverse, $$(\mathbf{A}^{-1})^\mathsf{T} = (\mathbf{A}^\mathsf{T})^{-1}$$.
+
+3. The transpose doesn't change the condition number. Proof:
+
+   $$
+   \begin{aligned}[t]
+   \mathrm{cond}(\mathbf{A}) & = \|\mathbf{A}\| \|\mathbf{A}^{-1}\| \\
+   & = \|\mathbf{A}^\mathsf{T}\| \|(\mathbf{A}^{-1})^\mathsf{T}\| \\
+   & = \|\mathbf{A}^\mathsf{T}\| \|(\mathbf{A}^\mathsf{T})^{-1}\| \\
+   & = \mathrm{cond}(\mathbf{A}^\mathsf{T}) \\
+   \end{aligned}
+   $$
+
+On to the main result. Given $$\mathrm{cond}(\mathbf{A}) = 1$$, I will show in three steps that the columns of $$\mathbf{A}$$ are mutually orthogonal and all have norm $$\|\mathbf{A}\|$$, i.e., $$\mathbf{A}$$ is a scalar multiple of an orthogonal matrix.
+
+1. Let $$m := \|\mathbf{A}\|$$. In general, $$m$$ is only the _maximum_ scale factor, but when $$\mathrm{cond}(\mathbf{A}) = 1$$, _every_ input is scaled by exactly $$m$$. Assume for the sake of contradiction that some nonzero input, call it $$\mathbf{x}$$, is scaled by _less_ than $$m$$, i.e., $$\| \mathbf{A} \mathbf{x} \| < m \| \mathbf{x} \|$$. However, $$\mathbf{A}^{-1}$$ can only scale its input by a factor not greater than $$\|\mathbf{A}^{-1}\| = 1 / \|\mathbf{A}\| = 1/m$$ (because $$\|\mathbf{A}\| \|\mathbf{A}^{-1}\| = 1$$), which is not enough to fully undo the shrinkage: $$\|\mathbf{x}\| = \| \mathbf{A}^{-1} (\mathbf{A} \mathbf{x}) \| \leq \| \mathbf{A}^{-1} \| \| \mathbf{A} \mathbf{x} \| < \frac{1}{m} \cdot m \| \mathbf{x} \| = \| \mathbf{x} \|$$, which is a contradiction.
+
+2. We will need the norm of the $$i$$th column of $$\mathbf{A}$$, denoted $$\mathbf{A}_i$$. Let $$\mathbf{e}_i$$ be the $$i$$th standard basis vector, $$[0, \dotsc, 0, 1, 0, \dotsc, 0].$$ By step 1, $$\| \mathbf{A} \mathbf{e}_i \| = m \|\mathbf{e}_i\| = m$$, but $$\mathbf{e}_i$$ is also a "getter" for the $$i$$th column, i.e., $$\mathbf{A} \mathbf{e}_i = \mathbf{A}_i$$, so $$\|\mathbf{A}_i\| = m$$.
+
+3. To show orthogonality, I will show that every column dotted with every _other_ column is zero (i.e., the columns are orthogonal). Note that $$ \mathbf{A}_{i,j} $$ denotes the entry in the $$i$$th row and $$j$$th column of $$\mathbf{A}$$, with rows and columns indexed from $$0$$ to $$n-1$$. Let us consider $$ \|\mathbf{A}^\mathsf{T}\mathbf{A}_i \| $$, which we will see contains $$\mathbf{A}_i$$ dotted with every column of $$\mathbf{A}$$.
+
+   $$
+   \begin{aligned}[t]
+   \| \mathbf{A}^\mathsf{T} \mathbf{A}_i \| & = \left\| \left[\sum_{k=0}^{n-1} \mathbf{A}^\mathsf{T}_{0,k} \mathbf{A}_{k,i},\quad \sum_{k=0}^{n-1} \mathbf{A}^\mathsf{T}_{1,k} \mathbf{A}_{k,i}, \quad \dotsb \right] \right\| \\
+   & = \left\| \left[\sum_{k=0}^{n-1} \mathbf{A}_{k,0} \mathbf{A}_{k,i}, \quad \sum_{k=0}^{n-1} \mathbf{A}_{k,1} \mathbf{A}_{k,i}, \quad \dotsb \right] \right\| \\
+   & = \| [\mathbf{A}_0 \cdot \mathbf{A}_i, \quad \mathbf{A}_1 \cdot \mathbf{A}_i, \quad \dotsb] \| \\
+   & = \sqrt{\sum_{j=0}^{n-1} (\mathbf{A}_j \cdot \mathbf{A}_i)^2} \\
+   & = \sqrt{(\mathbf{A}_0 \cdot \mathbf{A}_i)^2 + (\mathbf{A}_1 \cdot \mathbf{A}_i)^2 + \dotsb + (\mathbf{A}_i \cdot \mathbf{A}_i)^2 + \dotsb + (\mathbf{A}_{n-1} \cdot \mathbf{A}_i)^2} \\
+   & = \sqrt{(\mathbf{A}_0 \cdot \mathbf{A}_i)^2 + (\mathbf{A}_1 \cdot \mathbf{A}_i)^2 + \dotsb + m^4 + \dotsb + (\mathbf{A}_{n-1} \cdot \mathbf{A}_i)^2} \\
+   \end{aligned}
+   $$
+
+   On the other hand, $$\mathbf{A}^\mathsf{T}$$ also has condition number 1 (lemma 3) and norm $$m$$ (lemma 1), so step 1 applies to it as well. Together with step 2, this gives $$\| \mathbf{A}^\mathsf{T} \mathbf{A}_i \| = \| \mathbf{A}^\mathsf{T} \| \| \mathbf{A}_i \| = m^2$$, but we just got $$\sqrt{m^4 + \textrm{other non-negative terms}}$$. Therefore all of the other non-negative terms in the radical have to be 0, i.e., $$\mathbf{A}_i \cdot \mathbf{A}_j = 0$$ when $$i \neq j$$. The columns are orthogonal, and each has norm $$m$$, so $$\mathbf{A} / m$$ is an orthogonal matrix (when $$m = 1$$, $$\mathbf{A}$$ itself is orthogonal).
+
+
+
+
+
 ### Things to Remember About Condition Numbers
 * For any matrix $${\bf A}$$, $$\text{cond}({\bf A}) \geq 1.$$
@@ -382,4 +444,4 @@ Then, using the rule of thumb, we know the entries in \(\hat{\boldsymbol{x}}\) w
  • When solving a linear system \({\bf A}\mathbf{x} = \mathbf{b}\), does a small residual guarantee an accurate result?
  • Consider solving a linear system \({\bf A}\mathbf{x} = \mathbf{b}\). When does Gaussian elimination with partial pivoting produce a small residual?
  • How does the condition number of a matrix \({\bf A}\) relate to the condition number of \({\bf A}^{-1}\)?
  • - \ No newline at end of file +
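
The proof added to `notes/condition.md` can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the patch: the helper names (`matmul`, `transpose`, `cond2_2x2`) and the test matrices are assumptions chosen for illustration. It computes the 2-norm condition number of a 2x2 matrix from the eigenvalues of $$\mathbf{A}^\mathsf{T}\mathbf{A}$$ and checks that a scaled rotation (columns orthogonal, each of norm $$m$$) has condition number 1, while a shear does not.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def cond2_2x2(a):
    """2-norm condition number of an invertible 2x2 matrix.

    The singular values of A are the square roots of the eigenvalues of
    the symmetric matrix A^T A, which has a closed form in the 2x2 case.
    """
    g = matmul(transpose(a), a)
    half_trace = (g[0][0] + g[1][1]) / 2
    # Distance of each eigenvalue from the mean of the two eigenvalues.
    spread = math.hypot((g[0][0] - g[1][1]) / 2, g[0][1])
    lam_max, lam_min = half_trace + spread, half_trace - spread
    return math.sqrt(lam_max / lam_min)


# Hypothetical example: a rotation by theta, scaled by m.
theta, m = 0.7, 3.0
c, s = math.cos(theta), math.sin(theta)
scaled_rotation = [[m * c, -m * s], [m * s, m * c]]

# A^T A holds the dot products of every pair of columns.
gram = matmul(transpose(scaled_rotation), scaled_rotation)
cond_scaled = cond2_2x2(scaled_rotation)

# A shear has non-orthogonal columns, so its condition number exceeds 1.
shear = [[1.0, 1.0], [0.0, 1.0]]
cond_shear = cond2_2x2(shear)

print("column dot products:", gram)
print("cond(scaled rotation):", cond_scaled)
print("cond(shear):", cond_shear)
```

The off-diagonal entries of `gram` are the dot products between distinct columns, and the diagonal entries are the squared column norms, so the scaled rotation illustrates the structure the proof derives: a matrix with condition number 1 is $$m$$ times an orthogonal matrix.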