Lecture 33: Least Squares, Adjoint Maps
We're going to use the projection we studied in the previous lecture to study systems of linear equations that are inconsistent. So consider the system of equations \(Ax = b\), where \(A\) is an \(m \times n\) matrix, \(x \in \mathbf{R}^n\) and \(b \in \mathbf{R}^m\),
and suppose that this system is inconsistent. This means that \(b\) can't be written as \(Ax\) for any \(x\), so \(b \not\in \{Ax \ | \ x \in \mathbf{R}^n\}\). What is this set? It is the range of the linear map \(L_A\), the set of images of \(L_A\), which we write \(R(L_A)\). But this is also the column space of \(A\), and so \(b \not\in Col(A)\).
From theorem 2, however, we know that there is a vector in the column space of \(A\) that is closest to \(b\) (with respect to the standard inner product). This closest vector has the form \(Ax_0\) for some \(x_0 \in \mathbf{R}^n\) and satisfies \(\|Ax_0 - b\| \leq \|Ax - b\|\) for all \(x \in \mathbf{R}^n\).
Note here that \(Ax_0\) is unique but \(x_0\) need not be. Such an \(x_0\) is called a least squares approximate solution to \(Ax = b\). So how do we find \(x_0\)? We could compute \(\text{proj}_{Col(A)}(b)\) directly by finding an orthonormal basis for \(Col(A)\) using Gram-Schmidt, but that is a computationally intensive process. It turns out there is an alternative way to find this vector. Formally, we have the following theorem.
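In the form used in the proof below (stated here over \(\mathbf{R}\); the general case replaces \(A^t\) with \(A^*\)), theorem 3 asserts: if \(\text{rank}(A) = n\), then \(A^tA\) is invertible, the least squares approximate solution is unique, and it is given by
\[
x_0 = (A^tA)^{-1}A^t b ,
\]
equivalently, \(x_0\) is the unique solution of the so-called normal equations \(A^tA x_0 = A^t b\).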
But why is this true? To prove this theorem we need a few other definitions and lemmas first.
Adjoint Linear Maps
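Recall the defining property of the adjoint: for a linear map \(T : V \to W\) between inner product spaces, the adjoint \(T^* : W \to V\) is the linear map satisfying
\[
\langle T(v), w \rangle_W = \langle v, T^*(w) \rangle_V \quad \text{for all } v \in V, \ w \in W .
\]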
Note here that the inner product on the left is the inner product of \(W\) but the one on the right is the inner product of \(V\).
Example
The matrix
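For a matrix \(A\) with entries in \(\mathbf{F}\), the matrix \(A^*\) denotes the conjugate transpose \(\bar{A}^t\). As an illustration (this particular matrix is just an assumed example, not necessarily the one from lecture),
\[
A = \begin{pmatrix} 1 & i \\ 0 & 2 \end{pmatrix}
\qquad \Longrightarrow \qquad
A^* = \bar{A}^t = \begin{pmatrix} 1 & 0 \\ -i & 2 \end{pmatrix} .
\]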
Now we are ready to prove the first result (lemma 1) that we need in order to prove the theorem we introduced earlier (theorem 3): for any \(A \in M_{m \times n}(\mathbf{F})\), \((L_A)^* = L_{A^*}\).
Proof
In \(\mathbf{F}^n\), we're going to re-write the standard inner product as \(\langle x, y \rangle = y^*x\), where \(y^*\) denotes the conjugate transpose of the column vector \(y\).
With this observation, we are now ready to prove \((L_A)^* = L_{A^*}\). Specifically, we want to show that \(\langle Ax, y \rangle = \langle x, A^*y \rangle\) for all \(x \in \mathbf{F}^n\) and \(y \in \mathbf{F}^m\).
Expanding the right hand side:
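One way to carry out this computation, using \(\langle u, v \rangle = v^*u\) together with \((A^*)^* = A\):
\[
\langle x, A^*y \rangle = (A^*y)^* x = y^* (A^*)^* x = y^* (Ax) = \langle Ax, y \rangle .
\]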
This is great, but we still need another result (lemma 2) before we are ready to prove theorem 3: for any \(A \in M_{m \times n}(\mathbf{F})\), \(\text{rank}(A^*A) = \text{rank}(A)\).
Proof
The dimension theorem applied to \(A\) implies that
\[
\text{rank}(A) + \dim N(A) = n .
\]
Applying the dimension theorem to \(A^*A\), which is an \(n \times n\) matrix, we see that
\[
\text{rank}(A^*A) + \dim N(A^*A) = n .
\]
The right hand sides of these two equations are both \(n\), so their left hand sides are equal. Since we want to prove that \(\text{rank}(A^*A) = \text{rank}(A)\), it therefore suffices to show that \(N(A) = N(A^*A)\).
To show that \(N(A) = N(A^*A)\), we want to show that \(N(A) \subseteq N(A^*A)\) and \(N(A^*A) \subseteq N(A)\). The first inclusion should be clear: if \(Ax = 0\), then \(A^*Ax = A^*0 = 0\), so \(x \in N(A^*A)\). Next, we will show that \(N(A^*A) \subseteq N(A)\). So suppose that \(x \in N(A^*A)\), i.e. \(A^*Ax = 0\).
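The step is completed with the same identity \(\langle u, v \rangle = v^*u\) as before:
\[
0 = \langle A^*Ax, x \rangle = x^*A^*Ax = (Ax)^*(Ax) = \|Ax\|^2 ,
\]
so \(Ax = 0\) and \(x \in N(A)\). This gives \(N(A^*A) \subseteq N(A)\), and hence \(N(A) = N(A^*A)\), which proves the lemma.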
Theorem 3 Proof
We're going to set \(\mathbf{F} = \mathbf{R}\). Therefore, \(A^* = A^t\). We are given that \(\text{rank}(A) = n\). We want to show that the least squares approximate solution is
\[
x_0 = (A^tA)^{-1}A^t b .
\]
This is what theorem 3 is asserting: the solution is unique and given by the above formula. One thing we see immediately is that \(\text{rank}(A) = n\) implies \(\text{rank}(A^tA) = n\) by lemma 2, which means that \(A^tA\) is invertible, so the formula above makes sense.
We also know that \(\text{proj}_{Col(A)}b\) is the unique vector \(w\) from theorem 1. In theorem 1, we asserted that every vector \(x\) (here it is \(b\)) can be decomposed into two components \(w \in W\) and \(z \in W^{\perp}\) such that \(b = w + z\); here \(W = Col(A)\). We have \(w = \text{proj}_{Col(A)}b = Ax_0\) and \(z = b - w = b - Ax_0\), and what we need is that \(z\) is orthogonal to the column space of \(A\), i.e. \(z \in (Col(A))^{\perp}\).
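Since the columns of \(A\) span \(Col(A)\), the condition \(z \perp Col(A)\) is equivalent to \(A^tz = 0\), and so (a sketch of how the argument concludes; the lecture's own write-up may arrange the steps differently)
\[
A^t(b - Ax_0) = 0
\quad \Longleftrightarrow \quad
A^tA x_0 = A^t b
\quad \Longleftrightarrow \quad
x_0 = (A^tA)^{-1}A^t b ,
\]
where the last equivalence uses the invertibility of \(A^tA\). Thus \(x_0 = (A^tA)^{-1}A^tb\) is exactly the vector with \(Ax_0 = \text{proj}_{Col(A)}(b)\), and it is unique.

As a quick numerical sanity check of this formula (the matrix \(A\) and vector \(b\) below are made-up examples, not from the lecture):

```python
import numpy as np

# A 3x2 matrix with rank(A) = n = 2, and a b that is not in Col(A),
# so the system Ax = b is inconsistent.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Least squares solution via the formula x0 = (A^t A)^{-1} A^t b,
# computed by solving the system (A^t A) x0 = A^t b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's built-in least squares routine.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x0, x_ls)            # both give the same least squares solution

# The residual z = b - A x0 is orthogonal to Col(A): A^t z should be ~0.
print(A.T @ (b - A @ x0))
```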
References
- Math416 by Ely Kerman