Lecture 26/27: Markov Chains and Transition Matrices
We have a set of ingredients:
- Set of states \(S_1,...,S_n\). Example: if 10 million people live in Chicagoland, we can divide them into \(S_1\), living in the city, and \(S_2\), living in a suburb
- Time step. Example: we are interested in how the number of people in these two states changes over time, so a time step could be 1 year
- Probability vector at time \(k\)
$$ \begin{align*} \bar{p}_k = \begin{pmatrix} (\bar{p}_k)_1 \\ \vdots \\ (\bar{p}_k)_n \end{pmatrix} \end{align*} $$where \((\bar{p}_k)_1\) is the probability of being in state 1 at time \(k\). In general \((\bar{p}_k)_j \geq 0\) and \(\sum_{j=1}^n (\bar{p}_k)_j = 1\). For example$$ \begin{align*} \bar{p}_0 = \begin{pmatrix} .7 \\ .3 \end{pmatrix} \end{align*} $$
- Transition matrix \(A \in M_{n \times n}\)
$$ \begin{align*} A = \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & A_{ij} & \vdots \\ A_{n1} & \cdots & A_{nn} \end{pmatrix} \end{align*} $$where \(A_{ij}\) is the probability of moving from state \(j\) to state \(i\) in one time step. This transition matrix does not change over time. For example$$ \begin{align*} A = \begin{pmatrix} .9 & .02 \\ .1 & .98 \end{pmatrix} \end{align*} $$Here \(A_{ij} \geq 0\) and \(\sum_{i=1}^nA_{ij} = 1\), i.e. each column sums to 1. So \(A_{12} = 0.02\) is the probability of moving from state 2 (suburb) to state 1 (city).
The Limit of a Markov Matrix
So \(\bar{p}_0\) is the probability vector at time step 0. What are \(\bar{p}_1, \bar{p}_2, \dots\)? Each time step is one application of \(A\): \(\bar{p}_{k+1} = A\bar{p}_k\), and therefore \(\bar{p}_k = A^k\bar{p}_0\).
What is \(\lim\limits_{k \rightarrow \infty} A^k\)? This limit often exists and is easy to describe.
Exercise: \(A^d\) is also a transition matrix!
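As a quick numerical sanity check (not from the lecture; a minimal sketch assuming NumPy and the example values above), we can step the chain and spot-check the exercise:

```python
import numpy as np

# Example transition matrix: A[i, j] = P(moving from state j to state i).
# Entries are nonnegative and each *column* sums to 1.
A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# Initial probability vector: 70% city (state 1), 30% suburb (state 2).
p = np.array([0.7, 0.3])

# One time step is one matrix-vector product: p_{k+1} = A p_k.
for k in range(1, 4):
    p = A @ p
    print(f"p_{k} = {p}")  # entries stay nonnegative and sum to 1

# Exercise spot-check: A^d also has columns summing to 1.
print(np.linalg.matrix_power(A, 5).sum(axis=0))  # -> [1. 1.]
```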
What we want to know is this limit, so we will spend some time studying the properties of the transition matrix. Theorem: suppose \(A\) is a transition matrix with \(A_{ij} > 0\) for all \(i,j\). Then:
- (a) 1 is an eigenvalue of \(A\), \(\dim(E_1) = 1\), and \(E_1 = \operatorname{span}\{\bar{u}\}\) where \(\bar{u}\) is a probability vector
- (b) Any other eigenvalue \(\lambda\) satisfies \(|\lambda| < 1\)
- (c) \(\lim\limits_{k \rightarrow \infty} A^k = \begin{pmatrix} \bar{u} & \bar{u} & \cdots & \bar{u} \end{pmatrix}\), the matrix whose every column is \(\bar{u}\)
As a consequence, for any probability vector \(\bar{p}\), \(\lim\limits_{k \rightarrow \infty} A^k\bar{p} = \begin{pmatrix} \bar{u} & \bar{u} & \cdots & \bar{u} \end{pmatrix}\bar{p} = (\bar{p})_1\bar{u} + \cdots + (\bar{p})_n\bar{u} = \bar{u}\), since the entries of \(\bar{p}\) sum to 1.
Part (c) is saying that just knowing the eigenvector corresponding to eigenvalue 1, we can predict the future: no matter what initial probability vector we are given, as \(k\) approaches infinity \(A^k\) tends to \(\begin{pmatrix} \bar{u} & \bar{u} & \cdots & \bar{u} \end{pmatrix}\), and the product of this matrix with any probability vector is just \(\bar{u}\)!
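A small numerical illustration of (c) under the same assumptions (NumPy, the example \(A\)): for large \(k\), every column of \(A^k\) approaches the same probability vector, and \(A^k\bar{p}\) forgets \(\bar{p}\).

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# For large k, both columns of A^k approach the same probability vector u.
L = np.linalg.matrix_power(A, 500)
print(L)  # both columns ~ [1/6, 5/6]

# Consequently A^k p approaches u for *any* probability vector p.
for p in (np.array([1.0, 0.0]), np.array([0.2, 0.8])):
    print(L @ p)  # ~ [0.1667, 0.8333] both times
```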
Proof
We’re going to prove these statements in several parts.
1 is an eigenvalue of \(A\):
FACT: \(A\) and \(A^t\) have the same eigenvalues. Why? Because \(\det(B) = \det(B^t)\) for any square matrix \(B\), so \(\det(A - tI_n) = \det((A - tI_n)^t) = \det(A^t - tI_n)\): both matrices have the same characteristic polynomial. We will use this fact to prove what we want.
Consider \(A^t\). Its rows now add to 1 (in \(A\), the columns add to 1). Let \(\bar{v} = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^t\). Now,$$ \begin{align*} (A^t\bar{v})_i = \sum_{j=1}^n (A^t)_{ij} \cdot 1 = \sum_{j=1}^n A_{ji} = 1, \quad \text{so} \quad A^t\bar{v} = \bar{v}. \end{align*} $$
The last step is true because in \(A^t\) each row sums to 1. Therefore, \(\bar{v}\) is an eigenvector of \(A^t\) with eigenvalue 1. But \(A\) and \(A^t\) have the same eigenvalues, so 1 is an eigenvalue of \(A\).
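The same computation in code (a sketch with NumPy and the example \(A\)):

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# Rows of A^t sum to 1, so A^t fixes the all-ones vector: v is an
# eigenvector of A^t with eigenvalue 1, hence 1 is an eigenvalue of A too.
v = np.ones(2)
print(A.T @ v)  # -> [1. 1.]
```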
If \(\lambda\) is an eigenvalue of \(A\), then \(|\lambda| \leq 1\):
Since \(\lambda\) is an eigenvalue of \(A\), \(\lambda\) is an eigenvalue of \(A^t\). By definition this means$$ \begin{align*} A^t v = \lambda v \quad \text{for some } v \neq \bar{0}. \end{align*} $$
So \(v\) is an eigenvector of \(A^t\). Let \(k\) be an index with \(|v_k| = \max_j \{|v_j|\}\), so \(|v_k|\) is the largest absolute value of an entry of \(v\); note that \(|v_k| > 0\) since \(v \neq \bar{0}\).
Now, let’s look at the \(k\)th entry of \(A^tv\). We’re going to compute it in two different ways. First, by the definition of matrix-vector multiplication,$$ \begin{align*} (A^t v)_k = \sum_{j=1}^n (A^t)_{kj} v_j = \sum_{j=1}^n A_{jk} v_j. \end{align*} $$
Another way to compute this is by the definition of an eigenvector:$$ \begin{align*} (A^t v)_k = (\lambda v)_k = \lambda v_k. \end{align*} $$
Combining both equations,$$ \begin{align*} |\lambda||v_k| = \left| \sum_{j=1}^n A_{jk} v_j \right| \leq \sum_{j=1}^n A_{jk} |v_j| \leq \sum_{j=1}^n A_{jk} |v_k| = |v_k|. \end{align*} $$Dividing by \(|v_k| > 0\) gives \(|\lambda| \leq 1\).
As required.
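Again as a numerical check (NumPy, example \(A\)): every eigenvalue lies in the closed unit disk.

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# All eigenvalues of a transition matrix satisfy |lambda| <= 1.
eigvals = np.linalg.eigvals(A)
print(eigvals)                               # 1 and 0.88, in some order
print(np.all(np.abs(eigvals) <= 1 + 1e-12))  # -> True
```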
If \(A_{ij} > 0\) for all \(i,j\), then \(\dim E_1 = 1\) and \(\lambda \neq 1 \implies |\lambda| < 1\):
Suppose \(|\lambda| = 1\) and \(A^tv = \lambda v\) for some \(v \neq \bar{0}\). It suffices to show that \(v\) is a scalar multiple of \(\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^t\).
Why? If the eigenvector is \(\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^t\), then \(\lambda\) must be 1, because$$ \begin{align*} \lambda \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = A^t \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \end{align*} $$
so \(\lambda = 1\). So we argue as before: with \(k\) chosen so that \(|v_k| = \max_j \{|v_j|\}\),$$ \begin{align*} |v_k| = |\lambda||v_k| = \left| \sum_{j=1}^n A_{jk} v_j \right| \leq \sum_{j=1}^n A_{jk} |v_j| \leq \sum_{j=1}^n A_{jk} |v_k| = |v_k|. \end{align*} $$
So these inequalities are all equalities. Equality in the triangle inequality means the terms \(A_{jk}v_j\) all have the same sign, and equality in the second step, with every \(A_{jk} > 0\), forces \(|v_j| = |v_k|\) for all \(j\). Together these give \(v_j = v_k\) for all \(j\), so \(v\) is a scalar multiple of \(\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^t\), and hence \(\lambda = 1\) by the computation above. This also shows that the eigenspace of \(A^t\) for \(\lambda = 1\) is the line spanned by \(\begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^t\); since \(\operatorname{rank}(A - I_n) = \operatorname{rank}(A^t - I_n)\), we get \(\dim E_1 = 1\) for \(A\) as well.
Instead of proving (c) in general, we will next prove a slightly weaker theorem in which we assume that \(A\) is diagonalizable!
Diagonalizable Transition Matrix
Proof
We will assume the statements we proved in the previous theorem: 1 is an eigenvalue, \(\dim(E_1)=1\), and any other \(\lambda\) has \(|\lambda| < 1\). On top of this, we can now also assume that \(A\) is diagonalizable. This means we can write \(A\) as$$ \begin{align*} A = QDQ^{-1}, \quad D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \end{align*} $$where the columns of \(Q\) are eigenvectors of \(A\).
Order the eigenvalues so that \(\lambda_1=1\); every other eigenvalue has absolute value less than one.
We know we can compute \(A^k\) easily using$$ \begin{align*} A^k = QD^kQ^{-1} = Q \begin{pmatrix} \lambda_1^k & & \\ & \ddots & \\ & & \lambda_n^k \end{pmatrix} Q^{-1}. \end{align*} $$
We can now apply the limit. Since \(\lambda_1^k = 1\) and \(\lambda_j^k \rightarrow 0\) for \(j \geq 2\),$$ \begin{align*} L := \lim\limits_{k \rightarrow \infty} A^k = Q \begin{pmatrix} 1 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{pmatrix} Q^{-1}. \end{align*} $$
What do we know about \(L\)? What happens if we multiply \(A\) by \(L\)?$$ \begin{align*} AL = A \lim\limits_{k \rightarrow \infty} A^k = \lim\limits_{k \rightarrow \infty} A^{k+1} = L. \end{align*} $$
We know that \(L\) is a limit of transition matrices, so it is a transition matrix; in particular, the sum of the entries in each column of \(L\) is 1. Moreover, we saw that \(AL = L\). If \(L\) were a vector, it would be an eigenvector with \(\lambda = 1\). But \(L\) is not a vector; it is a matrix with column vectors \(\bar{l}_1,\dots,\bar{l}_n\). Applying the column-by-column definition of matrix multiplication,$$ \begin{align*} AL = \begin{pmatrix} A\bar{l}_1 & \cdots & A\bar{l}_n \end{pmatrix} = \begin{pmatrix} \bar{l}_1 & \cdots & \bar{l}_n \end{pmatrix} = L. \end{align*} $$
From this we see that \(A\bar{l}_1 = \bar{l}_1, A\bar{l}_2 = \bar{l}_2, \dots\). So each column vector \(\bar{l}_j\) is in fact an eigenvector of \(A\) with eigenvalue 1. But the eigenspace of \(\lambda=1\) is one-dimensional, so all of \(\bar{l}_1,...,\bar{l}_n\) lie in \(E_1 = \operatorname{span}\{\bar{u}\}\). Therefore \(\bar{l}_j = c_j\bar{u}\) for some scalar \(c_j\).
We also know that the sum of the entries of \(\bar{l}_j\) is 1, and the sum of the entries of \(\bar{u}\) is also 1. Summing the entries on both sides of \(\bar{l}_j = c_j\bar{u}\) gives \(1 = c_j \cdot 1\), so \(c_j = 1\), \(\bar{l}_j = \bar{u}\), and we can write$$ \begin{align*} \lim\limits_{k \rightarrow \infty} A^k = L = \begin{pmatrix} \bar{u} & \bar{u} & \cdots & \bar{u} \end{pmatrix}. \end{align*} $$
Additionally, since \(\bar{l}_j\) is a probability vector, \(\bar{u}\) is a probability vector as well.
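The whole diagonalization argument can be traced numerically (a sketch assuming NumPy; `np.linalg.eig` returns the eigenvalues and an eigenvector matrix \(Q\)):

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# Diagonalize: columns of Q are eigenvectors, lams holds the eigenvalues.
lams, Q = np.linalg.eig(A)

# A^k = Q D^k Q^{-1}: every eigenvalue with |lambda| < 1 dies off as k
# grows, leaving only the lambda = 1 direction.
for k in (1, 10, 200):
    Ak = Q @ np.diag(lams**k) @ np.linalg.inv(Q)
    print(f"A^{k} =\n{Ak}")  # converges to the matrix (u u)
```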
Example
Continuing the same example from before, where$$ \begin{align*} A = \begin{pmatrix} .9 & .02 \\ .1 & .98 \end{pmatrix}. \end{align*} $$
We want to check if \(A\) is diagonalizable. In fact it is: the characteristic polynomial is \(t^2 - 1.88t + .88 = (t-1)(t-.88)\), with eigenvector \(\begin{pmatrix} 1 \\ 5 \end{pmatrix}\) for \(\lambda_1 = 1\) and \(\begin{pmatrix} 1 \\ -1 \end{pmatrix}\) for \(\lambda_2 = .88\), so$$ \begin{align*} A = QDQ^{-1} = \begin{pmatrix} 1 & 1 \\ 5 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & .88 \end{pmatrix} \begin{pmatrix} \frac{1}{6} & \frac{1}{6} \\ \frac{5}{6} & -\frac{1}{6} \end{pmatrix}. \end{align*} $$
To compute \(\lim A^k\), we know from the previous theorem that the answer has two copies of \(\bar{u}\), the probability eigenvector for \(\lambda = 1\). The eigenvector for 1 is the first column of \(Q\), namely \(\begin{pmatrix} 1 \\ 5 \end{pmatrix}\); dividing by the sum of its entries gives \(\bar{u} = \begin{pmatrix} \frac{1}{6} \\ \frac{5}{6} \end{pmatrix}\), and so$$ \begin{align*} \lim\limits_{k \rightarrow \infty} A^k = \begin{pmatrix} \bar{u} & \bar{u} \end{pmatrix} = \begin{pmatrix} \frac{1}{6} & \frac{1}{6} \\ \frac{5}{6} & \frac{5}{6} \end{pmatrix} \end{align*} $$
and no matter what our initial probability vector \(\bar{p}\) is,$$ \begin{align*} \lim\limits_{k \rightarrow \infty} A^k\bar{p} = \begin{pmatrix} \frac{1}{6} & \frac{1}{6} \\ \frac{5}{6} & \frac{5}{6} \end{pmatrix} \bar{p} = \begin{pmatrix} \frac{1}{6} \\ \frac{5}{6} \end{pmatrix}. \end{align*} $$
\(\frac{1}{6}\) corresponds to state 1 and \(\frac{5}{6}\) corresponds to state 2: in the long run, one sixth of the population lives in the city and five sixths live in a suburb, regardless of the initial distribution.
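Finally, a check of the example's numbers (same assumptions as the earlier sketches): normalize the \(\lambda = 1\) eigenvector to a probability vector and compare with a large matrix power.

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

# Eigenvector for lambda = 1, rescaled so its entries sum to 1,
# which makes it the probability vector u from the theorem.
lams, Q = np.linalg.eig(A)
u = Q[:, np.argmax(lams.real)]
u = u / u.sum()
print(u)  # -> [0.1667 0.8333] = (1/6, 5/6)

# Both columns of A^k converge to u.
print(np.linalg.matrix_power(A, 500))
```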
References
- Math416 by Ely Kerman