Absorbing Markov Chains

Lucas Mentch

Mathematical models often serve as a primary tool for making informed decisions. Markov
Chains, specifically absorbing Markov chains, are often useful in constructing
a mathematical model of a situation involving experiments with multiple
outcomes where the outcome of a given trial depends only on the outcome of the
previous trial. Consider the following
example involving typical human behavior.

Suppose we are at an amusement park. After paying top dollar to get into the park,
we are ready to get on some rides.
Suppose that in this park, there are only three rides: a roller coaster, a water ride, and a boring
ride. The line for the roller coaster is
always very long, the line for the water ride is always moderately long, and
the line for the boring ride is, predictably, always short. Since we don’t want to wait in line all day,
we are unlikely to ride the roller coaster several times in a row, but we still
want to have fun, so we can’t ride the boring ride all day. The wait for the ride we are currently on
will affect which ride we want to get on next.
That is, given that we are on a certain ride, the next ride we will get
on depends only on the type of ride we are currently riding and no other previously
ridden rides.

When the outcome of a given experiment can affect the outcome
of the next experiment, the process can be referred to as a Markov chain, or
Markov process^{1}. In the
amusement park example above, the ride we are currently on can affect the ride
that we will get on next, and thus we can regard this process as a Markov
chain. It is important to realize that
in a Markov process, the outcome of the next experiment is not affected by any
outcome other than the outcome of the current experiment.

In a Markov process, we consider a set of *states* and the ways in which we can move
from one state to another. The Markov process begins in some initial state and continues by moving from one state to another. Each move from one state to
another is called a *step*. Given some set of states $S = \{s_1, s_2, \ldots, s_n\}$, if we are currently in some state $s_i$, the probability that our next step will be to another state $s_j$ is given by a probability denoted $p_{ij}$.
These probabilities are called *transition
probabilities*. Since, in a Markov
process, the outcome of
the next experiment is not affected by any outcome other than the outcome of
the current experiment, the
probability of moving from some state $s_i$ to another state $s_j$ depends only on the current state, not on any previous state of the process. Observe that the process can remain in the current state with probability $p_{ii}$.
To start the Markov process, in addition to specifying a particular
starting or initial state, we also need to specify an initial probability
distribution. We can collect the transition probabilities for all of the states in a square matrix called a *transition matrix*.

Let’s return to our amusement park example. Recall that there are 3 total rides in the
park: one roller coaster, one water
ride, and one boring ride. We can treat
each ride as a state, and thus our set of states (rides) is $\{R, W, B\}$, where **R** denotes the roller coaster, **W** denotes the water ride, and **B** denotes the boring ride. If we are on a given ride, then
our next ride will always be one of the three rides at the park. Suppose that if we are on the roller coaster,
then our next ride will be the roller coaster with probability 0.1, the water
ride with probability 0.4, or the boring ride with probability 0.5. Notice that since we are always going to want
to ride another ride and the roller coaster, water ride, and boring ride are
the only rides in the park, our probabilities of next getting on one of these
rides must add up to 1. Now suppose that
if we are on the water ride, then our next ride will be the roller coaster with
probability 0.2, the water ride with probability 0.6, or the boring ride with
probability 0.2. Finally, if we are on
the boring ride, then our next ride will be a roller coaster with probability
0.35, the water ride with probability 0.35, or the boring ride again with
probability 0.3. Our transition matrix
is then

$$
\mathbf{P} =
\begin{array}{c|ccc}
  & R & W & B \\ \hline
R & 0.1 & 0.4 & 0.5 \\
W & 0.2 & 0.6 & 0.2 \\
B & 0.35 & 0.35 & 0.3
\end{array}
$$

This transition matrix makes it easy to see the probability
of moving from one state (ride) to another.
Letting the row of the matrix represent the ride we are currently on and the column of the matrix represent the ride we could next be on, we see, for example, that the probability of moving from the water ride to the boring ride is 0.2. That is, $p_{WB} = 0.2$.
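As a quick illustration, the transition matrix can be stored and queried in a few lines of Python (a sketch; the names `STATES`, `P`, and `transition_prob` are my own, not from the source):

```python
# Transition matrix for the amusement-park example.
# Rows = current ride, columns = next ride, in the order R, W, B.
STATES = ["R", "W", "B"]
P = [
    [0.10, 0.40, 0.50],  # from the roller coaster
    [0.20, 0.60, 0.20],  # from the water ride
    [0.35, 0.35, 0.30],  # from the boring ride
]

def transition_prob(current, nxt):
    """Probability of moving from ride `current` to ride `nxt` in one step."""
    return P[STATES.index(current)][STATES.index(nxt)]

# Every row must sum to 1: we always get on some ride next.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
print(transition_prob("W", "B"))  # 0.2
```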

With this transition matrix at hand, we can now ask questions
like “Given that we are on the roller coaster, what is the probability that two
rides from now we will be on the boring ride?”
We denote this probability of being on the boring ride two rides from now, given that we are currently on the roller coaster, by $p^{(2)}_{RB}$. In order to determine $p^{(2)}_{RB}$, we need to look at all possible ways to move from the roller coaster to the boring ride in exactly two steps.

There are three possible
ways for this to happen: (1) we ride the roller coaster twice and
then ride the boring ride, (2) we go from the roller coaster to the water ride
to the boring ride, or (3) we go from a roller coaster to the boring ride
twice. For case (1), we have $p_{RR}\,p_{RB} = (0.1)(0.5) = 0.05$; for case (2), we have $p_{RW}\,p_{WB} = (0.4)(0.2) = 0.08$; and for case (3), we have $p_{RB}\,p_{BB} = (0.5)(0.3) = 0.15$. Thus, our total probability, the sum of these probabilities, is

$$p^{(2)}_{RB} = p_{RR}\,p_{RB} + p_{RW}\,p_{WB} + p_{RB}\,p_{BB} = 0.05 + 0.08 + 0.15 = 0.28.$$

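The path-by-path sum above is exactly an entry of the matrix product of **P** with itself, which is easy to check in Python (a sketch with my own variable names):

```python
# Two-step probability of moving from R to B: sum over every
# intermediate ride k of p_{Rk} * p_{kB}.
STATES = ["R", "W", "B"]
P = [[0.10, 0.40, 0.50],
     [0.20, 0.60, 0.20],
     [0.35, 0.35, 0.30]]

i, j = STATES.index("R"), STATES.index("B")
p2 = sum(P[i][k] * P[k][j] for k in range(len(STATES)))
print(round(p2, 2))  # 0.28
```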
In general, if a given Markov process has *n* states, then the probability of moving
from some state $s_i$ to another state $s_j$ in two steps is given by^{1}

$$p^{(2)}_{ij} = \sum_{k=1}^{n} p_{ik}\,p_{kj}.$$

Thus, for a general Markov process,
we have an efficient way to calculate the probability of moving from a state $s_i$ to another state $s_j$ in two steps. The obvious next question is how to calculate the probability of moving from a state $s_i$ to another state $s_j$ in exactly $r$ steps, where $r$ is any positive integer. Probability arguments and induction lead us
to the following result^{1}:

**Theorem 1** (Grinstead and Snell Theorem 11.1): **Given the transition matrix $\mathbf{P}$ of a Markov chain, the $ij^{th}$ entry $p^{(r)}_{ij}$ of the matrix $\mathbf{P}^r$ gives the probability that the chain, starting in state $s_i$, will be in state $s_j$ after exactly $r$ steps.**

Suppose in our amusement park example
that we are on the water ride and we want to know the probability that 5 rides
from now, we will be on the roller coaster.
Using the convention from the transition matrix **P** above, we want $p^{(5)}_{WR}$, the (2,1) entry of the transition matrix **P** raised to the fifth power. Thus, we have

$$
\mathbf{P}^5 \approx
\begin{array}{c|ccc}
  & R & W & B \\ \hline
R & 0.222 & 0.481 & 0.297 \\
W & 0.222 & 0.482 & 0.296 \\
B & 0.223 & 0.481 & 0.296
\end{array}
$$

Given this transition matrix $\mathbf{P}^5$, we can look at its (2,1) entry and say that the probability of being on the roller coaster 5 rides after we are on the water ride is 0.222, or 22.2%. Notice that the probability of being on the
water ride after 5 rides is 0.482 and the probability of being on the boring
ride after 5 rides is 0.296. Thus, we
are least likely to be on the roller coaster after 5 rides given that we are
currently on the water ride.
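Theorem 1 reduces this question to a matrix power. A short Python sketch (the helper names are mine) computes $\mathbf{P}^5$ and reads off the water-ride row:

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, r):
    """Raise A to the r-th power by repeated multiplication (r >= 1)."""
    result = A
    for _ in range(r - 1):
        result = matmul(result, A)
    return result

P = [[0.10, 0.40, 0.50],
     [0.20, 0.60, 0.20],
     [0.35, 0.35, 0.30]]

P5 = matpow(P, 5)
# Row W (index 1) of P^5: probabilities of each ride 5 steps after
# starting on the water ride.
print([round(x, 3) for x in P5[1]])  # [0.222, 0.482, 0.296]
```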

__Absorbing Markov Chains__

Now consider the following modification to the problem. Suppose that all of the changing stations at
the park are closed and we are not allowed to get on any other ride while
wet. Thus, once we ride the water ride,
we will have to continue riding the water ride until the end of the day. In other words, once we get on the water
ride, our process of determining which ride to get on next comes to an end, and
we are forced to stay at the water ride.
In the language of Markov processes and our transition matrix **P**, we can say that $p_{WW} = 1$ while $p_{WR} = p_{WB} = 0$. In general, when we have a given state $s_i$ in which $p_{ii} = 1$, we call the state $s_i$ an
*absorbing state*, since we cannot
leave that state after reaching it.
States that are not absorbing are called *transient states*. We then
call any Markov process with at least one absorbing state an *absorbing Markov chain* if, from every state, it is possible to reach an absorbing state in some finite number of steps. Note that since it is possible for us to reach an absorbing state from every other state, the probability of
eventually reaching an absorbing state is 1.
That is, given any absorbing Markov chain, we will eventually reach an
absorbing state^{1}. The question we can now ask is how many steps the process will take before reaching an absorbing state. That is, we would like to know, on average, how long it will take for the process to be *absorbed*^{1}. In order to answer this question, we first need
to restructure our transition matrix **P**.

We begin by reordering the states in our transition matrix so
that the transient states come first. If
we have *r* absorbing states and *t* transient states, then our transition
matrix **P** will have the following form:

$$
\mathbf{P} =
\begin{array}{c|cc}
  & \text{Tr.} & \text{Abs.} \\ \hline
\text{Tr.}  & \mathbf{Q} & \mathbf{R} \\
\text{Abs.} & \mathbf{0} & \mathbf{I}
\end{array}
$$

where **I** is the $r \times r$ identity matrix, **0** is the $r \times t$ zero matrix, **R** is a non-zero $t \times r$ matrix, and **Q** is a $t \times t$ matrix^{1}.
This matrix is now said to be in *canonical
form*^{1}. We now need to
make use of the following theorem:

**Theorem 2** (Grinstead and Snell Theorem 11.4): **Given an absorbing Markov chain and an associated transition matrix in the canonical form described above, the matrix $\mathbf{I} - \mathbf{Q}$ has an inverse $\mathbf{N}$ with**

$$\mathbf{N} = (\mathbf{I} - \mathbf{Q})^{-1} = \mathbf{I} + \mathbf{Q} + \mathbf{Q}^2 + \cdots.$$
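Theorem 2 can be checked numerically: for a small $2 \times 2$ matrix $\mathbf{Q}$ (the entries below are illustrative, chosen by me), the partial sums $\mathbf{I} + \mathbf{Q} + \cdots + \mathbf{Q}^k$ approach $(\mathbf{I} - \mathbf{Q})^{-1}$. A Python sketch:

```python
# Illustrative transient block Q (entries are my own example values).
Q = [[0.30, 0.60],
     [0.45, 0.50]]

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Partial sum S = I + Q + Q^2 + ... + Q^400.
S = [[1.0, 0.0], [0.0, 1.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(400):
    term = matmul(term, Q)
    S = [[S[i][j] + term[i][j] for j in range(2)] for i in range(2)]

# Direct 2x2 inverse of I - Q for comparison.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]
det = a * d - b * c
N = [[d / det, -b / det], [-c / det, a / det]]

# The truncated series should agree with N to high precision.
print(all(abs(S[i][j] - N[i][j]) < 1e-6 for i in range(2) for j in range(2)))
```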

Since the probability of reaching an absorbing state tends to
one, we will always eventually reach an absorbing state and the above theorem
guarantees that we can compute **N**,
the inverse of the matrix **I – Q**. A final theorem will allow us to calculate,
for each potential starting state, the expected number of states reached before
reaching an absorbing state. We will
always assume that the Markov process starts in a transient state.

**Theorem 3** (Grinstead and Snell Theorem 11.5): **Let $t_i$ be the expected number of steps before the chain is absorbed, given that the chain starts in the (transient) state $s_i$, and let $\mathbf{t}$ be the column vector whose $i^{th}$ entry is $t_i$. Then**

$$\mathbf{t} = \mathbf{N}\mathbf{c},$$

**where $\mathbf{c}$ is a column vector all of whose entries are 1.**

Returning once again to our amusement park example, recall that
once we get on the water ride, we must stay on the water ride, so **W** is an absorbing state and **R
**and **B** are transient states.
Since we know we will have to stay on the water ride once we get on it,
let’s change our riding plan so that if we are on the roller coaster, we will
ride the roller coaster again with probability 0.3, we will ride the boring
ride next with probability 0.6, and we will ride the water ride next with
probability 0.1. If we start by riding
the boring ride, we will ride the roller coaster next with probability 0.45,
the boring ride again with probability 0.5, and the water ride next with
probability 0.05. Our transition matrix
in canonical form is then

$$
\mathbf{P} =
\begin{array}{c|ccc}
  & R & B & W \\ \hline
R & 0.3 & 0.6 & 0.1 \\
B & 0.45 & 0.5 & 0.05 \\
W & 0 & 0 & 1
\end{array}
$$

and we can see that

$$
\mathbf{Q} = \begin{pmatrix} 0.3 & 0.6 \\ 0.45 & 0.5 \end{pmatrix},
\qquad
\mathbf{I} - \mathbf{Q} = \begin{pmatrix} 0.7 & -0.6 \\ -0.45 & 0.5 \end{pmatrix},
$$

and thus we have

$$
\mathbf{N} = (\mathbf{I} - \mathbf{Q})^{-1} = \begin{pmatrix} 6.25 & 7.5 \\ 5.625 & 8.75 \end{pmatrix},
$$

and we know from Theorem 3 that $\mathbf{t} = \mathbf{N}\mathbf{c}$, where $\mathbf{c}$ is a column vector all of whose entries are 1. Thus, we have

$$
\mathbf{t} = \mathbf{N}\mathbf{c} = \begin{pmatrix} 6.25 & 7.5 \\ 5.625 & 8.75 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 13.75 \\ 14.375 \end{pmatrix}.
$$
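This computation of $\mathbf{N}$ and $\mathbf{t}$ can be reproduced with the explicit $2 \times 2$ inverse formula; the following Python sketch (variable names are mine) carries it out:

```python
# Transient-to-transient probabilities Q for the absorbing example,
# in the order R, B.
Q = [[0.30, 0.60],
     [0.45, 0.50]]

# Entries of I - Q.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]

# 2x2 inverse: N = (I - Q)^(-1) = (1/det) * [[d, -b], [-c, a]].
det = a * d - b * c
N = [[d / det, -b / det],
     [-c / det, a / det]]

# t = N c with c = (1, 1)^T, i.e. the row sums of N.
t = [sum(row) for row in N]
print([round(x, 3) for x in t])  # [13.75, 14.375]
```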

Thus, if we first ride the roller coaster, we will get to ride an
average of 13.75 rides before riding the water ride, and if we ride the boring
ride first, we will get to ride an average of 14.375 rides before riding the
water ride. Thus, starting the day by
riding the boring ride will yield the highest number of expected rides on
average before reaching the water ride.

This amusement park was a simple and admittedly unrealistic
example. Yet, it is easy to see that by
increasing the size of our transition matrix, this same process could be used
to deal with amusement parks with any number of rides. Absorbing Markov Chains are often used to
model several important and well-known situations. For example, the random walk or drunkard's walk problem, in which a person leaving a bar stumbles in a random direction, eventually reaching home (the absorbing state), is often described using absorbing Markov chains. Gamblers
playing casino games in which the player is at a disadvantage can be modeled
using absorbing Markov chains, where the absorbing state occurs when the
gambler is out of money. Markov chains
with no absorbing states can be used to predict genetic traits in offspring.

There are many more types of Markov chains beyond absorbing
Markov chains. A deeper understanding of
Markov chains, including the proofs of the theorems stated above,
requires a moderate knowledge of probability.
In fact, Markov processes are generally viewed from a probability standpoint, with only minimal background knowledge in linear algebra assumed. Proofs of the above theorems as well as a
more thorough introduction to absorbing Markov chains can be found in *Introduction to Probability* by Charles Grinstead and J. Laurie Snell, the primary source for this
page. For those with a significant
background in probability, more information on absorbing Markov chains and
other types of Markov chains can be found in *Probability, Markov Chains, Queues, and Simulation* by William J.
Stewart, and *Non-negative Matrices and
Markov Chains* by E. Seneta.

Sources:

1. Grinstead, Charles M., and J. Laurie Snell. (1997). *Introduction to Probability*. AMS Bookstore.