﻿ Football Commentary: Dynamic Programming Model

footballcommentary.com

A model-based approach to football strategy.

 March 9, 2004

Description of the Dynamic Programming Model

Introduction

 Contents Introduction Boundary Conditions For The Second Half Backward Induction Estimation Of The Transition Probabilities The First Half Of The Game Other Models

In this article we model a football game as a Markov Decision Process and show how to compute the model's implications using dynamic programming. The model is intended to provide guidance for certain decisions that arise during a game, such as two-point conversions and going for it on fourth down.

We explain the main ideas of the model in simplified form. Those who are interested in the details can peruse the source code, which is written for MATLAB®. The computations for the second half of the game are performed by dynprog.m. This program must be run before running dynprog1.m, which performs the computations for the first half. The programs lognorparams.m and lognordensity.m are also used.

The model is built around the idea that in making decisions, a team tries to maximize its probability of winning the game, and their opponents try to minimize that probability. There are three types of situations, called states, in which the model explicitly evaluates the probability of winning. The first type of state is when one team or the other has just gained possession. The second type is when a team has just scored a touchdown, but has not yet tried for the extra point (or points). The third type is when a team is about to kick off.

Options to attempt a two-point conversion, to try an onside kick, or to go to a hurry-up offense are modeled explicitly. In addition, making a first down at a particular time, field position, and point differential is equivalent (from the model's standpoint) to first gaining possession at that same time, field position, and point differential. Therefore, the model will allow us to analyze decisions to go for it on fourth down.

In the model, field position is denoted by a variable y. The model uses the usual convention that field position is always relative to the team that has the ball. Thus, the position denoted y when we have the ball is not the same spot on the field as position y when the opponents have the ball, but rather the mirror image.

Let g = 1 for our team and g = 2 for the opponents. Let R(t,d,y,g) denote the probability that our team wins the game, if team g has just gained possession at its position y on the field, we lead by d points, and t is the time remaining in the second half. Let E(t,d,g) denote the probability that our team wins the game if team g has just scored a touchdown (but has not yet attempted the extra point or points), we lead by d points, and t is the time remaining in the second half. Finally, Let K(t,d,g) denote the probability that our team wins the game if team g is about to kick off, we lead by d points, and t is the time remaining in the second half.

Since time is continuous, we have made it discrete by dividing into ten-second increments. Similarly, we have divided the field into six sections: From the 1 yard line to the 9 yard line, from the 10 to the 29, the 30 to the 49, the fifty to the 69, the 70 to the 89, and the 90 to the 99. (The 69 yard line, for example, is the opponent's 31.) Each value for y can be thought of as representing the midpoint of one of these sections. The probability that we win the game from some other point on the field is approximated by interpolation or extrapolation.

Boundary Conditions For The Second Half

We solve for our probability of winning in the various states by backward induction, starting by filling in our probability of winning in the states when there is no time left in the game (t = 0). These are called the boundary conditions. Notice that if the game ends during a possession, our probability of winning is 1 if we're ahead and 0 if we're behind, regardless of which team has the ball or the position on the field when time expires. Formally, regardless of y or g, R(0,d,y,g) equals 1 if d > 0, equals 0 if d < 0, and equals 0.5 if d = 0 (because the game goes into overtime). Similarly, K(0,d,g) equals 1, 0, or 0.5 according to whether d is positive, negative, or zero.

Determining E(0,d,g) is more complicated because a team can try an extra point (or points) even if the TD occurs as time expires. For simplicity we'll look only at the case in which we have the ball (g=1); the other case is similar (or can be filled in by symmetry).

If d > 0, then E(0,d,1) = 1 because we already lead even before our extra point attempt. If d < −2, then E(0,d,1) = 0 because even a two-point conversion can't tie the game. For d = 0 (tie game prior to the extra point), we win the game if the extra point is good, or if it misses but we win in OT. So, if the kick succeeds with (say) probability 0.985, then E(0,0,1) = 0.985 + (1−0.985)0.5. Similarly, if two-point conversions succeed with probability 0.4, then E(0,−2,1) = 0.4(0.5), because if we score a TD to go two points down with time expired, we win only if we make a two-point conversion and then win in OT.

Finally, consider the case in which we score a TD to go one point down with time expired. Here, we have a decision. If we kick, our probability of winning the game is 0.985(0.5). If we go for two, our probability of winning is 0.4. Since we want to maximize our probability of winning, E(0,−1,1) is the larger of these values. (Under these assumptions, then, we kick the extra point and go to OT.)

Backward Induction

We've now shown how to determine the boundary conditions for the second half of the game. Now, assume we have already found the win probabilities R(t',d',y',g'), K(t',d',g'), and E(t',d',g') for every t', d', y', and g' with t'< t. We will show how to find the win probabilities R(t,d,y,g), K(t,d,g), and E(t,d,g). The process is called backward induction. Let's start with R, and to be very specific, suppose that there are 92 time units left in the second half (t = 92), we trail by 4 points (d = −4), we have the ball (g = 1), and we are at our position 3 on the field (y = 3). Suppose for simplicity that there are only three states to which we can make a transition under our "regular" offense. The first possibility is that we score a TD, using 18 units of time. (The resulting state has t' = 74, d' = 2, and g' = 1, and our win probability at that state, E(74,2,1), is assumed already known.) The second possibility is that we score a FG, using 16 units of time. The third is that the other team gets possession at its field position 2, after 10 units of time. Suppose that the associated transition probabilities are 0.2, 0.12, and 0.68 respectively.

We also have the option of using a "hurry-up" offense. Assume that under the hurry-up offense, 6 units of time will elapse, and we will either score a TD (with probability 0.16) or a FG (probability 0.1) or else the opponents will get possession at their position 3 (probability 0.74).

Now, given a choice of offense, we see from the rule

 P(win) = ∑s P(transition to state s) P(win | transition to state s)

that our probability of winning the game is the transition-probability-weighted average of our win probabilities in the states to which there can be a transition. This is

 0.2 E(74,2,1) + 0.12 K(76,−1,1) + 0.68 R(82,−4,2,2)

for the regular offense, and

 0.16 E(86,2,1) + 0.1 K(86,−1,1) + 0.74 R(86,−4,3,2)

for the hurry-up offense. Of course, we'll use whichever offense gives the higher chance of winning, so our actual win probability, R(92,−4,3,1), is the larger of these two values.

In the actual model, the probabilities that a possession culminates in a TD or FG (or even a TD by the opponents, say from an interception returned for a score) depend on our starting field position. In addition, when the state transition is to a possession by the opponents, their starting field position and the elapsed time have a joint probability distribution (conditional on where we started our possession). Similarly, if our possession ends in a TD or FG, there is a distribution for the elapsed time (conditional on where we started our possession). Therefore in the actual model, the number of states to which a transition can be made is very large, but the principle of backward induction is as demonstrated above.

To illustrate the computation of K, suppose there are 92 units of time left in the second half, the opponents are kicking off, and we lead by 7 points. They can either kick deep, or try an onside kick. For simplicity suppose that if they kick deep, we will get possession at our position 2 or 3 on the field (with equal probabilities), and that 1 unit of time will run off the clock. If they try an onside kick, assume that with probability 0.17 they will get possession at their position 3 on the field, whereas with probability 0.83 we will gain possession at our position 4 on the field. In either case, 1 unit of time is consumed. Therefore, if they kick deep, our probability of winning the game is

 0.5 R(91,7,2,1) + 0.5 R(91,7,3,1),

whereas if they try an onside kick, our probability of winning the game is

 0.17 R(91,7,3,2) + 0.83 R(91,7,4,1).

Since the opponent is making the decision, our actual probability of winning the game, K(92,7,2), is the smaller of these two values. As before, in the actual model the number of states to which a transition can be made is larger.

Finally, to illustrate the calculation of E, suppose that with 92 units of time left in the second half we have just scored a TD to go ahead by 5 points. We can go for one or two. In either case we will then be kicking off. If we decide to go for one, our probability of winning the game is

 0.985 K(92,6,1) + 0.015 K(92,5,1),

whereas if we go for two, our probability of winning the game is

 0.4 K(92,7,1) + 0.6 K(92,5,1).

Our actual probability of winning, E(92,5,1), is the larger of these two values.

In summary, our probability of winning the game at any state is the transition-probability-weighted average of our win probabilities at all the states to which there can be a transition. Moreover, if there is a decision to be made at a state, we have to compute our win probability both ways, and use the larger value (if we are making the decision) or the smaller value (if the opponent is making the decision).

If the teams have identical characteristics, there are symmetries that simplify the calculations. Specifically, the probability that we win the game if t units of time remain, we lead by d points, and we have possession at our position y is the same as the probability that other team wins the game if t units of time remain, they lead by d points, and they have possession at their position y. Hence

 R(t,d,y,1) = 1 − R(t,−d,y,2),

and similarly for K and E. Therefore, for a given t, once we have computed our win probabilities at the states with g = 1, our win probabilities at the states with g = 2 can be determined immediately.

Estimation Of The Transition Probabilities

With the exception of the hurry-up offense, the model assumes the transition probabilities are independent of the score and time. To understand what we mean by this, suppose for example that we begin a possession at our field position y, and choose our regular offense. Then the probability of a transition to possession by the opponents at their field position y' depends on y and y', but not on t or d. We have estimated these transition probabilities using data from the 2003 NFL season, combined with the requirement that the probabilities vary smoothly across states. The estimates can be regarded as appropriate for an average team.

Using the same data, we have estimated the distributions for the elapsed time associated with state transitions, assuming that the elapsed time is lognormal. Once again, for the regular offense, the parameters of the lognormal distribution depend on y and y' but not on t or d.

For the hurry-up offense, the probabilities associated with the state transitions do depend on t (but still not on d). We assume a team can compress a drive into the remaining time, albeit with a reduction in the probability of scoring that depends on the amount of compression.

The First Half Of The Game

Backward induction works the same way in the first half of the game as in the second half. However, the boundary conditions for the first half are simply that (a) the second half will begin with the score with which the first half ended, and (b) a particular team (known in advance) will kick off to start the second half. Formally, suppose team gk will be kicking off to start the second half. Let T be the number of time periods in a full half, and define B(d,gk) = K(T,d,gk), where K is from the second half. Then the boundary conditions for R and K in the first half are R(0,d,y,g) = B(d,gk) and K(0,d,g) = B(d,gk). In addition, using the one- and two-point conversion probabilities introduced earlier, E(0,d,1) is the larger of

 0.985 B(d+1,gk) + (1−0.985) B(d,gk)

and

 0.4 B(d+2,gk) + (1−0.4) B(d,gk),

and E(0,d,2) is the smaller of

 0.985 B(d−1,gk) + (1−0.985) B(d,gk)

and

 0.4 B(d−2,gk) + (1−0.4) B(d,gk).

Because it's known which team will kick off to start the second half, the first half lacks the kind of symmetry we described earlier for the second half. Consequently, no computational simplifications are available.

Other Models

The earliest model we are aware of is that of former NFL quarterback Virgil Carter and Robert E. Machol in their article "Operations Research On Football," which appeared in 1971 in the journal Operations Research. (Those with access to JSTOR can obtain the article online.) Their approach was later greatly refined by David Romer. The Carter-Machol-Romer approach isn't a backward-induction model, but rather a "steady state" model. Instead of supposing that we try to maximize our probability of winning, they assume we try to maximize the expected point differential in an infinite game. Although this isn't really correct, it does result in enormous simplification. Since the game is infinite, time isn't a state variable. Moreover, in this formulation, points scored in the past have no effect on strategy going forward, and so the point differential isn't a state variable either. The remaining state variables are just the field position, and which team has the ball. So, the Carter-Machol-Romer model is silent regarding decisions that don't involve field position, such as two-point conversions, or whether to go for a TD rather than a FG with one second left in the first half. However, as we explain in a separate article, these models can be useful early in the game.

Of course, exactly how early in the game it has to be is a question that can only be answered with a backward-induction model. Our model shows that it can be optimal to try a two-point conversion early in the second quarter of a close game (and earlier in a rout), but the increase in the probability of winning from doing so at that point is extremely small.

Another model, closer in spirit to ours, was built by Harold Sackrowitz solely to analyze two-point conversions. It's a backward-induction model in which our objective is to maximize our probability of winning the game. This probability is evaluated at the end of each possession, and the state variables are the point differential, the number of possessions remaining in the game, and which team has just completed the possession. There is no notion of field position. The probabilities of scoring a TD or FG are the same on every possession. Also, time is a state variable only in an indirect way: Essentially, Sackrowitz assumes that the time per possession is non-random. This would seem to be a drawback since according to the observed transition probabilities, time of possession varies enormously, and is correlated with whether or not the possession results in a score. The drawbacks exist more in principle than in practice, however, since Sackrowitz's model agrees reasonably closely with ours for two-point conversions. Moreover, his article contains a good explanation of dynamic programming.