A model-based approach to football strategy.

January 18, 2010


Data Selection for Estimating Play-Outcome Probabilities


In this article we address a problem that arises continually in the quantitative analysis of football strategy: Which historical data should be included when estimating probabilities associated with the outcome of a football play? This question was actually at the heart of the controversy in Week 10 of the 2009 season, when New England elected to go for the first down on 4th-and-2. An important component of the analysis of that decision was the probability of making the first down. Because NFL teams that lead narrowly late in the game almost invariably punt on fourth down from deep in their own end, no strictly comparable history exists from which one could estimate the probability of success. Analysts who based their probability estimates on historical data had to include situations that differed materially from the one New England faced. It's a legitimate criticism of those analyses to point out that the probability of success in New England's situation might be different from the average probability of success in the situations represented by the data. The most convincing way to counter that criticism is to limit the historical data to situations that, if not actually the same as the one New England faced, at least offered the same probability of success.

By the situation associated with a play we mean the score differential, field position, time remaining, down, and necessary gain for a first down. (For two-point tries the situation is just the score differential and time remaining.) We will call two situations equivalent if the strategies of both the offense and defense should be the same in those two situations. If we want an unbiased estimate of some probability associated with a particular situation, we can use historical data not just for similar situations, but for any equivalent situation. In the next section we will derive a sufficient condition for two situations to be equivalent. Although the condition is quite restrictive, there are important cases in which it can be applied—including New England's 4th-and-2 decision.

In the third section we examine whether two situations are equivalent if the only difference between them is that one of them is third down and the other is fourth down. This is important because third-down data are plentiful. Unfortunately, our analysis suggests that third down and fourth down are not equivalent. In particular, on fourth down the defense will focus more on preventing shorter gains, and less on preventing longer gains.

The point of this article is that biases can arise when non-equivalent situations are treated as equivalent. However, there is nothing wrong with combining data from non-equivalent situations as part of an estimation procedure in which the non-equivalence is explicitly recognized. For example, assuming that the probability of making a first down is a smooth, decreasing function of the necessary gain (holding other aspects of the situation constant), it's best to estimate the success probabilities simultaneously for all necessary gains.
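One way to estimate the success probabilities for all necessary gains at once is sketched below. It is a minimal illustration that enforces only the "decreasing" part of the assumption, using the pool-adjacent-violators algorithm. That algorithm is a stand-in chosen for brevity, not the procedure the article has in mind, and it does not produce a smooth curve. The function name and the counts in the usage note are hypothetical.

```python
def fit_decreasing(successes, attempts):
    """Monotone (nonincreasing) estimate of conversion rates by necessary gain.

    successes[k] and attempts[k] are counts for the k-th shortest necessary
    gain; every attempts[k] must be positive. Adjacent distances whose raw
    rates violate the decreasing order are pooled (pool-adjacent-violators).
    """
    blocks = []  # each block: [successes, attempts, number of distances pooled]
    for s, n in zip(successes, attempts):
        blocks.append([s, n, 1])
        # Pool while the earlier block has a LOWER rate than the later one.
        # Rates are compared by cross-multiplication to avoid division.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return fitted
```

With made-up counts of 7, 5, 6, and 3 conversions in 10 attempts each at 1, 2, 3, and 4 yards to go, the raw rates for 2 and 3 yards are out of order, so those two distances are pooled into a single rate while the other two are left alone.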

Finally, even if all the data are from equivalent situations, the data will still not be homogeneous because the sample will include teams of differing abilities. Therefore, any probability estimates derived from such a sample are applicable to the average team in the sample. (Note that this is not the same thing as the average NFL team. Whether the sample is representative of the average team is something that has to be checked, not assumed. One can simply examine the data to see if all the teams are represented approximately equally.)

Conditions for Equivalent Situations

Consider a specific situation, and let θ and ψ denote the strategies of the offense and defense, respectively, in that situation. These can be randomized strategies. Let fi(θ,ψ) denote the probability that the play has outcome i, given that θ and ψ are the strategies of the teams. The set of possible strategies for the teams, and the functions fi, depend on the field position (for example, one can't send a receiver 30 yards down the field if the ball is at the defenders' 10-yard line), but they depend only on the field position.

Let vi denote the probability that the team on offense wins the game, given that the play has outcome i. Unlike the fi(θ,ψ), the vi depend on the score differential, the time remaining, and the down and distance.

Prior to the play, the probability that the team on offense wins the game is

∑i fi(θ,ψ) vi.     (1)

The offense chooses θ to maximize expression (1), while the defense chooses ψ to minimize expression (1). In a Nash equilibrium, the strategies of the teams will be optimal with respect to each other: θ will be optimal for the offense given that the defense uses strategy ψ, and ψ will be optimal for the defense given that the offense uses strategy θ. (We explained the notion of Nash equilibrium in more detail in a previous article.)

Now consider a second situation, with the same field position as the first situation, but otherwise different. For this second situation let wi denote the probability that the team on offense wins the game, given that the play has outcome i. Prior to the play, the probability that the team on offense wins the game is

∑i fi(θ,ψ) wi.     (2)

In general, the equilibrium strategies of the teams in this second situation will be different from the equilibrium strategies in the first situation. Suppose, however, that there are constants a and b, with b>0, such that

wi = a + b vi     for all i.     (3)

Then since the fi sum to 1, expression (2) equals

a + b ∑i fi(θ,ψ) vi.     (4)

Maximizing with respect to θ, or minimizing with respect to ψ, is exactly the same in expression (4) as in expression (1). Therefore, the Nash equilibrium strategies are the same in the two situations, and so the two situations are equivalent. (Of course, the win probability will not be the same.)
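The invariance argument can be checked numerically on a toy example. The sketch below is an illustration under invented assumptions: two offensive plays, two defensive calls, three outcomes, and made-up values for the fi and vi. It uses the closed-form mixed equilibrium of a 2x2 zero-sum game, which is valid only when neither team has a pure-strategy equilibrium. All names and numbers are hypothetical.

```python
# Hypothetical toy game: 2 offensive plays x 2 defensive calls x 3 outcomes.
# F[t][d][i] stands in for fi(θ,ψ) when the offense runs pure play t and the
# defense makes pure call d. Each inner list sums to 1.
F = [
    [[0.50, 0.40, 0.10], [0.20, 0.50, 0.30]],  # offense play 0
    [[0.30, 0.30, 0.40], [0.60, 0.30, 0.10]],  # offense play 1
]
V = [0.35, 0.55, 0.70]  # assumed vi: offense win probability after outcome i


def payoff_matrix(f, win_probs):
    """Offense win probability for each (play, call) pair: sum over i of fi * vi."""
    return [[sum(fi * vi for fi, vi in zip(cell, win_probs)) for cell in row]
            for row in f]


def mixed_equilibrium_2x2(m):
    """Return (x, y): the probability that the offense runs play 0 and the
    probability that the defense makes call 0, in a 2x2 zero-sum game with an
    interior mixed equilibrium. Raises ValueError otherwise."""
    (a, b), (c, d) = m
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game")
    x = (d - c) / denom  # makes the defense indifferent between its calls
    y = (d - b) / denom  # makes the offense indifferent between its plays
    if not (0 < x < 1 and 0 < y < 1):
        raise ValueError("no interior mixed equilibrium")
    return x, y


# Second situation: wi = a + b*vi with b > 0, as in condition (3).
A_SHIFT, B_SCALE = 0.10, 0.5
W = [A_SHIFT + B_SCALE * v for v in V]

eq_first = mixed_equilibrium_2x2(payoff_matrix(F, V))
eq_second = mixed_equilibrium_2x2(payoff_matrix(F, W))
```

Up to floating-point rounding, eq_first and eq_second coincide, even though the win probabilities in the two payoff matrices differ.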

Although condition (3) is quite restrictive, there are important special cases in which it is useful. One of these is early in the game. Consider two situations that are identical except for the score differential. As before, let vi and wi be the offense's conditional win probabilities from the two situations, given outcome i. In a previous article we explained why, early in the game, there will be a number α (which depends only on the score differential) such that the equation wi = α + vi is approximately true for all i. This is a special case of condition (3).

As an example, suppose we want to estimate the probability of converting successfully on 3rd-and-5 near midfield, halfway through the first quarter, when the team with the ball trails by a field goal. Then we can legitimately use historical data not just for similar situations, but also for situations whose score differential is substantially different from −3 points. (Again, this is true only early in the game. Later in the game the teams' optimal strategies can be extremely sensitive to the score differential.)

Equation (3) also applies in situations we will call binary. These are situations for which the probabilities vi assume only two values. There will be some required gain such that if the play gains that many yards or more, the offense's win probability is V; and otherwise their win probability is v. (The cleanest example of a binary situation—in fact the only exact example—is a two-point try, which succeeds if 2 yards are gained and otherwise fails.) An important observation is that two binary situations are equivalent if they have the same field position and the same required gain. (Indeed, if the win probabilities corresponding to the second situation are W and w, then equation (3) holds with b=(W−w)/(V−v) and a=w−bv.) An immediate consequence is that all two-point conversion attempts are equivalent, regardless of whether (for example) the team attempting the try leads by 12 points with 2:00 remaining, or trails by 2 points in the 3rd quarter. Irrespective of the situation under which a two-point conversion is attempted, the Nash equilibrium strategies of the teams are the same. This is the formal justification for the standard practice of using all the available data on two-point conversions to estimate the probability of success.
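The constants in the parenthetical remark can be verified directly. The following sketch computes a and b for two binary situations and confirms that condition (3) holds at both values; the function name and the win probabilities used in the check are hypothetical.

```python
def affine_constants(V, v, W, w):
    """Constants (a, b) with W = a + b*V and w = a + b*v.

    V, v: win probabilities on success and failure in the first situation.
    W, w: the same quantities in the second situation.
    """
    if V == v:
        raise ValueError("first situation is not binary: V equals v")
    b = (W - w) / (V - v)
    a = w - b * v
    return a, b


# Made-up example values; success is better than failure in both situations.
a, b = affine_constants(V=0.9, v=0.6, W=0.5, w=0.1)
```

Because success beats failure in both situations (V > v and W > w), b is automatically positive, which is what condition (3) requires.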

As a final application of the results of this section, we return to New England's decision to go for the first down in 2009 Week 10, with which we began this article. If the Patriots make the first down, even if by an inch, their win probability is near 1 because they can very nearly run out the clock. Consequently, to a good approximation, the situation is binary. Now, teams almost never go for it on fourth down under those circumstances. However, it's not uncommon for the team that leads the game to face a third-down situation in which, if they make the first down, they can run out the clock; but if they don't make it they will punt. These situations are approximately binary as well. Those cases in which the team with the ball is in its own end, and needs to gain 2 yards for the first down, are equivalent to the situation New England faced. Therefore, the observed fraction of those cases in which the offense picked up the first down furnishes a relatively clean estimate of New England's probability of success. If this probability estimate is near the 0.6 that derives from generic data, it might satisfy some of Belichick's critics. On the other hand, if this estimate is materially lower than 0.6, it would be evidence that the critics are right when they argue that estimates using generic data don't capture the difficulty of gaining 2 yards in New England's situation.
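The estimate described above amounts to filtering play-by-play records and taking a proportion. The sketch below assumes a hypothetical record layout (the Play fields are invented for illustration and do not correspond to any particular data source). It also adds a Wilson score interval, which the article does not mention, to indicate how uncertain the estimate is when the qualifying sample is small.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Play:
    down: int
    yards_to_go: int
    yardline: int         # yards from the offense's own goal line
    offense_leads: bool
    clock_killing: bool   # analyst's judgment: a first down lets the offense run out the clock
    converted: bool


def is_equivalent(play):
    """Third-and-2 in the offense's own end, leading, where a first down ends the game."""
    return (play.down == 3 and play.yards_to_go == 2 and play.yardline < 50
            and play.offense_leads and play.clock_killing)


def conversion_estimate(plays, z=1.96):
    """Return (fraction converted, Wilson lower bound, Wilson upper bound),
    or None if no play qualifies."""
    sample = [p for p in plays if is_equivalent(p)]
    n = len(sample)
    if n == 0:
        return None
    k = sum(p.converted for p in sample)
    phat = k / n
    scale = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / scale
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / scale
    return phat, centre - half, centre + half
```

Calling conversion_estimate on a list of Play records returns the observed conversion fraction among the qualifying plays together with an approximate 95% interval. With only a handful of qualifying plays the interval is wide, which is the practical cost of restricting the sample to equivalent situations.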

Using Third-Down Data on Fourth Down

It's common to use third-down data to augment the relatively scarce fourth-down data when estimating probabilities related to the outcomes of fourth-down plays. Does this introduce a bias?

In an appendix we give a simple proof that equation (3), the sufficient condition for equivalence, will not hold in general for a third-down situation and a fourth-down situation that are otherwise identical. (The only exceptions are binary situations.) In this section, with some additional assumptions, we will show that third and fourth downs are in fact not equivalent. In particular, on fourth down (as compared to third down) the defense will focus more on preventing shorter gains, and less on preventing longer gains.

We will assume that at most a few yards are needed for the first down. This is a harmless assumption because, except for desperation situations, teams rarely go for it on fourth and long. The benefit of this assumption is that, to a good approximation, all failed attempts to make the first down give the same win probability.

The strategy of the offense consists of the formation, and then the play run from that formation. The defense sees the formation before the play starts. What we will analyze here is the defense's choice of how to defend, and the offense's selection of a play, given the formation.

Suppose there are n plays that the offense could run. Let pi be the offense's win probability, conditional on running the ith play and picking up the first down. Note that pi will be relatively large for a long pass play, but smaller for a short pass play or a run. However, the pi are the same regardless of the down. Let s be the offense's win probability if they fail to pick up the first down. Since failure to make the first down is worse on fourth down than on third down, s is lower on fourth down.

A choice of strategy by the defense determines probabilities q1,…,qn, where qi is the probability that the offense makes the first down if they choose to run the ith play. If the offense runs the ith play, their win probability is then

qi pi + (1−qi) s.     (5)

The offense's strategy will be a randomized strategy, characterized by probabilities π1,…,πn, where πi is the probability with which the offense runs the ith play. Given those randomization probabilities, it would be optimal for the defense to choose a strategy whose associated q1,…,qn minimize the offense's win probability,

∑i πi [qi pi + (1−qi) s].     (6)

Conversely, given a defense strategy characterized by q1,…,qn, an optimal strategy for the offense is any randomized strategy that assigns positive probability only to plays that give the maximum win probability. Formally, π1,…,πn is optimal for the offense if

πi = 0   unless   qi pi + (1−qi) s = maxj {qj pj + (1−qj) s}.     (7)

A pair (Π, D), where Π is a set of randomization probabilities for the offense and D is a defense strategy, is a Nash equilibrium if Π is optimal against D and D is optimal against Π. Since the offense will always choose a play that gives them the highest win probability, intuition suggests that in equilibrium, the win probability must be the same for every play. Otherwise it would be beneficial for the defense to alter their strategy, defending less effectively against plays for which the offense's win probability is low, in order to defend more effectively against plays for which the offense's win probability is largest. Of course, this assumes a trade-off in which the defense can always lower the offense's likelihood of success on a particular play if they are willing to allow a higher likelihood of success on some other play. However, in many situations there will be one or more plays that are inferior for the offense, even if the defense is making no particular effort to defend against those plays. (An example might be a quarterback sneak with five yards to go for a first down.) Allowing the offense a higher likelihood of success on these plays is pointless, because it doesn't permit a reduction in the likelihood of success on another play. Consequently, these plays will be ignored by both teams. Except for such plays, though, the intuition is correct. In a mathematical appendix we show that in a Nash equilibrium, the win probability (given by expression (5)) is the same for every play, except possibly for plays that both teams ignore.

If (Π, D) is an equilibrium on third down, it cannot be an equilibrium on fourth down, because the win probability will vary systematically across plays. Indeed, the lower s on fourth down lowers expression (5) for every i, but expression (5) falls more for those i for which 1−qi is larger. Those are the i for which pi is larger. Thus, if the defense were to stick with D on fourth down, the offense would have a clear preference for shorter attempted gains. To equalize win probability across plays, the defense must lower qi for those i for which pi is small; and this must be "paid for" by allowing larger qi for those i for which pi is large. In equilibrium, then, on fourth down (compared to third down) the defense focuses more on preventing shorter gains, and less on preventing longer gains.
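The shift in the defense's focus can be made concrete with a small numerical sketch. The article does not specify the defense's trade-off, so the code below assumes a deliberately simple one: the defense may choose any q1,…,qn whose sum equals a fixed "budget". Under that assumption, the equal-win-probability condition pins down the qi, and the defense's indifference pins down the πi. The budget model, the function name, and all numbers are hypothetical.

```python
def equilibrium(p, s, budget):
    """Equilibrium under the ASSUMED trade-off sum(q) == budget.

    p[i]: offense win probability if play i picks up the first down (p[i] > s).
    s:    offense win probability if the first down is not made.
    Returns (q, mix, value): the defense's q_i, the offense's play
    probabilities, and the common win probability q_i*p_i + (1 - q_i)*s.
    """
    if any(pi <= s for pi in p):
        raise ValueError("each p[i] must exceed s")
    inv = [1.0 / (pi - s) for pi in p]
    total = sum(inv)
    q = [budget * x / total for x in inv]  # equalizes expression (5) across plays
    mix = [x / total for x in inv]         # makes the defense indifferent among feasible q
    if any(qi > 1 for qi in q):
        raise ValueError("budget too large for this model")
    value = s + budget / total
    return q, mix, value


P = [0.60, 0.75]  # assumed p_i: short-gain play, long-gain play
q_third, mix_third, value_third = equilibrium(P, s=0.45, budget=1.1)
q_fourth, mix_fourth, value_fourth = equilibrium(P, s=0.30, budget=1.1)
```

In this toy model, lowering s from the third-down value to the fourth-down value reduces the q for the short-gain play and raises the q for the long-gain play, and the offense's equilibrium win probability falls. This matches the direction of the argument above, though the magnitudes depend entirely on the assumed trade-off.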

These results do not imply that the probability of picking up the first down,

∑i πi qi,     (8)

is lower on fourth down than on third down. Without further assumptions, that probability could be either higher or lower. This is because expression (6), rather than expression (8), is the objective function. Certainly, the offense's win probability is lower on fourth down than on third down. (Proof: The lower s reduces expression (5) for every i, so that the offense's win probability would decrease even if the defense sub-optimally stuck with D.) But to determine how expression (8) differs on fourth down compared to third down, we would have to know how the equilibrium πi change, not just how the equilibrium qi change.

Copyright © 2010 by William S. Krasker