# Expected Standings Points

##### May 16, 2016, Micah Blake McCurdy, @IneffectiveMath

When you score, how much more likely are you to take points from the game you're playing? How much less likely are you to take points if you're scored on? What if you take a penalty? I made an expected standings points model to answer these questions. I don't love it enough to name it, unlike some of my other models. It is just "my expected [standings] points model".

Once you have such a model, you can use it to describe how a game was won or lost. Take, for instance, opening night of 2015-2016, when Montreal (road team, in green) played Toronto (home team, in blue).

The Leafs start out slightly higher, at around 1.15 points; they're at home and home teams win more. The Habs start out at around 1 expected point. The Canadiens score early and jump up to almost 1.5 points; a one-goal lead is good, but this early it's hardly enough to be very sure of points. They take a penalty soon after and drop some expected points, but they kill it off and go back to 1.5 points. Close to the 20-minute mark they take another penalty and this time the Leafs score, putting them back on top; the Leafs draw another penalty around 28 minutes but then take one shortly after. A long stretch of open play sees both lines rise slowly together: as the game wears on, overtime, where both teams will get points, becomes more and more likely. Just over halfway through the third, however, the Habs score to take a 2-1 lead. This goal comes much later in the game and so drops the Leafs' expected points much more than the 1-0 goal did. The Leafs pull their goalie, which gives them a slight extra chance, but the Habs hold on to win, and the final score is shown at the end of the graphs: two points for Montreal, zero points for Toronto. In fact the Canadiens score an empty-net goal, but it happens so late and affects the outcome so faintly that it barely registers on the chart.

I like to be able to tell stories like that; so the question is: where do the curves in the above graph come from? This is what this article is for.

## Model Inputs

I chose to consider as influences: the score (and not merely the score difference), the number of skaters per side, and the time remaining. I model the home team and away team differently. Specifically, consider:

1. The home score, between 0 and 5;
2. The away score, between 0 and 5;
3. The home skater number, between 3 and 6;
4. The away skater number, between 3 and 6.

For each regular season game since 2007-2008, for each second in a game matching a chosen quadruple above, I record the time and the number of standings points the home team eventually obtained. Games decided in overtime or a shootout are treated as having conferred 1.5 points on each team, instead of however many points were actually obtained. From this data I fit a cubic function of time whose value at t = 3600s (= 60 minutes, end of regulation) is constrained:

1. When the home score is greater, the home cubic is constrained to take the value 2 at t = 3600, and the away cubic is constrained to take the value 0.
2. When the home score is less, the home cubic is constrained to take the value 0 at t = 3600, and the away cubic the value 2.
3. When the score is tied, both cubics are unconstrained. As a matter of course they end close to 1.5, since most games which are tied late remain tied.

As will become important later, none of the functions are otherwise constrained; there may be values of t between 0 and 3600 for which the value of one of the cubics is not between 0 and 2. We deal with such cases separately later.
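To make the endpoint constraint concrete, here is a minimal sketch of one way such a fit could be done. It is not necessarily how the fits in this article were computed: the names `fit_cubic`, `evaluate` and `T_END`, the rescaling of time, and the use of ordinary least squares are all assumptions made for illustration. The idea is to write the cubic in powers of u = (t - 3600) / 3600, so that the constant term is exactly the value at the end of regulation; constraining the endpoint then just means fixing that term and fitting the other three.

```python
import numpy as np

T_END = 3600.0  # end of regulation, in seconds


def fit_cubic(times, points, end_value=None):
    """Least-squares cubic f(u) = c0 + c1*u + c2*u**2 + c3*u**3,
    with u = (t - T_END) / T_END (hypothetical parametrisation).

    Since u = 0 at t = T_END, f(T_END) = c0. Passing end_value fixes c0
    and fits only c1, c2, c3; passing None leaves all four free.
    """
    u = (np.asarray(times, dtype=float) - T_END) / T_END
    y = np.asarray(points, dtype=float)
    if end_value is None:
        basis = np.column_stack([np.ones_like(u), u, u**2, u**3])
        coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
        return coeffs
    basis = np.column_stack([u, u**2, u**3])
    free, *_ = np.linalg.lstsq(basis, y - end_value, rcond=None)
    return np.concatenate([[end_value], free])


def evaluate(coeffs, t):
    """Evaluate a fitted cubic at game time t (seconds)."""
    u = (t - T_END) / T_END
    return coeffs[0] + coeffs[1] * u + coeffs[2] * u**2 + coeffs[3] * u**3


# Made-up observations for illustration only, not real game data:
# (time in seconds, points the home team eventually obtained).
times = np.array([600.0, 1500.0, 2400.0, 3000.0, 3500.0])
points = np.array([2.0, 1.5, 2.0, 2.0, 2.0])

# Home team leading: the home cubic is pinned to 2 at t = 3600.
home_leading = fit_cubic(times, points, end_value=2.0)
print(evaluate(home_leading, T_END))  # equals end_value by construction
```

For a tied score one would call `fit_cubic(times, points)` with no `end_value`, matching the unconstrained case in the list above. Nothing in this sketch keeps the curve between 0 and 2 at earlier times, which is the behaviour described in the text.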

## Model Outputs

The "raw output data" is a pair of cubics for each of six home scores, six away scores, four home skater numbers, and four away skater numbers, making in all 2*6*6*4*4 = 1,152 cubics. The most interesting 90 of them are shown below in a handy grid: the home (solid) and away (dashed) curves for 5v5, 4v5, and 5v4 for the most common scores.

[Figure: grid of fitted cubics. Columns: home team down 2, home team down 1, tied, home team up 1, home team up 2. Rows: 5v4, 5v5, 4v5.]
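The count of 1,152 cubics quoted above can be checked by enumerating every combination of inputs. The tuple layout `(side, home_score, away_score, home_skaters, away_skaters)` used here is only an illustrative assumption about how the curves might be indexed.

```python
from itertools import product

SIDES = ("home", "away")   # one cubic for each team
SCORES = range(0, 6)       # 0 to 5 goals
SKATERS = range(3, 7)      # 3 to 6 skaters

# Hypothetical key layout: (side, home_score, away_score, home_skaters, away_skaters)
keys = list(product(SIDES, SCORES, SCORES, SKATERS, SKATERS))
print(len(keys))  # 2 * 6 * 6 * 4 * 4 = 1152
```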

One source of trouble with the above 1,152 cubics is that some of them are fitted using very little historical data and may be very unreliable. Some of them, like "Score tied at 5 during a 6v3", have no data in them whatsoever; others, like "4-3 at 5v5", have a great deal of data for times close to the end of a game but almost none at early times. This latter class is why I chose to constrain the endpoints but not the early times: this lets me get a good handle on the late-game behaviour, which is well-supported, at the cost of the early-game behaviour, about which we have scant historical information.

## Model Usage

For a given game we would like to use these cubics to build the chart that I used as an example above. In any particular game we may see outlandish situations, however, so we must have a way of knowing which curves to use. For a given score and skater number, I consider the relevant cubic (home or away) and evaluate it at the time of interest. If the cubic does not exist or if the obtained value is nonsensical (greater than two or less than zero), I alter the game state used in the lookup to a similar one using the following procedure, in this order:

1. If the skater situation is 6v3 or 3v6 (that is, a team has pulled their goalie while on a two-skater advantage), look instead at 5v3 or 3v5, respectively.
2. If both teams have at least a goal, lower their scores by one. Where there might be very little data for a 5-2 scoreline, say, I expect that there will be better data at 4-1.
3. If one of the teams does not have a goal, remove a goal from the other team, replacing a 5-0 score with a 4-0 score, for instance.
4. If one of the teams has six skaters, look instead at the much more common situation where they have five skaters.
5. If one of the teams has a two-skater advantage, look instead at the same scoreline where they have only a one-skater advantage.

These substitutions are only performed when necessary, and together they are enough to give sensible results for every regular season game in the past nine seasons.
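The substitution procedure above can be sketched as a small lookup loop. This is one possible reading, with several assumptions: the cubics live in a dictionary keyed by `(side, home_score, away_score, home_skaters, away_skaters)` and are callables of time; the rules are tried once each, in the listed order, with a fresh lookup after every change, and the pass repeats if needed; and a two-skater advantage is reduced by giving the shorthanded team one more skater. All function names are hypothetical.

```python
def try_lookup(cubics, side, state, t):
    """Return the cubic's value at time t, or None if missing or nonsensical."""
    cubic = cubics.get((side, *state))
    if cubic is None:
        return None
    value = cubic(t)
    return value if 0.0 <= value <= 2.0 else None


# Each rule maps (home_goals, away_goals, home_skaters, away_skaters) to a
# similar state, or returns the state unchanged when it does not apply.
def rule_goalie_pulled_two_up(s):  # 1. 6v3 -> 5v3, 3v6 -> 3v5
    hg, ag, hs, as_ = s
    if (hs, as_) == (6, 3):
        return (hg, ag, 5, 3)
    if (hs, as_) == (3, 6):
        return (hg, ag, 3, 5)
    return s


def rule_both_scored(s):  # 2. lower both scores by one
    hg, ag, hs, as_ = s
    return (hg - 1, ag - 1, hs, as_) if hg >= 1 and ag >= 1 else s


def rule_one_scoreless(s):  # 3. remove a goal from the team that has scored
    hg, ag, hs, as_ = s
    if hg == 0 and ag >= 1:
        return (hg, ag - 1, hs, as_)
    if ag == 0 and hg >= 1:
        return (hg - 1, ag, hs, as_)
    return s


def rule_six_skaters(s):  # 4. six skaters -> five
    hg, ag, hs, as_ = s
    return (hg, ag, min(hs, 5), min(as_, 5))


def rule_two_skater_advantage(s):  # 5. two-skater -> one-skater advantage
    hg, ag, hs, as_ = s
    if hs - as_ == 2:
        return (hg, ag, hs, as_ + 1)  # assumption: add a skater to the short side
    if as_ - hs == 2:
        return (hg, ag, hs + 1, as_)
    return s


RULES = (rule_goalie_pulled_two_up, rule_both_scored, rule_one_scoreless,
         rule_six_skaters, rule_two_skater_advantage)


def expected_points(cubics, side, state, t):
    """Look up expected points, substituting similar states only when needed."""
    value = try_lookup(cubics, side, state, t)
    while value is None:
        changed = False
        for rule in RULES:
            new_state = rule(state)
            if new_state == state:
                continue  # rule does not apply
            state, changed = new_state, True
            value = try_lookup(cubics, side, state, t)
            if value is not None:
                break
        if not changed:
            raise LookupError("no usable cubic for this game state")
    return value, state
```

Returning the substituted state alongside the value makes it easy to see which curve was actually used for an outlandish situation. For instance, with a single stored curve for a 4-1 home lead at 5v4, a query for 5-2 at 6v3 walks through rules 1, 2 and 5 and lands on that curve.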