The Magnus Prediction Models

September 24, 2018, Micah Blake McCurdy, @IneffectiveMath

Estimating Individual Impact on NHL 5v5 Shot Rates

(I discussed a closely related precursor to this portion of Magnus at the 2018 Ottawa Hockey Analytics Conference, of which you can read the slides.)

I would like to be able to isolate the individual impact of a given skater on shot rates from the impact of their teammates, their opponents, the scores at which they play, the zones in which they begin their shifts, and home-ice advantage. I have fit a regression model which provides such estimates. The most important feature of this model is that I use shot rate maps as the units of observation and thus also for the estimates themselves, permitting me to see not only what portion of a team's performance can be attributed to individual players but also detect patterns of ice usage.

Here, as throughout this article, "shot" means "unblocked shot", that is, a shot that is recorded by the NHL as either a goal, a save, or a miss (this latter category includes shots that hit the post or crossbar). I would prefer to include blocked shots also but cannot since the NHL does not record shot locations for blocked shots.

The most sophisticated element of Magnus is the method for estimating the marginal effect of a given player on shot rates; that is, that portion of what happens when they are on the ice that can be attributed to their individual play and not the context in which they are placed. We know that players are affected by their teammates, by their opponents, by the zones their coaches deploy them in, and by the prevailing score while they play. Thus, I try to isolate the impact of a given player on the shots which are taken and where they are taken from.

Although regression is mathematically more sophisticated than some other measures, it is in no way a "black box". As we shall see, every measurement can be broken down into its constituent pieces and scrutinized. If you are uneasy with the mathematical details but interested in the results, you should skip the "Method" section and just think of the method as like a souped-up "relative to team/teammate statistics", done properly.

Method

I use a simple linear model of the form \( Y = WX\beta \) where \(X\) is the design matrix, \(Y\) is a vector of observations, \(W\) is a weighting matrix, and \(\beta\) is a vector of marginals, that is, the impacts associated to each thing independent of each other thing.

The columns of \(X\) correspond to all of the different features that I include in the model. There are five different types of columns:

Using such a small list of covariates ensures that there is no formal linear dependence among the columns (as there would have been had I included an overall intercept, or a column for tied score, for instance).

The entries in \(Y\) are functions which encode the rate at which unblocked shots are generated from various parts of the ice. An unblocked shots with NHL-recorded location of \((x,y)\) is encoded as a two-dimensional gaussian centred at that point with width of ten feet; this arbitrary figure is chosen because it is large enough to dominate the measurement error typically observed by comparing video observations with NHL-recorded locations and also produces suitable smooth estimates.

One detailed example should make the structure of the model clear:

Suppose that the Senators are the home team and the Jets are the away team, and at a given moment of open play two Senators players (say, Karlsson and Phaneuf) jump on the ice, replacing two other Senators. The score is 2-1 for Ottawa, and the other players (say, Pageau, Hoffman, and Stone for Ottawa, against Laine, Ehlers, Wheeler, Byfuglien, and Enstrom) began their shift some time previously on a faceoff in Ottawa's zone. Play continues for 50 seconds, during which time Ottawa takes two unblocked shots from locations (0,80) and (-10,50) and Winnipeg takes no shots. This shift ends with the period and the players leave the ice.

These fifty seconds are turned into two rows of \(X\) and two entries in \(Y\). First, the Ottawa players are considered the attackers, and the attacking columns for Pageau, Hoffman, Stone, Karlsson, and Phaneuf are all marked as "1". The Jets are considered the defenders and the defending columns for Laine, Ehlers, Wheeler, Byfuglien, and Phaneuf are marked as "1". All of the other player columns, attacking or defending, are marked as "0". Because the Senators are winning 2-1, the score column for "leading by one" is marked with a "1" and the other score columns are marked as "0". Because three of the Senators players are playing on-the-fly shifts, and the Senators are the home team, the "home on-the-fly attacking" zone column is marked with "0.6", because sixty percent of the five players are playing such shifts. The other two Senators players are still playing the shift they began in their own zone, so the "home defensive-zone attacking" zone column is marked with "0.4". All of the jets skaters began their shift in the Ottawa zone, so the "away offensive-zone defending" zone column is marked with a "1.0", and all the other zone columns are marked as "0". The Senators are the home team, so the "home intercept" column is marked with a 1. Corresponding to this row of \(X\), an entry of \(Y\) is constructed as follows: Two gaussians of ten-foot width and unit volume are placed at (0,80) and (-10,50) and the two gaussians are added to one another. This function is divided by fifty; resulting in a continuous function that approximates the observed shot rates in the shift. Finally, I subtract the league average shot rate from this. This function, which associates to every point in the offensive half-rink a rate of shots produced in excess of league average from that location, is the "observation" I use in my model.

Second, the same fifty seconds are made into another observation where the Jets players are considered the attackers, and the Senators are the defenders. The attacking player columns for the Jets are set to 1, the defending columns for the Senators players are set to 1. Since the Jets are losing, the score column of "trailing by one" is set to 1, and the non-zero zone columns are:

The two rows have no non-zero columns in common. Since the Jets didn't generate any shots, the associated function is the zero function; I subtract league average shot rate from this and the result is placed in the observation vector \(Y\).

The weighting matrix \(W\), which is diagonal, is filled with the length of the shift in question. Thus, the above two rows will each have a weight of fifty.

By controlling for score, zone, teammates, and opponents in this way, I obtain estimates of each players individual isolated impact on shot generation and shot suppression.

To fit a simple model such as \(Y = WX\beta \) using ordinary least squares fitting is to find the \(\beta\) which minimizes the total error $$ (Y - X\beta)^TW(Y - X\beta) $$ When the entries of \(Y\) are numbers, this expression is a one-by-one matrix which I naturally identify with the single number it contains, and I can find the \(\beta\) which minimizes it by the usual methods of matrix differentiation. To extend this framework to our situation, where the elements of \(Y\) are functions from the half-rink to the reals, I must define an inner product on this function space, which I do by setting \( \left< f,g \right> = \int f(x,y) g(x,y) dA \) where the integral is taken over all coordinates \((x,y)\) in the half rink. Since I only use functions which are finite sums of gaussians, this integral always exists and is easily checked to define an inner product. Hence, I can use the well-known formula for the \(\beta\) which minimizes this error, namely $$ \beta = (X^TX)^{-1}X^TWY $$ which makes it clear that the units of \(\beta\) are the same as those of \(Y\); that is, if I put shot rate maps in, I will get shot rate maps out.

However, I choose not to fit this model with ordinary least-squares, preferring instead to use generalized ridge regression; that is, I choose to minimize $$ (Y - X\beta)^TW(Y - X\beta) + \beta^T \Lambda \beta $$ where \(\Lambda\) is a positive definite matrix which encodes our prior knowledge about the players. This is a kind of zero-biased regression---the first error term is as before, but the second term, weighted by \(\Lambda\), interprets deviation from zero (that is, league average) as intrinsically mistaken. The \(\beta\) which minimizes this combined expression is the one which simultaneously agrees with the observations while doing as little violence as possible to the idea that the players in question are all, broadly, of NHL quality.

The usual methods (that is, differentiating with respect to \(\beta\) to find the unique minimum of the error expression) gives a closed form for \(\beta\) as: $$ \beta = (X^TWX + \Lambda)^{-1}X^TWY $$ Any positive definite \(\Lambda\) will suffice but I choose to use a simple diagonal one, with entries \(\Lambda_{ii} = \lambda_i\) depending on the column \(i\) as follows:

One tantalising possibility for the future is that expected chemistry effects might be encoded in more subtle choices of \(\Lambda\) - for instance, if one expected ahead of time that player i and player j were going to produce similar results, one could set \(\Lambda_{ij}\) to be negative (instead of zero as above), effectively penalizing any differences between them. This is only the germ of a tiny idea but the Sedins come immediately to mind.

Results

For the purposes of examining the distribution of estimates, I use a simple "expected goals" style weighting which I call threat. That is, for a given location, I can calculate the historical chance of a shot (known to be unblocked, as assumed throughout this article) of becoming a goal, without paying attention to any other features of the shot.

To compute the associated threat of a shot map, multiply each point in a shot map with the corresponding probability of the shot becoming a goal as displayed above, and then sum. This gives a generic shooter-and-goalie-independent measure of how dangerous a given shot map is, in units of goals per hour. For instance, in 2017-2018, league average threat was around 2.3 goals per hour.

The model can be fit to any length of time; in this article I'll be describing the results from fitting the 2016-2017 and 2017-2018 regular seasons together, since these results are the ones which I used for my 2018-2019 season preview. Eventually, single-season results will be posted in relevant spots throughout the site.

Raw on-ice results

Before I describe the model outputs, I turn first to the raw observed results from the combined 2016-2018 regular seasons.

This graph is constructed as follows: for every skater who played any 5v5 minutes in these two seasons, compute the offensive and defensive threat observed while they were on the ice; this produces a point \(x,y\). Then, form the density map of all such points, where each point is weighted by the corresponding number of minutes played by the player in question. For ease of interpretation, I've scaled the threat values by league average threat, so that a value of \(5,5\) on the graph means "threating to score 5% more than league average, and also threatening to be scored on 5% more than league average". As is my entrenched habit, the defensive axis is inverted, so results that are favourable to the skater in question appear in the top right. The contours are chosen so that ten percent of the point mass is in the darkest region, another ten percent in the next region, and so on. The final twenty percent of the point mass is in the white region surrounding the shaded part of the graph.

Player Marginals

Repeating the above process with the individual player marginal estimates gives the following graph in green. Unsurprisingly, since I used a model with an explicit zero-bias, the overall distribution is roughly normal around zero. Intuitively, we expect this means that weak players will be slightly over-estimated and that strong players will be under-estimated; I hope in this way to obtain the clearest possible view of players whose true ability is close to league average---these players interest me a great deal more than the other players. The choice of zero-point for the regression is mathematically arbitrary, so its choice is determined by ease of interpretability and by which sorts of players are of interest---akin to moving a magnifying glass over a large map, knowing that the fine details in the centre will be easier to see but the areas under the edges of the glass distorted.

Notice also that there is no correlation between offensive ability and defensive ability, which confirms previous work by me. There are certainly many players who are offensively strong and defensively weak, but it does not appear to govern play like the tired cliche suggests.

Teammate Impact

Once I have individual estimates for players in hand, I can make many interesting secondary computations to show the distribution of various effects. Most obviously, for a given player, I can form the sum of the player estimates of all the given player's teammates, weighted by their shared icetime, and then multiply it by four (since every player has four teammates at 5v5). These estimates of teammate quality can then be graphed as above:

Here we see a definite skew towards good players; that is, most players are playing with better than average players. This fits our intuition, since better players play mostly with one another and also play considerably more minutes than weaker players when coaches are making rational decisions, which is most of the time.

Opponent Impact

The same computation can be done for any given players 5v5 opponents: form the sum of all of their isolates, weighted by head-to-head icetime, and then multiply by five, since every player has five opponents at 5v5. This is graphed below in red:

First, notice that the scale on the axes is smaller than for teammates; more discussion of that will follow later. More interestingly, this distribution is both sharply skewed and non-symmetrical. Opponent distributions are skewed broadly towards good players (as for teammates, for similar reasons) but the skew is much more pronounced in the offensive direction; that is, matchups against good offensive players are more common. Furthermore, the variation in competition quality is much more pronounced offensively, with some players playing against skaters of near-average offensive ability but others playing against very strong offensive players. By contrast, the range of defensive ability faced by players is much smaller.

Score and Zone Impact

There is one minor but slightly unusual bit of technical sleight-of-hand which I must explain here. One of the model columns is a home-ice advantage term, which does several things simultaneously---it plays a role analogous to the "tied" score state, which does not have a column to itself, it obviously measures in some very imprecise way some of the effect of home-ice advantage (I hope presumptively that modelling teammates and competition explicitly also captures some), and it serves as a reservoir into which any unmodelled effects can partially fall. Since it must do so many things, I have decided to include it partially in the score impacts and partially in the zone impacts, in a slightly ad-hoc way, as we shall see. For reference, the home-ice term is the following:

Score

The score terms for 2016-2018 are as follows:

As is by now familiar, teams which are losing dominate games in shots and goals, although they still usually lose. Curiously for 2016-2018, all of the "losing" states are broadly similar, with very little difference in the offence generated by teams that are losing slightly versus ones that are trailing by a large margin. On the other hand, leading teams show a more modest (negative) effect on their shot rates, with a slight bump in shots from the middle of the ice at the expense of shots from the points and perimeter.

To examine how the score affects player results, I form weighted sums of each of these maps, weighted by the fraction of their total icetime (including tied scores, for which there is no covariate). Furthermore, I compute each player's individual home-ice advantage, that is, I multiply the above home-ice covariate by the fraction of their individual ice-time which was spent on home ice. I then add -0.25(this will be explained below) times this amount to the score computation, to obtain their net score impact, which is shown below in orange:

Note that the distribution is quite linear---coaches have very little leeway in score deployment, since given score states tend to prevail for a long time compared to how long it takes players to rest.

Zone

Each player has their own zone start, so the net effect of zone starts for a given shift is computed by adding all of the various individual impacts together. The zone columns themselves are:

These are scaled for display so that, for instance, if all five skaters on one team were starting their shifts in the defensive zone, the impact on shot rates for that team would be as shown in the "defensive zone" column. The full impact of zone starts on the play can be found by adding the effect for one team to the effect for the other team; this requires some care and attention since defensive zone starts for one team need not correspond with offensive zone starts for the other team. In general, the zone starts for one team are not theoretically required to have anything to do with those of the other team, since each team's players change entirely when their coach desires them to change; however, certain patterns (offensive vs defensive, neutral vs neutral, on-the-fly vs on-the-fly) are more common.

To obtain the distribution of zone impacts on skaters in 2016-2018, I compute the sum of the associated zone impacts of the ten zone starts for the skaters on the ice using the above set of eight estimates, and then add 1.25 times the home-ice advantage of each player to obtain the net zone impact, below in purple:

The most obvious thing here is not the clear gaussian shape but the heavy skew towards fewer shots---fewer for both teams. First of all, I must explain the mysterious magic numbers -0.25 and 1.25 which appear in the two preceding paragraphs, which requires a small technical digression. Had I included an overall intercept column alongside the ones I did use, then the columns of the design matrix would have been linearly dependent; specifically, the "down 1", "up 1", "home ice", and "overall intercept" columns would have been linearly dependent, which means that they would have been redundant. Had we been doing ordinary least squares regression, this would have caused our problem to be ill-conditioned and therefore unsolvable. Ridge regression provides a way out of this technically, but it offends my sense of judgment to use non-zero ridge parameters for these columns since there is no theoretical (that is, hockey) reason for these terms to be close to zero. The effect of having a subset of linearly dependent columns in a regression is analogous to trusting a drop of water on a wide, hot flat saucepan to find the lowest point simply by gravity---since the pan is almost totally flat, the water will skitter about and never settle down. (In fact, the origin of the word "ridge" in the sense of ridge regression comes from a similar metaphor, where a "mountain ridge" is added to a "flat" section of configuration space).

Thus, with no clear indication of how to divide the observed home-ice advantage into its manifesting parts, I choose to artificially divide it as I have done (-0.25 to score, 1.25 to zone) purely to (roughly) align the score distribution with the origin, which seems plausible to me. Of course the two coefficients must add to one, the net effect of which moves the zone impact distribution closer to the origin than it was before any home-ice impact is considered.

The most obvious interpretation of this skew towards "DULL" is that changes of player personnel are, in general, associated with decreases of shot rates, which is broadly what we know intuitively happens, especially for faceoff starts, where all ten skaters are usually stationary. Moreover, we know quantitatively that shift starts, including on-the-fly shifts but especially faceoff shifts, begin with depressed shot rates.

"Residuals"

Strictly speaking, residuals for a regression refer to the differences between the individual observations (that is, each microshift, for 2016-2018 about a million or so) and the predicted shot rates for each microshift. I find it prohibitive to compute such residuals, so instead I settle for a suitable aggregate: for each player, I can compute the difference between their raw observed 5v5 on-ice results and what I would expect from computing the model outputs associated with the observed players and the zones and scores and home-ice advantage they player under. This is shown below:

Here again the skew towards "DULL" is the obvious, salient feature---that is, the model predicts more shots than in fact occur, for both teams. I think, but am not certain, that this is because the offensively strong players are so offensively strong that when they play both with and against one another, as the above exposition shows that they disproportionately do, there simply isn't enough time in single shifts for both teams to put up the shot totals that their results against weaker opponents suggest they are capable of doing. If this is true, and again I add that I am not certain, it would be the first suggestion that I have seen that offence and defence might not be independent at the team level, at least in certain shifts.

Relative Scale

In the above, I have shown each distribution on its own scale, so that I could discuss the shape of each one in turn. However, to understand their relative importance, they should all be placed on the same scale, which I have done below (except the residuals):

Here there are a great many interesting insights to be had: