This work was presented first at the 2017 Ottawa Hockey Analytics Conference, for which
you may view slides and video.
I propose a new framework for evaluating goaltending performance, taking into account the
difficulty of shots faced as well as the quality of skaters playing for both teams. This is
the first non-trivial fragment of what I imagine will be a long sequence of articles.
Aim
Any method for evaluating goalies should be:
- Fair: it should ascribe as much responsibility as possible to the goaltender
in question for what they do and as little as possible for what they do not do.
- Simple: no matter how complicated it is in inputs or adjustments or
subtleties of design, it should produce single numbers for goaltenders which can be compared.
- Extensible: as new data (such as puck and player tracking data, skater and
goaltender posture measurements, fatigue information, and so on) becomes available, we should
be able to include it in the existing framework without having to go back to the drawing
board.
- Applicable: the many things being measured should give some insight to
those for whom goaltending is of daily concern: for goalie coaches to know what things might
merit extra attention, in addition to the more obvious value to managers and fans.
- Repeatable: the measures extracted from past performance should correlate
as well as possible with future values.
The gentle reader will decide for themself how successful I have been so far in the pursuit of
these goals but I feel I have grasped a foothold. In this first article I mainly describe the broad
framework but I mention two related applications: adjusting for quality of skaters, and adjusting
for quality of shots faced.
Model
The key technical tool I use in this framework is a simple model of play in the defensive
zone, from the goalie's perspective. I shoehorn all play into seven states:
- Shot: a shot that the goaltender will have to deal with somehow. For
this purpose blocked shots are not considered shots, but missed shots, saved shots,
and goals are.
- Freeze: when the action of the goaltender causes play to stop. This
includes goalies catching or otherwise smothering the puck with their equipment, as well as
shots deflected out of play by the goaltender or the goalframe. It does not include
faceoffs caused by the skaters (such as when they take penalties or otherwise put the puck
out of play in a way that does not involve the goaltender).
- Contested: when the puck is not known to be in the clear possession of
either team.
- Goal: when the puck is in the net.
- Attackers: when the attacking team has clear possession of the puck.
- Defenders: when the defending team has clear possession of the puck.
- Safe: when the puck is no longer in the defensive zone at all.
The gory details and most of the modelling judgment come in with how I tabulate transitions between
these states given the play-by-play information to which we have access. These details are
sufficiently gruesome and lengthy that I've included them in an appendix at the end of this article.
The results for the league in 2016-2017 as a whole are:
| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
| Shot | 21.1% | 25.7% | 47.7% | 5.5% | | | |
| Freeze | | | | | 49% | 49% | 2% |
| Contested | | | | | 41% | 59% | |
| Goal | | | | 100% | | | |
| Attackers | 46% | | 54% | | | | |
| Defenders | | | 35% | | | | 65% |
| Safe | | | | | | | 100% |
The blank entries indicate that zero transitions of that type were recorded, mostly because I imputed
other transitions around them. The aggregate effect is to permit only certain transitions: for instance,
all transitions from "Contested" are either to "Attackers" or "Defenders". Notice how defenders
consistently win the bulk of such non-static puck battles. On the other hand, when goalies freeze pucks
after shots, the result is usually a faceoff, for which the attacking team has a slight edge, but in a
small fraction (3%) of cases the puck is immediately taken out of the zone to "Safe" by the linesfolk,
because of scrumming attacking skaters or penalties.
Adjusting for Skater Quality
The primary benefit of shaping information about defensive zone play into such a matrix as the above
is that we can gain insights from simple computations using the matrix. For instance, notice that
there are two "terminal" or "absorbing" states, that is, "Goal" and "Safe". These are "terminal" in the
sense that I consider any play after them as totally distinct from the previous play. (We know that
this is not quite true---after all, any goal scored changes the score, which we know strongly affects
some aspects of play, and we know that some clearances of the puck to "safety" are actually very bad
plays which give up control of the puck with minimal benefit. These concerns will have to wait for
another day.) By taking a very high power of our transition matrix, we can compute the "eventual
goal probability" starting from any state, that is, the chance that, starting from a given state, the
puck will wind up in the net before the defenders manage to clear it. For instance, the eventual
goal probability for "Shot" is 10.2%, around double the immediate probability of scoring on any given
shot.
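The "very high power" computation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the rounded league-wide entries from the table above, the helper names (`matmul`, `eventual_goal_probability`) are made up for this example, and ten squarings (the 1024th power) is an arbitrary stand-in for "very high".

```python
STATES = ["Shot", "Freeze", "Contested", "Goal", "Attackers", "Defenders", "Safe"]

# League-wide 2016-2017 transition matrix from the table above
# (row = current state, column = next state, blanks entered as 0).
LEAGUE = [
    [0.211, 0.257, 0.477, 0.055, 0.00, 0.00, 0.00],  # Shot
    [0.000, 0.000, 0.000, 0.000, 0.49, 0.49, 0.02],  # Freeze
    [0.000, 0.000, 0.000, 0.000, 0.41, 0.59, 0.00],  # Contested
    [0.000, 0.000, 0.000, 1.000, 0.00, 0.00, 0.00],  # Goal (absorbing)
    [0.460, 0.000, 0.540, 0.000, 0.00, 0.00, 0.00],  # Attackers
    [0.000, 0.000, 0.350, 0.000, 0.00, 0.00, 0.65],  # Defenders
    [0.000, 0.000, 0.000, 0.000, 0.00, 0.00, 1.00],  # Safe (absorbing)
]


def matmul(a, b):
    """Plain-Python product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eventual_goal_probability(matrix, squarings=10):
    """Square the matrix repeatedly (giving its 2**squarings-th power) and
    read off the Goal column: the chance of ending in the net from each state."""
    power = matrix
    for _ in range(squarings):
        power = matmul(power, power)
    goal = STATES.index("Goal")
    return {state: power[i][goal] for i, state in enumerate(STATES)}


# Every row should be a probability distribution.
for name, row in zip(STATES, LEAGUE):
    assert abs(sum(row) - 1.0) < 1e-9, name

probs = eventual_goal_probability(LEAGUE)
for state in STATES:
    print(f"{state:10s} {probs[state]:.1%}")
```

With the rounded table entries, the value this produces for "Shot" comes out at about 10.2%, in line with the figure quoted above.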
My broad opinion is that the goalie ought to be considered mostly responsible for all of the entries
in the "Shot" row and not in any way responsible for the entries in any of the other rows. However,
looking at transition matrixes for individual goaltenders, even for full seasons, shows significant
differences in the "skater" rows. For instance, Philipp Grubauer played twenty-four games for
Washington in 2016-2017; his matrix is below:
| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
| Shot | 19.2% | 25.4% | 51.1% | 4.3% | | | |
| Freeze | | | | | 51% | 46% | 3% |
| Contested | | | | | 38% | 62% | |
| Goal | | | | 100% | | | |
| Attackers | 43% | | 57% | | | | |
| Defenders | | | 32% | | | | 68% |
| Safe | | | | | | | 100% |
In virtually every skater entry the results in front of him are more favourable than league average.
Since the Capitals won the Presidents' Trophy in 2016-2017, it should hardly be surprising that his
team won more puck battles, cleared more rebounds, and broke out of the zone better than the league
average.
By comparison, consider Mike Smith, who played fifty-five games for Arizona in 2016-2017. His
matrix is below:
| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
| Shot | 22.1% | 28.8% | 44.1% | 5.0% | | | |
| Freeze | | | | | 49% | 48% | 3% |
| Contested | | | | | 43% | 57% | |
| Goal | | | | 100% | | | |
| Attackers | 52% | | 48% | | | | |
| Defenders | | | 36% | | | | 64% |
| Safe | | | | | | | 100% |
In almost every skater entry we see results that are below league average. This again is not surprising,
as the Coyotes finished third-last in 2016-2017. They are consistently more susceptible to opponents'
forechecking, win fewer puck battles, and break out in transition less.
The key idea for equalizing results across different skater contexts is to form the transition
matrixes for the goalies we want to compare, and then to replace all of the entries in the "skater" rows
(that is, every row except the "Shot" row) with league-average values. This produces a transition
matrix which I imagine as representing what would transpire if the goalie in question were
provided with league-average skater context instead of the teammates and opponents they actually
faced. Then, by computing the long-run probability of a shot being converted into a goal, we can
compare two goaltenders more fairly. For the given pair of goaltenders, the immediate goal-per-shot
figures favour Grubauer---4.3% to Smith's 5.0%. Moving to eventual goal probabilities, Grubauer's
figure is 7.4% and Smith's 10.2%, where Smith's weak teammate support becomes very clear. After
replacing their skater contexts with league average ones, Grubauer's "skater-independent eventual
goal probability" is 7.9%, whereas Smith's is 9.5%. By this measure, Grubauer's performance was
actually stronger than Smith's, even after accounting for the differences in skater quality.
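The row-replacement step can be illustrated with the three tables above. The sketch below is not the code behind the quoted figures: the names are invented for this example, the inputs are the rounded table entries, and instead of an explicit matrix power it repeatedly applies the transition rows to a vector of goal probabilities, which amounts to the same thing. Because the inputs are rounded, the results agree with the quoted figures only to within about a tenth of a percentage point.

```python
# Rounded 2016-2017 entries from the tables above (key = current state).
# "Goal" and "Safe" are absorbing and are handled inside the function.
LEAGUE_SKATERS = {
    "Freeze":    {"Attackers": 0.49, "Defenders": 0.49, "Safe": 0.02},
    "Contested": {"Attackers": 0.41, "Defenders": 0.59},
    "Attackers": {"Shot": 0.46, "Contested": 0.54},
    "Defenders": {"Contested": 0.35, "Safe": 0.65},
}
GRUBAUER = {
    "Shot":      {"Shot": 0.192, "Freeze": 0.254, "Contested": 0.511, "Goal": 0.043},
    "Freeze":    {"Attackers": 0.51, "Defenders": 0.46, "Safe": 0.03},
    "Contested": {"Attackers": 0.38, "Defenders": 0.62},
    "Attackers": {"Shot": 0.43, "Contested": 0.57},
    "Defenders": {"Contested": 0.32, "Safe": 0.68},
}
SMITH = {
    "Shot":      {"Shot": 0.221, "Freeze": 0.288, "Contested": 0.441, "Goal": 0.050},
    "Freeze":    {"Attackers": 0.49, "Defenders": 0.48, "Safe": 0.03},
    "Contested": {"Attackers": 0.43, "Defenders": 0.57},
    "Attackers": {"Shot": 0.52, "Contested": 0.48},
    "Defenders": {"Contested": 0.36, "Safe": 0.64},
}


def eventual_goal_from_shot(rows, sweeps=500):
    """Chance that play starting from "Shot" ends in "Goal" before "Safe"."""
    value = {state: 0.0 for state in rows}
    value.update({"Goal": 1.0, "Safe": 0.0})  # absorbing states stay fixed
    for _ in range(sweeps):
        # The comprehension is built from the old values before they are replaced.
        value.update({
            state: sum(p * value[nxt] for nxt, p in row.items())
            for state, row in rows.items()
        })
    return value["Shot"]


def with_league_skaters(rows):
    """Keep the goalie's own "Shot" row; swap every skater row for league average."""
    return {**LEAGUE_SKATERS, "Shot": rows["Shot"]}


for name, rows in [("Grubauer", GRUBAUER), ("Smith", SMITH)]:
    raw = eventual_goal_from_shot(rows)
    adjusted = eventual_goal_from_shot(with_league_skaters(rows))
    print(f"{name}: eventual {raw:.1%}, skater-independent {adjusted:.1%}")
```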
Adjusting for Shot Quality
So far I've treated all of the transitions from the "Shot" state as the responsibility of
the goaltender. However, not all shots are equally easy to handle. This difficulty might be smoothed
over if every goaltender faced a similar distribution of shots in each year but this is not what we
observe in the NHL. For instance, Devan Dubnyk played sixty-five regular-season games for Minnesota in 2016-2017,
facing the following pattern of shots:

Blue regions indicate fewer
shots (than league average) per hour of 5v5 play and red regions show areas from which he saw more shots per hour. On the other
hand, Mike Smith in Arizona (55 games played) saw the following distribution of shots:

These differences strongly suggest that some
accounting should be made for the difficulty of shots faced.
To accomplish this, I replace the
single "Shot" state with a family of states, one for every recorded shot location. I divide the
defensive zone into a 100 by 100 grid, roughly corresponding to the recorded precision of the
NHL's real-time stats. This changes the "Shot" state into ten thousand shot states, and our
seven-by-seven transition matrixes become 10,006-by-10,006 matrixes, which makes them harder to look
at but not appreciably harder to compute with. Then, we can compute a Standardized Shot Profile,
that is, the relative likelihood of facing shots from given locations for a league-average goalie.
Graphically, it looks like this:

The colour units indicate relative frequency. By encoding this standard shot profile as a
matrix we can pre-multiply our observed transition matrixes by it and obtain what I call
Standardized Goals Against or sGA, that is, the number of goals that a given goalie would
allow per hundred shots if they faced a typical distribution of shots, calculated from how they
performed on the shot distribution they did face. Similarly, we can derive "Standardized Freeze Rates",
"Standardized Shot-to-contested Rates", and so on, though these quantities seem less interesting to me.
In our example above, we were comparing Dubnyk and Smith; their immediate goal probabilities
(adjusting nothing at all) were 4.8% and 4.2%, respectively. However, Dubnyk's sGA for this season
is 6.7, and Smith's is 4.5---unsurprisingly, Dubnyk's expected performance versus league-average
shot quality is worse than observed. Somewhat less expectedly, Smith's expected performance in
percentage terms also drops slightly, but the relative distribution of the shots he faces is not
so different from league average; he simply sees lots more from every location.
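The standardization step can be illustrated with a toy example. Everything in this sketch is an assumption made for illustration: it uses three named shot locations instead of the 100 by 100 grid, every number is invented, and it simplifies by collapsing each location's row to four outcomes and reading sGA off the immediate "Goal" entry. The real computation works with the full matrix, in which a shot from one location can lead to a shot from another.

```python
OUTCOMES = ["Shot", "Freeze", "Contested", "Goal"]

# Hypothetical standardized shot profile: league-average share of shots
# from each location (shares sum to 1).
STANDARD_PROFILE = {"slot": 0.30, "point": 0.45, "sharp_angle": 0.25}

# Hypothetical goalie: observed outcome rates by shot location...
GOALIE_ROWS = {
    "slot":        {"Shot": 0.20, "Freeze": 0.20, "Contested": 0.50, "Goal": 0.10},
    "point":       {"Shot": 0.20, "Freeze": 0.30, "Contested": 0.48, "Goal": 0.02},
    "sharp_angle": {"Shot": 0.25, "Freeze": 0.30, "Contested": 0.44, "Goal": 0.01},
}
# ...and the number of shots this goalie actually faced from each location.
SHOTS_FACED = {"slot": 500, "point": 300, "sharp_angle": 200}


def weighted_row(rows, weights):
    """Collapse per-location rows into one "Shot" row using the given weights.

    This is the row vector of weights multiplied by the block of shot rows."""
    total = sum(weights.values())
    return {
        outcome: sum(weights[loc] * rows[loc][outcome] for loc in rows) / total
        for outcome in OUTCOMES
    }


observed = weighted_row(GOALIE_ROWS, SHOTS_FACED)           # shots as faced
standardized = weighted_row(GOALIE_ROWS, STANDARD_PROFILE)  # league-average mix

raw_ga_per_100 = 100 * observed["Goal"]
sga = 100 * standardized["Goal"]
print(f"raw goals per 100 shots: {raw_ga_per_100:.2f}")
print(f"standardized (sGA):      {sga:.2f}")
```

In this invented case the goalie faced an unusually large share of slot shots, so the standardized figure is lower than the raw one; a goalie shielded from dangerous shots would move the other way.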
The two adjustments described here (replacing skater terms with league averages and shot
standardization) can be combined to obtain a stat that I call "sGA*".
Repeatability
If we want to imagine that our statistics are useful measures of skill then we hope that they
will be repeatable, that is, future values should be related to past values. I computed correlations
for several stats, including the two (sGA and sGA*) introduced here, using "career to date" as the
past value and "following twenty-five games" as the future value. I tried to imagine
a plausible scenario as it might appear to a decision-maker at a hockey team: which goaltender
shall I (primarily) play for the next twenty-five games? Many fewer than twenty-five games risks
being lost in noise completely; many more games risks disconnecting from practice. Computing
Pearson correlations in this way for all goaltenders over the past decade gives:
Stat | Correlation |
sGA | 0.185 |
sGA* | 0.135 |
xGA-GA per shot | 0.211 |
All-situations save % | 0.211 |
5v5 save % | 0.115 |
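A repeatability check of this shape can be sketched as follows. The details are assumptions rather than a record of what was done for the table: the per-game data is invented, raw goals per shot stands in for whichever stat is being tested, and cutting each career at every multiple of twenty-five games is just one plausible way to form "career to date" and "following twenty-five games" pairs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def goals_per_shot(games):
    """games is a list of (goals allowed, shots faced) tuples, one per game."""
    goals = sum(g for g, _ in games)
    shots = sum(s for _, s in games)
    return goals / shots


def past_future_pairs(games, window=25):
    """Pair the career-to-date value with the value over the next `window` games.

    Assumption: one pair at every multiple of `window` games played."""
    return [
        (goals_per_shot(games[:cut]), goals_per_shot(games[cut:cut + window]))
        for cut in range(window, len(games) - window + 1, window)
    ]


# Invented career of 150 games for a single hypothetical goalie.
games = [(2 + i % 3, 28 + i % 5) for i in range(150)]
pairs = past_future_pairs(games)
r = pearson([past for past, _ in pairs], [future for _, future in pairs])
print(f"{len(pairs)} pairs, correlation {r:.3f}")
```

In practice the pairs from every goaltender would be pooled before taking a single correlation.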
There are a number of surprises. Most surprisingly, the least repeatable measure of goaltending
talent is 5v5 save percentage, which is one of the most popular measures in the analyticky circles
in which I travel. The third entry is based on Emmanuel Perry's expected goals model, which assigns
to every shot a goal probability based on its type and location. Forming the difference between
expected goals allowed and actual goals allowed and then dividing by the number of shots puts this
notion on the same arithmetic footing as the other ones, allowing for comparisons. It is the most
sophisticated existing model for goaltending evaluation to date so it's not surprising to see it
perform well here; what is much more surprising is the equally strong repeatability from all-situations
save percentage, which indiscriminately buckets together shots from all different contexts.
I suspect that there may be a sort of survivor bias influencing results here; since all-situations
save percentage appears to be the most common evaluating tool among NHL decision-makers over the
past decade (with "consistent" goaltenders especially prized), perhaps there is artificially less
variance in this measure.
I am sufficiently heartened by the repeatability of sGA to publish this article and to push forward
with further work in this vein; however, the weakness of the repeatability of sGA* makes me think
this latter stat might not be worth applying immediately.
Future Work
A project this size will take, I expect, several years and there is much left to do. The most
obvious next steps to me are:
- Non-trivial Priors: Instead of assuming that goaltenders are a blank slate, we should
instead begin with a prior expectation of their ability that is sensible. League-average
results would be one plausible start; some suitably defined "replacement level" would
perhaps be better still. Very sophisticated implementations might use different priors
for different goalies using clever manipulations of data from other leagues.
- Analysis of shot types: While the difference between what NHL play-by-play calls
a "snap" shot and what it calls a "wrist" shot may be entirely negligible in practice,
the same is presumably not true for slap shots or for deflections, and so on. I do not see
how to accommodate this information yet; replacing 10,000 shot states by six or seven times
as many (which I have tried) leads to unpleasant renormalization difficulties, since
each individual shot state is very poorly supported in data for single goaltenders even
with very large samples of games.
- Vision and pre-shot movement: At present we have only fragmentary data, manually
gathered at great expense, largely by volunteers, concerning how much the goaltender can
see and how much and in what way the puck moves in the few seconds immediately preceding shots.
What little data we do have about these factors suggest that they strongly affect goal
probabilities, however, so modifications will have to be made for such data when it becomes
available.
- Zone Entries: I've chosen to work primarily from a shot context (influenced by
xG and save percentage, which do so also) but one could instead use the framework I've
introduced here to work
from a zone-entry context, considering transitions corresponding to "carry ins", "dump and
change", "dump and forecheck", and so forth.
In any event, I am sufficiently happy with sGA that I will be computing it for past and future
goaltending performances and quoting it on the site. The future work I mention here will take a long
time and I welcome the assistance of those who are interested in accelerating that progress.
2016-2017 Results
The graph below shows the raw goals per hundred unblocked shots and the sGA for all goalies who
played in at least fifteen regular season games in 2016-2017.

Goalies who appear above the red line would have posted better results had they faced a league-average
shot profile instead of the profile they did face, and those below would have posted worse. The players
are coloured by their teams; goalies who played for multiple teams are shown in white.
Appendix: Transition Imputation
These are the details of how I took the NHL play-by-play and coerced what is written there into
transitions for my model. First of all: no transitions were considered when the goalie under
consideration was not in the net, no matter what the play-by-play events. That said, there are two
stages of model design. The first is the encoding of states:
- If a team took a shot on the goalie of interest that was scored, saved, or missed, that was
recorded as a "Shot" in my sense. A blocked shot was recorded as "Attackers have the puck"
but not as a shot.
- Events labelled "GOALIE STOPPED" in the play-by-play are recorded as "Freeze". (Note, though, the
imputation rule for transitions to freezes below.)
- Faceoffs are coerced into either "Defenders have the puck" or "Attackers have the puck" depending
on who wins them.
- Hits are labelled as transitions from "Attackers" or "Defenders", depending on who was hit,
to "Contested".
- Giveaways and takeaways are coded as transitions from "Attackers" to "Contested" to "Defenders",
or inversely depending on who began and ended the play with the puck.
- Icings were coded as "Defenders have the puck" but were not coded as "puck to safety"
since play resumes in the defensive zone in almost all such cases.
- Everything else was recorded in the obvious way.
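The encoding rules above can be written as a small lookup function. This is a sketch only: the event labels (`"saved_shot"`, `"goalie_stopped"`, and so on) and the `team` argument are hypothetical and do not reproduce the NHL play-by-play format, emitting "Shot" then "Goal" for a scored shot is my reading of the first rule, and "everything else" is left unhandled because the text does not spell it out.

```python
OTHER = {"Attackers": "Defenders", "Defenders": "Attackers"}


def encode_event(kind, team=None):
    """Map one play-by-play event to a list of model states.

    kind is a hypothetical lowercase label. team is "Attackers" or "Defenders"
    and means, depending on kind: the faceoff winner, the side whose skater
    was hit, the side giving the puck away, or the side taking it away.
    """
    if kind in ("saved_shot", "missed_shot"):
        return ["Shot"]
    if kind == "goal":
        return ["Shot", "Goal"]        # assumption: a goal is a shot, then a goal
    if kind == "blocked_shot":
        return ["Attackers"]           # blocked shots are not shots in this model
    if kind == "goalie_stopped":
        return ["Freeze"]
    if kind == "faceoff":
        return [team]
    if kind == "hit":
        return [team, "Contested"]
    if kind == "giveaway":
        return [team, "Contested", OTHER[team]]
    if kind == "takeaway":
        return [OTHER[team], "Contested", team]
    if kind == "icing":
        return ["Defenders"]           # play resumes in the zone, so not "Safe"
    return []                          # "everything else" is not specified here


events = [("faceoff", "Attackers"), ("blocked_shot", None),
          ("saved_shot", None), ("goalie_stopped", None)]
states = [state for kind, team in events for state in encode_event(kind, team)]
print(states)
```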
Once all of the states are encoded in this way, some additional states are imputed, with accompanying
transitions:
- When "Shot" appears followed by "Attackers" or "Defenders", a transition through "Contested"
is imputed.
- When "Shot" or "Attackers" is followed by "Safe", transitions through "Contested" and then
"Defenders" are imputed.
- When "Attackers" or "Defenders" is followed immediately by "Freeze", the freeze is replaced
with a "Contested", since I only want to consider goalies freezing the puck from Shots
and not all faceoffs.
- When "Defenders" is followed by "Shot", I insert transitions through "Contested" and then
"Attackers".
- When "Defenders" is followed by "Attackers", or vice versa, I insert a transition through "Contested".
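The imputation rules above can be sketched as two passes over a sequence of encoded states. The order of application is an assumption (freezes are replaced first, then missing states are inserted), and the function names are invented for this example.

```python
from collections import Counter

# (previous state, next state) -> states to insert between them.
INSERTS = {
    ("Shot", "Attackers"): ["Contested"],
    ("Shot", "Defenders"): ["Contested"],
    ("Shot", "Safe"): ["Contested", "Defenders"],
    ("Attackers", "Safe"): ["Contested", "Defenders"],
    ("Defenders", "Shot"): ["Contested", "Attackers"],
    ("Defenders", "Attackers"): ["Contested"],
    ("Attackers", "Defenders"): ["Contested"],
}


def impute(states):
    """Apply the imputation rules to a list of encoded states."""
    # Pass 1: a freeze that directly follows skater possession, rather than a
    # shot, is treated as a contested puck.
    cleaned = []
    for i, state in enumerate(states):
        if state == "Freeze" and i > 0 and states[i - 1] in ("Attackers", "Defenders"):
            state = "Contested"
        cleaned.append(state)
    # Pass 2: insert the intermediate states that the model requires.
    out = []
    for prev, cur in zip([None] + cleaned, cleaned):
        out.extend(INSERTS.get((prev, cur), []))
        out.append(cur)
    return out


def transition_counts(states):
    """Tally consecutive (from, to) pairs, ready to be normalized row by row."""
    return Counter(zip(states, states[1:]))


sequence = impute(["Attackers", "Shot", "Defenders", "Safe"])
print(sequence)
print(transition_counts(sequence))
```

Dividing each count by its row total gives the entries of a transition matrix like the ones shown earlier.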