# Gaussian Smoothing

Many of my charts use a smoothing method of which I am very fond. It is extremely similar to
kernel-density estimation but with some slight differences for extra control. First, for notation,
write b(x,s) for the bell curve of unit area centred at x with scale s. A scale of 0 is an infinitely
tall spike at x (a Dirac delta distribution) and a very large scale is a broad curve which stretches
considerably to both sides of x.

Let (X,Y) be the set of points (x_i, y_i) that I want to graph a smoothed version of. In most
examples, the x values will be some sort of time variable (seconds in a game, or game number in a
season) and the y values will be some sort of hockey stat: minutes of ice time or goals or shot
rates, say. First, form a function f(x) which is the sum of the curves y_i * b(x_i, s) over every
pair (x_i, y_i) in (X,Y), so that each point contributes a bell curve of area y_i. I choose the
scale parameter s differently depending on the application. Large values of s smear each point
further backwards and forwards in time, which makes a smoother function but a worse approximation
to the raw data.
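
The construction of f can be sketched in a few lines of Python. This is only an illustration, not the code behind the charts: it assumes that the bell curve is a Gaussian and that s is its standard deviation, the names `bell` and `smooth` are made up for the example, the evaluation point is called `t` to keep it apart from the data points, and the goal data is hypothetical.

```python
import math

def bell(t, x, s):
    """Unit-area Gaussian centred at x, evaluated at t.

    Assumption: the scale s (> 0) is treated as the standard deviation.
    """
    return math.exp(-0.5 * ((t - x) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def smooth(t, points, s):
    """f(t): the sum of y * b(x, s) over every (x, y) pair, evaluated at t."""
    return sum(y * bell(t, x, s) for x, y in points)

# Hypothetical data: goals scored in games 3, 4 and 10 of a season.
goals = [(3, 1), (4, 2), (10, 1)]

# The same data smoothed with a small and a large scale, sampled at each game.
narrow = [smooth(t, goals, 0.5) for t in range(1, 13)]
wide = [smooth(t, goals, 3.0) for t in range(1, 13)]
```

With the small scale the curve has sharp peaks near games 3, 4 and 10; with the large scale the peaks are much lower and the curve is much flatter. In both cases the total area under f is the sum of the y values.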

## Boundaries

One of the reasons that I do not use moving averages is that they handle initial cases very
awkwardly. Strictly speaking there is no data for a five-game moving average until five games are
played, and plotting the raw data anyway wrongly suggests that the variation is high at first and
then settles down; this is very misleading. However, the smoothed function f(x) above has non-zero
values before the first possible value of x and after the last one, which is silly. Hence, if we
have fixed values for the first and last possible x values (the 0th second and 3600th second of
a regulation hockey game, for instance, or the first game and the 82nd game of a full season), then
we can use these boundaries to simply "fold back" the data in question. Namely, given a starting value
a and a final value b, we can define g(x) = f(x) + f(2a-x) + f(2b-x) for x between a and b, where
the two extra terms are the reflections of f in a and in b. This is the smoothed function of
the data that I use. (Strictly speaking, this should be an infinite sum of repeated reflections,
but, unless a and b are very close together relative to s, only these terms will be needed.)
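
A minimal sketch of the folding step, under the same assumptions as before (Gaussian bell curves with s as the standard deviation, made-up function names, hypothetical data). The reflection of a point t in a boundary c is written as 2c - t, and the example uses a = 0 and b = 3600 as in a regulation game. The `integrate` helper is there only to check that folding returns the weight that leaked past the boundaries.

```python
import math

def bell(t, x, s):
    """Unit-area Gaussian centred at x with standard deviation s, evaluated at t."""
    return math.exp(-0.5 * ((t - x) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def smooth(t, points, s):
    """f(t): the unfolded sum of y * b(x, s) over every (x, y) pair."""
    return sum(y * bell(t, x, s) for x, y in points)

def folded(t, points, s, a, b):
    """g(t) for a <= t <= b: f plus its reflections in the boundaries a and b."""
    return (smooth(t, points, s)
            + smooth(2 * a - t, points, s)
            + smooth(2 * b - t, points, s))

def integrate(fn, lo, hi, steps=20000):
    """Midpoint-rule integral, used only to check totals."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(steps))

# Hypothetical data: one event each at seconds 30, 1800 and 3590.
events = [(30, 1), (1800, 1), (3590, 1)]
a, b, s = 0, 3600, 60

inside_plain = integrate(lambda t: smooth(t, events, s), a, b)
inside_folded = integrate(lambda t: folded(t, events, s, a, b), a, b)
```

Here `inside_plain` falls well short of 3 because the events near the start and end of the game spill outside the interval, while `inside_folded` comes out at 3 to within numerical error: the folded curve keeps all of the weight between a and b.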

Physically, you can think of this as making a box with very tall hard sides at a and b, and then
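each point (x,y) corresponds to a clump of wet sand of weight y being dropped at point x. If the
sand is fine and flows easily, it spreads far from where it lands, which corresponds to a very
high s; if the sand is clumpy and flows slowly, it stays piled up near x, which corresponds to a
very low s. Sand that reaches a wall piles back into the box, which is the folding described above.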