Fenwick, the hard way

    The following article presents what is likely the least efficient method available for calculating Fenwick for NHL teams and players using information found online. Most advanced stats sites, such as Corsica and Natural Stat Trick, scrape NHL play-by-play files and tally shooting events to provide a summary of possession stats at the end of the game. Some sites, namely NHL.com, calculate possession by recording random d20 rolls during the games and tallying the results on an abacus.

    HockeyViz is one of the most fantastic NHL websites available, and perhaps the most unique in that the majority of the data on the site is presented via figures rather than numbers. That leads to fun graphics like this one, which shows the relative shot rates for NHL teams in their offensive and defensive zones.

    Heat map of the Carolina Hurricanes’ offence so far in the 2017-18 season.

    The data is presented creatively, using varying shades of red and blue to show the relative intensity of shots that the team is generating in different areas of the zone in question. Because Micah Blake McCurdy, who runs HockeyViz, is a consummate academic and provides a descriptive legend with the figures, we actually know roughly how many more or fewer shots per hour teams are taking in each area.

    We can use all this to our advantage to reverse engineer Fenwick using these images. That’s right – we’re going to calculate some advanced stats in the most heavy-handed, roundabout way I can possibly imagine.

    All images on HockeyViz are in the PNG format – short for ‘Portable Network Graphics’. PNG is obviously a well-known, widely used image format on websites, and has been around since 1996. There are a few different varieties of PNG files with varying pixel formats, but we’re not going to get into that too much here. What we need to know is that the PNG files used here use the truecolour format; each pixel contains three pieces of information: the intensity of the red, green and blue (RGB) colours. The intensity of each colour ranges from 0 (none) to 255 (full intensity). We can think of each pixel as a one-dimensional array containing these values as [R G B].

    • A black pixel would be [0 0 0]
    • A white pixel would be [255 255 255]
    • A pure red pixel would be [255 0 0]
    • A pure green pixel would be [0 255 0]
    • A pure blue pixel would be [0 0 255]
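
    To make the [R G B] idea concrete, here is a minimal Octave sketch that builds a tiny truecolour image by hand and reads one pixel back. The variable names are my own, and the file name in the last comment is hypothetical; a real HockeyViz image would be loaded with imread instead.

```octave
% Build a 1x3 truecolour image: one red, one green and one blue pixel.
% Octave stores such an image as a rows x columns x 3 array of uint8 values.
img = zeros(1, 3, 3, 'uint8');
img(1, 1, :) = [255 0 0];   % pure red
img(1, 2, :) = [0 255 0];   % pure green
img(1, 3, :) = [0 0 255];   % pure blue

% Read back the [R G B] triple of the third pixel as a row vector.
px = squeeze(img(1, 3, :))';
disp(px)   % displays the values 0, 0 and 255

% A real truecolour PNG would be read the same way, for example:
% img = imread('carO.png');   % hypothetical file name
```

    The third dimension is the colour channel, so img(:, :, 1) is the red layer, img(:, :, 2) the green layer and img(:, :, 3) the blue layer.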

    Basically, the colour of any pixel in the image is defined by some ratio of red, green and blue. Luckily, there are some excellent tools out there for quantifying these things. ImageJ is an excellent tool created by the National Institutes of Health which can be used in countless image processing applications. It’s a freely available and simple tool, but a very powerful one. You can download it here.

    Using ImageJ, we can open up an image from HockeyViz and measure the colour values in selected pixels. I am using an additional plugin called ‘Colour Histogram’ for this. As shown below, we can select a few pixels in one area and find out what the intensity of each colour channel (RGB) is.

    As we can see here, I’ve measured the value of a light blue section, specifically the one corresponding to between 0.1 and 0.2 shots per hour below league average. We can see in the bottom right corner the red, green and blue channel values, which in this case give us an RGB array of [198 198 255]. Using this, we can find a value for each colour in the scale, and build our own reference table:

    Value (excess shots per hour per 100 ft²)   |   R     |   G     |   B
    +0.5                                        |   255   |   0     |   0
    +0.4                                        |   255   |   28    |   28
    +0.3                                        |   255   |   84    |   84
    +0.2                                        |   255   |   142   |   142
    +0.1                                        |   255   |   198   |   198
    –0.1 < x < +0.1                             |   255   |   254   |   254
    –0.1                                        |   198   |   198   |   255
    –0.2                                        |   142   |   142   |   255
    –0.3                                        |   84    |   84    |   255
    –0.4                                        |   28    |   28    |   255
    –0.5                                        |   0     |   0     |   255

    A few interesting things here:

    • Every red zone has a value of 255 for red, with varying green and blue values.
    • Every blue zone has a value of 255 for blue, with varying green and red values.
    • White rink space is cleverly defined as [255 254 254], so we can distinguish it from background white space, which is ever-so-differently defined as [255 255 255].
    • The red and blue scales are defined in a mirrored way.

    Unfortunately, because the red and blue scales are mirrored, the green values are not unique and thus not useful to us for identifying pixels. However, we can see that each red shade can be identified by its unique blue value (0, 28, 84, 142, or 198), and likewise we can identify each blue shade by its unique red value.
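
    The identification rule can be sketched in a few lines of Octave. This is only an illustration of the lookup, using names of my own choosing; the article itself counts shades from a histogram rather than classifying pixels one at a time.

```octave
% Shade lookup taken from the reference table.
shadeVals = [0 28 84 142 198];       % value of the identifying channel
rates     = [0.5 0.4 0.3 0.2 0.1];   % matching rate coefficient

px = [255 84 84];   % example pixel: the +0.3 red shade

rate = 0;   % default: white rink space, background or rink markings
if px(1) == 255 && any(shadeVals == px(3))
  % Red family: red is at 255 and the blue value picks the shade.
  rate = rates(shadeVals == px(3));
elseif px(3) == 255 && any(shadeVals == px(1))
  % Blue family: blue is at 255 and the red value picks the shade.
  rate = -rates(shadeVals == px(1));
end

printf('%+.1f\n', rate);   % prints +0.3
```

    Swapping in px = [142 142 255] gives -0.2, and the rink white [255 254 254] falls through to 0 because 254 is not one of the shade values.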

    Using ImageJ’s ‘Measure’ function and knowing the scale of our NHL rink (which is conveniently also labeled on the image), we can calculate that every pixel in this image represents a 0.2 ft by 0.2 ft area, or 0.04 ft². We also know that the values in the scale are expressed as “shots per 100 square feet per hour”, and 100 ft² works out to 100 / 0.04 = 2500 pixels. Thus, we can go from pixels to shots per hour like so:

    excess shots per hour = x × (number of pixels) / 2500

    Where x is the value of excess shots per hour per 100 ft² for each particular shade, which I will call the rate coefficient.
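
    As a worked example of that conversion, here is a short Octave sketch. The pixel count is a made-up number for illustration; only the 0.2 ft pixel size and the per-100-ft² scale come from the image.

```octave
ftPerPixel   = 0.2;                  % edge length of one pixel in feet
areaPerPixel = ftPerPixel ^ 2;       % 0.04 ft^2 per pixel
pixelsPer100 = 100 / areaPerPixel;   % 2500 pixels cover 100 ft^2

nPixels = 5000;   % HYPOTHETICAL pixel count for one shade
x       = 0.3;    % rate coefficient of that shade

% Excess shots per hour contributed by all pixels of this shade.
excessShots = x * nPixels / pixelsPer100;
printf('%.2f excess shots per hour\n', excessShots);   % prints 0.60
```

    In other words, 5000 pixels of the +0.3 shade cover 200 ft², which is worth 2 × 0.3 = 0.6 shots per hour above league average.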

    Next, I can take a colour histogram of the entire image (in this case, we’re looking at the Carolina Hurricanes’ offensive zone results).

    We can export this histogram to a text file containing four columns:

    Intensity   |   Red   |   Green   |   Blue

    In which the intensity values range from 0 to 255, and the colour columns represent the counts of each colour at the specified intensity level.
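
    For readers who want to see what such a histogram contains, here is a minimal Octave sketch that builds the same kind of table from a tiny synthetic image. This is not how ImageJ does it internally; it is just an equivalent count, and the 2×2 image is made up for illustration.

```octave
% Tiny synthetic image (HYPOTHETICAL data, not a HockeyViz plot).
img = zeros(2, 2, 3, 'uint8');
img(1, 1, :) = [255 84 84];     % +0.3 red shade
img(1, 2, :) = [255 84 84];     % +0.3 red shade
img(2, 1, :) = [198 198 255];   % -0.1 blue shade
img(2, 2, :) = [255 254 254];   % white rink space

% Count how many pixels have each intensity (0..255) in each channel.
counts = zeros(256, 3);
for c = 1:3
  chan = double(img(:, :, c));
  counts(:, c) = accumarray(chan(:) + 1, 1, [256 1]);
end

% Columns: intensity, red count, green count, blue count.
histTable = [(0:255)' counts];

% Here row = intensity + 1 because there is no header row.
printf('blue = 84:  %d pixels\n', histTable(84 + 1, 4));    % prints 2
printf('red  = 198: %d pixels\n', histTable(198 + 1, 2));   % prints 1
```

    Each channel is counted independently, which is why a red shade has to be picked out by its blue value: its red value of 255 is shared with every other red shade and with the white background.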

    Knowing which colour corresponds to which shot rate, I wrote a short script that grabs the counts of red and blue at the intensity values that we’ve identified. I’m using Octave to process my data, which is a free open-source MATLAB clone – this is because I am both too cheap to spring for real MATLAB, and too unskilled to write this in R or Python. Please note that I am a terrible code writer and this will probably offend real programmers to their very core:

    % Read the colour histograms exported as text files
    A = dlmread('carO.txt');   % offensive zone image
    B = dlmread('carD.txt');   % defensive zone image (not used below; it is processed the same way)

    % Columns: 1 = intensity, 2 = red count, 3 = green count, 4 = blue count.
    % In these files row = intensity + 2, so rows 2, 30, 86, 144 and 200
    % hold the counts for intensities 0, 28, 84, 142 and 198.
    rows  = [2; 30; 86; 144; 200];
    coeff = [0.5; 0.4; 0.3; 0.2; 0.1];   % rate coefficient of each shade

    % Grab offensive shots pixel counts
    forPx = A(rows, 4);   % red shades (above average), identified by their blue value
    agnPx = A(rows, 2);   % blue shades (below average), identified by their red value

    % Convert to shots using the rate coefficients and the 2500 pixel conversion
    OshotsFOR = sum(coeff .* forPx) / 2500;
    OshotsAGN = sum(coeff .* agnPx) / 2500;

    % Net shots per hour above (+) or below (-) league average
    Ototal = OshotsFOR - OshotsAGN;

    % League average Fenwick For/60 = 40.74
    FenwickFor60 = 40.74 + Ototal;

    The code above counts all of our pixels, multiplies them by the rate coefficients (0.5, 0.4, 0.3, etc.), divides them by our conversion factor of 2500 pixels per 100 ft², and sums them into the value ‘Ototal’, which is the net number of shots per hour the team takes above or below league average. Because we know the league average Fenwick For per 60 minutes is 40.74 (from Corsica), I added this to ‘Ototal’ to get the team’s Fenwick For/60. If we do the same thing using a defensive plot, we can also generate the Fenwick Against/60, and using these two values we can find the team Fenwick%.

    Note that I’m operating under the assumption that HockeyViz only uses Fenwick here (i.e., excludes blocked shots), because blocked shots would skew the shot locations (the location logged for blocked shots is the location of the block, not the location of the shot).

    Using Carolina as an example, we can see below how this method compared to the 5v5 values available on Corsica:

     

    Method        |   Fenwick For per 60   |   Fenwick Against per 60   |   Fenwick %
    Corsica       |   47.88                |   40.56                    |   54.14
    Pixel count   |   47.01                |   40.55                    |   53.69
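
    The Fenwick % column follows directly from the two rates. A minimal Octave check, using the pixel-count row of the table (the variable names are mine):

```octave
FF60 = 47.01;   % Fenwick For per 60 from the pixel count
FA60 = 40.55;   % Fenwick Against per 60 from the pixel count

% Share of all unblocked shot attempts that belong to the team.
FenwickPct = 100 * FF60 / (FF60 + FA60);
printf('%.2f\n', FenwickPct);   % prints 53.69
```

    Plugging in the Corsica values of 47.88 and 40.56 gives 54.14 in the same way.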


    Not bad! We’re actually really close here, far closer than I expected. There are a few things to consider that can introduce error/discrepancies:

    • This method doesn’t count pixels that overlap coloured parts of the ice (circles, blue line, red line, crease) because these have slightly different colour values than the ones we defined.
    • This method does not count shots taken from beyond centre ice.
    • Some pixels are blocked by the black text indicating the time on ice.
    • We defined the rates as 0.5, 0.4, and so on, but in reality the areas assigned to 0.5 could be anywhere from 0.5 to infinity. Likewise, the areas assigned to 0.4 can be any value between 0.4 and 0.5. At best, we will always be limited by the resolution of the scale. There is no doubt a way to determine an upper bound and lower bound for the Fenwick values we are calculating, but alas I have neither the knowledge nor the patience to work it out.
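
    For what it is worth, one rough way to bracket the estimate is to push every shade to the edges of its range. The Octave sketch below does that under loud assumptions: the pixel counts are hypothetical, each shade is assumed to span from its label to the next label, and the open-ended top shade is capped at an arbitrary 0.6. It is a back-of-the-envelope bracket, not a proper error analysis.

```octave
% HYPOTHETICAL pixel counts per shade, ordered 0.1, 0.2, 0.3, 0.4, 0.5.
nRed   = [4000 2500 1200 500 100];   % areas above league average
nBlue  = [3500 2000  800 300  50];   % areas below league average
nWhite = 20000;                      % rink pixels between -0.1 and +0.1

lowEdge  = [0.1 0.2 0.3 0.4 0.5];
cap      = 0.6;                      % ASSUMPTION: the top shade stops at 0.6
highEdge = [0.2 0.3 0.4 0.5 cap];

% Lowest net total: red areas at their lower edge, blue areas at their
% most negative edge, white areas at -0.1.
netLow  = (sum(lowEdge .* nRed) - sum(highEdge .* nBlue) - 0.1 * nWhite) / 2500;

% Highest net total: the opposite choice in every range.
netHigh = (sum(highEdge .* nRed) - sum(lowEdge .* nBlue) + 0.1 * nWhite) / 2500;

% The estimate used in the article: the label value on both sides, white as 0.
netEst  = (sum(lowEdge .* nRed) - sum(lowEdge .* nBlue)) / 2500;

printf('net excess between %.2f and %.2f, estimate %.2f\n', netLow, netHigh, netEst);
```

    With these made-up counts the white rink space dominates the width of the bracket, simply because there is so much of it and each white pixel can sit anywhere between -0.1 and +0.1.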

    Game-by-game history

    Let’s see if we can extend this to another image that HockeyViz generates. I’ve always really loved the game-by-game history charts that HockeyViz generates for both teams and players. For a long time, I’ve wanted to try calculating Corsi by subtracting the area under the red line from the area under the black line (integrals!), but I do not have the data for the smoothed curves that are plotted here.

    Instead, we can use the same principle to count the pixels in the grey areas (games where the team outshot the opponent) and subtract the pixels in the red areas (games where the team was outshot by the opponent). Using ImageJ again, we can find that the grey pixels have a value of [204 204 204] while the red pixels have a value of [255 204 204].

    We run into a little problem here: we can count the grey pixels because they are the only pixels with a red value of 204, but we cannot count the red pixels because their red, green and blue values are all non-unique. There is a workaround for this – we can determine the average grey value of each pixel using ImageJ (equal to (R + G + B) / 3), and we find that each colour has a unique value. The grey areas have a grey value of 204, while the red areas have a grey value of 221.
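
    The grey-value trick is easy to check by hand. Here is a minimal Octave sketch that computes the two averages and then counts pixels by grey value in a tiny synthetic image; the four-pixel image is made up for illustration, and the variable names are mine.

```octave
% Average grey value of the two shading colours, as (R + G + B) / 3.
greyShade = [204 204 204];
redShade  = [255 204 204];
printf('grey area: %d, red area: %d\n', mean(greyShade), mean(redShade));
% prints grey area: 204, red area: 221

% Tiny synthetic image (HYPOTHETICAL data): two grey pixels, one red, one white.
img = zeros(1, 4, 3, 'uint8');
img(1, 1, :) = greyShade;
img(1, 2, :) = greyShade;
img(1, 3, :) = redShade;
img(1, 4, :) = [255 255 255];

% Convert every pixel to its average grey value, then count by value.
g = mean(double(img), 3);
nGrey = sum(g(:) == 204);
nRed  = sum(g(:) == 221);
printf('%d grey pixels, %d red pixels\n', nGrey, nRed);   % prints 2 and 1
```

    Averages that do not come out as whole numbers would need rounding before they can be matched, and how ImageJ rounds them is not something this sketch tries to reproduce.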

    On closer inspection of the image, we can also see some slightly lighter shaded lines through the grey and red regions – these have values of 206 and 222 respectively. We can add these to our calculations.

    If we run a histogram of the grey values across this entire region, we can count the pixels of each type:

    Grey (204 + 206) = 7074 pixels

    Red (221 + 222) = 202 pixels

    We can again find the scale, knowing that the x-axis represents games, and the y-axis represents 5v5 shots/60 minutes. From the x-axis, we know 1 game is approximately 7.6 pixels wide, and from the y-axis, 1 shot/60 is approximately 6.2 pixels tall. Thus we can approximate:

    (7.6 pixels / game) × (6.2 pixels / (shot / game)) = 47.12 pixels² per shot

    Note: here we are going to fudge this a little and assume 5v5 shots per 60 is equal to shots per game, which it is obviously not unless a team plays the whole season at evens. We’re also assuming that HockeyViz is counting goals, missed shots, and blocked shots as shots as well.

    Anyways, what we now know is that for every 47.12 square pixels, the team took or gave up one shot in excess. We also know that every pixel is a square region that is precisely 1 pixel in width by 1 pixel in length, or 1 pixel². Based on this, we can convert the values we found earlier:

    Grey (204 + 206) = 7074 pixels ≈ 150 excess shots for

    Red (221 + 222) = 202 pixels ≈ 4 excess shots against
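
    The arithmetic behind those two conversions, as a short Octave sketch using only the numbers quoted in this section (the variable names are mine):

```octave
pxPerGame = 7.6;   % width of one game on the x-axis, in pixels
pxPerShot = 6.2;   % height of one shot/60 on the y-axis, in pixels
pxPerExcessShot = pxPerGame * pxPerShot;   % 47.12 square pixels per shot

greyPixels = 7074;   % grey values 204 + 206
redPixels  = 202;    % grey values 221 + 222

shotsFor     = greyPixels / pxPerExcessShot;
shotsAgainst = redPixels  / pxPerExcessShot;
netShots     = round(shotsFor) - round(shotsAgainst);

printf('%.1f for, %.1f against, net %d\n', shotsFor, shotsAgainst, netShots);
% prints 150.1 for, 4.3 against, net 146
```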

    And thus we can see that from this image, Carolina should have roughly 146 excess shots for on the season. From Corsica, we can see that Carolina has a real Corsi differential of 201 on the season. Again, this is not exact but it is in the same ballpark, which is far better than I could have imagined. There are a few factors that can contribute error here:

    • Discrepancy in area due to smoothing function used to create continuous curves of shots for and against. Smoothing discrete data into a nice looking curve can both add and take away area from the original values – it looks nicer, but we also get further away from the true value in terms of shot counts.
    • We equated ‘shots per 60’ to ‘shots per game’, which is not really true and will introduce error.

    In conclusion, we can see that with a little thinking and some rudimentary computer skills, we can take graphics from HockeyViz and process them to generate some cold, hard numbers. With a bit more refinement, this could revolutionize modern advanced stats in hockey. Just kidding.

    Please consider contributing to the Patreon crowdfunding efforts for the sites I mentioned in this article (HockeyViz, Corsica, and Natural Stat Trick). They all contribute incredibly valuable resources for the hockey community that are accessible to everyone.