Creating Player Ratings Using Swarm Intelligence Algorithms

There is plenty of room for growth in the field of hockey analytics. In particular, machine learning algorithms and deep learning methods, which have become popular in a wide variety of fields, are mysteriously scarce. Machine learning has been used to solve classification and prediction problems in science and finance to great effect. There is a great deal of potential for innovation in applying these methods to the data available to us in sports like hockey, where much has yet to be learned. Continue reading “Creating Player Ratings Using Swarm Intelligence Algorithms”

Hockey and Euclid: Predicting AAV With K-Nearest Neighbours

EP: The contract data used in this analysis was graciously provided by Tom Poraszka, the creator of the now-defunct General Fanager. While the hockey community suffers the loss of yet another tremendous resource, I wish Tom the best of luck with his new venture!

Not a year goes by without at least one NHL contract signing bewildering the hockey world. With healthy scratches making $5MM or more per year, it may seem as though the signing process is just one big roulette spun by managers, players and agents. In reality, though, the NHL player market is remarkably consistent as a whole. We can prove and exploit this fact by leveraging available information to try to predict how much an impending Free Agent will be paid. Continue reading “Hockey and Euclid: Predicting AAV With K-Nearest Neighbours”

Bootstrapping QoT/QoC and the Sedin Paradox

EP: Throughout this post, I’ll use “qualcomp” to describe both QoC and QoT because “QoC/QoT” is tiresome. 

Though we may not like to admit it, the hockey analytics collective has yet to crack the qualcomp code. The public sphere has yet to produce an agreed-upon method of weighing the impacts of QoT and QoC, and the latter is sometimes dismissed outright. Traditionally, TOI-weighted averages are employed to determine the mean talent of teammates and opponents. The talent component may differ – 5v5 TOI% and Corsi being among the most common. On Corsica, three brands of qualcomp are offered: TOI%, CF% and xGF%. A wrinkle is that each teammate’s CF% or xGF% is calculated from the time they spent playing without the player in question. This ensures that the measured quality of a teammate is independent of the impact a player has on them. Despite this advantage, the methodology is imperfect. Namely, it introduces what I’ve come to label the Sedin paradox. Continue reading “Bootstrapping QoT/QoC and the Sedin Paradox”

The CoRsica Package for Hockey Analysis in R (0.2: Fundamentals)

EP: This is the third part in what I hope will become a lengthy and informative tutorial series on a pseudo-package I am building for R called coRsica. In this instalment, I’ll discuss some fundamentals of the R language and apply them to our Hello World script.

Review and More
In section 0.1 you were introduced to object classes, syntax rules, functions and some basic mathematical operators. There is still much more ground to cover when it comes to these fundamental concepts, so let’s do it right this time. Continue reading “The CoRsica Package for Hockey Analysis in R (0.2: Fundamentals)”

The CoRsica Package for Hockey Analysis in R (0.1: Hello World)

EP: This is the second part in what I hope will become a lengthy and informative tutorial series on a pseudo-package I am building for R called coRsica. In this instalment, I’ll discuss the RStudio console and some R basics, and show you how to write your first script.

Inside RStudio
In section 0.0 you installed R and RStudio onto your computer. Now, I’ll quickly show you around the RStudio interface so you can make sense of it! Continue reading “The CoRsica Package for Hockey Analysis in R (0.1: Hello World)”

The CoRsica Package for Hockey Analysis in R (0.0: An Introduction)

EP: This is the first part in what I hope will become a lengthy and informative tutorial series on a pseudo-package I am building for R called coRsica. In this instalment, I’ll discuss my intentions and teach you how to install R and RStudio on your machine.

I think hockey analytics is an endlessly interesting field. It pleases me to see and hear from so many others who’ve discovered the same sense of enjoyment from crunching hockey data that I have. My purpose in sharing this R package and tutorial series is to enhance people’s ability to conduct the research and analysis they want to, while learning a little about R in the process. Continue reading “The CoRsica Package for Hockey Analysis in R (0.0: An Introduction)”

Shot Quality and Expected Goals: Part 1.5

EP: This is the 1.5th instalment of the Shot Quality and Expected Goals series. Read the first part here.

I finished the first part of this series with a promise of certain things to follow in the next. Those things were delayed and eventually superseded by a pressing request I’ve heard echoed since the launch of the site. When WAR On Ice closed its doors, implementing scoring chance data became a top priority. Continue reading “Shot Quality and Expected Goals: Part 1.5”

7 Features You Didn’t Know Existed

I get a lot of questions about the site from users looking for a specific function or feature. Realizing it may not always be evident if and where certain elements of the site exist, I thought I’d list some of the most commonly-missed or unintentionally hidden features.

1. Custom Query
The most common response from me to questions I receive on Twitter is “Custom Query.” Each of the Team, Goalie and Skater sections contains a tab linking to this tool at the top of the page. The Custom Query provides more flexibility to users, and importantly, the ability to search stats within a given range of dates. Users may also aggregate games or keep them separate for a game-by-game view. If you need functionality absent from the standard Team, Goalie or Skater tables, this should be the first place you look. Continue reading “7 Features You Didn’t Know Existed”

2016-17 NHL Schedule

The NHL schedule for the 2016-17 season was announced today. To the best of my knowledge, there isn’t a complete consolidated schedule on the official NHL site. In any case, I gave up looking for one and compiled it myself from the JSON graciously provided to me by Greg (friend of the site!). I’m hoping to have a schedule feature/section on the site in time for next season, but in the meantime you can download the full regular season schedule here:

Dropbox (CSV)
Dropbox (RData)

As always, let me know if you spot any errors.


Adjustments Explained

Corsica offers two brands of adjusted stats. The first accounts for score state and home ice advantage, while the second additionally factors in zone starts. The method used for the former is Micah McCurdy’s and the latter is my own adaptation thereof. In principle, McCurdy’s method looks to adjust the value of shots taken by either team involved in a game according to the score state and status of the shooting team. It’s been shown that his approach represents a significant improvement over its predecessors, namely Eric Tulsky’s Score-Adjusted Fenwick proposed in 2012. Fewer attempts have been made to develop zone start adjustment methods. David Johnson’s method of removing shots occurring within an arbitrary time span of face-offs is exceedingly crude and inefficient and hence is not discussed further.

In McCurdy’s method, the historical number of events for either team is counted for each possible score state from the perspective of the home team. For instance, between 2007 and 2014, home teams trailing by one goal recorded 51,921 unblocked 5v5 shots while their opponents (the away teams) recorded 43,075. The adjustment coefficients, or weights, are selected to offset this imbalance while producing a total quantity of weighted shots that is equal to the original unadjusted total. The coefficients are given by: coef(team) = [average # of events] / [# of events for team], where the average is taken over the two teams in that score state. I calculated these coefficients for each of shots, unblocked shots, shots on goal and goals using the complete (at the time) data set since 2007, staying true to Micah’s original formula. These weights are used for “Score and Venue Adjusted” stats.
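As a rough illustration of the formula above, here is a minimal Python sketch that computes the two weights for the trailing-by-one example. The function name `adjustment_coefficients` is made up for this example and is not part of Corsica or of McCurdy’s code; it simply applies coef(team) = average / team count to a single score state.

```python
def adjustment_coefficients(home_events: int, away_events: int) -> tuple[float, float]:
    """Return (home_weight, away_weight) for a single score state.

    Hypothetical helper: applies coef(team) = average / team_count,
    where the average is taken over the two teams.
    """
    average = (home_events + away_events) / 2
    return average / home_events, average / away_events


# Home team trailing by one goal, unblocked 5v5 shots (figures from the text).
home_events, away_events = 51_921, 43_075
home_w, away_w = adjustment_coefficients(home_events, away_events)

# The weighted total should match the unadjusted total of shots.
weighted_total = home_w * home_events + away_w * away_events
print(round(home_w, 4), round(away_w, 4), round(weighted_total))
```

With these counts the trailing home team’s shots are weighted below 1 and the leading away team’s shots above 1, while the weighted total stays at the original 94,996.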

Coefficients are similarly calculated for the “Score, Zone and Venue Adjusted” measures. Here, a more diverse array of situations is considered. A face-off can occur in any of three zones – offensive, defensive or neutral. For each event of interest, we consider the zone in which the last face-off occurred in addition to the score state. As a final property, the recency of the last face-off is taken into account. This serves to avoid generalizing entire shifts by where they began, and it implicitly solves the issue of on-the-fly deployment. That is, players on the ice for an offensive zone draw receive a much greater advantage than those coming on 45 seconds thereafter. The face-off start parameter is divided into two subcategories: the first 20 seconds and the remainder of the sequence. This cut-off is chosen to reflect the fact that almost all of the advantage related to a zone start is contained within the first 20 seconds of play. The seven score states are then multiplied by six possible face-off start subcategories (three zones by two time windows), giving 42 coefficients per team and a total of 84 for both teams:

[Image: table of Score, Zone and Venue adjustment coefficients]
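To make the bookkeeping concrete, here is a small Python sketch of how an event might be mapped to one of these situations. Several details are assumptions for illustration only: the seven score states are taken to be the home lead clamped to the range -3 to +3, the 20-second boundary is treated as exclusive, and the names `situation_key`, `ZONES` and `WINDOWS` are hypothetical rather than taken from Corsica.

```python
from itertools import product

# Assumption: seven score states = home lead clamped to [-3, +3].
SCORE_STATES = range(-3, 4)
ZONES = ("offensive", "neutral", "defensive")
WINDOWS = ("first_20s", "remainder")
TEAMS = ("home", "away")


def situation_key(home_lead: int, faceoff_zone: str, seconds_since_faceoff: float):
    """Map an event to (score state, last face-off zone, time window)."""
    if faceoff_zone not in ZONES:
        raise ValueError(f"unknown zone: {faceoff_zone}")
    score_state = max(-3, min(3, home_lead))
    # Assumption: exactly 20 seconds falls in the "remainder" window.
    window = "first_20s" if seconds_since_faceoff < 20 else "remainder"
    return score_state, faceoff_zone, window


# Enumerate every situation: 7 score states x 3 zones x 2 windows.
all_keys = list(product(SCORE_STATES, ZONES, WINDOWS))
print(len(all_keys), len(all_keys) * len(TEAMS))  # 42 84

# A shot 45 seconds after an offensive zone draw, home team up by 4.
print(situation_key(4, "offensive", 45))  # (3, 'offensive', 'remainder')
```

Each key would index one coefficient per team, which is where the total of 84 comes from.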

The distribution of differences obtained with the Score, Zone and Venue Adjusted measures is much wider than that obtained with simple score adjustment.


The largest single-season increase in CF% by Score, Zone and Venue adjustment over the last three years is Paul Gaustad’s 2015-16 campaign, worth 7.57 percentage points. This represents an increase from 38.38% to 45.95%. Gaustad was deployed in only 31 offensive zone face-offs to 418 in the defensive zone at 5v5.