The NHL schedule for the 2016-17 season was announced today. To the best of my knowledge, there isn’t a complete consolidated schedule on the official NHL site. In any case, I gave up looking for one and compiled it myself from the JSON graciously provided to me by Greg (friend of the site!). I’m hoping to have a schedule feature/section on the site in time for next season, but in the meantime you can download the full regular season schedule here:
As always, let me know if you spot any errors.
Corsica offers two brands of adjusted stats. The first accounts for score state and home ice advantage, while the second additionally factors in zone starts. The method used for the former is Micah McCurdy’s and the latter is my own adaptation thereof. In principle, McCurdy’s method looks to adjust the value of shots taken by either team involved in a game according to the score state and status of the shooting team. It’s been shown that his approach represents a significant improvement over its predecessors, namely Eric Tulsky’s Score-Adjusted Fenwick proposed in 2012. Fewer attempts have been made to develop zone start adjustment methods. David Johnson’s method of removing shots occurring within an arbitrary time span of face-offs is exceedingly crude and inefficient and hence is not discussed further.
In McCurdy’s method, the historical number of events for either team is counted for each possible score state from the perspective of the home team. For instance, when trailing by one goal, home teams have recorded 51,921 unblocked 5v5 shots while their opponents (the away team) have recorded 43,075 between 2007 and 2014. The adjustment coefficients, or weights, are selected in order to satisfy this ratio while producing a total quantity of weighted shots that is equal to the original unadjusted total. The coefficients are given by: coef(team) = [average # of events]/[# of Events for team]. I calculated these coefficients for each of shots, unblocked shots, shots on goal and goals using the complete (at the time) data set since 2007, staying true to Micah’s original formula. These weights are used for “Score and Venue Adjusted” stats.
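To make the formula concrete, here is a minimal Python sketch applied to the example above (home team trailing by one goal, unblocked 5v5 shots). It assumes that "average # of events" means the mean of the two teams' counts in that score state, which is the reading that keeps the weighted total equal to the unadjusted total. The function and variable names are my own for illustration and are not taken from the Corsica code.

```python
def adjustment_coefficients(home_events: int, away_events: int) -> tuple[float, float]:
    """Return (home, away) weights for a single score state.

    Assumption: the "average # of events" is the mean of the two
    teams' historical counts in this score state.
    """
    average = (home_events + away_events) / 2
    return average / home_events, average / away_events


# Counts quoted above: home team trailing by one, unblocked 5v5 shots.
home, away = 51_921, 43_075
coef_home, coef_away = adjustment_coefficients(home, away)

# Each team's weighted count equals the average, so the weighted
# total matches the original unadjusted total.
weighted_total = home * coef_home + away * coef_away

print(f"home weight: {coef_home:.4f}, away weight: {coef_away:.4f}")
print(f"weighted total: {weighted_total:.0f}, raw total: {home + away}")
```

Under that assumption the trailing home team's shots are weighted at roughly 0.915 and the leading away team's at roughly 1.103, so shots taken by the team that tends to shoot more in that state count for a little less.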
Coefficients are similarly calculated for the “Score, Zone and Venue Adjusted” measures. Here, a more diverse array of situations is considered. A face-off can occur in any of three zones – offensive, defensive or neutral. For each event of interest, we consider the zone in which the last face-off occurred in addition to the score state. As a final property, the recency of the last face-off is taken into account. This avoids generalizing entire shifts by where they began, and implicitly solves the issue of on-the-fly deployment. That is, players on the ice for an offensive zone draw receive a much greater advantage than those coming on 45 seconds thereafter. The face-off start parameter is divided into two subcategories: the first 20 seconds and the remainder of the sequence. This cut-off is chosen to reflect the fact that almost all of the advantage related to a zone start is contained within the first 20 seconds of play. The seven score states are then multiplied by the six possible face-off start subcategories (three zones by two time windows), giving 42 situations and a total of 84 coefficients across both teams:
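The count of 84 can be checked by enumerating the situations. This is only a sketch: the text does not list the seven score states, so the bins below (home team down three or more through up three or more) are an assumption, as are the exact handling of the 20-second boundary and all of the names used.

```python
from itertools import product

# Assumed bins: -3 means "down three or more", 3 means "up three or more".
SCORE_STATES = [-3, -2, -1, 0, 1, 2, 3]
ZONES = ["offensive", "defensive", "neutral"]
RECENCY = ["first 20 seconds", "remainder"]
TEAMS = ["home", "away"]


def faceoff_category(zone: str, seconds_since_faceoff: float) -> tuple[str, str]:
    """Classify an event by the zone and recency of the last face-off.

    Assumption: an event exactly 20 seconds after the draw still
    falls in the first window.
    """
    bucket = RECENCY[0] if seconds_since_faceoff <= 20 else RECENCY[1]
    return zone, bucket


# 7 score states x (3 zones x 2 time windows) = 42 situations.
situations = list(product(SCORE_STATES, ZONES, RECENCY))

# One coefficient per team per situation = 84 coefficients.
coefficient_keys = [(team, *situation) for team in TEAMS for situation in situations]

print(len(situations), len(coefficient_keys))
print(faceoff_category("offensive", 45))
```

A shot taken 45 seconds after an offensive zone draw lands in the "remainder" window, which is how a player who came on well after the face-off avoids being credited with the full zone start advantage.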
The distribution of differences obtained with the Score, Zone and Venue Adjusted measures is much wider than simple score adjustment:
The largest single-season increase in CF% by Score, Zone and Venue adjustment over the last three years is Paul Gaustad’s 2015-16 campaign, worth 7.57 percentage points. This represents an increase from 38.38% to 45.95%. Gaustad was deployed in only 31 offensive zone face-offs to 418 in the defensive zone at 5v5.
EP: This post will differ from my usual in terms of content and style. I typically jot down these sorts of thoughts on Twitter, but figured this might be a better venue.
Cooking is a passion of mine. For a long while, it was also an occupation. I started the way most do in that world: washing dishes for minimum wage at the ripe age of 15. My CV was bare for a complete lack of experience, save for vague promises of being a quick learner and such. As it happens, I would live up to that claim. I had climbed and, at times, stumbled through the ranks to earn a job at one of Canada’s foremost gastronomic destinations before I had finally had enough. Despite my divorce from cooking as a profession, it remains part of my life and story. It’s common for people to discern connections between various elements important to their lives, and I’m no different. Lately, I’ve become increasingly aware of such parallels between cooking and hockey. Continue reading “Corsi and Cooking”
Right around the time I began constructing my scraper, an idea struck me while watching a game on the league’s streaming service (formerly known as) GameCenter Live. The GameCenter video player includes markers indicating the time at which certain events of interest occurred and allows users to jump to these highlights. It had not occurred to me until then that this information must reside somewhere on the NHL.com site where it could potentially be accessed. This data would eventually be collected by my scraper and stored in the Corsica database.
Highlights are comprised of goals and shots on goal. All goals are highlights, but only some shots are labelled as such. Unfortunately, I don’t know what criteria are considered in labelling these events. I imagine the process is very subjective and likely influenced in significant fashion by bias. The only thing we can ascertain is that somebody decided these events were worth categorizing as “highlights.” I haven’t yet delved into this information in great detail but I have pondered its potential uses. I’m particularly curious to know whether these data can be employed to introduce a human element to generalized scoring chance definitions. More specifically, can they indicate quality shots that are taken outside of the traditional high-danger areas? That’s a question for another day. In the meantime, we can answer a simpler and substantially more fun question. Who is the NHL’s Human Highlight Reel? Continue reading “Who Is the NHL’s Human Highlight Reel?”
Shot quality is a polarizing issue within the hockey stats community. Its relevance and value have been examined in various ways by many people and debated endlessly. To avoid a history lesson, I’ll keep the introduction to this topic brief, but I recommend that anybody interested in learning more do some additional research. Above all, it’s important to understand that nobody (worth listening to) has argued or will argue that shot quality does not exist. That some shots are better than others is a core tenet of hockey and indeed any such sport. Questions like “What makes a shot better?” or “Can players have a sustainable influence on shot quality?” are much more interesting. As is often the case with such things, answering these questions can prove tricky.
It is by virtue of work done by Eric Tulsky and others that we’ve come to question the importance of shot quality in our analysis, and it is by virtue of our intuition that we continue to pursue a better shot quality formula despite this. Continue reading “Shot Quality and Expected Goals: Part I”
EP: This is a less technical adaptation of the original Hockey and Euclid guest post on WAR On Ice. The underlying math will be discussed briefly in the footnotes.
Spatial reasoning is an innate characteristic in humans. It allows us to intuit conclusions from imagery, which may be exploited by graphical representation of data. You’ve no doubt inadvertently benefited from this fact if you’ve ever gleaned information from a chart or visualization. Elements like proximity between points are more easily interpreted in this fashion than they are through the examination of numeric data. In many ways, this idea was the first building block of what would become a generalized method of computing statistical similarity between hockey players. Continue reading “Hockey and Euclid: Calculating Statistical Similarity Between Players”
A couple months ago I resolved to build my own website.
I had built a few apps for WAR On Ice and authored some articles they were kind enough to host on their blog. When it was announced Andrew Thomas and Alexandra Mandrycky, the site’s remaining co-creators, had been hired by the Minnesota Wild and would renounce their engagement with WOI, I was among those who volunteered to help take on responsibility. I had ideas and was keen to learn what I could about what was under the proverbial hood from Andrew and Alexandra before their new duties put an end to their involvement. It was evident to me I simply didn’t have the chops for taking charge of maintaining the site and any role I assumed would be auxiliary in nature.
I had ideas. I’ve always had ideas. Some, I believed, didn’t belong on WAR On Ice. Some were experimental. Some were plain crazy. I set to work on building things before the idea struck me that they’d need a home. Having a site of my own I could use to consolidate these things I’d built and share them with anybody with an interest wasn’t a terrible concept. To this point, I had relied on data sourced from WOI – something I knew would have to change if I was to manufacture tools and content for public consumption. Ethics aside, I sought self-sufficiency with respect to raw data and building my first ever scraper would be a fun challenge. Continue reading “Hello World: A Mission Statement”
The following naming conventions apply across the site: all off-ice stats are preceded by O, all individual stats are preceded by i, all expected stats are preceded by x and all averages are denoted by the Avg. prefix. For instance, OCF% stands for off-ice CF%, iFF stands for individual FF, xFSh% stands for expected FSh% and Avg.DIST stands for average DIST. Events For or Against refer to events for and against a team (team stats) or events for and against a player’s team occurring while the player is on the ice (on-ice stats). Shots include those missing the net or blocked. Shots on goal are shots on goal.
Shot Quality and Expected Goals: Part I Continue reading “Glossary”