Hockey and Euclid: Calculating Statistical Similarity Between Players

EP: This is a less technical adaptation of the original Hockey and Euclid guest post on WAR On Ice. The underlying math will be discussed briefly in the footnotes.

Spatial reasoning is an innate human ability. It lets us draw conclusions intuitively from imagery, and graphical representations of data can take advantage of it.[1] You've no doubt benefited from this, perhaps without realizing it, if you've ever gleaned information from a chart or visualization. Relationships like the proximity between points are far easier to interpret visually than by examining the underlying numbers. In many ways, this idea was the first building block of what would become a generalized method of computing statistical similarity between hockey players.

[1] For my money, Micah McCurdy belongs in a class of his own when it comes to hockey-related data viz. You can donate to his cause on Patreon.
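The similarity calculation itself isn't spelled out here, so what follows is only a minimal sketch of the idea the title points to. Each player is treated as a point whose coordinates are his stats. Every stat is put on a common scale, and the straight-line (Euclidean) distance between points is measured, with smaller distances meaning more similar players. The z-score standardization, the unweighted distance, the helper names (`standardize`, `euclidean_distance`, `most_similar`) and the player data are all illustrative assumptions, not the method used on the site.

```python
import math


def standardize(players):
    """Convert each stat to a z-score so stats on different scales are comparable.

    players: dict mapping player name -> list of stat values (same order for all players).
    """
    names = list(players)
    n_stats = len(players[names[0]])
    columns = [[players[name][j] for name in names] for j in range(n_stats)]
    means = [sum(col) / len(col) for col in columns]
    # Population standard deviation; fall back to 1.0 if a stat never varies.
    sds = []
    for col, mu in zip(columns, means):
        variance = sum((x - mu) ** 2 for x in col) / len(col)
        sds.append(math.sqrt(variance) or 1.0)
    return {
        name: [(players[name][j] - means[j]) / sds[j] for j in range(n_stats)]
        for name in names
    }


def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(players, target):
    """Rank every other player by distance from `target`, closest first."""
    z = standardize(players)
    distances = {
        name: euclidean_distance(z[target], z[name])
        for name in z
        if name != target
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical players and stat values, for illustration only.
players = {
    "Player A": [50.0, 2.0],
    "Player B": [51.0, 2.1],
    "Player C": [40.0, 0.5],
}
print(most_similar(players, "Player A"))
```

Standardizing first matters: without it, a stat measured in the dozens would swamp one measured in single digits, and "proximity" would mostly reflect whichever stat has the largest raw scale.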

Hello World: A Mission Statement

A couple months ago I resolved to build my own website.

I had built a few apps for WAR On Ice and authored some articles they were kind enough to host on their blog. When it was announced that Andrew Thomas and Alexandra Mandrycky, the site’s remaining co-creators, had been hired by the Minnesota Wild and would be stepping away from WOI, I was among those who volunteered to take on some of the responsibility. I had ideas and was keen to learn what I could from Andrew and Alexandra about what was under the proverbial hood before their new duties put an end to their involvement. It soon became evident to me that I simply didn’t have the chops to take charge of maintaining the site, and that any role I assumed would be auxiliary in nature.

I had ideas. I've always had ideas. Some, I believed, didn't belong on WAR On Ice. Some were experimental. Some were plain crazy. I set to work building things before it struck me that they'd need a home. A site of my own, where I could consolidate the things I'd built and share them with anyone interested, wasn't a terrible concept. Until then, I had relied on data sourced from WOI, something I knew would have to change if I was to produce tools and content for public consumption. Ethics aside, I wanted to be self-sufficient with respect to raw data, and building my first ever scraper would be a fun challenge.


General Nomenclature

The following naming conventions apply across the site: all off-ice stats are preceded by O, all individual stats by i, all expected stats by x, and all averages by the Avg. prefix. For instance, OCF% stands for off-ice CF%, iFF for individual FF, xFSh% for expected FSh% and Avg.DIST for average DIST. Events For or Against refer either to events for and against a team (team stats) or to events for and against a player’s team that occur while the player is on the ice (on-ice stats). Shots include attempts that miss the net or are blocked; shots on goal include only those that reach the net (saves and goals).
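To make the convention concrete, here is a minimal sketch of a decoder that expands an abbreviation into words. The `describe_stat` function and the `BASE_STATS` set are hypothetical, not part of the site. The decoder only strips a prefix when what remains is a recognized base stat, so a base stat that happens to begin with a capital O is not misread as an off-ice stat.

```python
# Prefix conventions described above; the decoder itself is a hypothetical helper.
PREFIXES = {"O": "off-ice", "i": "individual", "x": "expected"}
AVG_PREFIX = "Avg."


def describe_stat(name, base_stats):
    """Expand a prefixed stat abbreviation, or return None if it can't be decoded."""
    # Averages use the multi-character "Avg." prefix.
    if name.startswith(AVG_PREFIX):
        base = name[len(AVG_PREFIX):]
        return f"average {base}" if base in base_stats else None
    # An un-prefixed base stat is returned unchanged.
    if name in base_stats:
        return name
    # Otherwise try a single-character prefix (O, i or x).
    prefix, base = name[:1], name[1:]
    if prefix in PREFIXES and base in base_stats:
        return f"{PREFIXES[prefix]} {base}"
    return None


# Hypothetical set of base stats, limited to the ones named in the examples above.
BASE_STATS = {"CF%", "FF", "FSh%", "DIST"}

for stat in ["OCF%", "iFF", "xFSh%", "Avg.DIST"]:
    print(stat, "->", describe_stat(stat, BASE_STATS))
# OCF% -> off-ice CF%
# iFF -> individual FF
# xFSh% -> expected FSh%
# Avg.DIST -> average DIST
```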

Adjustments Explained

Shot Quality and Expected Goals: Part I