GitHub - jemmaissroff/hockey: NHL Time on Ice Data Scraping and Visualization

Visualising NHL Time on Ice Data

This is a project to visualize shared time on ice data. A visualization of this data is available here.

When I first started watching hockey, I was consistently astounded by line changes. Even today, as an avid hockey fan, I am always eager to learn more about which defensive pairings work best, how different lines' chemistry works, and most of all, who is actually playing together on a team.

This project seeks to answer the question of who shares time on ice with whom. It also served as a tool for me to learn about graph databases and about server and network setup.

This repo contains the scripts used to scrape the data and import it into a Neo4j database, plus a sample index.html file showing how to display it with NeoVis.

Scraping NHL API Data

The NHL API exposes many pages with information on teams, games, and players. For this project, I use the teams page to get every team's ID. Given those IDs, I scrape each team's schedule, pull the game IDs from it, and use those to fetch shift data. For more information about the positions of individual players, I scrape each player's page.
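As a rough illustration of the "pull the game IDs from the schedule" step, here is a minimal sketch. The JSON shape and the field names (`games`, `id`) are placeholders I made up for the example, not the NHL API's actual schema, and `game_ids` is a hypothetical helper rather than a function from scrape_from_nhl_api.rb.

```ruby
require "json"

# Placeholder schedule payload. The real response shape may differ;
# the "games" and "id" keys are assumptions for illustration only.
schedule_json = <<~JSON
  {"games": [{"id": 1001}, {"id": 1002}, {"id": 1003}]}
JSON

# Hypothetical helper: collect the game IDs from a parsed schedule.
def game_ids(schedule)
  schedule.fetch("games", []).map { |game| game.fetch("id") }
end

ids = game_ids(JSON.parse(schedule_json))
puts ids.inspect
```

Each ID collected this way would then be used to request that game's shift data.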

This is where it gets interesting! Shift data only records when a player enters and leaves the ice for a given shift, but this project asks who shares the ice with whom. So scrape_from_nhl_api.rb implements a small algorithm to calculate who is actually on the ice together and credit each pairing with its shared time. It does this for all pairings of the same position.
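One way to compute shared time from shift data is to intersect shift intervals pairwise. The sketch below is not the code in scrape_from_nhl_api.rb. The `Shift` struct, its fields, and the method names are assumptions for illustration, and it assumes the shifts passed in all belong to one team in one game.

```ruby
# Hypothetical shift record: player name, position group, and start/end
# times in seconds from the start of the game.
Shift = Struct.new(:player, :position, :start_sec, :end_sec)

# Seconds during which two shifts overlap (0 if they do not overlap).
def overlap_seconds(a, b)
  [[a.end_sec, b.end_sec].min - [a.start_sec, b.start_sec].max, 0].max
end

# Returns a hash mapping a sorted [player, player] pair to shared seconds,
# counting only pairs of different players in the same position group.
def shared_time(shifts)
  totals = Hash.new(0)
  shifts.combination(2) do |a, b|
    next if a.player == b.player || a.position != b.position

    seconds = overlap_seconds(a, b)
    totals[[a.player, b.player].sort] += seconds if seconds.positive?
  end
  totals
end

shifts = [
  Shift.new("Player A", "D", 0, 45),
  Shift.new("Player B", "D", 10, 60),
  Shift.new("Player C", "F", 0, 40),
  Shift.new("Player A", "D", 100, 150),
  Shift.new("Player B", "D", 120, 170)
]

shared_time(shifts).each do |(first, second), seconds|
  puts "#{first} + #{second}: #{seconds}s"
end
# prints "Player A + Player B: 65s"
```

Comparing every pair of shifts is quadratic in the number of shifts, which is fine for a single game. A sweep over sorted start and end events would scale better if that ever mattered.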

Then, the final step of scraping the data is writing it to a csv (import/all-pos-on-ice-data.csv), which is used in the next step to import the data into Neo4j.
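For illustration, writing pair totals out with Ruby's standard `CSV` library could look roughly like this. The column names and the shape of `pair_totals` are assumptions; the actual layout of import/all-pos-on-ice-data.csv may differ.

```ruby
require "csv"

# Hypothetical input: sorted [player, player] pair => shared seconds.
pair_totals = {
  ["Player A", "Player B"] => 7800,
  ["Player C", "Player D"] => 9000
}

# Build the csv text in memory. The header names are assumptions.
csv_text = CSV.generate do |csv|
  csv << %w[player_1 player_2 shared_minutes]
  pair_totals.each do |(first, second), seconds|
    csv << [first, second, (seconds / 60.0).round(1)]
  end
end

puts csv_text
# To write a file instead:
#   File.write("import/all-pos-on-ice-data.csv", csv_text)
```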

Importing the data into a Neo4j DB

Fortunately, Neo4j has import from csv functionality. scripts/load_players.cql has the details for this import. It is run with cypher-shell.

The script loads the players as nodes in a weighted graph. Each node carries a description of the player, including position and current team. Edges between each pair of players are weighted by their shared time on ice. So if two players spent 200 minutes on ice together over the course of a season, the edge connecting them would have a weight of 200. Notably, in an effort not to overcrowd the data, only pairs of players who shared the ice for at least two hours are loaded into the csv.
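To make the graph model concrete, here is a small in-memory sketch of the same idea: drop pairs under the two-hour threshold, then store the rest as weighted, undirected edges. This is only an illustration in Ruby, not what scripts/load_players.cql does. The row layout and the constant name are assumptions.

```ruby
# "At least two hours", expressed in minutes. The constant name is hypothetical.
MIN_SHARED_MINUTES = 120

# Hypothetical rows: [player, player, shared minutes over the season].
rows = [
  ["Player A", "Player B", 200],
  ["Player A", "Player C", 45],
  ["Player B", "Player C", 120]
]

# Keep only the pairs that meet the threshold.
edges = rows.select { |_, _, minutes| minutes >= MIN_SHARED_MINUTES }

# Adjacency view of the weighted, undirected graph:
# player => { teammate => shared minutes }.
graph = Hash.new { |hash, player| hash[player] = {} }
edges.each do |first, second, minutes|
  graph[first][second] = minutes
  graph[second][first] = minutes
end

graph.each do |player, neighbors|
  puts "#{player}: #{neighbors.inspect}"
end
```

With this sample data, Player A and Player C share only 45 minutes, so no edge connects them.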

Visualizing with NeoVis

The next step in this pipeline is to actually display the data. NeoVis is a JavaScript library for displaying Neo4j data. It controls the size and color of the nodes and their edges. Unfortunately, it does not allow hardcoded colors, so I couldn't set the players' node colors to match their team colors. One potential future extension for this project would be to use vis.js, which does have these configuration options.

Exposing the DB from a DigitalOcean Droplet

Lastly, to set up a server running Neo4j, I installed and configured Neo4j on a DigitalOcean droplet.

Future Work

There are many potential avenues to explore this more deeply. In no particular order, some on my mind are:

Looking into the NHL API, and scraping different data
Automating the data scraping to happen nightly (maybe with the next season?)
Using vis.js to have more control over tuning the data visualization
Putting a load balancer and lambda on top of the server to direct traffic
