Peter Jacobs

Personal Webpage

Details

Data about professional athletes is widely available on the web. Unfortunately, the information for any given athlete is often spread across several different sites. For example, consider former professional quarterback Peyton Manning: a quick Google search turns up his information on a number of separate pages.

This is a problem for data analysts who want to build models to answer sports questions; scraping multiple sites and integrating information is generally not easy.

This project provides a web scraper that collects information on NFL players from multiple sites and combines it into a single table data structure. Professional statistics are collected from here and the college data comes from here.
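The combining step can be sketched as below. This is a minimal illustration, not the code from the repository: the function names combine_player_records and to_table, the field names, and the sample rows are all hypothetical placeholders, and it assumes that a player's name is enough to match records between sites.

```python
from collections import defaultdict


def combine_player_records(*sources):
    """Merge per-site records into one row per player.

    Each source is a (prefix, records) pair. The prefix is prepended to
    every field name so that columns from different sites do not collide.
    Matching on the "name" field is an assumption made for this sketch.
    """
    combined = defaultdict(dict)
    for prefix, records in sources:
        for record in records:
            name = record["name"]
            combined[name]["name"] = name
            for field, value in record.items():
                if field != "name":
                    combined[name][f"{prefix}_{field}"] = value
    return list(combined.values())


def to_table(rows):
    """Turn a list of dict rows into a header row plus value rows.

    Fields a player lacks on some site are filled with None.
    """
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return [columns] + [[row.get(col) for col in columns] for row in rows]


# Placeholder data standing in for scraped results (not real statistics).
pro = [
    {"name": "Player A", "pass_yds": 100},
    {"name": "Player B", "pass_yds": 200},
]
college = [{"name": "Player A", "pass_yds": 50}]

table = to_table(combine_player_records(("pro", pro), ("college", college)))
for line in table:
    print(line)
```

Matching on name alone breaks down when two players share a name, so a real scraper would likely need a more specific key, such as a site-specific player identifier.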

Code

The code can be downloaded from this GitHub repository and is ready for immediate use via a command-line interface. Please see the README; it contains directions for using the scraper and a couple of ideas for extending it to handle more complex queries.

Credits

Thanks to Sports Reference LLC for maintaining up-to-date sites containing MLB, NFL, and NBA data.