Data scientist, physicist, and fantasy football champion

Data acquisition

Hey fellow data people,

This should be a quick post.  I'm going to discuss how I have collected the data and cleaned it up to fit it to a few models, but I need to get a few things out of the way first.

  1. If anyone knows a better way to get this same data, please tell me ASAP.  It only takes me about 15 minutes a week now, but it's still a little convoluted.
  2. This is probably going to make me look like a crazy person, but in my defense I started small and built up.

Every week I collect data in 3 parts and use the programming language R (and software RStudio) to join the data sets.  The 3 parts are: last week's data (including yards, sacks, fumbles, etc.), last week's scores (including which teams were home or away), and the predicted scores for the upcoming week.
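
As a rough sketch of what joining the 3 parts can look like in R with dplyr (the data frame names, column names, and values below are invented for illustration and are not the real data):

```r
library(dplyr)

# Part 1: last week's stats (hypothetical columns, made-up numbers).
last_week_stats <- data.frame(
  team = c("AAA", "BBB"),
  week = c(1, 1),
  sacks = c(3, 1),
  fumbles_recovered = c(1, 0)
)

# Part 2: last week's scores, including who was at home.
last_week_scores <- data.frame(
  team = c("AAA", "BBB"),
  week = c(1, 1),
  opponent = c("BBB", "AAA"),
  home = c(TRUE, FALSE),
  points_for = c(24, 17),
  points_against = c(17, 24)
)

# Part 3: predicted scores for the upcoming week.
upcoming <- data.frame(
  team = c("AAA", "BBB"),
  next_opponent = c("CCC", "DDD"),
  predicted_points = c(23.5, 20)
)

# Stats and scores share team + week; the upcoming week is keyed by team only.
combined <- last_week_stats %>%
  inner_join(last_week_scores, by = c("team", "week")) %>%
  left_join(upcoming, by = "team")

print(combined)
```

The `inner_join` keeps only teams that appear in both the stats and the scores, which is a cheap way to notice a missing or misspelled team name.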


Previous data

I pick up previous DEF and kicker data with a web tool that turns a webpage with a table on it into a CSV spreadsheet.  The nice people at The Huddle and FootballDB collect the data; I pick it up, use R to change the headings, merge all the data sets, and make sure the numbers make sense.  For K and DEF my league uses Yahoo standard scoring.  Sometimes their numbers don't match what I see in Yahoo, but they're only off by a point at worst.  I'm not sure why and I'm still trying to track it down, so there will probably be an update in the future.
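
The clean-up step (change the headings, then check that the numbers make sense) might look something like the sketch below.  The CSV text, the original headings, and the new column names are all made up for the example; a real export will have its own headings.

```r
library(dplyr)

# Stand-in for a CSV exported from a stats table (invented headings and values).
raw_csv <- "Team,Sck,Int,FR,Pts Allowed
AAA,3,1,1,17
BBB,1,0,0,24"

# read.csv turns the heading "Pts Allowed" into the syntactic name "Pts.Allowed".
def_raw <- read.csv(text = raw_csv)

# Change the headings to consistent names and tag the rows with the week.
def_clean <- def_raw %>%
  rename(
    team = Team,
    sacks = Sck,
    interceptions = Int,
    fumbles_recovered = FR,
    points_allowed = Pts.Allowed
  ) %>%
  mutate(week = 1)

# Sanity checks: nothing missing, no negative counts, one row per team.
stopifnot(
  !any(is.na(def_clean)),
  all(def_clean$sacks >= 0),
  all(def_clean$points_allowed >= 0),
  !any(duplicated(def_clean$team))
)

print(def_clean)
```

Failing loudly with `stopifnot` is a deliberate choice here: a bad week of data is easier to fix before it gets merged with everything else.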


Weekly scores

I just do this by hand.  All I'm looking for is which teams are playing, who was at home, what their scores were, and what day of the week they played.  I haven't found a good source for this that I could pull automatically, so I just type it in.  I then use R to turn an otherwise hairy dataset into a tidy data frame.  "Tidy" data has one observation per row (here, one team in one game), whereas weekly score data usually comes as one game per line.

These were the first 3 games this year.  Moving to a tidy data set makes it very easy to model how well a team does against an opponent.  Various R packages can do this reshaping, but I use dplyr.
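
To make the one-game-per-line versus one-team-per-row difference concrete, here is a small dplyr sketch.  The games, team codes, and column names are invented for illustration; they are not real results.

```r
library(dplyr)

# Hand-typed scores, one game per row (made-up games).
games <- data.frame(
  week = c(1, 1),
  day = c("Thu", "Sun"),
  home_team = c("AAA", "CCC"),
  away_team = c("BBB", "DDD"),
  home_score = c(24, 10),
  away_score = c(17, 13)
)

# Each game becomes two rows: one from the home team's point of view...
home_rows <- games %>%
  transmute(week, day, team = home_team, opponent = away_team,
            home = TRUE, points_for = home_score, points_against = away_score)

# ...and one from the away team's point of view.
away_rows <- games %>%
  transmute(week, day, team = away_team, opponent = home_team,
            home = FALSE, points_for = away_score, points_against = home_score)

tidy_games <- bind_rows(home_rows, away_rows) %>%
  arrange(week, team)

print(tidy_games)
```

With every team on its own row, "how did this team do against that opponent" is a plain filter or group-by on `team` and `opponent`.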


Upcoming week (odds and predicted scores)

One of the strongest predictors for how well a DEF is going to do is the opponent's score for the week.  This makes sense given that the DEF score actually depends on the opponent's score.  So how do you estimate weekly team and opponent scores?  Operating under the assumption that Vegas isn't in it to lose money, I use the odds for each game to estimate the scores.  The posted lines give a total and a spread for each game, and I use those as the estimates.  Again, I just enter this by hand.  I could automate it, but it's a 5-minute thing once a week for now.
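
Turning a total and a spread into two team scores is just a little arithmetic: the two scores have to add up to the total and differ by the spread.  A minimal sketch, assuming the spread is quoted for the home team and is negative when the home team is favored (the function name and the example numbers are made up):

```r
# total:       expected combined points for the game
# home_spread: spread quoted for the home team (negative = home team favored)
implied_scores <- function(total, home_spread) {
  data.frame(
    home_points = (total - home_spread) / 2,
    away_points = (total + home_spread) / 2
  )
}

# Made-up line: total of 45, home team favored by 3.
est <- implied_scores(total = 45, home_spread = -3)
print(est)  # home_points = 24, away_points = 21
```

If your source quotes the spread from the favorite's side instead, flip the sign before calling the function.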



In the future I'd still like to expand this to QBs and TEs, but for now I only collect K and DEF data.  Seriously, if anyone knows a good way to get this data quickly and in a helpful format, please tell me.  I'll keep hunting.  Ideally I would have the actual football data (players, yards, fumbles, etc.) and not just their scores.  That would let me write customized scoring algorithms.  For example, in my league QBs lose a point for getting sacked, and some leagues do PPR or half-PPR.  If I ultimately decide to just use standard scoring then I could probably get away with just fantasy scores per week, but I'd like the option to model more things than just points.  For example, if I had the number of yards that each DEF gives up, I could maybe use that as a factor for how many yards I expect a QB to throw.
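
Here is the kind of customized scoring I have in mind, as a sketch.  The only rules taken from above are the 1-point sack penalty for QBs and the PPR / half-PPR option; every other weight, and both function names, are placeholders to be replaced with a league's actual settings.

```r
# QB score with a configurable sack penalty (other weights are placeholders).
score_qb <- function(pass_yards, pass_tds, interceptions, sacks,
                     sack_penalty = 1) {
  pass_yards / 25 + 4 * pass_tds - 1 * interceptions - sack_penalty * sacks
}

# Receiver/TE score; ppr = 0 for standard, 0.5 for half-PPR, 1 for full PPR.
score_receiver <- function(rec_yards, rec_tds, receptions, ppr = 0) {
  rec_yards / 10 + 6 * rec_tds + ppr * receptions
}

# Made-up stat lines:
qb_points <- score_qb(pass_yards = 250, pass_tds = 2, interceptions = 1, sacks = 3)
te_points <- score_receiver(rec_yards = 80, rec_tds = 1, receptions = 6, ppr = 0.5)

print(qb_points)  # 10 + 8 - 1 - 3 = 14
print(te_points)  # 8 + 6 + 3 = 17
```

Because the functions are vectorized, they can be applied to whole columns of a data frame once the raw stats are available.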

So far I've collected 2015 and the first 9 weeks of 2016.  I'll keep analyzing the data as it comes and maybe at some point I'll pick up 2014 data.  Hopefully I'll find a better source by then.


Introducing TruScor