Data scientist, physicist, and fantasy football champion

QB Models: What might have been, 2016

Those of you who hang on my every written word (and yes, there’s not a lot there to grip lately) know that I’m interested in looking at QBs next year. But what would have happened had I modeled and predicted last year? Here I look at a few candidate models for the 2017 season and see how they would have performed had I used them in 2016.

A few caveats: I’m doing this all in R, and it’s a little tough to make models about players who haven’t played yet. There are a few QBs who didn’t have a ton of playing time (1-2 games only), there’s Cody Kessler, who came in partway through the season, and there are guys like Dak Prescott and Carson Wentz for whom I have limited 2015 data (i.e., none). To automate my models throughout the year, I had to model only the top players, that is, guys who played 8 or more games. I also think that this gives me a better idea of the data that I actually care about. Those of you who read my QB exploration articles know that QB points per game were roughly broken into two groups: players with 8 or more games (regular starters) had high points per game, and players with fewer than 8 games (weekly replacements) had low points per game. I suspect that modeling the regular starters on their own, rather than mixing the two groups, will give me a better idea of how the different model parameters affect fantasy performance, so I feel justified in modeling only those top players.
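To make the 8-game cutoff concrete, here’s a minimal sketch of the filter. It’s in Python rather than the R I actually used, and the player names, the `game_log` layout, and the `regular_starters` helper are all made up for illustration; only the cutoff of 8 games comes from the text above.

```python
from collections import Counter

MIN_GAMES = 8  # cutoff from the text: 8 or more games counts as a regular starter


def regular_starters(game_log, min_games=MIN_GAMES):
    """Return the set of players with at least min_games rows in game_log."""
    games = Counter(row["player"] for row in game_log)
    return {player for player, n in games.items() if n >= min_games}


# Toy game log (hypothetical players): one row per player per game played.
game_log = (
    [{"player": "Starter A", "week": w} for w in range(1, 17)]  # 16 games
    + [{"player": "Backup B", "week": w} for w in (6, 7)]       # 2 games
)

starters = regular_starters(game_log)
# Keep only the rows that belong to regular starters; these are what get modeled.
modeled = [row for row in game_log if row["player"] in starters]
```

The weekly replacements simply drop out of the training data, which is also why they never get a prediction in the weeks they do play.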

Unfortunately, this leads to a little weirdness in the modeling and a reduction in accuracy, but not that much. There will be weeks in which I didn’t model guys like, say, Landry Jones when he was in for Roethlisberger, but then again there weren’t a lot of weeks in which guys like Landry Jones ended up in the top 15 where they would impact my accuracy score. This is definitely a limitation of this article, but something I can get around weekly in 2017. I’ll never easily be able to model a guy’s first week ever, but I should be able to at least model his second week and beyond.

In this article, I’ll just use R’s basic modeling package. I’ve been working on Bayesian modeling, so expect that in the future, but for now, let’s just see if any of these models would have been much better than any others last year.

Here I have the results from 6 different models. All of mine use data from 2015 and 2016 to get the model parameters (I had some DEF models that only used 2016 data). They all have different model parameters, though:

  • LM-OSOs: uses the player that’s playing, the opponent they’re up against, the expected score, and the expected opponent’s score
  • LM-SOs: uses only Player, expected Score, and expected Opponent Score (ignores the particular opponent, since I found that that term is only weakly significant. An interesting fact for another article)
  • LM-SOsT: Player, expected Score, expected Opponent score, and Temperature (the most significant term of the weather data that I have)
  • LM-KS: The kitchen sink model. Includes Player, Score, Opponent score, Temperature, Wind speed, the week, the year, home vs. away, the day of the week (Thursday, Saturday, or Sunday), and whether it was clear/rainy/snowing
  • Yahoo: The predictions from Yahoo’s experts
  • Random: Every week I just pick a random order. If you’ll remember, none of my kicker models significantly beat this one.
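For a sense of what a model like LM-SOs is doing under the hood, here’s a rough sketch of an ordinary least-squares fit of fantasy points on Player, expected Score, and expected Opponent score. It’s written in plain Python rather than R, so it is not the code behind the results here; it should give the same fitted values as a linear model with a formula along the lines of `points ~ player + score + opp_score`, just parameterized with one baseline per player instead of an intercept plus offsets. The column names, the two fake QBs, and the noise-free toy numbers (built from baselines of 5 and 8 and slopes of 0.5 and -0.2) are all assumptions for illustration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lm_sos(rows):
    """Least-squares fit of points on player, score, and opp_score.

    Returns (per-player baselines, score slope, opp_score slope).
    """
    players = sorted({r["player"] for r in rows})
    # One indicator column per player (no separate intercept), then the numeric terms.
    X = [[1.0 if r["player"] == p else 0.0 for p in players]
         + [float(r["score"]), float(r["opp_score"])] for r in rows]
    y = [r["points"] for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return dict(zip(players, beta)), beta[-2], beta[-1]


# Toy, noise-free data: points = baseline + 0.5 * score - 0.2 * opp_score
rows = [
    {"player": "QB A", "score": 24, "opp_score": 20, "points": 13.0},
    {"player": "QB A", "score": 27, "opp_score": 17, "points": 15.1},
    {"player": "QB A", "score": 20, "opp_score": 23, "points": 10.4},
    {"player": "QB B", "score": 30, "opp_score": 21, "points": 18.8},
    {"player": "QB B", "score": 21, "opp_score": 24, "points": 13.7},
    {"player": "QB B", "score": 26, "opp_score": 19, "points": 17.2},
]

baselines, b_score, b_opp = fit_lm_sos(rows)

# Predicted points for QB A in a game expected to finish 28-21.
prediction = baselines["QB A"] + b_score * 28 + b_opp * 21
```

The other LM variants would just add columns to the same design matrix (an opponent indicator for OSOs, temperature for SOsT, and so on), which is also why the kitchen sink version has so many more parameters to fit from the same number of games.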

The points are the accuracy score for the week (lower is better) and lines are a rough fit to the models. Just like in DEF predictions, some weeks everyone does as expected and all the models do well, and some weeks there are upsets and all the models do poorly. Overall the models get better throughout the season, but they’re all grouped together pretty closely. It’s a little tough to tell what’s going on in this figure. Let’s look at a box plot:

The good news is that all of my models are better than random. The bad news is that none of them is much better than the others, or than just using Yahoo’s experts. The kitchen sink model has the widest box and is probably overfitting. The best (lowest) median accuracy score is from Yahoo, which isn’t great for me. It’s fine for them, but that’s tough to get excited about, you know?
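If you want the numbers behind a box plot rather than eyeballing it, the two things I’m reading off are the median (the line in the box) and the interquartile range (the height of the box). Here’s a small Python sketch of that summary. The weekly scores below are invented placeholders for two unnamed models, not my actual results, and plotting tools may use a slightly different quartile convention than `statistics.quantiles` does.

```python
from statistics import median, quantiles


def box_summary(scores):
    """Median and interquartile range (box height) of a list of weekly scores."""
    q1, _, q3 = quantiles(scores, n=4)
    return median(scores), q3 - q1


# Placeholder weekly accuracy scores (lower is better); NOT real results.
weekly_scores = {
    "model_a": [9, 11, 12, 13, 15, 16, 18],
    "model_b": [6, 9, 11, 14, 19, 22, 27],
}

summary = {name: box_summary(s) for name, s in weekly_scores.items()}
best_median = min(summary, key=lambda name: summary[name][0])
widest_box = max(summary, key=lambda name: summary[name][1])
```

A model can have a competitive median and still have a wide box, which is the pattern that makes me suspicious of overfitting: it does fine in a typical week but swings a lot from week to week.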


None of this is very promising. I’m not worse than Yahoo’s experts, and I’m better than randomly guessing, but those aren’t exactly the superlatives I’m looking for. “World’s Okayest Analyst” is a pretty sad coffee mug, but unfortunately it’s the one I deserve right now.

Of my models, I like SOs and OSOs the best; they make sense to me, and the weather stuff doesn’t seem to have much of an effect in my exploratory analysis. My next plan is to try a Bayesian model, but I wanted to get this all out there first, since I don’t know how long it’ll take me to mash that one into working order. For modeling defenses my Bayesian stuff did a little better than my frequentist linear models, but exploring in Bayesian is still tough for me. Still, I should be able to test something before the 2017 season starts, so don’t buy me that mug yet.
