
Week 1 QB Predictions

When I tried to make good QB models based on last year’s data, the results weren’t encouraging. Yahoo’s experts did as well or better than most of my models. It may be that there’s something to narrative-based predictions, but I just refuse to believe that. I think I just don’t have enough data. Yeah, that’s probably it…

This is the first year I’m officially modeling QBs, so we’re starting fresh with Model A and I have no mess from last year to clean up. A few months ago I made up a few models, trained them on 2015 data, and then tested them on 2016 (check it out here). These are the models that would have done the best in 2016 had I used them.

| Model | Type | Player | Score | OppScore | Score*OppScore | HomeAway | HomeAway*Player | Opp | Data Years |
|-------|------|--------|-------|----------|----------------|----------|-----------------|-----|------------|
| A | LM | X | X | X | X | X | X |  | 2015-2017 |
| B | LM | X | X | X | X | X | X | X | 2015-2017 |
| C | LM | X | X | X |  |  |  |  | 2015-2017 |
| D | Gibbs Sampler | X | X | X |  |  |  | X | 2015-2017 |
| E | Gibbs Sampler | X | X | X |  | X | X | X | 2015-2017 |
| F | PLS | X | X | X | X | X |  | X | 2015-2017 |

Models A and B have a lot of terms, but they’re all statistically significant or dependent on other terms (can’t drop HomeAway because we have HomeAway*Player in there and it’s a good term). The only exception is the Opponent term (Opp). It’s not statistically significant, but I think that’s because the difference between the 32 levels isn’t that large; the adjustment to the predicted fantasy score changes by about -3 to +3 fantasy points depending on which team you’re playing, with DEN being the toughest team to go against and CLE the easiest. Despite lacking statistical significance, 3 points is large enough to make a difference in my predictions and I probably need to include it. Indeed, I found that Model B (including Opp) would have done marginally better than Model A in 2016. I’m excited to see how it performs this year.
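For reference, a Model B-style fit in R looks roughly like the sketch below. The column names (FanPts, Score, OppScore, HomeAway, Opp) and the qb_data / week1_matchups data frames are placeholders for however the real data is actually set up, not my exact code.

```r
# A minimal sketch of a Model B-style linear model.
# Column and data frame names are placeholders, not the actual variables.
model_b <- lm(
  FanPts ~ Player + Score * OppScore + HomeAway + HomeAway:Player + Opp,
  data = qb_data
)

summary(model_b)                            # term-by-term significance
predict(model_b, newdata = week1_matchups)  # Week 1 predictions
```

The Opp and HomeAway*Player adjustments discussed above come straight out of the fitted coefficients of a model like this.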

Some of these models include the terms HomeAway and HomeAway*Player. The HomeAway term captures whether players overall do better at home or away. I found that it's actually a pretty small effect, about 0.2 fantasy points (better at home). Despite the small effect, I have to keep it in for bookkeeping reasons because I really want that HomeAway*Player term, which accounts for specific players who do much better at home or away. Roethlisberger's prediction is adjusted by about -3 fantasy points when he plays an away game, while Osweiler's prediction actually increases by about 2.5 points when playing away. That might be a small-sample-size effect, or maybe it just sucks to play in Denver. I'll have to look into that. Maybe I can check whether QBs playing in Denver do worse than in other stadiums. Home Team wasn't relevant for my kicker models, but maybe there's something to it with QBs…

Ugh, that’s a job for later. Dammit, Kevin, focus!

The problem with the HomeAway*Player term is that it's likely strongly dependent on the number of games I have for each player. It will be much more accurate for QBs who have played more than 20 games over the last 2 years, but much less accurate for players who have only played a few games. It might also be somewhat dependent on schedule, but hopefully including the Opp term cancels that out a bit.
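For the curious, here's roughly how I'd eyeball those player-specific home/away adjustments and the games-played counts from an lm fit like the Model B sketch above (same placeholder names).

```r
# Pull the HomeAway:Player interaction coefficients out of the sketch fit
# above (placeholder names again). Interaction terms are the only
# coefficients whose names mention both HomeAway and Player.
coefs <- coef(model_b)
ha_player <- coefs[grepl("HomeAway", names(coefs)) & grepl("Player", names(coefs))]
sort(ha_player)

# Rough check on sample size: games per player in the training data
table(qb_data$Player)
```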

Model C would have done almost as well as Models A and B despite being stripped down to the bare essentials: the player and the predicted scores for each team. This model should avoid the overfitting problems I’m likely to see in Models A and B, but I think it might also miss a few things. Still worth including.

Model D uses a Gibbs sampler (shout out to rjags), which makes it a Bayesian model. It would have done marginally better than Models A through C in 2016, and I'm excited to try it out. The only problem is that it's complicated to program that type of model in R and make predictions with it. I'm still new at this, so I worry that I've made mistakes. Plus, I'm not sure I actually get much more out of it than a linear model with the same terms. Bayesian models give you a probability distribution as output, which can certainly be helpful, but since I'm not really trying to quantify uncertainty it's probably not worth the added effort. I can easily say things like "there's a 75% chance that Brady will be better than McCown this week", but since my accuracy score isn't based on that, knowing these things won't improve my model.
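For anyone curious what that looks like, here's a stripped-down sketch of a Model D-style rjags setup. The variable names, priors, and model structure are illustrative, not the exact model I'm running.

```r
library(rjags)

# Sketch: fantasy points ~ player effect + team score + opponent score + opponent effect.
# All names and priors are illustrative placeholders.
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- b_player[player[i]] + b_score * score[i] +
             b_oppscore * oppscore[i] + b_opp[opp[i]]
  }
  for (p in 1:Nplayers) { b_player[p] ~ dnorm(0, 0.001) }
  for (o in 1:Nopps)    { b_opp[o]    ~ dnorm(0, 0.001) }
  b_score    ~ dnorm(0, 0.001)
  b_oppscore ~ dnorm(0, 0.001)
  tau ~ dgamma(0.01, 0.01)
}"

jags_data <- list(
  y        = qb_data$FanPts,
  score    = qb_data$Score,
  oppscore = qb_data$OppScore,
  player   = as.integer(factor(qb_data$Player)),
  opp      = as.integer(factor(qb_data$Opp)),
  N        = nrow(qb_data),
  Nplayers = nlevels(factor(qb_data$Player)),
  Nopps    = nlevels(factor(qb_data$Opp))
)

jm <- jags.model(textConnection(model_string), data = jags_data, n.chains = 3)
update(jm, 1000)  # burn-in
samples <- coda.samples(jm, c("b_player", "b_score", "b_oppscore", "b_opp"),
                        n.iter = 5000)
```

A statement like "there's a 75% chance that Brady will be better than McCown" is then just the fraction of posterior draws in which Brady's predicted score beats McCown's.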

Model E uses almost the same terms as Model B, but in a Bayesian framework. It wasn't any better than the others when I tested it on 2016 data, but after going through the work of coding it up I didn't have the heart to leave it out of this year's predictions.

Model F is cluttered. It uses a machine learning algorithm that I don't fully understand how to implement yet. Don't worry about this one for now; I'm just trying it out, and it's easier to keep it here while I work on it.

So there you have it: six models to start the year. All of them would have been roughly equal in 2016, but they vary quite a bit in their modeling techniques and parameters. With that, let's get to it!

[A final note of caution: I can't model DeShone Kizer or any other rookie QB until I have data for him. I left Osweiler in there because it would take too long to fix all my code for just one week and I don't think it's worth the effort. Don't play Brock Osweiler this week. In fact, let me make a bold statement here: don't play any QB who isn't starting. This weirdness should be resolved next week.]

Model A:


Terms:

Player
Score
OppScore
Score*OppScore
Home/Away
Home/Away*Player (Some players have a bit more of a problem playing at home or away. Looking at you, Roethlisberger…)
Data years: 2015 - 2017

[Figure: Model A-1.png]

Model B:

Terms:

Player
Score
OppScore
Score*OppScore
Home/Away
Home/Away*Player
Opponent
Data years: 2015 - 2017

[Figure: Model B-1.png]

Model C:


Terms:

Player
Score
OppScore
Data years: 2015 - 2017

[Figure: Model C-1.png]

Model D:


Terms:

Player
Score
OppScore
Opponent
Data years: 2015 - 2017

Bayesian (Gibbs sampler)

[Figure: Model D-1.png]

Model E:


Terms:

Player
Score
OppScore
Score*OppScore
Home/Away
Home/Away*Player
Opponent
Data years: 2015 - 2017

Bayesian (Gibbs sampler)

[Figure: Model E-1.png]

Model F:


Terms:

Player
Score
OppScore
Score*OppScore
Score^2
OppScore^2
Home/Away
Home Team (which stadium they're in)
Opp
Team (some QBs play better on different teams)
Day (Thu/Sun/Mon)
Data years: 2015 - 2017

Model details: Partial Least Squares fit using the caret package. I’m just throwing data science at the wall here and seeing what sticks. I need to work on variable selection, and if you’re reading this note past week 1 it’s because I still need to work on variable selection. For now, just consider Model F to be experimental.
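For the record, the caret call is roughly the shape shown below. The column names are placeholders, and the preprocessing and tuning settings are just reasonable defaults, not necessarily what the real Model F uses.

```r
library(caret)

# A rough sketch of a Model F-style PLS fit with caret.
# Predictor columns are placeholders; tuning/preprocessing choices are
# illustrative defaults rather than the actual settings.
ctrl <- trainControl(method = "cv", number = 10)

model_f <- train(
  FanPts ~ Player + Score * OppScore + I(Score^2) + I(OppScore^2) +
           HomeAway + HomeTeam + Opp + Team + Day,
  data = qb_data,
  method = "pls",
  preProcess = c("center", "scale"),
  tuneLength = 10,   # number of PLS components to try
  trControl = ctrl
)

predict(model_f, newdata = week1_matchups)
```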

[Figure: Model F-1.png]

Weekly model summary

| Rank | A | B | C | D | E | F |
|------|---|---|---|---|---|---|
| 1 | Aaron Rodgers | Aaron Rodgers | Aaron Rodgers | Tom Brady | Tom Brady | Aaron Rodgers |
| 2 | Tom Brady | Tom Brady | Tom Brady | Aaron Rodgers | Aaron Rodgers | Derek Carr |
| 3 | Dak Prescott | Dak Prescott | Cam Newton | Marcus Mariota | Dak Prescott | Tom Brady |
| 4 | Cam Newton | Cam Newton | Ben Roethlisberger | Ben Roethlisberger | Matt Ryan | Marcus Mariota |
| 5 | Kirk Cousins | Derek Carr | Marcus Mariota | Cam Newton | Marcus Mariota | Cam Newton |
| 6 | Matt Ryan | Kirk Cousins | Russell Wilson | Derek Carr | Derek Carr | Jameis Winston |
| 7 | Derek Carr | Jameis Winston | Kirk Cousins | Matt Ryan | Sam Bradford | Matthew Stafford |
| 8 | Matthew Stafford | Matt Ryan | Dak Prescott | Russell Wilson | Kirk Cousins | Kirk Cousins |
| 9 | Russell Wilson | Russell Wilson | Matt Ryan | Kirk Cousins | Cam Newton | Russell Wilson |
| 10 | Marcus Mariota | Marcus Mariota | Drew Brees | Sam Bradford | Russell Wilson | Drew Brees |
| 11 | Jameis Winston | Matthew Stafford | Derek Carr | Carson Palmer | Carson Palmer | Dak Prescott |
| 12 | Sam Bradford | Sam Bradford | Matthew Stafford | Dak Prescott | Jameis Winston | Brian Hoyer |
| 13 | Blake Bortles | Carson Palmer | Andy Dalton | Drew Brees | Matthew Stafford | Carson Palmer |
| 14 | Tyrod Taylor | Ben Roethlisberger | Tyrod Taylor | Andy Dalton | Ben Roethlisberger | Matt Ryan |
| 15 | Andy Dalton | Andy Dalton | Eli Manning | Jameis Winston | Andy Dalton | Ben Roethlisberger |
| 16 | Ben Roethlisberger | Drew Brees | Jameis Winston | Matthew Stafford | Drew Brees | Sam Bradford |
| 17 | Philip Rivers | Blake Bortles | Andrew Luck | Eli Manning | Tyrod Taylor | Eli Manning |
| 18 | Carson Palmer | Tyrod Taylor | Carson Palmer | Tyrod Taylor | Blake Bortles | Alex Smith |
| 19 | Drew Brees | Alex Smith | Blake Bortles | Andrew Luck | Eli Manning | Tyrod Taylor |
| 20 | Eli Manning | Andrew Luck | Sam Bradford | Alex Smith | Andrew Luck | Jay Cutler |
| 21 | Andrew Luck | Eli Manning | Brian Hoyer | Joe Flacco | Alex Smith | Andrew Luck |
| 22 | Brian Hoyer | Philip Rivers | Philip Rivers | Blake Bortles | Philip Rivers | Carson Wentz |
| 23 | Trevor Siemian | Carson Wentz | Joe Flacco | Carson Wentz | Carson Wentz | Blake Bortles |
| 24 | Carson Wentz | Joe Flacco | Carson Wentz | Brian Hoyer | Trevor Siemian | Andy Dalton |
| 25 | Josh McCown | Jay Cutler | Alex Smith | Philip Rivers | Joe Flacco | Philip Rivers |
| 26 | Jay Cutler | Josh McCown | Trevor Siemian | Jay Cutler | Josh McCown | Joe Flacco |
| 27 | Joe Flacco | Brian Hoyer | Jay Cutler | Brock Osweiler | Jay Cutler | Josh McCown |
| 28 | Alex Smith | Trevor Siemian | Brock Osweiler | Josh McCown | Brian Hoyer | Trevor Siemian |
| 29 | Tom Savage | Brock Osweiler | Josh McCown | Trevor Siemian | Tom Savage | Brock Osweiler |
| 30 | Brock Osweiler | Tom Savage | Jared Goff | Mike Glennon | Brock Osweiler | Mike Glennon |
| 31 | Jared Goff | Jared Goff | Tom Savage | Jared Goff | Mike Glennon | Jared Goff |
| 32 | Mike Glennon | Mike Glennon | Mike Glennon | Tom Savage | Jared Goff | Tom Savage |

Conclusions


So many models to choose from but no way to choose!

Ignoring Model F this week, all of these would have been roughly as accurate as one another in 2016. Technically, Models B and E were marginally more accurate, so I'll likely start with one of those two if I haven't drafted a good QB.

[Crap. I took Stafford. Man, I hope it’s Model A. Come on Model A!]

Tom Brady and Aaron Rodgers are locks for the week. If you have one of them you're good to go.

Derek Carr, Dak Prescott, Kirk Cousins, and Matt Ryan all consistently appear toward the top of the models, especially B and E, and are good plays this week. Dak is going really late in drafts and may even be available off the waiver wire this week in smaller leagues.

Your opinion on Ben Roethlisberger depends heavily on whether you’re terrified of an away game.

Beyond that initial list, the rest of the top 15 are all a little jumbled up and I wouldn’t really know how to pick one over another without resorting to outside info or, dare I say it, feelings. Mariota, Newton, Wilson, Stafford. All seem good, but until I start seeing results from these models I don’t know how to pick. Maybe I can use QB reliability. Didn’t someone just write something about that? (Me. I did. https://www.thedatastream.org/data-exploration/2017/9/1/qb-reliability-aka-a-note-to-marcus-mariota-and-matt-stafford-owners and it says… crap. It says don’t trust Matt Stafford this week. Man, I really hope I’m a bad analyst.)

If I haven’t drafted one of those top 5 or 6 QBs I’ll likely hem and haw and eventually pick whichever QB is higher in Model E. With no way to choose yet, it’s anyone’s game.

Meta-conclusions


This is too many models. It's tough to decide what to do when staring at this much data, and I don't think just aggregating the models is the way to go. Here's my plan for the year, subject to change:

  1. Run these models for 4 weeks with no big changes
  2. Compare them to each other and to Yahoo’s experts (or some other experts) and see how they stack up
  3. Eventually make new models if none of these distinguishes itself. I'll likely start by trying a model that uses only data from this season and/or the 2016-2017 seasons
  4. About halfway through the season I might start throwing out models if they consistently perform poorly. Or at least I'll stop showing them all to you, dear reader
  5. Win my league
  6. Be an unbearably poor winner for the next year
