Data scientist, physicist, and fantasy football champion

Week 14 FLEX predictions - two new models!

None of my previous attempts to model FLEX players (WR/RB/TE) blew me away. They were just about as accurate as FantasyPros ECR rankings. Sometimes, one segment would be slightly better than another (e.g., my RB ranking would beat theirs but not my WR), but intermittent success isn’t really what I’m going for. So I’m pushing those models aside and trying some new ones this week.

This week I’m trying out two models that have a few key differences. They’re both giving me reasonable results that roughly match FantasyPros, but as you’ll see in a sec, one of the models provides some additional info that might be useful.

Model 1: Quantile Regression model

One of the most common modeling techniques is the linear model. It models the mean of your data and makes a few assumptions of normality (Gaussian-shaped data). This generally works well and is often good enough even for non-Gaussian data. At the very least, if you have non-normal data but you don’t know what the actual underlying distribution is, you might be better off just assuming normality and moving on with your day. This is a useful technique, and I’ll even use it again later, but I wanted to get a better idea of the floors and ceilings of certain players, and linear models aren’t great at that. You can get confidence intervals, but that’s not actually what I’m looking for.

One way to explore floors and ceilings is with quantile regression. Instead of modeling the mean, I’m going to model the 10th, 50th, and 90th percentiles (quantiles) for each player. Here I’m calling the 10th percentile the player’s floor (you would expect them to do better than this 90% of the time), the 50th percentile the median, and the 90th percentile the ceiling (they should do worse than this 90% of the time). This isn’t perfect since there’s some weirdness with predicting like this, but I’m just trying to get a handle on floors and ceilings. You’ll have to deal with a little imprecision in language.
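What makes quantile regression target a percentile instead of the mean is its loss function, usually called the pinball (or check) loss. Here is a minimal Python sketch of that idea. It is not the code behind the model in this post: the scores are made up, `pinball_loss` and `best_constant` are names invented for illustration, and it fits a single constant per quantile rather than a full regression with predictors.

```python
def pinball_loss(actual, predicted, q):
    """Average pinball (check) loss for quantile q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(actual, predicted):
        diff = y - p
        # Under-predictions cost q per point, over-predictions cost (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


def best_constant(scores, q):
    """Pick the observed score that minimizes the pinball loss for q."""
    return min(scores, key=lambda c: pinball_loss(scores, [c] * len(scores), q))


# Hypothetical weekly fantasy points for one player (made-up numbers).
scores = [2, 4, 5, 6, 7, 8, 9, 11, 14, 19, 30]

floor = best_constant(scores, 0.10)
median = best_constant(scores, 0.50)
ceiling = best_constant(scores, 0.90)
print(floor, median, ceiling)
```

With q = 0.9, missing low is nine times as costly as missing high, so the best guess gets pushed up toward the big games. That is the "ceiling" in the sense used here; q = 0.1 does the reverse for the floor.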

So what’s the benefit to quantile regression? First, it doesn’t assume any particular distribution, meaning that it doesn’t matter if your data is Gaussian or not. Second, it helps with guys that have long tails. In this model, if you have a player who regularly does alright but occasionally blows up for huge games (looking at you, Amari Cooper), then this model will show you their ceiling. In fact, let’s look at Amari Cooper for just a second. Here are his fantasy points over his career:

.amari cooper points-1.png

And let’s contrast this with someone a bit more consistent. How about Marqise Lee:

.marqise lee points-1.png

Linear models (which use the mean) and the 50th-percentile fit from quantile regression (which uses the median) often give similar results. You can see that here: both players’ mean and median scores aren’t drastically different, and a fit to either type of model will return similar results for the predicted score. The extra information comes from looking at their 10th and 90th percentiles, a.k.a. their floors and ceilings. Amari Cooper has a much higher ceiling and a slightly lower floor than Marqise Lee. I might want to know this for my players, so I’m going to use quantile regression to show me.
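To make the floor/ceiling comparison concrete, here is a small Python sketch that summarizes two game logs with the standard library. The two lists are made-up numbers for a boom-or-bust player and a steady one, NOT the real stats of the players above, and `summarize` is a name invented for this example.

```python
import statistics


def summarize(points):
    """Return mean, median, and 10th/90th percentiles for a list of scores."""
    # n=10 splits the data into deciles: index 0 is the 10th percentile,
    # index 8 is the 90th.
    deciles = statistics.quantiles(points, n=10, method="inclusive")
    return {
        "mean": statistics.mean(points),
        "median": statistics.median(points),
        "floor": deciles[0],
        "ceiling": deciles[8],
    }


# Made-up game logs, not real player data.
boom_or_bust = [3, 4, 5, 6, 7, 8, 9, 10, 12, 26, 31]
steady = [7, 8, 9, 9, 10, 10, 11, 11, 12, 13, 14]

for name, points in [("boom_or_bust", boom_or_bust), ("steady", steady)]:
    print(name, summarize(points))
```

In this toy data the means and medians land within a couple of points of each other, while the 90th percentile is twice as high for the boom-or-bust list and its 10th percentile is lower. That gap is the extra information the floor and ceiling give you.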

I used quantile regression to model 10th, 50th, and 90th percentiles for the WRs, RBs, and TEs that I expect to play this week. I modeled RBs, WRs, and TEs separately, then combined them, which should account for teams that are good against one position but not necessarily another. I also only kept the significant variables for each position. For your (and my) reference, here are the terms that I kept:

WR:

  • the player
  • the predicted score for the team for the week (based on Vegas odds)
  • the predicted score differential (team score - opponent score)
  • the opponent
  • home/away
  • rush attempts expected for the player (average of attempts over last 4 games)
  • receptions expected for the player (average of receptions over last 4 games. I don’t have target data, sorry.)
  • and some cross terms that turned out to be significant (score^2, score differential ^ 2, and score * score differential)

RB:

  • the player
  • the predicted score for the team for the week (based on Vegas odds)
  • the predicted score differential (team score - opponent score)
  • the opponent
  • home/away
  • rush attempts expected for the player (average of attempts over last 4 games)
  • receptions expected for the player (average of receptions over last 4 games. I don’t have target data, sorry.)
  • and some cross terms that turned out to be significant (just score^2 here)

TE:

  • the player
  • the predicted score for the team for the week (based on Vegas odds)
  • the predicted score differential (team score - opponent score)
  • the opponent
  • home/away
  • receptions expected for the player (average of receptions over last 4 games. I don’t have target data, sorry.)
  • and some cross terms that turned out to be significant (just score^2 here)

Additional notes:

  • I’m only using data from the 2016-2017 and 2017-2018 seasons (this season and last).
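As a rough illustration of how the numeric WR terms listed above could be assembled for one player in one week, here is a Python sketch. It is not the code behind the model: `recent_average`, `wr_features`, the feature names, and all of the input numbers are hypothetical. The categorical terms (player and opponent) are left out because they would be dummy-encoded separately.

```python
def recent_average(values, window=4):
    """Average of the last `window` games (fewer if the log is shorter)."""
    recent = values[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def wr_features(team_score, opp_score, is_home, rush_attempts, receptions):
    """Build the numeric terms for one WR-week, including the cross terms."""
    diff = team_score - opp_score  # predicted score differential
    return {
        "score": team_score,
        "score_diff": diff,
        "home": 1 if is_home else 0,
        "rush_att_avg4": recent_average(rush_attempts),
        "rec_avg4": recent_average(receptions),
        # Cross terms listed for WRs: score^2, diff^2, score * diff.
        "score_sq": team_score ** 2,
        "score_diff_sq": diff ** 2,
        "score_x_diff": team_score * diff,
    }


# Hypothetical inputs: implied team totals and per-game logs.
example = wr_features(
    team_score=24.5,
    opp_score=21.0,
    is_home=True,
    rush_attempts=[0, 1, 0, 0, 1],
    receptions=[5, 7, 4, 6, 8],
)
print(example)
```

For RBs and TEs the same shape would apply with only the score^2 cross term, and with the rush-attempt term dropped for TEs.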

Let’s see who this model predicts will do well, and get an idea of their floors and ceilings:

.quantreg model 1 fig 1-1.png

So far, so good. I like most of the guys in this list. There are a few neat things here that make me excited for this model. LeSean McCoy has one of the highest ceilings for an RB, and this model reflects that. Tyreek Hill has one of the highest ceilings and lowest floors, and again, this model recognizes this. Perine’s floor is probably reflecting his time as a backup and not the lead guy. McCaffrey’s floor is right where it should be.

.quantreg model 1 fig 2-1.png

One note on the Danny Woodhead weirdness: I have no idea why it’s doing this, but I would trust the floor/ceiling bar much more than the median dot. It seems to do this once in a while with players for which I have limited data. If you want to play him this week, that’s on you, but don’t play him this week based on my analysis.

The standout ceilings in this second grouping are Kareem Hunt (way down at 68 this week) and Brandin Cooks. I have and am starting Kareem this week because of that ceiling, but FantasyPros ranks him somewhere in the top-12 FLEX, and I have a really hard time believing that.

I’m not sure what to make of Austin Ekeler being up so high, but it’s probably due to a good matchup (WAS) and good usage in the last few weeks. He’s also had 4 games over 10 points this year, so his ceiling is higher than most. I don’t know if I like his median this high, but I’d probably play him this week if I had him.

.quantreg model 1 fig 3-1.png

Some huge ceilings here. This is the thing with using the 10th and 90th percentiles; it really pulls out a high ceiling. But I like it because it shows off the players’ extremes better than 20th/80th. For example, Theo Riddick has a really low floor, and that seems accurate to me.

.quantreg model 1 fig 4-1.png

This last grouping has some guys that I would play, but mostly for their ceilings (hey Amari!). Amari Cooper, Derrick Henry, Matt Breida, and T.Y. Hilton are all players with amazing upside, but half the time or more they’re disappointing. Play them at your own risk.

And that’s it for this model. I think there’s a lot to unpack here, and I’ll probably write up a brief report next week about how it did compared to FantasyPros ECR. While you’re here, want another model? I know I do!

Model 2: Linear model (including quarterback)

This is similar to the last few FLEX models I’ve tried, but I’m adding a term that shifts how I think about how QB play affects players. Instead of thinking of a QB as a QB1 or QB2, I’m just going to see how players play with certain QBs. This won’t be player-dependent, even though there are some WRs that form a rapport with a QB and are buoyed by that. Instead, this will affect all RBs, WRs, or TEs associated with a QB. Basically, all WRs benefit under Tom Brady, but by how much?

I use roughly the same terms for WR, RB, and TE modeling:

  • the player
  • the predicted score for the team for the week (based on Vegas odds)
  • the predicted score differential (team score - opponent score)
  • the opponent
  • home/away
  • rush attempts expected for the player (average of attempts over last 4 games)
  • receptions expected for the player (average of receptions over last 4 games)
  • and some cross terms that turned out to be significant (score * score differential)
  • the QB playing
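One way to picture the QB term: it is a categorical variable, so in a linear model it typically becomes a set of 0/1 (dummy) columns, and every player who shares a QB shares that QB’s coefficient. The Python sketch below shows only the encoding step. The player and QB labels are placeholders, `one_hot` is a helper written for this example, and the actual model may encode things differently.

```python
def one_hot(values):
    """Dummy-encode a list of category labels into 0/1 columns."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical rows: (player, QB). Names are placeholders, not real data.
games = [
    ("WR_A", "QB_X"),
    ("WR_B", "QB_X"),
    ("WR_C", "QB_Y"),
]

qb_levels, qb_columns = one_hot([qb for _, qb in games])
for (player, qb), columns in zip(games, qb_columns):
    print(player, qb, dict(zip(qb_levels, columns)))
```

WR_A and WR_B get identical QB columns here, which is the point: the QB effect is shared across everyone catching passes from that QB rather than fit per player.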

There are very slight differences, but I’m not going into it, because this is getting long. Let me just show you what this output looks like:

.lm model 1 fig 1-1.png
.lm model 1 fig 2-1.png
.lm model 1 fig 3-1.png
.lm model 1 fig 4-1.png

There’s a lot less info here without the floor/ceiling, but I can’t do quantile regression and include the QB term because it throws a singular matrix error. You don’t need to know that, though. What matters is that this is a little less interesting to pick apart. There are a few standouts here that I don’t love, but overall this model seems as valid as any. We’ll see next week after the results are in.
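For anyone who does want to know: a singular matrix error generally means some columns of the model matrix are exact combinations of other columns. Whether that is what happened here is an assumption, not something confirmed above, but it is easy to run into when player and QB terms sit side by side. The Python sketch below uses a made-up design matrix in which one player only ever plays with one QB, so that player’s column and that QB’s column are identical. `matrix_rank` is a helper written for this example.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(m):
            break
        # Use the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new information
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Columns: intercept, player_B, player_C, qb_Y (player_A and qb_X are baselines).
# Hypothetical data: player_C is the only player with qb_Y, in every game.
collinear = [
    [1, 0, 0, 0],  # player_A with qb_X
    [1, 0, 0, 0],
    [1, 1, 0, 0],  # player_B with qb_X
    [1, 1, 0, 0],
    [1, 0, 1, 1],  # player_C with qb_Y
    [1, 0, 1, 1],
]
# Same data, except player_C plays one game with qb_X.
separable = collinear[:-1] + [[1, 0, 1, 0]]

print(matrix_rank(collinear), matrix_rank(separable))
```

The first matrix has four columns but rank 3, so the player and QB effects can’t be told apart. Once player_C logs a game with a different QB, the rank goes to 4 and the two effects become separable.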

Conclusions

So that’s it for this week. Two new models, to be evaluated next week. I really like the quantile regression, but as a technique it’s more limited; there are some model terms that I just can’t include. I plan to use this model this week, actually, if I need to find guys with higher chances for big games.
