Data scientist, physicist, and fantasy football champion

Quarterback data exploration

I just can’t stop: I downloaded the QB data from 2016 and I’m digging through it now. The plan is to write a few data exploration posts while I learn it myself. Those of you who want to read them can, and those of you who don’t can just look at the predictions whenever I get to them.

First, a quick note on scoring. Standard scoring doesn’t penalize QBs for getting sacked. In my league (called Illegal Use of Hands), QBs lose 1 point for each sack. The idea is that they could dump it off and not lose yards if they were a little quicker. Sure, it also depends somewhat on the offensive line, but it’s something that the QB has some control over and it screws the team over a little, so it’s worth it to take a point off their score. Most of the time it’s only 1 or 2 sacks, but there are some guys (looking at you, Luck and Rivers) who regularly get hit 4-6 times in a game. In week 8 Carson Palmer got sacked 8 times, and in week 3 Cam Newton got sacked 8 times himself. That’s 2 passing TDs!
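To make the difference between the two systems concrete, here is a minimal R sketch of the sack penalty. The function name `iuoh_score` and the 20-point example game are made up for illustration; the 1 point per sack and the 4 points per passing TD are the values used in this post.

```r
# Sketch only: IUoH score = standard score minus 1 point per sack.
# `iuoh_score` is a hypothetical helper, not the league's actual scoring code.
iuoh_score <- function(fpts_std, sacks, sack_penalty = 1) {
  fpts_std - sack_penalty * sacks
}

# A made-up 20-point standard game with 8 sacks
example_iuoh <- iuoh_score(20, 8)
print(example_iuoh)

# Points lost to 8 sacks, expressed in 4-point passing TDs
tds_lost <- (8 * 1) / 4
print(tds_lost)
```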

I’ll try to be specific when I’m making predictions, but for now I’m going to just explore both scoring systems a little. Just remember that if my weekly prediction for Rivers looks a little low to you then I’m probably using the other system (abbreviated IUoH).

Also note that I’m going to filter out backup QBs as best I can. I can maybe make a list of whoever is starting for the week, but that seems like a bit of a pain in the ass. I can also just try keeping players who have been in for, say, more than 2 games and scored more than 35 points, but I think that’s going to keep in a few guys that I don’t want for the sake of keeping in guys I do. For example, as of week 10 Jimmy Garoppolo and Jay Cutler have both played in 4 games and scored about 38 points. You’ve also got Gabbert in there. I’m going to just try a general list of starters for this article. There are a few guys on bye in week 11 but I’ll include them in here. There are also a few guys who played earlier in the season and may or may not come back. I just want to explore for now and I’ll deal with who’s actually playing on a week-to-week basis.
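The games-and-points cutoff described above can be sketched in base R as follows. The game log is invented toy data, and the column names `player` and `fptsstd` are assumptions (`fptsstd` is borrowed from the test output further down). Player C stands in for a backup who sneaks past the filter with 4 games and 38 points.

```r
# Toy game log; players and scores are made up for illustration
games <- data.frame(
  player  = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  fptsstd = c(20, 18, 22, 30, 10, 9, 10, 9, 10)
)

# Games played and total points per player
games_played <- tapply(games$fptsstd, games$player, length)
total_pts    <- tapply(games$fptsstd, games$player, sum)

# Keep players with more than 2 games AND more than 35 total points
keep     <- names(games_played)[games_played > 2 & total_pts > 35]
starters <- games[games$player %in% keep, ]

print(keep)
```

B is dropped for having only 2 games, but C survives on volume alone, which is the weakness of this rule and the reason a hand-picked list of starters is used here instead.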

Average score (standard scoring)

No real surprises here. The players with the highest median scores (the horizontal bar in the middle of the box) are Aaron Rodgers, Drew Brees, Matt Ryan, and Tom Brady. Kaepernick has been one of the more consistent QBs since his return, scoring between 19 and 25 points. He actually has the highest floor this season. My boy Mariota has also been solid, but I’m a little surprised that he doesn’t look better.

Let’s take a quick look at sacks while I’m here


Palmer, Luck, Cutler, and Tannehill are getting their asses kicked! In fact, Luck has been sacked at least twice in every game. In contrast, Brees has never been sacked more than twice in a game.


Surprisingly, the median number of interceptions for everyone (but McCown) is 1 or fewer. That even goes for Fitzpatrick, who notoriously gives it away. He’s had a few bad games, but in half or more he hasn’t thrown more than 1 interception. That’s not bad.

Also, Andrew Luck and Andy Dalton are profiles of courage under pressure.

QB fantasy points by days of the week

Remember last week when I split up DEF scores by day of the week (Mon/Thu/Sun) and whether teams were home or away? Of course you do. It was a beautifully written, compelling story backed up with hard statistical analysis. Anyway, let’s do that again with QBs and see if anything interesting shakes out.

The means (Mon = 14.11, Thu = 16.75, Sun = 17.42) make it look like QBs score a little less on Mondays, but if you look at the distributions it doesn’t look like there’s much difference. Let’s see if our old friend ANOVA can confirm that:

## Analysis of Variance Table
##
## Response: fptsstd
##            Df  Sum Sq Mean Sq F value Pr(>F)
## day         2   207.2 103.622  2.0112 0.1356
## Residuals 300 15456.4  51.521

That p value (p = 0.1356) is borderline significant, or at least as much as we’ve seen in any of these data sets. I think I’ll use day of the week in my models. What about playing at home?
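For anyone who wants to run the same kind of day-of-the-week test, a one-way ANOVA can be sketched in base R like this. The data frame is random toy data, not the 2016 numbers; only the column names `day` and `fptsstd` are taken from the table above, and the group sizes are an assumption.

```r
set.seed(1)

# Synthetic stand-in for the real data: random scores, arbitrary group sizes
toy <- data.frame(
  day     = factor(rep(c("Mon", "Thu", "Sun"), times = c(12, 12, 96))),
  fptsstd = rnorm(120, mean = 17, sd = 7)
)

# One-way ANOVA: does mean fantasy score differ by day of the week?
fit <- lm(fptsstd ~ day, data = toy)
tab <- anova(fit)
print(tab)

# Pull the p value for the day factor out of the table
p_day <- tab["day", "Pr(>F)"]
```

With three days there are 2 degrees of freedom for `day`, which matches the `Df` column in the table above.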

QB fantasy points by home vs away

The means (Home = 17.81, Away = 16.54) are a little different but the distributions are close. Let’s see if our old friend the t-test can say whether they’re different:

## Welch Two Sample t-test
## data:  filter(data_QB_starters, HomeAway == "Home")$fptsstd and filter(data_QB_starters, HomeAway == "Away")$fptsstd
## t = 1.5447, df = 300.15, p-value = 0.1235
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## sample estimates:
## mean of x mean of y 

Again, that’s borderline significant (p = 0.12), and again I’ll keep this in my models.
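The home vs away comparison above is a Welch two-sample t-test, which is what `t.test` does by default in R. Here is a minimal sketch on random toy data; the column names `HomeAway` and `fptsstd` come from the output above, everything else (sample sizes, means, spread) is made up.

```r
set.seed(2)

# Synthetic stand-in for the starters data; values are random, not real scores
toy <- data.frame(
  HomeAway = rep(c("Home", "Away"), each = 50),
  fptsstd  = rnorm(100, mean = 17, sd = 7)
)

home <- toy$fptsstd[toy$HomeAway == "Home"]
away <- toy$fptsstd[toy$HomeAway == "Away"]

# var.equal defaults to FALSE, so this is the Welch version of the test
res <- t.test(home, away)
print(res)
```

Passing the two vectors explicitly keeps Home as `x` and Away as `y`, the same order as in the output above.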

QB fantasy points by home/away and day of the week

t-test for Thursday:

## Welch Two Sample t-test
## data:  ThuHome and ThuAway
## t = 0.66792, df = 15.24, p-value = 0.5142
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.62156 10.76289
## sample estimates:
## mean of x mean of y 

t-test for Monday:

## Welch Two Sample t-test
## data:  MonHome and MonAway
## t = 1.3449, df = 16.56, p-value = 0.1968
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.433332 10.942827
## sample estimates:
## mean of x mean of y 

Aaaaaaand…. those big differences on Monday and Thursday aren’t statistically significant. Maybe when I pick up 2015 data it’ll work, but it’ll probably still be borderline significant at best. I’ll try models with the interaction term, but it’s only because I’m running out of things that can be significant and dammit, that looks big, doesn’t it?
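One way to check whether the interaction term earns its keep is to fit the model with and without it and compare the two. This is only a sketch on random toy data, with balanced groups that the real schedule won’t have; `day`, `HomeAway`, and `fptsstd` are the column names used above, and the rest is assumed.

```r
set.seed(3)

# Balanced synthetic data: 20 made-up games in each day x home/away cell
toy <- expand.grid(
  day      = c("Mon", "Thu", "Sun"),
  HomeAway = c("Home", "Away"),
  rep      = 1:20
)
toy$fptsstd <- rnorm(nrow(toy), mean = 17, sd = 7)

# Main effects only vs main effects plus the day:HomeAway interaction
fit_add <- lm(fptsstd ~ day + HomeAway, data = toy)
fit_int <- lm(fptsstd ~ day * HomeAway, data = toy)

# F test for whether the interaction improves the fit
cmp <- anova(fit_add, fit_int)
print(cmp)
```

The interaction costs 2 extra degrees of freedom here, so with only a handful of Monday and Thursday games it takes a large effect to show up as significant.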

QB score vs Team score

QBs score more fantasy points when their teams score more points. Stop the presses!

QB score vs opponent score

QBs score slightly more fantasy points when their opponents score more points. This is interesting. Maybe they have to throw more to catch up? It’s a weak relationship to be sure, but it’s there.

QB score by opponent

It’s tough to be a QB in Denver. Or going against the Giants. Atlanta is a much friendlier place.

QB passing yards by player

Drew Brees and Tom Brady pass a lot. Interestingly, Kaepernick, Tannehill, Fitzpatrick, Osweiler, Wentz, and Smith all have about the same size box (the 25th to 75th percentiles, i.e., a QB landed in that box 50% of the time). This is a factor that I’d like to add to the model, but I need a way to predict the number of passing yards each week.

QB passing TDs by player

Roethlisberger is inconsistent. Mariota and Brady throw a good number of TDs with consistency.

QB Rushing yards by player

Kaepernick, Taylor, and Newton all commonly run for around 1 passing TD’s worth of points (4). I was a little surprised to see Bortles and Rodgers up there, too, with medians of 25 yards. That’s not a ton, obviously, but it’s something.


I have a ton of data, but I don’t know how to model this. It’s very easy to model Fantasy Points vs:

  • Player
  • Expected score
  • Expected opponent score
  • What opponent you’re playing this week
  • Day of the week
  • Home/Away

However, I’d like to try and include a few other things:

  • Expected passing yards (by player, i.e., how many yards he throws)
  • Expected passing yards (by opponent, i.e., how many yards they allow)
  • Expected rushing yards (by player)

I think that yards should be a little more predictable than just points, but how do I get the expected values for a week? I can get everything on the first list independently, but expected yards I’d have to get by fitting those same first-list factors. That leaves me with factors that are heavily confounded, and therefore sloppy models, if I include all of it.
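Here is a toy illustration of that confounding problem, pushed to the extreme. If expected yards are just the fitted values from a model on the same factors, they are an exact linear combination of those factors, and R can’t estimate a separate coefficient for them. The data, the column names `pass_yds` and `exp_yds`, and the 0.04 points-per-yard relationship are all assumptions for the sketch.

```r
set.seed(4)

# Made-up data: 3 hypothetical QBs, home and away, 10 games per cell
toy <- expand.grid(
  player   = c("QB1", "QB2", "QB3"),
  HomeAway = c("Home", "Away"),
  rep      = 1:10
)
toy$pass_yds <- rnorm(nrow(toy), mean = 250, sd = 60)
toy$fptsstd  <- 0.04 * toy$pass_yds + rnorm(nrow(toy), mean = 5, sd = 4)

# Stage 1: "expected" yards from the easy factors
stage1      <- lm(pass_yds ~ player + HomeAway, data = toy)
toy$exp_yds <- fitted(stage1)

# Stage 2: points from the same factors plus expected yards
stage2 <- lm(fptsstd ~ player + HomeAway + exp_yds, data = toy)

# exp_yds carries no information beyond player and HomeAway, so lm()
# reports its coefficient as NA (aliased)
print(coef(stage2))
```

Expected yards built from earlier weeks wouldn’t be perfectly redundant like this, but they would still lean heavily on the same factors, which is the worry.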

I don’t think I’ll include TDs since that feels a little more inconsistent week to week. Rushing yards… yeah, I pretty much have to, but I’m not sure if it has any dependence on the opponent defense. It’s definitely dependent on the QB, and there are a few guys up there where it makes a big difference from week to week.
