Some of you may have heard about Bayesian statistics. I’ve spent the last few months learning about it and how to implement it in R. This will be a relatively short article, but I assure you that it is the end result of a lot of work on my part. Don’t worry, it paid off.

The big idea behind Bayesian modeling is that you use prior assumptions combined with your existing data to help you model the most likely future outcome, and that you (as the analyst) put all of your assumptions about the data up front. This is in comparison to frequentist modeling (a more classical approach) where you only use the existing data to model the future, and some of the assumptions are a little more hidden.

There are a ton of debates over which methodology to use and which is more valid. Some of you may have seen an XKCD comic about the difference between them, and there is no shortage of analogies that mock one method or the other. I won’t really get into them because they’re a little silly. Philosophically, both systems make sense; they just look at the data differently and come up with different ways to describe the future. Frequentist modeling is much more about defining how confident you are that the numbers you have are correct, saying things like “I’m 95% confident that CLE defense will score between 6.5 and 8.5 points this week”. Bayesian modeling is more about getting a probability distribution so you can make nice statements like “There’s a 15% chance that CLE defense will do better than NE defense this week”.

Both are valid systems, but computationally Bayesian is much tougher to do. I use R for my calculations, and I can create a nice frequentist model using simple functions in minutes. For the Bayesian analysis, building complex models is a bit of a pain, and running them takes much longer. The benefit of Bayesian modeling is that I can make much more complex models and that it puts all of my assumptions up front. Here I use the rjags package to make a model; it is the R interface to JAGS, which stands for Just Another Gibbs Sampler. I’m learning how to use this from a couple of Coursera courses, and I’m still in the process of learning, so please bear with me.
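Going back to the two kinds of statement for a second, here is a small R sketch of how each one comes out of the numbers. Everything in it is made up for illustration: the weekly scores, the team means, and the "posterior draws", which are simulated stand-ins for what a sampler like JAGS would return. None of it is output from my actual models.

```r
set.seed(42)

# Hypothetical past weekly fantasy scores for one defense (made-up data).
cle_scores <- rnorm(16, mean = 7.5, sd = 2)

# Frequentist-style summary: a 95% confidence interval for the mean score.
ci <- t.test(cle_scores, conf.level = 0.95)$conf.int

# Bayesian-style summary: given draws from each team's predictive
# distribution, compare the two teams directly. These draws are simulated
# stand-ins for sampler output, not real posterior samples.
n_draws   <- 10000
cle_draws <- rnorm(n_draws, mean = 6, sd = 3)
ne_draws  <- rnorm(n_draws, mean = 9, sd = 3)

# The share of draws in which CLE outscores NE estimates the probability.
p_cle_beats_ne <- mean(cle_draws > ne_draws)

print(ci)
print(p_cle_beats_ne)
```

The interval describes how well one number is pinned down. The draws let you ask any question you like about the distribution, including head-to-head comparisons.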

Last year, my 3 best DEF models were all frequentist models. They were:

- Model A: Used terms for which Team was playing, their Opponent, the projected score, the projected opponent score, and whether the team was playing at home or away. I assumed that defenses changed significantly between years and therefore used only data from 2016.
- Model B: Only Team, Opponent, and Opponent score terms (the three most significant terms) and again 2016 data only.
- Model F: My TruScor model where I said that TDs were unpredictable and I tried to get around that. It uses the same terms as model A but now uses data from 2015 and 2016. For more about my patent-pending TruScor system, go read about it in the Methodology section.
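For a sense of what Models A and B look like in R, here is a rough sketch. It assumes a plain linear model fitted with `lm()` and reads "Opponent score" as the projected opponent score; the data frame, its column names, and the response values are all synthetic placeholders rather than my real data or exact model code.

```r
set.seed(1)

# Synthetic stand-in for one season of defense results: 32 teams x 16 games.
teams <- sprintf("T%02d", 1:32)
n <- 32 * 16
def <- data.frame(
  team           = factor(rep(teams, each = 16), levels = teams),
  opponent       = factor(sample(teams, n, replace = TRUE), levels = teams),
  proj_score     = rnorm(n, mean = 22, sd = 4),  # projected points for the team
  proj_opp_score = rnorm(n, mean = 22, sd = 4),  # projected points for the opponent
  home           = factor(sample(c("home", "away"), n, replace = TRUE))
)

# Made-up response: defenses score more when the opponent is projected low.
def$def_points <- 15 - 0.35 * def$proj_opp_score + rnorm(n, sd = 4)

# Model A: all five terms.
model_a <- lm(def_points ~ team + opponent + proj_score + proj_opp_score + home,
              data = def)

# Model B: only team, opponent, and projected opponent score.
model_b <- lm(def_points ~ team + opponent + proj_opp_score, data = def)

# Predict one hypothetical matchup with Model B.
new_game <- data.frame(
  team           = factor("T01", levels = teams),
  opponent       = factor("T02", levels = teams),
  proj_opp_score = 17
)
pred_b <- predict(model_b, newdata = new_game)
print(pred_b)
```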

To see how I did in 2016, I simulated the accuracy scores for each of these models for weeks 1 through 17 of last year. If we compare them to the accuracy scores of the predictions from Yahoo’s pros, we get the following:

Remembering that lower accuracy scores mean better models, it looks like Models A and F are *maybe* a little better, but not by a ton. If I subtract the weekly variations and squint really hard, I think those two are better than Yahoo or Model B, but it’s close.

Let’s run my new, sexy, Bayesian model and see what we get:

Wow! That’s an improvement!

As before, I simulated what would have happened had I used the model through all 17 weeks of the 2016 season. Model JAGS-F uses the same terms as Model F, but with the Gibbs sampler (Bayesian model). Let’s take a quick look at what happens every week:
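JAGS builds and runs the sampler for you, but the idea behind a Gibbs sampler fits in a few lines of base R. The sketch below is hand-rolled and covers a far simpler model than Model JAGS-F: one normal distribution with unknown mean and precision. The data, the priors, and every variable name are assumptions chosen for illustration.

```r
set.seed(2016)

# Made-up weekly defense scores; not real 2016 data.
y <- rnorm(200, mean = 7, sd = 3)
n <- length(y)

# Assumed priors: mu ~ Normal(m0, variance 1/p0), tau ~ Gamma(a0, rate b0).
m0 <- 5;    p0 <- 0.01
a0 <- 0.01; b0 <- 0.01

n_iter <- 5000
mu  <- numeric(n_iter)
tau <- numeric(n_iter)
mu_cur <- 0; tau_cur <- 1   # arbitrary starting values

for (i in seq_len(n_iter)) {
  # Draw mu from its full conditional given the current tau.
  post_prec <- p0 + n * tau_cur
  post_mean <- (p0 * m0 + tau_cur * sum(y)) / post_prec
  mu_cur <- rnorm(1, mean = post_mean, sd = sqrt(1 / post_prec))

  # Draw tau (the precision) from its full conditional given the new mu.
  tau_cur <- rgamma(1, shape = a0 + n / 2,
                    rate = b0 + 0.5 * sum((y - mu_cur)^2))

  mu[i]  <- mu_cur
  tau[i] <- tau_cur
}

# Discard the first 1000 draws as burn-in, then summarise the rest.
keep    <- 1001:n_iter
post_mu <- mean(mu[keep])
post_sd <- mean(1 / sqrt(tau[keep]))
print(c(mu = post_mu, sd = post_sd))
```

Each pass updates one parameter at a time, conditional on the latest value of the other. The collected draws approximate the joint posterior, which is what makes the "15% chance" style of statement possible.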

Generally speaking, the models move up and down together by week: some weeks they all score high and others they all score lower. This is because of touchdowns; there are teams that were expected to do poorly who scored 1 or 2 TDs and propelled themselves to the top, messing up everyone’s predictions for the week. Still, if we look at the overall trend, we can see that Model JAGS-F is the best almost every week. Those are amazing results. If we run a two-way ANOVA (with model and week as the two factors) to see whether the choice of model makes a significant difference, we get a resounding “yes”:

    ## Analysis of Variance Table
    ##
    ## Response: accuracy
    ##           Df Sum Sq Mean Sq F value    Pr(>F)
    ## model      4   8059  2014.7  12.346 1.794e-07 ***
    ## week      16  52718  3294.9  20.190 < 2.2e-16 ***
    ## Residuals 63  10281   163.2
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Those three asterisks next to “model” mean that at least one of the models is significantly different from the others. If you remove Model JAGS-F, those little asterisks go away. Model JAGS-F is far and away the winner.
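If you want to run this kind of test yourself, here is a minimal sketch of a two-way ANOVA in R. The accuracy scores are synthetic, with one model deliberately given an advantage, so the numbers will not match my table. The data frame layout and effect sizes are assumptions for illustration.

```r
set.seed(7)

# Synthetic accuracy scores: 5 models x 17 weeks (made-up numbers).
models <- c("Yahoo", "A", "B", "F", "JAGS-F")
acc <- expand.grid(model = models, week = factor(1:17))
week_effect  <- rnorm(17, sd = 25)[as.integer(acc$week)]  # shared weekly swings
model_effect <- ifelse(acc$model == "JAGS-F", -25, 0)     # built-in advantage
acc$accuracy <- 150 + week_effect + model_effect + rnorm(nrow(acc), sd = 12)

# Two-way ANOVA without interaction: model and week as factors.
fit_all <- lm(accuracy ~ model + week, data = acc)
tab_all <- anova(fit_all)
print(tab_all)

# Same test after dropping the model with the built-in advantage.
fit_rest <- lm(accuracy ~ model + week,
               data = subset(acc, model != "JAGS-F"))
tab_rest <- anova(fit_rest)
print(tab_rest)
```

Including week as a factor soaks up the swings that hit every model at once, so the model term is judged against what is left over.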

## Conclusions

Holy crap, Bayesian statistics are some seriously dark arts. When I started learning Bayesian modeling I was a little hesitant and I got bogged down in the philosophical debates. I was worried that I would always reach conclusion X because my prior assumption was X. That’s not how a good Bayesian model works, though. You start with assumption X, but you can include how confident you are in that assumption. For example, in this model most of my assumed values were close to their final results, but my initial confidence is equivalent to about 50 data points. Over 2 years I have 1024 data points (32 teams each playing 16 games for 2 years… yeah, that checks out). My data would overpower my initial assumptions if they were incorrect. You can still run into problems if you have high confidence in a bad prior, but as I showed above, this gave me a lot of improvement in my model.
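To put rough numbers on "my data would overpower my initial assumptions", here is a back-of-the-envelope sketch. It assumes a simple conjugate normal model in which the posterior mean is a weighted average of the prior mean and the data mean. That is a simplification, not necessarily how my JAGS model is parameterised, and the two mean values are hypothetical.

```r
# A prior worth about 50 observations versus the real data.
n_prior <- 50
n_data  <- 32 * 16 * 2   # 32 teams x 16 games x 2 seasons

# Share of the posterior mean that comes from the prior.
prior_weight <- n_prior / (n_prior + n_data)

# Hypothetical values: a prior guess and the average seen in the data.
prior_mean <- 5
data_mean  <- 8

# Posterior mean as a weighted average of the two.
post_mean <- (n_prior * prior_mean + n_data * data_mean) / (n_prior + n_data)

print(round(prior_weight, 3))
print(round(post_mean, 2))
```

Under those assumptions the prior carries less than 5% of the weight, so even a prior guess that is off by 3 points moves the final estimate only slightly.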

Next year, this is the model to watch and my fantasy defenses will reign supreme!

## Update on other models

Kickers are still no better than random, though I haven’t tried a JAGS model yet. QBs are better than random, but worse than Yahoo. Again, I haven’t tried JAGS yet. I’ll get there before the season starts, though, and I’ll let you know.