A Guide to the Perplexing Posterior

For data scientists coming from an educational background dominated by frequentist statistics, it can be daunting to start using Bayesian methods. Learning about priors, posteriors, Markov chain Monte Carlo (MCMC), and so on can be overwhelming, particularly after filling one’s head with p-values and t-statistics. Fortunately, two books (one recently released, the other in its second edition) make the task much easier and more enjoyable, especially for those versed in the programming language R. Here are my reviews, in the order I think the books should be read.

Introduction to Empirical Bayes: Examples from Baseball Statistics

David Robinson is a data scientist whose prodigious contributions to open-source software are matched only by his ability to explain complex topics. This book started as a popular answer on Stack Exchange, then grew into a new R package and what I consider the best introduction to Bayesian data analysis.

What makes this book so satisfying is that it follows a single thread, from credible intervals around simple estimates of batting averages to hierarchical models with several variables. It reads like the lab notes of someone working through a problem, and so conveys the process of empirical Bayes rather than just the product.
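To give a flavor of the empirical Bayes idea the book builds on, here is a minimal sketch in Python (the book itself works in R). It assumes a beta-binomial model: estimate a Beta prior from the batting averages of established players, then combine that prior with each player's own hits and at-bats. The prior here is fit by the method of moments, which may differ from how the book fits it; the function names and player numbers are hypothetical, made up purely for illustration.

```python
from statistics import mean, pvariance

def fit_beta_prior(rates):
    """Method-of-moments estimate of a Beta(alpha, beta) prior from observed rates."""
    m = mean(rates)
    v = pvariance(rates)
    # A Beta(a, b) has mean m = a/(a+b) and variance m(1-m)/(a+b+1),
    # so a + b = m(1-m)/v - 1 (this requires v < m(1-m)).
    total = m * (1 - m) / v - 1
    return m * total, (1 - m) * total

def posterior(hits, at_bats, alpha, beta):
    """Beta posterior parameters for one player's true batting average."""
    return alpha + hits, beta + (at_bats - hits)

def posterior_mean(hits, at_bats, alpha, beta):
    """Shrunken (empirical Bayes) estimate of a player's batting average."""
    a, b = posterior(hits, at_bats, alpha, beta)
    return a / (a + b)

# Hypothetical season averages for players with many at-bats,
# used only to estimate the prior.
established = [0.25, 0.27, 0.23, 0.26, 0.24, 0.28, 0.22]
alpha0, beta0 = fit_beta_prior(established)

# A hypothetical rookie with 4 hits in 10 at-bats: raw average 0.400.
estimate = posterior_mean(4, 10, alpha0, beta0)
print(f"prior Beta({alpha0:.1f}, {beta0:.1f}); shrunken estimate {estimate:.3f}")
```

The rookie's raw 0.400 is pulled most of the way back toward the prior mean of 0.250, because ten at-bats carry little evidence next to a prior worth roughly 470 pseudo at-bats. The quantiles of each player's posterior Beta distribution (for example via `scipy.stats.beta.ppf`) give the kind of credible intervals the book starts with.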

Ultimately, this strength also reveals the book’s limitation: by focusing on empirical Bayes for (mostly) binary outcomes, Robinson leaves methods like MCMC unexplored. He acknowledges this early on and never claims to be writing a full introduction to Bayesian tools. For that, we turn to another recommended text:

