Nifty paper on Batting Average…

For some reason, I never do much reading about baseball during the season itself. But as the World Series approaches its end (still hoping for a game seven), I have started to dust off some of my reading materials. A couple of years ago, I mentioned this work by Lawrence Brown on this blog, when the paper he was writing was still a work in progress. It's available now:

IN-SEASON PREDICTION OF BATTING AVERAGES: A FIELD TEST OF EMPIRICAL BAYES AND BAYES METHODOLOGIES
By Lawrence D. Brown, University of Pennsylvania

Batting average is one of the principal performance measures for an individual baseball player. It is natural to statistically model this as a binomial-variable proportion, with a given (observed) number of qualifying attempts (called “at-bats”), an observed number of successes (“hits”) distributed according to the binomial distribution, and with a true (but unknown) value of p_i that represents the player’s latent ability. This is a common data structure in many statistical applications; and so the methodological study here has implications for such a range of applications.

There is a lot of meat to chew on in it, but one easy takeaway is that a player's batting average in the first half of the season is a relatively poor predictor of their performance in the second half. A better predictor is simply to average that first-half figure with the mean batting average of all players; in other words, the best guess is that the player will regress towards the mean. I’ve been aware of this principle for quite some time, but Brown goes on to derive some more interesting statistical techniques, well outside my normal comfort zone. If math/statistics is your thing, check it out.
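
For the curious, here is a minimal Python sketch of that regress-towards-the-mean idea. It is only the naive version described above (an equal-weight average of each player's first-half figure with the group mean), not one of the estimators Brown derives in the paper. The function name `shrink_toward_mean`, the `weight` parameter, and the hit and at-bat numbers are all made up for illustration.

```python
def shrink_toward_mean(hits, at_bats, weight=0.5):
    """Blend each player's observed average with the mean of all players.

    weight=0.5 is the plain "average the two" rule from the post; it is an
    assumption here, not a value taken from Brown's paper.
    """
    # Observed first-half batting average for each player.
    averages = [h / ab for h, ab in zip(hits, at_bats)]
    # Mean batting average across all players in the group.
    group_mean = sum(averages) / len(averages)
    # Pull every player's figure part of the way towards the group mean.
    return [weight * avg + (1 - weight) * group_mean for avg in averages]


# Hypothetical first-half numbers for three players.
first_half_hits = [30, 20, 10]
first_half_at_bats = [100, 100, 100]

for observed, predicted in zip(
    (h / ab for h, ab in zip(first_half_hits, first_half_at_bats)),
    shrink_toward_mean(first_half_hits, first_half_at_bats),
):
    print(f"observed {observed:.3f} -> predicted {predicted:.3f}")
```

The hot hitter's prediction is pulled down and the cold hitter's is pulled up, while a player sitting right at the group mean stays put. A fixed 50/50 weight is the crudest possible choice; it ignores how many at-bats each player has had, which is part of what the more careful methods in the paper address.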