October 14, 2009 at 12:00 pm by Capitol Avenue Club under Research Studies, Statistical Analysis
Updated 10/15/2009: See Bottom
In an effort to deviate from prospects (I’ve been writing about them constantly for a while now), I’ve written an article on “Beane Count”. Enjoy.
Sabermetricians are constantly talking about walks and homers. They are, perhaps, the two most important secondary offensive skills. We frequently judge players based on their slash statistics–their batting average, their on-base percentage, and their slugging percentage. In order to produce secondary offense (offense independent of batting average), hitting homers and drawing walks are hugely important. In addition to boosting one’s secondary offensive production, walks and homers are also relatively free of statistical noise. That is, there aren’t a lot of outside influences that distort the meaning of the data. When a player draws a walk, he’s drawn a walk. There’s no ball put in play and it’s simply a function of three things: plate discipline, pitchers’ control, and umpires. While a batter doesn’t have any control over a pitcher’s ability to throw strikes or an umpire’s ability to call balls balls and strikes strikes, these two things mostly even out over the course of a season, leaving only the first thing–plate discipline. The same can be said about home runs. There’s relatively little luck involved with the ability to draw a walk or hit a homer. The opposite is the case with batting average, a metric that is much more luck-dependent.
Eight or nine years ago, I came up with a silly little thing called Beane Count, which was a way of looking at how teams fared in a couple of sabermetrics-friendly measures: home runs and walks. How many you get, and how many you give up.
The metric is calculated on a ranking scale. If you rank 16th in the league in home runs hit, you get 16 points. The same for home runs allowed, walks drawn, and walks allowed. It’s scored by the golf method–lower is better. Here’s what the final 2009 Beane Count standings look like:
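The ranking arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the four teams and their totals are invented (they are not the 2009 standings), the function names are my own, and ties are broken arbitrarily by sort order.

```python
# Hypothetical season totals for a made-up four-team league.
# These numbers are invented for illustration; they are NOT real 2009 data.
teams = {
    "A": {"hr": 200, "hr_allowed": 150, "bb": 600, "bb_allowed": 450},
    "B": {"hr": 180, "hr_allowed": 170, "bb": 550, "bb_allowed": 500},
    "C": {"hr": 150, "hr_allowed": 160, "bb": 620, "bb_allowed": 480},
    "D": {"hr": 120, "hr_allowed": 190, "bb": 480, "bb_allowed": 560},
}


def rank(teams, stat, higher_is_better):
    """Return {team: rank} for one stat, where rank 1 is the best team."""
    ordered = sorted(teams, key=lambda t: teams[t][stat], reverse=higher_is_better)
    return {team: position + 1 for position, team in enumerate(ordered)}


def beane_count(teams):
    """Sum each team's league rank in the four categories (lower is better)."""
    rankings = [
        rank(teams, "hr", higher_is_better=True),           # home runs hit
        rank(teams, "hr_allowed", higher_is_better=False),  # home runs allowed
        rank(teams, "bb", higher_is_better=True),           # walks drawn
        rank(teams, "bb_allowed", higher_is_better=False),  # walks allowed
    ]
    return {team: sum(r[team] for r in rankings) for team in teams}


counts = beane_count(teams)
# Print the standings golf-style, best (lowest) score first.
for team, score in sorted(counts.items(), key=lambda item: item[1]):
    print(team, score)
```

In a real 16-team league the best possible score is 4 (first in everything) and the worst is 64.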
I love the simplicity and good intentions of the metric, but how well does it predict a team’s ability to win baseball games?
We’ll take a look at how well Beane Count did in 2009.
For the purposes of this study, I used two sets of data. First, I compared each team’s Beane Count to the complement of its winning percentage (1-W%). I used 1-W% rather than the actual winning percentage to generate a positive correlation rather than a negative one of the same magnitude: Beane Count works like golf (lower is better) and winning percentage works the other way, so 1-W% also works like golf. I ran three comparisons: one using only the AL’s numbers, one using only the NL’s numbers, and one using all 30 teams’ numbers. The following charts should be self-explanatory.
- Some things of note: the coefficient of determination (R squared) for the NL study was 0.373. It was 0.497 for the AL and 0.418 overall. This number attempts to tell us what percentage of the variation in 1-W% (and therefore in winning percentage) can be explained by variation in Beane Count. Of course, since Beane Count influences winning percentage (i.e., the variables are not completely independent), the data isn’t perfect. And we aren’t working with an extremely robust sample size here. But still, I’m shocked at how well the two correlate.
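For readers who want to see how an R squared like this is computed, here is a minimal Python sketch. The Beane Counts and winning percentages below are invented for illustration (they are not the 2009 figures), and `pearson_r` is a helper written for this example. It also shows why using 1-W% only flips the sign of the correlation and leaves R squared unchanged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented values for four hypothetical teams (NOT real 2009 data).
beane_counts = [5, 8, 11, 16]
win_pcts = [0.600, 0.540, 0.480, 0.380]

# 1-W% works like golf, the same direction as Beane Count.
loss_pcts = [1 - w for w in win_pcts]

r_loss = pearson_r(beane_counts, loss_pcts)  # positive correlation
r_win = pearson_r(beane_counts, win_pcts)    # same magnitude, negative sign

# For a one-variable linear fit, R squared is the square of Pearson's r.
r_squared = r_loss ** 2

print(f"r vs 1-W%: {r_loss:.3f}")
print(f"r vs W%:   {r_win:.3f}")
print(f"R squared: {r_squared:.3f}")
```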
I previously mentioned that I used two sets of data. The second set was not the teams’ actual winning percentage, but their 3rd order winning percentage. This gives you a better snapshot of how good each team is. It attempts to answer the question: what would this team’s winning percentage be in a luck-neutral environment? The three graphs follow:
- Some more things of note: the coefficient of determination for this data set is even higher: 0.539 for the AL, 0.445 for the NL, and 0.460 overall. Again, the sample size is small, there is some statistical noise in the data, and I didn’t perform any sort of test of significance. Still, this is a fairly shocking result. 0.460 isn’t a particularly strong coefficient of determination, but we’re not shooting for 1.0 here. If it were really 1.0, Adam Dunn would be paid $40 million a year*.
*An exaggeration, of course. But maybe not.
Interpreted literally, this would mean that 46% of a team’s ability to win baseball games is derived only from their ability to draw walks, hit homers, and keep opponents from doing the same. We’re not including the ability to hit singles, doubles, triples, steal bases, run the bases well in general, field the ball, for pitchers to get ground balls, for pitchers to strike batters out, etc. We’re talking about two things here, walks and homers.
I’m sure there are other metrics that correlate just as well, if not better, with winning percentage. The thing is, I don’t think you’ll find another metric with as little statistical noise as walks and homers that correlates as well with winning percentage as Beane Count does.
In addition to being a “silly little thing” (as Rob calls it), it’s actually a pretty important thing to maximize (or minimize, golf scoring, remember) if you’re concerned with winning baseball games.
Final Thought: I don’t think I’ve discovered anything new here. But quantifying it makes it that much more real, I guess.
Final Thought #2: The eight playoff teams were all among the top 12 teams in Beane Count. The four in the top 12 that missed the playoffs? The White Sox, Rangers, Rays, and Braves. The latter three teams came very close to making the playoffs and were very good teams. The White Sox probably ran into a lot of bad luck.
The p-values for the study have been requested. P-values are stated as probabilities. In this case, they indicate the probability of seeing a correlation at least as strong as the one in the data purely by random chance, if no underlying correlation existed:
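One assumption-light way to estimate a p-value like this is a permutation test: shuffle one variable many times and count how often the shuffled data correlates at least as strongly as the real data. The Python sketch below is not necessarily the method used for this study; the eight teams' values are invented, and `pearson_r` and `permutation_p_value` are helpers written for this example.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided permutation p-value for the correlation between xs and ys."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (trials + 1)


# Invented values for eight hypothetical teams (NOT real 2009 data).
beane_counts = [20, 28, 35, 41, 50, 58, 66, 74]
loss_pcts = [0.38, 0.42, 0.41, 0.47, 0.50, 0.55, 0.53, 0.60]

p_value = permutation_p_value(beane_counts, loss_pcts)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value says that shuffled, unrelated data rarely produces a correlation this strong. It does not give the probability that the relationship itself is a fluke.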