Fun With Beane Count
By Capitol Avenue Club | October 14, 2009
Updated 10/15/2009 : See Bottom
In an effort to deviate from prospects (I’ve been doing them constantly for a while now), I’ve written an article on “Beane Count”. Enjoy.
Sabermetricians are constantly talking about walks and homers. They are, perhaps, the two most important secondary offensive skills. We frequently judge players based on their slash statistics: their batting average, their on-base percentage, and their slugging percentage. To produce secondary offense (offense independent of batting average), hitting homers and drawing walks are hugely important. In addition to boosting one’s secondary offensive production, walks and homers are also relatively free of statistical noise. That is, there aren’t a lot of outside influences that distort the meaning of the data. When a player draws a walk, he’s drawn a walk. There’s no ball put in play, and it’s simply a function of three things: plate discipline, pitchers’ control, and umpires. While a batter doesn’t have any control over a pitcher’s ability to throw strikes or an umpire’s ability to call balls balls and strikes strikes, these two things mostly even out over the course of a season, leaving only the remaining factor, plate discipline. Much the same can be said about home runs. There’s relatively little luck involved in the ability to draw a walk or hit a homer. The opposite is the case with batting average, a metric that is much more luck-dependent.
Rob Neyer writes on May 6, 2009:
Eight or nine years ago, I came up with a silly little thing called Beane Count, which was a way of looking at how teams fared in a couple of sabermetrics-friendly measures: home runs and walks. How many you get, and how many you give up.
The metric is calculated on a ranking scale. If you rank 16th in the league in home runs hit, you get 16 points. The same for home runs allowed, walks drawn, and walks allowed. It’s scored by the golf method–lower is better. Here’s what the final 2009 Beane Count standings look like:
| Team | Beane Count |
| American League | |
| Red Sox | 14.0 |
| White Sox | 21.0 |
| Yankees | 21.8 |
| Rays | 22.1 |
| Rangers | 26.0 |
| Angels | 26.7 |
| Twins | 27.0 |
| Blue Jays | 28.8 |
| A’s | 29.7 |
| Tigers | 37.0 |
| Indians | 39.1 |
| Mariners | 39.2 |
| Royals | 42.0 |
| Orioles | 45.2 |
| National League | |
| Rockies | 12.0 |
| Braves | 21.0 |
| Cardinals | 21.0 |
| Phillies | 25.0 |
| Diamondbacks | 27.0 |
| Dodgers | 27.5 |
| Cubs | 30.1 |
| Brewers | 36.0 |
| Marlins | 37.1 |
| Nationals | 39.0 |
| Reds | 41.0 |
| Pirates | 41.0 |
| Padres | 44.0 |
| Giants | 44.5 |
| Astros | 46.0 |
| Mets | 51.0 |
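For anyone who wants to tinker, the rank-and-sum step described above can be sketched in a few lines of Python. This is only a sketch under my own assumptions: the three teams and their totals are made up, the function names are mine, and ties are broken arbitrarily. The real standings include fractional values, which this sketch does not try to reproduce.

```python
from typing import Dict


def rank_positions(values: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Return 1-based ranks for one category; rank 1 is the best team."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}


def beane_count(
    hr_hit: Dict[str, float],
    bb_drawn: Dict[str, float],
    hr_allowed: Dict[str, float],
    bb_allowed: Dict[str, float],
) -> Dict[str, int]:
    """Sum each team's rank across the four categories (lower is better)."""
    ranks = [
        rank_positions(hr_hit, higher_is_better=True),       # more homers hit is good
        rank_positions(bb_drawn, higher_is_better=True),     # more walks drawn is good
        rank_positions(hr_allowed, higher_is_better=False),  # fewer homers allowed is good
        rank_positions(bb_allowed, higher_is_better=False),  # fewer walks allowed is good
    ]
    return {team: sum(category[team] for category in ranks) for team in hr_hit}


# Hypothetical three-team league; these are NOT real 2009 totals.
counts = beane_count(
    hr_hit={"A": 200, "B": 150, "C": 180},
    bb_drawn={"A": 600, "B": 500, "C": 550},
    hr_allowed={"A": 160, "B": 190, "C": 150},
    bb_allowed={"A": 480, "B": 520, "C": 530},
)
print(counts)
```

In this toy league, team A ranks first in three categories and second in the other, so it ends up with the lowest (best) total.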
I love the simplicity and good intentions of the metric, but how well does it predict the ability for a team to win baseball games?
We’ll take a look at how well Beane Count did in 2009.
For the purposes of this study, I used two sets of data. First, I compared the complement of a team’s winning percentage (1-W%) to its Beane Count. I used 1-W% rather than the actual winning percentage to generate a positive correlation rather than a negative one of the same magnitude: Beane Count works like golf (lower is better) and winning percentage works the other way, so 1-W% also works like golf. I ran the comparison three ways: one using only the AL’s numbers, one using only the NL’s numbers, and one using all 30 teams’ numbers. The following charts should be self-explanatory.



- Some things of note: the coefficient of determination (R squared) for the NL study was 0.373. It was 0.479 for the AL and 0.418 overall. This number attempts to tell us what percent of the variation in 1-winning percentage (and therefore in winning percentage) can be explained by variation in Beane Count. Of course, since Beane Count influences winning percentage (i.e., the variables are not completely independent), the data isn’t perfect. And we aren’t working with an extremely robust sample size here. But still, I’m shocked at how well the two correlate.
I previously mentioned I used two sets of data. The second set used not the teams’ actual winning percentage but their third-order winning percentage. This gives you a better snapshot of how good each team is. It attempts to answer the question: what would this team’s winning percentage be in a luck-neutral environment? The three graphs follow:
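As a rough illustration of the calculation, here is a minimal Python sketch of the correlation coefficient and R squared. The five data points are invented for the example (they are not the 2009 numbers), and `pearson_r` is just a name I picked. It also shows why swapping W% for 1-W% flips the sign of the correlation without changing its size.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical teams: Beane Count (lower is better) and winning percentage.
beane_counts = [12.0, 20.0, 28.0, 36.0, 44.0]
win_pcts = [0.590, 0.540, 0.520, 0.470, 0.400]
loss_pcts = [1 - w for w in win_pcts]  # the 1-W% used in the study

r_loss = pearson_r(beane_counts, loss_pcts)  # positive: both work like golf
r_win = pearson_r(beane_counts, win_pcts)    # same magnitude, opposite sign
r_squared = r_loss ** 2                      # coefficient of determination

print(r_loss, r_win, r_squared)
```

Because R squared is the square of R, it comes out the same whether you use W% or 1-W%.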



- Some more things of note: the coefficient of determination for this data set is even stronger: 0.539 for the AL, 0.445 for the NL, and 0.460 overall. Again, the sample size is small, there is some statistical noise in the data, and I didn’t perform any sort of test of significance. Still, this is a fairly shocking result. 0.460 isn’t a particularly strong coefficient of determination, but we’re not shooting for 1.0 here. If it were really 1.0, Adam Dunn would be paid $40 million a year*.
*An exaggeration, of course. But maybe not.
Interpreted literally, this would mean that 46% of a team’s ability to win baseball games is derived only from their ability to draw walks, hit homers, and keep opponents from doing the same. We’re not including the ability to hit singles, doubles, triples, steal bases, run the bases well in general, field the ball, for pitchers to get ground balls, for pitchers to strike batters out, etc. We’re talking about two things here, walks and homers.
I’m sure there are other metrics that correlate just as well, if not better, with winning percentage. The thing is, I don’t think you’ll find another metric with as little statistical noise as walks and homers that correlates as well with winning percentage as Beane Count does.
In addition to being a “silly little thing” (as Rob calls it), it’s actually a pretty important thing to maximize (or minimize, golf scoring, remember) if you’re concerned with winning baseball games.
Final Thought: I don’t think I’ve discovered anything new here. But quantifying it makes it that much more real, I guess.
Final Thought #2: The eight playoff teams were among the top-12 teams in Beane Count. The 4 in the top-12 that missed the playoffs? The White Sox, Rangers, Rays, and Braves. The latter three teams came very close to making the playoffs and were very good teams. The White Sox probably stumbled on a lot of bad luck.
Update
The p-values for the study have been requested. P-values are stated as probabilities. In this case, they indicate the probability of seeing a correlation at least as strong as the one in the data purely by random chance, if no underlying correlation existed:
| Data set | R | Rsquared | P-Value |
| 1-W% Both | 0.647 | 0.418 | 0.000112 |
| 1-W% NL | 0.611 | 0.373 | 0.011924 |
| 1-W% AL | 0.692 | 0.479 | 0.006103 |
| 1-Pct3 Both | 0.678 | 0.460 | 0.000038 |
| 1-Pct3 NL | 0.667 | 0.445 | 0.004767 |
| 1-Pct3 AL | 0.734 | 0.539 | 0.002802 |
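One common way to get a p-value from a correlation coefficient is a two-sided t-test on R with n-2 degrees of freedom. Treat that as an assumption about the method, since nothing above says which test produced these figures. As a rough check on the 1-W% case for all 30 teams (R = 0.647, n = 30):

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.647\sqrt{30-2}}{\sqrt{1-0.647^{2}}}
  \approx \frac{0.647 \times 5.292}{0.762}
  \approx 4.49
```

A t statistic of about 4.49 on 28 degrees of freedom corresponds to a two-sided p-value on the order of 0.0001, which is in line with the 0.000112 listed for that data set.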
Topics: Research Studies, Statistical Analysis | 8 Comments







October 14th, 2009 at 4:55 PM
What if you did the same analysis but with pitching? In other words, how does a team’s pitchers not giving up HRs and walks correlate with winning?
October 14th, 2009 at 5:06 PM
Beane Count includes walks drawn, homers hit, walks allowed, and homers allowed. Though if I used only walks allowed and homers allowed, my belief is that the coefficient of determination would be about half of what it was for Beane Count, since Beane Count, like baseball, is half hitting and half pitching. If you’re only using the pitching side, your correlation is going to be about half of what it was using the entire picture.
October 14th, 2009 at 6:39 PM
Is the reverse true? Hitting HR and taking walks vs giving up homers and issuing walks?
October 14th, 2009 at 6:41 PM
I’m afraid I don’t understand the question.
October 14th, 2009 at 11:02 PM
What’s your p value on these variables? Unless you tell us at what level of significance the variables are accurate these graphs don’t tell us anything.
October 14th, 2009 at 11:49 PM
The P-Values are in the update at the bottom of the post.
With that said, it’s not the point. Like I said, I didn’t perform any test of significance. That’s not what we’re after, here.
A test of significance would attempt to answer the question “does Beane Count determine winning percentage?” Of course it doesn’t. This study would fail any test of significance I attempted to perform on it at a reasonable confidence level.
But it’s a systematic problem. Of course Beane Count doesn’t determine winning percentage, we’re talking about walks and homers in isolation.
I’m not attempting to shock the world with a revelation. If I were, a test of significance would certainly be in order. Just making an amateur attempt to quantify the importance of Beane Count as it relates to winning baseball games. And no, the data isn’t particularly meaningful. But like I said, I’m not testing pharmaceuticals here.
The correlation isn’t powerful, but it is somewhat useful if interpreted carefully and correctly.
October 15th, 2009 at 4:00 PM
“The eight playoff teams among the top-12 teams in Beane Count.”
um, not quite right… the D-backs and Twins are actually tied for 11th, ahead of the Dodgers at #13.
not saying BC is meaningless, but what does it mean that the D-backs (a terrible team) are virtually tied with the Dodgers (best team in the NL)?
October 15th, 2009 at 4:07 PM
Good catch, Matt. You’re right. My bad.
What that means is the Diamondbacks were bad at a number of things other than walks and homers. I want to be clear, I’m not asserting that maximizing Beane Count is the only thing you have to do to win baseball games. That’s certainly not the case. But it does help.