Reliever WXRL and Pythagorean Overachievement
October 28, 2009 at 8:00 am by Capitol Avenue Club under Atlanta Braves
Pythagorean Winning Percentage theory states that a team’s ability to score and prevent runs is a better representation of its fundamental ability to win baseball games than its actual record. (See Steven J. Miller’s derivation of the formula.) The theory seems intuitive to me. The ability to score runs and prevent them is a team attribute, whereas winning games beyond what those run totals imply seems to be a function of deviation from the averages. Whether or not that’s an actual attribute is up for debate, but Pythagorean Winning Percentage theory assumes it isn’t. I mostly tend to agree.
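For reference, the basic Pythagorean formula estimates winning percentage from runs scored (RS) and runs allowed (RA) as RS^x / (RS^x + RA^x). The sketch below assumes Bill James’s original exponent of 2 as the default; later refinements use an exponent closer to 1.83, and the Baseball Prospectus 3rd-order figures used later in this post layer further adjustments on top (component stats, strength of schedule, park effects) that this sketch does not attempt. The function names are illustrative only.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    if runs_scored <= 0 or runs_allowed <= 0:
        raise ValueError("runs scored and runs allowed must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored: float, runs_allowed: float,
                     games: int = 162, exponent: float = 2.0) -> float:
    """Expected (unrounded) wins over a season of `games` games."""
    return pythagorean_pct(runs_scored, runs_allowed, exponent) * games


# Hypothetical team: 800 runs scored, 700 runs allowed
print(round(pythagorean_wins(800, 700), 1))                  # exponent 2
print(round(pythagorean_wins(800, 700, exponent=1.83), 1))   # later refinement
```

A team whose actual wins sit well above or below this estimate is the kind of “overachiever” or “underachiever” the rest of the post is about.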
However, a debate has surfaced over whether teams can take measures to outperform their Pythagorean Winning Percentage. Is the ability to consistently overperform really an ability, or is it just luck?
One prominent belief is that a good bullpen, specifically the back end of it, allows a team to overachieve. The logical argument is that a strong back end lets a team win a lot of close games but doesn’t factor into blowouts. Example: the 2007 Diamondbacks. Their Pythagorean record was 79-83. They overachieved that mark by 11 wins and won the NL West with a 90-72 record, despite having only the 4th-best Pythagorean Winning Percentage in their division. Proponents of the argument claim that their strong bullpen, featuring Jose Valverde, Tony Pena, Juan Cruz, Doug Slaten, and Brandon Lyon, is responsible for the overachievement. Those who don’t buy the argument say the 11-game overachievement can be explained entirely by luck.
My belief is that the truth is probably somewhere in the middle. Back to the Bill James principle: question every belief by always asking, “Is that true?” So, is this true? Does a strong bullpen, or a strong closer, really allow a team to overachieve? I’ve designed a study that attempts to answer this question.
For 2009, I’ve listed every team, its 3rd-order winning percentage (Pct3), its actual wins (W), its 3rd-order wins (W3), and the difference between its actual wins and its 3rd-order wins (W - W3, dubbed “luck”) in the table below:
| Team | Pct3 | W | W3 | Luck |
|---|---|---|---|---|
| NYY | 0.605 | 103 | 98 | 5 |
| BOS | 0.604 | 95 | 98 | -3 |
| TBR | 0.541 | 84 | 88 | -4 |
| TOR | 0.466 | 75 | 75 | 0 |
| BAL | 0.457 | 64 | 74 | -10 |
| DET | 0.482 | 86 | 78 | 8 |
| MIN | 0.484 | 86 | 78 | 8 |
| CHW | 0.551 | 79 | 89 | -10 |
| CLE | 0.474 | 65 | 77 | -12 |
| KCR | 0.490 | 65 | 79 | -14 |
| LAA | 0.518 | 97 | 84 | 13 |
| TEX | 0.472 | 87 | 76 | 11 |
| SEA | 0.414 | 85 | 67 | 18 |
| OAK | 0.432 | 75 | 70 | 5 |
| PHI | 0.569 | 93 | 92 | 1 |
| FLA | 0.524 | 87 | 85 | 2 |
| ATL | 0.604 | 86 | 98 | -12 |
| NYM | 0.507 | 70 | 82 | -12 |
| WSH | 0.466 | 59 | 75 | -16 |
| STL | 0.556 | 91 | 90 | 1 |
| CHN | 0.525 | 83 | 85 | -2 |
| MIL | 0.480 | 80 | 78 | 2 |
| CIN | 0.441 | 78 | 71 | 7 |
| HOU | 0.467 | 74 | 76 | -2 |
| PIT | 0.362 | 62 | 59 | 3 |
| LAD | 0.536 | 95 | 87 | 8 |
| COL | 0.533 | 92 | 86 | 6 |
| SFG | 0.495 | 88 | 80 | 8 |
| ARI | 0.516 | 70 | 84 | -14 |
| SDG | 0.410 | 75 | 66 | 9 |
If you accept the systematic assumptions that the Baseball Prospectus crew makes when they calculate their Pct3, this should be a fairly good indicator of how lucky a team was.
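The Luck column can be recomputed from the Pct3 and W columns. The sketch below assumes W3 is simply Pct3 times a 162-game season, rounded to the nearest win; that assumption reproduces every W3 and Luck value in the table, although Baseball Prospectus’s own calculation may differ in detail. The names `third_order_wins`, `luck`, and `LUCK_2009` are illustrative.

```python
GAMES = 162  # assumption: W3 = Pct3 * 162, rounded to the nearest win

# (team, Pct3, actual wins), copied from the 2009 table above
TEAMS_2009 = [
    ("NYY", 0.605, 103), ("BOS", 0.604, 95), ("TBR", 0.541, 84),
    ("TOR", 0.466, 75), ("BAL", 0.457, 64), ("DET", 0.482, 86),
    ("MIN", 0.484, 86), ("CHW", 0.551, 79), ("CLE", 0.474, 65),
    ("KCR", 0.490, 65), ("LAA", 0.518, 97), ("TEX", 0.472, 87),
    ("SEA", 0.414, 85), ("OAK", 0.432, 75), ("PHI", 0.569, 93),
    ("FLA", 0.524, 87), ("ATL", 0.604, 86), ("NYM", 0.507, 70),
    ("WSH", 0.466, 59), ("STL", 0.556, 91), ("CHN", 0.525, 83),
    ("MIL", 0.480, 80), ("CIN", 0.441, 78), ("HOU", 0.467, 74),
    ("PIT", 0.362, 62), ("LAD", 0.536, 95), ("COL", 0.533, 92),
    ("SFG", 0.495, 88), ("ARI", 0.516, 70), ("SDG", 0.410, 75),
]


def third_order_wins(pct3: float, games: int = GAMES) -> int:
    """Expected wins implied by a 3rd-order winning percentage, rounded."""
    return round(pct3 * games)


def luck(pct3: float, wins: int, games: int = GAMES) -> int:
    """Actual wins minus 3rd-order wins; positive means the team outperformed."""
    return wins - third_order_wins(pct3, games)


LUCK_2009 = {team: luck(pct3, wins) for team, pct3, wins in TEAMS_2009}

# Luckiest and unluckiest teams by this measure
print(max(LUCK_2009, key=LUCK_2009.get), min(LUCK_2009, key=LUCK_2009.get))
```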
I’ve chosen WXRL as my statistic for modeling how well the back end of each bullpen performed. WXRL answers the question “how much more cumulative win probability has this reliever added to his team than a replacement-level reliever would have, adjusted for the quality of the opposing batters?” I’ve then plotted the “Luck” column on the X-axis against the WXRL of each team’s top WXRL reliever (usually, but not necessarily, the team’s closer). If strong closer performances are indeed responsible for Pythagorean overperformance, we should see a roughly linear, upward-trending pattern.
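The study design (sum the WXRL of a team’s best one, two, or three relievers, then correlate that sum with the Luck column) can be sketched as follows. This is a minimal illustration, not the author’s actual worksheet: the per-reliever WXRL values are not listed in this post, so the inputs below are hypothetical placeholders, and Pearson’s r is computed directly rather than through any particular statistics package. WXRL itself is taken as given from Baseball Prospectus, not computed here.

```python
import math
from typing import Dict, List


def top_n_wxrl(reliever_wxrl: List[float], n: int) -> float:
    """Sum of the n largest reliever WXRL values on one team."""
    return sum(sorted(reliever_wxrl, reverse=True)[:n])


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def luck_vs_top_n(luck: Dict[str, int], bullpens: Dict[str, List[float]], n: int) -> float:
    """Correlate each team's luck with the WXRL of its top n relievers."""
    teams = sorted(luck)  # fixed order so both lists line up team by team
    xs = [luck[t] for t in teams]
    ys = [top_n_wxrl(bullpens[t], n) for t in teams]
    return pearson_r(xs, ys)


# Hypothetical toy inputs, NOT real 2009 data
toy_luck = {"AAA": -4, "BBB": 1, "CCC": 6}
toy_bullpens = {"AAA": [1.2, 0.8, 0.3], "BBB": [2.0, 1.1, 0.9], "CCC": [3.1, 2.4, 1.0]}
for n in (1, 2, 3):
    print(n, round(luck_vs_top_n(toy_luck, toy_bullpens, n), 3))
```

Sorting each team’s relievers by WXRL inside `top_n_wxrl` mirrors the post’s “top WXRL reliever” framing, so the top reliever need not be the nominal closer.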

There isn’t much convincing evidence here that closers, or a team’s most effective reliever, can vastly influence Pythagorean overachievement. An r of 0.374410 isn’t particularly strong, a p-value of 0.041750 only clears the conventional 0.05 threshold by a small margin, and an R^2 of 0.140183 suggests that, in the best case, around 14% of the variation in Pythagorean overachievement can be explained by an effective closer.
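The three statistics quoted in these results are not independent. With n = 30 teams, R^2 is simply r^2, and the p-value for a test of zero correlation follows from r through t = r·√(n − 2)/√(1 − r^2) with n − 2 degrees of freedom. The sketch below assumes a two-sided test and uses the standard closed form of the Student t CDF for an even number of degrees of freedom, so it needs only the standard library; the function names are hypothetical, and the author’s own software is not known. Under those assumptions it matches the reported p-values to about two significant figures.

```python
def r_squared(r: float) -> float:
    """Coefficient of determination for a simple linear fit: R^2 = r^2."""
    return r * r


def two_sided_p_from_r(r: float, n: int) -> float:
    """Two-sided p-value for H0: no correlation, given Pearson r over n pairs.

    Equivalent to the t-test with t = r*sqrt(n-2)/sqrt(1-r^2), nu = n-2.
    For even nu the Student t CDF has the closed form
        P(|T| <= t) = sin(th) * sum_{k=0}^{nu/2-1} c_k * cos(th)^(2k),
    with th = arctan(t/sqrt(nu)), c_0 = 1, c_k = c_{k-1} * (2k-1)/(2k).
    Here tan(th) = |r|/sqrt(1-r^2), so sin(th) = |r| and cos(th)^2 = 1 - r^2.
    """
    nu = n - 2
    if nu < 2 or nu % 2 != 0:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    q = 1.0 - r * r  # cos^2(theta)
    term = total = 1.0
    for k in range(1, nu // 2):  # k = 1 .. nu/2 - 1
        term *= q * (2 * k - 1) / (2 * k)
        total += term
    return 1.0 - abs(r) * total


# Correlations reported in the post, with n = 30 teams
for label, r in [("top reliever", 0.374410), ("top two", 0.479670), ("top three", 0.554878)]:
    print(f"{label}: R^2 = {r_squared(r):.6f}, p = {two_sided_p_from_r(r, 30):.4f}")
```

Restricting the sketch to even degrees of freedom keeps it short; with 30 teams the test has 28, so the restriction costs nothing here.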
But what about the closer and set-up man together?

The figure above is a scatter plot of the sum of the WXRL of each team’s top two relievers against its Pythagorean luck. The results start to become more meaningful. A correlation coefficient of 0.479670 is much better than the 0.374410 from the previous plot, a p-value of 0.007267 indicates a more significant result, and a coefficient of determination of 0.230083 suggests that as much as 23% of the variation in Pythagorean overachievement in 2009 can be explained by strong performances from the closer and set-up man.
Just for form, let’s take it one step further: the top three relievers.

As always, sample size is an issue, but it’s pretty clear to me that having a strong back end of the bullpen has some influence on Pythagorean overperformance. In this graph, we have a correlation coefficient of 0.554878, a coefficient of determination of 0.307890, and a p-value of 0.001456. What does this mean? Well, the two do correlate, about 31% of the variation in Pythagorean overperformance in 2009 can be explained by a strong trio of relievers, and our results are statistically significant.
I’m not out to shock the baseball world; I’m sure a lot of people in front offices have done studies similar to this one (or reached similar conclusions through other methods). But I do think we (outsiders to the game) oversimplify the value of relievers. When analyzing the game economically, most people conclude that a pitcher who throws only 72 innings can have only so much impact, and that the bullpen should be the last area a team addresses. Component-wise, yes, this is true. But I think there’s sufficient evidence that a good bullpen allows you to outperform your components, which is valuable in itself.
Final takeaway: I think this also runs parallel, in a different way, to my thoughts on bullpen construction theory: diversification is the better play. You’ll see that the combined performance of the top 3 relievers correlates much more strongly with Pythagorean overachievement than the performance of the most valuable reliever alone. Of course, every case should be treated individually, but the strength of the bullpen (or the back end of it) as a whole is more important than the upside of any singular relief ace. Case study: the 2009 New York Mets. They spent $36 million on the FA market’s top closer, only to a) have him outperformed on the Mets by Pedro Feliciano (by a rather substantial margin), as well as by many other moderately priced relievers on other teams (including Mike MacDougal, Mr. 31-to-31 strikeout-to-walk ratio. *Embarrassing!* Good move, Omar), and b) post the 4th-lowest cumulative WXRL of their top-3 relievers in baseball.








So I’ll start off by admitting that I’m not familiar with WXRL (specifically how it’s derived), so if I make any incorrect assumptions about this stat please correct me. I did some quick poking around the web and I’m still not 100% sure I get it. You explain it as a measure of performance. This makes it seem like it would be influenced by luck, as a pitcher’s performance undoubtedly is (like how many batted balls are hit right at a defender as opposed to just past him, etc.). But the question you’re trying to answer (I think) is whether something aside from luck can help explain what we assume is a lucky event (outperforming your Pythagorean record). In this case, wouldn’t you want to use a stat that attempts to get at the underlying skill level of the relief pitcher(s) (like FIP) and see if that correlates with their Pythagorean luck? In any event, I’d love to hear a little more about why you chose WXRL and what that stat might show as compared to other pitching stats. Thanks!
If I understand this correctly, the Braves should have won 98 (!) games in 2009, where they won “only” 86. In other words, we underperformed by 12 games. How is that possible with one of the best – if not THE best – 1-2-3 punches in baseball at the back end of our pen in Moylan-Soriano-Gonzalez? I assume either your conclusions are incorrect or the 2009 Braves have been the unluckiest team ever to play the game.
Harrison,
Yes, WXRL is influenced by luck; it’s basically a derivation of WPA, adjusted for a number of things. However, the goal of the study is to determine whether a certain systematic design can influence something we presume to be luck. And it’s the performance of the relievers, not their underlying skill, that’s more useful for testing this phenomenon.
Underlying skill metrics (ERC, FIP, tRA, etc.) are extremely useful for predicting future performance. However, with this study, we’re not trying to predict the future; we’re trying to detect trends in what happened. Whether a team’s bullpen performed well due to luck (high FIP, low ERA) or skill (low FIP, low ERA) is irrelevant. It’s the actual performance of the relievers that matters.
Now, using WXRL as the basis for evaluating individual players is silly as it doesn’t tell us nearly enough about a player’s fundamental skills to be useful. But it’s a great metric to use to look at a bullpen’s effectiveness in retrospect.
By using WXRL over FIP, I’ve attempted to answer “Does a bullpen’s performance influence Pythagorean over/underachievement?”, not “Does a bullpen’s fundamental skill, regardless of performance, influence Pythagorean over/underachievement?”
I hope all of this makes sense. I thought a great deal about which statistics to use when designing this study, and it’s extremely difficult to explain. I tried to make it as palatable as possible, but I’m an engineer and it’s sometimes difficult to put my thoughts on paper. Let me know if you have any further questions.
For further explanation of WXRL (or Pct3, etc.), see the Baseball Prospectus Statistics Glossary.
Frank,
First of all, not nearly the unluckiest team in the history of the game. The Royals, Diamondbacks, and Nationals underachieved by more games than the Braves, and the Indians and Mets also underachieved by 12 games.
The Braves were actually 10th in baseball in top-3 relievers’ WXRL, behind the Yankees, Red Sox, Twins, Mariners, A’s, Reds, Dodgers, Giants, and Padres. This probably just means their underachievement is a result of a) some shitty luck and b) dumb managerial decisions. Mostly a, I think.
It’s also possible that Baseball Prospectus is making some false assumptions when calculating their Pct3 and the Braves aren’t as good as their .604 Pct3 suggests (I think they’ve probably overestimated the team, though not by a whole lot).
Even so, like I said, only 31% (at most) of the variation in Pythagorean luck can be explained by strong performances from the top-3 relievers. At least 69% of it is luck and other things.
PJ,
Neither the Royals, nor the D-Backs, Nationals, Mets, or Indians were in the top 10 in relievers’ WXRL, so their underachievement can be explained by poor relief pitching. However, the Braves’ can’t.
In fact, with the exception of the Red Sox, all of the teams ahead of the Braves in relievers’ WXRL overachieved, most of them by a wide margin. The Braves underachieved by a wide margin. If your conclusions are correct, then the Braves were EXTREMELY unlucky (if not the unluckiest team ever).
It’s PW, of course, not PJ. Sorry for the typo.
It could be a number of untested variables other than luck. My only conclusion is that a strong bullpen helps a team overachieve, not that a strong bullpen guarantees a team will overachieve. It explains only 31% of Pythagorean variation, at most.
Rosenthal, exactly 15 days ago
Rosenthal, today
Can we all agree to never listen to anything Ken Rosenthal says? Ever again?
Thanks for clearing that up for me. I see why a measure of actual performance is a better statistic to use than a measure of talent level. I got confused by the fact that a bullpen’s actual performance can involve a lot of luck, but that’s a different matter from the “luckiness” of outperforming your Pythagorean record.
In regard to the Rosenthal story, I love how he still maintains that “Hudson initially leaned toward going out on the open market.”
Peter, I think your numbers may be off by a little; the Braves’ 3d order wins for 2009 are 88 (or 87.5), not 98, at least according to:
http://www.baseballprospectus.com/statistics/standings.php
The few other teams I checked were also off.
A few other comments:
1. Any chance you could highlight ATL in the graphs?
2. I don’t understand WXRL very well, but if it is directly tied to actual W (like WPA), then I would think that of course there would be a correlation. Teams with bullpens with high WXRL are likely to have high W. Teams with high W are likely to have luck.
3. Doesn’t 3d order W include an adjustment for strength of schedule (of the underlying components, anyway)? If the point of 3d order W is to measure the quality of teams, without regard to their strength of schedule, then that is not a good baseline to use for luck. E.g., if 2 teams had the same 3d order wins, we would expect more actual wins from the team with the easier schedule. The difference in wins could be due entirely to strength of schedule, not luck. 2d order wins might be better for your purpose. (But I might have this completely backwards.)
Thanks for your work, as always.
A lot has been made of Bobby Cox’s management decisions over the past few years, including his bullpen management. We don’t talk much anymore about Leo’s effect on bullpen usage and effectiveness when he was the pitching coach; most of the credit you still read about him concerns the starters. I haven’t seen this commented on, and I don’t have access to BP’s subscriber content, but from the data I have been able to access, it seems the Braves usually performed at or above their ‘expected’ winning % during Leo’s tenure, and for four straight years now without him the team has underperformed it significantly. Thoughts, anyone?
Hizouse,
I used 3rd order W%, not 3rd order W. The Prospectus crew calculates them differently: BPro Post Season Odds Report – PECOTA Adjusted.
Teams with high W are likely to have luck.
I don’t see why that’s true.
The theory is very complicated. Look at it this way. 3rd order W% accounts for component hits, walks, homers, K’s, etc. It also accounts for strength of schedule and park effects. Actual wins account for everything, including the components. By subtracting the two, we’re able to see what, other than offensive/defensive components, strength of schedule, and park effects, influences teams’ records: luck, strong bullpens, and a plethora of other, untested variables.
If the point of 3d order W is to measure the quality of teams, without regard to their strength of schedule, then that is not a good baseline to use for luck.
That’s a good point. I don’t think it necessarily interferes with the data, but the correlation would probably be stronger using 2nd order Pythagorean W%. Good call. I’ll update it using 2nd order if I get bored.
jefft,
That seems to be the case, but it’s impossible to definitively say that’s responsible. Plenty of untested variables get in the way.
How is there such a huge discrepancy between the Braves W3 and L3 (link in post 10) and the P3 (link in post 12), which is supposed to be based on W3 and L3? I think BP may have some errors in the data. Note that the regular playoff odds report gives the Braves a .537 P3 (=87 wins). The PECOTA-adjusted versions gives them a .604 P3 (=98 wins). 11 wins is huge! The ELO version is completely screwy.
I might agree that the PECOTA version, if the data is correct, would be better for measuring a team’s true talent–but then your definition of luck would presumably include the difference between all the players’ actual performance and their true talent (measured by PECOTA). I would think for the purposes of this study we’d want to eliminate that part of luck.
Hizouse,
I don’t know. I’ll ask Kevin about it.