Friday, May 18, 2007

Why have the Reds deviated from Pythagoras?

One of the puzzling things about the Reds' record this season is that it sits well below where it should be according to the team's Pythagorean record, which estimates what a team's record should be based on its runs scored and runs allowed. Here's the breakdown (thanks to The Hardball Times):

Actual record: 16-25
Runs scored: 181 (4.4 r/g)
Runs allowed: 189 (4.6 r/g)
Pythagorean record: 20-21
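
For anyone who wants to check the arithmetic, here is a minimal sketch of the Pythagorean calculation. It assumes the classic exponent of 2; The Hardball Times may use a slightly different exponent, and the function name is just for illustration.

```python
def pythagorean_record(runs_scored, runs_allowed, games, exponent=2.0):
    """Estimate a win-loss record from runs scored and allowed.

    The exponent of 2 is an assumption (the classic form of the formula);
    other versions use values closer to 1.8.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    expected_win_pct = rs / (rs + ra)
    wins = round(expected_win_pct * games)
    return wins, games - wins


# Reds through 41 games (16 wins + 25 losses): 181 scored, 189 allowed.
wins, losses = pythagorean_record(181, 189, 41)
print(f"Pythagorean record: {wins}-{losses}")  # prints "Pythagorean record: 20-21"
```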

In Dave Studeman's column today, he uses a simple method that he developed last season to assign the blame (and credit) for deviations from a team's Pythagorean record based on win probability data. Basically, he uses a regression to estimate what a team's batting (or pitching) win probability should be given how many runs it has scored (or allowed), and then compares that to its actual win probability. If a team scores a lot of runs when it doesn't count, but does not score runs when it does count, it will show up in this analysis as underperforming.
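
The regression idea can be sketched in a few lines. This is only an illustration of the general technique (an ordinary least-squares line, then actual minus predicted), not Studeman's actual code, and the four teams' numbers below are invented purely to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Actual y minus the y predicted by the fitted line, per point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data, NOT real teams: runs scored and batting WPA.
runs_scored = [160, 180, 200, 220]
batting_wpa = [-2.0, -1.0, 1.0, 2.0]

# A negative residual means fewer wins from the bats than the run total predicts.
for runs, resid in zip(runs_scored, residuals(runs_scored, batting_wpa)):
    print(f"{runs} runs: {resid:+.1f} wins vs. expected")
```

The same function works for pitching: regress pitching WPA on runs allowed, and a negative residual again means the staff contributed fewer wins than its run total predicts.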

Studeman's analysis put the majority of the blame (2.4 wins below expected) on the Reds' offense. Pitching was tabbed with "only" 1.0 wins below expected, given the number of runs they have allowed. This goes a bit against conventional wisdom in the Reds' blogosphere right now, which seems to assign most of the blame to the Reds' bullpen. Of course, if the Reds' starters have had a more positive influence than is typical for their runs allowed, while the bullpen has had a more negative influence than is typical for their runs allowed, the effect of the bullpen might be hidden in his analysis.

To tease the effects of starters and relievers apart, I replicated Studeman's procedure, regressing batting WPA on runs scored, starter WPA on runs allowed by starters, and reliever WPA on runs allowed by relievers. All WPA and runs data came from, and were current through May 16 2007. Here's how it breaks down:

Team  Batter     Starter    Reliever   Predicted Pythagorean  Pythagorean
      Deviation  Deviation  Deviation  Deviation              Deviation
ATL   1.7        -0.9       2.4        3                      3
DET   0.9        0.8        1.3        3                      3
CHA   3.1        -1.2       0.6        3                      3
TBA   1.8        1.0        -0.4       2                      3
MIL   0.2        0.4        1.6        2                      2
CLE   2.2        0.4        -0.7       2                      2
SEA   2.2        -0.4       0.0        2                      2
ARI   1.0        0.4        0.3        2                      2
LAN   -0.3       -0.4       2.4        2                      1
STL   1.8        -1.4       1.1        1                      2
COL   1.4        0.6        -0.9       1                      2
PIT   -0.7       -0.6       2.2        1                      2
LAA   -0.3       1.2        -0.1       1                      0
NYN   0.5        0.2        -0.2       1                      0
WAS   -1.4       -0.5       1.5        0                      1
BOS   1.2        -0.6       -0.1       0                      0
HOU   0.3        -0.6       0.3        0                      0
FLO   -0.8       -0.8       0.7        -1                     -1
TOR   0.0        1.9        -2.7       -1                     -1
BAL   -1.5       0.8        -0.6       -1                     -1
MIN   -1.7       -0.6       0.9        -1                     -1
SDN   -2.6       0.5        1.3        -1                     -2
PHI   0.8        0.3        -2.6       -1                     -2
TEX   -1.3       -1.3       0.9        -2                     -1
KCA   -1.7       0.5        -0.6       -2                     -1
SFN   -0.4       0.6        -2.1       -2                     -2
OAK   -1.5       0.7        -1.5       -2                     -3
CHN   -2.0       0.3        -1.6       -3                     -4
NYA   -0.3       -2.2       -1.3       -4                     -4
CIN   -2.6       0.8        -2.0       -4                     -4

The deviations noted above are essentially the differences in expected wins attributable to hitters, starters, and relievers, given the number of runs they've scored or allowed. As you can see, there is good agreement between the predicted deviation in wins from my regressions and the actual deviation from the Pythagorean record. There are inconsistencies, but they are never by more than a win...and keep in mind, we're dealing with only ~41 games here, so we'd expect a few more misses than we might later in the season due to the small sample size. Also, if you compare my hitter deviations to Studes', they match up well--I think the small differences you see are probably just due to the fact that I have an extra day's worth of data in my dataset.
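
As a quick check on that agreement, the sketch below adds up the three component deviations for a handful of teams and compares the total to the Pythagorean deviation. It assumes the predicted deviation is simply the sum of the batter, starter, and reliever deviations, which is how the table's numbers appear to line up (the published values are rounded to one decimal, so the sums are approximate).

```python
# (batter, starter, reliever, pythagorean_deviation), copied from the table.
rows = {
    "ATL": (1.7, -0.9, 2.4, 3),
    "DET": (0.9, 0.8, 1.3, 3),
    "TOR": (0.0, 1.9, -2.7, -1),
    "NYA": (-0.3, -2.2, -1.3, -4),
    "CIN": (-2.6, 0.8, -2.0, -4),
}


def predicted_deviation(batter, starter, reliever):
    """Assumed relationship: predicted deviation is the sum of the parts."""
    return batter + starter + reliever


for team, (b, s, r, actual) in rows.items():
    predicted = predicted_deviation(b, s, r)
    print(f"{team}: predicted {predicted:+.1f}, Pythagorean {actual:+d}")
```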

This analysis shows that the Reds' offense and bullpen are both to blame for the team's under-performance of its Pythagorean record. In fact, the offense is tied with SDN for the largest negative residual in WPA of any team in the majors right now, while the bullpen ranks 4th from the bottom.

There is a 2.8-win difference between the performance of Reds starters and relievers, relative to where their overall rate of allowing runs predicts they should be. This is not uncommon, indicating that it is useful to consider starters and relievers separately. Other teams with equal or larger differences between starter and bullpen effects on Pythagorean record include Toronto (4.6 wins), Atlanta (3.3 wins), Philadelphia (2.9 wins), Pittsburgh (2.8 wins), and the LA Dodgers (2.8 wins).
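
The starter-versus-bullpen gaps quoted above are just the absolute difference between the two deviation columns. Here is a small sketch using the relevant rows from the table; the variable and function names are mine.

```python
# (starter deviation, reliever deviation), copied from the table.
pitching = {
    "CIN": (0.8, -2.0),
    "TOR": (1.9, -2.7),
    "ATL": (-0.9, 2.4),
    "PHI": (0.3, -2.6),
    "PIT": (-0.6, 2.2),
    "LAN": (-0.4, 2.4),
}


def starter_reliever_gap(starter, reliever):
    """Absolute gap in wins, rounded to one decimal like the table values."""
    return round(abs(starter - reliever), 1)


gaps = {team: starter_reliever_gap(s, r) for team, (s, r) in pitching.items()}

# Teams whose gap is at least as large as the Reds', biggest first.
reds_gap = gaps["CIN"]
for team, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    if team != "CIN" and gap >= reds_gap:
        print(f"{team}: {gap} wins")
```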

Here are the data graphically. The deviations listed above refer to distance above or below the regression line (i.e. the residual):

Hitters (R² = 0.66)
Starters (R² = 0.73)
Relievers (R² = 0.35)

The good news is that this probably means that the Reds can be better than their record indicates over the rest of the season, as deviations from season-average performance in high-leverage situations don't tend to be consistent over time (i.e. "clutch" players in the past don't tend to be clutch players in the future). The bad news is that bad performance in the most important situations by the offense and the bullpen already has the Reds 4 wins below where they should be, which makes a playoff run a fairly low-probability venture at this point in the season.