
Tuesday, November 27, 2007

Player Value, Part 5a: Pitchers

To view the rest of the player value series, click on the player value label at the bottom of this post.

Now that we (or, at least, I) have decided on how we'll evaluate the value of position players, it's time to turn our attention to pitchers. The goal will be the same as with hitters: to estimate a player's contributions to the team, reported in the currency of runs.

There's a sense in which this might be an easier thing to do with pitchers than with hitters. After all, one of the traditional ways in which we report pitcher performance--earned run average--is based on counting up the actual number of runs a pitcher allowed over the course of a season!

Absolute Runs Allowed Estimates

Let's start with the first step, which is getting an estimate for the total number of runs a pitcher allowed over the course of the season. Once we have that, we'll take a look at runs allowed vs. average and replacement. The easiest way to get the number of runs a pitcher allowed is to record the actual number of runs allotted to a pitcher based on our conventional scoring practices (below I'll refer to this as "TrueRuns"). However, I also want to think about two other alternatives.

Base Runs

Base Runs ("BsR") was introduced in the piece on Run Estimation for position players. With hitters, it's not appropriate to use the Base Runs equation to estimate runs created, because doing so assumes an interaction between an individual's ability to get on base and that same individual's ability to move runners around the bases. With pitchers, though, that assumption is entirely appropriate--a pitcher's ability to prevent baserunners directly interacts with his ability to prevent the advancement of those baserunners to determine how many runs are permitted to score.

Why would you want to use Base Runs instead of actual runs scored to assess player value? Well, if starters threw complete games every time out, I'm not sure that there would be a compelling reason. However, that's not what actually happens. Often, starting pitchers will leave a game in the middle of an inning with runners on base. Whether those runners score depends on the performance of the relievers. On a team with an outstanding bullpen, a large number of those inherited runners might not score. In that case, actual runs allowed would overestimate the value of that starter, crediting him for the outstanding performance of the bullpen.

Similarly, the performance of bullpen pitchers may not be properly assessed by using straight runs scored. A reliever can come into a game with the bases loaded and two outs, allow three baserunners (bringing home all three inherited runners) before ending the inning, and yet still not be charged with any runs--they all go to the starter. Clearly that overestimates how much value the reliever is bringing to the team.

Base Runs allows us to get around those issues by assessing the typical value, in terms of runs allowed, of each event that happens while a pitcher is on the mound. Base Runs still retains the nonlinear way in which singles, doubles, walks, etc., interact to produce runs as they become more frequent. It certainly misses some of the context of when exactly those events happen relative to one another. But given its accuracy across a variety of situations and conditions, and its ability to focus exclusively on what happens while a pitcher is on the mound (as opposed to the actions of other pitchers in the same game), it may be preferable to use BsR--especially when dealing with small sample sizes or exceptionally strong or weak bullpens.

I'll present two alternative equations. The first, which is the one I'm using in this article, uses data provided by Baseball Reference. These data are really awesome because for each pitcher, they include all the various statistics typically reported for hitters--singles, doubles, triples, stolen bases, etc. This allows me to use essentially the same equation for pitchers that I used for hitters, though in this case I'm also including Reached On Errors (ROE) in my equation. As with the hitters, I've forced this equation to predict runs scored in the '04-'07 National League. Here's "my" custom base runs equation:

BsR = A*(B/(B+C)) + D
A = H - HR + NIBB + IBB + ROE + HBP + 0.08*SH
B = .829*1B + 2.224*2B + 3.578*3B + 1.872*HR + .059*NIBB + .912*ROE + .928*SB - 1.356*CS + 0.186*HBP - 0.551*IBB + .830*SH - 1.356*GDP - 0.005*nonKOuts - 0.065*K
C = 0.92*SH + nonKOuts + K
D = HR
Outs = AB - H + SF + 0.92*SH
nonKOuts = Outs - K - 0.92*SH

(Note: SH has to be removed from the nonKOuts term because it is handled separately in parts B and C, but I like to include it in my Outs term when estimating runs per game later on. This is something I should have done with hitters, and I'll revise those equations shortly; I'm sure it's a very minor adjustment.)
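For anyone who wants to compute this outside a spreadsheet, here is a minimal Python sketch of the custom equation above, transcribed term by term. The `PitcherLine` container and its field names are hypothetical (1B/2B/3B are spelled out because Python names can't start with a digit), and the zero-denominator guard is my own addition:

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """Hypothetical container for one pitcher's season line (all counts default to 0)."""
    AB: float = 0.0
    H: float = 0.0
    singles: float = 0.0  # 1B
    doubles: float = 0.0  # 2B
    triples: float = 0.0  # 3B
    HR: float = 0.0
    NIBB: float = 0.0     # non-intentional walks
    IBB: float = 0.0      # intentional walks
    HBP: float = 0.0
    ROE: float = 0.0      # reached on error
    SB: float = 0.0
    CS: float = 0.0
    SH: float = 0.0       # sacrifice hits
    SF: float = 0.0
    GDP: float = 0.0
    K: float = 0.0


def custom_base_runs(p: PitcherLine) -> float:
    """BsR = A * (B / (B + C)) + D, using the coefficients listed above."""
    outs = p.AB - p.H + p.SF + 0.92 * p.SH
    non_k_outs = outs - p.K - 0.92 * p.SH  # SH is handled separately in B and C
    a = p.H - p.HR + p.NIBB + p.IBB + p.ROE + p.HBP + 0.08 * p.SH  # baserunners
    b = (0.829 * p.singles + 2.224 * p.doubles + 3.578 * p.triples
         + 1.872 * p.HR + 0.059 * p.NIBB + 0.912 * p.ROE + 0.928 * p.SB
         - 1.356 * p.CS + 0.186 * p.HBP - 0.551 * p.IBB + 0.830 * p.SH
         - 1.356 * p.GDP - 0.005 * non_k_outs - 0.065 * p.K)  # advancement
    c = 0.92 * p.SH + non_k_outs + p.K  # outs
    d = p.HR  # home runs always score
    if b + c == 0:
        return d  # guard: nothing to weight, so only home runs count
    return a * (b / (b + c)) + d


print(custom_base_runs(PitcherLine(AB=1, H=1, HR=1)))  # 1.0: a solo homer is one run
```

The solo-homer line is a handy sanity check: A is zero, so only the D term contributes.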

Note that if you only have traditional pitching statistics, there is an alternative version of the Base Runs equation, originally devised by David Smyth, that is designed for those stats. Here's a slight variation on that equation, with the B term "fudged" slightly to match MLB '03-'07 totals exactly.

BsR = A*(B/(B+C)) + D
A = H + BB - HR + HBP - IBB
B = -0.625*H + 0.104*BB - 3.123*HR + 1.457*eTB + 0.1*HBP - 0.1*IBB
C = IP*3
D = HR
eTB = 1.12*H + 4*HR

This is the equation I used in my profile on Francisco Cordero because it only uses widely available pitching statistics, which are the most convenient stats to pull from most online player profile pages.
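A minimal Python sketch of the traditional-stats version follows. Two assumptions are mine and worth flagging: BB is taken to include intentional walks (which is why A subtracts IBB), and IP must be in true innings, so the hypothetical helper `innings_from_notation` converts box-score notation such as 177.2 (177 and 2/3 innings) before it is multiplied by 3:

```python
def innings_from_notation(ip: float) -> float:
    """Convert box-score notation (177.2 = 177 2/3 innings) to true innings."""
    whole = int(ip)
    thirds = round((ip - whole) * 10)
    if thirds not in (0, 1, 2):
        raise ValueError(f"not box-score innings notation: {ip}")
    return whole + thirds / 3


def smyth_base_runs(H, BB, HR, HBP, IBB, IP):
    """Smyth-style pitcher BsR; BB assumed to include IBB, IP in true innings."""
    etb = 1.12 * H + 4 * HR  # estimated total bases
    a = H + BB - HR + HBP - IBB
    b = -0.625 * H + 0.104 * BB - 3.123 * HR + 1.457 * etb + 0.1 * HBP - 0.1 * IBB
    c = IP * 3  # outs
    d = HR
    if b + c == 0:
        return d  # guard against an empty line
    return a * (b / (b + c)) + d


ip = innings_from_notation(180.0)
print(round(smyth_base_runs(H=150, BB=50, HR=20, HBP=5, IBB=5, IP=ip), 1))  # 70.5 for this made-up line
```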

FIP Runs

Another alternative way to estimate runs allowed is to make use of Tom Tango's Fielding Independent Pitching (FIP) statistic. FIP is calculated as FIP = (13*HR + 3*(BB+HBP) - 2*K)/IP + X, where X is usually a constant to force league average FIP to equal league average ERA. However, if we're interested in estimating total runs allowed, rather than just earned runs allowed, we can instead force the equation to match league average runs/9 innings (RA). In 2007 NL, X = 3.54. We can then convert FIP, which is a runs/9 estimate, to a season-total runs estimate like this:

FIPRuns = FIP / 9 * IP
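The FIP steps above can be sketched in Python as below. The `fip_constant` helper is my own illustration of how X could be solved from league totals so that league-wide FIP equals league runs per nine; the league figures in any example are placeholders, not real 2007 NL totals:

```python
def fip(HR, BB, HBP, K, IP, constant):
    """FIP on a runs-per-9 scale; `constant` is the league-specific X."""
    return (13 * HR + 3 * (BB + HBP) - 2 * K) / IP + constant


def fip_constant(lg_runs, lg_HR, lg_BB, lg_HBP, lg_K, lg_IP):
    """Solve for X so that league FIP equals league RA (runs per 9 innings)."""
    lg_ra = lg_runs / lg_IP * 9
    return lg_ra - (13 * lg_HR + 3 * (lg_BB + lg_HBP) - 2 * lg_K) / lg_IP


def fip_runs(fip_value, IP):
    """Convert a runs/9 estimate into a season total: FIP / 9 * IP."""
    return fip_value / 9 * IP


# Hypothetical pitcher, using the 2007 NL constant of 3.54 quoted above.
value = fip(HR=20, BB=50, HBP=5, K=150, IP=180, constant=3.54)
print(round(value, 2), round(fip_runs(value, 180), 1))  # 4.23 84.7
```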

Classically, FIP tries to estimate ERA using statistics that are under the exclusive control of the pitcher (i.e., not influenced by fielders)--walk rate, strikeout rate, and home run rate. The literature on defense-independent pitching statistics (DIPS) indicates that these statistics are much more repeatable from year to year than the rate of hits on balls in play (BABIP). Therefore, when ERA deviates from FIP, this is usually (though not always) the result of either "luck" or the fielding behind the pitcher.

Now, from the perspective of player value, there's a sense in which a DIPS-ish estimate like this isn't particularly informative. In the case of a pitcher with an unusually high BABIP due to poor fielding or bad luck, those hits still happened, even if they weren't necessarily his fault, and therefore reduce how much value he "contributed" to his team. However, the reason we're interested in value in the first place is, usually, to get some idea of how well players performed. Therefore, I think there's worth in using a stat like this as a complement to (not a replacement for) our other estimates of value.

Ok, all that said, let's take a look at how these three measures of absolute runs allowed compare, using the '07 Reds as a case study. Note that I have applied a park factor adjustment to these numbers. One could argue about whether that's appropriate for FIP, but I went ahead and did it under the assumption that the home run component is a big part of the typical park factor:

Update (1/18/08): Recently, I have taken to calculating FIPRuns from a home run total adjusted by a HR park factor for players or teams, and then not adjusting by the overall runs park factor. The effect turns out to be largely the same. Given that K/9 and BB/9 are somewhat dependent on the run environment (more runs = more PAs per nine innings), I'm also not convinced it's necessarily a better option. But it somehow feels like the right thing to do.

Pitcher IP TrueRuns BsR FIPRuns
MBelisle 177.7 109.9 105.3 94.4
BArroyo 210.7 107.9 117.4 111.7
AHarang 231.7 99.0 96.4 101.0
KLohse 131.7 75.2 71.1 68.7
MStanton 57.7 38.6 37.4 29.5
KSaarloos 42.7 35.6 33.5 29.1
TCoffey 51.0 35.6 39.3 35.1
BLivingston 56.3 34.7 36.8 30.1
DWeathers 77.7 32.7 33.1 35.6
HBailey 45.3 31.7 26.2 25.7
PDumatrait 18.0 29.7 29.2 17.7
VSantos 49.0 27.7 30.8 31.6
GMajewski 23.0 21.8 20.3 12.6
JCoutlangus 41.0 21.8 21.1 21.9
MGosling 33.0 21.8 26.3 22.3
EMilton 31.3 20.8 21.9 16.8
TShearn 32.7 17.8 19.3 24.7
JBurton 43.0 14.9 10.7 19.4
ERamirez 16.3 13.9 12.4 14.2
MMcBeth 19.7 12.9 12.8 9.3
BSalmon 24.0 10.9 10.1 12.3
EGuardado 13.7 10.9 9.0 8.0
BBray 14.3 9.9 7.8 5.5
RStone 5.3 5.9 5.8 7.4
RCormier 3.0 3.0 3.0 3.0

I have two primary observations from this table. First, each of these estimates tells us pretty similar things about the pitchers. That's good, because they're all estimates of the same thing. TrueRuns and BsR, in particular, are pretty darn close for most pitchers, with a correlation of 0.995. Bronson Arroyo, for whatever reason, shows the biggest difference, with BsR showing him costing the Reds ~10 more runs than he "actually" did. Given how close they are, my tendency is to just use BsR--it avoids confounds with the performances of other pitchers, and it follows a methodology very similar to the one we use for hitters.
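As a quick check on those two observations, the Python sketch below recomputes the TrueRuns/BsR correlation and the largest gap from the table values, using a hand-rolled Pearson correlation to avoid version-specific library calls. It should land near the 0.995 quoted, with Arroyo's roughly 9.5-run gap on top:

```python
import math

# (pitcher, TrueRuns, BsR) transcribed from the '07 Reds table above.
ROWS = [
    ("MBelisle", 109.9, 105.3), ("BArroyo", 107.9, 117.4),
    ("AHarang", 99.0, 96.4), ("KLohse", 75.2, 71.1),
    ("MStanton", 38.6, 37.4), ("KSaarloos", 35.6, 33.5),
    ("TCoffey", 35.6, 39.3), ("BLivingston", 34.7, 36.8),
    ("DWeathers", 32.7, 33.1), ("HBailey", 31.7, 26.2),
    ("PDumatrait", 29.7, 29.2), ("VSantos", 27.7, 30.8),
    ("GMajewski", 21.8, 20.3), ("JCoutlangus", 21.8, 21.1),
    ("MGosling", 21.8, 26.3), ("EMilton", 20.8, 21.9),
    ("TShearn", 17.8, 19.3), ("JBurton", 14.9, 10.7),
    ("ERamirez", 13.9, 12.4), ("MMcBeth", 12.9, 12.8),
    ("BSalmon", 10.9, 10.1), ("EGuardado", 10.9, 9.0),
    ("BBray", 9.9, 7.8), ("RStone", 5.9, 5.8), ("RCormier", 3.0, 3.0),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson([row[1] for row in ROWS], [row[2] for row in ROWS])
biggest = max(ROWS, key=lambda row: abs(row[2] - row[1]))
print(f"r = {r:.3f}; largest gap: {biggest[0]} ({biggest[2] - biggest[1]:+.1f} runs)")
```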

My second observation is that these values give us very little indication of player value. The finding that Belisle, Arroyo, Harang, and Lohse allowed more runs than anyone else has everything to do with the fact that they got more innings than anyone else on the staff! Fortunately, there are three simple stats that we can calculate from these runs totals to get a better handle on player value: runs per game, runs above average, and runs above replacement.

Baselines for Pitching Value

Runs Per Game

RPG = Runs/IP*9
RPG = Runs/Outs*26.25

For true runs and FIP, I'm using IP. For BsR, I'm using Outs. But it's basically just doing the same thing.
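Both versions are one-liners; a minimal Python sketch, with Harang's TrueRuns from the table as the example:

```python
def rpg_from_ip(runs, ip):
    """Runs per nine innings (used here for TrueRuns and FIPRuns)."""
    return runs / ip * 9


def rpg_from_outs(runs, outs):
    """Runs per game on an outs basis, at 26.25 outs per game (used here for BsR)."""
    return runs / outs * 26.25


print(round(rpg_from_ip(99.0, 231.7), 2))  # 3.85: Harang's TrueRuns per nine
```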

Runs Above Average

RAA = (RPG - lgRPG)/9*IP * -1
RAA = (RPG - lgRPG)/26.25*Outs * -1

Pretty straightforward, right? Just subtract league-average runs per game from the player's runs per game, then scale the difference by the pitcher's innings (or outs, depending on how you're calculating runs per game). I multiply by -1 to convert this from runs allowed above average to runs saved above average, because I find that this makes it more straightforward to interpret (positive numbers are good).

Update (1/18/08): I currently include an adjustment for lgRPG based on expected differences in relievers and starters, similar to what I do for RAR (see below). In this case, the assumption is that the average pitcher will allow runs at 89.5% of league average as a reliever, but 110.5% of league average as a starter.
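A minimal Python sketch of the IP-based RAA formula with the optional role adjustment from this update. The `role` argument is a hypothetical convenience: `None` reproduces the unadjusted formula, while "starter" or "reliever" scales league RPG by 110.5% or 89.5%:

```python
# League-RPG multipliers from the 1/18/08 update.
ROLE_MULTIPLIER = {"starter": 1.105, "reliever": 0.895}


def raa(rpg, lg_rpg, ip, role=None):
    """Runs saved above average (positive is good), IP-based version."""
    multiplier = 1.0 if role is None else ROLE_MULTIPLIER[role]
    return (rpg - multiplier * lg_rpg) / 9 * ip * -1


# A league-average pitcher in 90 relief innings now grades below average.
print(round(raa(4.0, 4.0, 90, "reliever"), 1))  # -4.2
```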

Runs Above Replacement

RAR = (RPG - Y*lgRPG)/9*IP * -1
RAR = (RPG - Y*lgRPG)/26.25*Outs * -1
(For starting pitchers, Y=1.28. For relief pitchers, Y=1.07)

This is the same as the RAA equation, except for the Y coefficient. That's where things get a bit interesting.

The issue is that there are two different roles for pitchers: starting and relieving. In general, pitchers perform much more poorly as starters than they do as relievers (see pp.201-207 in The Book for a nice study on this, or this thread and this thread for some additional estimates and arguments). Therefore, we need to use different baselines depending on whether a pitcher is being used as a starter or a relief pitcher. Unfortunately, probably even more so than for hitters, there is not a clear consensus on what numbers we should use to do this. For example, as far as I can tell, Tom Tango, MGL, and Patriot all use slightly different values for starter and reliever replacement level. At this point, until I do my own study on this, I'm just going to pick Tom Tango's numbers because it seems like he's done a lot of thinking/analysis on this issue. That's not to say that I'm sure his numbers are correct, or that the other numbers are wrong, but his numbers make sense and seem to be consistent with empirical data.

Anyway, Tango argues that a replacement pitcher, used as a starter, will produce a .380 winning percentage, which means he will allow runs at 128% of league average. In contrast, a replacement reliever will be good for a .470 winning percentage, and will allow runs at 107% of league average. The latter number may surprise folks, because it means that replacement level for relievers is very close to league average! Not good news for the Reds' bullpen, or for evaluations of the effectiveness of the Reds' front office.
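To see roughly how a winning percentage maps onto a runs-allowed ratio, here is a Pythagorean-style sketch. This is my own rough check, not Tango's derivation: it assumes a team scoring league-average runs and an exponent of 2. Under those assumptions, 128% gives about .379 (close to .380) and 107% gives about .466 (a bit under .470); the exact figure shifts with the exponent and run environment:

```python
def pythagorean_wpct(runs_allowed_ratio, exponent=2.0):
    """W% = RS^x / (RS^x + RA^x) with RS = 1 (league average) and RA = ratio."""
    return 1 / (1 + runs_allowed_ratio ** exponent)


for ratio in (1.28, 1.07):
    print(f"{ratio:.0%} of league runs -> {pythagorean_wpct(ratio):.3f}")
```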

There are pitchers, of course, who serve as both starters and relievers over the course of a season. Ideally, we'd treat a pitcher's relief outings separately from his starting outings and then sum the RAR together, but that's not possible to do if you're working from a single row of data per player like I often am. Therefore, I'm going to borrow from Patriot's approach and categorize pitchers this way: starting pitchers are defined as those who made at least 50% of their appearances as starters, or who started at least 15 games in a season. Relievers are everybody else. It's not perfect, but it'll get us pretty close to the mark.
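Putting the role rule and the replacement baselines together, a minimal Python sketch of IP-based RAR might look like this. The function names are hypothetical, and the starter test is the 50%-of-appearances-or-15-starts rule described above:

```python
def is_starter(games, games_started):
    """Starter if at least half of appearances were starts, or 15+ starts."""
    if games <= 0:
        raise ValueError("games must be positive")
    return games_started >= 15 or games_started / games >= 0.5


def rar(rpg, lg_rpg, ip, starter):
    """Runs saved above replacement: Y = 1.28 for starters, 1.07 for relievers."""
    y = 1.28 if starter else 1.07
    return (rpg - y * lg_rpg) / 9 * ip * -1


# A league-average pitcher is worth more above replacement as a starter.
print(round(rar(4.0, 4.0, 180, is_starter(33, 33)), 1))  # 22.4
print(round(rar(4.0, 4.0, 60, is_starter(60, 0)), 1))    # 1.9
```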

One last point: thinking about it now, a starter/reliever adjustment should probably be done to the RAA calculations too. But given that I generally prefer RAR to RAA, I'm going to ignore that for now...someone can fill in the blanks for me if they like. :)

2007 Cincinnati Reds

Below I'm reporting RAR values for the '07 Reds using Base Runs and FIP Runs. "True" runs above replacement is very similar to Base Runs (correl = 0.97), but as I said above, I'm partial to Base Runs because they are less confounded by the performance of other pitchers. Therefore, I'm opting not to report those values to reduce the clutter in this table.

Update (12/3/07): I discovered that I had accidentally been using the wrong replacement-level values in my spreadsheet. Starters got a bump upward, relievers got a bump downward. Oops!
Base Runs
Pitcher IP RAR
AHarang 231.7 61.4
BArroyo 210.7 25.9
KLohse 131.7 17.5
MBelisle 177.7 13.7
JBurton 43.0 13.5
DWeathers 77.7 11.7
HBailey 45.3 4.4
BSalmon 24.0 3.2
TShearn 32.7 2.2
JCoutlangus 41.0 1.9
BLivingston 56.3 1.6
BBray 14.3 0.4
EMilton 31.3 0.4
EGuardado 13.7 -1.0
MMcBeth 19.7 -1.4
RCormier 3.0 -1.4
ERamirez 16.3 -1.9
RStone 5.3 -2.8
VSantos 49.0 -3.4
MStanton 57.7 -5.2
GMajewski 23.0 -7.6
MGosling 33.0 -8.1
KSaarloos 42.7 -9.1
TCoffey 51.0 -10.4
PDumatrait 18.0 -17.1

FIP Runs
Pitcher IP RAR
AHarang 231.7 56.0
BArroyo 210.7 30.9
MBelisle 177.7 25.9
KLohse 131.7 20.4
DWeathers 77.7 8.3
BLivingston 56.3 8.0
JBurton 43.0 4.9
HBailey 45.3 4.9
EMilton 31.3 4.4
MStanton 57.7 3.1
BBray 14.3 2.6
MMcBeth 19.7 1.8
JCoutlangus 41.0 1.3
BSalmon 24.0 1.3
GMajewski 23.0 0.4
EGuardado 13.7 -0.3
RCormier 3.0 -1.3
TShearn 32.7 -2.6
ERamirez 16.3 -3.2
MGosling 33.0 -3.7
VSantos 49.0 -4.0
RStone 5.3
KSaarloos 42.7 -5.0
PDumatrait 18.0 -5.6
TCoffey 51.0 -6.4

Brief Notes:
  • Aaron Harang was clearly the most valuable pitcher on the staff. Duh. However, his value estimate of 61 runs above replacement also puts him well over top-ranked position player Brandon Phillips, who I estimated at just shy of 40 runs above replacement. Phillips' outstanding PMR ratings will give him a bit of a boost when I update those numbers. Nevertheless, it will not be enough to catch Harang, the 2007 Reds MVP.
    • As an aside, a 5.5 WAR pitcher, which is what Harang could potentially be projected to be in the future based on this analysis, is worth roughly $24 million/year as a free agent according to Tom Tango's pay scale. Even if he "declines" to average "just" 4 WAR a season from here on out, that's still worth $20 million/year. That 4-year, $37 million extension prior to this season is looking pretty darn good, eh? One of the moves that Krivsky really got right.
  • The player getting the biggest boost in the FIP Runs column is Matt Belisle, who goes from scrub to respectable. Belisle's peripherals weren't terribly different from those of Bronson Arroyo, but his BABIP was a tad high at 0.326, and his FIP (4.54) looks a lot better than his actual ERA (5.32). If he can post a mid-4s ERA next season, it would go a long way toward solidifying the Reds' rotation.
  • Falling the other direction was Jared Burton. Jared had a fine first season, especially given that he was making the jump from AA to MLB this year. Nevertheless, his walk rate (4.9 bb/g) was unacceptably high, and it will have to improve if he's going to continue to be successful out of the pen. Fortunately, at least as a trend, his control was much improved over the last month or two of the season, giving hope that he can be a real force in the bullpen next season.
  • What on earth happened to Todd Coffey?
Update (12/3/07): The reliever values reported above should be considered preliminary. As detailed in the comments below, I neglected to consider leverage when assessing reliever value. I'm working on an adjustment to correct this problem. Basically, David Weathers gets a nice boost. :)

The next in the player value series is a piece on run environments, including park factors and custom team linear weights. That might get delayed for a bit though--I'm writing for the Hardball Times Season Preview again this season, and that's going to occupy a lot of my time over the coming weeks. Should be fun! BTW, if you haven't already, go here and order both the Hardball Times Annual and the Season Preview together and get a 10% discount (use code HTC08)! :)
Aaron Harang photo by Charles Rex Arbogast


  1. Great post, I was waiting for your pitcher valuation piece, as I wanted to see how you handled leverage. I think you understate closers and set-up men by not including some form of pick-up for high-leverage innings in your valuation. Further, have you considered giving league-average run values for batted ball types (FB, GB, LD) instead of just FIP? Pitchers' ability to control batted ball types is high (with the exception of LD), and this would do a better job of giving credit to the groundball pitchers of the world. I think Harang (given his high K rate) may be overvalued in your system. Anyways, great series!

  2. I think Darren's got some good ideas. I think leverage should definitely play a role in the reliever valuation since their usage is usually determined (though not always) by how high leverage the situation is.

    I would disagree on overvaluing Harang, though. Methinks that Darren didn't have to watch the Reds' defense a lot this year. You can't overvalue strikeouts when the defense is iffy behind you. :)

    As for Harang's contract, the total value is a little deceptive since the Reds bought out 2 years of arbitration from him. Yes, he's still a good value in 2009 and 2010 (at around $11-12m), but the first two years are valued at only $11m because of the arb buyout. The bigger key is that it sure is nice to have young pitchers who perform well. Here's to 6 years of Bailey and Cueto giving the Reds 5 WAR! :)

  3. Good point Joel. Perhaps I should have said, the other groundball pitchers on the staff would be undervalued in relation to Harang, using FIP.

  4. Justin, as a THT contributor, have you had a chance to look at John Beamer's Markov Chain tool?

  5. Great comments folks.

    RE: Leverage and relievers.

    I'm open to suggestions on how to do this. My goal has been to have these sorts of estimates be calculable from a row of fairly traditional player statistics. I also want it to be (mostly) automated in Excel, so manually assigning roles isn't something I'd want to get into. I suppose I could incorporate pLI from FanGraphs into these numbers somehow, though whether or not that's appropriate to do with runs-based estimates is questionable. Anyway, I haven't researched leverage issues much, so links to example studies and methods papers would be very welcome.

    RE: overestimating the impact of high-strikeout pitchers via FIP.

    I see the issue, because high strikeout totals will depress one's HR/9, just as high GB%s will. We could always run the exact same process with QERA, which uses %K, %BB, and %GB data instead of K/9, BB/9, and HR/9. I'll see about running those numbers tonight or tomorrow night if I get a chance. That said, Harang is actually rated lower by the FIP system than by Base Runs, and his BABIP wasn't all that low (0.288). So I'm doubtful that the problem is all that severe.

    Re: Markov
    I haven't had a chance to see it yet--I'm just involved in the Season Preview (mostly). But I've been intrigued by the thing since he first announced it a month or two back. I don't think it'll be that much of an improvement, in terms of the actual estimates it provides, compared to the Base Runs method for estimating player value over a season (with the possible exception of relievers). But in terms of addressing questions about strategy, etc., it could be a fantastic resource.

  6. There is no over-estimate with the high K players.

    Here's how to test it. Grab the career records of all pitchers with at least 1000 BFP. There's 728 of them. Take the top 30 players in K per BFP.

    The actual ERA of these pitchers is 3.26. Their BaseRuns-based ERA is 3.31. Their FIP-based ERA is 3.41.

  7. Tango, thanks for that. I may still run QERA just for comparison's sake, but that's good to hear.

  8. Nice job, Justin. You could also use the list of Pitching Runs Created at THT, which is kind of like FIP aligned along a baseline of zero instead of bench level. It looks like the results are the same.

    FIP doesn't overvalue strikeout pitchers -- it values them most appropriately. Also, you won't gain anything by looking at the run values of batted ball type. Once you take out home runs (which FIP does), GB pitchers aren't different from FB pitchers as a rule. GB's are hits more often, but for less run value, so they tend to even out with FB's (which are hits less often but for more run value).

  9. Good stuff, once again.

    Am I right that your Baseruns numbers are not defense-independent? That seems like a necessary adjustment for grading any pitchers (although maybe I'm wrong and you did go that route.)

    As for leverage, I've played around with assigning reliever leverage based on skill and using a sliding scale. A player with skills implying a 3.00 ERA deserves to be used in higher leverage situations than a 4.00 ERA reliever. Because managers often misuse their relievers, leverage gets dished out in strange ways. In a WPA-style stat, you obviously have to use actual leverage. But if you want a context-neutral value stat, I'm perfectly fine with assigning leverage based on how a reliever should have been used.

  10. Just to back up Dave's point, once you remove HR from the FB pool, the OPS of non-HR FB is .448 versus .495 for grounders. Thanks FanGraphs!

    Of course, GB have the double play possibility which FB don't have nearly as often, so I can see how it comes out to be a virtual wash.

    FWIW, the OPS on non-HR LD is 1.600. So yeah, don't allow liners. Perhaps including LD% would get at the batted ball issue a bit better. That said, I don't know if LD% is a skill. But perhaps it doesn't matter, because what we're getting at with FIP is taking defense out of the equation, and it's not the defense's fault if the pitcher allows a line drive, "random" or not.

    Matt Belisle is a good example of this. His FIP last year was 4.54 compared to a 5.32 ERA. His BABIP was in line with expectation, thanks to a very high LD% -- 22.0%, 3rd among qualified starters. Should we blame his fielders for his allowing such a high rate of liners? Now, it might not be his "fault" either if it's a more-or-less random occurrence, but to place the blame on the defense is unfair too.

    What kind of regression to the mean can we expect on LD%? According to Dave Studeman in a THT article (I'm guessing this is you, Dave?), it appears that LD% is not a skill. However, it seems that analysis could use a revisit for confirmation, perhaps using the Pitch f/x angle.

    In any case, I do think we should control for hit type in some fashion if the goal is to truly understand the impact of the defense versus what the pitcher himself has done.

  11. Hi folks,

    Nice comments, thanks.

    The largest component of variation in BABIP (and thus, probably, FIP) is due to luck, as Tango et al.'s work has shown:
    luck: 44%
    pitcher: 28%
    fielding: 17%
    park: 11%

    I do think that it would be appropriate to adjust for fielding (like we do with park effects) when assessing pitcher value. A strictly fielding adjustment for pitcher value might be best done using a system like Pinto recently reported with PMR. But my experience with those numbers is that they usually don't make that much of a difference (+-5 runs on Reds starters last season), so I'm not super concerned about it.

    If we're assessing player value, though, I'm not sure that we should necessarily try to remove the effects of "luck." We certainly don't try to remove "luck" from hitters, where it can still be an important factor (though admittedly not to the same extent as with pitchers). Value is a different thing than performance, and when assessing performance we absolutely should be accounting for "luck." Which is most appropriate depends on the question we're asking.

    I think the two options I presented here--one based on Base Runs (no adjustments for luck...or fielding), and one based on FIP (adjustments for everything, including variation in pitcher skill at avoiding h/bip), presented together, give us a pretty nice picture of pitcher value and performance.

    The one modification I'm interested in potentially doing with these data is adjusting for leverage. Sky, your point about adjusting for leverage based on how pitchers should be used, rather than actual leverage, sort of speaks to the issues of value vs. performance! :) For value, I think we should use the actual values. But for performance, something like you mentioned might be more appropriate.

    However, from a practical standpoint, I can see a "what should leverage have been" approach being very helpful if it could be automated and estimated based on traditional pitching stat lines. The GuyM method that Tango's using at his blog to estimate monetary value might be adaptable to runs data... Sky, if you have any specific suggestions on methodology for this issue, please do drop me a line.

  12. I've tried to look at a simple way to include a context neutral leverage when I value pitchers. It's not perfect, but I would estimate how many innings a reliever would pitch in high leverage situations. For instance relievers with 35 save opportunities would have 35 innings at a leverage of 2.00. Or set up men with 35 holds would have 35 innings with a leverage of say 1.5. You can amend the estimated innings per save or hold event, but at least you can use readily available stats. All other innings would be a leverage of 1.00. For starting pitchers just leave the leverage at 1.00.

  13. Hi Darren,

    I've been fiddling around with just adjusting pLI based on quality of performance, as Sky mentioned above (he and I have been emailing back and forth a bit on this). I think the idea has quite a bit of merit.

    I understand your idea about using holds and saves instead. I may look at that too, though it seems a bit rougher than this approach because it wouldn't consider pitcher performance when a team was trailing slightly. Maybe that all evens out, though. And it does have the advantage of telling you something about how pitchers were actually used.

    Anyway, I'll hopefully have more on this in the coming day or two.

  14. Fangraphs shows the reliever/starter data broken down on the team pages. Since you are only dealing with one team, it should be a snap for you to give multiple rows for pitchers.
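Darren's save/hold heuristic from comment 12 can be sketched in Python as below. The defaults (one inning per save opportunity or hold, leverage 2.00 for saves and 1.5 for holds, 1.00 for everything else) follow his suggestion and are meant to be tuned. How the resulting leverage should then be applied to RAR is left open here, as it was in the thread:

```python
def estimated_leverage(ip, save_opps=0, holds=0, innings_per_event=1.0,
                       save_li=2.0, hold_li=1.5):
    """Average leverage from saves/holds; all remaining innings count at 1.00."""
    save_ip = save_opps * innings_per_event
    hold_ip = holds * innings_per_event
    other_ip = ip - save_ip - hold_ip
    if other_ip < 0:
        raise ValueError("save/hold innings exceed total innings")
    return (save_ip * save_li + hold_ip * hold_li + other_ip * 1.0) / ip


print(estimated_leverage(70, save_opps=35))  # 1.5 for a hypothetical closer
print(estimated_leverage(70, holds=35))      # 1.25 for a hypothetical set-up man
print(estimated_leverage(200))               # 1.0 for a starter
```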