Wednesday, December 05, 2007

Player Value, Part 5b: Leverage and Relievers

One of the biggest concerns folks had about the first piece on pitcher value was that I was missing something about the value of relievers because I wasn't accounting for the leverage of the situations in which they are used. While this is getting into an area in which I'm not particularly well-read, I think those criticisms are correct, and I'm going to take an initial stab at making this adjustment.

What is leverage?

If someone hooked me up to a heart-rate monitor during a ballgame, they'd note that my heart rate varies quite a bit over the course of the game. One run lead in the top of the third inning? I'm into the game, but fairly relaxed. Down eight runs in the fourth inning? I'm falling asleep. Tying run on third with no outs in the 9th? I'm on the verge of a heart attack.

The reason for the variation in fans' heart rates, degree of white-knuckleness, etc., over the course of a game is, of course, that different situations have different impacts on the outcome of the game. With a man on third and no outs in the 9th, every single pitch has a high likelihood of determining the outcome of the game. Down eight runs in the 4th? Well, in that case, the other team's chances of winning are so high that whatever happens next on the field is almost irrelevant to the outcome of the game.

Leverage is the term that baseball statisticians use to describe the importance of game situations. High leverage situations are those situations that are highly influential on the outcome of a game, whereas low-leverage situations don't mean a lot to the outcome of the game.

We can quantify the actual impact of events in a game by looking at changes in win probability. In this approach, folks have created a model of how likely teams are to win games based on the score, the inning, the number of men on base, and the number of outs. We can then monitor how this win probability changes over the course of the game. Let's say that the Reds are down by three runs in the bottom of the ninth, but load the bases with two outs. Certainly that's a situation that would still have the Reds losing more often than not (90% of the time, according to the model), but it's also a high leverage situation in that the Reds have a chance to pull out a win with one swing of the bat. Now, if Adam Dunn comes up and hits a grand slam to win the game, that's a huge improvement in the Reds' win probability, which changes from ~10% to 100%. In other words, with one swing of the bat, Adam Dunn's performance contributed 90% of a win to the team (we would credit him with +0.90 Win Probability Added [WPA] for that plate appearance).
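As a minimal sketch of the WPA bookkeeping, assuming the win probabilities themselves come from some external win-expectancy model (the function name here is hypothetical), the credit for a plate appearance is just the batting team's win probability after it minus its win probability before it:

```python
def win_probability_added(wp_before: float, wp_after: float) -> float:
    """Return the WPA credited to the batter for one plate appearance.

    Both arguments are the batting team's win probability (0.0 to 1.0),
    taken from a win-expectancy model before and after the plate appearance.
    """
    if not (0.0 <= wp_before <= 1.0 and 0.0 <= wp_after <= 1.0):
        raise ValueError("win probabilities must lie between 0 and 1")
    return wp_after - wp_before


# The Dunn example: roughly a 10% chance to win before the swing, 100% after.
dunn_wpa = win_probability_added(0.10, 1.00)
print(f"Dunn's grand slam: {dunn_wpa:+.2f} WPA")
```

An out in the same spot would have dropped the Reds from ~10% to 0%, i.e. -0.10 WPA for the batter.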

Leverage Index (LI) is an effort to quantify the importance of game situations, and is essentially calculated as the spread of how much win probability could change in a given situation. This spread, divided by the average spread across all possible game states, is the leverage index. Under this convention, a leverage index of 1.00 is, by definition, a situation with average leverage. The situation described above had an enormous spread in how win probability could change: -10% if Dunn makes an out, +90% if Dunn homers, with a bunch of other possibilities in between. In this case, the actual leverage index was 3.91, or ~4 times as important as an average game state. In contrast, if a team trails by eight runs in the 4th, then leverage index will be much less than one.
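Here's a simplified sketch of that ratio, taking "spread" to mean the probability-weighted average of the absolute change in win probability over the possible outcomes of the next plate appearance. The outcome probabilities, win-probability changes, and the league-average swing below are all made-up placeholders (real values come from play-by-play data), and the function names are hypothetical:

```python
from typing import Sequence, Tuple


def expected_swing(outcomes: Sequence[Tuple[float, float]]) -> float:
    """Probability-weighted average of |change in win probability|.

    outcomes: (probability, change_in_win_probability) pairs covering the
    possible results of the next plate appearance; probabilities sum to 1.
    """
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * abs(delta) for p, delta in outcomes)


def leverage_index(outcomes: Sequence[Tuple[float, float]], average_swing: float) -> float:
    """This state's expected swing relative to the average state's swing."""
    return expected_swing(outcomes) / average_swing


# Hypothetical inputs, chosen only to show how the ratio scales.
AVERAGE_SWING = 0.035  # placeholder for the all-states average swing
calm = [(0.70, -0.025), (0.30, 0.058)]
tense = [(0.70, -0.10), (0.25, 0.15), (0.05, 0.90)]
print(round(leverage_index(calm, AVERAGE_SWING), 2))
print(round(leverage_index(tense, AVERAGE_SWING), 2))
```

A state whose expected swing equals the average swing gets LI = 1.00 by construction; doubling the swing doubles the LI.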

Why is this important to estimating the value of relievers? Well, if a pitcher is, on average, used in situations with an average leverage index of 2.0, the runs he gives up are about twice as important, in terms of value to team wins, as those given up by a pitcher used in situations with an average leverage index of 1.0. And the 1.0 LI pitcher's runs, in turn, are ~twice as important as those given up by a pitcher used in situations with an average leverage index of 0.5.

The calculation of leverage index, by necessity, requires play-by-play data, which still isn't something I've started to work with. Fortunately, FanGraphs.com reports the average leverage index per plate appearance for all relievers in its pLI statistic. This stat essentially tells us the average leverage index under which each reliever pitched last season, and thus gives an indication of his opportunity to influence a ballgame based on his performance.

So how can we go about using pLI to adjust our estimates of reliever value? Well, let's start with a way that LI is often used with win probability statistics: WPA/LI. WPA/LI, as described by Tom Tango, is a "situation deflated" version of Win Probability Added (WPA; see above), and describes the change in win probability that would occur based on that player's performances if every single plate appearance had happened in a situation with average leverage (i.e., LI=1.00). Therefore, it takes WPA, which is heavily situation-dependent, and converts it to something that is much more situation independent, and thus similar to more traditional estimates of player performance in which all plate appearances are given equal weight.

Now, WPA/LI is not the same thing as WPA/pLI. WPA/LI is calculated on a per-PA basis. Because pLI is just the average LI of all PA, WPA/pLI will differ from WPA/LI depending on the PA-to-PA variation in WPA and LI. Nevertheless, at least in concept, it's trying to do the same thing, and gives us a basis for applying leverage to reliever runs data.
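To see why the two can differ, here's a toy sketch with made-up numbers for three plate appearances (the function names are hypothetical). WPA/LI deflates each PA by its own LI before summing; WPA/pLI deflates the season total by the average LI:

```python
from typing import Sequence


def wpa_over_li(wpa: Sequence[float], li: Sequence[float]) -> float:
    """Per-PA deflation: divide each PA's WPA by that PA's own LI, then sum."""
    return sum(w / l for w, l in zip(wpa, li))


def wpa_over_pli(wpa: Sequence[float], li: Sequence[float]) -> float:
    """Aggregate deflation: total WPA divided by pLI (mean LI over all PA)."""
    pli = sum(li) / len(li)
    return sum(wpa) / pli


# Hypothetical three-PA sample: one big success in a high-leverage spot,
# two small failures in low-leverage spots.
wpa = [0.20, -0.02, -0.01]
li = [4.0, 0.5, 0.5]
print(round(wpa_over_li(wpa, li), 3))   # about -0.01
print(round(wpa_over_pli(wpa, li), 3))  # about 0.102
```

When a pitcher's good and bad outings are spread unevenly across leverage levels, the two versions can even disagree in sign, as in this contrived case.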

If we use a 10 runs = 1 win approximator (commonly used, and consistent with the coefficients relating runs to wins that I showed in the first article of this series), then we can make this approximation:
RAA ~= WPA/pLI * 10

RAA is Runs Above Average, which mirrors WPA in that it's centered around league-average. If our goal is to get an estimate of value that is more dependent on the situations in which a reliever pitched, we're essentially asking for a value estimate that is more like Win Probability Added (WPA). So, with 9th-grade algebra, the above equation converts to:

WPA ~= RAA * pLI / 10
or
"RPA" = RAA * pLI = RARLI

In other words, we can simply multiply Runs Above Average by pLI to get something that approximates the situation-specific runs above average value of a reliever. Cool!
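The conversion above fits in a few lines. This is a sketch assuming the 10 runs = 1 win rule of thumb; the function names are hypothetical:

```python
RUNS_PER_WIN = 10.0  # the 10 runs = 1 win approximation used above


def raa_from_wpa(wpa: float, pli: float) -> float:
    """RAA ~= WPA / pLI * 10: strip out leverage, then convert wins to runs."""
    return wpa / pli * RUNS_PER_WIN


def leveraged_runs(raa: float, pli: float) -> float:
    """RAA * pLI: runs above average re-weighted by average leverage."""
    return raa * pli


def approx_wpa_from_raa(raa: float, pli: float) -> float:
    """WPA ~= RAA * pLI / 10."""
    return leveraged_runs(raa, pli) / RUNS_PER_WIN


# A hypothetical reliever worth +5 runs above average, used at three
# different average leverage levels:
for pli in (0.5, 1.0, 2.0):
    print(pli, leveraged_runs(5.0, pli), approx_wpa_from_raa(5.0, pli))
```

The same +5 RAA is worth 2.5 leveraged runs to a mop-up man at pLI 0.5 but 10 runs (about one win) to a closer at pLI 2.0.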

However, I like to report value relative to a replacement-level baseline, not average, and converting our situation-specific RAA number to RAR requires a slightly roundabout approach. If we revisit our reliever RAR equation from the previous article on pitchers, it was:

RAR = (RpG - 1.07*lgRpG) / 9 * IP * -1

which is the same as:

RAR = [(RpG - lgRpG) / 9 * IP * -1] + [(0.07*lgRpG) / 9 * IP]

This essentially takes a reliever's RAA estimate and adds the extra runs a replacement pitcher would be expected to give up relative to an average pitcher, which converts it from an RAA estimate to a RAR estimate. So, the above equation is the same as:

RAR = RAA + [(0.07*lgRpG) / 9 * IP]

So, to make this a situation-specific RAR estimate, we can use this equation:

RARLI = [RAA*pLI] + [(0.07*lgRpG) / 9 * IP]
or

RARLI = [(RpG - lgRpG) / 9 * IP * -1 * pLI] + [(0.07*lgRpG)/9*IP]

It's a little bit ugly. But it does the job.

Please note that I'm only going to use this equation for relievers. While starting pitchers do diverge from 1.00 leverage from time to time, those deviations tend to be more or less random. Relievers, on the other hand, deviate in consistent ways from average leverage based on how their managers choose to employ them.
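Here's a sketch of these equations in Python, assuming RpG and lgRpG are runs allowed per nine innings and IP is in decimal innings. The function names and the `starter` flag are hypothetical; the flag simply encodes the rule above that starters are treated as 1.00 LI pitchers:

```python
REPLACEMENT_FACTOR = 1.07  # replacement reliever allows 7% more runs than average


def raa(rpg: float, lg_rpg: float, ip: float) -> float:
    """Runs above average: (RpG - lgRpG) / 9 * IP * -1."""
    return -(rpg - lg_rpg) / 9.0 * ip


def replacement_runs(lg_rpg: float, ip: float) -> float:
    """Extra runs a replacement reliever would allow: (0.07 * lgRpG) / 9 * IP."""
    return (REPLACEMENT_FACTOR - 1.0) * lg_rpg / 9.0 * ip


def rar(rpg: float, lg_rpg: float, ip: float) -> float:
    """Reliever RAR: (RpG - 1.07 * lgRpG) / 9 * IP * -1."""
    return -(rpg - REPLACEMENT_FACTOR * lg_rpg) / 9.0 * ip


def rarli(rpg: float, lg_rpg: float, ip: float, pli: float, starter: bool = False) -> float:
    """Leverage-adjusted RAR: RAA * pLI + replacement runs.

    Only the above-average part is scaled by pLI; the replacement-level
    cushion depends on innings alone. Starters are forced to pLI = 1.00,
    so for them RARLI equals RAR.
    """
    if starter:
        pli = 1.0
    return raa(rpg, lg_rpg, ip) * pli + replacement_runs(lg_rpg, ip)


# A hypothetical reliever: 80 IP, 3.50 RpG in a 4.80 RpG league, pLI 1.9.
print(round(rar(3.5, 4.8, 80.0), 1), round(rarli(3.5, 4.8, 80.0, 1.9), 1))
```

Splitting RAR into `raa` plus `replacement_runs` mirrors the algebra above, so setting pLI to 1.00 collapses RARLI back to plain RAR.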

2007 Cincinnati Reds Pitchers, Take Two

How much of a difference does factoring reliever leverage into our estimates actually make? Well, here is the table from the previous article, expanded to also include leverage-based numbers for relievers (starters were forced to be 1.00 LI pitchers):

Base Runs
Pitcher         IP     RAR   pLI   RARLI
AHarang      231.7    61.4  1.00    61.4
BArroyo      210.7    25.9  1.00    25.9
DWeathers     77.7    11.7  1.95    20.0
KLohse       131.7    17.5  1.00    17.5
MBelisle     177.7    13.7  1.00    13.7
JBurton       43.0    13.5  0.95    12.9
HBailey       45.3     4.4  1.00     4.4
TShearn       32.7     2.2  1.00     2.2
JCoutlangus   41.0     1.9  1.06     2.0
BSalmon       24.0     3.2  0.46     1.9
BLivingston   56.3     1.6  1.00     1.6
BBray         14.3     0.4  0.99     0.4
EMilton       31.3     0.4  1.00     0.4
RStone         5.3    -2.8  0.33    -0.8
EGuardado     13.7    -1.0  0.95    -0.9
MMcBeth       19.7    -1.4  1.21    -1.9
ERamirez      16.3    -1.9  1.00    -1.9
RCormier       3.0    -1.4  1.40    -2.1
VSantos       49.0    -3.4  0.78    -2.3
MGosling      33.0    -8.1  0.62    -4.6
MStanton      57.7    -5.2  0.95    -4.8
KSaarloos     42.7    -9.1  1.05    -9.6
TCoffey       51.0   -10.4  0.94    -9.7
GMajewski     23.0    -7.6  1.32   -10.2
PDumatrait    18.0   -17.1  1.00   -17.1

FIP Runs
Pitcher         IP     RAR   pLI   RARLI
AHarang      231.7    56.0  1.00    56.0
BArroyo      210.7    30.9  1.00    30.9
MBelisle     177.7    25.9  1.00    25.9
KLohse       131.7    20.4  1.00    20.4
DWeathers     77.7     8.3  1.95    13.5
BLivingston   56.3     8.0  1.00     8.0
HBailey       45.3     4.9  1.00     4.9
JBurton       43.0     4.9  0.95     4.8
EMilton       31.3     4.4  1.00     4.4
MStanton      57.7     3.1  0.95     3.0
BBray         14.3     2.6  0.99     2.6
MMcBeth       19.7     1.8  1.21     2.0
JCoutlangus   41.0     1.3  1.06     1.2
BSalmon       24.0     1.3  0.46     1.1
GMajewski     23.0     0.4  1.32     0.3
EGuardado     13.7    -0.3  0.95    -0.2
RStone         5.3    -4.4  0.33    -1.3
MGosling      33.0    -3.7  0.62    -1.8
RCormier       3.0    -1.3  1.40    -1.9
TShearn       32.7    -2.6  1.00    -2.6
VSantos       49.0    -4.0  0.78    -2.7
ERamirez      16.3    -3.2  1.00    -3.2
KSaarloos     42.7    -5.0  1.05    -5.4
PDumatrait    18.0    -5.6  1.00    -5.6
TCoffey       51.0    -6.4  0.94    -5.9

The biggest difference we see between the first set of RAR numbers and the RARLI numbers, within both the Base Runs and FIP-based estimates, is that David Weathers' value gets a considerable boost. This reflects the excellent job the Reds' managers did in using him in high-leverage situations this season, often bringing him in to get an out or two during the 8th inning. Similarly, we see Gary Majewski's negative Base Runs value exaggerated (appropriately) because he performed terribly in high-leverage situations this year.

On the other side of the coin, the low leverage of the innings in which they pitched mitigated the negative value of several Reds pitchers, including Victor Santos, Ricky Stone, and Michael Gosling. While from a performance evaluation standpoint, this seems to let those pitchers off the hook, it seems appropriate to do this from the standpoint of assessing the value of these players to the 2007 Cincinnati Reds.

What if you don't have or don't want to deal with pulling pLI from FanGraphs?

Updated 11 January 2008
As discussed earlier, when thinking about reliever value, it's insufficient to strictly consider the rate at which they give up runs because some runs are more valuable than others. Closers, in particular, tend to pitch in high leverage situations, and therefore should get more "credit" for their ability to pitch above reliever replacement level than a pitcher who only pitches in games that have a lopsided score.

For players since 2002, we can get actual pLI data from FanGraphs, and I discussed how to employ those data to adjust reliever run value estimates previously. However, what if you want to look at reliever value among players who played prior to 2002, like in my proposed series on past winning Reds teams? In that situation, you'd need some way of inferring reliever usage from other statistics.

One way to try to do this is by looking at performance: better pitchers should be used in higher-leverage situations. However, when attempting this approach, I've found that there's just very little predictive power (i.e., a huge amount of scatter), even though there is a significant relationship between ERA (or FIP) and pLI. Whether that's due to within-team competition, inconsistent reliever performance, or poor decisions by managers, performance is just not a very good way to predict pLI.

On the other hand, as Darren implied, even in historical databases like Lahman's, we have at least one statistic that tells us something about usage: saves. Saves are well documented to be a rather poor indicator of reliever quality. Nevertheless, they do tell you who was pitching in the 9th inning of a team's games, which tends to be the inning with the highest leverage. So we should be able to use saves to infer something about reliever usage. Here's what I did:

Methods

I pulled stats, including both traditional pitching statistics and pLI, from FanGraphs on all pitchers, 2002-2007, who threw at least 25 innings in relief in a season. There is some selection bias in such a sample, because it will tend to exclude a lot of bad pitchers who weren't given the opportunity to throw 25 IP. But it still includes pitchers who span much of the range in terms of performance, and it gets around the issue of dealing with stats on pitchers with extremely small samples (not that 25 IP is a big sample...).

Next, I calculated saves per inning (Srate) as an indication of the proportion of a pitcher's innings that were associated with saves:

Srate = Saves/IP

It's important to use a rate because you want to know something about a player's opportunities. If someone gets 20 saves in 20 innings, they're probably pitching in much higher leverage situations, on average, than someone who gets 20 saves in 70 innings. Ideally, I'd also use blown saves--and maybe holds--but those stats are not available in the Lahman database or on baseball-reference's team pages, so I'm going to ignore them for now.
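A small sketch of the save-rate calculation, with hypothetical function names. One assumption to flag: Lahman's Pitching table records innings as outs (the IPouts column), so the helper below converts outs to decimal innings before dividing:

```python
def ip_from_outs(ip_outs: int) -> float:
    """Convert outs recorded (e.g. Lahman's IPouts column) to decimal innings."""
    return ip_outs / 3.0


def save_rate(saves: int, ip: float) -> float:
    """Srate = Saves / IP, with ip in decimal innings (77 2/3 IP -> 77.67)."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return saves / ip


# The two hypothetical relievers from the paragraph above:
print(round(save_rate(20, 20.0), 2))  # 1.0
print(round(save_rate(20, 70.0), 2))  # 0.29
print(round(save_rate(20, ip_from_outs(210)), 2))  # 210 outs = 70 IP -> 0.29
```

Box-score innings such as "77.2" mean 77 2/3, not 77.2, so they need the same kind of conversion before being used as a denominator.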

I also converted pLI to a "rate" statistic using the approach suggested by Tom Tango:

rateLI = pLI/(pLI+1)

Such that:
pLI = 2 ---- rateLI = 0.667
pLI = 1 ---- rateLI = 0.500
pLI = 0.5 ---- rateLI = 0.333

This was important because as a pure ratio, pLI changes at a faster rate above 1.0 than it does below 1.0, which makes it hard to model using a regression-based approach.
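Here's a sketch of the transform and its inverse (function names are hypothetical). It also shows why the rate version behaves better: pLI = k and pLI = 1/k land equally far from 0.5, since k/(k+1) + 1/(k+1) = 1, so doubling and halving leverage become symmetric moves.

```python
def rate_li(pli: float) -> float:
    """rateLI = pLI / (pLI + 1): maps pLI >= 0 onto [0, 1), with 1.0 -> 0.5."""
    if pli < 0:
        raise ValueError("pLI cannot be negative")
    return pli / (pli + 1.0)


def pli_from_rate(rate: float) -> float:
    """Inverse transform: pLI = rateLI / (1 - rateLI)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rateLI must be in [0, 1)")
    return rate / (1.0 - rate)


for pli in (2.0, 1.0, 0.5):
    print(pli, round(rate_li(pli), 3))
```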

Anyway, here's a plot of Srate vs. rateLI:
Obviously, that's a pretty ugly-looking relationship down in the zero/low-saves groups. But as you can see, among pitchers who actually have a modest number of saves, there's a pretty nice relationship between Srate and pLI. In other words, once someone starts to get saves, you can reasonably predict that he'll have an above-average pLI, and his pLI should steadily increase from there as his save rate rises.

I decided to run with this and, in what I completely admit is a really terrible abuse of regression math (I've violated just about every assumption one can violate), I fitted a line to this relationship. I found that a second-order polynomial seemed to fit the data well. Furthermore, I forced the y-intercept to come in at a rateLI=0.5 (pLI=1.0), such that the average pitcher without saves is expected to pitch in average leverage (otherwise, the equation tended to predict that the vast majority of pitchers would have a pLI=0.8, and that's not reasonable). Here's the equation:
rateLI = -0.3764*(Srate^2) + 0.5034*Srate + 0.5

which we can convert back to pLI by:

pLI = rateLI/(1-rateLI)
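For anyone who wants to repeat the fit, here's a rough sketch of a second-order polynomial regression with the intercept pinned at 0.5, using NumPy's least-squares solver. The pinning is done by subtracting 0.5 from rateLI and leaving the constant column out of the design matrix. The data below are synthetic, generated from the coefficients above plus noise purely so the example runs on its own; they are not the 2002-2007 FanGraphs sample, and the function name is hypothetical.

```python
import numpy as np


def fit_pinned_quadratic(srate, rate_li, intercept=0.5):
    """Least-squares fit of rateLI = a*Srate^2 + b*Srate + intercept.

    The intercept is held fixed (0.5, i.e. pLI = 1.0 for a pitcher with no
    saves) by subtracting it from the response and omitting the constant
    column from the design matrix.
    """
    s = np.asarray(srate, dtype=float)
    y = np.asarray(rate_li, dtype=float) - intercept
    design = np.column_stack([s ** 2, s])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)


# Synthetic data for illustration only: generated from the published
# coefficients plus noise, to show the fit recovers them.
rng = np.random.default_rng(42)
srate = rng.uniform(0.0, 0.7, size=500)
rate = -0.3764 * srate ** 2 + 0.5034 * srate + 0.5 + rng.normal(0.0, 0.01, size=500)
a, b = fit_pinned_quadratic(srate, rate)
print(f"rateLI = {a:.4f}*Srate^2 + {b:.4f}*Srate + 0.5")
```

Dropping the constant column is what "forcing the y-intercept" amounts to in least-squares terms; with a free intercept the fit would be chosen by the data instead.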


Now, this rather shaky regression equation isn't something that I'd try to publish in the SABR newsletter, much less an academic journal. It's not built upon rigorous math. But it actually works pretty darn well. For demonstration, here's a table showing a hypothetical pitcher who has thrown 70 innings, and how his predicted pLI changes as the number of saves (and thus his Srate) increases:
Saves (70 IP)   Srate   rateLI   pLI
0               0.00    0.50     1.0
5               0.07    0.53     1.1
10              0.14    0.56     1.3
15              0.21    0.59     1.4
20              0.29    0.61     1.6
25              0.36    0.63     1.7
30              0.43    0.65     1.8
35              0.50    0.66     1.9
40              0.57    0.66     2.0
45              0.64    0.67     2.0
50              0.71    0.67     2.0
As you can see, the numbers plateau at around a pLI of 2.0, which is about where MLB closers tend to top out. David Weathers, for example, had an Srate of 0.42 last season and an actual pLI of 1.95, which isn't far from the pLI of about 1.8 that this method predicts for him. Pitchers with a smaller number of saves per IP--setup men, mostly--are assumed to have above-average but still relatively moderate leverage. Finally, guys without saves are assumed to have average leverage.
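The 70-IP table can be regenerated from the fitted equation. Here's a minimal sketch with hypothetical function names and the coefficients copied from the equation above:

```python
A, B, C = -0.3764, 0.5034, 0.5  # coefficients from the fitted equation above


def predicted_rate_li(srate: float) -> float:
    """rateLI = -0.3764*Srate^2 + 0.5034*Srate + 0.5."""
    return A * srate ** 2 + B * srate + C


def predicted_pli(saves: int, ip: float) -> float:
    """Predict pLI from saves and innings: Srate -> rateLI -> pLI."""
    rate = predicted_rate_li(saves / ip)
    return rate / (1.0 - rate)


print("Saves  Srate  rateLI  pLI")
for saves in range(0, 55, 5):
    srate = saves / 70.0
    print(f"{saves:5d}  {srate:5.2f}  {predicted_rate_li(srate):6.2f}  {predicted_pli(saves, 70.0):3.1f}")
```

The quadratic peaks at Srate = 0.5034/(2*0.3764), about 0.67, where rateLI is roughly 0.668 (pLI about 2.0). That's the plateau visible in the table; beyond that point the curve bends slightly back down.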

Anyway, I think that this is a pretty reasonable way to adjust for historical reliever leverage, at least among closers. Obviously, we're going to undervalue some relievers who aren't in the closer role but pitch in lots of high-leverage situations in the 7th or 8th innings. But I think this approach will capture a lot of what we're trying to do with a reliever leverage adjustment.

David Weathers photo by Getty Images/David Maxwell

11 comments:

  1. Interesting. I was surprised to see that leverage took Burton's numbers down. After the All-Star break, 24 of his 36 appearances were in the 7th inning or later in games that were within 3 runs. Granted, he didn't pitch in any high-leverage situations before the break, but that was only 11 games. His numbers after the break were comparable to David Weathers', who pitched in 22 of 33 games within 3 runs. Of course, Weathers almost always pitched later innings, so I would expect his pLI to be higher, but I didn't expect Burton's to hurt his value.

    Overall I think this is an interesting addition to the measurement. I'm not someone who can validate the technique, but conceptually, I think it's the right direction. I like the fact that it pushes Weathers up above Belisle and Lohse, at least on the Base Runs side. That seems more appropriate (though I can see the argument that starters have more inherent value based on the number of innings they pitch).

  2. Joel, you'd be surprised how many situations after the 7th inning and within three runs actually aren't that high leverage. I'm going off the top of my head here, but a ninth-inning save with a three-run lead isn't much more important than an average situation.

    In talking with Justin, I agree that you definitely want to use actual LI to judge a player's value in a prior season. But going forward or to judge a reliever's value in a context-neutral review, I think you want to come up with a model that assigns LIs to pitchers based on their talent. For example, take a reliever who has a 3.50 ERA talent. His manager might use him as a closer (2.0 LI) or as a mop-up guy (.5 LI). His talent deserves, however, to be used as a set-up guy, with maybe a 1.25 or a 1.50 LI. So you should value him with THAT LI.

    Because not all situations are created equal, not all runs are worth the same. And when you have relievers with varying talents, they should be used in different ways and have their skills rewarded in different ways. There's a reason relievers are important even though they only throw 80 innings. That needs to be taken into account.

  3. Sky, I realize that, and I didn't mean to imply that all of those situations were created equal. It's just that it seemed like Burton pitched in a lot of tight situations last year, so it surprised me that he didn't have a higher pLI. I'm not saying the system is wrong, just noting how it didn't align with my perception. I'm just not used to my perception being wrong. :)

  4. I'm in way over my head, but isn't what you're really looking at in "predictive pLI" the manager's usage pattern? Or the manager's usage pattern compared to his own pattern the prior year? (Or more accurately, whoever managed that reliever the prior year?)

    Again, I may be embarrassing myself here, but what I'm most interested in is seeing how Dusty Baker has historically used his bullpens -- best pitcher in highest-leverage situations, or designated closer in the 9th, actual leverage be damned.

  5. Chris, can we call that ppLI?

    Sure, that's one use of a predictive LI. Given a team and a manager, how will each pitcher be used, and therefore how valuable will he be.

    My interest takes another step from "value" towards "talent". Given a 3.50 ERA pitcher, how valuable is that? For starters, you simply do (repERA - ERA) * IP/9. But the bullpen is one place where runs are definitely not created equal. I'm interested in knowing how valuable the typical 3.50 ERA reliever is. Given all the other relievers out there, how will he typically be used? Or, how should he be used?

    It might be worth it to sign one 2.50 ERA reliever to a $10 million contract, because his LI can be over 2.0. But you wouldn't sign a second at that price because you won't have 150 innings at a 2.0 LI to split between two guys. The next guy might have a 1.35 LI. The third guy in the bullpen might be at 1.0, then .9, .75, and .6.

    Therefore, 2.50 ERA guys deserve 2.0 LIs. But a 3.50 ERA guy won't get paid to close, making him less valuable beyond his ERA disadvantage. The value equation becomes (repERA - ERA) * IP/9 * ppLI. ppLI is 1 for starters and is a function of ERA for relievers. Except it's difficult to find a function that maps ERA onto leverage thanks to crazy managerial usage patterns, injuries, flukey ERAs, changing skills, etc.

  6. Thanks for your comments, folks. I'm confident that this is a perfectly acceptable way to evaluate historical value. We could probably do the same thing to every player to get their numbers closer to WPA if we wanted, but pLI's are likely to be repeatable only for relievers, so those are the only ones I'm interested in evaluating. The one exception might be an almost exclusive pinch-hit specialist like Lenny Harris back in the day. Would be interesting to look at that some time.

    With respect to Burton, he started the 8th inning a lot of times last year, didn't he? It seemed like they preferred to use him as an 8th-inning closer rather than as someone who came in to shut down opponents. I could be wrong about that, however. Even so, the adjustment really didn't affect his value much at all because his pLI was so close to 1. Given that he was really only a factor in the second half of the season, the above rankings make sense to me with respect to where he fits in vs. other players.

    @Chris, if you're interested in seeing Baker's usage of relievers, just pop over to FanGraphs and look up the leverage for his Chicago teams. They've got data going back to '02. FWIW, my initial take on those data is that Baker tends to have one or two other pitchers in his reliever corps that have pLI's well over 1.0. That means to me that he's using lesser relievers in some high leverage spots...though to be fair, sometimes the difference between his setup guy and his closer was not terribly big (e.g. Dempster vs. Howry).

    @Sky, don't forget about the nature of the variable we're trying to predict. I really do think that the fact that pLI is a ratio makes it really difficult to use in a simple linear or even a polynomial regression. It might be worth trying to calculate a separate regression above and below 1.0! Or, perhaps we should change the way we define leverage...more trouble than I want to deal with for now! :)
    -j

  7. To help in your regression, you can try to convert LI into a rate.

    rateLI = LI/(LI+1)

    An average LI will have a rateLI of .500. League leaders have an LI of around 2, so they get a rateLI of .667. League trailers have an LI of around 0.5, so they get a rateLI of .333.

    If the team ERA for relievers is 4.50, with a range from 3.0 to 6.0, you can probably do:
    rateLI = (tmERA - ERA)/10 + .5

    So, an ERA of 3.00 ends up with an expected rateLI of .65. And a rateLI of .65 implies LI = .65/.35 = 1.86.

    Seems like a reasonable approach...

  8. Hi Tango, thanks for that. Neat set of ideas, and it does make sense. I may give that a go at some point here, though Sky, you might get to it before me (which is fine!).
    -j

  9. Perhaps a better way would be:

    LI=(tmERA/ERA)^x

    where tmERA is the ERA of the team's relievers, and x is some number like 1.4 or 1.5.

    Play around with it, and you'll probably get something decent.

  10. I'm in over my head here, but can anyone point me to a mathematical model that predicts win/lose outcomes based on the score differential in each inning?

    To what extent does 1/0 in the first predict the outcome? 4/2 in the eighth? etc. I'm interested in this without regard to team or any other factors... just based on the score.

    Someone must have discussed it somewhere! thanks

  11. Miles, I don't have a formula, but here are a few links that may be helpful regarding win probability:

    Dave Studeman's article

    An empirically-derived Win Probability tool

    Also, if I remember correctly, there are complete win probability tables in Tango, Lichtman, and Dolphin's The Book (which is also highly recommended).
    -j
