Wednesday, December 05, 2007

Player Value, Part 5b: Leverage and Relievers

Among the biggest concerns that folks had about the first piece on pitcher value is that I was missing something about the value of relievers because I wasn't accounting for the leverage of the situations in which they are used. While this is getting into an area in which I'm not particularly well-read, I think those criticisms are correct and I'm going to take an initial stab at making this adjustment.

What is leverage?

If someone hooked me up to a heart-rate monitor during a ballgame, they'd note that my heart rate varies quite a bit over the course of the game. One run lead in the top of the third inning? I'm into the game, but fairly relaxed. Down eight runs in the fourth inning? I'm falling asleep. Tying run on third with no outs in the 9th? I'm on the verge of a heart attack.

The reason for the variation in fans' heart rates, degree of white-knuckleness, etc., over the course of a game is that different situations have different impacts on the outcome of the game. With a man on third with no outs in the 9th, every single pitch has a high likelihood of determining the outcome of the game. Down eight runs in the 4th? Well, in that case, the other team's chances of winning are so high that whatever happens next on the field is almost irrelevant to the outcome of the game.

Leverage is the term that baseball statisticians use to describe the importance of game situations. High leverage situations are those situations that are highly influential on the outcome of a game, whereas low-leverage situations don't mean a lot to the outcome of the game.

We can quantify the actual impact of events in a game by looking at changes in win probability. In this approach, folks have created a model of how likely teams are to win games based on the score, the inning, the number of men on base, and the number of outs. We can then monitor how this win probability changes over the course of the game. Let's say that the Reds are down by three runs in the bottom of the ninth, but load the bases with two outs. Certainly that's a situation that would still have the Reds losing more often than not (90% of the time, according to the model), but it's also a high leverage situation in that the Reds have a chance to pull out a win with one swing of the bat. Now, if Adam Dunn comes up and hits a grand slam to win the game, that's a huge improvement in the Reds' win probability, which changes from ~10% to 100%. In other words, with one swing of the bat, Adam Dunn's performance contributed 90% of a win to the team (we would credit him with +0.90 Win Probability Added [WPA] for that plate appearance).
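As a minimal sketch of that bookkeeping (the function name is my own, and the win probabilities are just the ones quoted in the Dunn example, not pulled from an actual win probability table), WPA for a single event is simply the win probability after the event minus the win probability before it:

```python
def win_probability_added(wp_before: float, wp_after: float) -> float:
    """Change in a team's win probability caused by a single event."""
    return wp_after - wp_before


# Values quoted in the Dunn example in the text.
wp_before = 0.10  # Reds down three, bases loaded, two outs, bottom of the 9th
wp_after = 1.00   # grand slam ends the game

dunn_wpa = win_probability_added(wp_before, wp_after)
print(f"WPA for the plate appearance: {dunn_wpa:+.2f}")
```

Summing these per-event changes over a season is what gives a player's total WPA.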

Leverage Index (LI) is an effort to quantify the importance of game situations, and is essentially calculated as the relative spread of how much win probability could change in a given situation. This spread, divided by the average spread across all possible game states, is the leverage index. Under this convention, a leverage index of 1.00 is, by definition, a situation with average leverage. The situation described above had an enormous spread in how win probability could change: -10% if Dunn makes an out, +90% if Dunn homers, with a bunch of other possibilities in between. In this case, the actual leverage index was 3.91, or ~4 times as important as an average game state. In contrast, if a team trails by eight runs in the 4th, then leverage index will be much less than one.

Why is this important to estimating the value of relievers? Well, if a pitcher is, on average, used in situations with an average leverage index of 2.0, the runs he gives up are about twice as important, in terms of value to team wins, as those given up by a pitcher used in situations with an average leverage index of 1.0. And the 1.0 LI pitcher's runs, in turn, are ~twice as important as those given up by a pitcher used in situations with an average leverage index of 0.5.

The calculation of leverage index, by necessity, requires play-by-play data, which still isn't something I've started to work with. Fortunately, FanGraphs.com reports the average leverage index per plate appearance for all relievers in its pLI statistic. This stat essentially tells us the average leverage index under which each reliever pitched last season, and thus gives an indication of his opportunity to influence a ballgame based on his performance.

So how can we go about using pLI to adjust our estimates of reliever value? Well, let's start with a way that LI is often used with win probability statistics: WPA/LI. WPA/LI, as described by Tom Tango, is a "situation deflated" version of Win Probability Added (WPA; see above), and describes the change in win probability that would occur based on that player's performances if every single plate appearance had happened in a situation with average leverage (i.e., LI=1.00). Therefore, it takes WPA, which is heavily situation-dependent, and converts it to something that is much more situation independent, and thus similar to more traditional estimates of player performance in which all plate appearances are given equal weight.

Now, WPA/LI is not the same thing as WPA/pLI. WPA/LI is calculated on a per-PA basis. Because pLI is just the average LI of all PA, WPA/pLI will differ from WPA/LI depending on the PA-to-PA variation in WPA and LI. Nevertheless, at least in concept, it's trying to do the same thing, and gives us a basis for applying leverage to reliever runs data.
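To see why the two can differ, here is a small sketch with made-up numbers. The per-PA WPA and LI values below are purely hypothetical, invented for illustration; they are not from any real pitcher.

```python
# Hypothetical per-plate-appearance values, invented purely for illustration.
wpa_per_pa = [0.20, -0.05, 0.02, -0.10]
li_per_pa = [2.50, 0.50, 0.40, 1.60]

# WPA/LI: deflate each plate appearance by its own leverage, then sum.
wpa_li = sum(w / li for w, li in zip(wpa_per_pa, li_per_pa))

# WPA/pLI: sum WPA first, then divide by the average leverage (pLI).
pli = sum(li_per_pa) / len(li_per_pa)
wpa_over_pli = sum(wpa_per_pa) / pli

print(f"WPA/LI  = {wpa_li:+.4f}")
print(f"pLI     = {pli:.2f}")
print(f"WPA/pLI = {wpa_over_pli:+.4f}")
```

With these particular numbers the two measures even disagree in sign, because the good outcomes happened to come in higher-leverage spots than the bad ones. Over a full season the gap should usually be much smaller than in a four-PA toy example, but it is the reason the runs-to-WPA conversion here is only an approximation.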

If we use a 10 runs = 1 win approximator (commonly used, and consistent with the coefficients relating runs to wins that I showed in the first article of this series), then we can make this approximation:
RAA ~= WPA/pLI * 10

RAA is Runs Above Average, which mirrors WPA in that it's centered around league-average. If our goal is to get an estimate of value that is more dependent on the situations in which a reliever pitched, we're essentially asking for a value estimate that is more like Win Probability Added (WPA). So, with 9th-grade algebra, the above equation converts to:

WPA ~= RAA * pLI / 10
or
"RPA" = RAA * pLI

In other words, we can simply multiply Runs Above Average by pLI to get something that approximates the situation-specific runs above average value of a reliever. Cool!

Nevertheless, I like to report value relative to a replacement-level baseline, not average. And converting our situation-specific RAA number to RAR requires a slightly round-about approach. If we revisit our reliever RAR equation from the previous article on pitchers, it was:

RAR = (RpG - 1.07*lgRpG) / 9 * IP * -1

which is the same as:

RAR = [(RpG - lgRpG) / 9 * IP * -1] + [(0.07*lgRpG) / 9 * IP]

The second term is simply the additional runs a replacement pitcher would be expected to give up relative to an average pitcher. Adding it to a reliever's RAA estimate converts it from an RAA estimate to an RAR estimate. So, the above equation is the same as:

RAR = RAA + [(0.07*lgRpG) / 9 * IP]

So, to make this a situation-specific RAR estimate, we can use this equation:

RARLI = [RAA*pLI] + [(0.07*lgRpG) / 9 * IP]
or

RARLI = [(RpG - lgRpG) / 9 * IP * -1 * pLI] + [(0.07*lgRpG)/9*IP]

It's a little bit ugly. But it does the job.
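For anyone who wants to plug numbers in, the equations above can be sketched as a few small functions. The function and variable names are my own, and the inputs in the example (a 4.00 RpG reliever in a 4.80 lgRpG league, 70 IP, pLI of 1.8) are hypothetical values chosen only to show the mechanics.

```python
REPLACEMENT_PREMIUM = 0.07  # reliever replacement level is 1.07 * lgRpG


def raa(rpg: float, lg_rpg: float, ip: float) -> float:
    """Runs Above Average: positive when the pitcher allows fewer runs than league."""
    return (rpg - lg_rpg) / 9 * ip * -1


def replacement_adjustment(lg_rpg: float, ip: float) -> float:
    """Extra runs a replacement reliever would allow over an average pitcher."""
    return REPLACEMENT_PREMIUM * lg_rpg / 9 * ip


def rar(rpg: float, lg_rpg: float, ip: float) -> float:
    """Runs Above Replacement, with no leverage adjustment."""
    return raa(rpg, lg_rpg, ip) + replacement_adjustment(lg_rpg, ip)


def rarli(rpg: float, lg_rpg: float, ip: float, pli: float) -> float:
    """Leverage-adjusted RAR: only the RAA term is scaled by pLI."""
    return raa(rpg, lg_rpg, ip) * pli + replacement_adjustment(lg_rpg, ip)


# Hypothetical reliever, for illustration only.
example_rar = rar(rpg=4.00, lg_rpg=4.80, ip=70.0)
example_rarli = rarli(rpg=4.00, lg_rpg=4.80, ip=70.0, pli=1.8)
print(f"RAR   = {example_rar:.1f}")
print(f"RARLI = {example_rarli:.1f}")
```

Note that only the RAA term is multiplied by pLI; the replacement-level adjustment is left alone, so a pitcher with pLI = 1.00 gets exactly his original RAR back.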

Please note that I'm only going to use this equation for relievers. While starting pitchers do diverge from 1.00 leverage from time to time, those deviations tend to be more or less random. Relievers, on the other hand, deviate in consistent ways from average leverage based on how their managers choose to employ them.

2007 Cincinnati Reds Pitchers, Take Two

How much of a difference does factoring reliever leverage into our estimates actually make? Well, here is the table from the previous article, expanded to also include leverage-based numbers for relievers (starters were forced to be 1.00 LI pitchers):

Base Runs

Pitcher          IP     RAR    pLI   RARLI
AHarang       231.7    61.4   1.00    61.4
BArroyo       210.7    25.9   1.00    25.9
DWeathers      77.7    11.7   1.95    20.0
KLohse        131.7    17.5   1.00    17.5
MBelisle      177.7    13.7   1.00    13.7
JBurton        43.0    13.5   0.95    12.9
HBailey        45.3     4.4   1.00     4.4
TShearn        32.7     2.2   1.00     2.2
JCoutlangus    41.0     1.9   1.06     2.0
BSalmon        24.0     3.2   0.46     1.9
BLivingston    56.3     1.6   1.00     1.6
BBray          14.3     0.4   0.99     0.4
EMilton        31.3     0.4   1.00     0.4
RStone          5.3    -2.8   0.33    -0.8
EGuardado      13.7    -1.0   0.95    -0.9
MMcBeth        19.7    -1.4   1.21    -1.9
ERamirez       16.3    -1.9   1.00    -1.9
RCormier        3.0    -1.4   1.40    -2.1
VSantos        49.0    -3.4   0.78    -2.3
MGosling       33.0    -8.1   0.62    -4.6
MStanton       57.7    -5.2   0.95    -4.8
KSaarloos      42.7    -9.1   1.05    -9.6
TCoffey        51.0   -10.4   0.94    -9.7
GMajewski      23.0    -7.6   1.32   -10.2
PDumatrait     18.0   -17.1   1.00   -17.1

FIP Runs

Pitcher          IP     RAR    pLI   RARLI
AHarang       231.7    56.0   1.00    56.0
BArroyo       210.7    30.9   1.00    30.9
MBelisle      177.7    25.9   1.00    25.9
KLohse        131.7    20.4   1.00    20.4
DWeathers      77.7     8.3   1.95    13.5
BLivingston    56.3     8.0   1.00     8.0
HBailey        45.3     4.9   1.00     4.9
JBurton        43.0     4.9   0.95     4.8
EMilton        31.3     4.4   1.00     4.4
MStanton       57.7     3.1   0.95     3.0
BBray          14.3     2.6   0.99     2.6
MMcBeth        19.7     1.8   1.21     2.0
JCoutlangus    41.0     1.3   1.06     1.2
BSalmon        24.0     1.3   0.46     1.1
GMajewski      23.0     0.4   1.32     0.3
EGuardado      13.7    -0.3   0.95    -0.2
RStone          5.3    -4.4   0.33    -1.3
MGosling       33.0    -3.7   0.62    -1.8
RCormier        3.0    -1.3   1.40    -1.9
TShearn        32.7    -2.6   1.00    -2.6
VSantos        49.0    -4.0   0.78    -2.7
ERamirez       16.3    -3.2   1.00    -3.2
KSaarloos      42.7    -5.0   1.05    -5.4
PDumatrait     18.0    -5.6   1.00    -5.6
TCoffey        51.0    -6.4   0.94    -5.9

The biggest difference we see between the first set of RAR numbers and the RARLI numbers, within both the Base Runs and FIP-based estimates, is that David Weathers' value gets a considerable boost. This reflects the excellent job that Reds' managers did in using him in high-leverage situations this season, often coming in to get an out or two during the 8th inning. Similarly, we see Gary Majewski's negative BaseRuns value exaggerated (appropriately) due to the fact that he performed terribly in high-leverage situations this year.

On the other side of the coin, the low leverage of the innings in which they pitched mitigated the negative value of several Reds pitchers, including Victor Santos, Ricky Stone, and Michael Gosling. While from a performance evaluation standpoint, this seems to let those pitchers off the hook, it seems appropriate to do this from the standpoint of assessing the value of these players to the 2007 Cincinnati Reds.

What if you don't have pLI, or don't want to deal with pulling it from FanGraphs?

Updated 11 January 2008
As discussed earlier, when thinking about reliever value, it's insufficient to strictly consider the rate at which they give up runs because some runs are more valuable than others. Closers, in particular, tend to pitch in high leverage situations, and therefore should get more "credit" for their ability to pitch above reliever replacement level than a pitcher who only pitches in games that have a lopsided score.

For players since 2002, we can get actual pLI data from FanGraphs, and I discussed how to employ those data to adjust reliever run value estimates previously. However, what if you want to look at reliever value among players who played prior to 2002, like in my proposed series on past winning Reds teams? In that situation, you'd need some way of inferring reliever usage from other statistics.

One way to try to do this is by looking at performance--better pitchers should be used in higher-leverage situations. However, when attempting this approach, I've found that there's just very little predictive power (i.e. huge amount of scatter), even though there is a significant relationship between ERA (or FIP) and pLI. Whether that's due to within-team competition, inconsistent reliever performance, or poor decisions by managers, performance is just not a very good way to predict pLI.

On the other hand, as Darren implied, even in historical databases like Lahman's, we have at least one statistic that tells us something about usage: saves. Saves are well documented to be a rather poor indicator of reliever quality. Nevertheless, they do tell you who was pitching in the 9th inning of a team's games, which tends to be the inning with the highest leverage. So we should be able to use saves to infer something about reliever usage. Here's what I did:

Methods

I pulled stats, including both traditional pitching statistics and pLI, from fangraphs on all pitchers, 2002-2007, who threw at least 25 innings in relief in a season. There is some selection bias in such a sample, because it will tend to exclude a lot of bad pitchers who weren't given the opportunity to throw 25 IP. But it still does include pitchers that span much of the range in terms of performance, and gets around the issue of dealing with stats on pitchers with extremely small samples (not that 25 IP is a big sample...).

Next, I calculated saves per inning (Srate) as an indication of the proportion of a pitcher's innings that were associated with saves:

Srate = Saves/IP

It's important to use a rate because you want to know something about a player's opportunities. If someone gets 20 saves in 20 innings, they're probably pitching in much higher leverage situations, on average, than someone who gets 20 saves in 70 innings. Ideally, I'd also use blown saves--and maybe holds--but those stats are not available in the Lahman database or on baseball-reference's team pages, so I'm going to ignore them for now.

I also converted pLI to a "rate" statistic using the approach suggested by Tom Tango:

rateLI = pLI/(pLI+1)

Such that:
pLI = 2   ---- rateLI = 0.667
pLI = 1   ---- rateLI = 0.500
pLI = 0.5 ---- rateLI = 0.333

This was important because as a pure ratio, pLI changes at a faster rate above 1.0 than it does below 1.0, which makes it hard to model using a regression-based approach.
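A quick sketch of that conversion, with function names of my own choosing, shows the point: a pLI of 2.0 and a pLI of 0.5 are "twice" and "half" of average, and on the rateLI scale they land the same distance on either side of 0.5.

```python
def pli_to_rate(pli: float) -> float:
    """Convert pLI (a ratio centered on 1.0) to rateLI (centered on 0.5)."""
    return pli / (pli + 1)


def rate_to_pli(rate_li: float) -> float:
    """Invert the conversion; follows algebraically from pli_to_rate."""
    return rate_li / (1 - rate_li)


for pli in (2.0, 1.0, 0.5):
    rate = pli_to_rate(pli)
    # Distance from 0.5 is equal for pLI = 2.0 and pLI = 0.5.
    print(f"pLI = {pli:<4} rateLI = {rate:.3f}  distance from 0.5 = {abs(rate - 0.5):.3f}")
```

On the raw pLI scale those same two pitchers sit 1.0 above and 0.5 below average, which is the asymmetry that makes a regression on raw pLI awkward.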

Anyway, here's a plot of Srate vs. rateLI:
Obviously, that's a pretty ugly-looking relationship down in the zero/low-saves groups. But as you can see, there's a pretty nice relationship among pitchers who actually have a modest number of saves and their pLI. In other words, once someone starts to get saves, you can reasonably predict that he'll have an above-average pLI, and the player's pLI should steadily increase from there.

I decided to run with this and, in what I completely admit is a really terrible abuse of regression math (I've violated just about every assumption one can violate), I fitted a line to this relationship. I found that a second-order polynomial seemed to fit the data well. Furthermore, I forced the y-intercept to come in at a rateLI=0.5 (pLI=1.0), such that the average pitcher without saves is expected to pitch in average leverage (otherwise, the equation tended to predict that the vast majority of pitchers would have a pLI=0.8, and that's not reasonable). Here's the equation:
rateLI = -0.3764*(Srate^2) + 0.5034*Srate + 0.5

which we can convert back to pLI by:

pLI = rateLI/(1-rateLI)


Now, this rather shaky regression equation isn't something that I'd try to publish in the SABR newsletter, much less an academic journal. It's not built upon rigorous math. But it actually works pretty darn well. For demonstration, here's a table showing a hypothetical pitcher who has thrown 70 innings, and how his predicted pLI changes as the number of saves (and thus his Srate) increases:

Saves (70 IP)   Srate   rateLI   pLI
      0          0.00    0.50    1.0
      5          0.07    0.53    1.1
     10          0.14    0.56    1.3
     15          0.21    0.59    1.4
     20          0.29    0.61    1.6
     25          0.36    0.63    1.7
     30          0.43    0.65    1.8
     35          0.50    0.66    1.9
     40          0.57    0.66    2.0
     45          0.64    0.67    2.0
     50          0.71    0.67    2.0

As you can see, the numbers seem to plateau at around a pLI of 2.0, which is about where MLB closers tend to plateau. David Weathers, for example, who had an Srate = 0.42 last season, had an actual pLI=1.95, which isn't far from his predicted pLI of about 1.8 using this method. Pitchers with a smaller number of saves per IP--setup men, mostly--are assumed to have above-average but still relatively moderate leverage. Finally, guys without saves are assumed to have average leverage.
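The whole saves-to-pLI procedure can be sketched in a few lines. The coefficients are the ones from the fitted equation above; the function names are my own, and the loop just regenerates the 70-inning table.

```python
def pli_from_srate(srate: float) -> float:
    """Predict pLI from saves per inning using the fitted second-order polynomial."""
    rate_li = -0.3764 * srate**2 + 0.5034 * srate + 0.5
    return rate_li / (1 - rate_li)  # convert rateLI back to pLI


def pli_from_saves(saves: int, ip: float) -> float:
    """Convenience wrapper: compute Srate = Saves/IP, then predict pLI."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return pli_from_srate(saves / ip)


# Regenerate the table for a hypothetical 70-inning reliever.
print("Saves  Srate  pLI")
for saves in range(0, 55, 5):
    print(f"{saves:5d}  {saves / 70:5.2f}  {pli_from_saves(saves, 70.0):.1f}")

# Weathers' Srate as quoted in the text.
weathers_pli = pli_from_srate(0.42)
print(f"Predicted pLI at Srate = 0.42: {weathers_pli:.2f}")
```

One caution that falls straight out of the algebra: the fitted parabola peaks at an Srate of roughly 0.67 and turns downward after that, so the equation probably shouldn't be trusted for pitchers with saves-per-inning rates much beyond the range shown in the table.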

Anyway, I think that this is a pretty reasonable way to adjust for historical reliever leverage, at least among closers. Obviously, we're going to undervalue some relievers that aren't yet in the setup role but pitch in lots of big-time leverage situations in the 7th or 8th innings. But I think this approach will capture a lot of what we're trying to do with a reliever leverage adjustment.

David Weathers photo by Getty Images/David Maxwell