
Monday, June 23, 2014

Branch Rickey & Allen Roth in 1954

In the history track in Sabermetrics 101 this week (the 4th week), we read an article by Branch Rickey that appeared in Life Magazine in 1954 describing his and Allen Roth's efforts to develop a model that would predict team success.  Here's the model that is at the heart of the article:
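(The equation graphic doesn't reproduce well here, so the following is a reconstruction based on the description below; treat the exact grouping as approximate.)

G = [ (H + BB + HBP)/(AB + BB + HBP) + 3*(TB - H)/(4*AB) + R/(H + BB + HBP) ]
  - [ H/AB + (BB + HBP)/(AB + BB + HBP) + ER/(H + BB + HBP) - SO/(8*(AB + BB + HBP)) - F ]

where the terms in the second bracket are computed from what the team's pitchers and fielders allowed.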
To break it down:
  • The top row is the Offense term.  It is essentially OBP + 0.75*ISO + Clutch.  Clutch was a catch-all term that tracked how often a team scored once runners were on base, and includes clutchiness, baserunning, luck, etc.
  • The bottom term is the Defense term.  It includes opponent batting average, the Walk+HBP term of opponent OBP, Opponent "Clutch", and a strikeout term (weight 1/8th...this was presumably necessary because it's just the extra value of a strikeout over and above what is already tracked in the batting average term).  F is fielding (independent of the other values), which Rickey & Roth basically punted.  In fact, they have a great line in the article: "There is nothing on earth anybody can do with fielding."  They just assigned it a zero and moved on, hoping it wouldn't matter that much.
Therefore, the equation amounts to:

Offense (O) - Defense (D) = G

Where G is a stat that will track run differential quite well.

Neat, right?  

There are some problems that I see.  First, it seems like the ISO term is confounded with the R term, because a lot of the value of extra-base hits lies in driving runners home (and vice versa).  The second is unquestionably the over-emphasis on BABIP when tracking pitching performances (especially when they start relating this to individual pitcher performances; this was pre-Voros McCracken, after all!).  And there's also the lack of separation of the unique effect of the home run.  And finally, the units are sort of a mish-mash of arbitrary ratio units, rather than something that has immediate meaning like runs or wins.

In short, it's not Base Runs.  But it seems to work pretty well, based on the work they did on it in the '50s.  The article itself is a great read, with a ton of great quotes.  I highly recommend it.  It's neat to think that this kind of thing was happening 60 years ago...and how hard it must have been to do the analysis, before the days of Excel, MySQL, and statistical packages!***

***At one point in the article, they mentioned sending off their data for six weeks(!) to a stat department at an institution for "correlation analysis."  What would have taken a couple of hours today (mostly just getting the data together) took WEEKS of work using mechanical calculators, slide-rules, and lots of paper computation.

Monday, March 17, 2014

Was Homer Bailey's Contract An Overpay?

The Reds locked up Homer Bailey this offseason, but was the cost too great? Photo credit: David Slaughter
The Reds' biggest player move this offseason was unquestionably handing Homer Bailey a $105 million/6-year contract extension.  While this was exciting for most fans (myself included), reaction around the internets tended to lean toward this being a pretty substantial overpay.

I'm sure that people have done nice analyses of Homer's contract.  I didn't really look around at the time, though.  So, I decided to finally take a look myself.  Well, a few looks.  

Approach #1

The first approach I used is close to what I've been doing for years to understand contracts, and is inspired by people like tangotiger.  It works as follows.
  1. Make a projection for a player during the contract.  I used 2014 Steamer and ZIPS projections for that (courtesy of FanGraphs), and then did a "standard" 0.5 WAR/year decline.  That might be generous aging for a pitcher given pitchers' inherent tendencies to break, but we'll run with it.
  2. Come up with a cost per win translator for each year of the contract.  This is trickier than it used to be, because some of Dave Cameron's recent work has made it clear that the cost per win is not constant anymore; above-average players are getting more dollars per win than below-average players.  Fortunately, in that article he presented a regression line for this relationship, and it showed a really simple relationship: 2 WAR players (i.e. league-average) are getting $6M/win, 3 WAR players are getting $7M/win, 4 WAR players get $8M/win, etc.  I'm assuming that this "bonus" is set in the first year, such that players do not see their $/win decline as their performance declines.
  3. Estimate salary inflation.  I'm guessing wildly here, but based on past increases, as well as the amazing amount of money coming into the game right now with all of the TV contracts, I'm estimating a fairly aggressive 10% inflation on the $/win of an average player.  I'm just going to assume that the extra million bonus a 3-WAR player gets per WAR is fixed and not subject to inflation.
  4. Multiply the estimated WAR each year by the player-specific $/win numbers.  This gives salary value each year.  Then, you just sum up all of the years to get total contract value.
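To make the arithmetic concrete, here's a minimal sketch of that calculation, under the assumptions above: a 0.5 WAR/year decline, $6M/win for a 2-WAR player plus $1M/win for each additional WAR (set in the first year and held fixed), and 10% annual inflation on the average-player $/win.  The 2.5 WAR starting projection is purely illustrative, not Steamer's or ZIPS's number.

def contract_value(war_year1, years, base_per_win=6.0, inflation=0.10, decline=0.5):
    """Estimated fair contract value ($M) under the simple model described above."""
    bonus = max(war_year1 - 2.0, 0.0)       # extra $M/win above the average-player rate, fixed at signing
    total = 0.0
    for yr in range(years):
        war = max(war_year1 - decline * yr, 0.0)            # projected WAR with aging decline
        avg_per_win = base_per_win * (1 + inflation) ** yr  # inflated $/win for an average player
        total += war * (avg_per_win + bonus)                # player-specific $/win times projected WAR
    return total

print(round(contract_value(2.5, 6), 1))   # ~55, i.e. roughly $55M over six years for a 2.5-WAR projection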
Here's what I got when I did this for Homer:

On the far left are years, my estimated average $/win with inflation, and Homer's actual salary.  I'm assuming the Reds will not exercise their part of his mutual 2020 option, so they pay the $5 million buyout.  Then you have the Steamer projected WAR, his $/win, and estimated salary.  Similarly, I report ZIPS estimated salaries.  Finally, on the right, I'm presenting a projection that would be required for the contract to make sense using my salary model.  In other words, this is apparently how the Reds are valuing Bailey.  Also, please note that I'm ignoring the fact that 2014 is Bailey's last arbitration year; we should really subtract $3 million from his total 2014 salary in each case to account for the fact that players make about 80% of their free agent value in their 3rd arbitration year.

It doesn't look like a particularly good contract, does it?  Steamer and ZIPS have his contract valued at between $50 and $67 million over six years.  Furthermore, Steamer's projection for 2014 is low enough that it doesn't even make sense to give him a 6th year.  The difference between the Steamer and ZIPS projections is entirely playing time: Steamer projects a 3.62 FIP in 173 innings, while ZIPS projects a 3.62 FIP in 192 innings.  Homer has thrown 200+ innings for two consecutive years, and has been very healthy in those seasons.  But just one trip to the DL would drop him into 170-inning territory, and the floor in any given season is 0 innings.

To get the contract to make sense, you have to set his 2014 projection to 3.3 WAR.  That's not outrageous; Bailey was worth 3.7 fWAR last season (and 3.2 bWAR), after all.  But that was easily his best season thus far.  It's pretty hard to project that he'll do that again this season, at least based on standard player behavior.

On the other hand, what we're really dealing with here is a projected difference of 0.6 to 0.9 wins.  Given how large the error bars are on projections, this really isn't that bad.  If the Reds have special scouting information that indicates that Bailey really did take a significant step forward last year, and one that he's very likely to continue in future seasons, you could at least make an argument that this is a reasonable projection...

Approach #2

Dave Cameron recently posted a pair of new models that look at free agent salaries.

The first is a model that just takes total projected WAR in a contract and uses that to estimate a player's salary.  It's a simple regression equation, but it explains 95% of the variation in free agent salaries from the offseason.  Not too shabby!  Let's run it for Homer and our various projections:

This is a bit more encouraging.  Based on the regression equation, and our projection systems, Homer Bailey's contract estimate comes in at between $70 million and $85 million over 5 years (or six years, for that matter; he's projected to be replacement level in 2019).

I adjusted the Apparent Reds projection a bit here, because this regression model tends to result in higher estimated salaries than my first approach. Here, a projection of 3.1 WAR gets him where he should be for the contract to be an even value.  That's a 0.4 to 0.7 WAR difference from the projections.

Approach #3

In that same article, Cameron also put together a salary estimation "toy."  It's very simple, not horrifically rigorous, but it works pretty well.  You can go to the article to read about it.  I applied it to Homer:

Cameron's toy suggests that, based on the Steamer and ZIPS projections, the market length of a contract like Homer's would be about 4 years.  But if we extend it to 6 total years, we get total estimates between $72 million and $81 million.  That's pretty close to the regression equation above.

The closest I could get his contract to the actual value was 3.2 WAR.  Push it to 3.3 WAR, and Cameron's toy extends him another year to seven, and the total contract value shoots to $115 million.  But again, the estimates indicate that the Cincinnati Reds are valuing Homer about a half-win higher than the projection systems do.

Conclusions

By the numbers, I think it's pretty easy to see why so many see this as an overpay.  Steamer, which has been the champion of pitcher projections the last few years, estimates his monetary value between 50% and 70% of the actual contract value, depending on which approach one takes.  That's a tough pill to swallow, especially when you consider that the salary models I'm using aren't making any allowances for the fact that pitchers are inherently more risky than hitters, aside from the projections.

That said, the other thing that this exercise impressed upon me was that the systems that I, at least, am using are highly volatile when examining long-term salaries.  For the most part, we're dealing with differences of just a half a win.  That's easily within our margin of error.  Any small difference in projection gets compounded with each year of an extension.  This is further enhanced by the fact that the cost per win changes with player quality.  As a result, a difference of less than a win in a projection can result in a $50 million difference in a contract valuation.  In Homer's case, that's half of his salary!

What do you think?  Is it reasonable to project Homer to have a 3+ WAR season in 2014?  Are these salary approaches so sensitive to small changes in player projection that they are almost useless?  Or was this a big overpay by the Reds?

Thursday, April 30, 2009

Player value series, part 7: How should we handle park factors?

Note: I've changed enough about the way I calculate player value for hitters that I probably need to re-write this entire series (click player value below to see the whole set of posts). But I wanted to get this online so I can show my work, so we'll run with it for now and then fix the other articles later.

One of the more unique aspects of baseball is the substantial influence that the ballpark can have on the outcome of (at the least) batted balls. While this an aspect of baseball that many of us enjoy, it also presents a challenge to our ability to assess player value. For example, Brian Giles hit 0.306/0.398/0.456 at PETCO last season, which was good for a 0.376 wOBA. And Brad Hawpe hit 0.283/0.381/0.498 at Coors Field, which gave him a 0.379 wOBA.

They had pretty similar seasons according to those numbers, with Giles being slightly better at getting on base, and Hawpe showing more power. The problem, of course, is that Giles played half of his games in PETCO park, which is a notorious pitcher's park. And Hawpe played half of his games in Colorado, which even post-humidor is the best hitter's park in baseball. Presumably, if you had swapped where the two players were hitting, Giles would have vastly outperformed Hawpe.

How do we reconcile this in our player valuations? The traditional approach has been to use park factors. The best article on the web about park factors is Patriot's, and I won't replicate his work here. But briefly, a park factor can be conceptualized as simply this ratio:

Avg Runs scored per game at a ballpark
------------------------------------------
Avg Runs scored per game at all ballparks

So, if you see a park factor of 1.18, for example, that means that more runs are scored at that ballpark than in a typical ballpark. And a ratio of 0.84 means that fewer runs are scored at the ballpark in question than in a typical ballpark.

A quick and dirty way of calculating park factors is to simply divide a team's total runs scored and allowed at home by its runs scored and allowed in away games. So, in that case, a 1.18 park factor would mean that a team scores ~18% more runs at home than they do on the road.

There are a variety of additional complications if you're going to do it properly, and Patriot discusses them. But to be honest, most of the time, that simple approach will get you most of the way there. Nevertheless, there are three factors to worry about that are important and make a big difference:

1) If you go to apply a home runs/away runs park factor to an individual player's runs stats in order to discount (in Hawpe's case) or boost (in Giles' case) his value rating, it's important to first cut the park factor in half (meaning 1.18 would be 1.09, and 0.84 would be 0.92). This is because the 1.18 park factor described above would be an appropriate adjustment only for games played in the park. Typically, though, players play half of their games in other parks, which, on average, will have a park factor of ~1.0. So, if we split the difference, we'll get a multiplier that makes sense in light of the 81 home/81 away game schedule. Once you have this number, you just divide your player's absolute runs estimate by your park factor (but see #2 and #3 below). Many people (like Patriot and Szymborski) already have done this adjustment to the numbers they post, but be sure to check on this wherever you get your park factors.

2) Park factors are variable as all heck from year to year. Patriot posts 5-year averaged, regressed park factors, which in my view are the most reliable park factors on the 'net. The reason is that by both averaging and regressing, he's accounting for the fact that any park factor estimate has large error bars around it, and the true park factor is probably a bit closer to league average than even your 5-season average indicates.

3) As pointed out by Tango and others, there may be problems with simply dividing Hawpe's absolute batted runs number by 1.09. The issue is that this adjustment will have a more significant impact on good hitters than poor hitters. A park factor of 1.10 would strip 10 runs from a 100-run hitter's totals, but only 3 runs from a 30-run hitter's totals...and yet there's no evidence indicating that the good hitters' value should be discounted at a greater rate than a poor hitter's value.

A solution to this problem--and, at the same time, a convenient way to apply park factors directly to RAA or RAR data (ratios can only be applied to absolute runs data)--is to convert our traditional ratio-based park factors to an additive park factor. Once we do this, we just add or subtract a certain small fraction of runs per PA to each hitter. It's an extremely easy and straightforward way to handle park factors, and yet is something that I rarely see done.

So, here are conversions of Patriot's 2008 5-year regressed park factor ratios into additive park factors. I show them per PA and per 700 PAs (a season's worth) to help us understand how large of an effect we're working with here. (Methods: I took the total MLB runs in 2008, divided it by a given park factor, and took the difference in runs between the adjusted and unadjusted runs. I then divided this difference by total 2008 MLB PAs to get the per-PA adjustments)
Year  TEAM  PF     Runs/PA Adj   R/700 PA Adj
2008 ARI 1.05 -0.0060 -4.2
2008 ATL 1 0.0000 0.0
2008 BAL 1.01 -0.0012 -0.8
2008 BOS 1.04 -0.0048 -3.4
2008 CHA 1.04 -0.0048 -3.4
2008 CHN 1.04 -0.0048 -3.4
2008 CIN 1.02 -0.0024 -1.7
2008 CLE 1 0.0000 0.0
2008 COL 1.09 -0.0108 -7.6
2008 DET 1 0.0000 0.0
2008 FLA 0.98 0.0024 1.7
2008 HOU 0.99 0.0012 0.8
2008 KCR 1 0.0000 0.0
2008 LA 0.98 0.0024 1.7
2008 LAA 0.99 0.0012 0.8
2008 MIL 1 0.0000 0.0
2008 MIN 1 0.0000 0.0
2008 NYY 1 0.0000 0.0
2008 NYM 0.97 0.0036 2.5
2008 OAK 0.98 0.0024 1.7
2008 PHI 1.02 -0.0024 -1.7
2008 PIT 0.98 0.0024 1.7
2008 SD 0.92 0.0096 6.7
2008 SEA 0.97 0.0036 2.5
2008 SF 1.01 -0.0012 -0.8
2008 STL 0.98 0.0024 1.7
2008 TB 0.99 0.0012 0.8
2008 TEX 1.03 -0.0036 -2.5
2008 TOR 1.02 -0.0024 -1.7
2008 WAS 1.01 -0.0012 -0.8
Essentially, for each "unit" of a park factor, you add or subtract 0.0012 runs per PA, which works out to be ~0.8 runs per season. This results in a park-induced range of ~15 runs per season between the best hitter's park (Coors) and the best pitcher's park (PETCO).
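
Here's a minimal sketch of both park-factor adjustments described above, in Python.  The team totals and the 2008 league totals are approximate placeholders, not the exact figures behind the table, so the output only roughly reproduces the values above.

# --- Ratio park factors: quick home/road calculation, halving, and application ---
home_rs, home_ra = 420, 400          # hypothetical team runs scored/allowed at home
road_rs, road_ra = 360, 340          # ...and on the road
full_pf = (home_rs + home_ra) / (road_rs + road_ra)   # quick-and-dirty park factor
halved_pf = (full_pf + 1.0) / 2                       # halve it for the 81 home / 81 road schedule
park_adjusted_runs = 100 / halved_pf                  # divide a hitter's absolute runs estimate by it

# --- Converting a (halved) ratio park factor to an additive per-PA adjustment ---
LG_RUNS = 22600      # approximate total MLB runs, 2008
LG_PA = 187000       # approximate total MLB plate appearances, 2008

def additive_pf(ratio_pf):
    """Runs-per-PA adjustment implied by a halved ratio park factor."""
    adjusted_runs = LG_RUNS / ratio_pf            # league runs deflated/inflated by the park
    return (adjusted_runs - LG_RUNS) / LG_PA      # spread the difference over all PAs

for team, pf in [("CIN", 1.02), ("FLA", 0.98)]:
    per_pa = additive_pf(pf)
    print(team, round(per_pa, 4), round(per_pa * 700, 1))   # ~-0.0024/-1.7 and ~0.0025/1.7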

Put another way, if you took a true 30 RAR hitter and played him with the Rockies in Colorado, you'd expect him to produce ~38 RAR (raw, without park adjustments). That same hitter would be expected to produce ~23 RAR in San Diego. So by subtracting 8 runs/season from your Colorado hitter, or adding 7 runs to your San Diego hitter, you can properly estimate the player's true hitting performance (~30 runs above replacement).

Isn't that easy?

The same approach could potentially be used on pitchers, but we unfortunately don't tend to use per-PA data to evaluate pitchers. For now, I'm still just using ratio park factors on pitchers, but I'll likely switch to a new approach in the near future once I incorporate some other changes to how I do their player valuations (probably involving pythagorean-based win estimates for pitchers, like Tango does). More on that at a later date.

Thanks to folks in this thread at Baseball Fever for helping me to finally figure some of this stuff out. Assuming, of course, I really have figured this stuff out...

Friday, April 17, 2009

Friday Night Fungoes: Larkin, run estimators, CHONE, and the Dunning-Kruger Effect

Looks like I'm going to the Curve game tomorrow night. Whether I'll post a report may depend on how they do...they've yet to win this year! Weather should be nice, though: 72 degrees & sunny.

Does Larkin Belong in the Hall of Fame? Revisited

I can't remember if I linked to this or not, but even if I have it's worth linking again: Rally has posted season-by-season WAR estimates for all players in the Retrosheet era. He also has a top-300 ranking, so we can look at the best of the past 50+ years using these numbers.

Rally's data include offense, defense (including turning double plays, etc), baserunning, and era-specific position adjustments. This is similar to what I tried to do in my piece on Larkin, but better because of the baserunning & especially the era-specific position adjustments. Here is how the shortstops I included in my Larkin study pan out in Rally's WAR data, plus a number of others who came up in discussions following my Larkin piece:


Player               MyWAR   RallyWAR
Alex Rodriguez        82.4     96.5
Cal Ripken+           73.8     91.2
Robin Yount+          62.1     75.9
Barry Larkin          57.6     70.1
Ozzie Smith+          45.3     67.6
Alan Trammell         53.7     66.8
Derek Jeter           49.5     62.4
Ernie Banks+*         54.8     59.2
Luis Aparicio+        30.9     50.5
Toby Harrah             --     47.1
Omar Vizquel          28.0     45.2
Bert Campaneris         --     44.5
Nomar Garciaparra       --     43.7
Tony Fernandez          --     43.0
Miguel Tejada         39.2     40.9
Jay Bell                --     35.5
Davey Concepcion      26.0     34.0
Mark Belanger         26.2     32.0
Edgar Renteria          --     31.9
Chris Speier            --     24.8
Bill Russell            --     24.4
Rick Burleson           --     21.3
Freddie Patek           --     20.4
Steve Sax               --     19.8
Larry Bowa              --     18.9
Bucky Dent              --     12.4
Don Kessinger           --      7.2
Tim Foli                --      3.5

Some players got a big boost in their rating, like Ozzie and Aparicio, once you include baserunning (I only included SB's & CS's) and double play turning. But as you can see, the results are more or less the same as far as Larkin is concerned: probably the 4th best overall, and 2nd-best pure shortstop in the Retrosheet era, at least based on total contributions to their ballclubs relative to their league.

Career-level WAR accumulation isn't the be-all, end-all of hall of fame voting, as peak performance is also important. But in Larkin's case, career WAR is crucial. No one disputes that he was a brilliant player when healthy. The knock on him is that he didn't play enough due to all of his injuries. These data clearly indicate that his total contribution, including playing time, was among the best in baseball history at his position.

Rally has Larkin as the 30th-best position player of the Retrosheet era. I know he almost certainly will not be a first-ballot Hall of Famer, but he probably should be.


Why I'm trying to stop using OPS

Colin followed up his study posted last week on run estimators with an improved method. This time, instead of looking at half-inning or even game-level combinations of team offense, he instead focused on identifying the average value of particular offensive events to games. His methodology was to take matched games--games that had the same numbers of major counting events, but that differed in how many of one specific event they contained.

For example, he might have a game with 5 singles, 3 doubles, 1 homer, and 3 walks, and he'd compare that to a game with 5 singles, 3 doubles, 1 homer, and 4 walks. Finding the average difference in runs scored between pairs of games like this would tell you the average value of a walk in runs. He then compared those actual differences in runs scored to the expected difference in runs scored according to a variety of run estimation mechanisms.

The results? Linear weights-based methods did the best. This includes his "house" linear weights (which he kindly shares), as well as manipulations of linear weights like wOBA. A bit behind them were GPA (aka 1.7 OPS), Base Runs and BPro's EqR, followed a bit more distantly by Bill James' Runs Created. The worst of the bunch were the OPS-based methods, as well as the even-more-horrible Total Average (bases/outs).

This is strong evidence that we should more or less stop using OPS to evaluate hitters. It's unnecessary, given how easy wOBA is to calculate. Is it better than batting average? Sure, of course. But it misses badly enough and often enough that we should really move past it. It's a tough habit to break, but it's time to wOBA, folks.
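
To back up the "easy to calculate" point, here's a minimal sketch of a basic wOBA calculation, using the approximate weights from The Book era; the real weights are re-derived each season (and stricter versions handle reached-on-error and IBB), so treat these numbers as illustrative.

def woba(ubb, hbp, singles, doubles, triples, hr, pa):
    """Approximate wOBA from basic counting stats."""
    numerator = (0.72 * ubb + 0.75 * hbp + 0.90 * singles
                 + 1.24 * doubles + 1.56 * triples + 1.95 * hr)
    return numerator / pa

# Example: a hypothetical 600 PA season
print(round(woba(ubb=60, hbp=5, singles=100, doubles=30, triples=3, hr=25, pa=600), 3))   # ~0.379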


CHONE is a really good projection system

Matt has a fairly exhaustive projection roundup here. He notes that each system seems to have its own strengths, but often also some weaknesses:

--CHONE was the best at projecting most things.

--PECOTA was very close behind but had some systematic biases, specifically for speedy players' BABIPs, which ZIPS struggled with as well.

--ZIPS is behind the other systems, except it does quite well with projecting the three true outcomes for players over 35.

--CHONE does better with older players in general, since its specialty is aging curves, but PECOTA does better at finding comparable players for younger players for whom less data is available (unless they fall into the speedster category).

--OLIVER clearly contends and even takes the lead at some things--especially at projecting hitters with lower homerun totals and other players significantly affected by park effects. However, OLIVER under-projects walks and strikeouts systematically and over-projects homeruns systematically, and could probably be improved by adjusting how those outcomes are computed.

The nice thing about this is that we can use this information to give more or less weight to a given projection system when it differs from others in predicting a given player's performance based on the sort of player we're looking at. Or, we can do what I've essentially decided to do around here, which is to just use CHONE. :)

It's worth noting that Matt's is just the latest projection roundup in which CHONE did particularly well. Whether it will continue to do so in the future is an open question, of course, but the data suggest that it's as good as they come.


The Dunning-Kruger effect

JC posted about this terrific psychological concept: that people incompetent in a particular discipline will massively overestimate their competency in that discipline. That's pretty much the definition of a baseball fan, isn't it? :)

I'm jesting, mostly. You certainly see arguments between baseball fans who really know their stuff and baseball fans who just think they know their stuff. And I tend to think that most of what you hear on talk radio (sports, or otherwise) involves people who fall into the latter category rather than the former category. And, of course, I tend to think that on at least some issues (some areas of biology, some areas of baseball research, etc), I fall into the reasonably competent category.

But the great part of this is that the Dunning-Kruger effect predicts that we'll have a very hard time being able to tell whether we're competent or not...because the more incompetent we are, the less we'll realize it! :)

Friday, June 20, 2008

Part 6: Accounting for league differences

This is more of a methods post. It's fairly trivial, and not likely to be interesting to 99% of you, but I wanted to post it for the purposes of showing my work. I had to do this in order to produce MLB-wide year-to-date total value player rankings that will go up in a few minutes.

One of the problems with how I've been calculating runs above replacement is that I haven't accounted for differences in the quality of the leagues. At Tom Tango's blog, they use a difference of ~5 runs per season to represent differences across leagues. Up 'til now, I've just opted to ignore this, mostly because I'm lazy. However, I've noticed that my estimated value for the top NL players this year has been universally better than that for AL players. So, I was concerned that I was overestimating NL player value, at least to some degree, by not accounting for league differences.

I decided that I needed to make an adjustment to my methods to account for this. However, this is easier said than done, because I'm using slightly different methods than what Tango and others have been using. In particular, he starts with linear weights (or WPA/LI) that are standardized vs. average, while I prefer to start with absolute linear weights (some people call these "lwts_rc" because they resemble the output of James' Runs Created). This means that the math is slightly different. What follows is how I'm reconciling the two methods.

I've been calculating a hitter's runs vs. replacement level as:

([Player R/G] - [Constant]*[MLBAvgR/G]) / [26.25 outs/G] * [outs] = RAR

For details, see this article.

At issue here is the constant that I use. I've been using 73% for all players, which means that replacement level is 73% of the production of the average big league hitter (MLBAvgR/G above). To match up to Tango's work, a 5.0 r/g hitter in the AL should be worth ~25 RAR over a full season, while 5.0 r/g in the NL should be worth just 20 RAR. This is because the AL is a better league, and therefore it takes a better hitter to hit 5.0 r/g in the AL than the NL.

So, I took what approximates a MLB-average hitter over a full season (700 PA, 0.335 OBP, 5.0 R/G) and solved for the constant needed to result in 25 RAR (the AL value). This constant was 72%, which is very close to the 73% I have been using. However, when I instead solved for the constant needed to get 20 RAR (the NL value), I got 77%. This is quite a bit higher than what I've been using, and indicates that I've been overestimating the value of NL players (including the Reds, unfortunately...and they're not exactly tearing the cover off the ball).
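
Here's a minimal sketch of that constant-solving step, assuming a league-average hitter with 700 PA, a .335 OBP (~465 outs), and 5.0 R/G, plugged into the RAR formula from earlier in this series.

PA, OBP, RPG = 700, 0.335, 5.0
OUTS = PA * (1 - OBP)        # roughly 465 outs over a full season
OUTS_PER_GAME = 26.25

def constant_for_target_rar(target_rar):
    """Solve RAR = (RPG - c*RPG) / 26.25 * outs for the constant c."""
    return 1 - target_rar * OUTS_PER_GAME / (RPG * OUTS)

print(round(constant_for_target_rar(25), 2))   # ~0.72, the AL baseline
print(round(constant_for_target_rar(20), 2))   # ~0.77, the NL baseline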

The difference is not huge...we're talking 5 runs per 700 PA, and most hitters never make it to 700 PA. But from this point on (until the NL catches up with the AL, at least), I'm using 72% as the baseline for AL hitters, and 77% as the baseline for NL hitters. The result will be that NL hitters are going to be devalued by up to 5 runs per season. It's not a huge difference in value, but money-wise, it means about $2 million less in estimated salary per season...

Also, I followed a similar procedure to adjust pitcher numbers (though I won't bore you with the math). From this point on, AL pitcher replacement level will be defined as 129% of MLB average for starters, and 108% of MLB average for relievers. Conversely, in the NL, which is an easier place to play, I will define replacement level as 124% of MLB average for starters and 103% of MLB average for relievers. The AL figures match up well with those I've been using, while the NL figures set a higher bar (because it's an easier league).

I'm not arguing that these adjustments are perfect by any means, but my feeling is that they do constitute an improvement over the numbers I have been using.

FWIW, the best NL players are still doing quite a bit better than the best AL players after this adjustment. You'll see that in the next post...

Saturday, January 12, 2008

A dirty way of predicting reliever leverage when pLI is not available

Note: while I'm posting this separately so that it is visible, it's really just meant to be an update to my piece on reliever leverage in the player value series. I'm appending it to that article as well.

As discussed earlier, when thinking about reliever value, it's insufficient to strictly consider the rate at which they give up runs because some runs are more valuable than others. Closers, in particular, tend to pitch in high leverage situations, and therefore should get more "credit" for their ability to pitch above reliever replacement level than a pitcher who only pitches in games that have a lopsided score.

For players since 2002, we can get actual pLI data from FanGraphs, and I discussed how to employ those data to adjust reliever run value estimates previously. However, what if you want to look at reliever value among players who played prior to 2002, like in my proposed series on past winning Reds teams? In that situation, you'd need some way of inferring reliever usage from other statistics.

One way to try to do this is by looking at performance--better pitchers should be used in higher-leverage situations. However, when attempting this approach, I've found that there's just very little predictive power (i.e. huge amount of scatter), even though there is a significant relationship between ERA (or FIP) and pLI. Whether that's due to within-team competition, inconsistent reliever performance, or poor decisions by managers, performance is just not a very good way to predict pLI.

On the other hand, as Darren implied, even in historical databases like Lahman's, we have at least one statistic that tells us something about usage: saves. Saves are well documented to be a rather poor indicator of reliever quality. Nevertheless, they do tell you who was pitching in the 9th inning of a team's games, which tends to be the inning with the highest leverages. So we should be able to use saves to infer something about reliever usage. Here's what I did:

Methods

I pulled stats, including both traditional pitching statistics and pLI, from fangraphs on all pitchers, 2002-2007, who threw at least 25 innings in relief in a season. There is some selection bias in such a sample, because it will tend to exclude a lot of bad pitchers who weren't given the opportunity to throw 25 IP. But it still does include pitchers that span much of the range in terms of performance, and gets around the issue of dealing with stats on pitchers with extremely small samples (not that 25 IP is a big sample...).

Next, I calculated saves per inning (Srate) as an indication of the proportion of a pitcher's innings that were associated with saves:

Srate = Saves/IP

It's important to use a rate because you want to know something about a player's opportunities. If someone gets 20 saves in 20 innings, they're probably pitching in much higher leverage situations, on average, than someone who gets 20 saves in 70 innings. Ideally, I'd also use blown saves--and maybe holds--but those stats are not available in the Lahman database or on baseball-reference's team pages, so I'm going to ignore them for now.

I also converted pLI to a "rate" statistic using the approach suggested by Tom Tango:

rateLI = pLI/(pLI+1)

Such that:
pLI = 2 ---- rateLI = 0.667
pLI = 1 ---- rateLI = 0.500
pLI = 0.5 ---- rateLI = 0.333

This was important because as a pure ratio, pLI changes at a faster rate above 1.0 than it does below 1.0, which makes it hard to model using a regression-based approach.

Anyway, here's a plot of Srate vs. rateLI:
Obviously, that's a pretty ugly-looking relationship down in the zero/low-saves groups. But as you can see, there's a pretty nice relationship among pitchers who actually have a modest number of saves and their pLI. In other words, once someone starts to get saves, you can reasonably predict that he'll have an above-average pLI, and the player's pLI should steadily increase from there.

I decided to run with this and, in what I completely admit is a really terrible abuse of regression math (I've violated just about every assumption one can violate), I fitted a line to this relationship. I found that a second-order polynomial seemed to fit the data well. Furthermore, I forced the y-intercept to come in at a rateLI=0.5 (pLI=1.0), such that the average pitcher without saves is expected to pitch in average leverage (otherwise, the equation tended to predict that the vast majority of pitchers would have a pLI=0.8, and that's not reasonable). Here's the equation:
rateLI = -0.3764*(Srate^2) + 0.5034*Srate + 0.5

which we can convert back to pLI by:

pLI = rateLI/(1-rateLI)
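
Here's a minimal sketch of the whole saves-based estimate, using the fitted polynomial above; the 40-saves-in-70-IP example is hypothetical.

def predicted_pli(saves, ip):
    """Estimate a reliever's pLI from his saves per inning (Srate)."""
    srate = saves / ip
    rate_li = -0.3764 * srate**2 + 0.5034 * srate + 0.5   # fitted 2nd-order polynomial
    return rate_li / (1 - rate_li)                        # convert rateLI back to pLI

print(round(predicted_pli(40, 70), 1))   # ~2.0, in line with typical MLB closer usage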


Now, this rather shaky regression equation isn't something that I'd try to publish in the SABR newsletter, much less an academic journal. It's not built upon rigorous math. But it actually works pretty darn well. For demonstration, here's a table showing a hypothetical pitcher who has thrown 70 innings, and how his predicted pLI changes as the number of saves (and thus his Srate) increases:
Saves (70 IP)   Srate   rateLI   pLI
0 0.00 0.50 1.0
5 0.07 0.53 1.1
10 0.14 0.56 1.3
15 0.21 0.59 1.4
20 0.29 0.61 1.6
25 0.36 0.63 1.7
30 0.43 0.65 1.8
35 0.50 0.66 1.9
40 0.57 0.66 2.0
45 0.64 0.67 2.0
50 0.71 0.67 2.0
As you can see, the numbers seem to plateau at around a pLI of 2.0, which is about where MLB closers tend to plateau. David Weathers, for example, who had a Srate = 0.42 last season, had an actual pLI=1.95, which isn't far from his predicted pLI using this method. Pitchers with a smaller number of saves per IP--setup men, mostly--are assumed to have above-average but still relatively moderate leverage. Finally, guys without saves are assumed to have average leverage.

Anyway, I think that this is a pretty reasonable way to adjust for historical reliever leverage, at least among closers. Obviously, we're going to undervalue some relievers that aren't yet in the setup role but pitch in lots of big-time leverage situations in the 7th or 8th innings. But I think this approach will capture a lot of what we're trying to do with a reliever leverage adjustment.

On a moderately related note...last night, I spent some time setting up spreadsheets and my database to start on the Winning Reds historical series. Should be pretty efficient at this point, which should make it easier to get through the teams at a good clip as long as I keep the writing under control. I'm excited to get started on the series, but I think I'll do a dry run first in wrapping up the 2007 Reds' season. Look for that shortly.

Wednesday, December 05, 2007

Player Value, Part 5b: Leverage and Relievers

Among the biggest concerns that folks had about the first piece on pitcher value is that I was missing something about the value of relievers because I wasn't accounting for the leverage of the situations in which they are used. While this is getting into an area in which I'm not particularly well-read, I think those criticisms are correct and I'm going to take an initial stab at making this adjustment.

What is leverage?

If someone hooked me up to a heart-rate monitor during a ballgame, they'd note that my heart rate varies quite a bit over the course of the game. One run lead in the top of the third inning? I'm into the game, but fairly relaxed. Down eight runs in the fourth inning? I'm falling asleep. Tying run on third with no outs in the 9th? I'm on the verge of a heart attack.

The reason for the variation in fans' heart rates, degree of white-knuckleness, etc, over the course of a game, of course, is that different situations have different impacts on the outcome of the game. With a man on third with no outs in the 9th, every single pitch has a high likelihood of determining the outcome of the game. Down eight runs in the 4th? Well, in that case, the other team's chances of winning are so high that whatever happens next on the field is almost irrelevant to the outcome of the game.

Leverage is the term that baseball statisticians use to describe the importance of game situations. High leverage situations are those situations that are highly influential on the outcome of a game, whereas low-leverage situations don't mean a lot to the outcome of the game.

We can quantify the actual impact of events in a game by looking at changes in win probability. In this approach, folks have created a model of how likely teams are to win games based on the score, the inning, the number of men on base, and the number of outs. We can then monitor how this win probability changes over the course of the game. Let's say that the Reds are down by three runs in the bottom of the ninth, but load the bases with two outs. Certainly that's a situation that would still have the Reds losing more often than not (90% of the time, according to the model), but it's also a high leverage situation in that the Reds have a chance to pull out a win with one swing of the bat. Now, if Adam Dunn comes up and hits a grand slam to win the game, that's a huge improvement in the Reds' win probability, which changes from ~10% to 100%. In other words, with one swing of the bat, Adam Dunn's performance contributed 90% of a win to the team (we would credit him with +0.90 Win Probability Added [WPA] for that plate appearance).

Leverage Index (LI) is an effort to quantify the importance of game situations, and is essentially calculated as the relative spread of how much win probability could change, given different situations. This spread, divided by the average spread of all possible game states, is leverage index. Under this convention, a leverage index of 1.00 is, by definition, a situation with average leverage. The situation described above had an enormous spread in how win probability could change: -10% if Dunn makes an out, +90% if Dunn homers, with a bunch of other possibilities in between. In this case, the actual leverage index was 3.91, or ~4 times as important as an average game state. In contrast, if a team trails by eight runs in the 4th, then leverage index will be much less than one.

Why is this important to estimating the value of relievers? Well, if a pitcher is, on average, used in situations with an average leverage index of 2.0, the runs he gives up are about twice as important, in terms of value to team wins, as those given up by a pitcher used in situations with an average leverage index of 1.0. And the 1.0 LI pitcher's runs, in turn, are ~twice as important as those given up by a pitcher used in situations with an average leverage index of 0.5.

The calculation of leverage index, by necessity, requires play-by-play data, which still isn't something I've started to work with. Fortunately, FanGraphs.com reports the average leverage index per plate appearance for all relievers in its pLI statistic. This stat essentially tells us the average leverage index under which each reliever pitched last season, and thus gives an indication of his opportunity to influence a ballgame based on his performance.

So how can we go about using pLI to adjust our estimates of reliever value? Well, let's start with a way that LI is often used with win probability statistics: WPA/LI. WPA/LI, as described by Tom Tango, is a "situation deflated" version of Win Probability Added (WPA; see above), and describes the change in win probability that would occur based on that player's performances if every single plate appearance had happened in a situation with average leverage (i.e., LI=1.00). Therefore, it takes WPA, which is heavily situation-dependent, and converts it to something that is much more situation independent, and thus similar to more traditional estimates of player performance in which all plate appearances are given equal weight.

Now, WPA/LI is not the same thing as WPA/pLI. WPA/LI is calculated on a per-PA basis. Because pLI is just the average LI of all PA, WPA/pLI will differ from WPA/LI depending on the PA-to-PA variation in WPA and LI. Nevertheless, at least in concept, it's trying to do the same thing, and gives us a basis for applying leverage to reliever runs data.

If we use a 10 runs = 1 win approximator (commonly used, and consistent with the coefficients relating runs to wins that I showed in the first article of this series), then we can make this approximation:
RAA ~= WPA/pLI * 10

RAA is Runs Above Average, which mirrors WPA in that it's centered around league-average. If our goal is to get an estimate of value that is more dependent on the situations in which a reliever pitched, we're essentially asking for a value estimate that is more like Win Probability Added (WPA). So, with 9th-grade algebra, the above equation converts to:

WPA ~= RAA * pLI / 10
or
"RPA" = RAA * pLI = RARLI

In other words, we can simply multiply Runs Above Average by pLI to get something that approximates the situation-specific runs above average value of a reliever. Cool!

Nevertheless, I like to report value relative to a replacement-level baseline, not average. And converting our situation-specific RAA number to RAR requires a slightly round-about approach. If we revisit our reliever RAR equation from the previous article on pitchers, it was:

RAR = (RPG - 1.07*lgRPG) / 9 * IP * -1

which is the same as:

RAR = [(RPG - lgRPG) / 9 * IP * -1] + [(0.07*lgRPG)/9*IP]

This essentially just adds the additional runs a replacement pitcher would be expected to give up relative to an average pitcher to a reliever's RAA estimate. This converts it from a RAA estimate to a RAR estimate. So, the above equation is the same as:

RAR = RAA + [(0.07*lgRPG) / 9 * IP]

So, to make this a situation-specific RAR estimate, we can use this equation:

RARLI = [RAA*pLI] + [(0.07*lgRPG) / 9 * IP]
or

RARLI = [(RPG - lgRPG) / 9 * IP * -1 * pLI] + [(0.07*lgRPG)/9*IP]

It's a little bit ugly. But it does the job.
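
For what it's worth, here's a minimal sketch of that calculation in Python.  The 3.50 R/9 over 70 IP in a 4.80 R/9 league at pLI = 1.8 is an invented example, not any particular pitcher's line.

def rar_li(rpg, lg_rpg, ip, pli):
    """Situation-specific runs above replacement for a reliever."""
    raa = (rpg - lg_rpg) / 9 * ip * -1         # runs above average
    repl_cushion = (0.07 * lg_rpg) / 9 * ip    # extra runs a replacement reliever would allow
    return raa * pli + repl_cushion            # leverage scales only the RAA portion

print(round(rar_li(3.50, 4.80, 70, 1.8), 1))   # ~20.8 leverage-adjusted RAR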

Please note that I'm only going to use this equation for relievers. While starting pitchers do diverge from 1.00 leverage from time to time, those deviations tend to be more or less random. Relievers, on the other hand, deviate in consistent ways from average leverage based on how their managers choose to employ them.

2007 Cincinnati Reds Pitchers, Take Two

How much of a difference does factoring reliever leverage into our estimates actually make? Well, here is the table from the previous article, expanded to also include leverage-based numbers for relievers (starters were forced to be 1.00 LI pitchers):

Base Runs
Pitcher       IP      RAR    pLI   RARLI
AHarang       231.7   61.4   1.00   61.4
BArroyo       210.7   25.9   1.00   25.9
DWeathers      77.7   11.7   1.95   20.0
KLohse        131.7   17.5   1.00   17.5
MBelisle      177.7   13.7   1.00   13.7
JBurton        43.0   13.5   0.95   12.9
HBailey        45.3    4.4   1.00    4.4
TShearn        32.7    2.2   1.00    2.2
JCoutlangus    41.0    1.9   1.06    2.0
BSalmon        24.0    3.2   0.46    1.9
BLivingston    56.3    1.6   1.00    1.6
BBray          14.3    0.4   0.99    0.4
EMilton        31.3    0.4   1.00    0.4
RStone          5.3   -2.8   0.33   -0.8
EGuardado      13.7   -1.0   0.95   -0.9
MMcBeth        19.7   -1.4   1.21   -1.9
ERamirez       16.3   -1.9   1.00   -1.9
RCormier        3.0   -1.4   1.40   -2.1
VSantos        49.0   -3.4   0.78   -2.3
MGosling       33.0   -8.1   0.62   -4.6
MStanton       57.7   -5.2   0.95   -4.8
KSaarloos      42.7   -9.1   1.05   -9.6
TCoffey        51.0  -10.4   0.94   -9.7
GMajewski      23.0   -7.6   1.32  -10.2
PDumatrait     18.0  -17.1   1.00  -17.1

FIP Runs
Pitcher       IP      RAR    pLI   RARLI
AHarang       231.7   56.0   1.00   56.0
BArroyo       210.7   30.9   1.00   30.9
MBelisle      177.7   25.9   1.00   25.9
KLohse        131.7   20.4   1.00   20.4
DWeathers      77.7    8.3   1.95   13.5
BLivingston    56.3    8.0   1.00    8.0
HBailey        45.3    4.9   1.00    4.9
JBurton        43.0    4.9   0.95    4.8
EMilton        31.3    4.4   1.00    4.4
MStanton       57.7    3.1   0.95    3.0
BBray          14.3    2.6   0.99    2.6
MMcBeth        19.7    1.8   1.21    2.0
JCoutlangus    41.0    1.3   1.06    1.2
BSalmon        24.0    1.3   0.46    1.1
GMajewski      23.0    0.4   1.32    0.3
EGuardado      13.7   -0.3   0.95   -0.2
RStone          5.3   -4.4   0.33   -1.3
MGosling       33.0   -3.7   0.62   -1.8
RCormier        3.0   -1.3   1.40   -1.9
TShearn        32.7   -2.6   1.00   -2.6
VSantos        49.0   -4.0   0.78   -2.7
ERamirez       16.3   -3.2   1.00   -3.2
KSaarloos      42.7   -5.0   1.05   -5.4
PDumatrait     18.0   -5.6   1.00   -5.6
TCoffey        51.0   -6.4   0.94   -5.9

The biggest difference we see between the first set of RAR numbers and the RARLI numbers, within both the Base Runs and FIP-based estimates, is that David Weathers' value gets a considerable boost. This reflects the excellent job that Reds' managers did in using him in high-leverage situations this season, often coming in to get an out or two during the 8th inning. Similarly, we see Gary Majewski's negative BaseRuns value exaggerated (appropriately) due to the fact that he performed terribly in high-leverage situations this year.

On the other side of the coin, the low leverage of the innings in which they pitched mitigated the negative value of several Reds pitchers, including Victor Santos, Ricky Stone, and Michael Gosling. While from a performance evaluation standpoint, this seems to let those pitchers off the hook, it seems appropriate to do this from the standpoint of assessing the value of these players to the 2007 Cincinnati Reds.

What if you don't have or don't want to deal with pulling pLI from fangraphs?

Updated 11 January 2008
As discussed earlier, when thinking about reliever value, it's insufficient to strictly consider the rate at which they give up runs because some runs are more valuable than others. Closers, in particular, tend to pitch in high leverage situations, and therefore should get more "credit" for their ability to pitch above reliever replacement level than a pitcher who only pitches in games that have a lopsided score.

For players since 2002, we can get actual pLI data from FanGraphs, and I discussed how to employ those data to adjust reliever run value estimates previously. However, what if you want to look at reliever value among players who played prior to 2002, like in my proposed series on past winning Reds teams? In that situation, you'd need some way of inferring reliever usage from other statistics.

One way to try to do this is by looking at performance--better pitchers should be used in higher-leverage situations. However, when attempting this approach, I've found that there's just very little predictive power (i.e. huge amount of scatter), even though there is a significant relationship between ERA (or FIP) and pLI. Whether that's due to within-team competition, inconsistent reliever performance, or poor decisions by managers, performance is just not a very good way to predict pLI.

On the other hand, as Darren implied, even in historical databases like Lahman's, we have at least one statistic that tells us something about usage: saves. Saves are well documented to be a rather poor indicator of reliever quality. Nevertheless, they do tell you who was pitching in the 9th inning of a team's games, which tends to be the inning with the highest leverages. So we should be able to use saves to infer something about reliever usage. Here's what I did:

Methods

I pulled stats, including both traditional pitching statistics and pLI, from fangraphs on all pitchers, 2002-2007, who threw at least 25 innings in relief in a season. There is some selection bias in such a sample, because it will tend to exclude a lot of bad pitchers who weren't given the opportunity to throw 25 IP. But it still does include pitchers that span much of the range in terms of performance, and gets around the issue of dealing with stats on pitchers with extremely small samples (not that 25 IP is a big sample...).

Next, I calculated saves per inning (Srate) as an indication of the proportion of a pitcher's innings that were associated with saves:

Srate = Saves/IP

It's important to use a rate because you want to know something about a player's opportunities. If someone gets 20 saves in 20 innings, they're probably pitching in much higher leverage situations, on average, than someone who gets 20 saves in 70 innings. Ideally, I'd also use blown saves--and maybe holds--but those stats are not available in the Lahman database or on baseball-reference's team pages, so I'm going to ignore them for now.

I also converted pLI to a "rate" statistic using the approach suggested by Tom Tango:

rateLI = pLI/(pLI+1)

Such that:
pLI = 2 ---- rateLI = 0.667
pLI = 1 ---- rateLI = 0.500
pLI = 0.5 ---- rateLI = 0.333

This was important because as a pure ratio, pLI changes at a faster rate above 1.0 than it does below 1.0, which makes it hard to model using a regression-based approach.

Anyway, here's a plot of Srate vs. rateLI:
Obviously, that's a pretty ugly-looking relationship down in the zero/low-saves groups. But as you can see, there's a pretty nice relationship among pitchers who actually have a modest number of saves and their pLI. In other words, once someone starts to get saves, you can reasonably predict that he'll have an above-average pLI, and the player's pLI should steadily increase from there.

I decided to run with this and, in what I completely admit is a really terrible abuse of regression math (I've violated just about every assumption one can violate), I fitted a line to this relationship. I found that a second-order polynomial seemed to fit the data well. Furthermore, I forced the y-intercept to come in at a rateLI=0.5 (pLI=1.0), such that the average pitcher without saves is expected to pitch in average leverage (otherwise, the equation tended to predict that the vast majority of pitchers would have a pLI=0.8, and that's not reasonable). Here's the equation:
rateLI = -0.3764*(Srate^2) + 0.5034*Srate + 0.5

which we can convert back to pLI by:

pLI = rateLI/(1-rateLI)


Now, this rather shaky regression equation isn't something that I'd try to publish in the SABR newsletter, much less an academic journal. It's not built upon rigorous math. But it actually works pretty darn well. For demonstration, here's a table showing a hypothetical pitcher who has thrown 70 innings, and how his predicted pLI changes as the number of saves (and thus his Srate) increases:
Saves (70 IP)   Srate   rateLI   pLI
0 0.00 0.50 1.0
5 0.07 0.53 1.1
10 0.14 0.56 1.3
15 0.21 0.59 1.4
20 0.29 0.61 1.6
25 0.36 0.63 1.7
30 0.43 0.65 1.8
35 0.50 0.66 1.9
40 0.57 0.66 2.0
45 0.64 0.67 2.0
50 0.71 0.67 2.0
As you can see, the numbers seem to plateau at around a pLI of 2.0, which is about where MLB closers tend to plateau. David Weathers, for example, who had a Srate = 0.42 last season, had an actual pLI=1.95, which isn't far from his predicted pLI using this method. Pitchers with a smaller number of saves per IP--setup men, mostly--are assumed to have above-average but still relatively moderate leverage. Finally, guys without saves are assumed to have average leverage.

Anyway, I think that this is a pretty reasonable way to adjust for historical reliever leverage, at least among closers. Obviously, we're going to undervalue some relievers that aren't yet in the setup role but pitch in lots of big-time leverage situations in the 7th or 8th innings. But I think this approach will capture a lot of what we're trying to do with a reliever leverage adjustment.

David Weathers photo by Getty Images/David Maxwell