Friday, April 27, 2007

Can we adjust OPS to account for variation in BABIP?

"You know what the difference is between hitting 0.250 and hitting 0.300? It's 25 hits. Twenty-five hits in 500 at bats is 50 points. Okay? There's 6 months in a season, that's about 25 weeks -- you get one extra flare a week -- just one -- a gork, a ground ball with eyes, a dying quail -- just one more dying quail a week...you're in Yankee Stadium." -- Crash Davis (Bull Durham)

The degree to which a player's Batting Average on Balls in Play (BABIP) varies from season to season due to nothing more than random "luck" has been one of the more important discoveries in baseball player performance analysis. JC Bradbury's PrOPS statistic was an attempt to estimate batting performance based on batted ball types, rather than outcomes, thus removing a great deal of the luck (variation due to random factors) that we see in traditional "scorebook" statistics. The estimates of OPS that his approach generated seemed to get around variation in performance due to luck. For example, players with a positive difference between PrOPS and OPS (PrOPS-OPS > 0) one year tended to have either a negative difference or no difference the following year. Similarly, players with absurdly high BABIPs tend to have PrOPS estimates much lower than their actual OPS. More importantly, PrOPS has a better year-to-year correlation within players than OPS, and actually predicts subsequent-year OPS better than OPS itself.

But there are two problems with PrOPS. First, JC Bradbury hasn't released (to my knowledge) the actual equations used to calculate PrOPS, which means that we can't completely evaluate his methodology. Furthermore, it restricts us to the PrOPS information we can gather from THT's (excellent) website, as there's no way for us to replicate his efforts. Second, batted ball data aren't always available. For example, I often like to look at splits throughout the season (e.g. home/away splits, or month-by-month splits), and I usually can't find batted ball data in those instances.

Therefore, I've been experimenting to see if I could develop a simple alternative to PrOPS that, while not as good, might still help separate lucky/unlucky performances from true player skill. In the end, I'm not entirely happy with the result, but I thought I'd report my findings nonetheless.

My approach is very simple, and is something I touched on in my spring training review: adjust a player's hit totals so that his BABIP is set to a more "typical" number, and then use those adjusted hit totals to calculate OPS. I make the explicit (and probably not completely valid, though also probably not fatal) assumption that the entire difference between a player's actual and "typical" BABIP consists of singles--the "ground balls with eyes" that Crash Davis referred to above. If a batter's BABIP is higher than "typical," OPS should decrease, whereas if BABIP is already fairly typical, OPS should not change much.

Equations

Simple stuff here. The equation for BABIP is as follows:
  • BABIP = (H-HR)/(AB-HR-K)
So solving for H gives us:
  • H = (BABIP)*(AB-HR-K) + HR
This is the expected number of hits a player would receive for a given BABIP. We can then use these adjusted hit totals along with the other original stats (AB's, BB's, 2B's, 3B's, HR's) to calculate OPS as we usually do (we could also do the same with just about any other offensive statistic: Runs Created, Gross Production Average, etc). But what BABIP value should we use? I'll consider two alternatives.
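To make the arithmetic concrete, here's a minimal Python sketch of the adjustment. The function name and the example stat line are my own, not from any real player, and it uses a simplified OBP of (H+BB)/(AB+BB) that ignores HBP and sacrifice flies:

```python
def adjusted_ops(ab, doubles, triples, hr, bb, k, target_babip=0.304):
    """OPS recomputed after forcing BABIP to a target value.

    All hits added or removed by the adjustment are treated as singles,
    so the 2B/3B/HR totals are left untouched. OBP here is the simplified
    (H + BB) / (AB + BB), ignoring HBP and sacrifice flies.
    """
    balls_in_play = ab - hr - k
    adj_hits = target_babip * balls_in_play + hr  # H = (BABIP)*(AB-HR-K) + HR
    # TB = 1B + 2*2B + 3*3B + 4*HR, where 1B = adj_hits - 2B - 3B - HR,
    # which simplifies to the expression below.
    total_bases = adj_hits + doubles + 2 * triples + 3 * hr
    obp = (adj_hits + bb) / (ab + bb)
    slg = total_bases / ab
    return obp + slg

# Hypothetical hitter: 500 AB, 30 2B, 3 3B, 20 HR, 50 BB, 100 K
print(round(adjusted_ops(500, 30, 3, 20, 50, 100), 3))
```

Note that the hitter's actual hit total never enters the calculation; it is entirely replaced by the expected hits at the target BABIP.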

The most straightforward, though not necessarily the most accurate way to estimate what BABIP should have been is to simply use MLB average. Over the past three seasons ('04-'06), the average BABIP, based on all balls hit into play, is 0.304.

This first approach, however, ignores the finding that batters do consistently vary in BABIP; excellent hitters often have consistently higher BABIPs than poorer hitters. Dave Studeman has done some research on this in the past and discovered that BABIP can be predicted with reasonable accuracy as LD% + 0.120 (I loosely confirmed this, though over the past three years the value one should add appears to be 0.117--I used Studes' number in this report). Therefore, the second approach was to estimate BABIP using LD%. Granted, I am trying to get away from batted-ball data, but this will serve as a useful check of how badly the approach of using league-average BABIP affects the results.
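The LD%-based variant changes only which target BABIP gets plugged into the hit adjustment. A one-line sketch of the rule of thumb (using Studes' 0.120 constant from the text; the function name is mine):

```python
def babip_from_ld(ld_rate):
    """Expected BABIP from line-drive rate, via the LD% + 0.120 rule of thumb.

    ld_rate is a fraction, e.g. 0.19 for a 19% line-drive rate.
    """
    return ld_rate + 0.120

# A hitter with a 22% line-drive rate projects to roughly a .340 BABIP
print(round(babip_from_ld(0.22), 3))
```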

A third approach might be to use a hitter's career BABIP instead of current-year BABIP. This would be fine when looking at hitters with an established MLB track record, but we often want to look at hitters with only a year or two at the MLB level. It also complicates (slightly) the calculations we'd have to make, as we would no longer be able to calculate everything from just one row's worth of data (one season) in a spreadsheet. Therefore, I'm not going to pursue this approach for the time being.

Results

Below is a scatter plot matrix comparing OPS, PrOPS, and the two adjustments to OPS that I proposed above: avg-aOPS (MLB Average BABIP adjusted OPS, approach #1 above), and ld-aOPS (LD%-determined BABIP Adjusted OPS, approach #2 above). Data are for all batters from 2004-2006 with a minimum of 100 plate appearances. I did not control for individual in this study, so individuals who played multiple years are represented more than once. Sorry--if that freaks people out, I can go back and re-do it, but I doubt it would change the findings much. Also, I removed Barry Bonds' 2004 line, because his ridiculous 1.422 OPS had incredibly high leverage and was throwing off my regressions & correlations.

"R2," or r-squared, values indicate the proportion of variance explained by regressing one variable onto another. The square root of R2 is r, Pearson's correlation coefficient. As you can see, all four variables were well-correlated with one another. There were a few incidences where there appears to be some non-linearity, and I haven't tried to compensate for that as of yet.

PrOPS was the least well-correlated with actual OPS of the three OPS estimates, but that makes sense, because PrOPS is based on a completely different set of data (batted-ball, strikeout, and walk rates). The most interesting finding here was that avg-aOPS and ld-aOPS were both better correlated with PrOPS than they were with actual OPS. This seems to indicate that some of the luck-based variation that PrOPS removes from its performance estimates is also removed in the two BABIP-adjusted OPS estimates. That was a very good sign that this approach may have some merit.

Furthermore, there is almost no difference between avg-aOPS and ld-aOPS in their correlations with actual OPS or PrOPS. This indicates, at least initially, that we don't lose all that much information by adjusting everyone's OPS to MLB-average BABIP rather than estimating it from LD%, even though we know it's technically not right to do so. The primary consequence appears to be that the LD%-based estimates show a better linear relationship than the league-average-based estimates, which likely reflects the fact that BABIPs do vary in consistent ways between hitters.

Again, for this approach to be useful to me, it needs to work reasonably well in the absence of batted-ball data like LD%. So, to assess that, let's look at a case example (the 2006 Reds) and see how the three approaches compare in their player assessments. The hope is that they will all tell a fairly similar story, and thus be compatible techniques:
Above you see three graphs comparing an estimate of OPS to actual OPS for the 2006 Reds (minimum 100 plate appearances). The first is PrOPS, which I'm holding up as the gold standard. Individuals in the upper-left corner probably "underachieved" last season, meaning that their batted ball stats predict a better OPS than they actually had. Likewise, individuals in the lower-right probably "overachieved" (i.e. they were lucky).

PrOPS indicated that five players may have underachieved for the Reds last season: Jason LaRue and Adam Dunn were particularly high off the curve, while Griffey, Valentin, and Clayton also may have deserved better than they got. Only one individual, Chris Denorfia, was identified as someone who probably overachieved last season. Let's see how those findings compare to those derived from BABIP adjustments.

ld-aOPS, the OPS estimate based on a line-drive-estimated BABIP, told a somewhat similar, though admittedly not identical, story. It identified LaRue, Dunn, and Valentin as underachievers, though it put Griffey and Clayton back onto the curve due to their low LD%s last season. It also identified Hatteberg, Castro, and (perhaps) Aurilia as slight underachievers. Denorfia was once again identified as an overachiever.

avg-aOPS, the OPS estimate based on league-average BABIP, was the most different of the bunch. That's unfortunate, as it's the only one I can use when I have access to nothing but traditional statistics. It identified only LaRue and Griffey, both of whom had sub-0.250 BABIPs last season, as underachieving. Denorfia was once again the only real overachiever, thanks to his 0.345 BABIP. All other players were basically right on the line.

In addition to looking at the Reds, I also took a look at the MLB players ranked as the most extreme under- and over-achievers over the past three years using these three approaches. This analysis identified another problem: both of the BABIP-adjusted estimates tended to identify poor hitters as "under-achievers" and good hitters as "over-achievers" (even when the plate appearance minimum was raised above 250). They did this to a far greater degree than PrOPS did--in other words, a big part of what the adjustment is doing is just regressing players toward the mean. This, along with the Reds results above, indicates to me that BABIP adjustments provide, at best, only a very rough estimate of how well a player is hitting.

Should we consider using it?

Nevertheless, despite my disappointment in the overall performance of this little stat, I still think it can be worthwhile if used cautiously. When BABIPs deviate in dramatic fashion from typical values (below 0.250 or above 0.350), the avg-aOPS approach will still identify lucky and/or unlucky performances and give an idea of what they would look like under more "typical" luck. The thing to remember is that a large deviation probably does mean that good or bad luck has played into a player's actual OPS value. In contrast, a lack of deviation from expected values doesn't necessarily mean that a player's statistics were in line with his "true" performance; it just means that the player's BABIP looks about right for his performance.

Therefore, I will go ahead and use the avg-aOPS approach (I'll just call it "aOPS" from now on) from time to time this season when PrOPS or raw batted-ball data are unavailable. I expect it will be particularly valuable when looking at monthly splits, as small sample sizes tend to result in highly variable BABIPs, and some of the huge swings we see in player performance may be explained by simple, random variation in BABIP rather than actual changes in true player skill.