This week in the SABR101x course, we're covering hitting statistics. The lesson was very similar to an article I wrote 6 years ago on this site comparing the ability of different offensive stats to predict runs scored (many others have written such articles; it's a classic approach to addressing the question of hitting stat quality).
In it, I argued that there wasn't much of a problem with using OPS if it improved communication, because the gains from moving to other, better stats (like wOBA) were so meager. Fortunately, in the time since, FanGraphs has popularized wOBA so much that I feel pretty comfortable just reporting it and ignoring OPS altogether. And in many cases, I've even moved on to using wRC+ to get the advantages of park controls and run-environment neutrality. OPS just isn't necessary anymore.
In any case, I thought it would be fun to reproduce that article here. Here's the most relevant graph. The rest appears below the jump.
In a comment to my latest "weekly" stat review for the Reds, Bluzer politely critiqued my use of OPS (and, by extension, PrOPS). OPS is certainly not a great statistic. For details on some of the arguments made against it, check out Patriot's fine assault on it here.
But, in general, OPS seems to work well, and it's widely used... and it's something I've been looking at for a long time to judge players, so I'm comfortable with it. And that's pretty much the whole reason that I've continued to use it around here.
But how much am I missing by relying on it so much? Would I be better off scrapping it altogether and sticking to linear weight-based statistics like wOBA or R/G? I'm sure the answer is "yes," but how much of a difference does it make? I generally do report R/G with my player reports, but even so, I'll admit that my eye goes first to OPS when I judge a hitter. Old habits die hard.
I decided to run a quick study to see what difference it makes. I pulled MLB team offense totals, 2005-2007. No particular reason to use three years, I just figured that I wanted to include more than one year in case there was something peculiar about a given year. And you can quibble with whether team totals are a good way to evaluate stats for individual players--after all, team stats are less variable than hitter stats (update: in this sample, team OPS ranged from 0.708 for the '05 Nationals to 0.829 for the '07 Yankees). But team stats provide a clean measure of actual runs scored, which allows us to ask how well a given rate stat predicts actual runs scored. And scoring runs is pretty much the definition of good offense, right?
Anyway, below are Pearson's correlations between runs scored and a variety of rate stats that I've seen proposed and used here and there around the internets. Numbers closer to 1.0 indicate a closer relationship between runs scored and the statistic in question:
TA-orig (~TB/PA): 0.8739
BOP (~TB/outs): 0.8955
2OPS (2*OBP + SLG): 0.9323
GPA ([1.8*OBP + SLG]/4): 0.9326
Note: RAA/PA and R/G use my custom linear weights for 2003-2007 MLB. They have about twice as many parameters as wOBA (including stolen bases, GDPs, etc.), and they're more specifically tuned to this particular era. Also note that because EqA is so damn hard to calculate, I used Patriot's conversions to get from EqRAW to EqR. So, EqA's spot in this list might be improved by a more careful calculation. Sorry 'bout that...but I just didn't have the patience to calculate it properly, as I'm not a particularly big fan of it because of its unnecessary complexity.
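For anyone who wants to replicate the easy formula-based stats and the correlation calculation, it's only a few lines of code. Here's a minimal sketch, using made-up team totals rather than the actual 2005-2007 data (the function names are mine, not from any stats library):

```python
import math

# Hypothetical team-season totals, NOT the real 2005-2007 data:
# each tuple is (OBP, SLG, runs scored).
teams = [
    (0.322, 0.386, 639),
    (0.340, 0.421, 774),
    (0.366, 0.463, 968),
    (0.331, 0.412, 735),
    (0.347, 0.445, 865),
]

def two_ops(obp, slg):
    return 2 * obp + slg          # 2OPS: weight OBP twice as heavily as SLG

def gpa(obp, slg):
    return (1.8 * obp + slg) / 4  # Gross Production Average

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

runs = [t[2] for t in teams]
for name, f in [("OPS", lambda o, s: o + s), ("2OPS", two_ops), ("GPA", gpa)]:
    stat = [f(o, s) for o, s, _ in teams]
    print(f"{name}: r = {pearson(stat, runs):.4f}")
```

Swap in real team totals and you can reproduce the table above directly.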
Here are those data graphically, which I think helps with interpretation:
That's pretty much the order I expected to see (though I had no idea where Bluzer's stats would fall out). Big improvement by going from AVG to either OBP or SLG. Pretty big jumps from SLG on up to OPS. But not much of an improvement after you hit OPS. The most accurate stats proved to be those based on linear weights: RAA/PA, wOBA, and R/G. That's gratifying, as I've been treating them as the gold standard statistics on this site for rating hitters... but they're incremental improvements, at best, over plain old OPS in terms of predicting runs scored.
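Since wOBA keeps coming up as the gold standard here, it's worth showing how simple a linear-weights stat actually is under the hood. A sketch with illustrative weights (roughly the published FanGraphs values; the real coefficients are re-derived from each season's run environment, so treat these as approximate):

```python
def woba(ubb, hbp, singles, doubles, triples, hr, ab, bb, ibb, sf):
    """Weighted on-base average from a player's or team's counting stats.

    The weights below are illustrative approximations of the published
    FanGraphs values; the actual coefficients vary by season.
    """
    num = (0.69 * ubb + 0.72 * hbp + 0.89 * singles
           + 1.27 * doubles + 1.62 * triples + 2.10 * hr)
    # Standard wOBA denominator: AB + BB - IBB + SF + HBP.
    return num / (ab + bb - ibb + sf + hbp)
```

The idea is just that each offensive event gets a weight proportional to its average run value, and the total is scaled to sit on the familiar OBP scale.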
I did find it interesting that Base-Out Percentage (BOP) and Total Average (TA), two representatives of the "bases divided by outs or PA" group of statistics that were recently discussed here, did so poorly. I didn't expect them to do as well as the linear weights statistics, but I didn't really expect OPS to do so much better than they did either.
In his comment, Bluzer recommended his latest statistic, ABSO. It is true that it did a tad better than OPS in this analysis at predicting runs scored. But at the same time, equally easy-to-calculate statistics like OBP*SLG or 2OPS did even better. GPA did the best of all statistics based on OBP and SLG, which is consistent with its reputation. But again, everything from OPS to R/G gives you very similar rankings. So, the lesson is that it really doesn't matter all that much which one you choose!
That's not a novel insight by any means...in fact, after I wrote this, I discovered this two-year-old study by Dan Fox that reports much the same thing, and takes it a step further by showing how OPS can break down into something very much like linear weights. But I like to do things myself, and it's nice to see the same trend borne out in yet another study.
So, in the end, I'm going to keep using OPS as one of the ways that I judge hitters on this site. I will keep on reporting R/G as well; in addition to being more accurate, it has several other advantages that come into play now and then (especially in terms of easy park factor corrections). But for most purposes, OPS is going to do just fine. Yay.
Update: In case anyone wants to play, I've uploaded a copy of the spreadsheet I used to calculate these results to this location. It also includes data based on 5-year team totals, which I calculated after the 3-year totals as a check. They conform very closely to the results I posted here, though wOBA beats out my R/G stat (barely), and EqA looks worse. Thanks to Baseball Reference for the data.
Update2: Victor Wang pointed out another relevant study that I'd missed (and I'm sure there are others as well), which he published in a recent SABR newsletter. He looks at the coefficient used to weight OBP vs. SLG in OPS calculations, and finds that the best coefficient varies considerably from era to era. In my dataset, 2005-2007, the 1.8 weighting (like what is used in GPA) works best. But I also have a 2003-2007 dataset (see the spreadsheet I linked above), and correlations from it match Wang's finding that 1.6 works better in those years than 1.8. But, as is the theme here, it really doesn't make all that big of a difference...
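Wang's era-dependence question is easy to explore yourself: just grid-search the coefficient k in k*OBP + SLG and keep whichever value correlates best with runs scored. A sketch (best_obp_weight is a hypothetical helper of mine; feed it real team totals, as (OBP, SLG, runs) tuples, to reproduce the comparison):

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_obp_weight(teams, lo=1.0, hi=3.0, step=0.1):
    """Grid-search the k in k*OBP + SLG that best correlates with runs.

    teams: list of (OBP, SLG, runs) tuples.
    Returns (best k rounded to one decimal, correlation at that k).
    """
    runs = [r for _, _, r in teams]
    best_k, best_r = None, -1.0
    k = lo
    while k <= hi + 1e-9:
        stat = [k * obp + slg for obp, slg, _ in teams]
        r = pearson(stat, runs)
        if r > best_r:
            best_k, best_r = k, r
        k += step
    return round(best_k, 1), best_r
```

Run it on different year ranges and you can watch the preferred weighting drift around, just as Wang found.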