Comparison of the Fielding Statistics
Ok, as we showed in the last piece, there are a lot of different options for fielding statistics. Some do seem better on the surface than others based on their methodologies. My preferences are for statistics that are based on the most specific data possible, and for those that try to account for additional factors, beyond fielding skill itself, that could impact player performance. But how much do these statistics actually vary in how they rate players?
In the work that follows, I pulled 2006 data from all of the different systems for which I could find data. Here are my sources:
- Fans Scouting Report (FSR) - From Tango on his site, converted to +/- runs.
- Davenport Translations (DT) - From Baseball Prospectus.
- Ultimate Zone Rating (UZR) - From MGL on his site.
- Zone Rating (ZR) - From ESPN.com, converted to +/- runs.
- Revised Zone Rating (RZR) - From The Hardball Times, converted to +/- runs.
- Probabilistic Model of Range (PMR) - From David Pinto on his site, converted to +/- runs.
| Feature | DT | UZR | ZR | RZR | PMR | Fans |
|---|---|---|---|---|---|---|
| Based on Outs/Opportunities | x | x | x | x | x | |
| Convertible to +/- Runs | x | x | x | x | x | x |
| Built upon hit-location data | | x | x | x | x | |
| Uses different rates for different zones | | x | | | x | |
| Includes adjustments for ball type | x | x | | x | x | |
| Includes adjustments for batter handedness | ? | x | | | x | |
| Includes adjustments for ballpark | x | x | | | x | |
| Includes breakdown of component player skills | | | | | | x |
So, based on that table, I would have to say that UZR and PMR have the best methodologies, with a nod to the Fans data because they can provide such unique insights into player skill. ZR and RZR are more or less the same thing--coarse interpretations of hit location data--though RZR has an advantage in that we know exactly how many opportunities (i.e. balls in zone) a player had, whereas with ZR we have to estimate opportunities based on typical BIZ/inning rates for each position. Davenport Translations are based on scorebook data, rather than hit location data, and therefore--despite many additional adjustments--seem unlikely to be as accurate.
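For concreteness, here's a minimal sketch of that kind of conversion to +/- runs. The ~0.8 runs-per-play value (roughly the value of a hit prevented plus an out recorded) and the BIZ-per-inning rates are illustrative assumptions on my part, not the exact figures used in the conversions above:

```python
# Sketch of converting a STATS Zone Rating to +/- runs. All constants here
# are illustrative assumptions, not the article's exact conversion values.

RUNS_PER_PLAY = 0.8  # assumed run value of turning a hit into an out

# Assumed league-typical balls-in-zone per inning, by position (hypothetical)
BIZ_PER_INNING = {"SS": 0.50, "2B": 0.45, "3B": 0.40, "CF": 0.35}

def zr_to_runs(zr, lg_zr, innings, position):
    """Estimate +/- runs from a Zone Rating.

    STATS does not publish opportunity counts, so balls in zone (BIZ)
    are estimated from innings played, as described above.
    """
    est_biz = innings * BIZ_PER_INNING[position]
    extra_plays = (zr - lg_zr) * est_biz  # plays made above/below average
    return extra_plays * RUNS_PER_PLAY

# Example: a shortstop with an .850 ZR vs. an .830 league ZR over 1200 innings
print(round(zr_to_runs(0.850, 0.830, 1200, "SS"), 1))  # ~ +9.6 runs
```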
But how do these different systems rate players? Let's take a look. I found 251 players who showed up in all six datasets in 2006 (the primary limiting factor was PMR, which has a cutoff of 1000 BIP for a fielder to be rated). I ended up removing Adam Everett and Manny Ramirez from the sample because their ratings were so extreme, high and low respectively, that I didn't want their high leverage "artificially" strengthening my correlations (correlations only dropped by about 0.02 across the board when I did this). This left me with 249 players.
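For anyone who wants to replicate this, here's a rough sketch, in Python with pandas, of how such a dataset could be assembled. The file and column names are hypothetical; this is just the shape of the procedure:

```python
import pandas as pd

# Illustrative file names; each CSV is assumed to have "player" and "runs" columns.
sources = {"FSR": "fsr_2006.csv", "DT": "dt_2006.csv", "UZR": "uzr_2006.csv",
           "ZR": "zr_2006.csv", "RZR": "rzr_2006.csv", "PMR": "pmr_2006.csv"}

merged = None
for name, path in sources.items():
    df = pd.read_csv(path).rename(columns={"runs": name})[["player", name]]
    # Inner join keeps only players who appear in every system
    merged = df if merged is None else merged.merge(df, on="player")

# Drop the two high-leverage outliers noted above
merged = merged[~merged["player"].isin(["Adam Everett", "Manny Ramirez"])]

print(len(merged))                            # 249 players
print(merged[list(sources)].corr().round(2))  # Pearson correlation matrix
```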
First, here's a scatterplot matrix describing how the different metrics varied (all positions except catchers, whom I ignored for this entire analysis...more on them later):
As you can see, while there's certainly a lot of scatter, there's reasonably good agreement between most of these metrics. Some pairs are especially well correlated (UZR and ZR, most notably), but most of these plots show discernible positive correlations. Given that they're all reasonable attempts to measure fielding, it's nice to see at least some level of agreement! On the other hand, the substantial scatter indicates a fair bit of disagreement among the various statistics in how they rate individual players.
Here are the correlation values in matrix form. All are massively significant given the sample size involved, even after a Bonferroni adjustment for multiple comparisons (a quick sanity check on that follows the table).
| | FSR | DT | UZR | ZR | RZR | PMR |
|---|---|---|---|---|---|---|
| FSR | 1.00 | 0.36 | 0.42 | 0.39 | 0.42 | 0.54 |
| DT | 0.36 | 1.00 | 0.38 | 0.30 | 0.30 | 0.47 |
| UZR | 0.42 | 0.38 | 1.00 | 0.81 | 0.58 | 0.53 |
| ZR | 0.39 | 0.30 | 0.81 | 1.00 | 0.52 | 0.45 |
| RZR | 0.42 | 0.30 | 0.58 | 0.52 | 1.00 | 0.60 |
| PMR | 0.54 | 0.47 | 0.53 | 0.45 | 0.60 | 1.00 |
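To make the significance claim concrete: with six metrics there are C(6,2) = 15 pairwise correlations, so the Bonferroni-adjusted threshold is 0.05/15 ≈ 0.0033. Here's a quick check using the standard t approximation for a Pearson correlation, showing that even the weakest value in the matrix clears that bar easily:

```python
from math import sqrt
from scipy import stats

n = 249                  # players in the sample
n_tests = 15             # 6 metrics -> 15 pairwise correlations
alpha = 0.05 / n_tests   # Bonferroni-adjusted threshold, ~0.0033

def corr_p_value(r, n):
    """Two-tailed p-value for a Pearson r via the usual t approximation."""
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

# The weakest correlation in the matrix is 0.30 (DT vs. ZR and DT vs. RZR)
p = corr_p_value(0.30, n)
print(p, p < alpha)  # ~1e-6, True
```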
Update: I've made two adjustments to the FSR data since I first published this article. First, I realized that I had forgotten to account for playing time in these estimates--I have now pro-rated each player's FSR rating by his innings played to better correspond to the scale on which the other stats report fielding data. Second, Tango directed me to a recent post of his that reports his weights, and I am now using those values. The result was a slightly better fit with the DT's (+0.04 correlation), a slightly weaker fit with the STATS-based estimates (-0.04 correlation), and almost zero change in the fit with the BIS-based estimates.
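For clarity, the pro-rating works something like the sketch below; the 1,450-inning "full season" denominator is an illustrative assumption for this example, not necessarily the exact value I used:

```python
FULL_SEASON_INNINGS = 1450  # assumed full-time workload at a position

def prorate_fsr(fsr_runs_full_time, innings):
    """Scale a full-time FSR +/- runs estimate down to actual innings played."""
    return fsr_runs_full_time * innings / FULL_SEASON_INNINGS

print(prorate_fsr(10.0, 725))  # a +10 full-time fielder in a half season -> +5.0
```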
Discussion of findings:
STATS vs. BIS
UZR and ZR are particularly well correlated to one another, easily the closest match between any two variables in this dataset. The most reasonable explanation is that they are built upon the same raw dataset--hit location data furnished by STATS Inc. At the same time, while the effect is not nearly as dramatic, the two statistics built upon hit location data purchased from BIS (RZR and PMR) are more similar to one another than either is to the other statistics.
So we seem to have an effect of the raw data provider here. This is consistent with Michael Humphreys' study from last August. He found substantial differences between the STATS- and BIS-based fielding statistics, even after the methodologies were made as similar as possible, with correlations topping out at ~0.60.
Is there any indication that one provider is better than the other? Well, one way to evaluate this is to examine how the two sets of stats compare to other assessments of fielding performance. Humphreys found that his own fielding statistic, DRA, is better correlated with BIS than with STATS. Similarly, the FSR and DT estimates correlate better to PMR in this dataset (2006 fielders) than they do to either of the estimates built upon STATS Inc. data...although they don't correlate especially well with RZR, which is also built upon BIS data. At this point, while PMR does better in these comparisons than the other statistics, I'm not comfortable saying that we have a good indication that one provider is better than the other. And that means we should incorporate fielding estimates based on data from both providers whenever possible.
Coarse vs. Careful Systems
Another comparison that one can make is between the fairly "coarse" fielding estimates (like ZR and RZR) that involve artificially-established zones of responsibility for fielders, to those that are more "careful" (like UZR and PMR) in that they consider fielder performance in all zones, and include adjustments for varying batted ball velocities, batter handedness, park factors, etc. If we agree that careful fielding estimates are preferable to coarse ones, we'd expect that these more careful estimates would be a closer match to one another than they would to the "coarse" estimates, once you get past the data provider issue.
That's not really what we see, though. PMR does correlate better with UZR (0.53) than with ZR (0.45). But UZR correlates better with RZR (0.58) than with PMR (0.53). So I'm not seeing indications that the more careful, presumably "better" methodologies of UZR and PMR result in substantially different fielding estimates than the coarser approaches. That's not to say that I don't still prefer careful methods...it just means that the coarse methods are still useful when we don't have access to the careful ones.
Davenport Translations
Baseball Prospectus has more or less refused to even consider using an alternative fielding system to its DT's (aka FRAA), despite the apparent methodological advantages of a system built upon hit location data. And because BPro is so well regarded, their fielding numbers tend to be widely used. At the same time, some stathead types are skeptical of these numbers because they are built upon things like putouts and assists, rather than the hit location data powering the zone-based statistics.
To me, the above correlations indicate that those who are skeptical of the BPro fielding numbers have justification: DT's have weaker correlations across the board than any of the other fielding estimates do with each other, including the Fans Scouting Report. Now, it's possible that DT's are measuring something real that the other systems are not. But, given that they are based on apparently "weaker" data than the other systems, my inclination is to think that they're not as good as those other options. That, of course, is consistent with the general thoughts of many baseball researchers, but it's nice to see the numbers bearing this out.
That said, DT's do have some advantages. Because they are generated from scorebook statistics, they can be calculated on virtually any player who has ever played the game. This makes them a nice tool to assess historical fielding performances--and these data bear out the idea that DT's do have meaning. But I don't think there's much reason to use them for modern ballplayers when other superior options are available.
The Fans' Scouting Report
This was the big surprise for me. While the hit location statistics were better correlated to one another than they were to the Fans, the FSR data do pretty darn well for themselves. Remember, these are data based on the subjective rankings of fans, many of whom are no better trained than I am to evaluate defense visually. Those rankings are then weighted and converted into +/- runs by a process that is as much intuition as science. So for the FSR to even come close to the hit-location statistics, not to mention being a better match than the DT's, was pretty exciting.
Now, one might still argue that they weren't a good enough match to the other stats to be worth using. But I'd disagree. The FSR data are extremely different in both their basis (survey data) and how they go about evaluating fielding (a weighted analysis of different skill categories, specific to each position). The fact that they match as well as they do, while undoubtedly providing information not included in the other statistics, indicates to me that they are worth considering. I might not weight them as highly as the other stats given their subjective nature, but I'd like to include them in my ultimate assessments.
Recommendations
First, I think we can probably ignore DT's whenever we have something else to work from. That's probably not a very controversial point.
Second, the apparent differences in raw data provided by STATS and BIS indicate to me that it's important to incorporate estimates from both providers in our fielding estimates. If both datasets tell us the same thing, we can be reasonably confident in our conclusions. Therefore, I'd recommend treating fielding estimates from the two providers as equals until/unless we find reason to favor one over the other. I would also discourage folks from relying exclusively on a single data source whenever alternatives exist. I'm obviously guilty of violating that recommendation throughout the history of this site, but I'm going to try to use both datasets more consistently moving forward. :)
Ok, so we're going to use both STATS and BIS data. But how specifically should we do this? I see two possibilities. One, we average across all four estimates (if they are available). Or, we pick one estimate from each provider and average them.
I'm partial to the latter approach. Including all four doesn't seem like it would provide any particular advantages...if anything, it seems more likely to pull down the accuracy of our estimate because ZR and RZR are less carefully designed than UZR and PMR. So, I'd take the best available stat from BIS (PMR over RZR) and the best available stat from STATS Inc (UZR over ZR) and take the average. If one or both of the careful estimates are unavailable, as is currently the case for 2007, I would not hesitate to use the average rating of ZR and RZR. They match up well to the more careful systems, and still are based upon high quality hit location data.
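In code, that selection logic looks something like this sketch (the function names are just for illustration):

```python
def best_available(careful, coarse):
    """Prefer the 'careful' estimate (UZR/PMR) over the 'coarse' one (ZR/RZR)."""
    return careful if careful is not None else coarse

def fielding_runs(uzr=None, zr=None, pmr=None, rzr=None):
    """Average the best available STATS- and BIS-based +/- run estimates."""
    stats_est = best_available(uzr, zr)  # STATS Inc. family
    bis_est = best_available(pmr, rzr)   # BIS family
    return (stats_est + bis_est) / 2

print(fielding_runs(uzr=8.0, pmr=12.0))  # (8 + 12) / 2 = +10.0
print(fielding_runs(zr=6.0, rzr=10.0))   # careful stats unavailable -> +8.0
```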
Third, I also think that the availability of the Fans' Scouting Report data is pretty exciting, as it provides an entirely different set of data that still seem to do a good job of describing variation in fielding. So my inclination is to include FSR data in our estimates if it's available. The question, however, is how specifically to do this: do we treat it like the equivalent of the BIS and STATS data? Or do we down-weight these data in recognition of their inherently subjective nature? I'm open to suggestions on this, but at this point I'm partial to the latter approach...so here's one possible equation that goes this route:
+/- Fielding = 0.375*STATS + 0.375*BIS + 0.25*FSR
Where STATS is the best available fielding estimate based on STATS data (UZR or ZR), and BIS is the best available fielding estimate based on BIS data (PMR or RZR). That puts the fielding estimate at 75% of hit-location data (split 50/50 between STATS and BIS), and 25% of Fans' data.
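Here's a minimal sketch of that blend. When FSR is unavailable, I fall back to the straight 50/50 provider split recommended above:

```python
def fielding_runs_with_fsr(stats_est, bis_est, fsr_est=None):
    """Blend hit-location estimates with the Fans' Scouting Report.

    Uses the 0.375/0.375/0.25 weights from the equation above; with no FSR
    value, falls back to a 50/50 split of the two hit-location providers.
    """
    if fsr_est is None:
        return 0.5 * stats_est + 0.5 * bis_est
    return 0.375 * stats_est + 0.375 * bis_est + 0.25 * fsr_est

print(fielding_runs_with_fsr(8.0, 12.0, 4.0))  # 0.375*8 + 0.375*12 + 0.25*4 = +8.5
```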
Next time, we'll wrap up the position players. We'll start by taking a rough pass on evaluating catchers (who are still rather poorly studied, with little precedent to work from), work out how to incorporate position adjustments, and finally put it all together by reporting total ratings (offense + defense) for the 2007 Cincinnati Reds position players!
Justin, I'm loving this series you're doing. You really have a knack for taking complex technical jargon and distilling it into a more easily read format. Keep it up; I'm learning a great deal.
The basis for the BPro fielding system was laid out in detail in the print annual "Baseball Prospectus 1998 Edition" pp. 6-10; a high-level followup was published on their website by Clay Davenport which alluded to a couple of further refinements. One of its virtues is that the method "works" for minor league data as well as older major league seasons. The data points for it are estimated from standard statistics: BIP is estimated from innings pitched rather than calculated more directly from BFP, L/R hitter adjustments come from totaling team innings pitched by left- and right-handed pitchers, and ground ball/fly ball adjustments come from the ratio of infield assists to outfield putouts. A park factor adjustment was added according to the online article, but not explained. As you say, there is no reason to use this system as an independent check on systems which take these factors into consideration based on more direct counting from the modern statistical services.
Thanks for the scoop, Joe. I hadn't tracked down that Davenport article, and it was very helpful to see some of the thought processes that go into that statistic.
I did try to acknowledge the DT advantages for more historical studies, but your point about their utility in evaluating minor leaguers is also correct. If only they had stats on all minor leaguers at their website! :) ... or, of course, they could release the methodology so that we could calculate it ourselves...
-j
Was there ever a corresponding study that found the R against what somebody defined as accurate future results?
In other words, which system most accurately predicts future fielding ability? Is that possible? Or can you only tell whether a system is predictive of its own future numbers?
Is the reason you did not select one here as the most accurate that there is no way to tell?
Hi,
I think it is the case that you cannot choose one stat as being the most accurate, simply because there's no gold standard for comparison. Perhaps the best you can do is to compare the objective metrics vs. the Fan Scouting Report, because then at least you're comparing against a group-consensus perception of defensive skill. But even then, you don't KNOW that you have the most accurate system.
Fielding stats are different from run estimators, for example, where you can use actual within-inning or within-game run totals to test your statistic.
That's why my general preference is to take an average of stats from multiple sources--BIS, STATS Inc, FSR, and maybe something like TotalZone. Unfortunately, that's hard to get now that I'm not releasing my own composite fielding stats. It's something I could get back to doing...
In terms of prediction, though...accuracy does not necessarily mean the same thing as predictive power. That said, I can't think of a survey study of that sort that focused on how good each of these stats is at prediction. As a general rule, based on studies of one fielding stat or another, fielding statistics require about two seasons of data to become predictive. The exception might be the Fan Scouting Report, which I regard as a bit more stable than something like UZR, PMR, etc. The problem with FSR is that I think you could have more of a problem with systematic bias...
-j