
Saturday, March 03, 2007

How should we calculate Zone Rating?

I spent a bit of time this evening looking at the behavior of the new fielding stats we have available at THT. This won't be a comprehensive analysis, as I'm only going to look at shortstops (I'm still trying to understand Alex Gonzalez's defense). This is just a means of getting my feet wet with these stats.

I mentioned in my previous post that there are a number of ways to calculate zone rating, depending on how plays made outside of a player's zone are handled. The three possibilities were (note: acronyms are from THT's stat pages: BIZ = balls hit into the zone a player is responsible for, Plays = plays made on balls hit into a player's zone, and OOZ = plays made on balls hit outside a player's zone):
  • ZR_THT=Plays/BIZ. This approach ignores plays made out of the zone entirely. This is what is done with the ZR reported at THT right now.
  • ZR_STATS=[Plays+OOZ]/[BIZ+OOZ]. This is the approach used by Stats Inc.'s version of ZR. It effectively moves any balls a player fields into his zone, and does not differentiate between out of zone balls and balls hit into a player's zone.
  • ZR_DIAL=[Plays + OOZ]/BIZ. Modification suggested by Chris Dial and MGL. My favorite approach going into this, because it indicates every ball a player fields outside his zone should be worth a ball he missed within his zone. It's "extra credit" for working outside your zone, and gives no expectation that a player should have made the play.
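To make the difference between the three formulas concrete, here is a minimal sketch in Python. The function name `zone_ratings` and the Plays/BIZ/OOZ totals are made up for illustration; they are not any real player's 2006 numbers.

```python
def zone_ratings(plays, biz, ooz):
    """Return the three ZR variants described above for one fielder.

    plays: plays made on balls hit into the player's zone
    biz:   balls hit into the player's zone
    ooz:   plays made on balls hit outside the player's zone
    """
    if biz <= 0:
        raise ValueError("BIZ must be positive")
    return {
        "ZR_THT": plays / biz,                    # ignores out-of-zone plays
        "ZR_STATS": (plays + ooz) / (biz + ooz),  # OOZ added to both terms
        "ZR_DIAL": (plays + ooz) / biz,           # OOZ added to numerator only
    }

# Hypothetical fielder: 320 plays on 400 balls in zone, 40 plays out of zone.
example = zone_ratings(plays=320, biz=400, ooz=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up totals, ZR_STATS moves only slightly from ZR_THT because the out-of-zone plays land in both the numerator and the denominator, while ZR_DIAL jumps by the full OOZ/BIZ.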
I calculated all three versions of ZR on qualified shortstops from 2006 and compared their estimated ratings of shortstop defense to each other and to David Pinto's Probabilistic Model of Range. Here is the result:
This is a scatterplot matrix. You read it just like a table. All graphs along the top (first) row have ZR_THT plotted on the y-axis, while all graphs along the left-most (first) column have ZR_THT plotted on the x-axis. This is why plots along the diagonal create a perfect line.

A few findings. First, there isn't much difference between the modified ZR calculation purchased by THT and the ZR we've been seeing from Stats Inc. for years now (see row 2, column 1). That doesn't mean it's money wasted, though, because a) the ZR_STATS I'm showing doesn't include their addition of outs for double plays, and b) the in zone/out of zone data that THT has purchased allows us to do much more with the ZR stat than we could before (like write this sort of article).

The second observation is that the modification suggested by Chris Dial that adds out of zone plays to the numerator but not the denominator results in some big jumps up in ZR for some players (row 3, column 1). Here's a closer look at that graph:
Scatterplot of the basic ZR calculation (ZR_THT) and the addition of out of zone plays (ZR_DIAL) to the numerator. Line is the linear regression between the two variables.

Seven players saw their relative defensive ratings improve dramatically when out of zone plays were considered. Bill Hall saw a huge improvement in his rating thanks to his making a gigantic 66 plays out of his zone (one has to wonder if he was positioned correctly). The new Red SS Alex Gonzalez is also among those who saw major improvements in his rating. The biggest loser was Derek Jeter, who made only 28 plays out of his zone...and only 80.5% of plays in his zone.

Ok, so there's a big difference between ZR_THT and ZR_DIAL. Which is the better way to go? To make this determination, I compared these two ratings to David Pinto's PMR. Since PMR and ZR are both supposed to be overall evaluations of defensive range, they should show a fairly tight relationship. All graphs (scatterplot matrix, row 4, columns 1-3) show a fair bit of scatter reflecting the differences in their calculation, but the fit between Pinto's PMR and ZR_DIAL (R² = 0.37, P = 0.002) was substantially better than the fit between PMR and ZR_THT (R² = 0.13, P = 0.08). You don't need the stats to tell you this either--the ZR_DIAL graph shows a visibly tighter relationship. In fact, the slope relating PMR and ZR_THT is not significantly different from zero!

Therefore, based on the relationship with Pinto's PMR, as well as my (and others') intuitive sense of what makes more sense, it seems to me that ZR_DIAL is the most informative version of Zone Rating available. At least for shortstops. ...though I expect this applies to every position around the diamond.

To close, let's look more closely at the relationship between ZR_DIAL and PMR:
The two statistics do agree on a lot of players. Adam Everett (of course) and Bill Hall (I didn't realize he was so good), for example, both seem to have done a brilliant job last season. Derek Jeter, on the other hand, looks positively miserable as a defender. But there are also notable disagreements, one of which involves Alex Gonzalez. Zone Rating absolutely loves Gonzalez, ranking him as the fourth-best shortstop in baseball last season. In contrast, PMR rates him as a middle-of-the-pack kind of guy.

What gives? No idea. David Pinto's site offers some nice graphs comparing predicted vs. actual fielding performance along different vectors, and I've been staring at these for a half-hour now. Gonzalez's performance, as you'd expect from his overall rating, is an almost perfect match of his predicted performance on all vectors. He did seem to do pretty well on line drives, but those represent a fairly small number of outs. He also seemed to do a tiny bit better moving toward the bag than moving into the hole. Nevertheless, I don't see anything that really stands out as explaining why zone rating is so kind to him whereas Pinto's PMR is not. Maybe someone out there will have a better idea...? :)