Saturday, March 03, 2007

How should we calculate Zone Rating?

I spent a bit of time this evening looking at the behavior of the new fielding stats we have available at THT. This won't be a comprehensive analysis, as I'm only going to look at shortstops (I'm still trying to understand Alex Gonzalez's defense). This is just a means of getting my feet wet with these stats.

I mentioned in my previous post that there are a number of ways to calculate zone rating with reference to plays made outside of a player's zone. The three possibilities were (note: acronyms are from THT's stat pages: BIZ = balls hit into the zone a player is responsible for, Plays = plays made on balls hit into a player's zone, and OOZ = plays made on balls hit outside a player's zone):
  • ZR_THT=Plays/BIZ. This approach ignores plays made out of the zone entirely. This is what is done with the ZR reported at THT right now.
  • ZR_STATS=[Plays+OOZ]/[BIZ+OOZ]. This is the approach used by Stats Inc.'s version of ZR. It effectively moves any balls a player fields into his zone, and does not differentiate between out of zone balls and balls hit into a player's zone.
  • ZR_DIAL=[Plays + OOZ]/BIZ. Modification suggested by Chris Dial and MGL. My favorite approach going into this, because it indicates every ball a player fields outside his zone should be worth a ball he missed within his zone. It's "extra credit" for working outside your zone, and gives no expectation that a player should have made the play.
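To make the three definitions concrete, here is a minimal Python sketch of the calculations. The function names and the sample counts (400 BIZ, 320 Plays, 40 OOZ) are made up for illustration only; they are not any real player's line, and this is not THT's or Stats Inc.'s actual code.

```python
def zr_tht(plays, biz, ooz):
    """Plays / BIZ: out-of-zone plays are ignored entirely."""
    return plays / biz


def zr_stats(plays, biz, ooz):
    """(Plays + OOZ) / (BIZ + OOZ): out-of-zone plays count as extra chances converted."""
    return (plays + ooz) / (biz + ooz)


def zr_dial(plays, biz, ooz):
    """(Plays + OOZ) / BIZ: out-of-zone plays are pure extra credit."""
    return (plays + ooz) / biz


# Hypothetical shortstop (invented numbers, not real data).
biz, plays, ooz = 400, 320, 40

print(f"ZR_THT   = {zr_tht(plays, biz, ooz):.3f}")    # 0.800
print(f"ZR_STATS = {zr_stats(plays, biz, ooz):.3f}")  # 0.818
print(f"ZR_DIAL  = {zr_dial(plays, biz, ooz):.3f}")   # 0.900
```

With these invented counts, the 40 out-of-zone plays barely move ZR_STATS (because they are added to the denominator as well) but lift ZR_DIAL by a full tenth.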
I calculated all three versions of ZR on qualified shortstops from 2006 and compared their estimated ratings of shortstop defense to each other and to David Pinto's Probabilistic Model of Range. Here is the result:
This is a scatterplot matrix. You read it just like a table. All graphs along the top (first) row have ZR_THT plotted on the y-axis, while all graphs along the left-most (first) column have ZR_THT plotted on the x-axis. This is why plots along the diagonal create a perfect line.

A few findings. First, there isn't much difference between the modified ZR calculation purchased by THT and the ZR we've been seeing from Stats, Inc for years now (see row 2, column 1). That doesn't mean it's money wasted, though, because a) the ZR_STATS I'm showing doesn't include their addition of outs for double plays, and b) the in zone/out of zone data that THT has purchased allows us to do much more with the ZR stat than we could before (like write this sort of article).

The second observation is that the modification suggested by Chris Dial that adds out of zone plays to the numerator but not the denominator results in some big jumps up in ZR for some players (row 3, column 1). Here's a closer look at that graph:
Scatterplot of the basic ZR calculation (ZR_THT) and the addition of out of zone plays (ZR_DIAL) to the numerator. Line is the linear regression between the two variables.

Seven players saw their relative defensive ratings improve dramatically when out of zone plays were considered. Bill Hall saw a huge improvement in his rating thanks to his making a gigantic 66 plays out of his zone (one has to wonder if he was positioned correctly). The new Red SS Alex Gonzalez is also among those who saw major improvements in his rating. The biggest loser was Derek Jeter, who made only 28 plays out of his zone...and only 80.5% of plays in his zone.

Ok, so there's a big difference between ZR_THT and ZR_DIAL. Which is the better way to go? To make this determination, I compared these two ratings to David Pinto's PMR. Since PMR and ZR are both supposed to be overall evaluations of defensive range, they should show a fairly tight relationship. All graphs (scatterplot matrix, row 4, columns 1-3) show a fair bit of scatter, reflecting the differences in their calculation, but the fit between Pinto's PMR and ZR_DIAL (R^2 = 0.37, P = 0.002) was substantially better than the fit between PMR and ZR_THT (R^2 = 0.13, P = 0.08). You don't need the stats to tell you this either--the ZR_DIAL graph shows a visibly tighter relationship. In fact, the slope relating PMR and ZR_THT is not significantly different from zero at the conventional 0.05 level!
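For anyone who wants to run this kind of comparison themselves, here is a rough Python sketch of a simple least-squares fit that reports the slope, intercept, and R^2 for one rating against another. The `linear_fit` helper and the three short lists are hypothetical placeholders I made up for illustration; they are not the 2006 shortstop data, and the sketch does not compute the P-values quoted above.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Invented placeholder values for six imaginary shortstops (NOT real data).
zr_tht_vals = [0.78, 0.80, 0.82, 0.83, 0.85, 0.86]
zr_dial_vals = [0.86, 0.93, 0.90, 0.97, 0.95, 1.02]
pmr_vals = [-12.0, 3.0, -5.0, 10.0, 4.0, 18.0]

for label, zr in (("ZR_THT", zr_tht_vals), ("ZR_DIAL", zr_dial_vals)):
    slope, intercept, r2 = linear_fit(zr, pmr_vals)
    print(f"PMR vs {label}: slope = {slope:.1f}, R^2 = {r2:.2f}")
```

Whichever version of ZR produces the higher R^2 against PMR is the one that tracks PMR more closely, which is the comparison made above.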

Therefore, based on the relationship with Pinto's PMR, as well as my (and others') intuitive sense of what makes more sense, it seems to me that ZR_DIAL is the most informative version of Zone Rating available. At least for shortstops. ...though I expect this applies to every position around the diamond.

To close, let's look more closely at the relationship between ZR_DIAL and PMR:
The two statistics do agree on a lot of players. Adam Everett (of course) and Bill Hall (I didn't realize he was so good), for example, both seem to have done a brilliant job last season. Derek Jeter, on the other hand, looks positively miserable as a defender. But there are also notable disagreements, one of which involves Alex Gonzalez. Zone Rating absolutely loves Gonzalez, ranking him as the fourth-best shortstop in baseball last season. In contrast, PMR rates him as a middle of the pack kind of guy.

What gives? No idea. David Pinto's site offers some nice graphs comparing predicted vs. actual fielding performance along different vectors, and I've been staring at these for a half-hour now. Gonzalez's performance, as you'd expect from his overall rating, is an almost perfect match of his predicted performance on all vectors. He did seem to do pretty well on line drives, but those represent a fairly small number of outs. He also seemed to do a tiny bit better moving toward the bag than moving into the hole. Nevertheless, I don't see anything that really stands out as explaining why zone rating is so kind to him whereas Pinto's PMR is not. Maybe someone out there will have a better idea...? :)

6 comments:

  1. Interesting stuff, but what I'm not sure of is whether any of these metrics can clear up a fundamental flaw of positioning relative to other players. A common argument made in Jeter's defense is that he doesn't get outside of his zones because he's got a very good defensive shortstop playing next to him at third. Looking at aggregate data, the way it is currently tracked, makes it hard to know whether something like this is true. Like you said, maybe Bill Hall looks so good because he was out of position. I'd be interested to see if this had any impact on how the third basemen and second basemen look compared to him.

    I think a more ideal system for measuring range would track a player's range relative to their positioning. Perhaps if they could develop some sort of circles of influence which a player is expected to cover, we would be better able to tell who has the larger area of coverage. I'm sure it would be very difficult to maintain and would require lots of computer modeling and lots of cameras to cover the field, but it would give us a breakthrough about how much a player's positioning affects their defensive ability.

    I'm sorry, I went off on an unrelated tangent there.

    One question I'm wondering about that you might know the answer to is, are these numbers adjusted at all for the infield shifts that are commonly played against players like Griffey and Dunn? Maybe Hall and Everett get a lot of credit for fielding balls on the right side of second because they happen to play a lot of games against the Reds and happened to field more balls on shifts? Just one postulation.

  2. Hi Joel,

    No, none of these statistics take into account player positioning at all, shift or no shift.

    There's a perspective in which that might be ok. I tend to think of positioning as part of a fielder's skill. If they (or their coaches) position themselves intelligently such that they can make more plays, this will result in a higher ZR. But if they're out of position and not making plays, that represents a deficiency relative to the rest of the league. So ZR goes down.

    As far as Jeter goes, there might be something to that. Looking at his PMR charts, he does seem to be well below average on fly balls hit to LCF and on over to 3B. So maybe Arod is interfering over there? On the other hand, he also shows deficiency on ground balls toward second (note that the scale on those graphs is pretty variable, so a small-looking difference on grounders can mean a lot of plays). So I'm not convinced that he's an unusual victim.

    Bill James had a good writeup on Jeter in The Fielding Bible, comparing him to Adam Everett. What I take from that is that Jeter is flashy, does a lot of things off-balance, etc, but actually doesn't cover much ground (slow first step or something). -j

  3. I agree on Jeter. As Red Menace likes to say, Jeter has done nothing but prove over the last 6 years that he is a loser in the playoffs. :) I was just using him as a way to introduce the positioning debate.

    However, I think that positioning and range should be measured separately, partly because coaches control positioning just as much as fielders. I think having the two separated would benefit teams much more since they would know better if free agents they are looking at benefited from better positioning or if their skill is what it truly appears. Yes, players have some control over their positioning (taking the extra step to the left or playing a little deeper), but being able to differentiate between positioning and athletic skill would be a big step.

  4. Well, I understand that argument (and this is starting to sound like the debate between Dial, MGL, and Gassko in one of the links above). But to me, what's important is to understand (in the case of a shortstop) the impact a player had on balls hit through the infield. I honestly don't care how he got to the ball, as long as he did get to the ball. I think it's just an issue of what you're after in your fielding statistic.

    ...all that said, I'm not disagreeing with you that it'd be nice to be able to parse "athleticism" and "positioning" out. I guess my point is that I think these stats do the most important thing right now, which is to measure the extent to which a player influences balls hit to his position.

    The thing I'd most like to see is THT convert their ZR (ideally the ZR_DIAL variant) into a runs +/- stat, much like Chris Dial did with the original ZR the last two seasons. -j

  5. Great analysis. Thanks.

    Naturally, Dial's ZR will beat THT's, because it includes more information. That's why we present the OOZ numbers separately at THT, so you can analyze them in a different way.

    For instance, why not regress THT_ZR and THT_OOZ vs. PMR? I'd be interested in seeing how that does vs. Dial's ZR. The interesting thing is that you might come up with a coefficient for the OOZ number that tells you how much to weight it -- which is something Dial_ZR doesn't do.

    We list these stats separately and don't roll them up into a single number because fielding assessment isn't that easy. You've got to look at several "angles" to really assess a fielder's capability. Also, because John Dewan asked us to.

  6. Dave,

    Thanks for the great comment. I love your idea to do the multiple regression to weight OOZ vs. Plays. I will do that.

    I'm currently putting together a fielding review for the Reds, and as part of that I'm doing some further (very simple) adjustments to the ZR data to convert it to a +/- plays statistic, like PMR. This avoids the need to use ratios (which can be misleading in some cases) and is a bit easier for me to understand in terms of on-field performance. I'll see if I can include a weighting measure of some sort as well.

    I think the fact that you folks list all this data on your site is absolutely wonderful, as it lets folks like me fiddle with numbers and really probe player performance. One of the biggest frustrations I have about baseball stat research is that so many of the better indicators are proprietary, and thus not reproducible. That's one of the things I like most about THT--that so much of what you folks do is transparent.

    Thanks again, -j
