Friday, March 30, 2007

How should we calculate Zone Rating? (Part II)

In a previous post, I looked at a variety of ways to calculate zone rating--a fielding statistic--using the new dataset that is freely available at The Hardball Times. Tonight, I'd like to report some of the continued work I've done on this subject.

Introduction & Goals (you can skim this)

The first fielding statistic that I really felt at home with was devised by Baseball Info Solutions and published in The Fielding Bible last year. It had just about everything I'd want in a stat -- the methodologies behind it are straightforward (even if I can't replicate them myself), and it's easy to interpret, thanks to the fact that it's reported in terms of +/- plays vs. average (i.e. a +2 rating means that a player made two more plays than an average fielder would, given his chances).

Unfortunately, BIS decided not to release these statistics to the public this year. I inquired about buying them electronically, but they wanted at least $100 for my personal use--and probably would not have been pleased if I'd posted them here. Pretty steep price given that these stats were just a part of a $20 book last year. :)

Thankfully, the folks at The Hardball Times purchased detailed Zone Rating statistics from BIS. While they're not the same as the stats that were the highlight of The Fielding Bible, they're still very good: essentially they assign each fielder a zone (or rather, a set of zones) on the field and assess how many balls hit into that zone the fielder converts into outs. This has advantages over more traditional fielding stats like fielding percentage because it incorporates a fielder's range into the estimate of his quality, in addition to his sure-handedness and ability to throw accurately. And it's better than range factor because it accounts for the number of balls a player had the opportunity to field, rather than just assuming that all players get the same number of chances at a given position.

The THT stats, along with David Pinto's PMR stats, give us, the public, two high-quality fielding statistics that are available for free! The only problem is that it's not entirely straightforward how to use them. As I wrote about in my last piece, the ZR ratio calculated on THT.com doesn't include balls hit out of the player's zone, which seems like something we'd like to account for to understand good fielding (see Bill Hall at shortstop last year, for example). But how should we account for these out of zone balls? Furthermore, it has a similar problem to range factor in that each position has a different standard for what a "good" ZR is.

I'm excited to have a chance to use ZR stats this year as I evaluate players. But in order to use it effectively, I wanted to:
  1. incorporate, in an appropriate way, out of zone plays as well as plays made within a player's zone in the Zone Rating estimate.
  2. report the statistics in an easy to use +/- format.
Methods

Converting ZR to a +/- system.

This is a pretty easy thing to do because THT reports not only the classic ZR ratio (plays made / balls in zone) statistic, but also all the stats that are used to calculate it: balls hit into a player's zone (BIZ), plays made on balls in the player's zone (PLAYS), and plays made on balls out of zone (OOZ). This gives us the tools we need to convert ZR into a +/- statistic.

First, we'll calculate the average proportion of PLAYS per BIZ at each position by simply summing up the total number of PLAYS and total number of BIZ across all players in major league baseball at each position (easily attainable from THT's stats output) and taking the ratio. Here are the expected (average) ratios for each position based on 2006 data (I haven't confirmed that these numbers are stable across years...I'd guess that they don't vary too much though, except when you have a high-leverage player like Albert Pujols or Adam Everett throwing off the mean...which we do):

Position  Expected PLAYS/BIZ
1B        0.799
2B        0.820
3B        0.706
SS        0.818
LF        0.608
CF        0.811
RF        0.638
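
The expected ratio described above is a ratio of sums across all players at a position, not an average of each player's own ratio. Here is a minimal Python sketch of that calculation, assuming each player-season is a (position, PLAYS, BIZ) tuple. The function name and the sample rows are made up for illustration; they are not THT data.

```python
from collections import defaultdict


def expected_ratios(rows):
    """Return {position: total PLAYS / total BIZ} across all rows.

    rows: iterable of (position, plays, biz) tuples, one per player-season.
    """
    plays_total = defaultdict(int)
    biz_total = defaultdict(int)
    for position, plays, biz in rows:
        plays_total[position] += plays
        biz_total[position] += biz
    # Ratio of league totals, so players with more chances count for more.
    return {pos: plays_total[pos] / biz_total[pos] for pos in biz_total}


# Hypothetical rows (not real 2006 numbers).
sample_rows = [
    ("SS", 330, 400),
    ("SS", 160, 200),
    ("3B", 210, 300),
]
ratios = expected_ratios(sample_rows)
print(ratios)
```

The same function works for OOZ/BIZ if you pass OOZ plays in place of PLAYS.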

Once we've done this, we can easily convert a player's PLAYS data into a +/- Plays statistic using this equation:

+/- Plays = PLAYS(actual) - [BIZ(actual) * ExpPRATIO]

where PLAYS(actual) is the actual number of plays a player made, BIZ(actual) is the actual number of balls hit into a player's zone, and ExpPRATIO is the expected ratio of PLAYS per BIZ reported in the table above. Really easy stuff, and something anyone can do quickly once they have the expected ratios.
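
As a quick illustration of the equation, here is a small Python sketch. The ratios are copied from the 2006 table above; the function name and the example shortstop (350 plays on 400 balls in zone) are hypothetical.

```python
# Expected PLAYS/BIZ by position, copied from the 2006 table above.
EXP_PLAYS_PER_BIZ = {
    "1B": 0.799, "2B": 0.820, "3B": 0.706, "SS": 0.818,
    "LF": 0.608, "CF": 0.811, "RF": 0.638,
}


def plus_minus_plays(position, plays, biz):
    """+/- Plays = PLAYS(actual) - BIZ(actual) * ExpPRATIO."""
    return plays - biz * EXP_PLAYS_PER_BIZ[position]


# Hypothetical shortstop: 350 plays made on 400 balls in zone.
example = plus_minus_plays("SS", plays=350, biz=400)
print(round(example, 1))  # 22.8
```

An average shortstop would be expected to make 400 * 0.818 = 327.2 plays on those chances, so this made-up player comes out about 23 plays above average.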

Update: Tangotiger extended this work by reporting PLAYS/BIZ values from 2004-2006 on his blog. If you're going to use this approach on years other than 2006, you should probably use those data rather than what is in the table above.

We can do the same sort of thing with OOZ to estimate the average number of OOZ plays made per BIZ (this assumes that the number of balls hit into the player's zone is tightly correlated with the number of balls hit just outside his zone). Here are those ratios:

Position  Expected OOZ/BIZ
1B        0.415
2B        0.096
3B        0.150
SS        0.126
LF        0.030
CF        0.162
RF        0.031

(you almost have to wonder if they should add another zone of responsibility to 1B's)

Update: Tangotiger's post on his blog also provides the '04-'06 data, which would be better to use if you're doing these stats on years other than 2006. Note that he reports the expected OOZ (see below) rate as OOZ/(PLAYS + OOZ), but he reports the total BIZ and OOZ data over that time period that you'd need to calculate expected OOZ in a fashion like I did here.

We can then get a +/- figure for a player's OOZ plays as:

OOZ(actual) - [BIZ(actual) * ExpORATIO]

where OOZ(actual) is the actual number of out of zone plays a player made, BIZ(actual) is as above, and ExpORATIO is the expected ratio of OOZ per BIZ reported above.
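
The OOZ version works the same way. A short Python sketch, again using the 2006 ratios from the table above and a made-up third baseman:

```python
# Expected OOZ/BIZ by position, copied from the 2006 table above.
EXP_OOZ_PER_BIZ = {
    "1B": 0.415, "2B": 0.096, "3B": 0.150, "SS": 0.126,
    "LF": 0.030, "CF": 0.162, "RF": 0.031,
}


def plus_minus_ooz(position, ooz, biz):
    """+/- OOZ = OOZ(actual) - BIZ(actual) * ExpORATIO."""
    return ooz - biz * EXP_OOZ_PER_BIZ[position]


# Hypothetical third baseman: 55 out-of-zone plays, 300 balls in zone.
example = plus_minus_ooz("3B", ooz=55, biz=300)
print(round(example, 1))  # 10.0
```

Note that BIZ, not some count of balls hit out of zone, is the denominator here, which is the assumption mentioned above.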

How should we combine +/-PLAYS and +/-OOZ?

I took three approaches to go about finalizing this +/- statistic:
  • Ignore OOZ plays entirely and use only +/-PLAYS.
  • Simply add +/-PLAYS and +/-OOZ together.
  • Estimate a coefficient for OOZ via a regression on Pinto's PMR statistic (this was done at the suggestion of Dave in my prior post--thanks Dave!).
The first two are really straightforward. The last one is slightly more complicated. Basically, here's what I did:
  1. Converted PMR to a +/- outs statistic by subtracting actual outs from expected outs for each player (all of the needed data to do this are provided by Pinto in his reporting).
  2. Consolidated players at each position to match those reported by Pinto in his 2006 reports over the winter (his cutoff was that the player was present in the field for 1000 balls in play). I also had to calculate season totals for a number of players who switched teams during the year, since THT reports them on separate rows while PMR reports them as one row.
  3. Ran a multiple regression, setting +/-PMR as the dependent variable and setting +/-PLAYS and +/-OOZ as predictors.
  4. So long as the effects of both PLAYS and OOZ were significant, I recorded the regression coefficients from this regression. Then, to make calculations simple and straightforward (i.e. usable by others), I divided both coefficients by the coefficient for PLAYS, such that PLAYS would have a new coefficient of 1.0, while OOZ would vary from 1.0 depending on its weight in the regression equation.
  5. Used this OOZ coefficient to calculate a combined +/- statistic (I'll call it +/-ZR_ADJ for now) that incorporates both PLAYS and OOZ:
    • ZR_ADJ = PLAYS(+/-) + WEIGHT * OOZ(+/-)
Finally, once I had all three +/- estimates based on the THT data, I compared these estimates to Pinto's +/-PMR data to see which estimates followed his work the closest. The assumption here is that, since PMR and ZR are supposed to measure the same thing, the +/-ZR estimate that is the closest fit to PMR is the best one to use.
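
Steps 3 through 5 can be sketched in plain Python. This is only an illustration under assumptions: it fits ordinary least squares with an intercept using the closed-form solution for two predictors, it skips the significance test from step 4, and the data are synthetic (built so that an OOZ play counts exactly half as much as an in-zone play). The function names are my own.

```python
def fit_two_predictors(y, x1, x2):
    """Ordinary least squares for y = a + b1*x1 + b2*x2.

    Returns (b1, b2); the intercept is not needed for the weight.
    """
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Cramer's rule on the 2x2 normal equations.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def ooz_weight(pm_pmr, pm_plays, pm_ooz):
    """Rescale so the PLAYS coefficient is 1.0; return the OOZ weight."""
    b_plays, b_ooz = fit_two_predictors(pm_pmr, pm_plays, pm_ooz)
    return b_ooz / b_plays


def zr_adj(pm_plays, pm_ooz, weight):
    """ZR_ADJ = PLAYS(+/-) + WEIGHT * OOZ(+/-)."""
    return pm_plays + weight * pm_ooz


# Synthetic data: PMR = 3 + 2*PLAYS + 1*OOZ, so the weight should be 0.5.
pm_plays = [5, -3, 10, 0, -8]
pm_ooz = [2, 4, -1, -6, 3]
pm_pmr = [3 + 2 * p + o for p, o in zip(pm_plays, pm_ooz)]

weight = ooz_weight(pm_pmr, pm_plays, pm_ooz)
print(round(weight, 3))
```

A weight near 1.0 says that simply summing +/-PLAYS and +/-OOZ loses little; a weight well below 1.0 is the 3B-style case discussed in the results.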

Results

The results vary by position. I'll divide them into three groups: middle infielders, corner infielders, and outfielders.

In the graphs that follow for each group (sorry about the small size), I report the closeness of fit between Pinto's +/- PMR data and the three +/- ZR-based fielding stats as "R2" (should be R-squared), which is defined as the proportion of variation in PMR data explained by the ZR data (ranges from 0 to 1, with 1.0 being a perfect match). I list the three ZR stats as ZR_THT (+/-PLAYS only, no OOZ), ZR_DIAL (simple summing of +/-PLAYS and +/-OOZ, named after Chris Dial as in my previous post), and ZR_ADJ (weight coefficient modifying +/- OOZ, added to +/-PLAYS).
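
For readers who want to reproduce this kind of comparison, here is a small Python sketch. It assumes a simple one-predictor linear fit of PMR on each ZR estimate, in which case R-squared is just the squared Pearson correlation. The five data points are invented for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Invented +/- values for five players at one position.
pm_pmr = [12, -5, 3, -10, 0]
pm_plays = [5, -4, 4, -3, -2]
pm_ooz = [6, 0, -2, -6, 2]

zr_tht = pm_plays                                      # +/-PLAYS only
zr_dial = [p + o for p, o in zip(pm_plays, pm_ooz)]    # simple sum

fit_tht = r_squared(zr_tht, pm_pmr)
fit_dial = r_squared(zr_dial, pm_pmr)
print(round(fit_tht, 2), round(fit_dial, 2))
```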

Middle Infielders

As you can see, there was a sizable improvement (double the R2) in the fit between ZR and PMR when you include the OOZ data. However, there was very little difference between simply summing PLAYS and OOZ (ZR_DIAL) and using the regression-based weighting coefficient to modify OOZ before adding it to PLAYS (ZR_Adjusted). Among middle infielders, these coefficients are fairly close to one (1.10 for 2B's and 1.28 for shortstops), so it's not surprising to see such a close correspondence. There's very little here to argue against just summing PLAYS and OOZ.

Corner Infielders

Oh, weirdness. First basemen are actually fairly similar to middle infielders, in that adding OOZ data makes a huge improvement in the R2. This is probably tied into the fact that first basemen typically make a huge number of plays outside of their zone, as currently defined (see OOZ/BIZ table above). Not much improvement when I used the weighting coefficient to modify OOZ in ZR_Adjusted.

Third basemen, however, are very different. First, there's a remarkably good correspondence between the ZR data and the PMR data. No idea why it'd be so good for third basemen over other positions. Second, you get a substantial improvement in fit when you add weighted OOZ to the PLAYS data (ZR_Adjusted), but not when you just add unweighted OOZ to PLAYS. The reason is that the coefficient modifying OOZ at 3B is the farthest from one among any of the infield positions (= 0.42). Therefore, it seems that among third basemen, OOZ plays should effectively be divided by two when calculating our ZR estimates.

By the way, that ridiculous point in the upper-right for all the 1B graphs? That's Albert Pujols. He made 39 more outs than expected of an average first baseman last year according to both PMR and ZR. Given his offensive production, that's just absurd. It's unfair to everyone else. He's definitely my pick for MVP last season.

Outfield
Here's where things get hard. As you can see, I didn't report ZR_ADJ for outfielders. The reason was that the regression equations never showed a significant effect of OOZ in any of the outfield positions. This makes anything you do with the OOZ coefficient pretty unreliable, so I didn't use them.

Explanatory power of the ZR stats is good in left field, and improves slightly when OOZ data are included. Even though OOZ didn't contribute a significant addition to the regression model, I think it's worth including them in the ZR estimate--they do improve the fit (slightly), and conceptually, I can't help but think that we should take those sorts of plays into account.

In center, the only significant relationship was between ZR_DIAL and PMR, and even then it only explains 10% of the variation. I'm not sure why center field is so hard to understand, though I imagine part of it must be due to the huge range of locations on the field where a center fielder might position himself throughout a game compared to other positions. It also may have to do with the variety of types of balls that are hit to center field. ZR does a poor job of dealing with different types of fly balls, whereas PMR seems to do a better job of at least accounting for this in its calculations. Looking at the ratings of center fielders, it's hard for me to say which stat is working better...though ZR's negative rating on Beltran does raise my eyebrows:
Rank  ZR                                                          PMR
1st   Willy Taveras (+34, rated +10 by PMR)                       Beltran (+18, rated -5 by ZR)
2nd   Curtis Granderson (+34, rated +4 by PMR)                    Corey Patterson (+16)
3rd   Corey Patterson (+22)                                       Joey Gathright (+15, rated +1 by ZR)
4th   Juan Pierre/Reggie Abercrombie (+15, both rated +4 by PMR)  Coco Crisp (+13, rated -15 by ZR)
5th   Brian Anderson (+14, rated +4 by PMR)                       Aaron Rowand/Johnny Damon (+12, rated -3 and +1 by ZR, respectively)

Right field looks similarly bad at first, but many of the issues there are driven by a single outlier who is rated highly by ZR and poorly by PMR: Brian Giles (rated as +10 by ZR_DIAL, but -21 by PMR). If you remove him from the dataset, the R2 increases from 0.29 for ZR_THT data to 0.34 for ZR_DIAL. OOZ is still not significant in the multiple regression. But overall, as long as you ignore whatever the heck is happening with Giles, right field is very similar to left field--just with a bit weaker fit.

Discussion

Recommendations for ZR calculation

First, I personally find the +/- conversion to be a huge improvement from the traditional way the ZR data are reported, which is the ratio between plays made and balls hit into the zone. A +/- conversion allows anyone to immediately and intuitively assess a fielder's abilities. I highly encourage THT to consider adding two columns to their stats output that automatically report +/- PLAYS and +/- OOZ.

Second, in terms of how to incorporate the OOZ data into the ZR fielding estimate, the data seem to indicate that simply summing the two is perfectly adequate--and advantageous compared to not using OOZ data at all--for all positions except 3B. There, it's best to multiply +/- OOZ by 0.4 to down-weight its effects. I don't have any good ideas why OOZ plays should receive so little weight in this case--maybe it has to do with charging plays on bunts in the infield?

Update: Based on the discussion below, I'm torn about whether to use the coefficient on 3B. While it did result in a better fit with PMR, it could be that PMR behaves strangely with 3B and therefore that we're decreasing the accuracy of the ZR statistic by using that coefficient. Since we don't have an independent reason for adding the 0.4 coefficient (i.e. a "baseball" reason why OOZ plays shouldn't have as much of an impact on ZR at 3B), I'm now inclined to calculate it like all the other positions: [+/- ZR] = [+/- PLAYS] + [+/- OOZ].

Finally, while I've reported all values here in absolute terms, it is possible to convert +/- ZR numbers to a rate-like statistic. My suggestion is to divide the +/- ZR statistic for a player by his BIZ, and then multiply by 400 -- a number that seems to be at the upper end of how many balls a fielder (well, at least non-first basemen) will see in a full season. The actual number you multiply by doesn't matter, of course, as long as you're consistent across all individuals. You could even vary it by position, though I'm not sure if that's really worth doing.
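
Here is that rate conversion as a short Python sketch. The function name, the default of 400, and the example player are illustrative; the +/- ZR used is the simple sum of +/- PLAYS and +/- OOZ.

```python
def zr_rate(pm_plays, pm_ooz, biz, per=400):
    """Scale +/- ZR (= +/- PLAYS + +/- OOZ) to a fixed number of BIZ."""
    if biz <= 0:
        raise ValueError("BIZ must be positive")
    return (pm_plays + pm_ooz) / biz * per


# Hypothetical part-timer: +12 in zone, +3 out of zone, on 300 BIZ.
rate = zr_rate(pm_plays=12, pm_ooz=3, biz=300)
print(round(rate, 1))
```

Keep in mind that rates built on a small BIZ total will swing wildly, so a playing-time cutoff is probably wise before ranking players this way.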

ZR Top-5 2006 Fielders

As a diagnostic, I wanted to close by having a quick look at the top player rankings at each position according to ZR (calculated as recommended above), PMR, and the Fielding Bible (FB; extracted from the Bill James 2007 Handbook).

Position  Rank  ZR +/-                    PMR                        Fielding Bible
1B        1st   Albert Pujols (+39)       Albert Pujols (+39)        Albert Pujols (+19)
1B        2nd   Doug Mientkiewicz (+24)   Lyle Overbay (+16)         Doug Mientkiewicz (+16)
1B        3rd   Chris B Shelton (+21)     4-TIED at +13:             Kevin Youkilis (+10)
1B        4th   Richie Sexson (+19)       Niekro, Morales            3-TIED at +7:
1B        5th   Travis Lee (+15)          Dan & Nick Johnson         Garciaparra/Hatteberg/Lee
2B        1st   Aaron Hill (+21)          Orlando Hudson (+32)       Jose Valentin (+22)
2B        2nd   Jamey Carroll (+16)       Jamey Carroll (+26)        Aaron Hill (+22)
2B        3rd   Jose Valentin (+16)       Chase Utley (+25)          Chase Utley (+19)
2B        4th   Tony Graffanino (+16)     Aaron Hill (+24)           Mark Ellis (+13)
2B        5th   T-Ellis/Polanco (+15)     Mark Grudzielanek (+22)    Tony Graffanino (+13)
3B        1st   Scott Rolen (+27)         Joe Crede (+38)            Brandon Inge (+27)
3B        2nd   Brandon Inge (+25)        Pedro Feliz (+28)          Pedro Feliz (+25)
3B        3rd   Joe Crede (+20)           Brandon Inge (+26)         Adrian Beltre (+23)
3B        4th   Mike Lowell (+17)         Adrian Beltre (+22)        Joe Crede (+22)
3B        5th   Morgan Ensberg (+14)      Freddy Sanchez (+19)       Nick Punto (+15)
SS        1st   Adam Everett (+39)        Adam Everett (+35)         Adam Everett (+43)
SS        2nd   Craig Counsell (+25)      Bill Hall (+28)            Clint Barmes (+27)
SS        3rd   Bill Hall (+23)           Yuniesky Betancourt (+27)  Bill Hall (+18)
SS        4th   Clint Barmes (+23)        Craig Counsell (+19)       Craig Counsell (+17)
SS        5th   Alex Gonzalez (+18)       Clint Barmes (+17)         T-Reyes/Bartlett (+13)
LF        1st   Dave Roberts (+28)        Melky Cabrera (+18)        Dave Roberts (+16)
LF        2nd   Garret Anderson (+18)     Matt Diaz (+13)            Carl Crawford (+15)
LF        3rd   Juan Rivera (+18)         Reed Johnson (+13)         Alfonso Soriano (+15)
LF        4th   Matt Diaz (+16)           Dave Roberts (+12)         Ryan Langerhans (+15)
LF        5th   Alfonso Soriano (+14)     T-Murton/Fahey (+10)       Jason Bay (+14)
CF        1st   Curtis Granderson (+34)   Carlos Beltran (+18)       Corey Patterson (+34)
CF        2nd   Willy Taveras (+34)       Corey Patterson (+16)      Andruw Jones (+30)
CF        3rd   Corey Patterson (+22)     Joey Gathright (+15)       Juan Pierre (+25)
CF        4th   Juan Pierre (+15)         Coco Crisp (+14)           Curtis Granderson (+18)
CF        5th   Reggie Abercrombie (+15)  T-Damon/Rowand (+12)       Willy Taveras (+17)
RF        1st   Jose Guillen (+23)        Juan Encarnacion (+12)     Randy Winn (+22)
RF        2nd   Randy Winn (+20)          Damon Hollins (+9)         Alexis Rios (+20)
RF        3rd   Reggie Sanders (+17)      Ichiro Suzuki (+9)         J.D. Drew (+19)
RF        4th   Trot Nixon (+17)          Tied-Four @ +8:            Brian Giles (+18)
RF        5th   T-JDDrew/JJones (+16)     Drew/Jones/Freel/Quentin   Ichiro Suzuki (+17)

Interesting to see Brian Giles on the Fielding Bible list for top right fielders given how much he messed up the ZR vs. PMR calculations. :)

At first blush, neither PMR nor ZR seems to follow the Fielding Bible's +/- ratings more closely than the other. To check this, I calculated correlations between PMR, ZR, and the Fielding Bible based on individuals in the Bill James Handbook's top-10 lists (actually, they are partial correlations to factor out the influence of position). Here's the correlation matrix:

      FB    ZR    PMR
FB    1.00  0.42  0.49
ZR    0.42  1.00  0.40
PMR   0.49  0.40  1.00

It turns out that all three variables have almost equal correlations to one another, all ranging between 0.40 and 0.49. There might be a slightly higher correlation between PMR and FB, but not enough for me to worry about.
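
One way to get a correlation that factors out position is to subtract each position's mean from both stats before correlating, which is equivalent to correlating the residuals after regressing each stat on position. The Python sketch below assumes that approach; it may not match exactly how the numbers above were computed, and the data are synthetic.

```python
from collections import defaultdict
from math import sqrt


def center_within_groups(values, groups):
    """Subtract each group's mean from the values in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corr_controlling_for_position(x, y, positions):
    """Correlate x and y after removing position means from both."""
    return pearson(center_within_groups(x, positions),
                   center_within_groups(y, positions))


# Synthetic example: within each position y rises with x, but the
# position means differ, which distorts the raw correlation.
positions = ["SS", "SS", "SS", "CF", "CF", "CF"]
x = [1, 2, 3, 11, 12, 13]
y = [5, 7, 9, -3, -1, 1]

raw = pearson(x, y)
adjusted = corr_controlling_for_position(x, y, positions)
print(round(raw, 2), round(adjusted, 2))
```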

What this means to me is that we should incorporate both PMR and ZR into our evaluations of player performance. In fact, if you run a general linear model regressing the FB values on PMR and ZR, both predictors are highly significant, with almost identical sums of squares values. This indicates that both contribute useful, independent information that helps us better predict FB (often regarded as the best available fielding stat), and suggests that they should be weighted equally when interpreting player fielding performance.

This has been a monster of a post. :) But with this information in hand, we can make the most of the fielding stats that are available to us! I'm planning to put them towards a review of the 2006 Reds fielding--I hope I can get it done before the start of the regular season! :D

Update: One can convert the +/- Plays values into an estimated +/- Runs statistic using the runs per play values in this article by Chris Dial. They're probably not perfect conversions, as they're based on a different set of data (Stats Inc.'s zone rating rather than BIS's zone rating), but I bet they're close enough.

23 comments:

  1. J,
    GREAT post. Great research. Loved it, seriously awesome job. It will be interesting to see just how our defenders worked out for us last year....

  2. Great job, JinAZ. Thanks much. FYI, we've been asked to not use +/- metrics from the Zone Rating stats, because BIS doesn't want to cause confusion with their own +/- metric.

  3. Good stuff. The issue with regressing in-zone and out-of-zone +/- on PMR is that PMR is highly unreliable in the infield, since David uses infield fly balls and line drives in his infielder ratings. For the ratings I developed for the 2007 THT Season Preview, I used simply Plays Made/(BIZ + OOZ), which I think works best.

  4. Doug, thanks!! :)

    Dave, thanks also. I'm sorry to hear about the restrictions with the +/- reporting, but at least you folks are able to provide the data that let us calculate it! Sounds to me like BIS needs to come up with a distinctive name for their +/- metric so this sort of thing won't be such an issue. :D
    -j

  5. David (Gassko--too many d-names around here),

    Thanks. Interesting point about PMR--I see why infield flies might be uninformative, but what is the issue with line drives? It's also interesting that the infield PMR was far more consistent with ZR than the outfield in my analysis despite any problems with the PMR data. :)

    Also, did you mean (PLAYS+OOZ)/BIZ? If so, that's exactly what my analysis indicates should be done everywhere but at 3B (even though I converted it to a +/- stat, it shouldn't make a difference). I wonder if the way that 3B's handle infield flies has something to do with why their weighting on OOZ plays was so low?

    If you meant (PLAYS + OOZ)/(BIZ + OOZ), my previous look at ZR (which only looked at shortstops) indicated that this was hardly different from just PLAYS/BIZ...and was far less consistent with PMR, for what that's worth.
    -j

  6. I certainly agree that you don't want to put OOZ in the denominator. That's just going back to the way STATS does it (or, at least, used to do it).

    I suggested to J that he run a regression to see if he could improve the fit from his previous analysis, and I still think it's a good idea. In systems like PMR and UZR, the OOZ plays figure more strongly in the output.

  7. Just to re-iterate my findings, while the regression does result in coefficients for OOZ that vary from one, the only position where this results in an improved match (vs. the PLAYS+OOZ data) with the PMR data is at 3B.

    Otherwise, there seems to be no improvement in fit--which indicates to me that the simpler, constrained model (i.e. weight plays in and out of the zone equally) is the way to go at those positions.
    -j

  8. Yes, of course I meant (Plays made in-zone + Plays made out-of-zone)/(Balls in zone).

  9. The process you followed is virtually identical to what I've been doing. Great stuff!

  10. TangoTiger,

    Thanks! I seem to have missed your work, but I'll look around for it on your various blogs. Great to hear that you think I'm generally on track with this.
    -j

  11. Great stuff. I've been doing a lot of work on this ZR stuff too - look for an article on The Hardball Times sometime soon.

    There's a couple reasons to be careful in fitting the data to PMR:

    1) popups and line drives - in PMR for infielders, not part of zone rating.

    2) Dave Pinto has park adjustments, at least for OF. This ZR has none.

    3) We aren't sure PMR is the best out there. By using a .4 weight for 3B OOZ, we fit better to PMR, but how do we know that Gassko's method for ZR doesn't more accurately rate the 3B without a weight?

  12. Jin, I haven't posted it yet. I was writing for Hardball Times, and then Chone here beat me to it, and now you've pretty much covered anything else I wanted to write.

    I'll have to reread your article, but you'd have to do it by year, since I don't trust the year-to-year reliability of the data recorders.

  13. Ok, I see you only did 2006. That's good enough for your purposes. If you do go backwards, I would definitely say to use yearly weights.

  14. @Tangotiger -- thanks, I'll look into weights if I do ever extend this backwards.

    @Chone -- I'm really looking forward to your post on THT. Your points are well-taken. The argument against using the 0.4 modifier at 3B is fair enough: why would just this position be treated differently? Unless there's a reason to treat it differently aside from the comparison to PMR, maybe we should hold off on treating it differently without some sort of baseball-related rationale for doing so. -j

  15. One of the most interesting things to me is that we can calculate these stats in the middle of a season, rather than waiting until the end of the year.

    Also, it's been interesting to watch the objective measures for fielding improve incrementally year after year. It wasn't that long ago that we were still stuck with fielding pct.

  16. Nice work. I think you've done everything that can be done with the HBT data without bringing in outside information.

    It doesn't affect your analysis of what you can do with THT ZR itself, but in your prior post (March 2) which favorably compared THT ZR with STATS ZR, you criticized STATS ZR for double-counting double plays. They DID do this 1989-1998, but stopped doing so in 1999, and the historical ZR data which you see on websites such as ESPN and CNNSI has been restated according to the 1999 model, so no double-counting. The problem remaining with STATS ZR is their addition of plays made OOZ to their denominator.

    On the other hand, the main limitation of THT ZR is that it doesn't capture difficulty at all, neither trajectory, direction, handedness of batter/pitcher nor how hard/far the ball was hit. STATS ZR actually does have a mild advantage in this respect, because they defined their zones to exclude opportunities with less than a 50% chance of being turned into an out. This isn't at all precise as a measure of difficulty but it does put a limit on the extent to which the metric can be biased by an abnormal distribution of difficult plays.

    It is the attempt to capture difficulty more precisely which should put PMR and Dewan/Fielding bible +/- ahead.

    Generally my view is any play made or not made has a real benefit or cost and should not be excluded from the accounting, so I don't agree with David Gassko that PMR is unreliable in the infield because [merely because] line drives and popups are included. However there are a few subtle points here worth further discussion.
    1) At last report PMR was not handling the "ball hog" problem correctly on popups (and fly balls) which can be handled by more than one player, so that pluses and minuses are not assigned fairly; this makes popups a problem - but it is a correctable problem.
    2) I think David G. would not deny that skill is involved in catching line drives but instead would say that there's too much luck involved in whether a line drive is hit close to a fielder or far away so that there's too much noise involved in using it. Looking for example at Pinto's PMR charts for 2006, Adam Everett had 26 plays made on line drives vs about 22 expected and 18.7 of the expected outs were along the vector right at the shortstop, or 1 vector over (up the middle); Alex Gonzalez had 14 plays made and just over 13 expected (and 10.1 expected outs along the same 2 vectors). The majority of their plus credit for this hit type was along these two vectors, not 'lucky' catches along unexpected vectors. Spot-checking several shortstops, this seems to be generally true. Almost all the plays made are being made on balls that are nearly "right at" the fielder anyway, so luck about direction doesn't appear to be a big factor. I think the problem right now is that PMR's sample size for determining the likelihood of the play being made is still rather small, given that PMR uses other parameters besides direction. But that's a problem which should diminish over time, rather than an intrinsic flaw in including line drives.
    3) I think the real "problem" with mixing in these hit types is that they distort what we commonly think of as an "opportunity." This is an issue for HBT ZR in the OF too, in which line drives and fly balls are mixed together. They're not equal opportunities. If there are 3 popups hit to me that I should catch at a rate of 98%, and 3 line drives hit toward you, of which you should catch 1/3, obviously you have a great opportunity to distinguish yourself in a positive direction by a plus/minus accounting, and I essentially have none. On the other hand, I have much greater room to distinguish myself negatively. So to quote an "Out Ratio" [rate of actual plays made per expected play made], as Pinto now is doing with PMR, can be misleading if the types of opportunities haven't evened out. Again careful comparisons of his PMR charts can show that over a full season they don't even out. Adam Everett for example had disproportionately more opportunities on ground balls in the hole than Alex Gonzalez, and fewer expected fly ball (popup) plays to make. So Everett had greater opportunity to accumulate pluses than Gonzalez. Anyway, a sufficiently careful look at this might convince me to agree with David G. that on balance there's more to lose than to gain by using all types of balls in play, but I think the real answer is to come up with a better way of expressing opportunity ...

  17. Joe,

    Thanks for the very interesting post! Your reasoning is very convincing, and brings up a lot of points I haven't thought about before. -j

  18. "On the other hand, the main limitation of THT ZR is that it doesn't capture difficulty at all"

    I think you are wrong here. The Dewan zones are the same, whether at STATS or BIS. They still take the "at least 50%" or whatever demarcation he decides. That's why you have plays made in zone, and plays made outside of zone.

    So, the THT/BIS/Dewan zones are split into two: typical, not typical. Three would have been better. Of course, no reason to stop at 3, but then, it's a lot harder to present the data.

    However, you are definitely right that you would, at the least, split by handedness, especially for the CF.

    (Even better would be by spray pattern for hitter/pitcher.)

  19. I think part of Joe's point, at least from that quote, is that the only measure of difficulty is the discrimination of whether a zone is a player's responsibility or not. It doesn't include information about how hard the ball is hit, angle, defensive positioning, etc. I'm not sure that his statement was in contradiction to your point.. Unless I'm missing something (always possible!). -j

  20. Tango,

    What I said about the difference between STATS ZR opportunities and Dewan/HBT ZR opportunities is an inference, but I think that it has to be true. At the least, it is impossible that Dewan used the same 50% rule for defining a zone for the new zone rating as was used for the STATS zone rating.
    If the rules for defining the zones were the same, the opportunities would be similar. But they vary greatly.

    Consider Carl Crawford in LF.
          STATS               | Dewan/HBT
    year  Opps  outs  ZR      | Opps  outs    ZR
    2003  342   310   .906    | 395   254+40  .643
    2004  294   271   .922    | 355   240+30  .676
    2005  365   330   .904    | 445   309+32  .694
    2006  343   301   .878    | 465   286+16  .615

    The Fielding Bible and HBT overlap for 2004-2005, and the opportunities and outs are virtually identical for those 2 years, which must mean that they are defined the same way. [HBT has 1 less opp in 2004 (with 1 more play out of zone), and 3 more opps in 2005 - this certainly is due to BIS making corrections to the underlying data.] The STATS opportunities include balls out of zone; adjusting for that, STATS has to be counting 90 - 130 fewer opportunities within zone each year for Crawford, with essentially no change in plays actually made.

    If you consult the STATS Baseball Scoreboard 2000 (pp.168-169), in which the revised STATS ZR is explained for the outfield, you can see that virtually every fly ball over 200 feet has a >50% chance of being caught, except short and deep balls along vectors J and Q (LCF and RCF gaps). The STATS zones for line drives (defined by the 50% rule) are far more restricted: basically balls hit 290-340 feet along vectors F-G-H).

    The basis for my inference that Dewan/HBT zones are not limited to balls with an overall 50% chance of being caught is that STATS is counting virtually all fly balls in the regular left field area as in zone opportunities anyway; to identify ~100 extra "in zone" opportunities each year, Dewan must have a considerably larger zone for line drives than the area which STATS identified as the 50%+ area.

  21. Joe, I was disputing your assertion that BIS doesn't capture difficulty at all. Since they've got the inzone plays and outzone plays, they do do so.

    As for how they do that, perhaps they don't use the 50% rule. They obviously use something.

    It is interesting that in LF and RF, virtually all outs are inzone, while for CF a substantial portion are outzone.

    The ZR for the corner OF is far lower than any of the other positions, which makes it rather clear that they don't have the same 50% rule for each position.

    In short, BIS is trying to discriminate, but it's hard to tell how they are doing that discrimination.

  22. Great work by all involved. I was a little surprised to see Juan Pierre rated so highly as a centerfielder and it made me want to ask if arm strength was taken into account? I would assume it is for infielders as it would be incorporated into whether or not a play was made. For outfielders, it looks as if it is only covering balls that are caught and not balls hit into the gap that hold a runner to a single instead of a double, or a double instead of a triple, and also not arm strength. Please correct me if I am wrong. Thanks!
    vr, Xeifrank

  23. Hi Xeifrank,

    That's right, the outfielder measures discussed above consider only the rate at which outs are generated via caught fly balls. The best arm ratings that are readily available to us are those by John Walsh. He doesn't update during the season, but THT created a unique stat page for his arm stats over the past 3-4 years or so. -j
