Tuesday, October 02, 2007

2007 Fielding Data

I have updated my fielding translations to include final 2007 fielding data from THT.

There's been some confusion around the 'net about how these stats are calculated. I've gone into more detail elsewhere, but here's a quick summary of the methodology:
  1. First, for plays in zone:
    1. Determine, for each position, the average number of plays per ball in zone.
    2. For each player, determine their expected number of plays given how many balls in zone they experienced.
    3. Subtract the expected number of plays from the actual number of plays to get +/-plays in zone (+-Plays).
  2. Next, for plays out of zone:
    1. Determine, for each position, the average number of out of zone plays per ball in zone (it's not the best denominator, but it's better than innings--alternative suggestions welcome).
    2. For each player, determine their expected number of out of zone plays given how many balls in zone they experienced.
    3. Subtract the expected number of out of zone plays from the actual number of out of zone plays to get +-out of zone plays (+-OOZ).
  3. Sum together the +-in zone plays and +-out of zone plays to get +-total plays. I considered different weightings, but a 1:1 weighting seemed to be the best solution.
  4. Convert +-total plays to +-runs using the conversions published here by Chris Dial. If you're interested in why a play saved is typically worth roughly 0.8 runs, please refer to this recent post by Tom Tango.
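The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet actually used for the translations: the function name `plus_minus`, the dictionary keys, and the two sample players are all invented, and the flat 0.8 runs per play stands in for the position-specific conversions from Chris Dial, which are not reproduced here. The positional average is taken to be the pooled rate (total plays divided by total balls in zone).

```python
from collections import defaultdict

# Assumption: a flat ~0.8 runs per play, used here only as a placeholder
# for the position-specific run values referenced in step 4.
RUNS_PER_PLAY = 0.8


def plus_minus(players, runs_per_play=RUNS_PER_PLAY):
    """Return +/- plays and runs for each player.

    players: list of dicts with hypothetical keys
      name, pos, biz (balls in zone), plays (in-zone plays), ooz (out-of-zone plays).
    """
    # Steps 1.1 and 2.1: pool totals by position to get the average rates.
    totals = defaultdict(lambda: {"biz": 0, "plays": 0, "ooz": 0})
    for p in players:
        t = totals[p["pos"]]
        t["biz"] += p["biz"]
        t["plays"] += p["plays"]
        t["ooz"] += p["ooz"]

    results = []
    for p in players:
        t = totals[p["pos"]]
        if t["biz"] == 0:
            continue  # no balls in zone at this position, so no rate to compare against
        piz_rate = t["plays"] / t["biz"]  # in-zone plays per ball in zone
        ooz_rate = t["ooz"] / t["biz"]    # out-of-zone plays per ball in zone

        # Steps 1.2-1.3 and 2.2-2.3: actual minus expected.
        pm_plays = p["plays"] - piz_rate * p["biz"]
        pm_ooz = p["ooz"] - ooz_rate * p["biz"]

        # Step 3: 1:1 weighting. Step 4: convert plays to runs.
        pm_total = pm_plays + pm_ooz
        results.append({
            "name": p["name"],
            "pos": p["pos"],
            "pm_plays": pm_plays,
            "pm_ooz": pm_ooz,
            "pm_total": pm_total,
            "pm_runs": pm_total * runs_per_play,
        })
    return results


# Invented sample data, for illustration only.
sample = [
    {"name": "A", "pos": "SS", "biz": 400, "plays": 340, "ooz": 50},
    {"name": "B", "pos": "SS", "biz": 400, "plays": 320, "ooz": 30},
]

for row in plus_minus(sample):
    print(row["name"], round(row["pm_total"], 1), round(row["pm_runs"], 1))
```

Because each player is compared against a rate pooled from the same group of players, the +/- values at a position cancel out across the league in this sketch.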


  1. One of the things I find very interesting in this data isn't the overall +- Runs so much as the picture we can get by looking at the PiZ vs PooZ. Take Keppinger for example. He's above 0 on PiZ and below on PooZ. Omar Vizquel is an extreme example of this case. Meanwhile, Miguel Cabrera is the opposite. Good stuff!

  2. Yeah, I agree, that's one of the real interesting things about this stuff. I like the +-runs because it helps weight player contributions on defense vs. offense, but the resolution offered here when you're just wondering about what makes a player good or bad is a big step up from what you can learn based on traditional ZR, or even THT's data as presented at their site (they are constrained, of course, in how they can present their data based on their suppliers).

  3. I've said this before, but what this data really needs is a positioning aspect to it. I'm not sure how it would be done, but look at someone like Ichiro. Not so good on the PiZ, but he has the highest PooZ out of any player. Is he spending too much time trying to make up for weaknesses on the LF and RF positions? Also, I wonder if this has some bearing as to why he is rated so poorly in UZR.

    A couple of other interesting players are David Wright and Chipper Jones, both of whom were around average on PiZ but high on PooZ. If you look at their regular shortstops (Reyes and Renteria) both of those guys were much closer to average on PooZ, though Reyes is high on PiZ and Renteria is average. It'll be interesting to see when PMR comes out whether it jibes well with these numbers.

  4. You and MGL need to get together to work out why you disagree so strongly on Ichiro. Someone is capturing something he shouldn't, and there's really no way to know which of you it is.

  5. Well, on one hand, you can probably make the assumption that if a player is playing "out of position," they're likely to make more out of zone plays while at the same time making fewer in-zone plays. Since this stat is essentially based on total number of plays made, the differences would hopefully even out. And if they don't even out (e.g. a player plays in a part of the field to which fewer balls are hit at all, or perhaps a part of the field that overlaps with the range of another player), a player should probably be penalized because they're not positioning themselves in a way that optimizes the number of plays they save for their team.

    But you're right that some information about positioning would be incredibly helpful. Even the SAFE stat, which is quantitatively the most sophisticated stat yet developed, doesn't have information about positioning. I think the issue goes back to the raw data that is available--neither BIS nor STATS keeps track of player position, probably because they can't do so from TV broadcasts, which always miss the first 2-3 steps of a defender before the camera switches over to them.

    As far as Ichiro is concerned, Tango reported on his blog this morning that a Mariner fan indicates that he's taking all discretionary fly balls. That could be a huge part of his out of zone rating. UZR rates him as one of the worst CF in baseball, so maybe MGL's able to remove pop flies of that sort from consideration? I wonder if the opposite thing is happening to Sizemore, because he seems really unlikely to be among the worst CF in baseball based on fan scouting reports, reputation, etc. UZR has him as one of the best.

  6. I went through the data and picked out all the performances by Reds players. The results are interesting. The Reds' defense is actually strong up the middle for the most part. Both SS and 2B were positives and CF was just a tick below average. The corner positions were just horrible. I was surprised that first base defense, as a whole, was so poor. Up the middle was a combined +18.0. Corner positions were a staggering -62.8. I believe I caught all the Cincinnati players. The team was a combined -44.8. This does not take into account pitchers and catchers. Some of the better Reds' defenders were: Phillips +14.0, Hopper +10.5, Freel +7.8, Gonzalez +5.2 and Ellison +4.8. These are combined numbers for every position they played. Jason Ellison had a positive rating at all three outfield positions. Nice. The worst defenders included the usual suspects: EdE -21.0, Dunn -18.1, Griffey -17.5, Hatteberg -11.1, Hamilton -8.6 and Castro -4.8.
    If we assume that Cincinnati was one of the poorer defensive squads, and that 10 runs is roughly equivalent to 1 win, then the difference between the poorer defensive clubs and the more proficient defensive clubs is around 9 or 10 wins a season. Does that seem correct? As an aside, may I post some of this data on the Redszone website? Or, at the very least, link it?

  7. Ira, without access to raw data from both BIS and STATS Inc, which no person can possibly pay for, I don't know if we're going to be able to resolve those sorts of issues. I certainly can't do anything, as I only am using the publicly available data from THT.

    In CF, I tend to favor UZR over these translations to a significant degree whenever there's disagreement. Other positions seem to work better, and I'm more inclined to split the difference or maybe go 60/40 in favor of UZR. -j

  8. I was wondering if you have come up with any sort of general ranking system with regards to combining VORP and your +/- defensive ratings? When you add the two what sort of +/- run totals would you get for the following five categories:
    elite, above average, average, below average, and time-to-get-another player? If you have some general idea could you let me know please? Thanks in advance.

  9. Texasdave, you beat me to the punch. :P I'll do more with the Reds data in the coming months, but you gave a good summary.

    In terms of fact-checking, two quick things. First, the '07 Reds had a DER of 0.679, which ranked third from the bottom in the league. So yeah, they're definitely not a good defensive team (but we already knew that!).

    THT's batted ball defensive estimates for teams have the Reds at 32 plays below average overall, which ranks 5th from the bottom. At ~0.8 runs/play, that puts them at -25.6 runs. That's not as bad as your figures come to, but at the same time the RZR data are based on more detailed information, so they're probably closer to the mark.

    So again, I think yours is a nice summary of the Reds' performance.

  10. Also, no, I don't have a perfect way of ranking players via categories. There's a way in which I'd discourage doing so, as you lose too much information.

    But in terms of understanding what the values are telling you... In general, the elitist of elite (not counting a peaking Barry Bonds) players seem to top out around 100 runs over replacement, though most of the best players (Phillips included) are above 50 runs. 30 runs is certainly a fine performance, and 20 runs over replacement also can be a valuable guy.

    But when you start getting down to 10 runs over replacement, especially among players with at least three quarters playing time, that's getting to be a fairly weak contribution. And, obviously, when a player is below 0, the team is getting less than it would from an average replacement player. -j

  11. First of all, great work! One quick question...

    If you add up all of the +/- runs, you get +47.2.

    Is this not equal to 0 because the average is based on multiple years of data? Does that mean that 2007 was an "above-average" fielding year league-wide?


  12. Looking by position, it appears that everything is right near 0 league-wide except RF which is +46.

    Again, thanks and great work.

  13. Hi SG,

    Hmmm... I just downloaded the spreadsheet and then summed each position, and they all equal 0.0. Are you sure of your numbers?

    I just found that if I left off the last two right fielders (Abreu and Dye), right fielders sum to 45.5. Maybe you just missed two rows when copying and pasting? :)

    They should all sum to zero. +/- is based strictly on 2007 data, as there have been some changes to how zones are defined (especially at 1B) that make it darn near impossible to get a multi-season expected plays/biz rate.
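
    A short derivation of why the totals cancel, assuming the positional average is the pooled rate computed from the same 2007 players being summed. The symbols are shorthand introduced only for this note: $P_i$ is the number of plays made by player $i$ at a position, $B_i$ is that player's balls in zone, and $r$ is the positional rate.

    ```latex
    r = \frac{\sum_j P_j}{\sum_j B_j}
    \qquad\Longrightarrow\qquad
    \sum_i \left( P_i - r\,B_i \right)
      = \sum_i P_i - r \sum_i B_i
      = \sum_i P_i - \frac{\sum_j P_j}{\sum_j B_j} \sum_i B_i
      = 0
    ```

    The same argument applies to out of zone plays, since they use the same denominator, and multiplying by a single runs-per-play value at a position keeps the sum at zero. A nonzero positional total therefore points to missing rows rather than to the method.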


  14. yep, left off the last two RFs. Thanks.

  15. Yay! I had a momentary panic! :D

  16. I'm starting a fund to purchase raw data for a super-awesome fielding metric. I've got $10. Who else is in?

    What always amazes me about the best players is how far ahead of the pack they are. For example, Arod/Wright are at about 100 runs over replacement this year. Helton/Reyes/Bonds are 60 runs. Swisher/Kinsler are 40 runs. League-average on a full-time basis is 20 runs. The worst are about -20 runs.