On Baseball & The Reds: October 2007

Tuesday, October 30, 2007

Player Value, Part 3b: Comparison of Fielding Statistics

To view the complete player value series, click on the player value label on any of these posts.

Comparison of the Fielding Statistics

Ok, as we showed in the last piece, there are a lot of different options for fielding statistics. Some do seem better on the surface than others based on their methodologies. My preferences are for statistics that are based on the most specific data possible, and those that try to account for additional factors that could impact player performance beyond just fielding skill. But when we look at the fielding statistics, how do they vary with respect to how they actually rate players?

In the work that follows, I pulled 2006 data from all of the different systems for which I could find data. Here are my sources:

Fans Scouting Report (FSR) - From Tango on his site, converted to +/- runs.
Davenport Translations (DT) - From Baseball Prospectus.
Ultimate Zone Rating (UZR) - From MGL on his site.
Zone Rating (ZR) - From ESPN.com, converted to +/- runs.
Revised Zone Rating (RZR) - From The Hardball Times, converted to +/- runs.
Probabilistic Model of Range (PMR) - From David Pinto on his site, converted to +/- runs.

Before launching into comparisons, let's first think about how they vary in terms of their features (x=feature included):

Feature	DT	UZR	ZR	RZR	PMR	Fans
Based on Outs/Opportunities	x	x	x	x	x
Convertible to +/- Runs	x	x	x	x	x	x
Built upon hit-location data		x	x	x	x
Uses different rates for different zones		x			x
Includes adjustments for ball type		x	x	x	x
Includes adjustments for batter handedness	?	x			x
Includes adjustments for ballpark	x	x			x
Includes breakdown of component player skills						x

So, based on that table, I would have to say that UZR and PMR have the best methodologies, with a nod to the Fans data because they can provide such unique insights into player skill. ZR and RZR are more or less the same thing--coarse interpretations of hit location data--though RZR has an advantage in that we know exactly how many opportunities (i.e. balls in zone) a player had, whereas with ZR we have to estimate it based on typical BIZ/inning rates for each position. Davenport Translations are based on scorebook data, rather than hit location data, and therefore--despite many additional adjustments--seem unlikely to be as accurate.

But how do these different systems rate players? Let's take a look. I found data on 251 players who showed up in all six datasets in 2006 (the primary limiting factor was PMR, which has a cutoff of 1000 BIP for a fielder to be rated). I ended up removing Adam Everett and Manny Ramirez from the sample because they were such extreme outliers (high and low, respectively) that I didn't want their data "artificially" strengthening my correlations due to their high leverage (correlations only dropped by about 0.02 across the board when I did this). This left me with 249 players.

First, here's a scatterplot matrix describing how the different metrics varied (includes all positions except catcher, which I ignored for this entire analysis...more on them later):

As you can see, while there's certainly a lot of scatter, there's reasonably good agreement between most of these metrics. Some pairs are especially well correlated (UZR and ZR agree most closely), and most of the other plots show discernible positive correlations. Given that they're all reasonable attempts to measure fielding, it's nice to see at least some level of agreement! On the other hand, the substantial amount of scatter indicates that there's a fair bit of disagreement among the various statistics in how to rate players.

Here are the correlation values in matrix form. All are massively significant due to the sample size involved, even after I apply a Bonferroni adjustment for multiple comparisons.

	FSR	DT	UZR	ZR	RZR	PMR
FSR	1.00	0.36	0.42	0.39	0.42	0.54
DT	0.36	1.00	0.38	0.30	0.30	0.47
UZR	0.42	0.38	1.00	0.81	0.58	0.53
ZR	0.39	0.30	0.81	1.00	0.52	0.45
RZR	0.42	0.30	0.58	0.52	1.00	0.60
PMR	0.54	0.47	0.53	0.45	0.60	1.00
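To make the significance claim concrete, here is a minimal Python sketch. The article doesn't say which test was used, so this assumes Fisher's z-transform with a normal approximation (one common way to test whether a correlation differs from zero), n = 249 players, and a Bonferroni threshold of 0.05 divided by the 15 distinct metric pairs in the matrix.

```python
import math

N_PLAYERS = 249
N_METRICS = 6
N_PAIRS = N_METRICS * (N_METRICS - 1) // 2  # 15 distinct off-diagonal correlations
ALPHA = 0.05

def fisher_z_pvalue(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 via Fisher's z and a normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Upper triangle of the correlation matrix above.
correlations = {
    ("FSR", "DT"): 0.36, ("FSR", "UZR"): 0.42, ("FSR", "ZR"): 0.39,
    ("FSR", "RZR"): 0.42, ("FSR", "PMR"): 0.54, ("DT", "UZR"): 0.38,
    ("DT", "ZR"): 0.30, ("DT", "RZR"): 0.30, ("DT", "PMR"): 0.47,
    ("UZR", "ZR"): 0.81, ("UZR", "RZR"): 0.58, ("UZR", "PMR"): 0.53,
    ("ZR", "RZR"): 0.52, ("ZR", "PMR"): 0.45, ("RZR", "PMR"): 0.60,
}

bonferroni_alpha = ALPHA / N_PAIRS
pvalues = {pair: fisher_z_pvalue(r, N_PLAYERS) for pair, r in correlations.items()}
all_significant = all(p < bonferroni_alpha for p in pvalues.values())

if __name__ == "__main__":
    weakest = max(pvalues, key=pvalues.get)
    print(f"Bonferroni threshold: {bonferroni_alpha:.4f}")
    print(f"Weakest pair {weakest}: p = {pvalues[weakest]:.1e}")
    print("All significant:", all_significant)
```

Even the weakest correlations in the matrix (0.30) clear the adjusted threshold by a wide margin at this sample size.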

Update: I've made two adjustments to the FSR data since I first published this article. First, I realized that I had forgotten to account for playing time in these estimates--I have now pro-rated each player's FSR rating by his innings played so that it better corresponds to the scale on which the other stats report fielding data. Second, Tango directed me to a recent post of his that reports his weights, and I am now using those values. The result was a slightly better fit with the DT's (+0.04 correlation), a slightly weaker fit with the STATS-based estimates (-0.04 correlation), and almost no change in the fit with the BIS-based estimates.
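The playing-time adjustment amounts to scaling each rating by the fraction of a full season the player actually fielded. A minimal sketch, assuming (hypothetically) that the unadjusted FSR run values are expressed per 1,450 innings; the post doesn't state the baseline it used.

```python
# Hypothetical baseline: unadjusted FSR runs are assumed to be per 1,450 innings.
FULL_SEASON_INNINGS = 1450.0

def prorate_fsr(fsr_runs_full_season: float, innings: float) -> float:
    """Scale a full-season FSR run value by the share of a season actually played."""
    return fsr_runs_full_season * innings / FULL_SEASON_INNINGS

if __name__ == "__main__":
    # A hypothetical +10-run fielder who played half a season.
    print(prorate_fsr(10.0, 725.0))
```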

Discussion of findings:

STATS vs. BIS

UZR and ZR are particularly well correlated with one another, easily showing the closest similarity of any two variables in this dataset. The most reasonable explanation is that they are built upon the same raw dataset--hit location data furnished by STATS Inc. At the same time, while the effect is not nearly as dramatic, the two statistics built upon hit location data purchased from BIS (RZR and PMR) are more similar to one another than either is to any of the other statistics.

So we seem to have an effect of the raw data provider here. This is consistent with Michael Humphreys' study last August. He found substantial differences between the STATS- and BIS-based fielding statistics, even after the methodologies were made as similar as possible, with correlations topping out at ~0.60.

Is there any indication that one provider is better than the other? Well, one way to evaluate this is to examine how each provider's stats compare to other assessments of fielding performance. Humphreys found that his own fielding statistic, DRA, is better correlated with BIS than with STATS. Similarly, the FSR and DT estimates correlate better with PMR in this dataset (2006 fielders) than they do with either of the estimates built upon STATS Inc. data...although they don't correlate especially well with RZR, which is also built upon BIS data. At this point, while PMR does better in these comparisons than the other statistics, I'm not comfortable saying that we have a good indication of whether one provider is better than the other. And this means that we should incorporate fielding estimates based on data from both providers whenever possible.

Coarse vs. Careful Systems

Another comparison that one can make is between the fairly "coarse" fielding estimates (like ZR and RZR), which rely on artificially-established zones of responsibility for fielders, and those that are more "careful" (like UZR and PMR), which consider fielder performance in all zones and include adjustments for varying batted ball velocities, batter handedness, park factors, etc. If we agree that careful fielding estimates are preferable to coarse ones, we'd expect these more careful estimates to match one another more closely than they match the "coarse" estimates, once you get past the data provider issue.

That's not really what we see, though. PMR does correlate better with UZR than with ZR (0.53 vs. 0.45). But UZR correlates better with RZR than with PMR (0.58 vs. 0.53). So I'm not seeing indications that the more careful, presumably "better" methodologies of UZR and PMR are resulting in substantially different fielding estimates than the coarser approaches. That's not to say that I don't still prefer careful methods...it just means that the coarse methods are still useful when we don't have access to the careful ones.

Davenport Translations

Baseball Prospectus more or less has refused to even consider using an alternative fielding system to its DT's (aka FRAA), despite the apparent methodological advantages to using a system built upon hit location data. And because BPro is so well-regarded, their fielding numbers tend to be widely used. At the same time, some stathead-types are skeptical of these numbers because the data they are built upon are things like putouts and assists, rather than the hit location data powering the zone-based statistics.

To me, the above correlations indicate that those who are skeptical of the BPro fielding numbers have justification: DT's generally correlate more weakly with the other fielding estimates than those estimates do with each other, including the Fans Scouting Report (DT's only correlation above 0.38 is its 0.47 with PMR). Now, it's possible that DT's are measuring something real that the other systems are not. But, given that they are based on apparently "weaker" data than the other systems, my inclination is to think that they're not as good as those other options. That, of course, is consistent with the general thoughts of many baseball researchers, but it's nice to see the numbers bearing this out.

That said, DT's do have some advantages. Because they are generated from scorebook statistics, they can be calculated on virtually any player who has ever played the game. This makes them a nice tool to assess historical fielding performances--and these data bear out the idea that DT's do have meaning. But I don't think there's much reason to use them for modern ballplayers when other superior options are available.

The Fans' Scouting Report

This was the big surprise for me. While the hit location statistics were better correlated with one another than they were with the Fans, the FSR data do pretty darn well for themselves. Remember, these data are based on the subjective rankings of fans, many of whom are no better trained than I am to evaluate defense visually. They are then weighted and converted into +/- runs by a process that is as much intuition as science. So seeing FSR even come close to the hit-location statistics, not to mention match them better than the DT's do, was pretty exciting.

Now, one might still argue that they weren't a good enough match to the other stats to be worth using. But I'd disagree. The FSR data are extremely different in both their basis (survey data) and how they go about evaluating fielding (weighted analysis of different skill categories, specific to each position). The fact that they match as well as they do, while undoubtedly providing information that is not included in the other statistics, indicates to me that they are worth considering. I might not weight them as highly as the other stats given their subjective nature, but I'd like to include them in my ultimate assessments.

Recommendations

First, I think we can probably ignore DT's whenever we have something else to work from. That's probably not a very controversial point.

Second, the apparent differences in raw data provided by STATS and BIS indicate to me that it's important that we incorporate estimates from both providers in our fielding estimates. If both datasets tell us the same thing, we can be reasonably confident in our conclusions. Therefore, I'd recommend treating fielding estimates from the two providers as equals until/unless we find reason to favor one over the other. I would also discourage folks from using data from one exclusive source whenever possible. I'm obviously guilty of violating that recommendation throughout the history of this site, but I'm going to try to more consistently use both datasets moving forward. :)

Ok, so we're going to use both STATS and BIS data. But how specifically should we do this? I see two possibilities. One, we average across all four estimates (if they are available). Or, we pick one estimate from each provider and average them.

I'm partial to the latter approach. Including all four doesn't seem like it would provide any particular advantages...if anything, it seems more likely to pull down the accuracy of our estimate because ZR and RZR are less carefully designed than UZR and PMR. So, I'd take the best available stat from BIS (PMR over RZR) and the best available stat from STATS Inc (UZR over ZR) and take the average. If one or both of the careful estimates are unavailable, as is currently the case for 2007, I would not hesitate to use the average rating of ZR and RZR. They match up well to the more careful systems, and still are based upon high quality hit location data.
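The "best available stat from each provider, then average" rule can be sketched as below. This is a minimal illustration, assuming each estimate is already in +/- runs and that a missing estimate is passed as None; the function names are hypothetical.

```python
from typing import Optional

def best_available(careful: Optional[float], coarse: Optional[float]) -> float:
    """Prefer the careful estimate; fall back to the coarse one from the same provider."""
    if careful is not None:
        return careful
    if coarse is not None:
        return coarse
    raise ValueError("no estimate available from this provider")

def two_provider_average(uzr: Optional[float], zr: Optional[float],
                         pmr: Optional[float], rzr: Optional[float]) -> float:
    """Average the best STATS-based (UZR over ZR) and BIS-based (PMR over RZR) runs."""
    stats = best_available(uzr, zr)
    bis = best_available(pmr, rzr)
    return (stats + bis) / 2

if __name__ == "__main__":
    # 2007-style case: no UZR or PMR yet, so ZR and RZR are averaged.
    print(two_provider_average(uzr=None, zr=4.0, pmr=None, rzr=2.0))
```

Raising an error when a provider has no estimate at all is a design choice made here, not something the post specifies.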

Third, I also think that the availability of the Fans' Scouting Report data is pretty exciting, as it provides an entirely different set of data that still seem to do a good job of describing variation in fielding. So my inclination is to include FSR data in our estimates if it's available. The question, however, is how specifically to do this: do we treat it like the equivalent of the BIS and STATS data? Or do we down-weight these data in recognition of their inherently subjective nature? I'm open to suggestions on this, but at this point I'm partial to the latter approach...so here's one possible equation that goes this route:

+/-Fielding = 0.375*STATS + 0.375*BIS + 0.25*FSR

Where STATS is the best available fielding estimate based on STATS data (UZR or ZR), and BIS is the best available fielding estimate based on BIS data (PMR or RZR). That puts the fielding estimate at 75% hit-location data (split 50/50 between STATS and BIS) and 25% Fans' data.
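The weighting formula translates directly into code. A minimal sketch, assuming all three inputs are already in +/- runs; the player values in the usage line are hypothetical.

```python
STATS_WEIGHT = 0.375
BIS_WEIGHT = 0.375
FSR_WEIGHT = 0.25

def combined_fielding(stats: float, bis: float, fsr: float) -> float:
    """+/-Fielding = 0.375*STATS + 0.375*BIS + 0.25*FSR, all in +/- runs."""
    return STATS_WEIGHT * stats + BIS_WEIGHT * bis + FSR_WEIGHT * fsr

if __name__ == "__main__":
    # Hypothetical player: +8 runs (STATS), +4 runs (BIS), average per the Fans.
    print(combined_fielding(8.0, 4.0, 0.0))
```

Because the weights sum to 1, the combined number stays on the same +/- runs scale as its inputs.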

Next time, we'll wrap up the position players. We'll start by taking a rough pass on evaluating catchers (who are still rather poorly studied, with little precedent to work from), work out how to incorporate position adjustments, and finally put it all together by reporting total ratings (offense + defense) for the 2007 Cincinnati Reds position players!

Sunday, October 28, 2007

World Series thoughts...

Well, this hasn't been a very good World Series. The Rockies can't get anything going offensively, which has made the games one-sided and lacking much real tension.

And now we're at the absolute lowest point. As I type this, we're in the top of the 8th inning of what will probably (91% win probability) be the final game of the World Series, with the Rockies desperately trying to keep the deficit within three runs. And the only thing that the announcers are talking about is a certain player's (whom I will not name) completely unsurprising decision to opt out of his contract. Incredibly poor form by Fox, not to mention this particular player and his agent. This is the freaking World Series. Announcements about players not involved--or quite frankly anything else about baseball--should never be permitted to upstage this game. How irritating.

Edit: Congrats to the Red Sox, and to the Rockies, on their terrific seasons. Shame it had to end with a fairly mundane World Series, but it was still a nice season by both franchises. Great to see the Rockies finally breaking through to get to this stage...hopefully the Reds can follow suit soon.

Saturday, October 27, 2007

Player Value, Part 3a: Fielding Performance Estimators


In the first article of this series, we found that a run prevented on defense is worth just as much as a run scored on offense. Nevertheless, for many fans, analysts on TV, and apparently even some within baseball, position player evaluation starts and ends with offense--if anything, consideration of fielding value is used as a tiebreaker. Fortunately, we now have access to a wide variety of quality statistics that can estimate fielding performance with reasonable accuracy. In this and the next few articles, I will run down some of these options, compare them to see how well they agree, make recommendations on how to use them, discuss how to do position adjustments to put fielders on even footing, and finally, take a look at the 2007 Cincinnati Reds!

So let's get started.

Fielding Percentage and Range Factor-Type Statistics

Virtually all fielding statistics operate on this simple equation:

Fielding = Outs/Opportunities

The primary differences between them have to do with how they measure outs and (especially) chances. Some also factor in a comparison of a player's out-rate to league average at that position to convert the rate into a +/- number, but it's still built upon an assessment of that rate.

The first and most basic (and flawed) effort to measure defense was fielding percentage, which was calculated as:

FldPct=Outs/Chances = [Assists+Putouts]/[Assists+Putouts+Errors]

This statistic traces back to the early 1900s, when error rates were high enough that they actually provided a pretty good estimate of fielding. And it's still the most commonly cited fielding statistic outside of stat circles--you'll often hear TV commentators, for example, say that the Rockies led all of baseball this season (and, in fact, all of history) in "fielding," meaning they had the highest team fielding percentage in history.

Errors certainly are bad things. However, the problem with using fielding percentage as one's primary fielding statistic is that error rate only evaluates part of fielding performance: making an out once you get to a ball. It completely ignores the ability of fielders to actually get to the ball in the first place, which, our modern fielding statistics tell us, is where the biggest differences among players actually exist.

In an early attempt to rectify this problem, Bill James developed range factor, which is calculated simply as:

RF = Outs/Game = (Assists + Putouts)/G

Makes sense, right? Now we're trying to assess the rate at which players actually make outs, not just the rate at which they make mistakes.
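Both scorebook formulas fit in a few lines of Python. This is a minimal sketch; the stat line in the usage example is hypothetical, not a real player's.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """FldPct = (A + PO) / (A + PO + E): outs made per chance the fielder reached."""
    return (putouts + assists) / (putouts + assists + errors)

def range_factor(putouts: int, assists: int, games: int) -> float:
    """RF = (A + PO) / G: outs made per game, regardless of how many balls came his way."""
    return (putouts + assists) / games

if __name__ == "__main__":
    # Hypothetical shortstop season: 250 PO, 450 A, 15 E in 150 games.
    print(f"FldPct = {fielding_percentage(250, 450, 15):.3f}")
    print(f"RF     = {range_factor(250, 450, 150):.2f}")
```

Note that neither function takes opportunities as an input: fielding percentage only sees balls the fielder reached, and range factor divides by games instead of chances.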

Unfortunately, there is a significant problem with range factor. What James was trying to approximate with range factor was Outs/Opportunities. Because he was limited by data at the time, he had to make the assumption that Opportunities/G was more or less constant, at least within a position. If true, then range factor (outs/G) would tell you the same thing as Outs/Opportunities.

To evaluate the extent to which this is true, here are some data extracted from this article by MGL, showing 2002 starting shortstops (Games and Outs come from traditional scorebook stats, whereas Opportunities were extracted from MGL's play-by-play data and represent the number of balls hit toward a player, whether he converted them into outs, made errors, or let them go through for hits):

Name	Team	G	Outs	RF	Opps	Opps/G	Outs/Opps
Fox	Fla	112	500	4.5	265	2.4	1.9
Izturis	LA	128	535	4.2	290	2.3	1.8
Clayton	CWS	109	509	4.7	278	2.6	1.8
Gomez	TB	130	616	4.7	345	2.7	1.8
Aurilia	SF	131	571	4.4	327	2.5	1.7
Rodriguez	Tex	162	766	4.7	443	2.7	1.7
Guzman	Min	147	637	4.3	378	2.6	1.7
Bordick	Bal	117	593	5.1	354	3.0	1.7
Perez	KC	139	674	4.9	405	2.9	1.7
Cruz	SD	147	617	4.2	380	2.6	1.6
Furcal	Atl	150	731	4.9	451	3.0	1.6
Guillen	Sea	130	530	4.1	334	2.6	1.6
Ordonez	NYM	142	655	4.6	416	2.9	1.6
Gonzalez	ChC	142	599	4.2	382	2.7	1.6
Hernandez	Mil	149	735	4.9	469	3.1	1.6
Uribe	Col	155	812	5.2	521	3.4	1.6
Eckstein	Ana	147	626	4.3	406	2.8	1.5
Vizquel	Cle	150	701	4.7	461	3.1	1.5
Garciaparra	Bos	154	708	4.6	481	3.1	1.5
Larkin	Cin	135	624	4.6	425	3.1	1.5
Cabrera	Mon	153	748	4.9	519	3.4	1.4
Jeter	NYY	156	594	3.8	415	2.7	1.4
Rollins	Phi	152	696	4.6	488	3.2	1.4
Wilson	Pit	143	709	5.0	499	3.5	1.4
Renteria	StL	149	638	4.3	449	3.0	1.4
Womack	Ari	149	568	3.8	422	2.8	1.3
Tejada	Oak	156	724	4.6	539	3.5	1.3

From the above table, you can see that James' assumption does not hold. There is substantial variation in these players' average number of Opportunities/G, which all but masks variation in the rate of Outs/Opportunities in the range factor calculations. In fact, range factor (outs/G) correlates dramatically better with Opportunities/G (r = 0.60 in this dataset) than with Outs/Opportunities (r = 0.10)! Therefore, range factor tells you almost nothing about how many outs a player makes given his opportunities; it mostly tells you how many balls are hit in his direction.
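Those correlations can be recomputed from the table itself. Below is a minimal Python sketch that uses only the G, Outs, and Opps columns and recomputes the ratios rather than using the rounded ones, so its results may differ slightly from the r values quoted above.

```python
import math

# (name, games, outs, opportunities) for the 2002 starting shortstops in the table above.
SHORTSTOPS = [
    ("Fox", 112, 500, 265), ("Izturis", 128, 535, 290), ("Clayton", 109, 509, 278),
    ("Gomez", 130, 616, 345), ("Aurilia", 131, 571, 327), ("Rodriguez", 162, 766, 443),
    ("Guzman", 147, 637, 378), ("Bordick", 117, 593, 354), ("Perez", 139, 674, 405),
    ("Cruz", 147, 617, 380), ("Furcal", 150, 731, 451), ("Guillen", 130, 530, 334),
    ("Ordonez", 142, 655, 416), ("Gonzalez", 142, 599, 382), ("Hernandez", 149, 735, 469),
    ("Uribe", 155, 812, 521), ("Eckstein", 147, 626, 406), ("Vizquel", 150, 701, 461),
    ("Garciaparra", 154, 708, 481), ("Larkin", 135, 624, 425), ("Cabrera", 153, 748, 519),
    ("Jeter", 156, 594, 415), ("Rollins", 152, 696, 488), ("Wilson", 143, 709, 499),
    ("Renteria", 149, 638, 449), ("Womack", 149, 568, 422), ("Tejada", 156, 724, 539),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

range_factor = [outs / g for _, g, outs, _ in SHORTSTOPS]        # Outs/G
opps_per_game = [opps / g for _, g, _, opps in SHORTSTOPS]       # Opps/G
outs_per_opp = [outs / opps for _, _, outs, opps in SHORTSTOPS]  # Outs/Opps

r_rf_opps = pearson(range_factor, opps_per_game)
r_rf_eff = pearson(range_factor, outs_per_opp)

if __name__ == "__main__":
    print(f"r(RF, Opps/G)    = {r_rf_opps:.2f}")
    print(f"r(RF, Outs/Opps) = {r_rf_eff:.2f}")
```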

There have been subsequent attempts by James and others to better control for opportunities. Perhaps the best, and certainly the most widely circulated, numbers of this kind are the Davenport fielding translations (Fielding Runs Above Average, FRAA) available at Baseball Prospectus, which report fielding prowess as runs saved above average. Unfortunately, to my knowledge, the actual methodology by which these "improved" numbers are generated has not been published anywhere, which means it's hard to know exactly what's going on under the hood (this is a chronic problem with BPro stats). Furthermore, because they are not based on hit-location data (like other stats below), they are not well-regarded among most baseball researchers.

My experience with BPro's FRAA numbers is that they're "pretty good." So I'm going to include them, for now, among the numbers that I'll compare in the next article.

Zone-Based Fielding Statistics

There are four variants of zone-based statistics that you will run across with any frequency: UZR, ZR, RZR, and BIS's plus/minus system. Let's run through them one at a time:

Ultimate Zone Rating (UZR)

UZR is often heralded as the gold standard of defensive stats. I agree that it's very good, but as we'll see in the comparisons article, I'm not convinced it's the only thing worth paying attention to. It is the creation of Mitchel Lichtman (aka MGL), and his methodologies are described in detail in these two posts at BBTF. In its most basic form, it's a pretty simple procedure, and it serves as a nice model to understand the other zone-based stats, so I'd like to walk through it.

Essentially, the ball field is broken up into different zones, which are the hit location zones defined by Project Scoresheet/Retrosheet (see figure to right). If one pays $10,000 to get hit location data from STATS Inc (which no individual fan except MGL has been willing to do), one can calculate how many balls were hit into each of these zones. And, for each position within each zone, one can determine the percentage of those balls that were typically converted into outs.

From the raw data, one can also measure how many balls were hit into each zone when a particular player was on the field at a given position, and how many of those balls were turned into outs by that player. And using that information, you can get the percentage of batted balls that the player converted into outs within that zone.

Let's say that the average shortstop converts 21% of balls hit into zone "56" into outs ("56" is the zone corresponding to the "hole" between third base and shortstop). And, let's say that Larry Barkin played the entire season as the Reds' starting shortstop, and he converted 25% of balls hit into zone 56 into outs. Based on that information, I think most folks would be comfortable saying that he was better than average, at least on balls hit into that zone.

But how much better? Well, 25%-21%=4%. But how much better is 4%? Well, let's say that there were 100 balls hit into zone 56 while Barkin was playing. If the average shortstop turns 21% of zone 56 balls into outs, that means the average shortstop would be expected to make 21 plays in Barkin's situation. And yet Barkin made 25 plays. Therefore, I'd say that Barkin performed 4 plays above average in zone 56. Now, let's say that we did the same procedure in all other zones on the ball field, and Barkin's rate exactly matched the league-average rate in each of those other zones. Summing the differences between the plays Barkin made and the plays an average shortstop would have made across all zones would then give +4 plays, entirely due to his excellence in zone 56.
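The zone-by-zone bookkeeping can be sketched as below. This is a minimal illustration of the basic procedure described here, not MGL's actual UZR code. Zone "56" and its numbers come from the Barkin example; zone "6" and its numbers are hypothetical stand-ins for "all other zones", where Barkin exactly matches the league rate.

```python
def plus_minus_plays(player_zones, league_rates):
    """Sum over zones of actual outs minus outs expected at the league-average rate.

    player_zones: zone -> (balls hit into zone while player was on field, outs he made)
    league_rates: zone -> league-average out rate for his position in that zone
    """
    total = 0.0
    for zone, (balls, outs) in player_zones.items():
        expected_outs = balls * league_rates[zone]
        total += outs - expected_outs
    return total

league_ss_rates = {"56": 0.21, "6": 0.90}
barkin = {"56": (100, 25), "6": (200, 180)}

if __name__ == "__main__":
    print(f"Barkin: {plus_minus_plays(barkin, league_ss_rates):+.1f} plays")
```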

Ok so far? What if we want to know the approximate run value of this better-than-average performance? Well, using linear weights, we can determine the average run value of hits that go through zone 56 around the league. For the purposes of this illustration, let's say that every ball hit through zone 56 in baseball turned into a single (in reality, some turn into doubles or triples, but this isn't too far off). The marginal linear weights value of a single is ~0.460 runs, so Barkin's performance prevented ~0.46 x 4 = 1.84 runs from occurring via these singles. Furthermore, he also generated four additional outs, each of which is worth ~-0.265 marginal runs, so he prevented 0.265 x 4 = 1.06 runs by generating these four outs. This puts Barkin's total fielding value at shortstop, given that he was +4 plays above average, as 1.84+1.06 = +2.90 runs saved above average (~0.725 runs saved per play above average). This is the estimated improvement in defensive performance over what you would probably have gotten from a completely average shortstop had he played instead of Barkin.
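The run conversion follows from the fact that each extra play both erases a hit and adds an out. A minimal sketch using the approximate linear-weights values quoted above (and the same simplifying assumption that every zone-56 hit is a single):

```python
SINGLE_RUNS = 0.46   # marginal linear-weights value of a single, from the text
OUT_RUNS = -0.265    # marginal linear-weights value of an out, from the text

def runs_saved(plays_above_average: float, hit_runs: float = SINGLE_RUNS,
               out_runs: float = OUT_RUNS) -> float:
    """Each extra play removes a hit and adds an out, so it is worth hit_runs - out_runs."""
    return plays_above_average * (hit_runs - out_runs)

if __name__ == "__main__":
    print(f"Per play: {runs_saved(1):.3f} runs")
    print(f"Barkin (+4 plays): {runs_saved(4):+.2f} runs")
```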

This, in a nutshell, is what UZR does. You'll note that the rates from which we're calculating the +-plays above follow the same generalized formula that we presented at the top of the article: outs/opportunities. It's just that instead of having to assume that opportunities are only those balls that a player got to, or that the average number of opportunities per game was constant, the zone system allows us to get a much better estimate of the number of opportunities a player actually had.

In reality, UZR is a bit more complicated than I indicated here, using different rates to account for batter handedness (an attempt to adjust for positioning), how hard a ball was hit (to account for difficulty), as well as adjustments for defensive park factors and a few other things (you can refer to MGL's two articles for more info). Nevertheless, this should give you a basic understanding of how it all works. And that should help you understand the remaining stats as well, as they're all pretty similar.

Zone Rating (ZR)

While MGL's treatment of the STATS Inc. hit location data is much better, STATS Inc. has, for a long time, released a rate statistic called Zone Rating (available from a variety of websites, including espn.com, cnnsi.com, etc.) that does things in a fairly similar fashion to UZR. STATS divides the field up into a larger number of zones than UZR (see image to right), and then assigns any zone in which defenders at a given position, on average, convert more than 50% of balls into outs to be within that position's "zones of responsibility." Zones are not shared among positions, and some zones are not the responsibility of any player.

Next, for each player, STATS tallies up how many balls were hit into his zones of responsibility (balls in zone, or BIZ), and how many of them he converted into outs (PLAYS). They also tally up any plays a player made outside his zones of responsibility (out of zone, or OOZ). They then calculate zone rating as:

ZR = (PLAYS+OOZ)/(BIZ+OOZ)

Again, this is essentially the same outs/opportunities formula that can be traced back to Fielding Percentage. You'll note that it's sloppier than UZR, because all zones of responsibility are lumped into one grand zone, rather than using separate rates for each individual zone (this is ameliorated somewhat by using smaller initial zones, but not entirely). The treatment of plays made outside the zones of responsibility is also a bit sloppy--the OOZ term probably should not be in the denominator. But this is the number that we get from STATS. And, as you'll see in the comparisons section, it's pretty useful for ranking player fielding.
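A minimal sketch of the ZR formula, alongside a variant reflecting the suggestion that OOZ probably shouldn't be in the denominator. The variant is illustrative only, not STATS' definition, and the player numbers are hypothetical.

```python
def zone_rating(plays: int, biz: int, ooz: int) -> float:
    """STATS Zone Rating as given above: (PLAYS + OOZ) / (BIZ + OOZ)."""
    return (plays + ooz) / (biz + ooz)

def zone_rating_ooz_numerator_only(plays: int, biz: int, ooz: int) -> float:
    """Illustrative variant: OOZ plays earn credit but are not counted as opportunities."""
    return (plays + ooz) / biz

if __name__ == "__main__":
    # Hypothetical fielder: 300 plays on 360 balls in zone, plus 40 out-of-zone plays.
    print(f"ZR:      {zone_rating(300, 360, 40):.3f}")
    print(f"Variant: {zone_rating_ooz_numerator_only(300, 360, 40):.3f}")
```

In the STATS version, each OOZ play adds one to both the numerator and denominator, so it nudges the rating toward 1.000 rather than crediting the play in full.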

What if we wanted to convert the ZR rate to a +/- runs statistic like UZR? Well, without the actual BIZ data from STATS, all we can do is make estimates of opportunities. Fortunately, Chris Dial, who had access to some of the source data underlying ZR, developed a procedure that allows us to do this with reasonable accuracy. It essentially just uses average opportunities per inning at each position to estimate BIZ for each player (technically it's BIZ+OOZ, but we'll ignore OOZ for now), from which point you can back-calculate PLAYS for each player. After that, getting a +/- plays estimate is as simple as:

+/-Plays = PLAYS - BIZ*(lgPLAYS/lgBIZ)

Where the "lg" prefix just means league totals of the given statistic.

Dial also provided average runs/play estimates for each position (which I assume are based on actual data on typical hit types through each position), allowing us to convert the +/- plays stat into a +/- runs stat:

1B 0.798 runs/play
2B 0.754 r/p
3B 0.800 r/p
SS 0.753 r/p
LF 0.831 r/p
CF 0.842 r/p
RF 0.843 r/p

You'll note that these values are all slightly higher than the 0.725 r/p we defined up in the UZR section. That's because some (and the number varies predictably by position) of the hits through a position turn into something more than singles. This is especially a factor for outfielders and corner infielders.

All in all, while Dial's conversions of ZR aren't as good as UZR, they correlate well with it, and they are ultimately based on the same raw data. So they're definitely worth using, especially if you don't have UZR available.
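Putting the steps together, a Dial-style conversion can be sketched as below. This is a simplified illustration, not Dial's actual code: OOZ is ignored as in the text, the league ZR stands in for lgPLAYS/lgBIZ, and the league BIZ-per-inning figure in the usage example is hypothetical.

```python
# Dial's average runs per play by position, from the list above.
RUNS_PER_PLAY = {"1B": 0.798, "2B": 0.754, "3B": 0.800, "SS": 0.753,
                 "LF": 0.831, "CF": 0.842, "RF": 0.843}

def zr_to_runs(zr: float, innings: float, position: str,
               lg_biz_per_inning: float, lg_zr: float) -> float:
    """Estimate BIZ from innings, back out PLAYS, compare to league rate, convert to runs."""
    est_biz = innings * lg_biz_per_inning     # estimated opportunities
    plays = zr * est_biz                      # back-calculated PLAYS
    plus_minus = plays - est_biz * lg_zr      # PLAYS - BIZ*(lgPLAYS/lgBIZ)
    return plus_minus * RUNS_PER_PLAY[position]

if __name__ == "__main__":
    # Hypothetical shortstop: .860 ZR over 1200 innings; league .840 ZR, 0.4 BIZ/inning.
    print(f"{zr_to_runs(0.860, 1200, 'SS', 0.4, 0.840):+.1f} runs")
```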

Update: I recently discovered that the Replacement Level Yankee Blog has posted 1987-2007 ZR translations using Dial's methodology. 2002-2007 data use actual chances, which is pretty exciting. Nice resource.

Revised Zone Rating (RZR)

John Dewan, who was a founder of STATS Inc. and was the one who created Zone Rating in the first place, has since left that company and founded Baseball Info Solutions (BIS), which is now one of STATS Inc.'s competitors. Naturally, they created their own version of ZR! While there are a few minor differences in how the numbers are tallied, the most important thing about RZR is that it is available from The Hardball Times...and they report not only the RZR rate statistic, but also the "raw" BIZ, PLAYS, and OOZ data! This allows us to construct our own, more accurate +/- runs statistic using a simple process that I outlined here.

Basically, you get league averages of PLAYS per BIZ at each position, and then apply that average rate to each player's BIZ to get an expected number of PLAYS made. Then, you simply subtract this expected value from a player's actual number of PLAYS. Out of Zone plays are handled the same way. So, the equation is:

[PLAYS - BIZ*(lgPLAYS/lgBIZ)] + [OOZ - BIZ*(lgOOZ/lgBIZ)] = +/- Plays

Once you have your +/- Plays number, you can apply Dial's runs/play figures (see above) to convert it to a +/- runs statistic.
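A minimal sketch of the RZR equation, using BIZ as the base for both the in-zone and out-of-zone terms as described. The player and league totals in the example are hypothetical, not real 2006 figures.

```python
SS_RUNS_PER_PLAY = 0.753  # Dial's shortstop value from the list above

def rzr_plus_minus_plays(plays, biz, ooz, lg_plays, lg_biz, lg_ooz):
    """[PLAYS - BIZ*(lgPLAYS/lgBIZ)] + [OOZ - BIZ*(lgOOZ/lgBIZ)]."""
    in_zone = plays - biz * (lg_plays / lg_biz)
    out_of_zone = ooz - biz * (lg_ooz / lg_biz)
    return in_zone + out_of_zone

# Hypothetical shortstop against hypothetical league totals.
example_pm = rzr_plus_minus_plays(plays=332, biz=400, ooz=44,
                                  lg_plays=9840, lg_biz=12000, lg_ooz=1200)
example_runs = example_pm * SS_RUNS_PER_PLAY

if __name__ == "__main__":
    print(f"{example_pm:+.1f} plays, {example_runs:+.1f} runs")
```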

There is some disagreement about whether one should use a rate based on BIZ or innings for the out of zone plays. I prefer to use BIZ over innings because it seems as though that would more accurately reflect the ground ball/fly ball tendencies and handedness of a pitching staff. It is possible, at least on a team level, to estimate the actual number of out of zone opportunities. However, at least in this thread, the suggestion is (based on yet-to-be-published studies) that doing this doesn't make that much of a difference compared to just using BIZ. And just using BIZ is a heck of a lot easier.

BIS's Plus/Minus System

The RZR numbers are based on the same raw dataset that Dewan's company has used to create their own advanced fielding metric, which they refer to as the "plus/minus" system. In the outfield, it essentially operates in the same way that UZR does, where a player's performance is compared within each zone on the field to the average performance of others at his position, rather than just in zones that are assigned to a particular position. On the infield, rather than using zones, it instead uses vector slices to divide up the field...though in practice, it's pretty much the same thing.

I like the plus/minus system. Unfortunately, it's not freely available, so we'll apparently have to wait until another Fielding Bible is released to get that data (I tried to purchase it at one point, but they wanted $100 for one year's worth of +/- data--and that was for my private use only!!).

Vector/Nonlinear Function-Based Statistics

Probabilistic Model of Range (PMR)

David Pinto's PMR operates on a slightly different paradigm than zone-based stats. Rather than using zones, he uses vectors, somewhat like what is used by BIS's plus/minus system. Using the batted ball vectors, he can (more or less) plot a function describing a player's fielding prowess relative to the rest of the league, which can be depicted in graphical form (see right). His system uses separate rates for each position depending on ball type (fly, ground, line drive), how hard each ball was hit, and the handedness of the batter and pitcher. The difference between actual performance and league-average performance for each of these ball types at each vector provides the resulting defensive rating. He also includes park factor adjustments.

Therefore, in many ways, this system is very much analogous to UZR...but for whatever reason, it doesn't quite get the hype. The primary criticism I've seen of PMR is that it's a bit too inclusive--it includes all batted ball types for all positions. This means that performance on infield pop-ups (i.e. ball-hogging) is one of the ways infielders can improve their PMR rating. Other stats, especially zone-rating statistics, only evaluate infielders on ground balls hit at or through their positions.

Nevertheless, given the park factor and handedness adjustments that are factored into PMR, I think it's among the better systems available. Pinto has released the data to the public annually for the last three years. He generally just reports the data in terms of the difference between expected and actual outs (i.e. +/- plays), but we can convert those data into a +/- runs format for our use with Dial's runs/play data.
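The conversion itself is just a multiplication by a run value per play. In the Python sketch below, the per-position values in `RUNS_PER_PLAY` are placeholders I made up for illustration; they are NOT Dial's published figures, which you would substitute in.

```python
# PLACEHOLDER run values per play by position; NOT Dial's actual numbers.
RUNS_PER_PLAY = {"SS": 0.75, "CF": 0.85}


def plays_to_runs(plus_minus_plays, position, runs_per_play=RUNS_PER_PLAY):
    """Convert PMR-style +/- plays into +/- runs using a per-position run value."""
    return plus_minus_plays * runs_per_play[position]


print(plays_to_runs(12.0, "SS"))  # 9.0
```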

Spatial Aggregate Fielding Algorithm (SAFE)

SAFE is the result of Shane Jensen et al.'s work at the University of Pennsylvania, and has the potential to become the best fielding system available. For ground balls to infielders, it uses a vector-based approach like Pinto's PMR, but then fits a non-linear function to those data for each player and compares that function to league average to determine infielder performance on ground balls. The area between the curves is the difference between the player's fielding performance and league average.
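The "area between the curves" step can be illustrated with a short Python sketch. The logistic out-probability curves and their parameters are hypothetical stand-ins, and the area is estimated with the trapezoidal rule; the published SAFE model is considerably more involved, so treat this only as a picture of that one step.

```python
import math


def out_prob(angle, center_logit, falloff):
    """Hypothetical logistic curve: out probability vs. degrees from the fielder."""
    return 1.0 / (1.0 + math.exp(-(center_logit - falloff * abs(angle))))


def league_curve(angle):
    return out_prob(angle, 3.0, 0.30)


def player_curve(angle):
    return out_prob(angle, 3.0, 0.25)  # slower falloff = more range


def area_between(f, g, lo, hi, steps=600):
    """Trapezoidal estimate of the integral of f(x) - g(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * ((f(lo) - g(lo)) + (f(hi) - g(hi)))
    for i in range(1, steps):
        x = lo + i * h
        total += f(x) - g(x)
    return total * h


print(area_between(player_curve, league_curve, -30.0, 30.0))
```

A positive area means the player turns more balls into outs than the league curve across the range of angles; a negative area means fewer.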

The way they handle fly balls is even more exciting: they plot the landing location of each batted ball and then fit a three-dimensional function describing the frequency with which a player fields fly balls at every location on the field (see awesomeness to the right). This function can be subtracted from a league-average function at that player's position to determine a fielder's performance vs. average.

It's a very promising system, combining improved precision in ball-location information with improved sample sizes, since they don't have to break up the field into arbitrary zones. Unfortunately, perhaps because it is essentially a side project for some academics (and thus not well funded!), they have only released data through the 2005 season. Therefore, it's not something that we can use for current seasons. I'm hoping that some company out there licenses it from them and posts it on their site. Universities always enjoy it when their faculty get the entrepreneurial spirit, and we'd have some fantastic defensive data to work with. :)

The Fans' Scouting Report

Tom Tango has, for several years now, overseen a project that tries to quantify subjective impressions of player performance. The idea is that a composite depiction of fielding ability by informed fans is a useful way of getting information about fielding performance. I really like this project: it provides a dataset completely distinct from those of the more objective systems, and, if the data are good (and they typically seem to be), they can really flesh out our understanding of player skill in a way we can't get from our other fielding statistics.

Each fan scout is asked to rate position players on a scale of 1-5 in seven separate skills, ranging from reaction speed and "hands" to arm strength and accuracy. Evaluators are asked to ignore all fielding statistics, and to compare a player to all other players--not just those at his position. The resulting data are reasonably well-correlated with the "pure numbers" metrics, and yet provide some insights that we otherwise can't get about each individual player's skills.

Furthermore, Tango has converted the survey ratings into an approximate +/- runs statistic. He first produced a weighted average of the individual skills for each player at each position based on assumed importance of each of the skills to that position. Then, by comparing the average standard deviation in these averages (~14 points) to the typical standard deviation in runs saved according to UZR (~10 runs), he estimated that one ranking point in the weighted average is worth ~0.7 runs.
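The runs-per-point figure is just the ratio of the two standard deviations quoted above, which rescales the fan ratings so they spread out about as much as UZR does. A quick Python check of that arithmetic (the helper name is mine):

```python
UZR_SD_RUNS = 10.0    # typical SD of runs saved per season in UZR (from the text)
FSR_SD_POINTS = 14.0  # typical SD of the weighted fan-rating averages (from the text)

# Match the spreads: one rating point is worth SD(runs) / SD(points)
RUNS_PER_POINT = UZR_SD_RUNS / FSR_SD_POINTS


def fsr_points_to_runs(points_vs_position_avg):
    """Convert a weighted fan-rating difference (points) into approximate runs."""
    return points_vs_position_avg * RUNS_PER_POINT


print(round(RUNS_PER_POINT, 2))          # 0.71
print(round(fsr_points_to_runs(14), 1))  # 10.0
```

So a player one standard deviation (14 points) above his position's average rates as roughly 10 runs above average, matching the UZR spread by construction.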

Unfortunately, Tango hasn't released his custom positional weights AFAIK. I could sit down and try to come up with my own, but I didn't feel comfortable doing that--I only played ball for one year, so I don't consider myself knowledgeable enough to make those sorts of judgment calls. So I instead cheated and pulled out my multiple regression tools to generate equations that predict Tango's FanRuns values based on his '03 data. Here are the resulting coefficients and intercepts for each position:

Pos	Instincts	Acceleration	Speed	Hands	Release	Strength	Accuracy	Intercept
3	0.095	0.122	0.018	0.110	0.045	0.038	0.057	-20.8
4	0.213	0.195	0.067	0.118	0.116	0.060	0.044	-41.0
5	0.089	0.080	0.047	0.084	0.083	0.168	0.089	-31.5
6	0.188	0.176	0.091	0.075	0.179	0.048	0.060	-47.3
7	0.077	0.127	0.145	0.064	0.043	0.020	0.013	-27.6
8	0.077	0.195	0.159	0.099	0.037	0.044	0.021	-35.7
9	0.069	0.131	0.125	0.060	0.029	0.030	0.037	-23.9

To get a +/- runs estimate on a player from these equations, simply multiply each of the coefficients by the relevant player rating in the fan scouting report, sum the values, and then add in the intercept. You will have to adjust the intercept to match the data such that the totals within each position sum to zero, as averages in player skills and the identities of the fans doing the scouting vary from year to year.
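Here is a minimal Python sketch of those two steps using the coefficients from the table above. Two assumptions to flag: the inputs are taken to be the per-skill ratings as Tango publishes them (the size of the intercepts suggests a 0-100-style scale rather than the raw 1-5 ballots), and the "adjust the intercept" step is done by subtracting the position mean so the values sum to zero. The player names and ratings are hypothetical.

```python
# Skill order: instincts, acceleration, speed, hands, release, strength, accuracy
# Coefficients and intercepts from the table above, keyed by scorebook position.
FAN_RUN_EQUATIONS = {
    3: ((0.095, 0.122, 0.018, 0.110, 0.045, 0.038, 0.057), -20.8),
    4: ((0.213, 0.195, 0.067, 0.118, 0.116, 0.060, 0.044), -41.0),
    5: ((0.089, 0.080, 0.047, 0.084, 0.083, 0.168, 0.089), -31.5),
    6: ((0.188, 0.176, 0.091, 0.075, 0.179, 0.048, 0.060), -47.3),
    7: ((0.077, 0.127, 0.145, 0.064, 0.043, 0.020, 0.013), -27.6),
    8: ((0.077, 0.195, 0.159, 0.099, 0.037, 0.044, 0.021), -35.7),
    9: ((0.069, 0.131, 0.125, 0.060, 0.029, 0.030, 0.037), -23.9),
}


def raw_fan_runs(position, ratings):
    """Sum of coefficient * rating over the seven skills, plus the intercept."""
    coefs, intercept = FAN_RUN_EQUATIONS[position]
    return sum(c * r for c, r in zip(coefs, ratings)) + intercept


def recenter(values):
    """Shift every value by the same constant (i.e., re-fit the intercept) so they sum to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical shortstop ratings, in the skill order listed above
shortstops = {
    "Player A": (70, 65, 60, 55, 70, 60, 60),
    "Player B": (50, 50, 50, 50, 50, 50, 50),
    "Player C": (35, 40, 45, 50, 40, 45, 40),
}
raw = {name: raw_fan_runs(6, r) for name, r in shortstops.items()}
centered = dict(zip(raw, recenter(list(raw.values()))))
for name, runs in centered.items():
    print(f"{name}: {runs:+.1f}")
```

In practice you would recenter across every regular at the position in a given year, not just three players.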

Note: This approach "predicts" Tango's FanRuns in the '03 scouting report almost perfectly, and seems to produce reasonable estimates in subsequent years. Nevertheless, there are substantial correlations among most of the player skills, which can destabilize the regression coefficients. Therefore, it's entirely possible that using these equations on different datasets (e.g. different seasons) will produce estimates that differ substantially from those that Tango might produce using his own procedure. Still, looking at which coefficients are large or small at each position, these equations make sense to me: hands are important at first base, instincts and acceleration are important at second base and shortstop, arm strength is important at third base, speed is important in the outfield, etc. So I'm comfortable using them.

Update: In the comments of this article, Tom Tango directed me to a post in which he did report his custom weights by position. They are, on the whole, pretty similar in relative skill emphasis to the coefficients above. Even so, I'd recommend using his weights instead: create a weighted average for each player, subtract the mean for that player's position to get a +/- points statistic, and then multiply by 0.7 runs/point to get your runs estimate. This doesn't predict the 2003 data perfectly at all positions (I'm guessing that Tango's using slightly different weights now), but it does create fielding ratings that make sense. And, as we'll see in the next article, it produces estimates that correlate well with other fielding statistics.
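The recommended weighted-average procedure can be sketched in Python as follows. The shortstop weights in `SS_WEIGHTS` are HYPOTHETICAL placeholders, not the weights from Tango's post, and the two players are made up; only the 0.7 runs/point figure comes from the text.

```python
RUNS_PER_POINT = 0.7  # Tango's conversion, from the text

# HYPOTHETICAL shortstop weights for illustration; substitute Tango's published ones.
SS_WEIGHTS = {
    "instincts": 3, "acceleration": 3, "speed": 1, "hands": 1,
    "release": 2, "strength": 1, "accuracy": 1,
}


def weighted_score(ratings, weights):
    """Weighted average of one player's skill ratings."""
    total = sum(ratings[skill] * w for skill, w in weights.items())
    return total / sum(weights.values())


def fan_runs(players, weights):
    """(Weighted score - position mean) * runs per point, for each player."""
    scores = {name: weighted_score(r, weights) for name, r in players.items()}
    mean = sum(scores.values()) / len(scores)
    return {name: (s - mean) * RUNS_PER_POINT for name, s in scores.items()}


# Two hypothetical shortstops with flat ratings
players = {
    "A": dict.fromkeys(SS_WEIGHTS, 70),
    "B": dict.fromkeys(SS_WEIGHTS, 50),
}
print({name: round(runs, 1) for name, runs in fan_runs(players, SS_WEIGHTS).items()})
```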

Next time, we'll compare each of these fielding systems quantitatively and see how similar their estimates of fielding performance are, and we'll work out how to apply these numbers to get fielding estimates for the 2007 Reds.

Wednesday, October 24, 2007

Josh Flipping Hamilton

My wife and I like watching those house flipping shows. You know...Flip This House, Flip That House, Property Ladder, etc. We've always enjoyed home improvement shows, and I guess the tension that's associated with trying to renovate what are often terrible properties on a short timeline with a tight budget takes it to the next level.

Anyway, tonight we pulled up an episode of The Real Estate Pros on TiVo that had aired last Sunday. To my shock, this episode was almost entirely about Josh Hamilton!! Oh, and some guy named Shoeless Joe Jackson's house.

The show was recorded in 2006, starting prior to spring training and prior to Josh's reinstatement into baseball. The piece profiles Josh's efforts to get clean and stay busy while trying to get back into baseball. There was some uncertainty at the time about whether he'd ever be permitted to play ball again within the MLB-system, and the owner of the show's real estate company was trying to get Hamilton to sign with an independent team that he'd recently purchased (can you imagine?).

At the end of the show, Josh finally got approval to play in the official minor league system again, joining the Hudson Valley Renegades in late 2006. In the at-bat they showed near the end of the program, Hamilton reached on what was probably ruled a double, but was basically a routine shallow outfield pop-up that somehow fell in between two players. He scored on a subsequent single. It was one of three doubles he hit, and one of seven runs he scored, in his time with the Renegades.

Seeing this special, I realized that I'd somehow almost forgotten just how amazing Hamilton's 2007 season was. I mean, we all knew it was an incredible story when it was happening. And yet now, while I obviously still remember the excitement surrounding his first several months in MLB, I find myself almost taking the guy for granted. In fact, if you'd asked me about my thoughts on Hamilton yesterday, I'd probably cite concerns about whether he could stay healthy enough to get in a full season next year, and my hopes that the Reds could somehow move him out of center field.

And that would have been completely off the mark. Think about what happened: Josh Hamilton was acquired in January, somehow had a brilliant spring, slugged his way into the starting lineup by mid-April, and had the best rookie season by a Reds hitter since Austin Kearns. All that after almost literally coming back from the dead, and getting just 50 AB's in professional baseball since the 2002 season...in the New York-Penn League.

Here are his final 2007 numbers:

Year	Team	Age	PA	%K	%BB	LD%	BABIP	AVG	OBP	SLG	ISO	OPS	PrOPS	SBRns	RAR	R/G
2007	CIN	26	337	19%	10%	22%	0.318	0.292	0.368	0.554	0.262	0.922	0.940	-0.9	24.5	6.7

I mean...wow. There's absolutely nothing to not like there, except maybe the fact that he's already 26. Heck, PrOPS even indicates that he may have been a bit unlucky. Congrats, Josh, on a fantastic season. Can't wait to see what you do next year.

Oh, and by the way, in the house flipping show, Josh Hamilton helped renovate Shoeless Joe Jackson's home into a new museum. Kinda neat.

Tuesday, October 23, 2007

Player Value, Part 2c: Offense - Positional Adjustments

To view the complete player value series, click on the player value label on any of these posts.

Positional Adjustments

If you go back and compare the RAR numbers in my baselines post to BPro's VORP, one of the biggest differences that you might note is that players playing catcher, shortstop, and second base tend to get higher ratings in VORP than in my numbers. The opposite is true for first basemen, left fielders and right fielders. The reason? VORP includes an adjustment based on typical offensive performance at each position, and my data did not. ... but should they have? Let's take a look.

In the table below, I've grouped NL hitters from '03 to '07 by their primary position and sorted them by their rate of offensive production (runs per game). Here's how those totals break down:

POS	PA	R/G	%LgAvg	OBP	SLG	OPS
C	43228	4.1	82%	0.319	0.393	0.712
SS	46332	4.4	87%	0.326	0.401	0.727
2B	49834	4.7	94%	0.338	0.414	0.751
CF	45720	4.9	96%	0.335	0.424	0.759
3B	47791	5.2	104%	0.344	0.449	0.793
RF	46754	5.4	107%	0.348	0.454	0.801
LF	48358	5.6	111%	0.353	0.465	0.818
1B	49581	6.0	119%	0.364	0.483	0.847

The rankings here probably don't come as a big surprise to many of you--after all, it's often the case that teams will employ catchers and shortstops that aren't very good hitters (e.g. David Ross and Alex Gonzalez with the Reds). First basemen and left fielders, on the other hand, are often some of the best hitters on any given team. But why is it that the positions vary in their hitting so much? I can think of two possible reasons:

A poor-hitting position may be a more "difficult" position to play, and thus the pool of players who can competently play that position may be smaller than at a position that is easier to play. Smaller pool of players = less offensive depth.
The players playing a weaker-hitting position may not be as talented (defined as total offensive + defensive skill) as those at other positions.

The first is almost certainly true. We expect shortstops, for example, to have tremendous range, great hands, and a cannon for an arm, because those attributes are what is needed to get lots of outs at that position. The same expectations are not present for first basemen, and therefore there's a larger pool of players that can play first base...and thus, there's a higher standard for offense at that position than at shortstop.

Therefore, analysts often make positional adjustments to offensive performances based on these observed differences in hitting across positions. The logic is that if teams play players in appropriate positions relative to their defensive skills--and, in general, they probably do--we can give a boost to poor hitting positions and a penalty to plus hitting positions such that the average hitter at each position is given equal value. That way, each player is compared to his peers, putting everyone on an equal playing field. Presumably, in terms of total player value (offense + defense), if a player moved from first base to shortstop, any gains he would make in terms of being a better-than-his-peers hitter would be negated by the cost he would incur to his team via substandard defense.

That's essentially the basis for positional adjustments in VORP. However, there's a major flaw in that approach. And it relates to the second explanation for variation in offense across positions, that of variation in talent level across positions.
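To see what an offense-based positional baseline does in practice, here is a simplified Python sketch that rates a hitter against the average hitter at his position, using the NL '03-'07 R/G figures from the table above. This is only an illustration of the idea; it is not BPro's actual VORP formula.

```python
# Average R/G by primary position, NL 2003-07 (from the table above)
POSITION_RG = {
    "C": 4.1, "SS": 4.4, "2B": 4.7, "CF": 4.9,
    "3B": 5.2, "RF": 5.4, "LF": 5.6, "1B": 6.0,
}


def pct_of_position(player_rg, position):
    """Offense relative to the average hitter at the player's own position."""
    return player_rg / POSITION_RG[position]


# The same 5.0 R/G bat, judged against two different positional baselines
for pos in ("SS", "1B"):
    print(pos, f"{pct_of_position(5.0, pos):.0%}")
```

The same 5.0 R/G hitter comes out at about 114% of the shortstop average but only about 83% of the first-base average, which is exactly the kind of boost and penalty described above.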

Why positional adjustments based on offensive disparities are not the best approach

Let's come at this from a fresh angle: players aren't restricted to one position, but can theoretically "play" anywhere on the baseball diamond. The problem is that some are better defenders than others, and therefore teams tend to put their best defenders at the positions that are the hardest to play (both in terms of the physical demands of the position, and the level of average defensive performance at that position), such as shortstop. But how much harder is it to play shortstop, for example, than first base?

The best study that I'm aware of that has tried to quantitatively answer this question is one by Tom Tango. He used multi-season UZR data to compare how players performed when they played multiple positions. Presumably, a player's absolute defensive skill is a constant, but he'll look better or worse at a position given how he stacks up to his competition at each position (especially once you adjust for experience at a position). By comparing how player defense varied across different positions, in virtually every combination and direction you can imagine, Tango constructed the following defensive "spectrum" (it should still be considered a work in progress):

+5 CF
+4 SS
+0 2B
-1 3B
-4 LF
-4 RF
-8 1B

The units are the typical differences he found in defensive runs saved per season that you should expect when a player moves from one position to another. So, if you move an average fielding first baseman to shortstop (assuming you can do this, i.e. he's not left-handed), you should expect that player to play roughly 12 runs below average per season once he learns the position (players will vary, of course, in how they do based on their specific attributes--speed, arm, hands, etc--these are just the mean differences).
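The first-base-to-shortstop example works out like this in a short Python sketch: a fielder's position-neutral skill is his runs vs. average plus his old position's spectrum value, and his expected rating at the new position is that skill minus the new position's spectrum value. The helper name is mine; the spectrum values are Tango's from the list above.

```python
# Tango's UZR-based defensive spectrum (runs per season), from the list above
SPECTRUM = {"CF": 5, "SS": 4, "2B": 0, "3B": -1, "LF": -4, "RF": -4, "1B": -8}


def expected_runs_after_move(runs_vs_avg, from_pos, to_pos):
    """Expected runs vs. the new position's average after the player learns it."""
    return runs_vs_avg + SPECTRUM[from_pos] - SPECTRUM[to_pos]


print(expected_runs_after_move(0, "1B", "SS"))  # -12
print(expected_runs_after_move(0, "2B", "CF"))  # -5
```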

Here's another look at this question, based on data that can be gathered from the Fans Scouting Report, which asks fans to rate players in a variety of categories based on their subjective impressions of a player's skills (participants are asked to ignore player position, as well as any defensive stats...hence the "scouting" report). Here are the average total scores for players at each position, all of which are scored in the same categories:

CF 60
SS 59
2B 56
3B 54
RF 50
LF 43
1B 41

The average player across all of these positions got a score of ~52. Therefore, we see that players who play center field or shortstop are generally given substantially above-average ratings on their skills (speed, first step, hands, arm accuracy, arm strength, etc), whereas players in left field or first base are rated as below-average defenders. While the actual numerical differences aren't the same (it is possible to convert differences in these ratings to an approximate runs saved statistic), the overall positional rankings are almost identical to those that Tango generated using UZR data. The only difference is RF'ers being ranked as better than LF'ers. Pretty compelling when two such vastly different datasets come to virtually the same answer!
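A quick Python check of the comparison: the mean of the seven position scores, and a count of position pairs that the two spectra rank in strictly opposite order. The helper names are mine; the numbers come from the two lists above.

```python
from itertools import combinations

FANS = {"CF": 60, "SS": 59, "2B": 56, "3B": 54, "RF": 50, "LF": 43, "1B": 41}
TANGO = {"CF": 5, "SS": 4, "2B": 0, "3B": -1, "LF": -4, "RF": -4, "1B": -8}

mean_score = sum(FANS.values()) / len(FANS)
print(round(mean_score, 1))  # 51.9


def discordant_pairs(a, b):
    """Position pairs ranked one way by a and strictly the other way by b."""
    return [(p, q) for p, q in combinations(a, 2) if (a[p] - a[q]) * (b[p] - b[q]) < 0]


def broken_ties(a, b):
    """Pairs tied in b that a separates."""
    return [(p, q) for p, q in combinations(a, 2) if b[p] == b[q] and a[p] != a[q]]


print(discordant_pairs(FANS, TANGO))  # []
print(broken_ties(FANS, TANGO))       # [('RF', 'LF')]
```

No pair of positions is ranked in opposite order; the only difference is that the fans separate the corner outfield spots, which Tango's UZR spectrum has tied.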

Now, looking at these two defensive spectra, they look pretty similar to the offensive rankings, right? Shortstops are ranked near the top, while corner outfielders and first basemen are down at the bottom. That certainly supports our first explanation--some positions are harder to play defensively, and therefore the pool of players that have sufficient defensive skills to play those positions competently is smaller, meaning there's less offensive depth at those positions.

On the other hand, there are some notable differences between the offensive and defensive positional rankings. Center fielders, for example, were rated as approximately average hitters (96% of league average), and yet were rated as the single hardest position in baseball to play in both datasets. So here we have a position that features players that are average hitters as well as above-average fielders. This means that center fielders are, overall, an above-average position in terms of total player talent! On the other side of the coin are second basemen. They are rated as an average position, defensively, and yet also feature below-average hitting. This means that second basemen, overall, are a position with below-average talent (again, talent = combined offensive + defensive skill).

Now, think about what this means if we use adjustments based on offensive disparities among positions. If a player is at second base, and then is moved to center field, he is moving to a position that is more difficult to play defensively. And yet, because center fielders, on average, tend to hit better than second basemen, he is going to be judged against a higher offensive standard than he was at second base. Therefore, if we use positional adjustments based on offensive disparities, this player is going to suffer a hit to both his offensive and defensive ratings just because he moved to a more talented position!

To put it another way, positional adjustments based on offensive disparities assume a negative correlation between fielding skill and offensive performance. The fact that we have positions where the average player at that position is both a superior defender and an average hitter means that this assumption cannot be correct.

Therefore, if the point of the positional adjustments is to put all players on an even playing field with respect to our value estimates, it seems to me that adjustments based strictly on offensive variation across positions are inadequate. They will underrate players playing particularly talented positions, and they will overrate players playing talent-poor positions.

To be clear, there is absolutely a need to apply positional adjustments to player value ratings, because the average fielding skill (and thus value) varies across positions: an average-fielding first baseman is far less of a defensive asset than an average-fielding shortstop, and we need to recognize that if we're going to value our shortstops and first basemen appropriately. It's just that using differences in offensive performance is a flawed way to go about this, because overall player talent levels are not constant across positions. Instead, we should be using adjustments that account for differences in the actual fielding value of average defense at different positions, like Tango's UZR spectrum above.

So all that said, here's "my" solution on how to assess player value (it's most certainly not my idea, just what I've come to agree is the best way to go about things):

1. Estimate player offensive value. This can be done relative to overall league averages or replacement level. It should be done without regard to position: in this step, we're strictly interested in offense, so it doesn't matter where the player plays!
2. Estimate fielding value via two steps:
   a. First, calculate player fielding value relative to overall league average at a player's position (as we saw in my replacement level study, this works for both league-average and replacement-level baselines).
   b. Assign a prorated (by inning) positional adjustment to each player that accounts for the differences in fielding value across different positions. Using Tango's data, a first baseman would get -8 runs per season relative to the rest of the league, whereas a center fielder would be rated at +5 runs per season, relative to the rest of the league.
3. Sum all these values together to get a composite estimate of player value.
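The three steps can be sketched in Python as follows. The full-season innings figure used for prorating (1,450) is my own assumption, roughly 162 games of 9 innings, and the player line is hypothetical; the offense number can be measured against either baseline from step 1.

```python
# Tango's spectrum (runs per full season), from earlier in this post
SPECTRUM = {"CF": 5, "SS": 4, "2B": 0, "3B": -1, "LF": -4, "RF": -4, "1B": -8}
FULL_SEASON_INNINGS = 1450.0  # ASSUMPTION: approximate full season of defensive innings


def positional_adjustment(innings_by_pos):
    """Step 2b: spectrum value prorated by innings at each position."""
    return sum(SPECTRUM[pos] * inn / FULL_SEASON_INNINGS for pos, inn in innings_by_pos.items())


def player_value(offense_runs, fielding_runs_vs_pos, innings_by_pos):
    """Step 3: offense (step 1) + fielding vs. position average (2a) + positional adj. (2b)."""
    return offense_runs + fielding_runs_vs_pos + positional_adjustment(innings_by_pos)


# Hypothetical player: +20 runs on offense, -3 runs vs. average fielders at his positions,
# splitting time 1160 innings in CF and 290 innings in RF
print(round(player_value(20.0, -3.0, {"CF": 1160.0, "RF": 290.0}), 1))  # 20.2
```

Prorating by innings means a player who splits time gets a blend of the two positions' values (here 0.8 * 5 + 0.2 * -4 = +3.2 runs).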

We know how to do #1 already. The next article(s) in this series will discuss how to do both parts of #2 in detail.

Update: After some additional data analysis and discussion, Tango and others seem to have moved to this spectrum, which is slightly (emphasis on slightly) different from that which I've been using. Numbers are runs per season:
+10 C
+6 SS
+4 CF
+1 2B/3B
-6 LF/RF
-9 1B
-15 DH

This is what I'm going to use moving forward. But it's close enough that I'm not going to revise anything I've done to date. Sometimes you just gotta move forward! :)

And here's more on why the assumption that offense + defense is equal across positions appears to be flawed.

Update #2: As with any field, this research is constantly evolving. Based on several rounds of further discussion and analysis, the current positional adjustments endorsed over at TheBookBlog are:

+1.25 C
+0.75 SS
+0.25 2B/3B/CF
-0.75 LF/RF
-1.25 1B
-1.75 DH

Multiply these by 10 to convert wins to runs and you get the adjustments I'm currently using. I probably should really be multiplying by 9.5 or so (4.75 runs per team per game * 2 teams = 9.5 runs per win), but it makes almost zero difference and the round numbers are easier to remember. :)
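Here is a short Python comparison of the two conversions, using the wins-based adjustments listed above and the runs-per-win arithmetic from the text (twice the per-team scoring rate). The function names are mine.

```python
# Positional adjustments in wins per season, from the list above
ADJ_WINS = {
    "C": 1.25, "SS": 0.75, "2B": 0.25, "3B": 0.25, "CF": 0.25,
    "LF": -0.75, "RF": -0.75, "1B": -1.25, "DH": -1.75,
}


def runs_per_win(runs_per_game_per_team):
    """Rule of thumb from the text: about twice the per-team scoring rate."""
    return 2.0 * runs_per_game_per_team


def adjustments_in_runs(rpw):
    return {pos: wins * rpw for pos, wins in ADJ_WINS.items()}


rounded = adjustments_in_runs(10.0)
exact = adjustments_in_runs(runs_per_win(4.75))
for pos in ADJ_WINS:
    print(f"{pos}: {rounded[pos]:+.1f} vs {exact[pos]:+.2f}")
```

The largest gap is at DH (-17.5 vs. about -16.6 runs), so even the biggest adjustment shifts by less than one run per season.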
