Saturday, October 06, 2007

Data for the masses, by the masses

With the regular season over, a lot of new information is being made available by enterprising amateurs. Here are a few resources that are especially notable:

2007 BIS + STATS ZR Ratings

Rally has posted a spreadsheet containing both 2007 THT fielding translations (very similar to what I recently posted) and translations that follow the methodology described previously by Chris Dial. He then takes the average of the two fielding systems, which should be treated as a more reliable estimate of fielding performance than either estimate alone.

I will note that STATS Inc's version of ZR severely down-rates the value of out-of-zone (OOZ) plays, which, at least based on my comparisons to PMR, is probably not the best way to do things. This minimization of OOZ plays is why the STATS Inc data tend to yield less extreme estimates than THT's do--fewer plays are considered overall. Nevertheless, because the STATS Inc and BIS raw data have significant inconsistencies, I think the overall estimates that Rally calculated are probably the most reliable fielding estimates we have for the '07 season thus far.
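For anyone who wants to try the same kind of averaging on their own numbers, here is a minimal sketch in Python. It assumes a simple equal-weight average of two runs-saved estimates per player; Rally's actual spreadsheet may handle things differently. The function name, player names, and values are all made up for illustration and are not real 2007 ratings.

```python
def average_ratings(system_a, system_b):
    """Equal-weight average of two runs-saved estimates, per player.

    Only players rated by BOTH systems are returned; this is an
    assumption of the sketch, not necessarily what Rally does.
    """
    shared = system_a.keys() & system_b.keys()
    return {p: (system_a[p] + system_b[p]) / 2 for p in sorted(shared)}


# Hypothetical runs-saved values, for illustration only.
bis_based = {"Player A": 12.0, "Player B": -6.0, "Player C": 3.0}
stats_based = {"Player A": 6.0, "Player B": -2.0}

combined = average_ratings(bis_based, stats_based)
print(combined)
```

Player C drops out here because only one system rated him. Whether to drop such players or fall back to the single available estimate is a judgment call.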

PITCHf/x for all MLB Players

I very recently ran across Josh Kalk's blog, where he has posted PITCHf/x pitcher and batter cards for just about every player in major league baseball. It's an incredible resource--I'm predicting we're going to start seeing this kind of thing on larger sites like FanGraphs and Baseball-Reference, and probably even sites like ESPN or MLB, within the next 2-3 years. For now, though, Kalk's blog does it all.

Features include graphs similar to those that John Walsh produced, as well as a bunch of additional figures and information. He also uses an automated procedure to identify pitch types (it's mostly successful!), and then breaks things down in a variety of ways, both graphically and numerically. The batter cards, which look at outcomes of a variety of pitches based on location, are new to me but are really fascinating...just not sure what to make of them yet.

If nothing else, the pitching cards allow us to get immediate ideas of pitch types and velocity--I've used them for this purpose in my Cubs profile. But I have a feeling they'll provide a lot more information as we get more used to the range and variance of the data. It's really exciting work, and it's an enormous amount of data, so it's going to take me a long time to parse through it. But I thought I'd link to it so that you folks out there can start analyzing it in the meantime. :)

US Patriot's Stats

U.S. Patriot releases annual reports of his own home-brew of sophisticated statistics, though somehow I've overlooked them until now. They include tons of interesting stats, including his own version of a runs above replacement stat for both position players and pitchers. I may start using some of his approaches around this blog moving forward, as I've become increasingly disenchanted with VORP for a variety of reasons. At least with his calculations, I know what's happening under the hood...and there's reason to believe that his estimates may actually be "better."

Also, it's worth noting that his website is an absolute treasure trove for wet-behind-the-ears analysts like myself. Not only does it feature the above statistics for seasons going back to 2003, as well as his multi-year regressed park factors (which I've been using for years), but it also features some approachable and very informative essays on a variety of advanced sabermetric topics. I've been reading his essays on run estimators and baselines this week on my bus rides, and they've been incredibly useful in helping me think about these issues. I don't agree with him on a few of the smaller points (I don't get concerned, for example, about whether stats work well under extreme conditions that don't fall within the bounds of what's actually happening in MLB--shows my background as an empirical, rather than theoretical, scientist), but he does a great job of laying out the relevant issues and leaving readers room to make their own decisions while still providing his own informed take.

Massive thanks to US Patriot for taking the time to create this resource for all of us--I just can't believe I waited this long to delve into it.