
Friday, March 30, 2007

How should we calculate Zone Rating? (Part II)

In a previous post, I looked at a variety of ways to calculate zone rating--a fielding statistic--using the new dataset that is freely available at The Hardball Times. Tonight, I'd like to report some of the continued work I've done on this subject.

Introduction & Goals (you can skim this)

The first fielding statistic that I really felt at home with was devised by Baseball Info Solutions and published in The Fielding Bible last year. It had just about everything I'd want in a stat -- the methodologies behind it are straightforward (even if I can't replicate them myself), and it's easy to interpret, thanks to the fact that it's reported in terms of +/- plays vs. average (i.e. a +2 rating means that a player made two more plays than an average fielder would, given his chances).

Unfortunately, BIS decided not to release these statistics to the public this year. I inquired about buying them electronically, but they wanted at least $100 for my personal use--and probably would not have been pleased if I'd posted them here. Pretty steep price given that these stats were just a part of a $20 book last year. :)

Thankfully, the folks at The Hardball Times purchased detailed Zone Rating statistics from BIS. While they're not the same as the stats that were the highlight of The Fielding Bible, they're still very good: essentially they assign each fielder a zone (or rather, a set of zones) on the field and assess how many balls hit into that zone the fielder converts into outs. This has advantages over more traditional fielding stats like fielding percentage because it incorporates fielder range into the estimate of fielder quality in addition to his sure-handedness and ability to throw accurately. And it's better than range factor because it accounts for the number of balls a player had the opportunity to field, rather than just assuming that all players get the same number of chances at a given position.

The THT stats, along with David Pinto's PMR stats, give us, the public, two high-quality fielding statistics that are available for free! The only problem is that it's not entirely straightforward how to use them. As I wrote about in my last piece, the ZR ratio calculated on THT.com doesn't include balls hit out of the player's zone, which seems like something we'd like to account for to understand good fielding (see Bill Hall at shortstop last year, for example). But how should we account for these out of zone balls? Furthermore, it has a similar problem to range factor in that each position has a different standard for what a "good" ZR is.

I'm excited to have a chance to use ZR stats this year as I evaluate players. But in order to use them effectively, I wanted to:
  1. incorporate, in an appropriate way, out of zone plays as well as plays made within a player's zone in the Zone Rating estimate.
  2. report the statistics in an easy-to-use +/- format.
Methods

Converting ZR to a +/- system.

This is a pretty easy thing to do because THT reports not only the classic ZR ratio (plays made / balls in zone) statistic, but also all the stats that are used to calculate it: balls hit into a player's zone (BIZ), plays made on balls in the player's zone (PLAYS), and plays made on balls out of zone (OOZ). This gives us the tools we need to convert ZR into a +/- statistic.

First, we'll calculate the average proportion of PLAYS per BIZ at each position by simply summing up the total number of PLAYS and total number of BIZ across all players in major league baseball at each position (easily attainable from THT's stats output) and taking the ratio. Here are the expected (average) ratios for each position based on 2006 data (I haven't confirmed that these numbers are stable across years...I'd guess that they don't vary too much though, except when you have a high-leverage player like Albert Pujols or Adam Everett throwing off the mean...which we do):

Position | Expected PLAYS/BIZ
1B | 0.799
2B | 0.820
3B | 0.706
SS | 0.818
LF | 0.608
CF | 0.811
RF | 0.638
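The position averages come from summing totals across all players rather than averaging each player's ratio, so a player with more chances counts for more. A minimal Python sketch of that step, assuming the THT rows have already been loaded as dicts with hypothetical keys `pos`, `biz`, `plays`, and `ooz` (THT's actual column names may differ). It returns both the PLAYS/BIZ ratio and the OOZ/BIZ ratio used later in this post:

```python
from collections import defaultdict


def expected_ratios(rows):
    """Return {position: (ExpPRATIO, ExpORATIO)} from per-player rows.

    Each row is a dict with hypothetical keys 'pos', 'biz', 'plays', 'ooz'.
    Ratios are computed from position-wide totals, not by averaging
    individual players' ratios.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # pos -> [BIZ, PLAYS, OOZ]
    for row in rows:
        t = totals[row["pos"]]
        t[0] += row["biz"]
        t[1] += row["plays"]
        t[2] += row["ooz"]
    return {
        pos: (plays / biz, ooz / biz)
        for pos, (biz, plays, ooz) in totals.items()
        if biz > 0
    }
```

Because everything is summed, a player listed on two rows (e.g. after a mid-season trade) contributes the same as if his rows had been combined first.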

Once we've done this, we can easily convert a player's PLAYS data into a +/- Plays statistic using this equation:

+/- Plays = PLAYS(actual) - [BIZ(actual) * ExpPRATIO]

where PLAYS(actual) is the actual number of plays a player made, BIZ(actual) is the actual number of balls hit into a player's zone, and ExpPRATIO is the expected ratio of PLAYS per BIZ reported in the table above. Really easy stuff, and something anyone can do quickly once you have the expected ratios.
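The same formula in Python, using the 2006 ratios from the table above. The shortstop numbers in the usage line are made up for illustration:

```python
# Expected PLAYS/BIZ by position (2006 values from the table above).
EXP_PRATIO = {
    "1B": 0.799, "2B": 0.820, "3B": 0.706, "SS": 0.818,
    "LF": 0.608, "CF": 0.811, "RF": 0.638,
}


def plus_minus_plays(plays, biz, pos):
    """+/- Plays = PLAYS(actual) - [BIZ(actual) * ExpPRATIO]."""
    return plays - biz * EXP_PRATIO[pos]


# Hypothetical shortstop: 260 plays on 300 balls in zone.
print(round(plus_minus_plays(260, 300, "SS"), 1))  # 14.6
```

An average shortstop would have made about 245 plays on those 300 chances, so this hypothetical player comes out roughly 15 plays above average.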

Update: Tangotiger extended this work by reporting PLAYS/BIZ values from 2004-2006 on his blog. If you're going to use this approach on years other than 2006, you should probably use those data rather than what is in the table above.

We can do the same sort of thing with OOZ to estimate the average number of OOZ plays made per BIZ (this assumes that the number of balls hit into the player's zone will be tightly correlated with the number of balls hit just outside his zone). Here are those ratios:

Position | Expected OOZ/BIZ
1B | 0.415
2B | 0.096
3B | 0.150
SS | 0.126
LF | 0.030
CF | 0.162
RF | 0.031
(you almost have to wonder if they should add another zone of responsibility to 1B's)

Update: Tangotiger's post on his blog also provides the '04-'06 data, which would be better to use if you're doing these stats on years other than 2006. Note that he reports the expected OOZ (see below) rate as OOZ/(PLAYS + OOZ), but he reports the total BIZ and OOZ data over that time period that you'd need to calculate expected OOZ in a fashion like I did here.

We can then get a +/- figure for a player's OOZ plays as:

+/- OOZ = OOZ(actual) - [BIZ(actual) * ExpORATIO]

where OOZ(actual) is the actual number of out of zone plays a player made, BIZ(actual) is as above, and ExpORATIO is the expected ratio of OOZ per BIZ reported above.
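And the OOZ version, again as a sketch using the 2006 ratios from the OOZ table; the shortstop in the usage line is hypothetical:

```python
# Expected OOZ/BIZ by position (2006 values from the table above).
EXP_ORATIO = {
    "1B": 0.415, "2B": 0.096, "3B": 0.150, "SS": 0.126,
    "LF": 0.030, "CF": 0.162, "RF": 0.031,
}


def plus_minus_ooz(ooz, biz, pos):
    """+/- OOZ = OOZ(actual) - [BIZ(actual) * ExpORATIO]."""
    return ooz - biz * EXP_ORATIO[pos]


# Hypothetical shortstop: 45 out of zone plays, 300 balls in zone.
print(round(plus_minus_ooz(45, 300, "SS"), 1))  # 7.2
```

Note that expected OOZ is scaled by BIZ, not by some count of out of zone chances, which is exactly the assumption stated above.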

How should we combine +/-PLAYS and +/-OOZ?

I took three approaches to finalizing this +/- statistic:
  • Ignore OOZ plays entirely and use only +/-PLAYS.
  • Simply add +/-PLAYS and +/-OOZ together.
  • Estimate a coefficient for OOZ via a regression on Pinto's PMR statistic (this was done at the suggestion of Dave in my prior post--thanks Dave!).
The first two are really straightforward. The last one is slightly more complicated. Basically, here's what I did:
  1. Converted PMR to a +/- outs statistic by subtracting expected outs from actual outs for each player (all of the data needed to do this are provided by Pinto in his reporting).
  2. Consolidated players at each position to match those reported by Pinto in his 2006 reports over the winter (his cutoff was that the player was present in the field for 1000 balls in play). I also had to calculate season totals for a number of players who switched teams during the year, since THT reports them on separate rows while PMR reports them as one row.
  3. Ran a multiple regression, setting +/-PMR as the dependent variable and setting +/-PLAYS and +/-OOZ as predictors.
  4. As long as the effects of both PLAYS and OOZ were significant, I recorded the regression coefficients. Then, to make calculations simple and straightforward (i.e. usable by others), I divided both coefficients by the coefficient for PLAYS, so that PLAYS would have a new coefficient of 1.0, while the OOZ coefficient would differ from 1.0 depending on its weight in the regression equation.
  5. Used this OOZ coefficient to calculate a combined +/- statistic (I'll call it +/-ZR_ADJ for now) that incorporates both PLAYS and OOZ:
    • +/-ZR_ADJ = +/-PLAYS + WEIGHT * +/-OOZ
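Steps 3 through 5 can be sketched in plain Python. This fits ordinary least squares with an intercept and two predictors by solving the 2x2 normal equations on mean-centered data, then rescales so PLAYS has a coefficient of 1.0. It is only a sketch under stated assumptions: it returns coefficients but does not do the significance testing from step 4, and the three input lists are assumed to be already matched player-by-player as in step 2. The function names are hypothetical.

```python
def ooz_weight(pm_plays, pm_ooz, pm_pmr):
    """Fit pm_pmr = a + b1*pm_plays + b2*pm_ooz by least squares.

    Returns (b1, b2, weight), where weight = b2 / b1 is the OOZ
    coefficient rescaled so that PLAYS has a coefficient of 1.0.
    """
    n = len(pm_pmr)
    m1, m2, my = sum(pm_plays) / n, sum(pm_ooz) / n, sum(pm_pmr) / n
    # Centering the data removes the intercept from the normal equations.
    x1 = [v - m1 for v in pm_plays]
    x2 = [v - m2 for v in pm_ooz]
    y = [v - my for v in pm_pmr]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("+/-PLAYS and +/-OOZ are perfectly collinear")
    # Cramer's rule on [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y].
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2, b2 / b1


def zr_adj(pm_plays, pm_ooz, weight):
    """+/-ZR_ADJ = +/-PLAYS + WEIGHT * +/-OOZ."""
    return pm_plays + weight * pm_ooz
```

For real significance tests, a statistics package that reports standard errors would be the better tool; this only reproduces the point estimates.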
Finally, once I had all three +/- estimates based on the THT data, I compared these estimates to Pinto's +/-PMR data to see which estimates followed his work the closest. The assumption here is that, since PMR and ZR are supposed to measure the same thing, the +/-ZR estimate that is the closest fit to PMR is the best one to use.

Results

The results vary by position. I'll divide them into three groups: middle infielders, corner infielders, and outfielders.

In the graphs that follow for each group (sorry about the small size), I report the closeness of fit between Pinto's +/- PMR data and the three +/- ZR-based fielding stats as "R2" (should be R-squared), which is defined as the proportion of variation in PMR data explained by the ZR data (ranges from 0 to 1, with 1.0 being a perfect match). I list the three ZR stats as ZR_THT (+/-PLAYS only, no OOZ), ZR_DIAL (simple summing of +/-PLAYS and +/-OOZ, named after Chris Dial as in my previous post), and ZR_ADJ (weight coefficient modifying +/- OOZ, added to +/-PLAYS).
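For one predictor plus an intercept, that proportion is just the square of the Pearson correlation, so comparing each ZR version against PMR can be sketched as below. The post doesn't say what software produced the graphs; the function and variable names here are illustrative.

```python
def r_squared(x, y):
    """Proportion of variance in y explained by a straight-line fit on x.

    With one predictor and an intercept, this equals the squared
    Pearson correlation between x and y.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Usage (hypothetical lists for one position, matched player-by-player):
# fits = {name: r_squared(est, pmr)
#         for name, est in [("ZR_THT", zr_tht), ("ZR_DIAL", zr_dial),
#                           ("ZR_ADJ", zr_adj_vals)]}
```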

Middle Infielders

As you can see, while there was a sizable improvement (double the R2) in the fit between ZR and PMR when you include the OOZ data, there was very little difference between simply summing PLAYS and OOZ (ZR_DIAL) and using the regression-based weighting coefficient to modify OOZ before adding it to PLAYS (ZR_ADJ). Among middle infielders, these coefficients are fairly close to one (1.10 for 2B's and 1.28 for shortstops), so it's not surprising to see such a close correspondence. There's very little here to argue against just summing PLAYS and OOZ.

Corner Infielders

Oh, weirdness. First basemen are actually fairly similar to middle infielders, in that adding OOZ data makes a huge improvement in the R2. This is probably tied to the fact that first basemen typically make a huge number of plays outside of their zone, as currently defined (see OOZ/BIZ table above). Not much improvement when I used the weighting coefficient to modify OOZ in ZR_ADJ.

Third basemen, however, are very different. First, there's a remarkably good correspondence between the ZR data and the PMR data. No idea why it'd be so much better for third basemen than for other positions. Second, you get a substantial improvement in fit when you add weighted OOZ to the PLAYS data (ZR_ADJ), but not when you just add unweighted OOZ to PLAYS. The reason is that the coefficient modifying OOZ at 3B (0.42) is the farthest from one of any infield position. Therefore, it seems that among third basemen, OOZ plays should count for a bit less than half as much as in-zone plays when calculating our ZR estimates.

By the way, that ridiculous point in the upper-right for all the 1B graphs? That's Albert Pujols. He made 39 more outs than expected of an average first baseman last year according to both PMR and ZR. Given his offensive production, that's just absurd. It's unfair to everyone else. He's definitely my pick for MVP last season.

Outfield
Here's where things get hard. As you can see, I didn't report ZR_ADJ for outfielders. The reason was that the regression equations never showed a significant effect of OOZ in any of the outfield positions. This makes anything you do with the OOZ coefficient pretty unreliable, so I didn't use them.

Explanatory power of the ZR stats is good in left field, and improves slightly when OOZ data are included. Even though OOZ didn't contribute a significant addition to the regression model, I think it's worth including them in the ZR estimate--they do improve the fit (slightly), and conceptually, I can't help but think that we should take those sorts of plays into account.

In center, the only significant relationship was between ZR_DIAL and PMR, and even then it only explains 10% of the variation. I'm not sure why center field is so hard to understand, though I imagine part of it must be due to the huge range of locations on the field where a center fielder might position himself throughout a game compared to other positions. It also may have to do with the variety of types of balls that are hit to center field. ZR does a poor job of dealing with different types of fly balls, whereas PMR seems to do a better job of at least accounting for this in its calculations. Looking at the ratings of center fielders, it's hard for me to say which stat is working better...though ZR's negative rating on Beltran does raise my eyebrows:
Rank | ZR | PMR
1st | Willy Taveras (+34, rated +10 by PMR) | Carlos Beltran (+18, rated -5 by ZR)
2nd | Curtis Granderson (+34, rated +4 by PMR) | Corey Patterson (+16)
3rd | Corey Patterson (+22) | Joey Gathright (+15, rated +1 by ZR)
4th | Juan Pierre/Reggie Abercrombie (+15, both rated +4 by PMR) | Coco Crisp (+13, rated -15 by ZR)
5th | Brian Anderson (+14, rated +4 by PMR) | Aaron Rowand/Johnny Damon (+12, rated -3 and +1 by ZR, respectively)

Right field looks similarly bad at first, but many of the issues there are driven by a single outlier who is rated highly by ZR and poorly by PMR: Brian Giles (rated +10 by ZR_DIAL, but -21 by PMR). If you remove him from the dataset, the R2 increases from 0.29 for ZR_THT to 0.34 for ZR_DIAL. OOZ is still not significant in the multiple regression. But overall, as long as you ignore whatever the heck is happening with Giles, right field is very similar to left field, just with a somewhat weaker fit.

Discussion

Recommendations for ZR calculation

First, I personally find the +/- conversion to be a huge improvement from the traditional way the ZR data are reported, which is the ratio between plays made and balls hit into the zone. A +/- conversion allows anyone to immediately and intuitively assess a fielder's abilities. I highly encourage THT to consider adding two columns to their stats output that automatically report +/- PLAYS and +/- OOZ.

Second, in terms of how to incorporate the OOZ data into the ZR fielding estimate, the data seem to indicate that simply summing the two is perfectly adequate--and advantageous compared to not using OOZ data at all--for all positions except 3B. There, it's best to multiply +/- OOZ by 0.4 to down-weight its effects. I don't have any good ideas why OOZ plays should receive so little weight in this case--maybe it has to do with charging plays on bunts in the infield?

Update: Based on the discussion below, I'm torn about whether to use the coefficient on 3B. While it did result in a better fit with PMR, it could be that PMR behaves strangely with 3B and therefore that we're decreasing the accuracy of the ZR statistic by using that coefficient. Since we don't have an independent reason for adding the 0.4 coefficient (i.e. a "baseball" reason why OOZ plays shouldn't have as much of an impact on ZR at 3B), I'm now inclined to calculate it like all the other positions: [+/- ZR] = [+/- PLAYS] + [+/- OOZ].
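Putting the recommendation together, a small sketch that combines the two +/- values. The optional 3B down-weighting uses the 0.42 coefficient from the Results section and is off by default, matching the updated recommendation; the function and flag names are hypothetical.

```python
def plus_minus_zr(pm_plays, pm_ooz, pos, use_3b_weight=False):
    """[+/- ZR] = [+/- PLAYS] + [+/- OOZ].

    If use_3b_weight is True, third basemen's +/- OOZ is multiplied by
    the 0.42 regression coefficient instead of 1.0.
    """
    weight = 0.42 if (use_3b_weight and pos == "3B") else 1.0
    return pm_plays + weight * pm_ooz
```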

Finally, while I've reported all values here in absolute terms, it is possible to convert +/- ZR numbers to a rate-like statistic. My suggestion is to divide the +/- ZR statistic for a player by his BIZ, and then multiply by 400 -- a number that seems to be at the upper end of how many balls a fielder (well, at least non-first basemen) will see in a full season. The actual number you multiply by doesn't matter, of course, as long as you're consistent across all individuals. You could even vary it by position, though I'm not sure if that's really worth doing.
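The rate version is one line of arithmetic; here it is as a sketch with the suggested scale of 400 as a default (the function name is hypothetical):

```python
def zr_rate(pm_zr, biz, scale=400):
    """+/- ZR per `scale` balls in zone: (+/- ZR / BIZ) * scale."""
    return pm_zr / biz * scale


# A hypothetical fielder who is +12 on 300 BIZ rates at +16 per 400 BIZ.
print(zr_rate(12, 300))
```

Since the scale is just a constant multiplier, changing it rescales everyone equally and leaves the rankings unchanged.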

ZR Top-5 2006 Fielders

As a diagnostic, I wanted to close by having a quick look at the top player rankings at each position according to ZR (calculated as recommended above), PMR, and the Fielding Bible (FB; extracted from the Bill James 2007 Handbook).

Position | Rank | ZR +/- | PMR | Fielding Bible
1B | 1st | Albert Pujols (+39) | Albert Pujols (+39) | Albert Pujols (+19)
1B | 2nd | Doug Mientkiewicz (+24) | Lyle Overbay (+16) | Doug Mientkiewicz (+16)
1B | 3rd | Chris B. Shelton (+21) | 4-way tie at +13: | Kevin Youkilis (+10)
1B | 4th | Richie Sexson (+19) | Niekro, Morales, | 3-way tie at +7:
1B | 5th | Travis Lee (+15) | Dan Johnson, Nick Johnson | Garciaparra/Hatteberg/Lee
2B | 1st | Aaron Hill (+21) | Orlando Hudson (+32) | Jose Valentin (+22)
2B | 2nd | Jamey Carroll (+16) | Jamey Carroll (+26) | Aaron Hill (+22)
2B | 3rd | Jose Valentin (+16) | Chase Utley (+25) | Chase Utley (+19)
2B | 4th | Tony Graffanino (+16) | Aaron Hill (+24) | Mark Ellis (+13)
2B | 5th | Ellis/Polanco (tie, +15) | Mark Grudzielanek (+22) | Tony Graffanino (+13)
3B | 1st | Scott Rolen (+27) | Joe Crede (+38) | Brandon Inge (+27)
3B | 2nd | Brandon Inge (+25) | Pedro Feliz (+28) | Pedro Feliz (+25)
3B | 3rd | Joe Crede (+20) | Brandon Inge (+26) | Adrian Beltre (+23)
3B | 4th | Mike Lowell (+17) | Adrian Beltre (+22) | Joe Crede (+22)
3B | 5th | Morgan Ensberg (+14) | Freddy Sanchez (+19) | Nick Punto (+15)
SS | 1st | Adam Everett (+39) | Adam Everett (+35) | Adam Everett (+43)
SS | 2nd | Craig Counsell (+25) | Bill Hall (+28) | Clint Barmes (+27)
SS | 3rd | Bill Hall (+23) | Yuniesky Betancourt (+27) | Bill Hall (+18)
SS | 4th | Clint Barmes (+23) | Craig Counsell (+19) | Craig Counsell (+17)
SS | 5th | Alex Gonzalez (+18) | Clint Barmes (+17) | Reyes/Bartlett (tie, +13)
LF | 1st | Dave Roberts (+28) | Melky Cabrera (+18) | Dave Roberts (+16)
LF | 2nd | Garret Anderson (+18) | Matt Diaz (+13) | Carl Crawford (+15)
LF | 3rd | Juan Rivera (+18) | Reed Johnson (+13) | Alfonso Soriano (+15)
LF | 4th | Matt Diaz (+16) | Dave Roberts (+12) | Ryan Langerhans (+15)
LF | 5th | Alfonso Soriano (+14) | Murton/Fahey (tie, +10) | Jason Bay (+14)
CF | 1st | Curtis Granderson (+34) | Carlos Beltran (+18) | Corey Patterson (+34)
CF | 2nd | Willy Taveras (+34) | Corey Patterson (+16) | Andruw Jones (+30)
CF | 3rd | Corey Patterson (+22) | Joey Gathright (+15) | Juan Pierre (+25)
CF | 4th | Juan Pierre (+15) | Coco Crisp (+14) | Curtis Granderson (+18)
CF | 5th | Reggie Abercrombie (+15) | Damon/Rowand (tie, +12) | Willy Taveras (+17)
RF | 1st | Jose Guillen (+23) | Juan Encarnacion (+12) | Randy Winn (+22)
RF | 2nd | Randy Winn (+20) | Damon Hollins (+9) | Alexis Rios (+20)
RF | 3rd | Reggie Sanders (+17) | Ichiro Suzuki (+9) | J.D. Drew (+19)
RF | 4th | Trot Nixon (+17) | 4-way tie at +8: | Brian Giles (+18)
RF | 5th | J.D. Drew/J. Jones (tie, +16) | Drew/Jones/Freel/Quintin | Ichiro Suzuki (+17)
Interesting to see Brian Giles on The Fielding Bible list for top right fielders given how much he messed up the ZR vs. PMR calculations. :)

At first blush, neither PMR nor ZR seems to follow the Fielding Bible's +/- ratings more closely than the other. To check this, I calculated correlations between PMR, ZR, and the Fielding Bible based on individuals in the Bill James Handbook's top-10 lists (actually, they are partial correlations to factor out the influence of position). Here's the correlation matrix:

     FB    ZR    PMR
FB   1.00  0.42  0.49
ZR   0.42  1.00  0.40
PMR  0.49  0.40  1.00
It turns out that all three variables have almost equal correlations to one another, all ranging between 0.40 and 0.49. There might be a slightly higher correlation between PMR and FB, but not enough for me to worry about.
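The post doesn't spell out how the partial correlations were computed. One standard way to partial out a categorical variable like position is to subtract each position's mean from every value (which gives the residuals from a regression on position dummies) and correlate what's left. A sketch under that assumption, with a hypothetical function name:

```python
from collections import defaultdict


def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing each group's mean from both.

    Subtracting group means equals taking residuals from a regression on
    group dummy variables, so this is the partial correlation of x and y
    controlling for the grouping (here, position).
    """
    def demean(values):
        sums, counts = defaultdict(float), defaultdict(int)
        for v, g in zip(values, groups):
            sums[g] += v
            counts[g] += 1
        return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

    rx, ry = demean(x), demean(y)  # both residual lists have mean zero
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    return sxy / (sxx * syy) ** 0.5
```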

What this means to me is that we should incorporate both PMR and ZR into our evaluations of player performance. In fact, if you run a general linear model regressing FB on PMR and ZR, both PMR and ZR are highly significant, with almost identical sums of squares values. This indicates that each contributes useful, independent information that helps us predict FB (often regarded as the best available fielding stat), and suggests that they should be weighted equally when interpreting player fielding performance.

This has been a monster of a post. :) But with this information in hand, we can make the most of the fielding stats that are available to us! I'm planning to put them towards a review of the 2006 Reds fielding--I hope I can get it done before the start of the regular season! :D

Update: One can convert the +/- Plays values into an estimated +/- Runs statistic using the runs-per-play values in this article by Chris Dial. They're probably not perfect conversions, as they're based on a different set of data (Stats Inc.'s zone rating rather than BIS's zone rating), but I bet they're close enough.
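The conversion itself is a multiplication by a position-specific runs-per-play value. A sketch; the real runs-per-play numbers have to come from Dial's article, and the 0.8 in the usage line is a placeholder, not one of his values:

```python
def plus_minus_runs(pm_plays, runs_per_play):
    """Estimated +/- Runs = +/- Plays * runs per play for the position.

    runs_per_play must be supplied from an external source (e.g. the
    position-specific values in Dial's article).
    """
    return pm_plays * runs_per_play


# Placeholder rate for illustration only; substitute the real value.
print(plus_minus_runs(10, 0.8))
```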

Sunday, March 25, 2007

Updated Reds Links

With several new Reds' blogs appearing over the past few weeks (yay!), I've updated my list of Reds-related links with these new entries. I've also gone through and removed some that seem to have gone inactive: any that hadn't been updated since the end of last season were removed. If your blog got removed and you're still planning to maintain it this year, just let me know (via comment or e-mail) and I'll gladly add it back. :)

Also, last night I finished most of the data analysis on a fairly substantial fielding study, which extends some of what I did in this post: How should we calculate Zone Rating? Look for it tonight or tomorrow night, depending on how long it takes to write up. I think the findings are potentially useful enough that it could have gone to THT as a freelance piece, but I ultimately decided to just post it here (so I get full editorial control). Hopefully it'll still be helpful to some people.

Once that's done, I'll finally be able to get my review of 2006 Reds fielding online as well, which should be worth reading. Like the other reviews, the emphasis will be on graphical depictions of Reds player performance. It will be based primarily (at least for non-catchers) on THT's Zone Rating and David Pinto's Probabilistic Model of Range.

Thursday, March 22, 2007

I've got XM Radio!

For my birthday, my wife got me a portable (sort of) XM Radio, the Samsung NeXus 50, to help me stay in touch with the Reds this season. I plan to write up a full review of the MLB XM service and this little audio player after I've had more time to mess around with it, but I wanted to post a few quick impressions of the radio and the XM Radio service:

Pros:
  • The player:
    • Player has a nice look & feel, and is very small and compact.
    • Very affordable. There are other options that are a bit nicer in terms of features (see below in cons), but for someone who wasn't willing to shell out several hundred dollars, this was a nice option. There is a sizable rebate that is good through the end of the month.
    • Sound quality from the headphone jack seems very good.
    • Can hold MP3's (if I wanted...though I have an iPod for that), and can record up to 50 hours of XM radio material for listening any time. This can be a single song from a music station, or a complete broadcast.
    • Can output audio with an earphone jack (I have it hooked up to computer speakers through this jack), or through RCA output.
    • Screen is monochrome, but is easy to read and has a nice backlight.
  • XM Radio Service
    • Reception is great, even a few rooms in from the closest windows while at work. I don't get satellite service that far into the building, but XM has a number of terrestrial "relays" here in Phoenix which result in a crystal-clear signal. YMMV.
    • Channel 175 (MLB Home Plate channel) has a nice lineup of shows. This afternoon I listened to Chuck Wilson's On Deck and found it to be a nice talk show with a good pace of news, as well as some analysis. The highlight today was an interview with Maury Brown about Extra Innings-Gate (it is enough of a scandal to be called a gate now, right?). Other good looking shows include Charley Steiner's Baseball Beat, Ripken Baseball (with Cal and Bill)...and heck, even Rob Dibble has a show. :)
    • Every baseball game is broadcast. And you can schedule a pre-recording of the show, which allows you to play it back at a later time. Handy with my crazy schedule.
    • There's also an online XM web broadcast available, though the MLB broadcasts aren't part of it (I'm guessing because they would compete with MLB's webcasts)...
Cons:
  • The Audio Player
    • The big one: this unit has to be hooked up to its cradle in order to play live XM radio. You can play back previously recorded stuff on the go, but if you want to be carrying a unit around and listening to live stuff like MLB broadcasts, you'll need to buy one of the more expensive options.
    • Interface could be better. The menus are logical, clean, and uncluttered, but I really miss the touch wheel on my iPod when trying to quickly scroll through menus, especially long lists of channels. The included remote helps with this when it's practical to use it.
    • The remote works fairly well, though you do have to make sure you're pointing it directly at the unit.
  • XM Radio Service
    • As far as I can tell, only one broadcast from each game is aired at any one time. I don't know how they decide who to broadcast--maybe it's the home team? The result, however, is that I'll probably only hear Marty and Thom/Jeff/Joe about half the time. I'm not super upset about this, but it's irritating.
    • The interface for scheduling pre-recordings is really clunky. It's an add-on to Napster, and not very well implemented. Unlike similar services such as TiVo, the program guide and the scheduling software are completely separate. So, for each game you want to record, you have to note the channel (there are 14 channels, and teams aren't consistently on any one channel), date, and time, and then manually enter it in. I'm spoiled by TiVo, 'cause I find this really onerous.
Overall, I'm reasonably pleased with this setup right now. It gives me the flexibility to listen to Reds games when I want out here in Phoenix, completely free of the constraints of being near a computer (like when I'm in the lab doing research)--and that was my primary goal. It also allows me to do things like pause in the middle of listening to games so that I can be available as a teacher, husband, dad, etc, and yet not miss anything. I'm really looking forward to being able to follow the actual "live" events a bit more this year.

Wednesday, March 21, 2007

WPA PBP @ FG

David Appelman, operator of FanGraphs, recently announced the addition of play-by-play summaries of all ballgames from 2002-2006 on his site. What makes this noteworthy is that he has added WPA (Win Probability Added--think of it as measuring the instantaneous shifts in the chances of winning a game with each play) statistics to each play, allowing you to see, at a level of detail that we've never had before, how each individual play in each game affected the chances that a given team would win or lose. Here's a screenshot I just took from his site featuring the ninth inning of one of the best Reds games last September:
Note the fifth play of the ninth inning involving the newest Reds right fielder. :) It was his last at-bat of the season, as a pinch hitter, in the final home game of the season.
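As a rough illustration of the idea (not FanGraphs' code), a play's WPA is the team's win probability after the play minus its win probability before it. This sketch assumes you already have the sequence of win probabilities for one team; estimating those probabilities, which is the hard part, is not shown, and the function name is hypothetical:

```python
def wpa_per_play(win_probs):
    """WPA for each play, given one team's win probability before the
    first play followed by its win probability after each play."""
    return [after - before for before, after in zip(win_probs, win_probs[1:])]
```

Because the differences telescope, a game's WPA values for the winning team always sum to 1.0 minus its starting win probability.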

I've posted the WPA graph from this game previously, and I think that the graphical output is still the most intuitive way to understand what win probability statistics do for us. But I'm very excited to have the chance to dig into games now and then and identify just how "huge" a particular strikeout is in a given situation. :) And best of all, FanGraphs is planning to release this play-by-play data live this season, so that we can follow along as things happen. Should be fun.

Sunday, March 18, 2007

Reds sign a Taiwanese Player

Via Global Baseball, the Reds have signed 19-year-old Taiwanese LHP Tzu-Kai Chiu to a deal that includes a $200,000 signing bonus. jhelfgott, who runs Global Baseball, suggests that this may be a "statement signing" to indicate that they intend to be involved in future Chinese Taipei (as they are known in international tournaments) player signings.

I'm not crazy about the idea of spending $200,000 on a player who is unlikely to ever amount to anything. But I'm very excited to see the Reds getting involved in signing players from the East. The Reds do have a good presence in the Dominican, but aside from that you don't see much indication that they're scouting players outside the United States. This is a great first step.

Friday, March 16, 2007

Pings vs. Cracks: should metal bats be banned?

Here's a topic for discussion: should little leagues and high school teams allow metal bats?

Over on Baseball Prospectus Unfiltered (no subscription required), there have been a pair of posts (3/14 & 3/15) by Will Carroll and Kevin Goldstein about a bill approved by the NYC legislature banning metal bats from New York high school teams. The argument is that metal bats cause balls to be hit harder, and therefore are more likely to cause life-threatening injuries. While largely backed up by anecdotal evidence, there have apparently been cases in which kids have been killed by being hit in the chest by a batted ball off of an aluminum bat.

Carroll argues that this bill is just common sense, and that wooden bat manufacture has been improved to such a degree that they are no longer significantly less durable and cost-effective than metal bats. Goldstein argues that cost may still be a factor, but that metal bat manufacturers could address the safety issue by producing bats that are less potent.

Admittedly it's going to be a while yet before my kid is playing little league, but since I'm sure many of you have children in little league, I wanted to ask you folks your take on this issue:
  1. Are you worried about the use of metal bats in little league?
  2. Is there any discussion in your local little leagues about the possibility of eliminating metal bats?
  3. Do you think that wooden bats can still be cost effective (e.g. compare wooden bat cost vs. cost for metal bats used in little league--do kids actually use the $500 whip-handled, carbon fiber bats?)?
  4. How would you compare this controversy to other safety-related controversies, such as the use of face guards on batting helmets? Is the concern more or less real? How much would the use of wooden bats change the game?

Thursday, March 15, 2007

Fehr wants '09 WBC to be in March

The head of the Players Union weighs in on the World Baseball Classic scheduling:
"The problem is, that when you attempt to coordinate the schedules, you back into March," Fehr said Thursday. "On balance you look it and go, 'Well, whatever its benefits would be in terms of some calendar association, it really doesn't work.' ... It's impossible, I think, to do it any other time next time than in March."
Looks like Selig is going to get his way on this one. As I mentioned last year, though, I strongly disagree with this assessment. March is an awful time to hold the Classic. Major league players are rusty, managers treat it as spring training, and injuries, should they happen, are almost certain to impact the regular season.

The alternative, as I see it, is to hold it following the World Series, starting the first or second week of November. Benefits:
  • Most players would receive a month off prior to the classic to recover from the regular season, yet wouldn't be off so long that they'd get rusty.
  • Injuries, should they happen, would be more likely to have healed prior to the start of the next season.
  • Managers wouldn't be under pressure to get stars into shape for the regular season, so they could just focus on winning.
  • It doesn't require interrupting any standard major league activities. Period.
While I'm sure we could come up with downsides to this option as well (it's cold outside of San Diego, Phoenix, and Florida...'course, it's cold in March too), it seems to me that this is the best available solution. I hope it gets more consideration.

Protrade: a Sports Stock Market

Over the past few days, I've been playing around with Protrade. If you haven't yet checked it out, it's a pretty fun site. Essentially, you're able to buy or sell (with fake money--it's a free site) "stock" in different MLB players. That stock price will move up or down depending on the purchasing decisions of other users--presumably, players who perform well will see their shares increase in value (Pujols is currently listed at a ridiculous $432.59/share), while players who do poorly (or are not well-liked) will see their shares decline.

You can also buy players in other sports, as well as teams in the NCAA tournament. Furthermore, while I haven't gotten into it yet, you can "short" players, betting that their shares will decline in value rather than increase. My only critique of Protrade is that the available player selection is a bit small. Only 13 Reds players are available, with guys like Alex Gonzalez, Kirk Saarloos, Josh Hamilton, Joey Votto, and Homer Bailey not available for purchase yet.

It's a neat concept, and since it's free (you get $5k in fake money to start, plus another $10k after you make 5 trades in your first week), I went ahead and started an account. My plan has been primarily to target quality young players who should have a good season, moving from prospect to established player status. I'll also take players who are coming off a bad season and that I expect to rebound. As such, here's my current portfolio and current price:

Reds
  • Adam Dunn @ $234.41 - I might drop him, not because I don't think he'll rebound, but because I think the market may already have priced in his rebound this season.
  • Edwin Encarnacion @ $170.84 - I think he's going to have a great season, and should be worth ~$200 by season's end.
  • Kyle Lohse @ $76.87 - I think he'll stick in the rotation, and while not a star, should be worth $100-125 or so.
  • Dave Ross @ $99.17 - I'm surprised to see him rated so low. I don't think he'll repeat his '06 performance, but he should be worth average starting catcher money...Say $125?
Other Teams
  • Zach Duke SP-PIT @ $104.41 - Young starter for PIT. I think he'll do well this year and should be worth $150 or so.
  • Conor Jackson 1B-ARI @ $167.71 - Up-and-coming 1B with amazing plate patience. He's had a few good years already and looks ready to break out.
  • Carlos Quentin OF-ARI @ $166.60 - Might be one to drop. He'll probably have a good year and his price might rise, but I'm less confident about him than the other ARI kids.
  • Ian Snell SP-PIT @ $131.03 - Another young starter for PIT, with many projections indicating a better season for him than Duke.
  • Adam Wainwright SP-STL @ $159.83 - I think he's poised to have a great season as a starter for STL (finally).
  • Chris Young OF-ARI @ $160.09 - He's an early pick for NL rookie of the year.
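Several of the picks above come with rough price targets, and a short Python sketch can turn those targets into percentage upside. The targets come straight from the list. The $112.50 figure for Lohse is an assumed midpoint of his $100-125 range. The `holdings` dict and `upside` function are hypothetical helpers for illustration and have nothing to do with Protrade itself.

```python
# Percentage upside for the picks above that come with a rough price target.
# Targets are the guesses from the list; $112.50 for Lohse is an ASSUMED
# midpoint of his $100-125 range. All names here are illustrative only.
holdings = {
    "Edwin Encarnacion": (170.84, 200.00),  # (current price, target price)
    "Kyle Lohse": (76.87, 112.50),
    "Dave Ross": (99.17, 125.00),
    "Zach Duke": (104.41, 150.00),
}


def upside(current: float, target: float) -> float:
    """Fractional gain if a share moves from `current` to `target`."""
    return (target - current) / current


# Print the picks from most to least upside.
ranked = sorted(holdings.items(), key=lambda kv: upside(*kv[1]), reverse=True)
for name, (current, target) in ranked:
    print(f"{name:18s} ${current:7.2f} -> ${target:7.2f} ({upside(current, target):+.1%})")
```

With these numbers, Lohse (at the midpoint target) and Duke offer the most upside, at roughly 44-46%, and Encarnacion offers the least, at about 17%. Those figures are only as good as the targets behind them.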
Thoughts? Comments? I don't pay very close attention to other ballclubs, so I'm admittedly a bit naive about a lot of those players (though I have been tracking the Arizona kids for the past year or so). I'm also reading that the current baseball market is a bit inflated vs. last year. It seems that last year an above-average starting pitcher was worth around $150-200 (topping out around $280), whereas an above-average hitter was worth ~$200-250 (topping out around $320). ... Come to think of it, shorting might turn out to be a good strategy in the current market. :)

Tuesday, March 13, 2007

Ty Cobb on Probability & Trick Plays

Via TangoTiger, this is a fascinating read. Ty Cobb is one of the more intriguing characters in major league history, and this article, which was written in 1916 and involves an interview with Cobb, really showcases the sort of player he was. Here's an excerpt, from a discussion on scoring from second on an infield ground ball:
Now this demands six separate and distinct operations by three different players. The infielder has to catch the ball and has to throw it to first. The first baseman has to catch it and throw it to the catcher. The latter has to catch it and tag the runner. Now the runner ought to have rounded third base before the infielder got his hands on the ball. If so, that would mean that these six operations had to be performed by three different players in less than the time necessary for a fast man to get from third to home. It needs quick and accurate work on the part of the infield to catch a man on this play where the breaks are right. And quick work that is hurried is seldom perfectly accurate.
There are many things to dislike about Cobb, from his racism, to his outright arrogance, to his willingness to hurt other players on the field. But there's also no question that he was among the great players of his era, and this article gives insight into what made him so great: not just his tremendous physical talents, but his analytical approach to the game and his desire to push the limits of what is possible.

The other thing I like about this article is that it helps show, once again, how a purely statistical approach to player analysis can miss some things about player performance--you miss the human factors that affect how games play out. I always try to keep that in mind in my work here, even if my work is primarily statistical--mainly because I don't have the knowledge to do skills-based scouting. :)

Saturday, March 10, 2007

2006 Reds Pitching Review

Continuing where we left off with the '06 Hitting Review, tonight we'll investigate the Reds pitching in 2006.

'06 Pitching Recap


The 2005 Reds had awful pitching. It's a point that just can't be emphasized enough. They led the league in runs allowed, ERA, hits allowed, and home runs allowed, while they were second to last in strikeouts, last in complete games, last in shutouts, and, for what it's worth, last in saves. So to say that the Reds' pitching needed to improve in 2006 is a tremendous understatement.

Fortunately, it did improve. After allowing an unbelievable 889 runs in 2005, the Reds' staff allowed "only" 801 last season, 10th best in the 16-team league. That's not quite above average, but an 88-run improvement is huge, worth somewhere around 9 wins. (Of course, the offense scored 71 fewer runs than in 2005, which cut the overall team improvement to just 2 wins or so.)
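The runs-to-wins conversion above rests on the common sabermetric rule of thumb that about 10 runs of differential equal one win. The arithmetic can be sketched in Python. The 10-runs-per-win constant is the assumption here; the run totals come from the paragraph itself.

```python
# ASSUMPTION: the common rule of thumb that ~10 runs of differential = 1 win.
RUNS_PER_WIN = 10.0


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a change in runs into an approximate change in wins."""
    return runs / runs_per_win


pitching_gain = 889 - 801  # fewer runs allowed in 2006 than in 2005
hitting_loss = 71          # fewer runs scored in 2006 than in 2005
net_gain = pitching_gain - hitting_loss

print(pitching_gain, runs_to_wins(pitching_gain))  # 88 runs, ~8.8 wins
print(net_gain, runs_to_wins(net_gain))            # 17 runs, ~1.7 wins
```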
Reds ERA over the past five seasons, broken down by starters and relievers (ref: ESPN). Dotted lines indicate league averages for starters and relievers.

The largest improvement came in the starting rotation, whose ERA dropped 0.80 runs. In fact, thanks to strong performances by Bronson Arroyo and Aaron Harang, as well as a decent late-season showing by Kyle Lohse, the Reds' starters were (slightly) better than league average for the first time in five years. The bullpen also improved notably for the second consecutive year, lowering its ERA by 0.37 runs behind the surprising Todd Coffey, along with David Weathers, Kent Mercker, Bill Bray, and (though I barely remember him being around) Matt Belisle.

Graphically Speaking

Again, I'll return to using graphs rather than tables to learn about the '06 Reds. I chose a cutoff of 24.3 innings for the following work, as there's a big drop-off from that total (Franklin) to the next guy on the list (Standridge).

How they got hit...
Vertical and horizontal lines indicate league averages for OBP allowed and HR/9 allowed.

This graph is a little unorthodox for evaluating pitchers, but I wanted to start with an analog to the OBP/ISO graphs for hitters. I think it nicely shows the two main ways a pitcher can get hurt: letting opponents get on base, and letting them hit for power. Clearly, the best place to be is the lower-left, but pitchers can be successful in the lower-right or upper-left (see Arroyo). If you find yourself in the upper-right, however, you're in bad shape.

Two principal observations:
  • The Reds had four pitchers in the lower-left, with Arroyo just outside the bounds. I'd rate those five guys as our five most reliable returning pitchers.
  • The Reds had a lot of pitchers who allowed home runs at an unusually high rate last year. Part of this is undoubtedly the effect of Great American Ballpark, which routinely has one of the highest HR park factors in baseball, despite being only a slight hitters' park overall. But we also just have a lot of guys who allow a lot of fly balls.
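The quadrant logic of this graph can be sketched in a few lines of Python. The league-average thresholds below (`LG_OBP`, `LG_HR9`) are hypothetical placeholders, not the actual 2006 NL averages drawn on the graph, so treat the labels as illustrative. The pitcher numbers come from the table at the end of this post.

```python
def quadrant(obp_allowed: float, hr9: float, lg_obp: float, lg_hr9: float) -> str:
    """Locate a pitcher on the OBP-allowed (x-axis) vs. HR/9 (y-axis) graph.

    Lower-left means fewer baserunners and fewer homers than average (best);
    upper-right means more of both (worst). Ties count as above average.
    """
    horizontal = "left" if obp_allowed < lg_obp else "right"
    vertical = "lower" if hr9 < lg_hr9 else "upper"
    return f"{vertical}-{horizontal}"


# HYPOTHETICAL placeholder league averages, NOT the real 2006 NL values.
LG_OBP, LG_HR9 = 0.335, 1.10

# (OBP allowed, HR/9) from the 2006 table at the end of this post.
pitchers = {
    "Arroyo": (0.293, 1.16),
    "Harang": (0.308, 1.08),
    "Milton": (0.317, 1.71),
    "Claussen": (0.362, 1.64),
}

for name, (obp, hr9) in pitchers.items():
    print(f"{name:9s} {quadrant(obp, hr9, LG_OBP, LG_HR9)}")
```

With these placeholder lines, Arroyo lands in the upper-left (few baserunners, too many homers), which matches how the graph describes him.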

Walks vs. Strikeouts...

Strikeout rate vs. walk rate. The horizontal and vertical lines represent league averages for strikeout and walk rates. Observations:
  • The best place to be in this graph is in the upper-left: low walks, high strikeouts. Gratifyingly, the Reds had a number of pitchers in that part of the graph, including Harang, Arroyo, Coffey, Bray, and Lohse.
  • I was surprised to see Hammond up there, but his peripherals were actually pretty decent last season (high HR-allowed rate though). It's easy to forget, but Hammond had an outstanding May (0.79 ERA, 7.2 k/9, 0.8 bb/9, 0.8 hr/9) after a miserable April and a rough June, the latter failure leading to his release.
  • The Reds also, unfortunately, had a number of guys in the low-K, high-BB region, most notably David Weathers and Matt Belisle. Weathers, in particular, is concerning. Despite actually improving his ERA in 2006, he saw his walk and home run rates surge and his strikeout rate drop. I'm very worried about him as we look toward 2007.
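Strikeout-to-walk ratio (K/BB) combines the two axes of this graph into a single number. Because both rates are per nine innings, the scaling cancels out. Here is a minimal Python sketch using rates from the 2006 table at the end of this post. The selection of pitchers is arbitrary.

```python
# (K/9, BB/9) from the 2006 table at the end of this post.
rates = {
    "Hammond": (7.2, 1.6),
    "Harang": (8.3, 2.2),
    "Arroyo": (6.9, 2.4),
    "Lohse": (7.3, 2.7),
    "Bray": (7.5, 2.9),
    "Coffey": (6.9, 3.1),
    "Weathers": (6.1, 4.2),
    "Belisle": (5.9, 4.3),
}


def k_per_bb(k9: float, bb9: float) -> float:
    """Strikeout-to-walk ratio; the per-9 scaling cancels out."""
    return k9 / bb9


ranked = sorted(rates, key=lambda name: k_per_bb(*rates[name]), reverse=True)
for name in ranked:
    print(f"{name:9s} K/BB = {k_per_bb(*rates[name]):.2f}")
```

Weathers and Belisle sit at the bottom, with fewer than 1.5 strikeouts per walk. Hammond's ratio tops the group, which fits his surprising spot in the upper-left of the graph.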

Clutch and choke pitchers for 2006

Unlike with hitters, VORP for pitchers is not adjusted (afaik) for role, so it serves as an excellent point of comparison for WPA:
VORP (based on classic stats) and WPA (based on win probability changes) for Reds' pitchers in 2006. Regression line is for Reds' pitchers only, as I couldn't easily get WPA and VORP for all NL pitchers.

Observations:
  • WPA, at least for Reds' pitchers, seems to track much more closely to classic scorebook stats (which make up VORP) than it does for hitters.
  • Nevertheless, Weathers, Mercker, and Bray all come out as having a greater impact on game outcomes than you'd predict from their stats. It's good to see a young guy like Bray performing well in that situation. It's also impressive to see Weathers doing so well after his mid-season swoon--he was easily our most successful reliever the last two months of the season, and won my "impact pitcher" awards both of those months.
  • Harang (sort of) and Ramirez seem to be the most notable underperformers. I noticed last season that Ramirez, even before his August implosion, was rarely bringing home much in the way of wins or WPA. This is reflected in his record: the kid went 4-8 despite turning in 10 quality starts. Perhaps we're seeing an indication of a lack of run support?
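The regression behind this comparison can be sketched in Python. The sketch fits WPA against VORP by ordinary least squares and then looks at each pitcher's residual; a positive residual means more win impact than his VORP predicts. This version uses every pitcher in the table at the end of this post, which may not be exactly the set behind the graph, so the fitted line won't necessarily match the one plotted.

```python
# (VORP, WPA) for every pitcher in the 2006 table at the end of this post.
data = {
    "Arroyo": (64.9, 3.16), "Harang": (50.2, 1.92), "Coffey": (19.8, 0.89),
    "Weathers": (18.9, 1.57), "Milton": (10.2, -0.18), "Lohse": (9.5, -0.07),
    "Belisle": (9.0, 0.15), "Schoeneweis": (8.6, 0.77), "Guardado": (4.5, 0.31),
    "Mercker": (4.3, 0.63), "Yan": (3.2, -0.33), "Michalak": (2.9, -0.35),
    "Bray": (2.9, 0.41), "Cormier": (2.5, -0.69), "Franklin": (1.5, -0.47),
    "Ramirez": (1.0, -1.75), "Johnson": (0.9, -0.13), "Kim": (0.6, 0.00),
    "Germano": (0.5, -0.16), "Gosling": (-1.0, -0.07), "Standridge": (-1.2, -0.68),
    "Shackelford": (-1.8, 0.19), "Claussen": (-2.7, -0.76), "Hammond": (-3.3, -0.41),
    "Majewski": (-3.7, -1.98), "Burns": (-3.8, 0.01), "White": (-4.1, -0.86),
    "Mays": (-4.6, -0.76), "Williams": (-6.6, -1.10),
}


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


a, b = fit_line(list(data.values()))
# Residual = actual WPA minus the WPA predicted from VORP.
residuals = {name: wpa - (a + b * vorp) for name, (vorp, wpa) in data.items()}

for name, r in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {r:+.2f}")
```

On this data the slope comes out to roughly 0.056 WPA per run of VORP. Weathers, Mercker, and Bray land above the line, while Ramirez (and, more mildly, Harang) falls below it.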

Surprises and Disappointments:
This graph shows ERA vs. PECOTA-predicted ERA from the 2006 BP Annual. While I didn't agree with all the predictions, they are generally close enough to what I expected to make for a useful comparison.

Observations:
  • It's a weird pattern, and I think it shows something about the Reds' pitching in 2006. While some pitchers performed at expected levels, we had a number of major surprises and, unfortunately, a number of huge disappointments.
  • First, the surprises:
    • Arroyo, of course, leads the list, with an outstanding 3.29 ERA compared to a mid-4's PECOTA projection.
    • Another surprise came from Todd Coffey, who was absolutely brilliant in April (0.60 ERA) and May (1.80 ERA) before struggling from June through August (5.84, 5.59, & 5.79 ERA's). He did put together an excellent final month of the season (2.45 ERA), which helps me think he might yet be able to put up outstanding numbers again.
    • Weathers, Belisle, and Franklin also did better than expected...in Franklin's case, though, that's not saying much.
    • Harang was listed as a surprise, but I think he was underrated by PECOTA last year...and this year, for that matter.
  • As for the disappointments...
    • I think Claussen's implosion/injury was the most overlooked story of last season. He began the season as our #3 starter after a very solid 2005 campaign (4.21 ERA in 167 IP), but was completely ineffective and ultimately landed on the disabled list. I think it may have been a mistake to cut bait with him this offseason--he's still young (28 in May) and has had success in the past. The question is whether he'll be able to come back from his surgery. Apparently the Reds didn't think he could--we'll see if he does anything for the Nationals this year.
    • The Reds also received catastrophically bad performances from Rick White, Chris Hammond, Dave Williams, and Joe Mays. Williams was a fairly big disappointment given his status as the guy we got for Casey, although, as I mentioned in my early-season profile of him, his peripherals in 2005 were far worse than his respectable 4.41 ERA would indicate. Peripherals also hinted at some loss of skill for Chris Hammond in 2005, so in a way his implosion was also predictable. It's clear to me that Dan O'Brien's front office didn't pay much heed to DIPS, and that was to their detriment.

The winds of fortune...FIP vs. ERA

Line indicates a perfect match between FIP and ERA.

Observations:
  • I have to admit that I'm a little disappointed in how well FIP predicted ERA. Compare this, for example, to the tighter fit between PrOPS and OPS (another reason why I think PrOPS is such a good stat). I've been meaning, for quite some time, to revisit and critically evaluate FIP. I rely on it a lot in my player evaluations, but sometimes I wonder about how good it really is.
  • Nevertheless, we can see some things pretty clearly:
    • Some pitchers, most notably Arroyo, Belisle, Weathers, Mercker, and Michalak, grossly overperformed their peripherals. Mercker and Weathers had BABIPs below 0.250, which generally only happens with some outstanding luck. ... I continue to be really worried about Weathers' prospects this season.
    • Arroyo, it should be said, still looks to have had a good season based on his peripherals (low-4's ERA, tons of innings). Just probably not a should-have-been Cy Young contender.
    • A number of players may have pitched better than their ERA's would indicate, including: Kyle Lohse (I continue to be bullish on him this season, as long as he gets his hamstring issues worked out), Elizardo Ramirez, Chris Hammond, Rick White, and Brandon Claussen. Of course, White and Claussen were still predicted to have terrible ERAs, just not as terrible as they turned out to be.
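Both FIP and BABIP can be roughly recomputed from the per-9 rates in the table that closes this post. The Python sketch below uses the standard FIP form, (13·HR + 3·BB − 2·K)/IP + C, rewritten for per-9 rates. The league constant C = 3.20 is an assumption chosen to line up with this table; the real constant is recalculated each season. The BABIP function is a rough approximation that treats every out as either a strikeout or an out on a ball in play. Expect results close to, but not exactly on, the table's numbers: the rates are rounded, and the table's FIP may also handle details such as hit batters differently.

```python
# FIP = (13*HR + 3*BB - 2*K) / IP + C. With per-9 rates the innings cancel:
# FIP = (13*HR9 + 3*BB9 - 2*K9) / 9 + C.
FIP_CONSTANT = 3.20  # ASSUMPTION: league constant; it changes every season


def fip_from_rates(k9: float, bb9: float, hr9: float, constant: float = FIP_CONSTANT) -> float:
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


def babip_from_rates(h9: float, k9: float, hr9: float) -> float:
    """Rough BABIP: hits in play / (hits in play + outs in play).

    APPROXIMATION: assumes each of the 27 outs per nine innings is either a
    strikeout or an out on a ball in play, ignoring double plays, outs on
    the bases, sacrifices, etc.
    """
    hits_in_play = h9 - hr9
    outs_in_play = 27 - k9
    return hits_in_play / (hits_in_play + outs_in_play)


# (H/9, K/9, BB/9, HR/9, ERA) from the 2006 table that closes this post.
table = {
    "Arroyo": (8.3, 6.9, 2.4, 1.16, 3.29),
    "Weathers": (7.5, 6.1, 4.2, 1.47, 3.54),
    "Mercker": (8.9, 5.4, 3.5, 1.91, 4.13),
    "Lohse": (10.0, 7.3, 2.7, 1.00, 4.57),
}

for name, (h9, k9, bb9, hr9, era) in table.items():
    fip = fip_from_rates(k9, bb9, hr9)
    babip = babip_from_rates(h9, k9, hr9)
    print(f"{name:9s} ERA {era:.2f}  FIP~{fip:.2f}  BABIP~{babip:.3f}  ERA-FIP {era - fip:+.2f}")
```

A negative ERA-FIP gap (Weathers, Mercker, Arroyo) flags a pitcher who beat his peripherals. A positive one (Lohse) flags a pitcher who may have pitched better than his ERA shows.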
Finally, we'll close with a stats table for the Reds' 2006 pitchers, sorted by VORP (IP is shown in decimal form, e.g. 240.7 = 240 2/3 innings; OBPa = on-base percentage allowed).
Player IP H/9 K/9 BB/9 HR/9 BABIP ERA FIP VORP WPA OBPa
Arroyo 240.7 8.3 6.9 2.4 1.16 0.270 3.29 4.14 64.9 3.16 0.293
Harang 234.3 9.3 8.3 2.2 1.08 0.312 3.76 3.64 50.2 1.92 0.308
Coffey 78.0 9.8 6.9 3.1 0.81 0.320 3.58 3.77 19.8 0.89 0.335
Weathers 73.7 7.5 6.1 4.2 1.47 0.227 3.54 5.28 18.9 1.57 0.309
Milton 152.7 9.6 5.3 2.5 1.71 0.270 5.19 5.35 10.2 -0.18 0.317
Lohse 63.0 10.0 7.3 2.7 1.00 0.323 4.57 3.85 9.5 -0.07 0.327
Belisle 40.0 9.7 5.9 4.3 1.13 0.299 3.60 5.12 9.0 0.15 0.361
Schoeneweis 14.3 5.7 6.9 5.0 0.63 0.205 0.63 4.26 8.6 0.77 0.300
Guardado 14.0 9.6 10.9 1.3 1.29 0.361 1.29 3.07 4.5 0.31 0.310
Mercker 28.3 8.9 5.4 3.5 1.91 0.247 4.13 5.83 4.3 0.63 0.317
Yan 15.0 7.8 4.8 4.2 2.40 0.205 3.60 6.62 3.2 -0.33 0.317
Michalak 35.0 10.8 2.6 4.1 1.54 0.283 4.89 6.33 2.9 -0.35 0.377
Bray 27.7 10.7 7.5 2.9 0.98 0.341 4.23 3.83 2.9 0.41 0.341
Cormier 14.0 13.5 3.9 2.6 1.93 0.340 4.50 6.00 2.5 -0.69 0.379
Franklin 24.3 10.0 6.7 5.9 1.11 0.329 4.44 4.57 1.5 -0.47 0.391
Ramirez 104.0 10.6 6.0 2.5 1.21 0.316 5.37 4.65 1.0 -1.75 0.344
Johnson 8.7 11.4 4.2 0.0 1.04 0.313 3.12 4.14 0.9 -0.13 0.316
Kim 6.7 9.5 5.4 0.0 4.05 0.190 5.40 4.12 0.6 0.00 0.250
Germano 6.7 10.8 10.8 4.1 1.35 0.389 5.40 7.87 0.5 -0.16 0.387
Gosling 1.3 6.8 6.8 6.8 6.75 0.000 13.50 15.97 -1.0 -0.07 0.429
Standridge 18.7 8.2 8.7 6.8 0.96 0.294 4.82 5.09 -1.2 -0.68 0.372
Shackelford 16.3 9.9 8.3 5.5 2.20 0.292 7.16 6.77 -1.8 0.19 0.380
Claussen 77.0 10.9 6.7 3.3 1.64 0.321 6.19 5.39 -2.7 -0.76 0.362
Hammond 28.7 11.3 7.2 1.6 1.57 0.337 6.91 4.40 -3.3 -0.41 0.328
Majewski 15.0 18.0 5.4 2.4 0.60 0.468 8.40 3.88 -3.7 -1.98 0.468
Burns 13.3 20.3 6.1 2.0 1.35 0.519 8.78 4.72 -3.8 0.01 0.500
White 27.3 11.2 5.6 1.6 1.65 0.322 6.26 4.90 -4.1 -0.86 0.339
Mays 27.0 13.3 5.3 4.0 1.33 0.367 7.33 5.07 -4.6 -0.76 0.400
Williams 40.0 12.2 3.6 3.6 2.03 0.302 7.20 6.84 -6.6 -1.10 0.381