Friday, July 20, 2007
Red Reporter Sabermetric Overview Series
I wanted to highlight some excellent work I just (belatedly) saw over at Red Reporter in the diaries, the Sabermetric Overview Series. Individuals who frequent that site have thus far posted five readable, informative, and most of all useful articles recapping some of the basic and relevant results of research on baseball. Here are the articles thus far (I've also linked to them in my sidebar, as I think this is a pretty neat initiative and I'll want to refer to these articles now and then):
- Part 0 - Introduction by Slyde - Covers some of the more useful reference sites on the web.
- Part 1 - Park Factors by boobs - An overview of park factors, how they work, and what they mean.
- Part 2 - AVG vs. OPS by BLee2525 - A comparison of batting average and OPS (on-base plus slugging), and why OPS is more informative about offensive production than batting average (shockingly, he cites an old set of data I posted at RR a long time ago! I'd forgotten about that study, and always meant to revisit it in more detail).
- Part 3 - The Matrix and Smallball by Red Menace - An overview of how different in-game strategies affect how teams score runs. It includes a much better treatment of sacrifice bunting than I did previously, so kudos.
- Part 4 - Creating Runs by sidnancy - An overview of how to estimate total offensive contribution of players, using runs created, linear weights, or value over replacement player (VORP).
Pythagorean Standings
In honor of the Reds tying the Everett-less Astros for last in the NL Central last night, I thought it would be fun to take a look at the standings based on teams' Pythagorean records, which are based on their runs scored and runs allowed per game. Here they are:
Pythagorean Standings - As of 7/18/07
National League East
Team | Wins | W% | Games Back | Real position |
NYN | 50 | 0.532 | 0 | 1st |
ATL | 48 | 0.505 | 2.5 | 2nd, 2.5 out |
PHI | 47 | 0.500 | 3 | 3rd, 5 out |
FLA | 44 | 0.463 | 6.5 | 4th, 7.5 out |
WAS | 36 | 0.383 | 14 | 5th, 13 out |
National League Central
Team | Wins | W% | Games Back | Real position |
CHN | 52 | 0.559 | 0 | 2nd, 3.5 out |
MIL | 51 | 0.543 | 1.5 | 1st |
CIN | 45 | 0.474 | 8 | 5th, 13.5 out |
HOU | 42 | 0.442 | 11 | 5th, 13.5 out |
STL | 39 | 0.429 | 12 | 3rd, 8.5 out |
PIT | 39 | 0.415 | 13.5 | 4th, 13 out |
National League West
Team | Wins | W% | Games Back | Real position |
SD | 55 | 0.591 | 0 | 2nd, 1 out |
LAN | 52 | 0.547 | 4 | 1st |
COL | 47 | 0.500 | 8.5 | 4th, 5.5 out |
SF | 44 | 0.478 | 10.5 | 5th, 13.5 out |
ARI | 45 | 0.469 | 11.5 | 3rd, 4.5 out |
American League East
Team | Wins | W% | Games Back | Real position |
BOS | 56 | 0.596 | 0 | 1st |
NYA | 54 | 0.587 | 1 | 2nd, 7 out |
TOR | 47 | 0.500 | 9 | 3rd, 11 out |
BAL | 46 | 0.489 | 10 | 4th, 14 out |
TB | 35 | 0.376 | 20.5 | 5th, 18.5 out |
American League Central
Team | Wins | W% | Games Back | Real position |
DET | 56 | 0.609 | 0 | 1st |
CLE | 52 | 0.553 | 5 | 2nd, 2 out |
MIN | 51 | 0.543 | 6 | 3rd, 8 out |
KC | 44 | 0.468 | 13 | 5th, 16 out |
CHA | 40 | 0.430 | 16.5 | 4th, 14.5 out |
American League West
Team | Wins | W% | Games Back | Real position |
LAA | 51 | 0.548 | 0 | 1st |
OAK | 49 | 0.516 | 3 | 3rd, 11 out |
SEA | 47 | 0.511 | 3.5 | 2nd, 1.5 out |
TEX | 43 | 0.457 | 8.5 | 4th, 14.5 out |
In two cases, the NL Central and the NL West, the top two teams are reversed. Otherwise, the leaders are maintained, though some teams look closer to or further from contention than they actually are. Pythagorean Wild Card leaders are currently the Dodgers over the Brewers in the National League, and the Yankees over the Indians in the American League.
There's a sense in which these standings are irrelevant, of course, as these games have already been played and the wins/losses determined. Nevertheless, all things being equal, we can probably expect teams to regress toward their Pythagorean records over the course of the rest of this season. So as a follow-up, I was interested in seeing which teams were helped and hurt the most by the reality of playing actual games (as opposed to just scoring and allowing runs). Because teams play against one another, I opted to compare actual Games Back with the Games Back according to the above Pythagorean standings.
Here are the differences (positive values indicate teams that are doing better than predicted, whereas negative values indicate teams that are doing worse than expected):
Team | ExpGB-ActualGB |
ARI | 7 |
LAN | 4 |
STL | 3.5 |
COL | 3 |
CLE | 3 |
TB | 2 |
CHA | 2 |
SEA | 2 |
MIL | 1.5 |
WAS | 1 |
PIT | 0.5 |
NYN | 0 |
ATL | 0 |
BOS | 0 |
DET | 0 |
LAA | 0 |
FLA | -1 |
SD | -1 |
PHI | -2 |
TOR | -2 |
MIN | -2 |
HOU | -2.5 |
SF | -3 |
KC | -3 |
CHN | -3.5 |
BAL | -4 |
CIN | -5.5 |
NYA | -6 |
TEX | -6 |
OAK | -8 |
A combination of an unfortunate Pythagorean division leader (San Diego) and a very fortunate Arizona team puts the D-backs seven games better in their division than you'd expect based on run differentials. Seven games! The Dodgers, Cardinals, Rockies, and Indians round out the five most lucky (or, perhaps, overperforming?) teams.
At the opposite end of the spectrum, Oakland is about eight games behind where they "should be" in their division, thanks to an "overperforming" Angels ballclub and an A's team that is four games below expectations. Interestingly for the New York/Boston-centered sports media, the Yankees also show up as being much worse than run differentials predict, one game behind the Red Sox according to Pythagoras--and that's all on the Yanks, as Boston is spot-on with their predicted record. Other top "unlucky"/"underperforming" teams include the Rangers, Reds, Orioles, and Cubs.
It will be interesting to compare these predicted standings to actual standings at the end of the season. My guess is that teams will be ranked closer to their present Pythagorean standings than their present actual standings.
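For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the two calculations involved. It assumes the classic Pythagorean exponent of 2, which may not be the exact variant behind the table above, and the run totals in the example are hypothetical. The loss totals for the Mets and Braves are inferred from the listed wins and W% (94 and 95 games played), which is also an assumption.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    exponent=2.0 is the classic form; other exponents are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def games_back(leader_wins, leader_losses, wins, losses):
    """Standard games-back figure: the average of the win gap and the loss gap."""
    return ((leader_wins - wins) + (losses - leader_losses)) / 2


# Hypothetical run totals, purely for illustration.
example_pct = pythagorean_win_pct(450, 400)

# Pythagorean NL East rows: NYN 50 wins at .532, ATL 48 wins at .505.
# Losses (44 and 47) are inferred from wins and W%, not taken from the post.
atl_expected_gb = games_back(50, 44, 48, 47)

# ExpGB - ActualGB for Arizona, using the 11.5 and 4.5 listed in the standings.
ari_difference = 11.5 - 4.5

print(round(example_pct, 3), atl_expected_gb, ari_difference)
```

Under those assumptions, the Braves come out 2.5 games back of the Mets, matching the Pythagorean table, and Arizona's difference matches the 7 shown in the second table.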
Reds top the Braves in 15!

- Harang looked awesome. Goes without saying given his pitching line in this game, but he looked much more dominant than the last time I saw him (vs. Diamondbacks). ... That little butcher boy double of his is the first time I can remember actually seeing him get a hit. Maybe he should swing like that more often. :) It's especially impressive given that he'd just lost his grandfather.
- How about Pete Mackanin's five-infielder maneuver? First time I can remember seeing that particular move in years, but it reminded me of some of the things that Tommy Lasorda used to do. My personal favorite, which was cited in The Book, was sending a right-handed pitcher to left field while he brought in a LOOGY to face a lefty, and then bringing the righty back in to face the next hitter. Wouldn't mind seeing that move more often, as it can save an arm and yet still give the best platoon match-up.
- Phillips has a flair for the dramatic, doesn't he? I do love his energy.
- The Reds now have two sweeps this month; at the beginning of the month, they hadn't had one since last July. I knew they'd win sometime this year, and it's been fun. The Reds haven't exactly been dominant, but a win is a win. Enjoy it while it lasts.

Wednesday, July 18, 2007
My Brewers Analysis at THT
Since the most interesting thing about the remainder of the Reds' season is going to be who gets traded, and for whom, I've decided to take a bit of time to learn about the teams that we'll likely be seeing in the playoffs this year. The Hardball Times has been kind enough to host the first of these analyses, focusing on the NL Central-leading Milwaukee Brewers.
Next up is San Diego, but I want to learn a bit more about park factors before I tackle their team.
Update: By some sort of amazing providence (or industrial espionage), there's a piece on the Brewers at Baseball Prospectus today as well. It serves as a nice complement to my profile.
:)
Tuesday, July 17, 2007
Book Review: Baseball Between the Numbers
About two years ago, I started to get really interested in learning about the current state of baseball research. I'd read Moneyball and The Numbers Game, and was starting to actively apply the things I'd learned from those books to what I was seeing on the field with my Cincinnati Reds. But as I've continued to try to learn more about what has and has not been done, I've noticed that it can be pretty difficult to find a comprehensive "review" of current research about baseball. There are certainly great sources out there, like Tom Tango's blog and website, The Hardball Times' glossary, and more recently, the BP Toolbox series by Derek Jacques. But if someone came to me asking where they could go to get an overview of current research, I'd have a hard time deciding where to send them.
That is, until I read Baseball Between the Numbers. Edited by Jonah Keri and featuring contributions from a variety of authors from Baseball Prospectus, the goal of this book seems to have been to provide a near-comprehensive view of current baseball research on all facets of the game. They sought to cover everything from performance analysis and on-field strategy to long-term team development and the business side of baseball. They wanted to communicate these findings accurately, all the while keeping the writing approachable to the average intelligent fan. They are remarkably successful in accomplishing these goals. While certainly not perfect, this book is a wonderful resource for someone, like myself, who is trying to learn about all the great work that has been done over the past several decades to better understand baseball.
Here are a few snippets of things I learned while reading this book:
- Closers should be brought in earlier in the ballgame when the situation allows it, though using a closer in a standard closing situation isn't a terrible use of one's ace reliever.
- Earned Run Average is probably not as good a measure of pitching performance as runs allowed per nine innings.
- Teams leading late in the ballgame can be well-served by playing for one run.
- Bunting a runner to third with zero outs is often a better use of an out than bunting a runner to second with zero outs.
- Players do, in fact, perform better in their "walk" year prior to free agency.
- New stadiums are almost always a great deal for the team, but a bad deal for the city that pays for it.
- Some kinds of players lose effectiveness sooner than other players--it's predictable, to some degree.
- There actually is a measurable "clutch" skill that differs among players...though it's really small.
- The playoffs aren't a complete crapshoot--particular types of teams do tend to do better than others.
All that said, there are some notable critiques that one could level at this book. First, if one didn't know any better, one could easily walk away from this thinking that all the great baseball research has been done by either Bill James or Baseball Prospectus. It is true that there is some excellent research that goes on at that site, but this text ignores much of the work done outside of their group--and sometimes to their detriment. For example, all discussion of defensive statistics relies on Clay Davenport's defensive translations which, while certainly better than fielding percentage and range factor, fall short when compared to more detailed play-by-play metrics such as Zone Rating, Mitchel Lichtman's UZR, or David Pinto's PMR, all of which have been around and available for many years. There is only a brief mention of these alternatives in the chapter on fielding, and they are said to report results that are simply "very close" to those reported by Davenport's translations. In fact, differences do exist, and they can be substantial.
Those concerns speak to my other major issue with the book, and that is its tone. While not the case in all essays, a substantial number of the articles are written in a tone that is overwhelmingly authoritative--and unfortunately, far more so than the data justify. For example, after failing to detect significant year-to-year correlations between rates at which hitters ground into double plays, James Click writes that "Anyone who's seen many catchers lumber down to first knows that beating out a double play can't be entirely random. Yet the lack of any year-to-year consistency assures us that it is." What should be said is that the effect, if it exists, has not been large enough for us to detect, and is likely confounded by a variety of factors. In fact, there may still be differences, and you may indeed encounter players with unusually high or low double play rates--it's just that the effect is not as strong or common as one might expect it to be.
In this case, as well as other cases in this book, the conclusion seems to me to simply be too strong for what the data can show. There is always a certain degree of uncertainty in data analysis of any kind, and it is a mistake when researchers, be they amateurs or professionals, overstate the strength of their conclusions. In fact, it actually hurts one's arguments when one does this, because when a rare exception comes along that bucks the population trend, it makes one's arguments seem to lose all credibility. It is far better to take the cautious approach and stay within the bounds of one's data, which includes identifying and discussing potential shortfalls of one's analysis. This problem, of course, is not limited to baseball research, and is something that I also see from time to time among papers that I read as part of my professional work as a biologist.
Despite these criticisms, the scope, readability, and, with a few exceptions, accuracy, of this book cause me to highly recommend Between the Numbers. It serves as a great primer, and a great review, of much of the modern research in baseball, and yet it remains highly readable. Any fan interested in using statistics to understand the game of baseball would be well served by picking up this book and giving it a thorough read. Just be sure to read it with the same critical eye that these authors turn towards the conventional wisdom of baseball.
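To make the double-play point concrete, here is a rough sketch of why a non-significant year-to-year correlation does not show that an effect is absent. It uses the standard Fisher z-transformation to put an approximate 95% interval around a sample correlation. The correlation (0.05) and sample size (200 player pairs) are hypothetical numbers chosen for illustration, not figures from the book.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly 95% coverage.
    """
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    low = math.tanh(z - z_crit * standard_error)
    high = math.tanh(z + z_crit * standard_error)
    return low, high


# Hypothetical inputs: a weak observed correlation across 200 player-season pairs.
low, high = correlation_interval(0.05, 200)
print(round(low, 3), round(high, 3))
```

With these made-up inputs the interval includes zero, so the correlation would not be called significant, but it also includes modest positive values. That is the gap between "we could not detect it" and "it is entirely random."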
Sunday, July 15, 2007
Sicko

We saw Sicko. This is the third Michael Moore film I've seen, and it was the best of the three.
I'm going to be brief about this: I feel disappointed in and embarrassed for my country right now. Yes, like all Moore films that I've seen, it is one-sided, and prone to oversimplification and hyperbole. But also like all of his films that I've seen, there is enough substance to his argument to be both compelling and impossible to ignore.
I can't pretend to know the best solution for health care in this country. I'm going to wager that it will probably not be exactly the same as what works for other countries. But I am certain of two things. One, there is a very real and very serious problem here, and it is completely unacceptable. And two, we as a country have a moral obligation to fix it.
...I won't let my politics become a regular intrusion on my postings here, as I know that's not why you folks come here. But at the same time, part of the reason I keep this blog independent, despite a variety of offers to leave and join some larger network or group, is so I can do what I want with this space. So I'm going to go ahead and be off topic this time. Future posts will be back to focusing on what's wrong with the Reds. Mostly.
Saturday, July 14, 2007
Top division in baseball?
This week in John Dewan's Stat of the Week (which is often a quick and fun diversion, and is highly recommended reading), he looked at which divisions were the best in baseball by tallying up their records in inter-divisional games (including interleague). Here are his results:
Division | W | L | Pct. |
AL West | 137 | 118 | .537 |
AL Central | 154 | 137 | .529 |
NL West | 139 | 125 | .527 |
NL East | 129 | 135 | .489 |
AL East | 145 | 159 | .477 |
NL Central | 133 | 163 | .449 |
Is anyone surprised to see the NL Central at the bottom? It makes the Reds' record all the more embarrassing, I suppose.
I probably would have guessed that the AL Central would be the top dog, but the AL West is also a pretty strong division...
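The percentages in that table are just W / (W + L), and they are easy to check. The short sketch below recomputes them from the win-loss totals quoted above and ranks the divisions. It also runs one sanity check of my own: because every inter-divisional game is a win for one division and a loss for another, total wins should equal total losses.

```python
# Inter-divisional records (wins, losses) as quoted in the table above.
records = {
    "AL West": (137, 118),
    "AL Central": (154, 137),
    "NL West": (139, 125),
    "NL East": (129, 135),
    "AL East": (145, 159),
    "NL Central": (133, 163),
}


def win_pct(wins, losses):
    """Winning percentage as wins divided by games played."""
    return wins / (wins + losses)


# Divisions sorted from best to worst winning percentage.
ranked = sorted(records, key=lambda d: win_pct(*records[d]), reverse=True)

total_wins = sum(w for w, _ in records.values())
total_losses = sum(l for _, l in records.values())

for division in ranked:
    print(f"{division}: {win_pct(*records[division]):.3f}")
print("wins == losses:", total_wins == total_losses)
```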