Friday, April 17, 2009

Friday Night Fungoes: Larkin, run estimators, CHONE, and the Dunning-Kruger Effect

Looks like I'm going to the Curve game tomorrow night. Whether I'll post a report may depend on how they do...they've yet to win this year! Weather should be nice, though: 72 degrees & sunny.

Does Larkin Belong in the Hall of Fame? Revisited

I can't remember if I linked to this or not, but even if I have, it's worth linking again: Rally has posted season-by-season WAR estimates for all players in the Retrosheet era. He also has a top-300 ranking, so we can look at the best of the past 50+ years using these numbers.

Rally's data include offense, defense (including turning double plays, etc), baserunning, and era-specific position adjustments. This is similar to what I tried to do in my piece on Larkin, but better because of the baserunning & especially the era-specific position adjustments. Here is how the shortstops I included in my Larkin study pan out in Rally's WAR data, plus a number of others who came up in discussions following my Larkin piece:

Alex Rodriguez 82.4 96.5
Cal Ripken+ 73.8 91.2
Robin Yount+ 62.1 75.9
Barry Larkin 57.6 70.1
Ozzie Smith+ 45.3 67.6
Alan Trammell 53.7 66.8
Derek Jeter 49.5 62.4
Ernie Banks+* 54.8 59.2
Luis Aparicio+ 30.9 50.5
Toby Harrah
Omar Vizquel 28.0 45.2
Bert Campaneris
Nomar Garciaparra
Tony Fernandez
Miguel Tejada 39.2 40.9
Jay Bell
Davey Concepcion 26.0 34
Mark Belanger 26.2 32
Edgar Renteria
Chris Speier
Bill Russell
Rick Burleson
Freddie Patek
Steve Sax
Larry Bowa
Bucky Dent
Don Kessinger
Tim Foli

Some players, like Ozzie and Aparicio, got a big boost in their rating once you include baserunning (I only included SBs and CSs) and double-play turning. But as you can see, the results are more or less the same as far as Larkin is concerned: he's probably the 4th-best overall, and the 2nd-best pure shortstop, of the Retrosheet era, at least based on total contributions to their ballclubs relative to their league.

Career-level WAR accumulation isn't the be-all, end-all of Hall of Fame voting, as peak performance is also important. But in Larkin's case, career WAR is crucial. No one disputes that he was a brilliant player when healthy. The knock on him is that he didn't play enough due to all of his injuries. These data clearly indicate that his total contribution, including playing time, was among the best in baseball history at his position.

Rally has Larkin as the 30th-best position player of the Retrosheet era. I know he almost certainly will not be a first-ballot Hall of Famer, but he probably should be.

Why I'm trying to stop using OPS

Colin followed up the run estimator study he posted last week with an improved method. This time, instead of looking at half-inning or even game-level combinations of team offense, he focused on identifying the average value of particular offensive events to games. His methodology was to take matched games--games that had the same numbers of major counting events, but that differed in how many of one specific event they contained.

For example, he might have a game with 5 singles, 3 doubles, 1 homer, and 3 walks, and he'd compare that to a game with 5 singles, 3 doubles, 1 homer, and 4 walks. Finding the average difference in runs scored between pairs of games like this would tell you the average value of a walk in runs. He then compared those actual differences in runs scored to the expected difference in runs scored according to a variety of run estimation mechanisms.
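
To make the matched-games idea concrete, here's a rough Python sketch. It is not Colin's code, and his actual pairing and weighting rules aren't spelled out here, so treat the details as assumptions: the `event_value` function, the `EVENTS` list, and the four toy games are all made up for illustration. It groups games by their event line, then looks for lines that are identical except for one extra instance of the event in question.

```python
from collections import defaultdict

# Hypothetical set of "major counting events" used to match games.
EVENTS = ("singles", "doubles", "triples", "homers", "walks")

def event_value(games, event):
    """Average run difference between games that match on every event
    in EVENTS except `event`, where they differ by exactly one."""
    # Collect runs scored for each distinct event line.
    runs_by_line = defaultdict(list)
    for game in games:
        line = tuple(game[e] for e in EVENTS)
        runs_by_line[line].append(game["runs"])

    idx = EVENTS.index(event)
    diffs = []
    for line, runs in runs_by_line.items():
        # The same line with one more of the event of interest.
        plus_one = line[:idx] + (line[idx] + 1,) + line[idx + 1:]
        if plus_one in runs_by_line:
            more = runs_by_line[plus_one]
            diffs.append(sum(more) / len(more) - sum(runs) / len(runs))

    # Simplification: every matched pair of lines counts equally.
    return sum(diffs) / len(diffs) if diffs else None

# Toy data (invented): two games with 3 walks, two with 4, all else equal.
GAMES = [
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 3, "runs": 4},
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 3, "runs": 6},
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 4, "runs": 5},
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 4, "runs": 6},
]

walk_value = event_value(GAMES, "walks")
```

With real game logs you'd run this once per event and compare each empirical value to what a given run estimator says that event should be worth.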

The results? Linear weights-based methods did the best. This includes his "house" linear weights (which he kindly shares), as well as manipulations of linear weights like wOBA. A bit behind them were GPA (aka 1.7 OPS), Base Runs and BPro's EqR, followed a bit more distantly by Bill James' Runs Created. The worst of the bunch were the OPS-based methods, as well as the even-more-horrible Total Average (bases/outs).

This is strong evidence that we should more or less stop using OPS to evaluate hitters. It's unnecessary, given how easy wOBA is to calculate. Is it better than batting average? Sure, of course. But it misses badly enough and often enough that we should really move past it. It's a tough habit to break, but it's time to wOBA, folks.
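
For anyone who wants to see how little work wOBA takes, here's a small Python sketch that computes it next to OPS for the same stat line. The weights are an assumption: they're roughly the ones published in The Book, simplified to leave out reached-on-error, and the exact values vary by season and source. The stat line itself is invented.

```python
# Illustrative linear weights (assumed; approximately those from The Book,
# with reached-on-error omitted). Real weights shift a bit year to year.
WOBA_WEIGHTS = {
    "bb": 0.72, "hbp": 0.75, "singles": 0.90,
    "doubles": 1.24, "triples": 1.56, "hr": 1.95,
}

def woba(line):
    """Weighted on-base average: each event gets its own weight, per PA."""
    pa = line["ab"] + line["bb"] + line["hbp"] + line["sf"]
    numerator = sum(weight * line[event] for event, weight in WOBA_WEIGHTS.items())
    return numerator / pa

def ops(line):
    """On-base percentage plus slugging percentage."""
    hits = line["singles"] + line["doubles"] + line["triples"] + line["hr"]
    total_bases = (line["singles"] + 2 * line["doubles"]
                   + 3 * line["triples"] + 4 * line["hr"])
    obp = (hits + line["bb"] + line["hbp"]) / (
        line["ab"] + line["bb"] + line["hbp"] + line["sf"])
    slg = total_bases / line["ab"]
    return obp + slg

# Hypothetical season line, not a real player.
hitter = {"ab": 500, "singles": 100, "doubles": 30, "triples": 5,
          "hr": 20, "bb": 60, "hbp": 5, "sf": 5}

hitter_woba = woba(hitter)
hitter_ops = ops(hitter)
```

The thing to notice is that OPS never assigns explicit run values to events; it just adds two rates with different denominators. wOBA weights each event directly, which is the linear weights approach that came out on top in Colin's test.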

CHONE is a really good projection system

Matt has a fairly exhaustive projection roundup here. He notes that each system seems to have its own strengths, but often also some weaknesses:

--CHONE was the best at projecting most things.

--PECOTA was very close behind but had some systematic biases, specifically for speedy players' BABIPs, which ZIPS struggled with as well.

--ZIPS is behind the other systems, except it does quite well with projecting the three true outcomes for players over 35.

--CHONE does better with older players in general, since its specialty is aging curves, but PECOTA does better at finding comparable players for younger players for whom less data is available (unless they fall into the speedster category).

--OLIVER clearly contends and even takes the lead at some things--especially at projecting hitters with lower home run totals and other players significantly affected by park effects. However, OLIVER under-projects walks and strikeouts systematically and over-projects home runs systematically, and could probably be improved by adjusting how those outcomes are computed.

The nice thing about this is that we can use this information to give more or less weight to a given projection system when it differs from others in predicting a given player's performance based on the sort of player we're looking at. Or, we can do what I've essentially decided to do around here, which is to just use CHONE. :)
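
If you did want to go the weighting route, a blend might look something like the Python sketch below. The weights and projected wOBA values are entirely hypothetical (Matt's roundup doesn't prescribe any), and `blend_projections` is just a name I made up; the point is only to show the mechanics of a weighted average across systems.

```python
def blend_projections(projections, weights):
    """Weighted average of the systems that have a projection for a player.

    Systems with no weight, or no projection, are skipped.
    """
    used = [name for name in projections if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted projections available")
    return sum(projections[name] * weights[name] for name in used) / total_weight

# Hypothetical projected wOBA for one older hitter, by system.
projected_woba = {"CHONE": 0.350, "PECOTA": 0.340, "ZIPS": 0.330}

# Hypothetical weights: lean on CHONE for an older player.
older_player_weights = {"CHONE": 2.0, "PECOTA": 1.0, "ZIPS": 1.0}

blended = blend_projections(projected_woba, older_player_weights)
```

You'd swap in a different weight table for, say, speedsters or young players with thin track records, based on where each system tested well.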

It's worth noting that Matt's is just the latest projection roundup in which CHONE did particularly well. Whether it will continue to do so in the future is an open question, of course, but the data suggest that it's as good as they come.

The Dunning-Kruger effect

JC posted about this terrific psychological concept: that people incompetent in a particular discipline will massively overestimate their competency in that discipline. That's pretty much the definition of a baseball fan, isn't it? :)

I'm jesting, mostly. You certainly see arguments between baseball fans who really know their stuff and baseball fans who just think they know their stuff. And I tend to think that most of what you hear on talk radio (sports, or otherwise) involves people who fall into the latter category rather than the former category. And, of course, I tend to think that on at least some issues (some areas of biology, some areas of baseball research, etc), I fall into the reasonably competent category.

But the great part of this is that the Dunning-Kruger effect predicts that we'll have a very hard time being able to tell whether we're competent or not...because the more incompetent we are, the less we'll realize it! :)


  1. On OPS: I'd say that it really depends on your audience. For sites like yours, BtB, or THT, I think you guys should be ready to move on. For a site like Red Reporter or Redleg Nation, we've still got people who are catching up, so a mix is necessary so that people can transition. And obviously the mainstream media still thinks that OPS is a brand new concept, so they have a lot of catching up to do.

    My point is that while wOBA is more accurate, it is also much less intuitive. I'd love to start using it with more frequency, but oftentimes I just want to make a general point about a player and I don't want to have to explain what wOBA is every time. That's why typically I just post a slash line and hope that's enough for people to grasp what I'm trying to say.

  2. If it's a general point about a player, I think OPS is fine if it aids communication. I'm thrilled to see ESPN showing OPS on hitters, for example.

    When it comes to any kind of research, we need to use wOBA. You just miss too much with OPS. The more I compare the two in real world examples, the more convinced I am of this.

  3. On the Dunning-Kruger effect: Wouldn't it follow that, unless you are a trained psychologist, the more convinced you become that another person is overestimating their competence, the less likely they are to actually be doing so?

  4. Maybe it's the psychologists that overestimate their competence in assessing the competence of others?

  5. Dunno, I'm not a psychologist.