Monday, June 23, 2014

Branch Rickey & Allen Roth in 1954

In the history track in Sabermetrics 101 this week (the 4th week), we read an article by Branch Rickey that appeared in Life Magazine in 1954 describing his and Allen Roth's efforts to develop a model that would predict team success.  Here's the model that is at the heart of the article:
To break it down:
  • The top row is the Offense term.  It is essentially OBP + 0.75*ISO + Clutch.  Clutch is a catch-all term that tracks how often a team scores once runners are on base, so it folds in clutchiness, baserunning, luck, etc.
  • The bottom row is the Defense term.  It includes opponent batting average, the Walk+HBP term of opponent OBP, Opponent "Clutch", and a strikeout term (weight 1/8th...this was presumably necessary because it's just the extra value of a strikeout over and above what is already tracked in the batting average term).  F is fielding (independent of the other values), which Rickey & Roth basically punted on.  In fact, they have a great line in the article: "There is nothing on earth anybody can do with fielding."  They just assigned it a zero and moved on, hoping it wouldn't matter that much.
Therefore, the equation amounts to:

Offense (O) - Defense (D) = G

Where G is a stat that will track run differential quite well.
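
To make the breakdown above concrete, here is a rough Python sketch of how the pieces fit together.  It is only my reading of the description, not a transcription of the article: the `TeamLine` container and the function names are made up for illustration, and the exact denominators, the use of earned runs for opponent "Clutch", and the sign on the strikeout term are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TeamLine:
    """Hypothetical container for one team's season totals."""
    # Batting side
    ab: int
    h: int
    bb: int
    hbp: int
    tb: int
    r: int
    # Pitching side (what opponents did against this team)
    opp_ab: int
    opp_h: int
    opp_bb: int
    opp_hbp: int
    er: int
    so: int


def offense(t: TeamLine) -> float:
    """Offense term: OBP + 0.75*ISO + Clutch."""
    on_base = t.h + t.bb + t.hbp
    obp = on_base / (t.ab + t.bb + t.hbp)
    iso = (t.tb - t.h) / t.ab
    # Assumption: "Clutch" is runs scored per runner reaching base.
    clutch = t.r / on_base
    return obp + 0.75 * iso + clutch


def defense(t: TeamLine, fielding: float = 0.0) -> float:
    """Defense term; lower is better for the team."""
    opp_pa = t.opp_ab + t.opp_bb + t.opp_hbp
    opp_avg = t.opp_h / t.opp_ab
    opp_free_passes = (t.opp_bb + t.opp_hbp) / opp_pa
    # Assumption: opponent "Clutch" uses earned runs per runner allowed.
    opp_clutch = t.er / (t.opp_h + t.opp_bb + t.opp_hbp)
    # Assumption: strikeouts are credited (subtracted) at 1/8 weight.
    strikeouts = t.so / (8 * opp_pa)
    # Fielding defaults to zero, as Rickey & Roth assigned it.
    return opp_avg + opp_free_passes + opp_clutch - strikeouts - fielding


def g_score(t: TeamLine) -> float:
    """G = Offense - Defense."""
    return offense(t) - defense(t)


if __name__ == "__main__":
    # Made-up totals, purely for illustration.
    team = TeamLine(ab=4000, h=1000, bb=500, hbp=0, tb=1600, r=600,
                    opp_ab=4000, opp_h=1000, opp_bb=500, opp_hbp=0,
                    er=600, so=800)
    print(f"O = {offense(team):.4f}, D = {defense(team):.4f}, "
          f"G = {g_score(team):.4f}")
```

Keeping fielding as a parameter that defaults to zero mirrors what they did while leaving room to plug in a real fielding measure.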

Neat, right?  

There are some problems that I see.  First, it seems like the ISO term is confounded with the R (Clutch) term, because a lot of the value of extra-base hits lies in driving runners home (and vice versa).  Second, there is unquestionably an over-emphasis on BABIP when tracking pitching performance (especially when they start relating this to individual pitchers; this was pre-Voros McCracken, after all!).  Third, the unique effect of the home run isn't separated out.  And finally, the units are sort of a mish-mash of arbitrary ratio units, rather than something that has immediate meaning like runs or wins.

In short, it's not Base Runs.  But it seems to work pretty well, based on the work they did on it in the 1950s.  The article itself is a great read, with a ton of great quotes.  I highly recommend it.  It's neat to think that this kind of thing was happening 60 years ago...and how hard it must have been to do the analysis, before the days of Excel, MySQL, and statistical packages!***

***At one point in the article, they mention sending off their data for six weeks(!) to a stat department at an institution for "correlation analysis."  What would take a couple of hours today (mostly just getting the data together) took WEEKS of work using mechanical calculators, slide rules, and lots of paper computation.