
Sunday, June 21, 2009


Rob Dibble's career looks better with WAR than with win shares. Image via Wikipedia

I've been exchanging emails with a friend about Rally's WAR data (and competing systems), and it turned into a nice Q&A that I thought I'd post here. Quotes are my friend; the rest is me.
1) Is it safe to say that WAR has a much higher bar than WS? (any bad regular can accumulate WS in a season, but with WAR a bad regular will be around +/- 1.0 WAR)
Yes, WS uses too low of a baseline. If a replacement player (e.g. Willie Bloomquist) plays enough, he will have positive win shares but zero WAR. This is why Bill James has finally started developing "loss shares," which is a (clunky, in my view) way of dealing with this problem. Of course, James isn't publishing loss shares yet, so it's hard to know what to make of them (if anything). Rally's stuff also uses a better fielding metric than James's stuff does, so at this point I think it's safe to say that WS is vastly inferior to what Rally's selling. (I have purchased it, FWIW, and it's exactly what I was hoping for, although you do have to mesh it with a database. I stuck it in the Lahman database to get the retroIDs to match up with names. No biggie, as I'm trying to learn to use a database as it is.)
2) WAR includes everything in some form - defense, baserunning, position adjustments, ballpark/league adjustments (which is not that different from WS except I don't think it had baserunning). Am I missing anything?
I'm not sure about league adjustments, though I wouldn't be surprised. It does include range, arms, and DP turning (all with TotalZone), baserunning (beyond just SBs: also going first to third and stuff like that), park adjustments, etc. Rally's pretty awesome. :)
3) For the purposes of explaining WAR, what would you say the scale is? I'm guessing at something like:

2.0 - Useful Player
5.0 - Good Regular/All Star Candidate
7.0 - MVP Candidate
10.0 - MVP in a normal season
2.0 = an MLB-average player playing roughly a full year. I think a good regular is ~3 WAR, but yeah, 5 WAR = All-Star. 10 WAR = Pujols, 12 WAR = Bonds. :)
Does that sound right? Is it different for pitchers?
Yeah, I think pitchers tend to score a tad lower, at least at the high end. Clemens had a few 10-WAR seasons, but Sabathia last year was ~7 WAR (across both leagues), for example, as was Halladay. I still think of an average starter as ~2 WAR, though.
4) Are there differences between WAR on the various sites (or, for that matter, BP's revised WARP figures)? I know you favor WAR; I just want to understand the differences in a nutshell.
Rally's WAR is calculated almost exactly as Tango does it, which is also how FanGraphs does it (though Rally has more baserunning info than FanGraphs, plus reached-on-errors). Hitting = linear weights (lwts), pitching = BaseRuns (BsR) + Pythagenpat, fielding = best available system. Everyone doing WAR is using the same baseline, which is ~2 wins below average per season (currently they're using 2.5 in the AL because it's so much better, but that only goes back a few years).
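For readers who haven't seen the two pitching formulas named above, here is a minimal Python sketch. It assumes the basic published version of David Smyth's BaseRuns and the commonly cited Pythagenpat exponent of (runs per game)^0.287; Rally's and Tango's actual implementations may use refined coefficients, and the function names and inputs are hypothetical.

```python
def base_runs(h, bb, hr, tb, ab):
    """Basic BaseRuns estimate of runs from batting components (simple version)."""
    a = h + bb - hr                                          # baserunners (HR handled in D)
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02   # advancement
    c = ab - h                                               # outs
    d = hr                                                   # home runs always score
    return a * b / (b + c) + d


def pythagenpat_win_pct(runs_scored, runs_allowed, games):
    """Pythagenpat: the Pythagorean exponent floats with the run environment."""
    rpg = (runs_scored + runs_allowed) / games   # runs per game, both teams
    x = rpg ** 0.287
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


# Hypothetical team batting line -> estimated runs scored
print(round(base_runs(h=1450, bb=550, hr=170, tb=2300, ab=5550)))

# A pitcher allowing 3.5 runs per game in a 4.5-run league (per-game rates, so games=1)
print(round(pythagenpat_win_pct(4.5, 3.5, 1), 3))
```

The made-up batting line comes out around 756 runs, and the 3.5-vs-4.5 pitcher projects to roughly a .612 winning percentage, which is the kind of number WAR then compares against a replacement-level winning percentage.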

WARP, the new version at least (as of ~February), is much closer to WAR in its baseline (the old version assumed replacement players are atrocious fielders, which has zero empirical support). But it suffers from a few remaining flaws, like the use of offense-based position adjustments instead of fielding-based ones. There are some years when CFs hit better than LFs, and as a result WARP gives LFs a bonus over CFs, which is beyond absurd given the difference in fielding difficulty between those two positions. I also don't think WARP uses different baselines for relievers and starters (starting is harder than relieving: the same pitcher will put up better numbers as a reliever than as a starter), and it doesn't recognize leverage for closers like Rally's stuff does. WARP isn't terrible anymore, and I like it better than WS now. But WAR is a bit more current in its research underpinnings, mostly because it's based on a collaborative effort of lots of extremely smart people (Tango, MGL, Rally, Patriot, and all the other people over there) instead of just one extremely smart guy (Clay Davenport).

If James's name weren't attached to win shares (or to runs created, for that matter), they would have disappeared by now. But he's deservedly a giant, even if one who is sort of being left behind these days, and so his stuff remains in use even after it's obsolete.
5) Does WAR "favor" peak value vs. career value? To give an extreme example with two Reds players: Ron Oester totaled 9.3 WAR as a Red vs. Rob Dibble's 9.4. In WS, Oester has 112 vs. 63 for Dibble. Thoughts?
WAR is pure career value. You can look at peak value by pulling those years out and doing something to them, but career WAR is just career value.

The reason that Oester tops Dibble in WS but not WAR is the problem with win shares' baseline. You can be a crappy player for a long time and accumulate a lot of win shares, while you might not get any WAR. WAR requires a higher level of play to get "credit." It's not an exceptionally high bar, but it's higher than win shares' baseline. I'm sort of guessing here, but if a AAAA player is the baseline for WAR (which is about right), then a AA/AAA player might be the baseline for WS. That means a AAAA player playing 20 years in MLB would get a lot of WS but no WAR to speak of.
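The baseline argument can be sketched with a toy calculation. This is a simplification, not how Win Shares is actually computed (WS divides up team wins rather than subtracting a fixed per-season baseline). It assumes value is measured in runs per full season relative to an average player, takes the WAR baseline as ~2 wins (~20 runs, assuming roughly 10 runs per win) below average as described above, and uses a purely hypothetical "WS-like" baseline of 40 runs below average.

```python
RUNS_PER_WIN = 10.0  # rough rule of thumb (assumption)

# Baselines in runs per full season relative to an average player
WAR_BASELINE = -20.0   # ~2 wins below average, per the post
LOW_BASELINE = -40.0   # hypothetical, much lower "WS-like" bar


def wins_above(baseline, runs_vs_avg_per_season, seasons):
    """Wins accumulated above a fixed per-season baseline."""
    runs = (runs_vs_avg_per_season - baseline) * seasons
    return runs / RUNS_PER_WIN


# A AAAA (replacement-level) player who hangs around for 20 seasons
aaaa = -20.0
print(wins_above(WAR_BASELINE, aaaa, 20))  # 0.0: nothing above replacement
print(wins_above(LOW_BASELINE, aaaa, 20))  # 40.0: big totals for a below-average player
```

The same player looks worthless or substantial depending only on where the bar sits, which is the gap loss shares are meant to close.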

If and when James ever publishes his loss shares (probably in some new book or something), what we'll find is that Oester accumulated far more loss shares than Dibble, and as a result his career contributions (win shares minus loss shares) will be roughly the same as Dibble's. In other words, when we eventually get the data we need from James, we'll finally get to the point we're already at with Rally's WAR.

FWIW, I do tend to think that peak has to be taken into account, and not just accumulated career value, when you're talking about the Hall of Fame. For that reason, I'd definitely rank Dibble over Oester. Dibble was pure badass dominance for a short number of years, which included that 1990 team. Oester was a decent player for a while and had one genuinely good year in '85 (if TotalZone is to be believed, his fielding was better that year, though probably partly due to random error), but Oester was never ever ever dominant.
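To make the career/peak distinction concrete, here is a minimal sketch. Career value is just the sum of seasonal WAR; for peak it assumes one simple convention (sum of the best three seasons), which is a hypothetical choice among many. The season lines are made up for illustration and are not Oester's or Dibble's real numbers.

```python
def career_war(seasons):
    """Career value: every season counts, with no weighting."""
    return sum(seasons)


def peak_war(seasons, best_n=3):
    """One possible peak measure (hypothetical): sum of the best_n seasons."""
    return sum(sorted(seasons, reverse=True)[:best_n])


# Made-up season lines for two hypothetical players with similar career WAR
compiler = [0.5, 1.0, 1.5, 3.0, 1.0, 0.8, 0.7, 0.5]  # long, steady career
peak_guy = [3.5, 3.0, 2.5, 0.5]                       # short, dominant run

for name, seasons in (("compiler", compiler), ("peak_guy", peak_guy)):
    print(name, round(career_war(seasons), 1), round(peak_war(seasons), 1))
```

Both players land near 9 to 9.5 career WAR, but the short-career player's best three seasons are worth far more, which is the case for weighing peak separately in Hall of Fame arguments.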


And then in another email a bit later...
For the hitters:

I assume Bat is batting runs and BSrun is baserunning runs (turns out Pete was a pretty good baserunner; I was always curious whether he was "too aggressive," but apparently not). DP is a debit/credit for hitting into DPs.

On the defense...

Total Zone - is that runs above average? I hope so, or I'm going to have to rethink Concepcion as a fielder.

IF DP - runs over/below average given opportunities?

OF Arm - same as above for OF

Catcher - is this some sort of fielding total for catchers based on CS, PB, WP?

Could all of these be totaled to create total defensive runs saved?
YES to all of the above. That is what is done when calculating WAR.
The Total looks like the sum of all the runs, but then there is a Position adjustment. What does that represent?
It's an era-specific adjustment for the position a player plays. SS, for example, has the best fielders and thus the highest level of competition. So if you're an average-fielding shortstop, you're an above-average fielder overall, and you get a bonus to account for that. Rally made these adjustments vary over history because the differences in quality among positions haven't been constant.

Also, this is the only place that position comes into play. Offensive batting-runs numbers are straight-up offense, without consideration for position. This is because the offense-based adjustments you see in WARP or VORP, for example, assume all positions have equal talent levels, and that's not true. Second basemen and third basemen are roughly equal fielders in modern baseball, but third basemen are better hitters. That makes 3B a more talented position than 2B. If you do position adjustments by offense alone, you miss that and overrate 2Bs relative to 3Bs.
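The 2B/3B argument can be illustrated with made-up numbers. The figures below are hypothetical, chosen only to mirror the claims above (3B outhits 2B while the two positions field about equally); they are not Rally's actual adjustments.

```python
# Hypothetical position-average batting runs per full season, relative to league average
pos_avg_batting = {"2B": -8.0, "3B": 2.0}

# Offense-based adjustment (the WARP/VORP style criticized above):
# shift each position so that its average hitter nets out to zero.
offense_adj = {pos: -runs for pos, runs in pos_avg_batting.items()}

# Fielding-based adjustment: 2B and 3B are assumed to be roughly equal fielders,
# so they get the same (hypothetical) adjustment.
fielding_adj = {"2B": 2.0, "3B": 2.0}


def gap(adj):
    """Runs a scheme credits a 2B over an otherwise identical 3B."""
    return adj["2B"] - adj["3B"]


print("offense-based gap:", gap(offense_adj))    # 10.0 runs in favor of 2B
print("fielding-based gap:", gap(fielding_adj))  # 0.0
```

Under the offense-based scheme, the 2B gets a 10-run edge purely because his peers hit worse, which is the overrating the answer describes.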
The "rep" column - is that replacement run level given the playing time for the season?
It's the difference between an average player and a replacement player, pro-rated for playing time. Since everything else is given vs. average, adding it converts the total to runs above replacement.
And what is the RAR column before the WAR? I'm assuming it is a calculation of some form, but I can't figure it out.
"Total" is offense (bat + BSrun)
RAR is everything (offense + all fielding + position adjustment + replacement)
WAR is RAR converted to wins (runs divided by league-average runs per MLB game, both teams combined: ~9.4 R/G or so)
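Putting those column definitions together, here is a minimal sketch of how a hitter's line adds up. The parameter names are hypothetical stand-ins for the columns discussed above, the inputs are made up, and the hitting DP column is left out because the answer above doesn't say which total it feeds into.

```python
def runs_above_replacement(bat, bs_run, fielding, position_adj, rep):
    """RAR = offense + all fielding + position adjustment + replacement runs.

    fielding: iterable of fielding components in runs vs. average
              (e.g. TotalZone, IF DP, OF arm, catcher).
    """
    total = bat + bs_run  # the "Total" column: offense only
    return total + sum(fielding) + position_adj + rep


def wins_above_replacement(rar, runs_per_win=9.4):
    """Convert runs to wins using ~9.4 runs per win, as quoted above."""
    return rar / runs_per_win


# Made-up season: +20 batting, +2 baserunning, +5 TotalZone, +1 IF DP,
# -7 position adjustment, +20 replacement runs
rar = runs_above_replacement(20.0, 2.0, [5.0, 1.0], -7.0, 20.0)
print(rar, round(wins_above_replacement(rar), 1))  # 41.0 4.4
```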
On the pitching side...

The runs must be how many the pitcher gave up, and the Rep runs are what a replacement pitcher would have given up in the same innings. Def is the defense behind the pitcher.

So pitchers get credit for pitching behind a bad defense, correct? (The Big Red Machine pitchers get hurt if that is the case, since they all had a good defense behind them.) How does that adjustment work?
Right. From Rally's site:

Def - Estimated runs saved by this pitcher's defense, using TotalZone range, DPs, OF arms, and catchers, prorated by the number of balls in play allowed by the pitcher

Awesome. This gets away from having to use something like FIP or xFIP or tRA to extract pitcher performances from fielding performances.
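Here is a minimal sketch of the pitcher side, with one assumption worth flagging: the Def proration by balls in play comes straight from Rally's description, but the way the columns are combined (replacement runs minus runs allowed, with Def added back to runs allowed) is an inference from the Q&A above, not Rally's published code. The function names and inputs are hypothetical.

```python
def pitcher_def(team_fielding_runs, pitcher_bip, team_bip):
    """Prorate the team's fielding runs saved to one pitcher by balls in play allowed."""
    return team_fielding_runs * pitcher_bip / team_bip


def pitcher_rar(runs_allowed, rep_runs, def_runs):
    """Runs above replacement after removing the defense's contribution.

    A good defense (positive def_runs) kept runs off the board, so the pitcher's
    defense-neutral runs allowed are higher than his actual runs allowed.
    """
    neutral_runs_allowed = runs_allowed + def_runs
    return rep_runs - neutral_runs_allowed


# Two made-up pitchers with identical lines: one behind a great defense (+40 team runs),
# one behind a poor defense (-40); each allowed 600 of the team's 4000 balls in play.
good_d = pitcher_def(40.0, 600, 4000)    # +6.0 runs saved behind him
bad_d = pitcher_def(-40.0, 600, 4000)    # -6.0 runs
print(pitcher_rar(80.0, 110.0, good_d))  # 24.0
print(pitcher_rar(80.0, 110.0, bad_d))   # 36.0
```

The pitcher behind the poor defense gets more credit for the same runs allowed, which is the "credit for pitching behind a bad defense" effect asked about above, and it is why a Big Red Machine starter would lose a little.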
Likewise for the Leverage Index - do pitchers get "extra credit" for a high Leverage Index?
Relievers get partial credit for their leverage index. I think they get a bonus for any leverage above 1.5 or so. I don't know exactly how Rally does it, but Tango talks about how he does it here:

I'm doing the same thing now with my Reds stuff, though I do it for all relievers and Tango only does it for closers. I like my way, because special relievers like Marmol get extra credit, which I think is appropriate.

Thanks for the great conversation!


  1. Cool writeup. Sometime I'll have to create a page and link to posts like this for people who have questions about the system.

    About leverage: relievers get credit for half of the extra leverage beyond 1. If Mo Rivera has a 1.8, then I give him (1.8 + 1)/2 = 1.4 credit.

    The explanation is not an easy one (I had a hard time getting it straight), but it really depends on chaining: if Mo gets hurt, you don't give his 1.8-leverage innings to a replacement-level reliever; you give them to his above-average setup man. The setup man's 1.3-leverage innings go to an average pitcher, and so on, with the "replacement level" guy added to the roster while Mo is out getting the 0.3-leverage mop-up work.

  2. I once looked at how many players were above certain WAR cutoffs and came up with my own subjective scale:

    0 WAR: useless
    2 WAR: average
    4 WAR: borderline All-Star*
    6 WAR: will get MVP votes
    8 WAR: top MVP candidate**
    10 WAR: No-brainer MVP***

    * There are about as many All-Star slots as players to hit 4 WAR in a season.

    ** That is, players who SHOULD be MVP candidates, not those that actually are.

    *** Unless, of course, there happen to be multiple guys up here, which is rare.

  3. @Rally, thanks. The leverage stuff took me a while to get straight as well. I'm doing something similar with my Reds stuff, but for leverage over 1, I'm giving bonus credit if a reliever is better than a .570 pitcher. If so, the pitcher gets normal credit vs. replacement, plus credit for whatever he does above .570, times his (leverage - 1). That's pretty much in keeping with what Tango does, I think.

    @Sky, thanks for the scale--that looks right and is a great shorthand.
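Rally's leverage rule from the first comment fits in a few lines. This sketch assumes the multiplier is applied to a reliever's runs above replacement; that application, the function names, and the sample inputs are assumptions for illustration, not Rally's actual code.

```python
def leverage_multiplier(li):
    """Credit half of the leverage beyond 1: (LI + 1) / 2.

    Chaining argument: a top reliever's high-leverage innings would be absorbed by
    better-than-replacement teammates, so he gets only partial credit for leverage.
    """
    return (li + 1.0) / 2.0


def leveraged_rar(rar, li):
    """Scale a reliever's runs above replacement by the leverage multiplier (assumed use)."""
    return rar * leverage_multiplier(li)


print(round(leverage_multiplier(1.8), 2))  # 1.4, matching the Mo Rivera example
print(round(leveraged_rar(20.0, 1.8), 1))  # a made-up 20 RAR becomes 28.0
```

Note that the same formula pulls mop-up men with a leverage index below 1 under full credit, which is consistent with the chaining story.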