Thursday, October 18, 2007

Player Value, Part 2a: Offense - Run Estimation

Runs Estimation

The first thing we have to decide upon when trying to quantify a player's offensive value is how to estimate the number of runs a player contributes to a team in a given season. And there are a lot of options available to us. Fortunately, there are some clear recommendations that we can make.

Brandon Heipp (aka U.S. Patriot) has a great set of essays on his site that give the low-down on a variety of run estimation techniques. I find his work very convincing and readable, so I'm not going to re-hash it. But I do want to go over the methods that I have used in light of his (and others', especially Tom Tango's) work. Readers will undoubtedly note that what follows is heavily influenced by those two individuals, though I've tried to reconstruct what follows myself as much as possible. If anything is directly lifted, I've linked to the source.

There are two primary approaches to estimating run production of teams or players: models and linear weights. Let's profile them one at a time.

Models

Models try to describe mathematically how runs are actually created, and therefore have the advantage of reflecting the nonlinear fashion in which teams score runs. They have two disadvantages: a) they rely on the ability of their creators to model run scoring mathematically, which is no easy feat, and b) they are (usually) only applicable to estimating team runs, rather than runs provided by individual hitters, because they assume interactions between on-base and advancement skill. Unless he hits a home run, a player has to interact with his teammates in order to generate runs, either by getting on base and then getting knocked in, or by advancing runners already on base.

A number of models have been proposed over the years. Perhaps the best known is Runs Created, which was developed by Bill James. While revolutionary for its time, Runs Created, even in its most recent iterations, suffers from a number of flaws that have caused it to fall out of favor with many analysts (perhaps the most significant of these is how it values home runs). If you are interested in the details, Tom Tango has an excellent article series that breaks down some of the problems with Runs Created. Furthermore, Brandon Heipp has a more historical look at Runs Created, along with detailed discussion of some of its problems. On the basis of their compelling work, I have decided to discontinue the use of Runs Created on this site. I don't think it's a terrible stat, but I think it is critically flawed in certain situations...and there's a much better alternative out there that is just as easy--if not easier--to use (if nothing else, it's far more intuitive):

Base Runs

Among the statisticians that I read frequently, the most well regarded runs estimation model these days is David Smyth's BaseRuns. BaseRuns has strong theoretical underpinnings, is more accurate and flexible than Runs Created across a variety of run environments, and is arguably much more intuitive and approachable than RC. In short, there's little reason to use anything else--at least for the time being.

BaseRuns is based on this simple equation, which could arguably be called a truism (this part is essentially lifted from Tango's article--sorry!):

runs scored = [batters reaching base] * [scoring rate] + home runs

This makes sense, right? Pretty straightforward. No other way of producing runs that I can think of...you either get on base and have a chance of scoring based on the performance of your teammates, you advance your teammates in such a way as to increase their chances of scoring, or you smack a homer. Anyway, that equation can be rewritten as this:

runs scored = [batters reaching base] * [(successes)/(opportunities)] + home runs
or
runs scored = [batters reaching base] * [(successes)/(successes + failures)] + home runs
or
runs scored = A * (B/(B+C)) + D

Now, we can directly quantify the number of batters that get on base (A): hits - hr + walks + hbp, right? And we can certainly count the number of home runs (D). And it turns out that "failures" (C) above is essentially just the number of team outs. So the only thing that we need to come up with is the "successes" (B) in the scoring rate term.

Smyth proposed a B term that works remarkably well, but additional work on this issue over the years has generated greater accuracy. Probably the best work on this front (at least that I've seen) is by Tom Tango, who empirically generated coefficients for the B term using 1974-1990 Retrosheet data, the results of which can be found in this table.

Nevertheless, either because these data are based on a lower run environment, or (more likely) because the stats I have access to lack some of the terms Tango had in his Retrosheet study (most notably reached base on error), my experience is that these coefficients slightly underestimate total runs in modern baseball. Therefore, I used a spreadsheet (found here) by Brandon Heipp to apply slight multipliers (~10% increase) to Tango's B coefficients so that the output matches runs scored in the 2003-2007 National League totals. The assumption here is that the events I have correlate well enough with the events I don't have that this will result in a pretty close estimate of runs. Anyway, here's "my" final BaseRuns equation:

(Updated 12/1/07--corrects an issue with SH's, adds GDP's)

BsR = A * (B/(B+C)) + D
where:
A = H - HR + BB + HBP + 0.08 SH
B = 0.883 1B + 2.370 2B + 3.812 3B + 1.995 HR + 0.063 (BB - IBB) - 0.588 IBB + 0.198 HBP + 0.884 SH + 0.989 SB - 1.445 CS - 1.445 GDP - 0.005 (AB - H - K + SF) - 0.069 K
C = AB - H + SF + 0.92 SH
D = HR
(note: the 0.08 SH in the "A" term, and the 0.92 SH in the "C" term, is due to those times when runners reach base on a sacrifice due to a fielder's choice)

I know it looks complicated, but it's not! Remember, A is just "men on base," C is just "outs", and D is just "home runs." The only voodooish part of the equation is B, which tries to estimate the degree to which runners will advance based on each of the different offensive events.
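For readers who would rather see it as code, here is a minimal Python sketch of the NL '03-'07 equation above. The function name, the dictionary keys, and the sample team line are my own illustrative assumptions (the sample line is made up, not a real team); only the coefficients come from the equation.

```python
def base_runs(s):
    """Estimate runs from a dict of counting stats (NL '03-'07 coefficients above)."""
    singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
    nibb = s["BB"] - s["IBB"]                          # non-intentional walks
    other_outs = s["AB"] - s["H"] - s["K"] + s["SF"]   # non-strikeout outs

    # A: baserunners (excluding home runs)
    a = s["H"] - s["HR"] + s["BB"] + s["HBP"] + 0.08 * s["SH"]
    # B: advancement ("successes")
    b = (0.883 * singles + 2.370 * s["2B"] + 3.812 * s["3B"] + 1.995 * s["HR"]
         + 0.063 * nibb - 0.588 * s["IBB"] + 0.198 * s["HBP"] + 0.884 * s["SH"]
         + 0.989 * s["SB"] - 1.445 * s["CS"] - 1.445 * s["GDP"]
         - 0.005 * other_outs - 0.069 * s["K"])
    # C: outs ("failures")
    c = s["AB"] - s["H"] + s["SF"] + 0.92 * s["SH"]
    # D: home runs
    d = s["HR"]
    return a * (b / (b + c)) + d


# Hypothetical team season line, invented purely for illustration.
sample_team = {
    "AB": 5500, "H": 1450, "2B": 300, "3B": 30, "HR": 170,
    "BB": 550, "IBB": 50, "HBP": 55, "SH": 70, "SF": 45,
    "SB": 90, "CS": 35, "GDP": 125, "K": 1100,
}

print(round(base_runs(sample_team), 1))
```

Swapping in the MLB coefficients from the later updates only requires changing the numbers in the B term.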

How well does base runs work? Well, here's a table showing 2007 NL teams and how well base runs, as defined above, predicts their runs scored:

Team ActualRS BsR +/-Diff %Diff
Arizona 712 705.7 6.3 0.9%
Atlanta 810 796.6 13.4 1.7%
Chicago_Cubs 752 753.8 -1.8 0.2%
Cincinnati 783 793.1 -10.1 1.3%
Colorado 860 851.8 8.2 1.0%
Florida 790 814.6 -24.6 3.1%
Houston 723 737.4 -14.4 2.0%
LA_Dodgers 735 741.7 -6.7 0.9%
Milwaukee 801 800.2 0.8 0.1%
NY_Mets 804 814.2 -10.2 1.3%
Philadelphia 892 899.4 -7.4 0.8%
Pittsburgh 724 713.3 10.7 1.5%
San_Diego 741 718.7 22.3 3.0%
San_Francisco 683 675.7 7.3 1.1%
St._Louis 725 722.1 2.9 0.4%
Washington 673 679.5 -6.5 1.0%

Overall, base runs explained the 2007 NL totals to within 4 runs! This was an exceptionally good result, but looking back through the '03 season, it was never off by more than 1.2% across an entire season. Granted, this work is slightly circular, given that I forced the B term to match the '03-'07 totals, but the accuracy I'm citing here has been repeated elsewhere (including a recent application with the Israeli Baseball League!). Furthermore, other studies (here's one by Tango) have demonstrated that Base Runs is accurate across a wide variety of conditions. Runs Created and Linear Weights become less accurate the further you move away from league averages.
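If you want to summarize the team-by-team fit in the 2007 table yourself, a short sketch like the following works. The numbers are copied from the table; the variable and function names are just my own choices, and the mean absolute percent error is one of several reasonable summary measures rather than the one used for the league-wide figures cited here.

```python
# (team, actual runs scored, BaseRuns estimate), copied from the 2007 NL table
nl_2007 = [
    ("Arizona", 712, 705.7), ("Atlanta", 810, 796.6), ("Chicago_Cubs", 752, 753.8),
    ("Cincinnati", 783, 793.1), ("Colorado", 860, 851.8), ("Florida", 790, 814.6),
    ("Houston", 723, 737.4), ("LA_Dodgers", 735, 741.7), ("Milwaukee", 801, 800.2),
    ("NY_Mets", 804, 814.2), ("Philadelphia", 892, 899.4), ("Pittsburgh", 724, 713.3),
    ("San_Diego", 741, 718.7), ("San_Francisco", 683, 675.7), ("St._Louis", 725, 722.1),
    ("Washington", 673, 679.5),
]


def abs_pct_error(actual, estimate):
    """Absolute error as a percentage of actual runs scored."""
    return abs(actual - estimate) / actual * 100.0


pct_errors = {team: abs_pct_error(rs, bsr) for team, rs, bsr in nl_2007}
mean_abs_pct_error = sum(pct_errors.values()) / len(pct_errors)
worst_team = max(pct_errors, key=pct_errors.get)

print(f"mean absolute error: {mean_abs_pct_error:.2f}%")
print(f"largest miss: {worst_team} ({pct_errors[worst_team]:.1f}%)")
```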

In sum, when estimating team or league offense, the best available approach is Base Runs. No runs estimation approach is simple, per se, but base runs is reasonably approachable, accurate, and flexible to both run environments and datasets (with a bit of caution you can add or subtract terms from the model depending on your needs and your dataset). For more on base runs, I highly recommend reading the articles by Patriot and Tango.
Update 12/1/07: Here's a version of the above equation that I've fudged to match '03-'07 MLB totals (not just NL totals):
A = H - HR + BB + HBP + 0.08 SH
B = 0.905 1B + 2.429 2B + 3.908 3B + 2.045 HR + 0.065 (BB - IBB) - 0.602 IBB + 0.203 HBP + 0.907 SH + 1.014 SB - 1.481 CS - 1.481 GDP - 0.005 (AB - H - K + SF) - 0.071 K
C = AB - H + SF + 0.92 SH
D = HR
Update 12/27/08: Updated for '04-'08 MLB totals:
A = H - HR + BB + HBP + 0.08 SH
B = 0.903 1B + 2.422 2B + 3.897 3B + 2.039 HR + 0.065 (BB - IBB) - 0.600 IBB + 0.202 HBP + 0.905 SH + 1.011 SB - 1.477 CS - 1.477 GDP - 0.0050 (AB - H - K + SF) - 0.0708 K
C = AB - H + SF + 0.92 SH
D = HR

Linear Weights

While base runs works great for estimating team or league offense, it doesn't work well when applied to individual hitters. The reason is that the on-base term, A, is designed to interact with the advancement term, B (this is the case with most models, including many versions of runs created). When you're dealing with an individual player, his on-base ability can't actually interact with his advancement ability to produce a run unless he hits a home run. Therefore, in order to assess individual hitter contributions, it's best to use a run estimator that doesn't try to describe interactions--otherwise you will tend to overestimate production among high OBP and high SLG individuals. That's where linear weights comes in.

Linear weights refers to a set of coefficients that you can apply to offensive events in order to estimate the runs scored by a player or a team. They can be generated any number of ways, including empirically (by measuring the average effect of each offensive event on runs scored in innings), or via models.
Given that we have such a nice run estimation model in BaseRuns, and that linear weights can change drastically in different hitting environments (i.e. good vs. bad offensive team or league), I find that model-derived linear weights are the most useful approach, as it allows us to customize our linear weights to fit a particular context.

One process of extracting linear weights from a model goes like this: we can use base runs to model a particular offensive environment, like I did with the '03-'07 National League. To see how many runs each single (or double, or triple, or walk, etc) typically adds in this context, we can take our '03-'07 National League data, and then add one single to those totals. The difference between the initial number of estimated runs and the number after we've added this single is the typical value of a single in that run environment. Repeating this process for each event (single, double, walk, hbp, etc), we can generate custom linear weights for our chosen runs environment.
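Here is a rough Python sketch of that "plus one" procedure, using the NL '03-'07 BaseRuns coefficients from earlier in the post. Two loud caveats: the league totals below are made-up placeholders (I am not reproducing the actual NL totals here), and the event keys, including "OUT" for non-strikeout outs in at-bats (AB - H - K), are naming assumptions of mine. With real league totals and the same BaseRuns equation, the output should land near the published weights; with these placeholders it will only be in the same neighborhood.

```python
def base_runs(e):
    """BaseRuns (NL '03-'07 coefficients) from a dict of event counts."""
    hits = e["1B"] + e["2B"] + e["3B"] + e["HR"]
    ab_minus_h = e["OUT"] + e["K"]        # AB - H
    other_outs = e["OUT"] + e["SF"]       # AB - H - K + SF
    a = hits - e["HR"] + e["NIBB"] + e["IBB"] + e["HBP"] + 0.08 * e["SH"]
    b = (0.883 * e["1B"] + 2.370 * e["2B"] + 3.812 * e["3B"] + 1.995 * e["HR"]
         + 0.063 * e["NIBB"] - 0.588 * e["IBB"] + 0.198 * e["HBP"]
         + 0.884 * e["SH"] + 0.989 * e["SB"] - 1.445 * e["CS"]
         - 1.445 * e["GDP"] - 0.005 * other_outs - 0.069 * e["K"])
    c = ab_minus_h + e["SF"] + 0.92 * e["SH"]
    return a * (b / (b + c)) + e["HR"]


def plus_one_weights(league):
    """Value of each event = change in estimated runs after adding one of it."""
    baseline = base_runs(league)
    weights = {}
    for event in league:
        bumped = dict(league)
        bumped[event] += 1
        weights[event] = base_runs(bumped) - baseline
    return weights


# Placeholder league totals, invented for illustration only.
league_totals = {
    "1B": 15200, "2B": 4800, "3B": 480, "HR": 2720, "NIBB": 8000,
    "IBB": 800, "HBP": 880, "SH": 1120, "SF": 720, "SB": 1440,
    "CS": 560, "GDP": 2000, "K": 17600, "OUT": 47200,
}

weights = plus_one_weights(league_totals)
for event, value in weights.items():
    print(f"{event:>4}: {value:+.3f}")
```

Because each weight is just a difference of two BaseRuns estimates, you can feed in totals for any league, team, or era and get weights customized to that context.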

Using Heipp's Base Runs spreadsheet to automate this process, I extracted this linear weights equation for the '03-'07 National League:

LWTS = 0.498 1B + 0.821 2B + 1.134 3B + 1.433 HR + 0.320 NIBB + 0.215 SB - 0.314 CS + 0.349 HBP + 0.179 IBB + 0.128 SH - 0.314 GDP - 0.097 (AB - H - K + SF) - 0.111 K

Notes: The coefficient in front of each event is the (estimated) average number of runs that each event produced in the NL from '03 to '07. [AB - H - K + SF] is a measure of non-strikeout outs. These values may differ slightly from those you may see around the 'net, but they are pretty close to those reported here by Tom Tango. They differ because they were generated using a different dataset--the '03 to '07 National League.

Because there are no interactions like in the actual BaseRuns equation, one can apply these numbers to individual hitters. Simply multiply each of the coefficients by the hitter's counting stats, and presto, you have an estimate of that hitter's total offensive contributions to the team! We could also apply these numbers to teams, and get an estimate that is probably pretty close to what we had with the BaseRuns equation. However, because linear weights essentially only tell us the average value of each event, we will often over- or underestimate runs scored on teams depending on each team's unique runs environment (offense quality, park effects, etc). I'll talk more about using custom linear weights for teams in a future article.
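As a quick sketch of the "multiply and sum" step, here is the NL '03-'07 linear weights equation applied to a single stat line in Python. The hitter's line is invented for illustration, and the function and key names (including "OTHER_OUT" for AB - H - K + SF) are my own assumptions.

```python
# NL '03-'07 linear weights from the equation above
NL_0307_WEIGHTS = {
    "1B": 0.498, "2B": 0.821, "3B": 1.134, "HR": 1.433, "NIBB": 0.320,
    "SB": 0.215, "CS": -0.314, "HBP": 0.349, "IBB": 0.179, "SH": 0.128,
    "GDP": -0.314, "OTHER_OUT": -0.097, "K": -0.111,
}


def linear_weights_runs(s, weights=NL_0307_WEIGHTS):
    """Absolute runs estimate for one row of standard counting stats."""
    events = {
        "1B": s["H"] - s["2B"] - s["3B"] - s["HR"],
        "2B": s["2B"], "3B": s["3B"], "HR": s["HR"],
        "NIBB": s["BB"] - s["IBB"],                       # non-intentional walks
        "SB": s["SB"], "CS": s["CS"], "HBP": s["HBP"],
        "IBB": s["IBB"], "SH": s["SH"], "GDP": s["GDP"],
        "OTHER_OUT": s["AB"] - s["H"] - s["K"] + s["SF"],  # non-strikeout outs
        "K": s["K"],
    }
    return sum(weights[name] * events[name] for name in weights)


# Hypothetical hitter season, made up for this example.
hitter = {
    "AB": 550, "H": 160, "2B": 35, "3B": 3, "HR": 25, "BB": 70, "IBB": 8,
    "HBP": 5, "SH": 0, "SF": 5, "SB": 10, "CS": 4, "GDP": 12, "K": 110,
}

print(round(linear_weights_runs(hitter), 1))
```

Passing a different `weights` dictionary (for example, the MLB versions in the updates) lets the same function serve any run environment.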

It's worth noting that some versions of linear weights report values of outs that are roughly 0.16 higher than those I've used here. In those equations, linear weights give a value that is +/- average. I find the above versions, which give you an absolute measure of offense, to be more useful as a starting point. However, these have a slight disadvantage in that they only give you an estimate of the number of runs created by getting runners on base and moving them over, but not in avoiding outs. They do account for the fact that an out advances runners to the tune of about 0.10 runs less than other plays. Yet they don't account for the fact that outs also make innings end sooner, reducing the offensive potential of subsequent batters. To "get around" this, it's most appropriate to use outs, rather than plate appearances, when converting these numbers to a rate. More on that in the next piece.
Update (12/1/07): Here's a set of linear weights that are extracted based on MLB '03-'07 totals, as opposed to just those of the National League. They're not much different, really...:

LWTS = 0.508 1B + 0.835 2B + 1.152 3B + 1.439 HR + 0.328 NIBB + 0.217 SB - 0.318 CS + 0.357 HBP + 0.185 IBB + 0.129 SH - 0.318 GDP - 0.099 (AB - H - K + SF) - 0.113 K

Update (12/27/08): New equation updated for '04-'08 totals:

LWTS = 0.507 1B + 0.834 2B + 1.152 3B + 1.439 HR + 0.327 NIBB + 0.218 SB - 0.318 CS + 0.357 HBP + 0.184 IBB + 0.130 SH - 0.318 GDP - 0.099 (AB - H - K + SF) - 0.113 K
Coming up next... what baseline, if any, should we use when evaluating hitters? And how do positional adjustments factor into player evaluation? Case studies include '07 Reds hitters.

11 comments:

  1. Why aren't things like errors and wild pitches considered when trying to estimate how many runs a team has scored? I know they have value and they contribute to the scoring of runs, but they are simply dismissed in the base runs formula (unless I am missing something). I read on Baseball Fever that errors have about the same run-value as singles and should be considered as such. This was posted by TangoTiger. (I saw in another article that errors were worth .49 runs.) The average NL team made right at 100 errors in 2007. The average NL team made 50 wild pitches in 2007. If you add 100 singles to the team stats (for the errors) and then change 50 singles to doubles (for the wild pitches - and often-times wild pitches occur with more than one runner on base) and run the Base Runs formula you will find that it is off by approximately 75 runs. IOW it estimates that teams should score about 75 more runs than if you run the Base Runs formula without regard to errors and wild pitches. This would indicate to me that either some of the positive coefficients are too high or some of the negative coefficients are too low. I guess my question is this: how does the Base Runs formula account for errors and wild pitches? Thanks for any light you could shed on this for me.
    Here is a quick anecdote. I once went through five seasons' worth of box scores for the Reds (2000-2004) and summed up the statistics for games in which the opponents made an error, and games in which they didn't. I have long since lost that data, but in games in which the other team made an error the Reds scored roughly 10% more runs than Runs Created would suggest. And they scored about the same 10% less than expected in error-free games. If you summed the two it was a very close estimate. It would seem that errors and wild pitches would not be that hard to track.

    PS Yes EdE had a fine second half. =)

  2. Hi,

    It's a fair critique. Errors, in particular, are available for team stats, and if you look at Tango's version of base runs, they are included. That's one of the advantages of baseruns--it's flexible enough that you can add and remove terms as needed based on what you have to work with. In this way, there really isn't one The Base Runs Formula, but they're all inter-related...and some are better than others.

    I didn't include them because my primary goal in developing the above base runs equation was to use it to deliver accurate linear weights that could be used on hitters. Therefore, I only included statistics that are typically available for hitters...while I'd love to have reached base on error and advanced on error terms in my base runs model, I can't get that without parsing Retrosheet data, and I haven't gotten that to work yet.

    If my goal was instead to just predict team runs, I would definitely include the error term. This, in fact, is what was done in that post I linked to about the Israeli Baseball League, and it made a huge difference because of the rather absurd error rates in that league.

    Tango had a recent post on the runs value of an error (look at the '07 fielding data post that I made...I think it links to it) in which he noted that a missed play--be it an error, or a ball a player didn't get to but another defender would--is worth ~.8 runs. You can trace this to the marginal value of a single (~0.48 runs) plus the out that was not made (an out has a marginal value of about -0.27 runs): 0.48 + 0.27 = ~0.75 runs.

    Or you can do something like what you did: go back and compare all games where hits were the same but teams differed by one error, average 'em up, and you'll find that the team with the error gave up ~.8 runs more than the other teams (so says Tango anyway...haven't done it myself).

    Note, because the linear weights I report in this post use an absolute scale, they don't have the same out value as Tango's marginal linear weights. But since we usually measure fielding on a marginal scale (i.e. vs. average), those are the numbers to use.
    -j

  3. Just getting around to reading this. Interesting stuff. I've just been getting interested in BaseRuns recently, and I've used Linear Weights off and on for the last couple of years. Usually, if I quote a run estimator for a player though, it's the RC from Baseball Reference, but that's only because of convenience. So, once you calculate everything and post it, I'll just refer to you, okay? cool. :)

    Anyway, if you are interested in including ROE in your run estimators, BRef has them in the splits for both individuals and teams. I don't remember what is behind the subscription wall and what isn't, but if you need help getting to them, let me know and I'll get you a sample of them.

  4. Joel, thanks for the tip on RBOE's at B-ref! I checked it (and everywhere else I could think of) to see if those numbers were present, but only on the main page, and not the splits page.

    I may take you up on getting those data at some point, but for now, don't worry about it--I doubt I'm all that far off without them. -j

  5. Retrosheet has the ROE everywhere.

  6. Justin,

    It's not clear to me precisely what you're looking to measure here.

    Clearly, nonlinear models such as Runs Created and Base Runs fail to model individual run scoring, since they have the individual's on-base ability interact with his own runner advancement ability.

    But Linear Weights makes the opposite mistake. It assigns a fixed run value to each offensive event based on the league average or (in your case) marginal value. Thus, it fails to take into account the actual team the player is on. The more your teammates get on base, the more valuable your double or home run potentially is, and vice versa. Run values of offensive events do depend on what your teammates will do with them.

    So it would seem we should assign run weights to offensive events on a per-team basis.

    But if we're really trying to assess a player's offensive contribution, shouldn't we take into account the fact that all singles, or all home runs, are not created equal? A single with two outs, no men on base and the pitcher on deck is worth less than a single with bases loaded.

    Shouldn't we distinguish between hits with men on base and hits without, or between hits in high-leverage situations and those without, or by the number of outs, or by the on-base percentage of the batters on deck? But then, of course, the mathematics gets increasingly complex.

    Ultimately, as I said to begin with, it depends on what you're really trying to measure. If you want to "estimate the number of runs a player contributes to a team in a given season," I'm not sure Linear Weights is more accurate than Base Runs. It's just differently biased.

  7. Hi there,

    I disagree that linear weights is just the other side of the coin from base runs. Base runs are clearly inappropriate to use on players because players cannot interact with themselves. This is well-established and can lead to massive overvaluations for players with high OBP and SLG.

    However, your criticism about linear weights is correct. It's not just the opposite bias, but it is a problem. Custom linear weights for teams are ultimately the way to go. Practically, it doesn't make a huge difference compared to using custom weights for a particular league (which is what I've posted here)...maybe 5 or 6 runs difference when you "move" an extremely productive hitter from a good offensive team to a poor offensive team in MLB. Nevertheless, that's something I'm planning to talk about in the last part in this series, along with park factors. One step at a time...

    As for the situation specific values, my tendency is to not go that route. Past performance in those situations has very little predictive value compared to overall performance, and yet those adjustments can massively influence the values we attribute to those events. It's a give and take, but I'd rather avoid that kind of thing for the time being. Besides, it makes this stuff impossible to do from a single row of player data, and part of my goal here (written or not) was to figure out how to accurately assess player value from easily-accessible data.
    -j

  8. I think we can summarize it this way:

    For an individual player...

    - Base Runs estimates how many runs would be created per player by a team of players with the same statistics.

    - Linear Weights estimates how many runs the player would have created for a league-average team.

    - Linear Weights with team-specific weights estimates how many runs the player likely created for his own actual team.

    So if you're looking to compare how good a player he is in a team-neutral way, the league-average linear weights is actually probably the way to go. But if you want to estimate how many runs he created for his team, I think you'd need team-specific weights.


    "part of my goal here (written or not) was to figure out how to accurately assess player value from easily-accessible data."

    Naturally. You could also add the question as to whether it's even fair to a player to evaluate his performance in the context of his team, which he has little control or influence over.


  9. "So if you're looking to compare how good a player he is in a team-neutral way, the league-average linear weights is actually probably the way to go. But if you want to estimate how many runs he created for his team, I think you'd need team-specific weights."

    I completely agree with your summary. Well said.

    "Naturally. You could also add the question as to whether it's even fair to a player to evaluate his performance in the context of his team, which he has little control or influence over."

    That makes a lot of sense to me too, and it's something I talked about with Skyking a few weeks back. I've never been completely sure which direction I wanted to go with it either. If I'm just interested in comparing players within the Reds, for example, it's definitely best to use the custom team linear weights. But for comparing hitters across teams, I do lean toward using league-wide linear weights. Probably. Still not sure on that one though.
    -j

  10. Justin-

    This is just awesome. I've never really taken the time to read a comprehensive breakdown of this stuff, and I find it fascinating. Your writing is clear and concise, and easy to follow. Can't wait to read the other ones you have posted!

    One quick typo I noticed, which, considering the effort you put into this, I thought you might want to know about:

    "is more accurate and flexible that Runs Created" should read "flexible than Runs Created" methinks.

    Thanks!

    -Dave

  11. Thanks Dave, glad you enjoyed it and found it useful!

    Thanks also for the typo catch. I'm terrible about typos and missing words, despite my proofreading passes. And that one in particular gets me all the time. :) -j
