Wednesday, March 01, 2006

Baseball Statistics QuickSheet

Last Updated: 4/18/06

I like statistics. I make no apologies for it. While they're not the end-all in player evaluation, statistics do provide long-term, objective measures of player performance. I've always enjoyed playing with baseball numbers, and find the new baseball stat research that is being done these days to be really fascinating. Nevertheless, one of the biggest challenges I've encountered when learning about many of the new baseball statistics is finding out a) what new acronyms mean, b) how statistics are calculated and should be interpreted, and c) where I can find these statistics calculated for actual players!

The purpose of this quicksheet is to provide a central location for this sort of information in a manner approachable to someone who is familiar with traditional baseball statistics (batting average, ERA, etc) but not necessarily the newer stuff. It also will serve as something of a glossary for visitors of my site, as I will use these statistics in my posts. It is not intended to be a comprehensive resource, as there are professional sites that do a better job of that (see Hardball Times and Baseball Prospectus below).

Locations for Player Statistics/Information
  • The Baseball Archive. Here you can download statistics for all players and teams since 1871 in Excel or Access formats. It's all the basic stats (nothing fancy), but the beauty of this site is that you can quickly access a huge amount of data in spreadsheet format and start doing your own calculations. Want to calculate FIP on all pitchers, past and present? Easy: type the formula into Excel, drag it down, and you've got it. Does include ZR going back to '00 (though '05 is currently missing).
  • The Baseball Cube. If you can survive the tremendously annoying pop-up windows and ads, it's an excellent resource that has data on a ton of players. You'll get not only MLB totals, but also stats from college, the minors, and even Japanese Leagues (see Tuffy Rhodes' entry for an example). Very useful when your team picks up a prospect. Basic info, though it does have OBP, SLG, & OPS for hitters and WHIP, K/9, BB/9, & HR/9 for pitchers.
  • Baseball Prospectus. Unfortunately this is a pay site, so many people don't take advantage of their excellent content. Furthermore, they have a nasty habit of not fully spelling out how some of the statistics they put forth as "the best" are actually calculated, and they are also somewhat selective (at times) with what data they do release. I imagine money has something to do with it. Nevertheless, their work, in general, is superb. If you subscribe, you can gain access to their tremendously useful VORP, WARP, and EqA calculations, as well as their fascinating PECOTA player projection work. Not to mention a series of consistently fascinating articles about all things baseball.
  • Baseball Reference.org. Excellent resource for newer statistics. Here you'll find career stats for MLB players, including many newer statistics. Batters: OBP, SLG, OPS, OPS+, RC, RC/27, and attempts at park-adjustments for these numbers. Pitchers: park-adjusted ERA, ERA+, and WHIP. I usually start with this site, as it has direct links to player pages on Baseball Cube, ESPN, etc, should you need more info.
  • Fan Graphs. Not a site for statistics, per se, but they do a great job of graphing player performance across and within seasons for a variety of players. Graphic visualization often does wonders for how well I'm perceiving the data.
  • The Hardball Times. One could call this the poor man's Baseball Prospectus, though that's not fair to the HT Staff. They put out a great deal of high quality content. Great place to go to get all sorts of stats (though they emphasize only 2005 at this point), including defense independent pitching stats, win shares, etc. Also includes frequent statistical articles, which are often both very good and very approachable. Finally, they have a superb statistics glossary that actually includes formulae for most "simple" statistics!
  • Baseball Think Factory. While not a statistics depot, this is a terrific site to find links to new statistics articles and commentary, as well as general baseball news. Also the first non-Reds site to link me. Twice! :)
  • CBS Sportsline. Of all the professional news sites, this one has perhaps the widest variety of situational stats and splits. Want to know Eric Milton's ERA in the 4th inning in 2004? They have that split ready for you (it was 4.09). Fun with small sample sizes!
  • ESPN.com. ESPN is a good midseason source for Stats Inc.'s ZR. They also have a good selection of splits for each player.
Batting Statistics

Batting Statistics are among the most familiar to folks, but I may use a few that are a bit less well-known. Here they are:
  • PA - Plate Appearances. For some reason, when folks started keeping track of hitting statistics, they decided to ignore the "mistakes" by pitchers, such as walks, hit-by-pitches, etc, and just keep track of At Bats (AB's). However, I will take a cue from Baseball Prospectus and report everything in plate appearances, as it seems more relevant to me. PA = AB + BB + HBP
  • K/BB - Strikeouts per walk. While not exactly opposites of one another, I think this can be an interesting diagnostic. Guys who are either terrific contact hitters or very patient hitters tend to have low ratios. Guys who don't walk much or strike out at very high rates tend to have high ratios. A strikeout is the least productive out you can make, aside from getting caught stealing, so they are to be avoided. A walk, while not as valuable as a single (unless no runners are on), can make a huge difference in a ballgame. Walks get runners on base, and players that take a lot of walks make pitchers throw a lot of pitches.
  • SB/% - Stolen Base / Percentage. I will tend to report stolen bases along with the stolen base percentage, as neither means much without the other. I enjoy the running game, but only with guys who have high steal percentages (see Barry Larkin for the definition of this; lifetime SB%=83%; in 1995 he went 51-5…91%). Say you have zero outs and a runner on first. The average runs scored in an inning with this situation in 2005 was 0.90. If the runner steals second, resulting in a runner on second with zero out, the average runs scored increases to 1.14, a 0.24-run advantage. If, however, the runner gets caught stealing, resulting in none on with 1 out, the expected runs drops to 0.28, a 0.62-run drop! Therefore, in order for stealing to be a productive strategy, you need to be able to steal successfully 0.62/0.24=2.6 times for every time you get caught stealing. This makes for a minimum steal percentage requirement of 2.6/(2.6+1)=72% for it to be worthwhile (if you start with one on and one out, the minimum percentage changes to 76%. 1 on, 2 outs? 68%). There are guys who steal at a rate higher than this, but unfortunately lots of guys steal at a rate lower than this, effectively running their teams out of ballgames.
  • OBP - On Base Percentage. I make the conscious decision not to report Batting Average, as it is a tremendously overvalued statistic that shows remarkably little correlation with scoring runs. Instead, I report On Base Percentage (also called On Base Average), which is calculated as: OBP = (H + BB + HBP) / (AB + BB + HBP). As you can see, it's simply a measure of how often players get on base. Good OBP is in the neighborhood of 0.350, with 0.400 being outstanding.
  • SLG - Slugging Percentage. Slugging percentage is simply calculated as: SLG = (H + 2B + 2*3B + 3*HR) / AB = (Total Bases) / AB. It gives an indication of the power of a hitter, as well as something about their ability to get on base via hits. Good SLG is around 0.450, with 0.550+ being outstanding.
  • OPS - On Base Plus Slugging. Simply summing OBP and SLG results in this statistic, which provides better prediction of runs scored than these two measures individually. Good OPS's start around 0.800, with 0.950+ or so being outstanding.
  • GPA - Gross Production Average. This statistic is used by the Hardball Times, based on studies that have found that the best prediction of runs scored comes when you weight OBP by 1.8 over SLG. They then divide it by 4, which puts it on roughly the same scale as traditional batting average. Average players have GPA's around 0.265, with 0.300+ being outstanding. Hence the formula: GPA = (1.8*OBP + SLG)/4
  • EqA - Equivalent Average. This is the first complicated statistic I report, and I cannot replicate it. Its basic formula from the Baseball Prospectus folks is here:
    EqA = (H + TB + 1.5*(BB + HBP) + SB) / (AB + BB + HBP + CS + SB/3)
    As you can see, it incorporates information about both power (total bases) as well as getting on base (hits, BB, HBP), and even includes information about stolen bases. Therefore, it can be considered a measure of a player's total offensive contribution. On top of this, however, the BP folks normalize it to the ballpark and league in which a player is hitting. Like GPA, it can be evaluated along the same scale as traditional batting average. 0.265 is average, 0.300 is outstanding.
  • VORP - Value Over Replacement Player - This is an extremely complicated statistic that I freely admit I do not fully understand. Essentially, it incorporates a player's total offensive production like EqA. It then compares this production to a generic "replacement player," who is basically the performance a team could expect from the best available player after a starting player goes down due to injury. Any major league team should be able to get a replacement player for that position (it is position-specific), so the goal here is to have starters who are better than replacement level. VORP is reported in terms of runs +/- a replacement player, corrected for park and league. A VORP of 20 is a decent score, and indicates that a player contributed 20 more runs than a replacement player in that season.
  • GB/OF/LD - % Ground Balls / % Outfield Flies / % Line Drives. The Hardball Times released an interesting table of ball outcomes for each player in their '05 Annual, and I can't help but report these values from time to time. I'll just report the percentage values like: 25/30/16. Since GABP favors fly balls and kills ground balls, these statistics can be particularly relevant for evaluating how our players will do in our park.
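For anyone who would rather compute the simpler batting formulas above than look them up, here is a minimal Python sketch. The class name, field names, and the sample hitter are made up for illustration; PA and OBP use the simplified definitions given above, and the EqA function is only the raw formula quoted above, without Baseball Prospectus's park and league normalization (which I cannot replicate).

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter. Field names are illustrative."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sb: int = 0
    cs: int = 0

    @property
    def pa(self) -> int:
        # Simplified plate appearances as defined above: AB + BB + HBP
        return self.ab + self.bb + self.hbp

    @property
    def total_bases(self) -> int:
        # H already counts one base for every hit, so only the
        # extra bases are added for doubles, triples, and homers.
        return self.h + self.doubles + 2 * self.triples + 3 * self.hr

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / self.pa

    @property
    def slg(self) -> float:
        return self.total_bases / self.ab

    @property
    def ops(self) -> float:
        return self.obp + self.slg

    @property
    def gpa(self) -> float:
        # OBP weighted by 1.8, then scaled to look like a batting average
        return (1.8 * self.obp + self.slg) / 4

    @property
    def raw_eqa(self) -> float:
        # Raw formula only; no park or league adjustment is applied here.
        num = self.h + self.total_bases + 1.5 * (self.bb + self.hbp) + self.sb
        den = self.ab + self.bb + self.hbp + self.cs + self.sb / 3
        return num / den


def break_even_sb_pct(re_start: float, re_success: float, re_caught: float) -> float:
    """Minimum steal success rate, given average runs scored in each situation."""
    gain = re_success - re_start   # runs gained by a successful steal
    loss = re_start - re_caught    # runs lost when caught stealing
    return loss / (gain + loss)


if __name__ == "__main__":
    hitter = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, hbp=5)
    print(f"OBP {hitter.obp:.3f}  SLG {hitter.slg:.3f}  "
          f"OPS {hitter.ops:.3f}  GPA {hitter.gpa:.3f}")
    # Runner on first, nobody out, using the 2005 values quoted above
    print(f"Break-even SB%: {break_even_sb_pct(0.90, 1.14, 0.28):.0%}")
```

The break-even function is just the arithmetic from the stolen base entry: the runs lost on a caught stealing divided by the sum of the runs lost and the runs gained. Plugging in the values for other base/out situations should give the other thresholds.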
Fielding Statistics

Fielding statistics are currently a subject of a great deal of research among the sabermetric people. For several years, Mitchel Lichtman, having paid several thousand dollars for amazing play-by-play data, was publishing UZR, which is widely heralded, if not deified, as the best (though most complicated) fielding statistics system available. Unfortunately, the Cardinals hired the guy, and he is no longer publishing this information. This has spurred several new systems to be developed, and it's not yet clear which ones are the best. Therefore, I will report a number of them in hopes of developing a composite view of a player's effectiveness. The following stats will be reported for 1B, 2B, 3B, and SS, as well as all OF positions.
  • DI's - Defensive Innings played. This is a more relevant indication of time at a position than games played, as a player who comes in as a defensive replacement in the 8th inning should not get the same credit as someone who plays the full game.
  • Dewan+- - John Dewan's +/- range system from the Fielding Bible. The values equal the number of plays the player made above average at that player's position (a negative value indicates the player made fewer plays than expected). For first and third basemen, as well as outfielders, I use his "enhanced" +/- numbers, which adjust for the number of bases saved (a ball that gets by one of these players often can result in a double, rather than just a single). Furthermore, since Dewan reports his values in terms of the total plays made relative to average, I adjust his values for playing time by estimating the plays saved as if each player had played every inning of 150 games. This allows me to compare defense of regulars and reserves in an apples to apples fashion.
  • Dial ZR - Chris Dial's ZR Translations. Chris Dial translated Stats Inc's Zone Rating (ZR) into a +/- runs statistic for all ballplayers, and was kind enough to release his spreadsheet containing all 2005 players, along with an article on how to use it. I will report RScal+ (+/- Runs Saved if played the whole season) from his spreadsheet, though you can also find a playing-time adjusted value (RSpt). He also includes an adjustment for arm rating for outfielders (Totalcal), but I will examine arm stats separately in my pieces.
  • Gassko - David Gassko's Range statistic. This stat uses the number of ground balls, line drives, and fly balls that a team allowed to estimate how many balls each player on the team should have gotten to, and then compares that to the actual number of plays made by that player. It is not available online, but is available in the Hardball Times Baseball Annual 2006. It is reported as runs +/- average at each position, assuming the player plays 150 games.
  • D*G - Dial x Gassko. David Gassko did a regression that indicated that a combination of his and Dial's systems may be the best fielding metric available. The equation was: D*G = 0.67*(Dial's ZR) + 0.33*(Gassko's Range). So I calculate this metric and report it.
  • Pinto - David Pinto's Probabilistic Model of Range. For a few years now, David Pinto has been using PBP data to gather range information on fielders. He reports these data, ultimately, as Runs Saved/27 outs above average (I *think* this means 27 outs recorded by that player, not per actual game...?).
  • Davenport. Clay Davenport uses a fielding system (I don't know how it works) to generate the fielding values reported in Baseball Prospectus. I've seen some studies that indicate that these values don't perform very well, but I'm not willing to discount them yet...particularly because they are one of the few stats for which I can find multiple years of data. Like most of the others, these statistics are reported as runs +/- average. Since they are not adjusted for playing time (they are given along with the number of games played at that position), I adjust these numbers as if the player had played 150 games at that position.
  • After these statistics, I will report a few additional items depending on the player's position.
    • For first and third basemen, John Dewan (Fielding Bible) reports player performance on bunts according to a scoring system (scores move towards 1 when a player prevents runner advancement on a bunt, score moves towards 0 when a player boots the ball or allows a hit). From this score, I will subtract the league average and multiply times 100 (roughly analogous to the percentage values below) to get a +/- value for easy comparison.
    • For second basemen and shortstops, I will report John Dewan's (Fielding Bible) GDP value, which is the proportion of double plays converted per opportunity in which the player was involved. Again, I subtract the league average GDP score from this value and multiply by 100 to get an easy to compare +/- score.
    • For outfielders, I will report two statistics on throwing arms:
      • DewanHold - In the Fielding Bible, John Dewan reports the percent of runners who took an extra base on an outfielder when such an opportunity presented itself. To convert these values to a more easily understandable +/- score, I've taken this number and subtracted it from the league average runner advanced percentage at that outfield position. Note that this changes the meaning of the value from "runners advanced" to "runners held." For example, last year, runners advanced 36.3% of opportunities on Adam Dunn. The league average was 38.1%. Therefore, subtracting 36.3 from 38.1 results in Adam Dunn "holding" 1.8% more runners than average (including kills).
    • Catchers have a few special stats (I also omit several stats from other sources that either provide no info on catchers [fielding bible], or it is unclear to me how to interpret their numbers [zone rating, gassko, davenport]):
      • PintoGB - On David Pinto's PMR page, he released evaluations of catcher range on ground balls only. In general, I think a catcher's range is fairly irrelevant compared to their other "fielding" responsibilities - working with the pitcher, calling the game, blocking balls in the dirt, throwing out would-be base-stealers, etc. Nevertheless, it's nice info to know. Therefore, this value gives an indication of the number of runs saved (vs. average) on bunts and very softly-hit ground balls.
      • Passed Balls/150g - Number of passed balls, adjusted as if the catcher played 150 nine-inning games. I'd rather present some statistic about the proportion of balls in the dirt successfully blocked, but I haven't been able to find that info (yet). Nonetheless, this should give an idea, however poor, as to the catcher's ability to catch the balls the pitcher throws.
      • Caught Stealing - the number of baserunners he gunned down.
      • CS% - the percentage of would-be base-stealers the catcher threw out. The break-even point, at least with nobody out, is 28% (100%-72%; see section on stolen bases above). Less than that, and the catcher is permitting the other team to profit by stealing bases. More than that and the catcher has caused opposing teams to run themselves out of baseball games. That's a good thing.
      • ERAeffect - A stat that I "invented." ESPN.com provides a statistic called CERA, which is the ERA of all pitchers when that catcher is in the ballgame. To compare this to the team ERA, I simply do this: ERAeffect = CERA - TeamERA. A negative ERAeffect means that the team's pitchers had a lower ERA when that catcher was playing than they did over the entire season. The stat is somewhat limited, but you'd expect that a catcher who calls a brilliant ballgame or really handles his pitchers well might consistently achieve a negative ERAeffect (best compared over multiple seasons). The converse might be true for a catcher who is generally a disaster with his pitching staff.
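Several of the fielding adjustments above are simple arithmetic, so here is a rough Python sketch of them. The function names are my own, and the playing-time scaling is an assumption: a plain linear projection from innings actually played to 150 nine-inning games, which is one reasonable reading of the adjustments described above rather than an exact reproduction of them.

```python
def per_150_games(value: float, innings: float,
                  games: int = 150, innings_per_game: float = 9.0) -> float:
    """Linearly scale a counting value (plays, runs, passed balls)
    from the innings actually played to a 150-game workload."""
    return value * (games * innings_per_game) / innings


def dial_gassko(dial_zr_runs: float, gassko_range_runs: float) -> float:
    """D*G blend, using the regression weights quoted above."""
    return 0.67 * dial_zr_runs + 0.33 * gassko_range_runs


def runners_held(league_advance_pct: float, player_advance_pct: float) -> float:
    """DewanHold: positive means fewer runners advanced than league average."""
    return league_advance_pct - player_advance_pct


def caught_stealing_pct(caught: int, stolen_bases_allowed: int) -> float:
    """Share of attempted steals on which the catcher threw the runner out."""
    return caught / (caught + stolen_bases_allowed)


def era_effect(cera: float, team_era: float) -> float:
    """Negative means the staff's ERA was lower with this catcher behind the plate."""
    return cera - team_era


if __name__ == "__main__":
    # Adam Dunn example from above: league 38.1%, Dunn 36.3%
    print(f"DewanHold: {runners_held(38.1, 36.3):+.1f}")
    # Hypothetical fielder: +5 plays in 675 innings, projected to 150 games
    print(f"Plays per 150 games: {per_150_games(5, 675):+.1f}")
```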
Pitching Statistics

Within the past decade, how we look at pitchers has turned completely upside-down with the discoveries associated with Defense Independent Pitching Statistics (DIPS), initiated by Voros McCracken. These statistics were designed to separate a pitcher's contribution to defense from that of the fielders. Initial attempts were to look only at those statistics that pitchers had sole responsibility for -- strikeouts, walks, and home runs -- and see how well you could predict pitching performance from them. And it turned out that this could be done very well! However, in the process of doing this, an amazing discovery was made -- with a few exceptions, pitchers exerted almost NO control over whether balls hit into play became outs or not.

If you haven't encountered this before, I'm sure you're incredulous right now -- I certainly was when I first encountered it. Here are some good primers on DIPS:
Here are the statistics I will use most often on this site:
  • IP - Innings Pitched, how many.
  • K/9 - Strikeouts per nine innings. Purists often dislike strikeouts, but they are very important stats for teams with poor defense and/or run-inducing ballparks. League average is around 6.
  • BB/9 - Walks per nine innings. League average is around 3.3.
  • HR/9 - Home Runs per nine innings. League average is around 1.1. A key statistic for the Reds, given the HR-friendly conditions of their otherwise pitcher's park.
  • BABIP - Batting Average on Balls hit Into Play -- BABIP = (H-HR)/(AB-K-HR). Or, if you're using Lahman's database and do not have access to AB's, you can use BFP (batters faced by pitcher): BABIP = (H-HR)/(BFP-K-BB-HBP-HR). This is the average for any ball hit into fair play that is not a home run. It is affected primarily by the defensive ability of the team, as well as luck. Please note that some sites report a different BABIP, calculated without subtracting the home runs. Since we use BABIP to understand why ERA deviates from FIP, it makes sense to use the formulae I describe here. Average BABIP is usually around 0.290.
  • ERA - Earned Run Average, or Earned Runs per nine innings. The classic pitching statistic, calculated as: ERA = (earned runs) / IP * 9. League average is around 4.40 or so. While this does not penalize a pitcher for defensive miscues (errors), it can penalize a pitcher for poor defensive range among his fielders. Which is why I also report FIP (see below).
  • FIP - Fielding Independent Pitching -- FIP = (13*HR + 3*(BB+HBP) - 2*K) / IP + 3.2 (the latter is a constant and should probably be adjusted to the league). You can basically think of this as ERA, but controlling for the luck and fielding that determine BABIP (see above). This is the same as the DICE formula given at Wikipedia, except that the constant is different. This stat works best among "average" pitchers in the 3.30 to 4.75 or so ERA range from what I've seen. I'll be posting an analysis about this some day.
  • PERA - Peripheral ERA. This is similar to FIP above, except that it also makes use of H/9, meaning that defense can come into play. But it does get around some of the wildly lucky or unlucky pitching performances that are more the result of sequences of events rather than hit rates. It is also park and league corrected. I'd prefer to have something more analogous to FIP that was park corrected, but this is the stat I can more easily find. I have not located an equation for calculating this stat, but I get the numbers from Baseball Prospectus.
  • VORP - Value over Replacement Player. This is analogous to the VORP for hitters, except that here pitchers are compared to replacement players for their role (starter or reliever). It is park-specific. Again, this is Baseball Prospectus material.
  • GB% - Ground Ball Percentage. This is Baseball Prospectus's Ground Ball calculation, which is based on all hit balls -- not just outs, which is how you usually see this stat calculated.
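The pitching formulas above that I can actually calculate are easy to script. Here is a small Python sketch; the function names and the sample pitcher are invented for illustration, and IP is assumed to be a true decimal number of innings (so 200 1/3 innings is 200.333..., not the box-score shorthand 200.1).

```python
def per_nine(count: float, ip: float) -> float:
    """Rate per nine innings, for K/9, BB/9, and HR/9."""
    return 9 * count / ip


def era(earned_runs: float, ip: float) -> float:
    return earned_runs / ip * 9


def babip_from_ab(h: int, hr: int, ab: int, k: int) -> float:
    """BABIP with home runs removed from both hits and balls in play."""
    return (h - hr) / (ab - k - hr)


def babip_from_bfp(h: int, hr: int, bfp: int, k: int, bb: int, hbp: int) -> float:
    """Same statistic when only batters faced (BFP) is available."""
    return (h - hr) / (bfp - k - bb - hbp - hr)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.2) -> float:
    """FIP with the 3.2 constant used above; ideally tuned per league."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Hypothetical pitcher season
    ip, k, bb, hbp, hr, h, er, ab = 200.0, 150, 60, 5, 25, 190, 90, 760
    print(f"K/9 {per_nine(k, ip):.2f}  BB/9 {per_nine(bb, ip):.2f}  "
          f"HR/9 {per_nine(hr, ip):.2f}")
    print(f"ERA {era(er, ip):.2f}  FIP {fip(hr, bb, hbp, k, ip):.2f}  "
          f"BABIP {babip_from_ab(h, hr, ab, k):.3f}")
```

Comparing the ERA and FIP lines for the same pitcher, alongside his BABIP, is the basic diagnostic described above: a big gap between the two usually comes with a BABIP well away from 0.290.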
Projections
There are two projection stats that I will use here with any frequency, PECOTA and ZiPS:
  • PECOTA75 - the 75th percentile PECOTA score, which is basically what we could reasonably expect from this player if they have a good year
  • PECOTA - the weighted mean PECOTA, which is what is reported in the BP annual
  • PECOTA25 - the 25th percentile PECOTA score, which is what we could expect from this player if they have a bad year.
  • ZiPS - the most recent ZiPS projections for that player.
PECOTA is the Baseball Prospectus system, and is extremely complicated. It starts by generating a large list of comparable players to the player of interest, and then using those comparable players to predict how a player will perform in the future. The method is sound because it can draw on the rich history of past baseball performances, and most players do have very similar players who have played in the past. ZiPS are values reported over at Baseball Think Factory by Dan Szymborski. I have yet to find out how they are calculated.

That should do it for now. There may be other stats I pull in from time to time, but these will be the ones I will focus most of my effort on in the immediate future. -j

2 comments:

  1. hey that's not how you figure slugging percentage. it's h+2*2b+3*3b+4*hr/ab

  2. Sorry, but that's not correct. If you replaced "h" with "1b" your equation would be right. But by using hits, you're giving too many bases for extra-base hits.
    -Justin
