
Thursday, April 30, 2009

Player value series, part 7: How should we handle park factors?

Note: I've changed enough about the way I calculate player value for hitters that I probably need to re-write this entire series (click player value below to see the whole set of posts). But I wanted to get this online so I can show my work, so we'll run with it for now and then fix the other articles later.

One of the more distinctive aspects of baseball is the substantial influence that the ballpark can have on the outcome of (at the least) batted balls. While this is an aspect of baseball that many of us enjoy, it also presents a challenge to our ability to assess player value. For example, Brian Giles hit 0.306/0.398/0.456 for the Padres last season, which was good for a 0.376 wOBA. And Brad Hawpe hit 0.283/0.381/0.498 for the Rockies, which gave him a 0.379 wOBA.

They had pretty similar seasons according to those numbers, with Giles being slightly better at getting on base, and Hawpe showing more power. The problem, of course, is that Giles played half of his games in PETCO Park, which is a notorious pitcher's park. And Hawpe played half of his games at Coors Field, which even post-humidor is the best hitter's park in baseball. Presumably, if you had swapped where the two players were hitting, Giles would have vastly outperformed Hawpe.

How do we reconcile this in our player valuations? The traditional approach has been to use park factors. The best article on the web about park factors is Patriot's, and I won't replicate his work here. But briefly, a park factor can be conceptualized as simply this ratio:

Avg runs scored per game at a ballpark
-----------------------------------------
Avg runs scored per game at all ballparks

So, if you see a park factor of 1.18, for example, that means that about 18% more runs are scored at that ballpark than in a typical ballpark. And a ratio of 0.84 means that about 16% fewer runs are scored at the ballpark in question than in a typical ballpark.

A quick and dirty way of calculating park factors is to simply divide a team's total runs scored and allowed at home by its runs scored and allowed in away games. So, in that case, a 1.18 park factor would mean that a team and its opponents combine for ~18% more runs in its home games than in its road games.
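As a minimal sketch of that quick-and-dirty calculation: the function name and the run totals below are made up for illustration (they are not a real team's numbers), and the per-game normalization is an assumption added so the ratio still makes sense if the home and road game counts differ.

```python
def simple_park_factor(home_rs, home_ra, away_rs, away_ra,
                       home_games=81, away_games=81):
    """Quick-and-dirty park factor: runs per game (scored + allowed) at home
    divided by runs per game (scored + allowed) on the road."""
    home_rpg = (home_rs + home_ra) / home_games
    away_rpg = (away_rs + away_ra) / away_games
    return home_rpg / away_rpg


# Hypothetical totals, chosen only to illustrate the arithmetic.
pf = simple_park_factor(home_rs=420, home_ra=406, away_rs=350, away_ra=350)
print(round(pf, 2))  # 826 home runs vs. 700 road runs -> 1.18
```

Using runs scored plus runs allowed keeps the quality of the home team's own offense and pitching from leaking into the estimate, since both show up in the home and road totals alike.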

There are a variety of additional complications if you're going to do it properly, and Patriot discusses them. But to be honest, most of the time, that simple approach will get you most of the way there. Nevertheless, there are three factors to worry about that are important and make a big difference:

1) If you apply a home runs/away runs park factor to an individual player's run stats in order to discount (in Hawpe's case) or boost (in Giles' case) his value rating, it's important to first cut the park factor's distance from 1.0 in half (meaning 1.18 would become 1.09, and 0.84 would become 0.92). This is because the 1.18 park factor described above would be an appropriate adjustment only for games played in the park. Typically, though, players play half of their games in other parks, which, on average, will have a park factor of ~1.0. So, if we split the difference, we'll get a multiplier that makes sense in light of the 81 home/81 away game schedule. Once you have this number, you just divide your player's absolute runs estimate by your park factor (but see #2 and #3 below). Many people (like Patriot and Szymborski) have already made this adjustment to the numbers they post, but be sure to check on this wherever you get your park factors.

2) Park factors are variable as all heck from year to year. Patriot posts 5-year averaged, regressed park factors, which in my view are the most reliable park factors on the 'net. The reason is that by both averaging and regressing, he's accounting for the fact that any park factor estimate has large error bars around it, and the true park factor is probably a bit closer to league average than even your 5-season average indicates.

3) As pointed out by Tango and others, there may be problems with simply dividing Hawpe's absolute batted runs number by 1.09. The issue is that this adjustment will have a more significant impact on good hitters than poor hitters. A park factor of 1.10 would strip about 9 runs from a 100-run hitter's totals, but only about 3 runs from a 30-run hitter's totals...and yet there's no evidence indicating that a good hitter's value should be discounted at a greater rate than a poor hitter's value.
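To make points #1 and #3 concrete, here is a small sketch of the halving step and of the ratio adjustment applied to a good and a poor hitter. The function names are hypothetical, and the 100-run and 30-run hitters are just the illustrative cases from the text.

```python
def halve_park_factor(home_pf):
    """Move a home-games-only park factor halfway toward 1.0, reflecting a
    schedule split evenly between home and (roughly neutral) road parks."""
    return 1.0 + (home_pf - 1.0) / 2.0


def ratio_adjust(runs, season_pf):
    """Apply a ratio park factor to an absolute runs total."""
    return runs / season_pf


print(round(halve_park_factor(1.18), 2))  # 1.09
print(round(halve_park_factor(0.84), 2))  # 0.92

# The same 1.10 factor removes very different run totals from each hitter.
for runs in (100, 30):
    removed = runs - ratio_adjust(runs, 1.10)
    print(runs, round(removed, 1))  # 100 -> 9.1 runs removed, 30 -> 2.7
```

The ratio approach takes roughly three times as many runs from the better hitter, even though both played in the same park.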

A solution to this problem--and, at the same time, a convenient way to apply park factors directly to RAA or RAR data (ratios can only be applied to absolute runs data)--is to convert our traditional ratio-based park factors to an additive park factor. Once we do this, we just add or subtract a certain small fraction of runs per PA to each hitter. It's an extremely easy and straightforward way to handle park factors, and yet is something that I rarely see done.

So, here are conversions of Patriot's 2008 5-year regressed park factor ratios into additive park factors. I show them per PA and per 700 PA (a season's worth) to help us understand how large an effect we're working with here. (Methods: I took the total MLB runs in 2008, divided by a given park factor, and took the difference in runs between the adjusted and unadjusted runs. I then divided this difference by total 2008 MLB PA to get the per-PA adjustments.)

Year  Team  PF    Runs/PA Adj  R/700 PA Adj
2008  ARI   1.05  -0.0060      -4.2
2008  ATL   1.00   0.0000       0.0
2008  BAL   1.01  -0.0012      -0.8
2008  BOS   1.04  -0.0048      -3.4
2008  CHA   1.04  -0.0048      -3.4
2008  CHN   1.04  -0.0048      -3.4
2008  CIN   1.02  -0.0024      -1.7
2008  CLE   1.00   0.0000       0.0
2008  COL   1.09  -0.0108      -7.6
2008  DET   1.00   0.0000       0.0
2008  FLA   0.98   0.0024       1.7
2008  HOU   0.99   0.0012       0.8
2008  KCR   1.00   0.0000       0.0
2008  LA    0.98   0.0024       1.7
2008  LAA   0.99   0.0012       0.8
2008  MIL   1.00   0.0000       0.0
2008  MIN   1.00   0.0000       0.0
2008  NYY   1.00   0.0000       0.0
2008  NYM   0.97   0.0036       2.5
2008  OAK   0.98   0.0024       1.7
2008  PHI   1.02  -0.0024      -1.7
2008  PIT   0.98   0.0024       1.7
2008  SD    0.92   0.0096       6.7
2008  SEA   0.97   0.0036       2.5
2008  SF    1.01  -0.0012      -0.8
2008  STL   0.98   0.0024       1.7
2008  TB    0.99   0.0012       0.8
2008  TEX   1.03  -0.0036      -2.5
2008  TOR   1.02  -0.0024      -1.7
2008  WAS   1.01  -0.0012      -0.8

Essentially, for each "unit" (0.01) of a park factor above or below 1.00, you subtract or add 0.0012 runs per PA, which works out to be ~0.8 runs per season. This results in a park-induced range of ~15 runs per season between the best hitter's park (Coors) and the best pitcher's park (PETCO).
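The per-unit rule can be sketched in a few lines. This version assumes the linear rule of thumb just stated (0.0012 runs per PA for each 0.01 of park factor) rather than recomputing from league run and PA totals, and the function and constant names are hypothetical.

```python
RUNS_PER_PA_PER_POINT = 0.0012  # from the post: one 0.01 "unit" of park factor
SEASON_PA = 700                 # the post's "season's worth" of PA


def additive_adjustment(pf):
    """Return (runs per PA, runs per 700 PA) to ADD to a hitter's totals.

    Negative values discount hitters in hitter's parks (PF > 1.00);
    positive values credit hitters in pitcher's parks (PF < 1.00).
    """
    points = (pf - 1.0) * 100  # e.g. 1.09 is 9 points above neutral
    per_pa = -points * RUNS_PER_PA_PER_POINT
    return per_pa, per_pa * SEASON_PA


for team, pf in [("COL", 1.09), ("SD", 0.92), ("NYM", 0.97)]:
    per_pa, per_700 = additive_adjustment(pf)
    print(f"{team} {pf:.2f} {per_pa:+.4f} {per_700:+.1f}")
```

For these three parks the sketch gives the same per-PA and per-700-PA values as the table (-0.0108 and -7.6 for Colorado, +0.0096 and +6.7 for San Diego, +0.0036 and +2.5 for the Mets).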

Put another way, if you took a true 30 RAR hitter and played him for the Rockies in Colorado, you'd expect him to produce ~38 RAR (raw, without park adjustments). That same hitter would be expected to produce ~23 RAR in San Diego. So by subtracting 8 runs/season from your Colorado hitter, or adding 7 runs to your San Diego hitter, you can properly estimate the player's true hitting performance (~30 runs above replacement).
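Here is that example as a quick sketch, using the per-PA adjustments for Colorado and San Diego from the table. The function name is hypothetical, and the 700 PA figure is the same "season's worth" assumption used above.

```python
def park_adjusted_rar(raw_rar, pa, adj_per_pa):
    """Add the park's per-PA run adjustment, scaled by playing time,
    to a hitter's raw (unadjusted) runs above replacement."""
    return raw_rar + pa * adj_per_pa


# Per-PA adjustments taken from the table (COL -0.0108, SD +0.0096).
colorado = park_adjusted_rar(raw_rar=38, pa=700, adj_per_pa=-0.0108)
san_diego = park_adjusted_rar(raw_rar=23, pa=700, adj_per_pa=0.0096)

print(round(colorado, 1))   # 30.4
print(round(san_diego, 1))  # 29.7
```

Because the adjustment scales with PA rather than with the hitter's run total, a part-time player gets a proportionally smaller correction, and a good hitter and a poor hitter with the same playing time get the same one.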

Isn't that easy?

The same approach could potentially be used on pitchers, but we unfortunately don't tend to use per-PA data to evaluate pitchers. For now, I'm still just using ratio park factors on pitchers, but I'll likely switch to a new approach in the near future once I incorporate some other changes to how I do their player valuations (probably involving Pythagorean-based win estimates for pitchers, like Tango does). More on that at a later date.

Thanks to folks in this thread at Baseball Fever for helping me to finally figure some of this stuff out. Assuming, of course, I really have figured this stuff out...