Monday, March 06, 2006

Lineup Construction

Cross-posted at RedsZone.

I just ran across this article over at the Hardball Times, which linked to an earlier article by Cyril Morong at Beyond the Boxscore. Cyril did a regression analysis in which he looked at the effect of OBA and SLG on team runs for each position in the batting order. Based on his analysis, other bloggers have created various scripts and tools that take a set of nine hitters and spit out the best possible lineup. A nice tool, written by David Pinto at Baseball Musings, can be found here. It lets you quickly input your favorite team's starting nine and see what batting order Morong's regression analysis would recommend.

My understanding is that it all works like this: for each position in the lineup, Morong's regression analysis produced a coefficient for OBA and a coefficient for SLG. Pinto's tool has you enter nine batters, along with each one's OBA and SLG. First, the tool calculates the expected runs per game for the lineup you entered by multiplying each player's OBA and SLG by the coefficients for his lineup position and adding up the results. Next, it tries every other possible ordering of the nine players and sorts these lineups by expected runs produced.
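As a rough sketch of that procedure as I understand it (this is not Pinto's actual code), here is a brute-force version in Python. The function names, the `intercept` term, the toy three-slot coefficients, and the toy stat lines are all made up for illustration; they are not Morong's values or real players' numbers.

```python
from heapq import nlargest
from itertools import permutations


def expected_runs(lineup, coeffs, intercept=0.0):
    """Linear estimate: intercept + sum over slots of (OBA coef * OBA + SLG coef * SLG)."""
    return intercept + sum(
        c_oba * oba + c_slg * slg
        for (_name, oba, slg), (c_oba, c_slg) in zip(lineup, coeffs)
    )


def rank_lineups(players, coeffs, top=30, intercept=0.0):
    """Score every batting order and return the `top` best as (runs, order) pairs."""
    scored = (
        (expected_runs(order, coeffs, intercept), order)
        for order in permutations(players)
    )
    return nlargest(top, scored, key=lambda pair: pair[0])


# Toy three-slot example. Every number below is a placeholder.
TOY_COEFFS = [(2.0, 0.5), (0.5, 2.0), (0.2, 0.2)]  # (OBA coef, SLG coef) per slot
TOY_PLAYERS = [
    ("Slugger", 0.300, 0.600),  # (name, OBA, SLG)
    ("OnBase", 0.400, 0.350),
    ("Pitcher", 0.100, 0.100),
]

for runs, order in rank_lineups(TOY_PLAYERS, TOY_COEFFS, top=3):
    print(f"{runs:.3f}", [name for name, _, _ in order])
```

With nine hitters there are 9! = 362,880 possible orders, so trying them all is perfectly feasible; nothing smarter than brute force is needed.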

To try it out, I entered the lineup that I would recommend for the Reds at the start of the season (Harang is included as the pitcher to show where a horrible-hitting pitcher would be placed--his OPS last year was 0.054!):

1. Freel
2. Lopez
3. Griffey
4. Dunn
5. Kearns
6. LaRue
7. Pena
8. Encarnacion
9. Harang

Pinto's tool indicated that this lineup would produce, on average, 4.541 runs/game. What were the best lineups? Pinto's tool spits out the top 30, and below, for each position, I list the players selected, followed by the number of those 30 lineups in which each player appears at that position. The estimated runs per game of these 30 lineups hardly differed at all, ranging from 4.924 to 4.942. Over a 162-game season, Morong's analysis predicts that if these optimized lineups were used instead of mine, we could see an increase in run production from 736 runs to 798-801 runs--a difference of 62 to 65 runs per season (between 6 and 7 wins)! It's also interesting to note that these same players, put into the worst possible lineup order, would produce only 3.943 runs per game (639/season, 162 runs fewer than the maximum, or about 16 wins). So lineup order does matter!
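For anyone who wants to check the season arithmetic, here is a small sketch. The runs-per-game figures are the ones quoted above; the conversion of about 10 runs per win is a common rule of thumb that I am assuming here, not something taken from Morong's analysis, and the function names are my own.

```python
GAMES = 162
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not from the regression


def season_runs(runs_per_game, games=GAMES):
    """Scale a per-game run estimate to a full season."""
    return runs_per_game * games


def wins_gained(rpg_new, rpg_old, games=GAMES, runs_per_win=RUNS_PER_WIN):
    """Convert a difference in runs per game into approximate wins over a season."""
    return (rpg_new - rpg_old) * games / runs_per_win


mine, best_low, best_high, worst = 4.541, 4.924, 4.942, 3.943

print(round(season_runs(mine)))                                      # 736
print(round(season_runs(best_low)), round(season_runs(best_high)))   # 798 801
print(round(season_runs(worst)))                                     # 639
print(round(wins_gained(best_low, mine), 1),
      round(wins_gained(best_high, mine), 1))                        # 6.2 6.5
print(round(wins_gained(best_high, worst), 1))                       # 16.2
```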

1. Dunn (18), LaRue (8), Lopez (4)
2. Dunn (9), Lopez (8), Griffey (6), LaRue (4), Kearns (3)
3. Kearns (12), LaRue (10), Lopez (4), Encarnacion (4)
4. Griffey (24), Lopez (3), Pena (2), Kearns (1)
5. Lopez (11), LaRue (8), Kearns (8), Dunn (3)
6. Pena (18), Encarnacion (11), Kearns (1)
7. Encarnacion (15), Pena (10), Kearns (5)
8. Harang (30)
9. Freel (30)

Whoa! Dunn leading off? Harang batting 8th? My boy Freel hitting 9th?? What is going on here?!?

To decipher this, I turned to this article by Dan Scotto, which offered some nice guidance on what Morong's analysis found and helped me interpret why it's doing what it's doing. After reading Scotto's descriptions and staring at the coefficients for a while, here is my take on what the analysis recommends for each position and why it's choosing the players it is choosing (the numbers in parentheses are ranks from a descending sort of the OBA and SLG coefficients across the nine lineup positions; better ranking = larger coefficient = higher rewards (runs) for this attribute at this lineup position):
  1. (#1 ranked OBA, #7 ranked SLG) This one is all about on-base percentage. The more likely this player is to get on base, the more likely it is that the players behind him will drive him in. This is why Dunn does so well here; he gets on base more reliably than anyone else on the team (it should be noted that the simulator knows nothing about speed or strikeouts).
  2. (#3 OBA, #2 SLG) The #2 hitter is the best, most balanced hitter. Last year, Dunn, Lopez, and Griffey all had relatively high OBA (0.352+) and SLG (0.480+), and thus they all fit in fairly well in this spot (though Griffey's high SLG makes him a better candidate for a position that really emphasizes SLG...see below).
  3. (#5 OBA, #6 SLG) The conventional wisdom is that your best pure hitter should go here -- Sean Casey in his prime, for example. Or perhaps your best power hitter who you want to make sure hits in the first inning, like McGwire in his prime. But, in fact, the analysis indicates that a relatively average player should go here, perhaps in order to spread around your automatic outs. I had a hard time believing this, as it seems like you'd want to at least have a high OBA here. Nevertheless, the relatively low coefficients indicate that varying OBA or SLG at this lineup position changed total runs scored relatively little compared with other positions. Very surprising. Based on last year's statistics, Kearns and LaRue fit this bill the best. A stronger season from Kearns and a weaker season from LaRue would probably place LaRue here and Kearns in more valuable lineup positions (see ZiPS work below).
  4. (#7 OBA, #1 SLG) The complete opposite of the leadoff hitter. This position rewards SLG above all else, so you want your biggest bopper here. Griffey was the clear choice; if another very high OBP guy was in the lineup, however, I wouldn't be surprised to see Adam Dunn being placed here as well.
  5. (#4 OBA, #5 SLG) This is another spot that demands balance, like the #2 hole. Rewards aren't quite as good from this position as #2, but the regression's recommendation contrasts with the more traditional high power, low OBP guys that I've always heard belonged in the 5 hole. Still, it's very surprising that this player should be a better hitter in terms of both OBA and SLG than the #3 guy. Anyway, Lopez, LaRue, and Kearns all fit in well here with relatively good balance.
  6. (#8 OBA, #3 SLG) Following the balanced player in #5 is the guy I usually think of for a #5 hitter. Poor on base average, but a guy who can knock the heck out of the ball. Essentially, this is the same type of player you'd put in the 4 hole, it's just that he's not as good. Wily Mo Pena is the obvious choice.
  7. (#7 OBA, #4 SLG) This guy is fairly similar to the #6 hitter, but doesn't quite have the power and has a bit better ability to get on base. EdE fits in well here, as he can get on base better than Pena, yet still does have good power (9 HR in 211 AB's last year).
  8. (#9 OBA, #9 SLG) Surprise! This should be your absolute worst hitter, which for just about any NL team will be the pitcher. Increasing OBA or SLG results in fewer runs gained here than at any other position in the lineup. Now all NL teams (except for a brief experiment by LaRussa back in McGwire's prime) bat the pitcher 9th, because they want to minimize the number of at bats this player receives and postpone, for as long as possible, the need to pinch hit for him later in the game. But as we'll see, the 9th hitter can be a very productive player:
  9. (#2 OBA, #8 SLG) The OBA coefficient for the #9 hitter (2.55) was more than twice as large as that for the #8 hitter (1.188). This means that an increase in OBA in the #8 hole resulted in less than half as many additional runs as the same increase in the #9 hole. Why this dramatic discrepancy? Because the #9 hitter will be on base for your best hitters - the guys in the #1, #2, and #4 holes. The #9 hitter's SLG matters very little, however, because few people are likely to be on base when he comes up to bat (especially with the pitcher hitting in front of him!). Ryan Freel, a high OBA, low SLG player, is the prototypical guy for this spot. Another example might be someone like Frank Menechino.
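
To put the #8 versus #9 comparison in concrete terms, here is a small sketch using the two OBA coefficients quoted above (1.188 and 2.55). It assumes the regression is linear, so that a coefficient is simply runs per game per unit of OBA; the function name and the 10-point (0.010) OBA bump are just my illustration.

```python
GAMES = 162
OBA_COEF_8 = 1.188  # OBA coefficient for the #8 slot, as quoted above
OBA_COEF_9 = 2.55   # OBA coefficient for the #9 slot, as quoted above


def extra_runs(coef, delta_oba, games=GAMES):
    """Season runs added by raising OBA by `delta_oba` at a slot with this coefficient."""
    return coef * delta_oba * games


bump = 0.010  # hypothetical 10-point improvement in OBA
print(f"#8 slot: {extra_runs(OBA_COEF_8, bump):.2f} runs/season")
print(f"#9 slot: {extra_runs(OBA_COEF_9, bump):.2f} runs/season")
print(f"ratio:   {OBA_COEF_9 / OBA_COEF_8:.2f}")
```

Under those assumptions, the same improvement is worth roughly 2 runs a season from the #8 hole and roughly 4 from the #9 hole.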

Now I think a lot of people would predict some differences from last year's performance among our players. I'm expecting Edwin Encarnacion and Austin Kearns to be quite a bit more productive than they were last year, and I would not be surprised (unfortunately) to see Jason LaRue's production drop off a bit. Therefore, I did a second run based on Baseball Think Factory's 2006 ZiPS Projections. Here are the results:

1. Dunn (22), Kearns (6), Lopez (1), LaRue (1)
2. Griffey (10), Kearns (10), Dunn (7), Lopez (1), Encarnacion (1)
3. LaRue (18), Lopez (9), Encarnacion (2), Kearns (1)
4. Griffey (16), Pena (13), Kearns (1)
5. Kearns (9), Lopez (9), Griffey (5), LaRue (4), Encarnacion (3)
6. Pena (15), Encarnacion (11), Lopez (2), LaRue (1), Kearns (1)
7. Encarnacion (13), Lopez (8), LaRue (5), Kearns (2), Pena (2)
8. Harang (30)
9. Freel (30)

A few differences in who wins out at the fiercely contended #2 and #5 spots (balanced players), as well as the #3 spot (the leftover player), but #'s 1, 4, 6, 7, 8, and 9 are all the same. In fact, the only major difference is LaRue's "dominance" in the #3 hole with these projections, caused no doubt by his predicted return to mediocrity and Kearns' predicted improvement. Nevertheless, the lineup recommendations are remarkably static, indicating that each spot in the lineup really does have an optimal role that corresponds to particular players' strengths in the Reds lineup.
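The comparison between the two runs can be checked mechanically. The sketch below transcribes the two sets of tallies listed above and reports the lineup positions where the most frequently chosen player is the same in both runs. Comparing only the leader at each position (and treating ties as a set of co-leaders) is my own choice of criterion, and the names `LAST_YEAR`, `ZIPS`, and `leaders` are made up for the example.

```python
# Per-position tallies (player -> number of top-30 lineups), transcribed from the lists above.
LAST_YEAR = [
    {"Dunn": 18, "LaRue": 8, "Lopez": 4},
    {"Dunn": 9, "Lopez": 8, "Griffey": 6, "LaRue": 4, "Kearns": 3},
    {"Kearns": 12, "LaRue": 10, "Lopez": 4, "Encarnacion": 4},
    {"Griffey": 24, "Lopez": 3, "Pena": 2, "Kearns": 1},
    {"Lopez": 11, "LaRue": 8, "Kearns": 8, "Dunn": 3},
    {"Pena": 18, "Encarnacion": 11, "Kearns": 1},
    {"Encarnacion": 15, "Pena": 10, "Kearns": 5},
    {"Harang": 30},
    {"Freel": 30},
]
ZIPS = [
    {"Dunn": 22, "Kearns": 6, "Lopez": 1, "LaRue": 1},
    {"Griffey": 10, "Kearns": 10, "Dunn": 7, "Lopez": 1, "Encarnacion": 1},
    {"LaRue": 18, "Lopez": 9, "Encarnacion": 2, "Kearns": 1},
    {"Griffey": 16, "Pena": 13, "Kearns": 1},
    {"Kearns": 9, "Lopez": 9, "Griffey": 5, "LaRue": 4, "Encarnacion": 3},
    {"Pena": 15, "Encarnacion": 11, "Lopez": 2, "LaRue": 1, "Kearns": 1},
    {"Encarnacion": 13, "Lopez": 8, "LaRue": 5, "Kearns": 2, "Pena": 2},
    {"Harang": 30},
    {"Freel": 30},
]


def leaders(tally):
    """Return the set of players tied for the highest count at one position."""
    top = max(tally.values())
    return {player for player, count in tally.items() if count == top}


same_leader = [
    slot
    for slot, (old, new) in enumerate(zip(LAST_YEAR, ZIPS), start=1)
    if leaders(old) == leaders(new)
]
print(same_leader)  # [1, 4, 6, 7, 8, 9]
```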

Of course, these predictions are inferences based on looking at variances in performance at each lineup position from '98 to '02, not actual experimental data. The best evidence for these claims would come from actually having a team try these ideas out, which unfortunately is unlikely to ever happen. I may try to do some additional toying with Pinto's tool, or maybe even some simulations, at a later date. For now, however, it's an interesting thought exercise.

A few other quick notes:
  • Using ZiPS Projections, replacing Encarnacion with Aurilia at 3B drops the maximum optimized run production from 4.912 to 4.834 runs per game (12.6 runs total in a season - about 1 win). Not a huge difference, but given Encarnacion's upside at this point in his career, starting him seems the obvious move.
  • Again using ZiPS Projections, replacing Freel with Womack at 2B drops the maximum optimized run production from 4.912 to a dreadful 4.670 runs per game (39.2 runs total difference over the season, or about 4 wins!). Both Freel and Womack are always placed in the #9 hole in the top 30 lineups; what we're seeing is the benefit of high OBA from that lineup position.
-JinAZ

3 comments:

  1. Oh great, now I have ANOTHER blog that I have to read every day. You're just trying to show me up by starting a blog and having a kid at the same time. Sheesh! :D

  2. The finding that the #3 hitter should be such an average hitter, based on the above analysis, has been nagging at me. As TeamSelig pointed out in my post over at RedsZone (see cross-post link above), the #3 hitter will receive more at-bats than even the #4 hitter. Why would you put a crappy hitter in that position?

    The recent Hardball Times article that got me thinking about all of this in the first place may be starting to give us an answer on this point. It discusses a chapter on lineup construction in The Book, which is now high up on my birthday wishlist. :)

    Points from the article:
    * First, by putting a very high OBA guy (as is traditionally done) in the #3 hole, you're sacrificing runs in the second inning, because the #3 hitter is the least likely to lead off that inning.
    * Second, the #3 hitter is the most likely player to hit with two outs and nobody on base among any of the top-5 lineup spots.

    Another interesting tidbit on speed guys:
    * Stolen bases are most valuable when in front of singles hitters who don't strike out much.
    * A caught stealing attempt is more costly when in front of power hitters than singles hitters.

    Both of these seem "obvious," but also unappreciated. They argue for a speed guy hitting 6th when he is not among your five best hitters, as this will put him in front of lower-power guys, maximizing the reward/cost ratio for any theft activity.

    I'm not sure I'd go so far as to bat the winner of the Freel/Womack competition 6th, however. The regression analysis showed big payoffs for a power-hitting #6 batter, and given that we have Wily Mo Pena on our roster, I think he has to go #6. So it comes down to either Edwin Encarnacion or Freel/Womack at #7, with the pitcher #8. My druthers would be to hit Freel/Womack 7th, as this gives you the option of stealing in front of a pinch hitter late in the game, or in front of EdE. Much better to do that than to steal in front of Adam Dunn. -JinAZ
