I've been concerned about Arroyo's high workload for some time now. When the Reds signed Arroyo to his contract extension this past winter, I wrote "...Arroyo led the league in innings pitched last year (240.7) and was 6th in baseball in Pitcher Abuse Points. Unlike Harang, Bronson isn't a big guy (though I always forget that he is 6'5") and, as Joel pointed out, he does have a somewhat herky-jerky delivery. Therefore, even though he proved to be amazingly durable last season, I think the Reds really need to watch how they use him in the coming seasons if they want him to remain effective throughout the course of this contract."
Tonight, I decided to do a quick study to see if I could bring some numbers to bear on the issue. The specific question I tried to evaluate was whether a high workload in one start is followed by poorer performance in the next start. To test this, I looked at all of Arroyo's starts since he arrived with the Reds in 2006. I then categorized each start based on the number of pitches Arroyo threw in the prior start (I threw out his first start of 2006 and of 2007, since neither has a prior start that season). For example, on May 1st, Arroyo threw 95 pitches while going 7 innings and allowing one run against Houston. I therefore placed his next start, a 120-pitch outing against Colorado on May 6th, into the 90-99 pitches bin. His start after that, a 117-pitch outing against Los Angeles, then went into the 120+ pitches bin. The result is a dataset that allows me to see whether there has been a relationship between how many pitches Arroyo throws in one start and his performance in the next start.
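For anyone who wants to repeat this on another pitcher, the binning described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not necessarily how the numbers in this post were compiled. The `Start` record, the function names, and the exact bin edges (10-pitch bins, with everything under 80 and everything at 120 or above lumped together) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Start:
    label: str                     # free-text description of the outing
    pitches: int                   # pitches thrown in this start
    first_of_season: bool = False  # no prior start to bin on, so it is skipped


def bin_label(prior_pitches: int) -> str:
    """Map the pitch count of the PRIOR start to a bin name (edges are assumed)."""
    if prior_pitches >= 120:
        return "120+"
    if prior_pitches < 80:
        return "<80"
    low = (prior_pitches // 10) * 10
    return f"{low}-{low + 9}"


def bin_by_prior_start(starts):
    """Group each start under the bin of the start that came before it."""
    bins = {}
    previous = None
    for start in starts:
        if previous is not None and not start.first_of_season:
            bins.setdefault(bin_label(previous.pitches), []).append(start)
        previous = start
    return bins


# The three consecutive starts used as the example in the text.
example = [
    Start("May 1 vs. Houston", 95),
    Start("May 6 vs. Colorado", 120),
    Start("next start vs. Los Angeles", 117),
]
binned = bin_by_prior_start(example)
for name, group in binned.items():
    print(name, [s.label for s in group])
```

The May 1 start has nothing before it in this short list, so it only serves as the "prior start" for the Colorado outing and is not binned itself.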
Here are the data, first in table form, and then graphically:
|Pitches in Prior Start||Number of Starts||IP||H||R||ER||HR||BB||SO||ERA||R/9||FIP||K/9||BB/9||HR/9|
Arroyo's runs allowed per nine innings and FIP (fielding independent pitching, uses peripherals to estimate ERA), plotted by the number of pitches Arroyo threw in the prior appearance. For the graph, I opted to ignore the first two bins in the table due to the minuscule sample sizes.
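For readers unfamiliar with the rate stats in the table, here is a rough sketch of how they can be computed from the counting columns. The FIP formula is the commonly cited simple form. The constant of 3.20 is an assumption (it is normally tuned so that league FIP matches league ERA), and this version ignores hit batsmen because the table has no HBP column, so it may not match the exact formula behind the numbers in this post. The stat line in the example is made up.

```python
def per_nine(count: float, innings: float) -> float:
    """Scale a counting stat (R, ER, SO, BB, HR) to a per-nine-innings rate."""
    return 9.0 * count / innings


def fip(hr: int, bb: int, so: int, innings: float, constant: float = 3.20) -> float:
    """Simple FIP from home runs, walks, and strikeouts.

    `innings` must be true innings (6 2/3 innings is 6.667, not 6.2).
    `constant` is an assumed league adjustment, not a value from this post.
    """
    return (13 * hr + 3 * bb - 2 * so) / innings + constant


# Made-up stat line for illustration: 7 IP, 3 R, 1 HR, 2 BB, 6 SO.
print(round(per_nine(3, 7.0), 2))   # runs allowed per nine innings
print(round(fip(1, 2, 6, 7.0), 2))  # FIP with the assumed constant
```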
As predicted--though I am somewhat surprised at how clean it looks--Arroyo's performance in a start has historically been predictable, to some degree, from his workload in his previous appearance. Lower pitch counts in a start have generally been followed by better performance in the subsequent start. Even if you look only at the two center bins, which have the best sample sizes (100-109 and 110-119), the effect is fairly dramatic. An extra 10 pitches in one start tended to be followed (whether by cause or mere correlation) by about an extra half-run allowed per nine innings in the subsequent start.
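One way to put a number on a trend like this is to fit a line through the bin averages, weighting each bin by its innings so that the small bins count for less. What follows is a sketch of that idea, not the calculation behind the figure above. The function name is my own, and the three data points in the example are made-up placeholders chosen to give a round answer, not Arroyo's actual numbers.

```python
def weighted_slope(xs, ys, weights):
    """Weighted least-squares slope of y on x."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    num = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    den = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return num / den


# Placeholder inputs (NOT real data): bin midpoints, R/9 per bin, innings per bin.
midpoints = [95, 105, 115]
runs_per_nine = [3.0, 3.5, 4.0]
innings = [40.0, 90.0, 70.0]

slope = weighted_slope(midpoints, runs_per_nine, innings)
print(round(slope * 10, 2))  # change in R/9 per extra 10 pitches in the prior start
```

With only a handful of bins, a fit like this is a summary of the table rather than a real statistical test, so I would not read much into the decimal places.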
The effect looks most dramatic in the 120+ bin, though I would caution that the sample size here is still very low. In fact, despite the 5.53 ERA in this group, three of the five starts in this bin were quality starts...it's just that Arroyo got shelled in the others, most notably the 2 IP, 6 run effort against Washington on May 21st. If one removes the Washington outing, his ERA in this group drops to a tidy 3.96, though the FIP remains a high 4.92 thanks to a low strikeout rate and high walk rate.
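Dropping one outing and recomputing, as I did with the Washington start, is easy to script for any bin. The sketch below is a generic illustration. The helper names are hypothetical, and the game lines in the example are invented placeholders rather than Arroyo's actual starts in the 120+ bin.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs per nine innings."""
    return 9.0 * earned_runs / innings


def era_leaving_out(lines, drop_index):
    """ERA of a group of (innings, earned_runs) lines with one outing removed."""
    kept = [line for i, line in enumerate(lines) if i != drop_index]
    return era(sum(er for _, er in kept), sum(ip for ip, _ in kept))


# Placeholder game lines (NOT real data): (innings, earned runs).
lines = [(7.0, 2), (2.0, 6), (6.0, 3)]

full = era(sum(er for _, er in lines), sum(ip for ip, _ in lines))
trimmed = era_leaving_out(lines, drop_index=1)  # drop the short, ugly outing
print(round(full, 2), round(trimmed, 2))
```

When a single blowup moves the group ERA this much, that is a good sign the bin is too small to support firm conclusions.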
What is the cause of Arroyo's struggles following high pitch-count outings? His peripherals show steadily increasing walk and HR-allowed rates, which both seem to indicate poorer control. Much of his success seems to depend on his ability to locate his slow curveball, and perhaps that's more difficult to do when his arm is still tired from the previous outing. He may also have taken a hit on his strikeout rate following extremely long outings, though I'm hesitant to draw much of a conclusion given the sample size issues with the 120+ pitch group.
Comparing Arroyo and Harang
The effect we see in Arroyo's numbers above, despite the relatively small sample sizes, is consistent with what has been observed in other pitchers in other studies of how pitch count and other factors affect performance. Keith Woolner has a nice article in the 2007 Baseball Prospectus Annual that gives a good overview. Nevertheless, I thought it might be informative to compare Arroyo's results to those of the Reds' ace, Aaron Harang. While Harang and Arroyo have had similar success over the past season and a half, they are different sorts of pitchers--Harang is a big man with smooth mechanics, while Arroyo is smaller and more "herky-jerky." Therefore, we might expect less of an effect of heavy work on Harang than on Arroyo.
My procedure was the same: take all of Harang's starts since April 2006 and categorize them based on the number of pitches in his prior start (exception: there was one appearance last season when Harang was used in relief--I skipped over that appearance and the next start because I didn't know how to categorize them). Here are the data:
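Handling the relief appearance just means resetting the "prior start" whenever a non-start shows up, so that both the relief outing and the start after it drop out. Here is a minimal sketch, assuming appearances are given in order as (pitches, is_start) pairs. That input format, the function names, and the bin edges are my own choices for the example, and the pitch counts are placeholders, not Harang's actual outings.

```python
def bin_label(prior_pitches: int) -> str:
    """Bin name for the prior start's pitch count (bin edges are assumed)."""
    if prior_pitches >= 120:
        return "120+"
    if prior_pitches < 80:
        return "<80"
    low = (prior_pitches // 10) * 10
    return f"{low}-{low + 9}"


def bin_starts(appearances):
    """Bin each start by the prior start's pitches, skipping around relief outings."""
    bins = {}
    prior = None
    for pitches, is_start in appearances:
        if not is_start:
            prior = None  # relief outing: skip it, and leave the next start unbinned
            continue
        if prior is not None:
            bins.setdefault(bin_label(prior), []).append(pitches)
        prior = pitches
    return bins


# Placeholder sequence (NOT real data): start, start, relief, start, start.
appearances = [(105, True), (112, True), (20, False), (98, True), (101, True)]
binned = bin_starts(appearances)
print(binned)
```

In this placeholder sequence, the 98-pitch start right after the relief outing is left out of every bin, but it still counts as the prior start for the 101-pitch outing that follows it.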
|Pitches in Prior Start||Number of Starts||IP||H||R||ER||HR||BB||SO||ERA||R/9||FIP||K/9||BB/9||HR/9|
Harang's runs allowed per nine innings and FIP (fielding independent pitching, uses peripherals to estimate ERA), plotted by the number of pitches Harang threw in the prior appearance. For the graph, I again opted to ignore the first two bins due to the minuscule sample sizes.
As expected, Harang does indeed show far less of a consistent relationship between workload and performance. In fact, he seems to steadily improve in subsequent starts as his workload increases in the prior start, at least until he crosses the 120-pitch threshold. The one stat that seems to respond predictably to increased workload is his strikeout rate, which drops off in starts following high-workload outings. Nevertheless, he has been able to compensate for this with better control and fewer home runs allowed.
Conclusions and Recommendations
The critical reader will no doubt note that this little study is based on limited sample sizes and evaluates only two pitchers. It does lean on other research on pitcher workload and carryover effects to ground it, but I will nonetheless try to be cautious in my conclusions.
Historically, since arriving with the Reds, Arroyo has been substantially better when his workload in the previous start is kept down. For every 10 extra pitches above 90, he has tended to allow about a half-run more per nine innings. The effect has been even more dramatic following starts in which he tops 120 pitches. The effect is not absolute--Arroyo has pitched well following long outings, and has pitched poorly following short ones. But the average effect is sizable enough that it is probably worth paying attention to. While one obviously has to consider the present game situation when managing a pitcher, not to mention what the pitcher is telling you verbally and via body language, it would seem a "best practice" to try to keep Arroyo's starts under the 110-pitch mark when possible.
Harang, on the other hand, be it due to his delivery, body type, or just stochasticity, has historically handled higher workloads better than Arroyo, at least in terms of carry-over effects into subsequent starts. While I certainly would not recommend regularly extending Harang beyond 120 pitches (much less 135) because of the risk that he might injure himself by pitching while tired, I would probably worry less about pitch counts with him and let the game situation dictate how long to keep him in the ballgame...at least until he gets into the 110-120 pitch range.
References and related studies
Pitcher Abuse Point Analysis by Keith Woolner
Pitcher Abuse Point^3 FAQ
Baseball Prospectus Annual 2007 (Keith Woolner's article)
Photo by AP/Ted S. Warren