Tuesday, May 19, 2015

Batted Ball Velocity Data Predicts Performance

Jay Bruce leads the Reds with a 92 mph average batted ball velocity,
but his ISO and HR/FB rates are just middle of the pack.
Photo credit: Trev Stair
We've long known that random events can influence hitters' batting lines.  We'll see Jay Bruce crush a ball to deep center field only to see it caught.  And then, the next batter, we'll see Brandon Phillips bloop a "dying quail" over the second baseman's head for a single.  So we'd expect a hitter's future results to track how hard he hits the ball more closely than they track his past "luck" in "hitting it where they ain't."

The advent of StatCast batted ball velocity data in the gameday feed is an exciting development in that, for the first time, we have a direct measure of how hard hitters are striking the ball.  The data are still a bit hard to come by; our best source is Baseball Savant, which scrapes the data together from gameday.  I know others have worked with these data already, but what follows is my first foray into analysis using these data.

Describing batted ball velocity data

I pulled average batted ball velocity data for all players in Savant's database using the link above.  One can do more specific comparisons using his pitchf/x search tool, but I was happy with overall average velocity for a first look (that said, I have no doubt that variation in this number is extremely important).  I linked this up, by name, to hitters from FanGraphs' database.  All data were through May 17, 2015.  Many thanks to these two sites for the data.

After culling out anyone who did not have velocity data, I then stripped down the sample to those with 60 PA or more.  It seemed like a good mid-range number.  That left 291 players in my dataset.  Here's a histogram of their velocity data:
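The join-and-cull step can be sketched in Python roughly as follows, assuming each source has been exported to a list of records. This is an illustration, not necessarily the exact workflow used here; the field names (`name`, `avg_velo`, `pa`) and the players are hypothetical.

```python
# Hypothetical exports from each source. Field names are illustrative,
# not the real Baseball Savant or FanGraphs column names.
savant = [
    {"name": "Player A", "avg_velo": 91.2},
    {"name": "Player B", "avg_velo": 86.4},
    {"name": "Player C", "avg_velo": 88.9},
]
fangraphs = [
    {"name": "Player A", "pa": 140},
    {"name": "Player B", "pa": 45},
    {"name": "Player C", "pa": 102},
    {"name": "Player D", "pa": 130},  # no velocity data, so culled
]

MIN_PA = 60  # the plate-appearance cutoff described above


def join_and_filter(velo_rows, stat_rows, min_pa=MIN_PA):
    """Inner-join the two sources on player name, then keep min_pa+ PA."""
    velo_by_name = {row["name"]: row["avg_velo"] for row in velo_rows}
    merged = []
    for row in stat_rows:
        velo = velo_by_name.get(row["name"])
        if velo is None:        # cull anyone without velocity data
            continue
        if row["pa"] < min_pa:  # cull small samples
            continue
        merged.append({**row, "avg_velo": velo})
    return merged


sample = join_and_filter(savant, fangraphs)
print([p["name"] for p in sample])
```

Joining on names is fragile (accents, "Jr.", two players sharing a name), so a player-ID mapping would be safer where one is available.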

The average batted ball velocity in this set of players was 88.5 mph, and you can see that the distribution is almost perfectly normal.  Very few major league hitters average over 95 mph, and very few average under 82 mph.  The latter is probably a selection effect; if you don't hit the ball harder than that, you're not likely to be in the big leagues for long.

How well does batted ball velocity predict offensive numbers?

It seemed to me that there were three primary variables that should be most directly affected by batted ball velocity:

  • BABIP: the harder one hits the ball, the more balls in play should fall in as hits.  
  • ISO: the harder one hits the ball, the more extra bases you should gain.
  • HR/FB: the harder one hits the ball, the more often fly balls should become home runs.
I didn't look at overall performance statistics, like wOBA or OPS, because those numbers are also affected by non-contact events (strikeouts & walks).  Any effect of batted ball velocity on those summary numbers should come through the statistics above.
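For readers who want to compute these from counting stats, here is a small sketch of BABIP and HR/FB using the standard definitions, BABIP = (H - HR) / (AB - K - HR + SF) and HR/FB = HR / FB. It assumes, as on FanGraphs, that the fly-ball count includes home runs. The season line below is made up.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def hr_per_fb(hr, fb):
    """Home runs per fly ball; assumes the fly-ball count includes home runs."""
    return hr / fb


# Made-up season line, for illustration only.
print(round(babip(h=150, hr=20, ab=550, k=120, sf=5), 3))  # 0.313
print(hr_per_fb(hr=20, fb=160))                          # 0.125
```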

Let's go through those one by one.

Batting Average on Balls in Play (BABIP)
As it turns out, there was no significant effect of batted ball velocity on BABIP (p = 0.06, R² = 0.012).  This really surprised me.  But maybe some signal is lost in the overall comparison?  For example, maybe batted ball velocity matters more for fly ball hitters than for ground ball hitters, who may be trying to beat out infield hits.
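The p and R² values in this post come from simple one-variable linear regressions. As a minimal sketch of what such a fit computes (not necessarily the software used here), ordinary least squares for stat ~ velocity gives a slope, an intercept, R², and a t statistic for the slope:

```python
import math
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit y = a + b*x; returns slope, intercept, R^2, t for slope."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    # Standard error of the slope uses n - 2 degrees of freedom.
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = slope / se_slope
    return slope, intercept, r_squared, t_stat


# Toy numbers, not real player data.
print(simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

The p-value comes from comparing t against a t distribution with n - 2 degrees of freedom; if SciPy is available, `scipy.stats.linregress` returns the slope, intercept, r, p-value, and slope standard error in one call.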

Therefore, I decided to split up my hitters into three groups:
  • Ground Ball Hitters: hitters who had a ground ball % in the upper quartile of the data (GB% greater than 50%).
  • Fly Ball Hitters: hitters who had a ground ball % in the lowest quartile of the data (GB% less than 39.3%).
  • Non-GB Non-FB Hitters: hitters who were in the two middle quartiles of the data.
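The quartile split can be sketched with Python's standard `statistics.quantiles` (Python 3.8+). This is a sketch on made-up GB% values; different quantile methods can shift the cutoffs slightly, so on the real data they need not land exactly on 50% and 39.3%.

```python
from statistics import quantiles


def classify_by_gb(gb_pcts):
    """Label each hitter GB / FB / Middle using GB% quartile cutoffs."""
    q1, _, q3 = quantiles(gb_pcts, n=4)  # default "exclusive" method
    labels = []
    for gb in gb_pcts:
        if gb > q3:
            labels.append("GB")      # upper quartile of GB%
        elif gb < q1:
            labels.append("FB")      # lowest quartile of GB%
        else:
            labels.append("Middle")  # two middle quartiles
    return labels, (q1, q3)


# Made-up GB% values for eight hitters.
labels, cutoffs = classify_by_gb([30, 35, 40, 42, 44, 46, 50, 55])
print(labels, cutoffs)
```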
Here's what happened:
Neat, right?  When you look at fly ball hitters, average batted ball velocity does predict BABIP, at least a little bit (p = 0.008, R² = 0.07).  But there's no relationship for other hitters.

I guess the lesson here is that BABIP is still a pretty volatile statistic, with other factors (luck/fielding/pitchers/parks/weather) playing a large enough role that it masks any potential effect.  Or, perhaps we need to be even more nuanced; maybe fast ground ball hitters, like Dee Gordon, have a different relationship with batted ball velocity than slow ground ball hitters?  It's a topic for future study.

Also, another fun thing: there's not really much difference in BABIP between the different hitter types (FB hitters = .289, GB hitters = .305, middle-50% hitters = .302).  Compared to the spread in the data, that's not much of a difference, although it probably matters more in larger samples.

Isolated Power (ISO)

Isolated power (SLG - AVG) measures extra bases per at-bat, i.e., how much of a hitter's production comes from doubles, triples, and home runs rather than singles.  Here's the overall trend:
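Because slugging counts a single as one base, subtracting batting average leaves only the extra bases, so ISO can equivalently be written as (2B + 2·3B + 3·HR) / AB. A quick check on a made-up stat line:

```python
import math


def iso_from_rates(slg, avg):
    """ISO as usually defined: slugging percentage minus batting average."""
    return slg - avg


def iso_from_counts(ab, doubles, triples, hr):
    """Equivalent form: extra bases beyond a single, per at-bat."""
    return (doubles + 2 * triples + 3 * hr) / ab


# Made-up line: 500 AB, 140 H (83 1B, 30 2B, 2 3B, 25 HR).
ab, singles, doubles, triples, hr = 500, 83, 30, 2, 25
hits = singles + doubles + triples + hr
total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
avg, slg = hits / ab, total_bases / ab

print(round(iso_from_rates(slg, avg), 3), round(iso_from_counts(ab, doubles, triples, hr), 3))
# The two forms agree up to floating-point rounding.
assert math.isclose(iso_from_rates(slg, avg), iso_from_counts(ab, doubles, triples, hr))
```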
Batted Ball Velocity does predict isolated power (p < 0.0001, R² = 0.22).  Here's the breakdown by hitter type, as I did for BABIP:

It looks like the relationship is pretty consistent across hitter types.  The one caveat is that the more fly balls you hit, the higher your ISO and the steeper your slope.  In other words, by hitting fly balls, you are going to get more extra bases.  And increasing batted ball velocity results in more extra bases if you're hitting fly balls than if you're hitting ground balls.  That all makes sense, I think.

Home Run per Fly Ball Ratio (HR/FB)

One more: does hitting the ball harder result in more home runs per fly ball?  For this one, I removed anyone from the dataset who hadn't yet hit a homer...because I don't like 0's when running regressions.
While HR/FB is a notoriously volatile number, even for hitters, there is a significant relationship once again (p < 0.0001, R² = 0.19).  And if we break down by hitter type:
...much the same story.  One interesting thing about the ground ball hitters, though: despite mostly similar batted ball velocity, their fly balls turn into home runs at a lower rate than the dedicated fly ball hitters' do.  This is probably a swing angle effect; fly ball hitters probably use an uppercut swing, and therefore will hit the ball hard and in the air, which converts into home runs.  In contrast, if ground ball hitters have more of a level swing, a ball they hit in the air is likely to be a mistake, and not among their harder-hit balls.  As a result, those fly balls turn into outs rather than home runs more often.

Can we predict future regression based on batted ball velocity?

So, we have two variables that are predicted well by batted ball velocity: ISO and HR/FB.  Can we predict players who will regress (positively or negatively) in these statistics based on how hard they've hit the ball thus far?

Let's look at isolated power first.  Here is a graph showing residual ISO (the difference between actual and expected ISO values, based on our regression line) of a bunch of players:
It's messy, but at least you can make out the guys on the extremes.  Players near the top of the graph have higher isolated power than their average batted ball velocity would predict, which makes them candidates to decline.  In contrast, players with a residual below 0.0 have lower isolated power than predicted, and may improve.

Here's a list of the largest residual players:
So, yes, of course the guys who are expected to decline have high ISOs, and the guys expected to improve have low ISOs.  But we've got more precision now than simply sorting by ISO.  Giancarlo Stanton, for example, has an ISO of 0.293 currently, and yet he hits the ball so hard that his residual is only slightly positive.  Similar things can be said about Joc Pederson.  On the other side of the coin, Jordan Schafer, Cesar Hernandez, and Ichiro Suzuki all hit the ball softly, and so their low ISOs (0.04-0.06) all seem very appropriate.

I don't want to overstate the effect, but this should help us anticipate players who will regress.

Also: Grady Sizemore is playing this year?  I had no idea.

Now, let's do HR/FB:
Again, higher residuals = better HR/FB than expected based on batted ball velocity.  Here are the players who stand out:
Again, we have more information here than just picking the highest and lowest HR/FB guys.  Giancarlo Stanton hits the ball really hard and has a high HR/FB, and this is not disputed by the regression.  Ichiro and Billy Hamilton are at the low end, and that doesn't seem strange.  But when there's a mismatch between HR/FB and BB Velocity, the player shows up on this chart.

So, maybe we can make better predictions now.  That said, I think a lot more would need to be done before this is ready for any kind of "real" use (in fantasy baseball, or otherwise).  A lot of the guys on the "probably will regress" list are fly ball hitters, and therefore we'd expect a higher HR/FB as a result.  A lot of guys on the "probably will improve" list are ground ball hitters.  At the least, if we're trying to project, we need to take that into consideration.  I'm just not there yet.

Nevertheless, I think this is promising enough that I just put a reminder in my planner to check back at the start of July and see how these players did, compared to players with similar HR/FB or SLG whose batted ball velocity matched those numbers.

Next up: how does StatCast batted ball velocity data compare to the BIS Hard-Hit ball data?
