So, the question is, how well do these correlate with the (hopefully) more accurate StatCast data? Pretty well!
If you look across the top row of the matrix, you see Batted Ball Velocity (on y-axis) plotted against Hard %, Med %, and Soft %. You can see that it tracks very well with Hard Hit Ball %, in particular, and shows a pretty strong negative correlation with Soft %. This is just as expected, but it's nice to see the data looking pretty solid.
Interestingly, Batted Ball Velocity also tracks negatively with Med %, though it's a weaker relationship; the harder you hit the ball on average, the fewer medium-hit balls you will have. Similarly, Hard % is negatively correlated with both Med % and Soft %.
Here's a correlation matrix of those data:
       | BBVelo | Hard % | Med %  | Soft %
BBVelo | ---    | 0.660  | -0.359 | -0.502
Hard % | 0.660  | ---    | -0.713 | -0.554
Med %  | -0.359 | -0.713 | ---    | -0.185
Soft % | -0.502 | -0.554 | -0.185 | ---
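If you want to build a matrix like this from your own leaderboard export, here is a minimal sketch in plain Python. The function names (`pearson`, `correlation_matrix`) and the idea of passing columns as a dictionary of lists are my own assumptions for illustration, not the tool used for the numbers above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """columns: dict mapping a column name to a list of per-hitter values.

    Returns a nested dict so that result[a][b] is the correlation of a with b.
    """
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Call it with something like `correlation_matrix({"BBVelo": [...], "Hard %": [...], "Med %": [...], "Soft %": [...]})`, with one value per hitter in each list. The diagonal comes out as 1.0 rather than the `---` shown in the table.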
Using Quality of Contact as a Surrogate for Batted Ball Velocity
For seasons prior to this one, where we don't have StatCast data available, it would be really nice to be able to estimate batted ball velocity based on these data. Unfortunately, given how correlated each BIS variable is with the others, it's hard to use more than one of them in a regression because they introduce multicollinearity and, potentially, don't provide much additional information. To check, however, I did an all-possible-subsets regression analysis, using combinations of the Hard %, Med %, and Soft % variables to predict batted ball velocity. Here's the output:
This is kind of a weird figure, but what it shows is the quality of fit (as measured by adjusted R² on the y-axis) versus the different possible models (shown on the x-axis). What we see is that a simple regression predicting Batted Ball Velocity with Hard % alone gives an adjusted R² of 0.43. It is, by far, the best of the single-variable models. Furthermore, adding more variables provides almost no additional explanatory power (adjusted R² maxes out at 0.46). Therefore, our best option is to simply predict Batted Ball Velocity using Hard %.
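For anyone curious what an all-possible-subsets comparison looks like in code, here is a rough sketch in plain Python. It fits ordinary least squares for every combination of predictors and reports adjusted R², computed as 1 - (1 - R²)(n - 1)/(n - p - 1) for n hitters and p predictors. The helper names and the six rows of numbers are made up purely for illustration; this is not the software or the data behind the figure above.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(xs, y):
    """Fit y = intercept + sum(coef * x) by least squares; return adjusted R^2."""
    n, p = len(y), len(xs)
    design = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def all_subsets(predictors, y):
    """Adjusted R^2 for every non-empty subset of predictors (None if the fit is singular)."""
    names = list(predictors)
    results = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            try:
                results[combo] = adjusted_r2([predictors[name] for name in combo], y)
            except ValueError:
                results[combo] = None
    return results


# Made-up toy numbers, NOT real hitter data: shares as fractions, velocity in mph.
toy = {
    "hard": [0.25, 0.30, 0.35, 0.28, 0.40, 0.22],
    "soft": [0.20, 0.18, 0.15, 0.21, 0.12, 0.24],
}
toy_velo = [87.1, 88.9, 90.2, 88.0, 91.5, 86.4]
for combo, score in all_subsets(toy, toy_velo).items():
    print(combo, score)
```

One thing to watch for: if Hard %, Med %, and Soft % add up to 100% for every hitter, then a model with all three plus an intercept is perfectly collinear. The sketch tries to catch that case and report `None`, but floating-point rounding can let it slip through, so treat any three-variable result with suspicion.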
If you'd like to do this at home, this is the regression equation: Velocity = 80.69 + 26.6842 * Hard %
This will give you a pretty solid fit:
R² = 0.43. It looks like, most of the time, you'll be within about ±5% of the actual batted ball velocity using Hard %. Maybe it gets better with larger samples; we'll see later in the season.
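Here is the regression equation above as a tiny Python helper, in case you'd rather not do the arithmetic by hand. I'm assuming Hard % goes in as a fraction (0.30, not 30), since that is the only scale on which the coefficients give sensible numbers, and that the result is in miles per hour. The function names and the `tolerance` parameter are my own; the range is just the rough ±5% rule of thumb applied to the estimate, not a formal prediction interval.

```python
def estimate_velocity(hard_pct):
    """Estimated batted ball velocity from Hard %, entered as a fraction (e.g. 0.30)."""
    return 80.69 + 26.6842 * hard_pct


def rough_range(hard_pct, tolerance=0.05):
    """Estimate plus or minus the rough 5% rule of thumb (not a statistical interval)."""
    estimate = estimate_velocity(hard_pct)
    return estimate * (1 - tolerance), estimate * (1 + tolerance)


# A hitter with a 30% hard-hit rate
print(round(estimate_velocity(0.30), 1))  # 88.7
```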
But hey, something is better than nothing! Right now, this gives us a basic format that will permit us to look at quality of contact in seasons prior to 2015. And, like Batted Ball Velocity, Hard % tracks pretty well with variables like ISO and HR/FB...and, like BB Velocity, it does NOT track well with BABIP:
So, in short, while I love using actual velocity data, the Quality of Contact Data--and particularly the Hard % data--provided by BIS looks to be high quality and very usable.
Next up (probably): more of this stuff, but applied to Reds hitters