
Friday, June 13, 2014

Optimism, Carlos Gomez, and Projections

I surprised myself with an upbeat preview of the Brewers vs. Reds series today:
Going into their game against the Mets last night (as I wrote this), the Brewers had a 7.5-game lead on the Reds.  Now, one can't really hope for a sweep.  But can you imagine?  That'd suddenly put the Reds 4.5 games back.  With the Cardinals still hovering around 0.500, the Reds have an opportunity here.  I like the Reds pitching match-ups in all three games.  If the offense can build on what it did the last two games of the Dodger series, this could be an exciting weekend for the Reds.  Here's to optimism! Go Reds!
Somewhere recently, I saw someone write (I think for another team) that it's more fun to be optimistic and wrong than pessimistic and right.  I'm trying to adopt that view. :)

Carlos Gomez: A lesson on the importance of
patience, scouts, and tools.
Photo Credit: Keith Allison
I also wrote a bit about Carlos Gomez.  He fits into the mold of a former top prospect who had been given up on by nearly everyone, only to discover himself.  Others include Jose Bautista and Edwin Encarnacion, although to his credit, Gomez was never DFA'd.
Some of Lucroy's best competition in the MVP race is his center fielder, Carlos Gomez.  I think Gomez is fascinating.  I find it almost impossible to believe that he's only in his age-28 season, because it seems like he's been around forever.  He was the key acquisition in the Twins' deal that sent Johan Santana to the Mets before the 2008 season.  Despite playing in the majors as a 22-year-old, he was largely considered a flop in the aftermath of that trade.  He earned a strong reputation as a great defensive center fielder, but he didn't hit a lick.  The Brewers acquired him in exchange for J.J. Hardy, a deal that many (myself probably included) panned.  Hardy was often-injured, but he was a quality offensive shortstop coming off a bad season, while Carlos seemed like the definitive no-hit defensive center fielder.  Then, something happened in the second half of 2012: Gomez started to hit for power.  From July through September, Gomez slugged 15 home runs (previous season high was 8).  He continued to show that power through all of last season, and FanGraphs estimated his season value at 7.6 fWAR (though that might be a bit inflated by a +24 run fielding rating...though b-ref gave him +38 runs in the field and 8.9 WAR, so....).  He's one of the best players in baseball.
This is the kind of thing that has taken me forever to wrap my head around.  I am a projection guy, and rely on projections to guide my evaluations of players so that I avoid getting overly excited about new changes in performance.  In fact, MGL just wrote a great piece on the importance of this approach, which was then summarized by Dave Cameron.  But there are guys on whom I am bound to miss with this approach, and Carlos Gomez, Bautista, and Encarnacion are cases in point.  Therefore, while I use projections, I also try to keep an eye out for toolsy guys who seem to finally be figuring it out.  In my view, the lesson is to trust your projection tools, but to stay cautious about their ability to forecast the future.  I'll always have a blind spot for this kind of player, but hopefully I'm not as blind as I once was.

Brandon Phillips might have been my first lesson in that.  Here's what I wrote when the Reds acquired him:
Previously a highly-touted prospect out of the Montreal farm system, this guy is apparently going to sit on our bench this year. He is out of options, and the Reds seem to have acquired him from the Indians because they didn't want to lose him via waivers and he's not good enough to be on their team. Phillips had a very good half-year as a 21-year old in the then-Expos' AA franchise, hitting 0.327/0.380/0.506/0.886 in 245 at bats in '02. But that's the last time he had a really outstanding, prospect-like season. He was probably rushed a bit to the majors in '03, but was thoroughly ineffective there and hasn't done a whole lot since. His '04 campaign with the Indians' AAA affiliate was decent (0.353 OBP), but he regressed a bit in '05. Krivsky mentioned in his rain delay interview with Marty Brennaman today that Phillips hit 15 homers last year in AAA, which is good. But his slugging percentage was a fairly poor 0.409, particularly given those HR totals. Overall...I wouldn't expect much from this guy's bat this season, and perhaps not ever.
Phillips might be overrated by some of the Cincinnati media, but there's no question that I missed the mark badly on this one.  Phillips has posted five seasons as an above-average hitter during his time with the Reds, has played legitimate gold-glove defense, and has topped 3 WAR five times (and posted 2.9 fWAR a sixth time).  He's been an excellent player for the Reds since almost the first day he donned the Reds' uniform.  The scouts were right.

Jesse Winker: a near-future answer in left field?

Reds' prospect Jesse Winker has been having a terrific season in high-A Bakersfield.  He was a 1st-round (49th overall) compensation pick in the 2012 draft when the Reds lost Ramon Hernandez, and so far has vastly outperformed the Reds' first selection in that draft, Nick Travieso**.  He was the Reds' consensus #4 prospect entering the season, and had really strong preseason Oliver projections (0.331 wOBA, *if* he played in MLB this season).  Well, he hasn't disappointed:


And that performance is despite missing a week or so with a concussion following a collision with a wall.  Apparently, it's not currently a factor.

Doug Gray had a nice piece yesterday comparing his raw performance to that of other recent uber-Reds prospects, namely Joey Votto, Jay Bruce, and Devin Mesoraco.  The short version?  Albeit with caveats, he's comparing extremely well.  Excellent walk rates, solid strikeout rates, and excellent power (as measured by ISO).

What are the caveats?  Three big ones:

  • Competition level.  Jay Bruce did spend some time at high-A, but also spent time at more advanced levels.  However, Votto and Mesoraco were primarily at low-A in their age-20 seasons, meaning that Winker's numbers are even more impressive in comparison with theirs.
  • Run environment.  Because the Reds' High-A affiliate is now in Bakersfield, Winker gets to play in the California League.  That league has the highest run environment among leagues above rookie ball (~5.3 runs/game when I looked at this between 2007-2009).  And furthermore, Bakersfield is a moderate hitter's park for that league (runs park factor 1.03), and is especially favorable to the home run (1.15).  By comparison, the Midwest League is quite pitcher-friendly, even if Dayton is a bit of a hitter's park (park factor 1.07), meaning that the games played by Votto and Mesoraco were close to neutral.  When Bruce played in high-A, he was playing in Sarasota in the Florida State League, which is about as pitcher-friendly a league as you can find.
  • Sample size.  Doug is comparing Winker's 239 PA's against a full minor league season by Bruce, Votto, and Mesoraco.
Nevertheless, despite all of that, it's very encouraging to see Winker performing so well!  I think it's reasonable to expect a mid-season promotion to AA at this point.  If he hits well there, he could potentially be at least a long-shot for left field coming into spring training in 2015.  This possibility was discussed by Joel Luckhaupt in a recent Redleg Nation Radio.  Goodness knows that the Reds could use some help in left field!

** This was not to condemn Nick Travieso.  After a slow start to his career, Travieso is having a quality season at Dayton this year as a control-oriented starter.  His strikeout rate is still far from where I'd like it to be, but he could potentially still work out to be a back-end starter.  Doug Gray has said that he's seen Travieso throw in the mid-90's over the past year (albeit inconsistently), so there might still be stuff there to succeed.

Thursday, June 12, 2014

Cameron: Johnny Cueto's Fastball Unhittable

Dave Cameron pointed out today that a big part of what has made Johnny Cueto so amazing this year is that his four-seam fastball has been nearly unhittable:
I mean, sheesh.

Cueto is known as a sinkerballer, and that's still very much what he is.  His ground-ball percentage still stands at 53%, and he still throws his sinker a healthy portion of the time (22.5%).  But Cueto has been seeing uncanny results with his four-seam fastball, which he is throwing a bit more often this season:

***Note also: Tony Cingrani also appears on this graph of low wOBA allowed.


That increase in his four-seamer frequency seems to be part of why he's showing a 1 mph increase in his "fastball" velocity this season, although all of his pitches are seeing small velocity increases:


I tend to expect that outliers as substantial as Cueto is in that first graph are due for a regression.  That shouldn't be a surprise, because Cueto has been so unbelievable this year.  It's pretty rare for a pitcher to be almost twice as good as everyone else on something.

Nevertheless, the Dodgers announcers were talking a lot about Cueto and the deception in his delivery last night.  I think that Cueto's increased velocity, coupled with the deception of how well he hides the ball during his delivery, is a big part of why he's been able to accomplish what he has this year.  As Cameron states, this is not just a BABIP-inspired oddity.  I can't explain it either, but I am enjoying the ride.

Saturday, June 07, 2014

Reprints: How Hitting Statistics Explain Runs Scored

This week in the SABR101x course, we're covering hitting statistics.  The lesson was very similar to an article I wrote 6 years ago on this site comparing the ability of different offensive stats to predict runs scored (many others have written such articles; it's a classic approach to addressing the question of hitting stat quality).

In it, I argued that there wasn't much of a problem with using OPS if it improved communication, because the gains from moving to other, better stats (like wOBA) were so meager.  Fortunately, in the time since, FanGraphs has popularized wOBA so much that I feel pretty comfortable just reporting it and ignoring OPS altogether.  And in many cases, I've even moved on to using wRC+ to get the advantages of park controls and run environment-neutrality.  OPS just isn't necessary anymore.

In any case, I thought it would be fun to reproduce that article here.  Here's the most relevant graph.  The rest appears below the jump.


Thursday, June 05, 2014

Selling Jeans: Ballplayer Height, Weight, and BMI

So, my question for today is: how have the physical attributes of ballplayers changed over the years?  Let's look at this graphically.

Player Height

I'm reporting all dates as birth year, as that seemed a logical way of organizing players.  I'm also throwing out the edges of the database that contain fewer than 50 players per birth year.  That means, for recent years, I'm not including anyone born after 1990 (i.e. 24 years old in 2014).
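For anyone who wants to try something similar, here is a minimal sketch of that kind of filter. It is not the query behind the graphs here: it assumes Python's built-in `sqlite3` module rather than MySQL, a tiny made-up table, and `birthYear` and `height` column names in the style of the Lahman Master table. The function name and the toy rows are hypothetical, and the cutoff is lowered from 50 to 2 so the small example shows a year being dropped.

```python
import sqlite3


def avg_height_by_birth_year(rows, min_players=50):
    """Return [(birthYear, n_players, avg_height)] for well-populated years.

    rows: iterable of (playerID, birthYear, height_in_inches) tuples.
    Birth years with fewer than min_players players are left out.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE master (playerID TEXT, birthYear INTEGER, height REAL)"
    )
    conn.executemany("INSERT INTO master VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT birthYear, COUNT(*) AS n, AVG(height) AS avg_height
        FROM master
        WHERE birthYear IS NOT NULL AND height IS NOT NULL
        GROUP BY birthYear
        HAVING COUNT(*) >= ?
        ORDER BY birthYear
        """,
        (min_players,),
    ).fetchall()
    conn.close()
    return result


# Made-up rows purely for illustration (not real players).
toy_rows = [
    ("a01", 1900, 69.0),
    ("a02", 1900, 71.0),
    ("a03", 1901, 70.0),  # the only 1901 birth, so 1901 gets filtered out
    ("a04", 1902, 72.0),
    ("a05", 1902, 74.0),
]
toy_result = avg_height_by_birth_year(toy_rows, min_players=2)
print(toy_result)
```

The `HAVING` clause does the trimming: it runs after the grouping, so sparse birth years at either edge of the data drop out without having to hard-code a year range.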

We can see that, after an initial surge of the extremely short in late-1880's ball, average player height quickly reached 70 inches (5'10", aka jinaz-standard height) as baseball became more professional and required players to be top athletes.  Since then, players have progressively gotten taller, on average, as a group.  Currently, baseball players average just shy of 74" (6'2").

No real surprises here.  Thanks to some combination of improved diet, sanitation, medicine, and social programs, average human height has increased four inches in the past 100 years, and ballplayers are right on track with that increase:
Furthermore, major league baseball players tend to be taller than the average population.  Current average height is 5'10", while ballplayers today average 6'2".  Among those born in the 1920's, average height was around 170 cm on this graph (67", or 5'7"), while baseball players averaged 5'10".

Weight

This one's a bit more interesting:
So, we have a steady increase to 1920, then a slight increase that follows height...and then BOOM, something happens.  Weight shoots from 188 lbs to 206 lbs in a matter of 14 years (1965-1979 birth years).  That corresponds to players who played their age-27 years between 1992 and 2006.  What gives?

Before we address that question, let's first look at one more graph


Body Mass Index


So here, we're seeing a metric that tracks both height and weight in the same number.  And again, we're seeing a steady drop in BMI as the game becomes professional, a flat-lined BMI for many decades, and then a spike again once we hit 1965 babies.
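For reference, here is a minimal sketch of how BMI can be computed from height in inches and weight in pounds. It assumes the standard definition (kilograms divided by meters squared); the exact calculation behind the graph isn't shown in this post, so treat this as an illustration only. The function names are hypothetical, and the example inputs are the round figures quoted in this post (206 lbs, 74"), not a specific birth-year cohort.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # meters per inch (exact by definition)


def bmi(weight_lb, height_in):
    """Body mass index (kg / m^2) from weight in pounds and height in inches."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    kg = weight_lb * LB_TO_KG
    meters = height_in * IN_TO_M
    return kg / (meters * meters)


def age_27_season(birth_year):
    """Season in which a player born in birth_year plays at age 27."""
    return birth_year + 27


# Round figures from this post, used only as an illustration.
example_bmi = bmi(206, 74)
print(round(example_bmi, 1))
print(age_27_season(1965), age_27_season(1979))
```

Because height is squared in the denominator, a modest change in average weight moves BMI more than a similarly modest change in average height does.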

The knee-jerk reaction is to claim that this matches up pretty well with the PED era.  There are no clear fenceposts for when that era began and ended, but I tend to think of the steroid era running from around 1994 (the year of The Strike) until the advent of MLB's testing program in 2003.  The steep part of the slope begins and ends, more or less, with players who peaked during that period of time (1992 through 2006).

The interesting thing is that it hasn't really dropped that much since MLB started its testing program.  Average weight of players has decreased slightly from its peak in 1982 babies (208.7 lbs) through 1989 babies (205 lbs).  Height has also dropped slightly during that time (0.3 inches), so BMI changes very little in that time.  That span describes players who are currently ages 25-32.  These are players that, by and large, have played their careers in a setting in which drug testing was a thing.  And yet, while weight and BMI have declined, we're still a far cry from the levels we saw before that spike.  If the spike in weight and BMI occurred due to steroids taking over the game, and if the current testing program works well enough that steroids are now largely NOT a part of the game, we'd predict weight and BMI to return to pre-steroid levels.

My feeling is that some of this could be steroids.  But I think there are two other important factors that could be involved:
  1. A shift in training regimes of players: an emphasis on being bigger, stronger, and faster through weight lifting and nutrition...and for scouts to prefer bigger players.
  2. An influx of international talent (including lots of big guys) that expands the pool of available talent.  If you have more players available to choose from, and baseball favors larger humans, you'll be able to shift up the averages by casting a larger net when selecting players.
The latter one seems like it could be a big deal, and I can think of a few ways to test it.  But I'll need to sharpen up my database skills a bit better to do so. :)

Thoughts?

What ex-Ballplayer Was Born at Sea?

So, I was playing around with some basic queries in the Lahman Database for the SABR101x course.  I decided to do a search on the birthCountry column.  Here's something that caught my eye:

Who is that?  Well:
Ed Porray pitched 10 innings for the Buffalo Buffeds in 1914.  The Buffeds were part of the Federal League.  He finished the season, and his big league career, with a 4.35 ERA and a 6.99 FIP.  And he's now the answer to a trivia question!
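A query along these lines is one way to surface an oddity like that. This is only a sketch: it uses Python's built-in `sqlite3` module in place of the course's MySQL sandbox, a handful of made-up rows with placeholder IDs, and a `birthCountry` column named after the one mentioned above. The function name and the sample values are hypothetical.

```python
import sqlite3


def rare_birth_countries(rows, max_players=1):
    """Return [(birthCountry, n_players)] for values that appear rarely.

    rows: iterable of (playerID, birthCountry) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE master (playerID TEXT, birthCountry TEXT)")
    conn.executemany("INSERT INTO master VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT birthCountry, COUNT(*) AS n
        FROM master
        WHERE birthCountry IS NOT NULL
        GROUP BY birthCountry
        HAVING COUNT(*) <= ?
        ORDER BY n, birthCountry
        """,
        (max_players,),
    ).fetchall()
    conn.close()
    return result


# Placeholder rows for illustration only (IDs are not real playerIDs).
toy_players = [
    ("p1", "USA"),
    ("p2", "USA"),
    ("p3", "D.R."),
    ("p4", "D.R."),
    ("p5", "At Sea"),
]
rare = rare_birth_countries(toy_players)
print(rare)
```

Sorting by the count puts the one-off values at the top, which is where the surprises tend to hide.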

Wednesday, June 04, 2014

The First Week of SABR101x

We're at the end of the first week of Andy Andres' SABR101x course, offered through edX.  Having completed the materials, I thought I'd share a few reflections.

The EdX Platform & Distance Education

The discussion forums are an important part of what
makes courses on edX work.
This is my first edX course, but I don't think it will be my last.  I'm pretty impressed with it as a platform.  It strikes me as an excellent learning platform, with the ability to deliver a tightly organized course that presents information in multiple ways.  Furthermore, it allows students to interact with assessments via multiple choice-style questions as well as text entry, and to interact with each other via targeted discussion boards that can be inserted into specific stopping points within lectures.  

I'm a college professor in my day job.  I teach brick-and-mortar classes, and have avoided digging into the realm of online classes.  One of the things that I'll be doing is taking a look at how the course is constructed, both in terms of information progression as well as the mechanics of how Andres presents the material.  There's a lot to like, here.  The lectures are presented in short video format that usually runs 5-13 minutes in length.  In between, there are at least a few quick assessment questions, which gives students a chance to think about and process what they've just learned.  And intermixed with the lectures are short, 1-2 page written explanations that complement, but are not redundant with, the lecture material.

One thing that I didn't anticipate was how much I like having the written narration to go along with the video/audio.  As a learner, I know that I do best when I can both see and hear something.  But, aside from video games, it's rare that I've had the chance to read and listen to a narrative at the same time.  I can tell that I grasp concepts much better when I get to read and listen at the same time.  I don't plan to turn on the subtitles on my home TV any time soon, but it's great for an education setting.  Along the way, I'm keeping a Google Doc window open where I can take notes as well.

As a side note, reading the narration allows for fun little quirks.  I feel bad for whoever they had transcribing all of the words, as that doesn't seem like a fun job.  But watching them try to spell Voros McCracken's name was funny ("Vhoorees," I think?).


SABR101x Content, Week 1

This week began with a basic introduction to sabermetrics.  Andres started exactly where I tend to start most of my courses: by defining terms.  He spent a lot of time looking at dictionary definitions, as well as definitions from those in the disciplines covered here: sabermetrics, statistics, data science, and big data.  These kinds of discussions always seem a little bit laborious.  But at the same time, they provide the opportunity to dispel a lot of misconceptions.  They also help reinforce the idea that we need to be precise with our language.  I found the definitions when discussing databases to be particularly helpful, because I have very little background in that area.

Beyond definitions, this week was pretty light in content.  We took our first stab at running some MySQL queries in the BUx SQL Sandbox that Andres and his team set up on edX, and it worked well enough.  They set up a Lahman database and got everything set so that users needed only to type in the queries as presented in order to retrieve their data.  There is no inherent need to set up one's own SQL server/workbench to complete the course (although I did just that; see below).

Assessment thus far has been pretty light.  Some of the questions have been recall of minutiae.  For example, one of the first questions asks you to report the year in which Bill James coined the term sabermetrics.  Good grief. :)  But most answers have been readily apparent from the videos, if one is paying attention & taking at least light notes.  Coding submissions are graded based on the output that the MySQL server returns for your query, as far as I can tell.  So far, the coding assignments have basically been copy-and-paste exercises that require almost nothing from the student.  Still, a glance at the discussions shows that students are still having trouble with this.  Therefore, basic practice in syntax and input is probably appropriate at this stage of the course.  Future modules will almost certainly require a bit more thought in the assessment sections.

There is also a History of Sabermetrics track in the course.  This week's focus was on Henry Chadwick.  Chadwick is sometimes known as the Father of Baseball, and is sometimes mentioned as an early pioneer of baseball in the same breath as Abner Doubleday (who, for a moment, was confused in my mind with Albus Dumbledore!  Go figure!).  But, as Andres notes, he was also the first real sabermetrician.  While he might not have actually invented box scores, he established a careful approach to observing, recording events, and reporting on games that was pioneering.  He also was instrumental in carefully recording and refining the rules of the game.  Furthermore, through his writing in newspapers and his books on baseball, he was instrumental in publicizing and popularizing reports of baseball.  He's a guy that I've read a bit about before, most notably in Alan Schwarz's The Numbers Game (which I read close to a decade ago!  'Tis a bit fuzzy).  Nevertheless, I found it a neat little foray into baseball history to learn more about him.  I'm looking forward to more of these history segments.

My Own MySQL Workbench

A local copy of MySQL Workbench offers a lot of
usability advantages over running from the course sandbox.
In order to get more practice, and to be set up to work on my own, I did opt to get a MySQL server running on my own computer.  I went to MySQL's website and downloaded their installer for "MySQL on Windows."  It was pretty easy to set up, although there was one hiccup where a certain "ODBC Connector" file (whatever that is) was not found by the installer and I had to download and install it manually.  Once installed, I launched the program and selected Database-->Connect to Database from the menu.  That launched the workbench, which gave options to "Startup/Shutdown" the server.  Once started, my next step was to install the Lahman database (Andres provided a specific one to users of the site--they apparently made some changes?  There were two files...I went ahead and installed both as a SABR_101x schema in MySQL, seems ok!).

Now, I'm set to run queries!  Everything that works in the course works on my rig, although mine was installed such that all table names are lowercase.  There's an option in the server settings to not do that, but things were getting screwy when I changed that.  So...I'm just going to remember that this is a difference between the course and my computer.  I actually prefer this, because tables are not case sensitive on my system.  But I'm sure I'll get a few submissions wrong in the course as a result!

The interface of this workbench is light-years nicer than what I used when following Colin Wyers' instructions some years ago to install the Essentials SQL server/workbench.  There are options to save queries as script files, which is huge.  As you're editing, the editor color-codes commands, and offers pop-up help whenever you put your cursor on specific functions or operators.  I also love the schema view: you can select multiple columns in a table--or even multiple tables--with your mouse, right click, and it will automatically add the appropriate bare-bones SELECT text.  It's very nice.



That's all I have for now.  If you're on the edX course, I'm going by Justin90 there.  Please feel free to say "hi" if you see me on the forums.  Or, of course, just chime in here!