In the first inning of the last World Series, the Rays fielders played an extreme shift against Chase Utley, the way teams defend against hitters like Big Papi, Giambi, or Howard. Chase tried to take advantage of the alignment by laying down a bunt, but it rolled foul along the third base line; then he decided to ignore the defensive placement and hit a ball that no shift can take care of.
I hadn't watched many Phillies games during the regular season (living in Europe, I'm usually exposed to the Cubbies... unless my boss is fine with me sleeping at my desk in the morning), so I didn't know whether it is commonplace for teams to shift against Utley. From what I heard during the telecast, the Rays' alignment was pretty unusual for him.
Using data from both Gameday and Retrosheet, I tried to figure out an optimal defensive positioning against Chase.
Here's what I've done.
First, I plotted the location of every one of Utley's batted balls, using the coordinates from Gameday.
I plotted grounders in red. All the groundballs you see in the outfield went through the infield and, theoretically, could have been caught by a well-positioned infielder (or one who happened to be where the ball was hit). That's why the people working for Project Scoresheet used to record, for groundball hits, the spot where the ball left the infield rather than the spot where it was collected in the outfield (you can see this good habit in the Retrosheet files from the '80s). As we can see, Gameday stringers don't follow that practice.
I corrected for this: I dusted off a couple of geometry books and projected every groundball that reached the outfield back to the point where it left the infield.
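A minimal sketch of that projection, under assumptions made only for illustration (they are not Gameday's conventions): coordinates have already been converted to feet with home plate at the origin and the y axis pointing toward center field, every grounder travels in a straight line from home plate, and the edge of the infield is modeled as a circle of radius 95 feet centered on the pitcher's rubber, a rough approximation of where the dirt meets the grass. The names `exit_point`, `RUBBER` and `INFIELD_RADIUS` are hypothetical. The exit point is where the ray from home plate through the fielded location crosses that circle.

```python
import math

# Hypothetical geometry, assumed for illustration: feet, home plate at (0, 0),
# y axis toward center field. Not the raw Gameday coordinate system.
RUBBER = (0.0, 60.5)      # pitcher's rubber, 60.5 ft from home plate
INFIELD_RADIUS = 95.0     # assumed radius of the infield arc around the rubber


def exit_point(x, y, center=RUBBER, radius=INFIELD_RADIUS):
    """Project a grounder fielded at (x, y) back to where it left the infield.

    The ball is assumed to travel straight out from home plate, so we
    intersect the ray home -> (x, y) with the circle bounding the infield.
    Balls fielded inside the infield are returned unchanged.
    """
    dist = math.hypot(x, y)
    if dist == 0:
        return (x, y)
    dx, dy = x / dist, y / dist          # unit direction of the ray
    cx, cy = center
    proj = dx * cx + dy * cy             # dot product d . c
    # Solve |t*d - c| = radius for t > 0. Home plate lies inside the circle,
    # so the discriminant is positive and the larger root is the exit point.
    t = proj + math.sqrt(proj * proj - (cx * cx + cy * cy) + radius * radius)
    if dist <= t:
        return (x, y)
    return (t * dx, t * dy)
```

For example, a grounder fielded 300 feet out in straightaway center, `exit_point(0.0, 300.0)`, is moved back to `(0.0, 155.5)`, 95 feet beyond the rubber.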
Here's the new graph.
Finally, I added some random noise, because the groundball hits (those that went through) were plotted on top of one another and it wasn't easy to tell them apart (or count them).
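The jittering step can be sketched as follows, assuming uniform noise of a few feet in each direction (the `scale` value and the function name `jitter` are hypothetical choices, not the settings behind these charts). The noise is only a plotting aid: the un-jittered locations are the ones that should go into the cluster analysis.

```python
import random


def jitter(points, scale=3.0, seed=None):
    """Return a copy of (x, y) points with uniform noise in [-scale, scale]
    added to each coordinate, so overlapping markers become visible.

    Cosmetic only: do not feed the jittered values back into the analysis.
    """
    rng = random.Random(seed)  # seedable, so a chart can be reproduced
    return [(x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
            for x, y in points]
```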
Here's something easier to read.
"You can observe a lot just by watching." In this case, just by watching you can see where batted balls tend to cluster.
Cluster analysis is a statistical tool that can help our eyes in this task.
I hoped that, with the help of cluster analysis, I could more or less identify where to place seven fielders against Chase Utley (the pitcher and catcher are not so free to move around the field).
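The specific clustering algorithm isn't named here, so as a stand-in, here is a sketch of plain k-means (Lloyd's algorithm) with a deterministic farthest-first seeding. With `k=7`, each center is a candidate fielder spot and each batted ball is assigned to the nearest one. Straight-line distance and the function names (`kmeans`, `nearest`, `farthest_first`) are assumptions of this sketch, not necessarily what produced the chart.

```python
import math


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def farthest_first(points, k):
    """Deterministic seeding: start from the first ball, then repeatedly
    add the ball farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k=7, iters=100):
    """Lloyd's k-means on (x, y) batted-ball locations.

    Returns (centers, labels): centers are the suggested fielder spots,
    labels[i] is the index of the fielder responsible for points[i].
    """
    centers = farthest_first(points, k)
    for _ in range(iters):
        # Assignment step: each ball goes to its nearest "fielder".
        labels = [nearest(p, centers) for p in points]
        # Update step: move each fielder to the mean of his balls.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # empty cluster: leave it in place
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment, consistent with the returned centers.
    return centers, [nearest(p, centers) for p in points]
```

With the projected (not jittered) batted balls in a list `points`, `kmeans(points, k=7)` returns seven candidate positions and the ball-to-fielder assignment.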
I must admit I didn't expect to be so lucky, and I don't expect to be when I perform similar analyses on other batters.
It turned out that I forgot to put a constraint on the number of clusters I wanted to find, but the algorithm I used chose exactly seven as the optimal result. Should that happen in every analysis I do from now on, it would be a scientific demonstration that nine players on the field is the perfect number for baseball.
Here's a chart showing the defensive alignment that cluster analysis suggests for Utley. You put a player in the middle of each circle, and he's responsible for every ball in that circle (or in its neighborhood; just look at the colors).
Were it not for that third baseman (I suppose) just behind the pitcher's mound, the result would have looked too good to be true; I would have understood anyone accusing me of making up the data.
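The criterion behind that choice of seven isn't spelled out here. One common way to let the data pick the number of clusters is the average silhouette width: for each ball, compare its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), score (b - a) / max(a, b), and keep the k with the highest average. The sketch below assumes that criterion (a stand-in, not necessarily what the original algorithm used); `cluster_fn` is a hypothetical placeholder for any function that returns one label per ball for a given k, such as a k-means run.

```python
import math


def silhouette(points, labels):
    """Average silhouette width of a labelling (range -1..1, higher is better).

    Singleton clusters score 0 by convention; with fewer than two clusters
    the silhouette is undefined and 0.0 is returned.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # Mean distance to the other members (p's own entry contributes 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the closest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, cluster_fn, ks):
    """Return the k in ks whose labelling cluster_fn(points, k) scores best."""
    return max(ks, key=lambda k: silhouette(points, cluster_fn(points, k)))
```

The silhouette computation is quadratic in the number of balls, which is no problem for one hitter's season.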
It looks like the Rays' alignment made sense. I'm curious about the midway position (between infield and outfield) where the second baseman usually plays in this kind of shift: the clustering algorithm places a player there to take care of short flies, while managers put a man there to handle hard grounders too.
This analysis doesn't take into account the range a player can realistically cover. As a result, the player behind the pitcher's mound has to cover all of the infield grass; moreover, however well positioned they are, you can't count on your two fielders on the right side scooping up every groundball.
I didn't expect that running a simple cluster analysis (well, not that simple, to be honest) would produce results that make so much baseball sense.
Maybe if I remove the very short grounders from the data set (those that become infield hits or are the catcher's or pitcher's responsibility), I can even get that third baseman off the pitching mound!
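One crude way to put numbers on that limitation is to assume each fielder can reach any ball within a fixed straight-line distance and count how many of his assigned balls fall outside it. The `reach` parameter and the function name `out_of_range` are hypothetical; real range depends on how hard and how high the ball is hit, which this sketch ignores. `fielders` and `labels` can come from any clustering that assigns each ball to a position.

```python
import math


def out_of_range(points, labels, fielders, reach):
    """Per fielder, count assigned balls farther than `reach` from his spot.

    points[i] is a batted-ball location, labels[i] the index of the fielder
    responsible for it, fielders[j] that fielder's (x, y) position.
    """
    counts = [0] * len(fielders)
    for p, lab in zip(points, labels):
        if math.dist(p, fielders[lab]) > reach:
            counts[lab] += 1
    return counts
```

A high count for the man behind the mound would confirm that a single player can't really own the whole infield grass.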
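That filter could be as simple as dropping grounders that stopped or were fielded within some distance of home plate before clustering. The sketch assumes the same hypothetical coordinates as before (feet, home plate at the origin), a hypothetical `"GB"` label for groundballs, and an arbitrary 60-foot cutoff; the right threshold would need to be checked against the data.

```python
import math


def drop_short_grounders(balls, min_dist=60.0):
    """Keep every non-grounder, and only grounders that traveled at least
    min_dist feet from home plate. Each ball is an (x, y, kind) tuple."""
    return [(x, y, kind) for x, y, kind in balls
            if kind != "GB" or math.hypot(x, y) >= min_dist]
```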