Sunday, February 22, 2009

Chase-ing the FieldF/x

In the first inning of the last World Series, the Rays' fielders played Chase Utley in an extreme shift, the way teams defend against guys like Big Papi, Giambi, or Howard. Chase tried to take advantage of the alignment by laying down a bunt, but it rolled foul past the third-base line; then he decided to ignore the opponents' placement and hit a ball that no shift can take care of.

I hadn't watched many Phillies games during the regular season (living in Europe, I'm usually exposed to the Cubbies... unless my boss accepts me sleeping at my desk in the morning), so I didn't know whether it was commonplace for teams to shift against Utley. From what I heard during the telecast, the Rays' alignment was pretty unusual for that guy.

Using data from both Gameday and Retrosheet, I tried to figure out an optimal positioning against Chase.
Here's what I've done.

First, I plotted the location of every one of Utley's batted balls, using coordinates from Gameday.

I plotted grounders in red. All the GBs you see in the outfield went through the infield and, theoretically, could have been caught by a well-positioned infielder (or one that happened to be where the ball was hit). That's why people working at Project Scoresheet used to record, for groundball hits, the place where the ball left the infield rather than the place where it was collected in the outfield (you can see this good habit in the Retrosheet files of the '80s). As we can see, that's not the case with Gameday stringers.

I corrected for this. I dusted off a couple of geometry books and projected every groundball that went to the outfield to the place it left the infield.
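For the curious, the projection can be sketched roughly like this. It is only an illustration under assumptions of mine, not necessarily the exact geometry behind the graph: coordinates are taken to be already converted to feet, with home plate at the origin and the y axis pointing toward second base, and the edge of the infield is approximated by the 95-foot arc centered on the pitcher's rubber. The function name and constants are hypothetical.

```python
import math

# Assumptions (not from Gameday itself): coordinates in feet, home plate at
# (0, 0), y axis toward second base, infield edge modeled as a circle of
# radius 95 ft centered on the pitcher's rubber, 60.5 ft from home plate.
RUBBER_Y = 60.5
INFIELD_RADIUS = 95.0


def project_to_infield_edge(x, y):
    """Move a grounder fielded in the outfield back along its path from
    home plate to the point where it crossed the infield edge."""
    dist = math.hypot(x, y)
    if dist == 0:
        return (x, y)
    ux, uy = x / dist, y / dist  # unit vector from home plate to the ball

    # Points on the path are t * (ux, uy). Requiring them to lie on the
    # circle around the rubber gives t^2 - 2*b*t + (RUBBER_Y^2 - R^2) = 0.
    b = uy * RUBBER_Y
    disc = b * b - (RUBBER_Y ** 2 - INFIELD_RADIUS ** 2)
    t_edge = b + math.sqrt(disc)  # the root in front of home plate

    if dist <= t_edge:
        return (x, y)  # ball was fielded inside the infield: leave it alone
    return (ux * t_edge, uy * t_edge)
```

A grounder fielded in shallow center at (0, 250) would be moved back to (0, 155.5), while one fielded at (30, 100) is already inside the arc and stays where it is.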

Here's the new graph.

Finally, I added some random noise, because the groundball hits (those that went through) were plotted one on top of the other and weren't easy to distinguish (or count).
Here's something easier to read.
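Adding noise of this kind (often called jittering) takes only a few lines. The sketch below is an assumption about how one might do it, with a hypothetical spread of 3 feet and uniform noise; the amount and type of noise behind the graph are not specified here.

```python
import random


def jitter(points, spread=3.0, seed=0):
    """Return a copy of the points with small uniform noise added to each
    coordinate, so that overlapping points become visible.
    `spread` (in the same units as the coordinates) is a made-up default."""
    rng = random.Random(seed)  # fixed seed keeps the plot reproducible
    return [
        (x + rng.uniform(-spread, spread), y + rng.uniform(-spread, spread))
        for x, y in points
    ]
```

The noise is only for display: any analysis should still run on the original coordinates.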

"You can observe a lot just by watching". In this case, just by watching, you can see where batted balls tend to cluster.

Cluster analysis is a statistical tool that can help our eyes in this task.
I hoped that, with the help of cluster analysis, I could more or less identify where to place seven fielders against Chase Utley (pitcher and catcher are not so free to move around the field).

I must admit I didn't expect to be so lucky, and I don't expect to be when I perform similar analyses on other batters.
It turned out that I forgot to put a constraint on the number of clusters I wanted to find, but the algorithm I used chose exactly seven as the optimal result. Should that happen in every subsequent analysis I do, it would be a scientific demonstration that nine players on the field is the perfect number for baseball.
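To give an idea of what clustering with a fixed number of groups looks like, here is a minimal k-means sketch. K-means is my choice for illustration only: it is not necessarily the algorithm used for this post, and unlike that algorithm it needs the number of clusters up front. The function name and the starting rule (first point, then the point farthest from the centers picked so far) are assumptions made to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on (x, y) tuples. Returns (centers, labels)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = []
    for _ in range(iterations):
        # Assign every batted ball to its nearest center.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Move each center to the mean of the balls assigned to it.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # nothing moved: converged
        centers = new_centers
    return centers, labels
```

Calling `kmeans(batted_balls, 7)` on a list of (x, y) locations would return seven centers, one candidate spot per fielder, plus the cluster each ball belongs to.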

Here's a chart showing the defensive alignment that cluster analysis suggests for Utley. You put a player in the middle of each circle and he's responsible for every ball in that circle (or in the neighborhood, just look at the colors).

Were it not for that third baseman (I suppose) just behind the pitcher's mound, the result would have looked too good to be true - I would have understood anyone accusing me of making up the data.

It looks like the Rays' alignment made sense. I'm curious about the midway position (between infield and outfield) where the second baseman usually plays in this kind of shift: the clustering algorithm places a player there to take care of short flies, while managers put the man there to tackle hard grounders too.

This analysis doesn't take into account the range a player can realistically cover. That is why the player behind the pitcher's mound ends up having to take care of all the infield grass; besides, you have to accept that, however well positioned, your pair on the right can't be counted on to scoop up every groundball.

I didn't expect to run a simple cluster analysis (well, it's not so simple, to be honest) and find results that make so much baseball sense.

Maybe if I remove the very short grounders from the data set (those that become infield hits or that are the catcher's or pitcher's responsibility), I can even get that third baseman off the pitching mound!


  1. Max - Congratulations! This is a great study, and to see it in English is just amazing - well done.

  2. Wow this is awesome, great work.

    A quick question: Do you have any recommendations for a book or link to learn the basics of cluster analysis? I have a decent stats/econometrics background but cluster analysis is something I haven't discovered yet. Thanks.

  3. Max,

    Very, very cool stuff. Great idea on using cluster analysis. (And a nice bit of serendipity on choosing Utley and getting 7 clusters).

    Graham, if you're looking for a pretty easy practical guide to using cluster analysis, check out the O'Reilly book, Programming Collective Intelligence. It's got a chapter dedicated to clustering.

  4. Your English is excellent - while reading, I couldn't tell this was not your primary language. Keep it up.

  5. Thank you all for the comments.

    Graham, I hope Dan's advice is good for you, because I learnt the basics of cluster analysis from Italian textbooks.

    Dan, after getting the lucky result, I looked at the less optimal results and there were a lot of them veeery close to the 7-cluster solution. Next time I think I'll remember to set the 7-cluster constraint.

    Now I'm going to upload post number 2.

  6. I just saw this thanks to a link from Rob Neyer; it looks like a really interesting study. My only question, though: it looks like you haven't taken into account that you need to have a player positioned to be at first to record outs on ground balls. Do you think that the dark green cluster is close enough that if a first baseman was stationed there he could cover the bag and field those balls?

  7. Nathan,
    at the time I didn't put any constraint on the first baseman's position (this was discussed in later posts).
    A couple of notes: after P. Jensen published his work on fielding and the Gameday coordinate system, I realized the diamond might not be exactly placed, so the dark green cluster should be a little closer to the bag.
    Also, looking at data on actual fielding positioning (courtesy of Matt Thomas), first basemen play around 30 feet from the bag, and in some cases well over 40.