Monday, February 23, 2009

Refining the shift

I worked some more on the subject of the last post.
First of all, as I said at the end of that post, I reran the cluster analysis after removing the short grounders from my data set.
Here's the resulting chart.

No more fielders behind the rubber... I like it.

Then I tried to do things separately for infielders and outfielders. This way I have more confidence in constraining the clusters to have equal sizes. This makes sense especially for the outfielders.
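For readers who want to try the idea without R, here is a minimal sketch in Python. It is not the Mclust model used for the charts in this post: it swaps in plain k-means, which implicitly treats every cluster as a round blob of similar size and is therefore only a loose stand-in for an equal-size constraint. The function name `kmeans` and the coordinates in `toy_points` are hypothetical and made up for illustration; they are not batted ball data.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on 2-D points (a stand-in, NOT Mclust).

    Returns (centers, labels): one center per cluster and, for each
    point, the index of the cluster it was assigned to.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[j])  # keep an empty cluster in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Hypothetical landing spots (arbitrary units), purely for illustration.
toy_points = [
    (10.0, 80.0), (12.0, 85.0), (9.0, 78.0),
    (50.0, 120.0), (52.0, 118.0), (48.0, 123.0),
    (90.0, 82.0), (88.0, 79.0), (93.0, 84.0),
]

if __name__ == "__main__":
    centers, labels = kmeans(toy_points, k=3)
    for center in centers:
        print(center)
```

Each center can be read as a suggested fielder position for that group of batted balls. Note that k-means depends on its starting points and can settle on a poor grouping, so it is worth trying several values of `seed`.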

Here is a first chart for infielders: I ran a model for four players.

I'm sorry for the questionable choice of colors, but I haven't fully grasped how plot customization works in the package I'm using (Mclust for R, for those interested).

And here are the outfielders (three for the moment).

Then I tried to move one player from the infield to the outfield. Following are the charts for three infielders and four outfielders, respectively.

Summing up.
Playing Utley shifted is the right thing to do. You don't need anybody playing near the third base bag. Putting a fielder in short right makes sense too: the statistical analysis puts him there to catch short flies, but (as managers who employ the shift know) he is mainly valuable for handling the grounders not collected by the infielders on that side.
When data on batted ball velocity become available, a new dimension will be added to the cluster analysis: balls hit along the lines, which get to the infielders more quickly, would then be treated appropriately, producing clusters of different sizes (in the spatial dimensions considered here) for third basemen and first basemen.


  1. I don't know if playing Utley in an overshift would work all the time. He's a dead pull hitter, but he's a much better athlete than most of the guys that defense is used against. He's quicker and faster, and I suspect that he'd be much more successful than the likes of Howard and Giambi at bunting down the third base line. He was unsuccessful in one attempt in the World Series, but that doesn't mean he would fail regularly.

  2. Very interesting, Max. Just one simple question if I may. Where in the Gameday information is the batted ball location found?

  3. Mark, the XML file containing the locations for World Series Game 1 is at
    I don't know if this is what you're looking for.

  4. Great work, Max, and nice to hear that there are more baseball-interested analysts in Europe than just me!

    I'm not really good (but also not that bad) at statistics, and I'm very interested in your work. I'm trying to do the same thing you did with Utley (as I know a little "R")!

    How did you get the data for Utley? I can only find the game logs on MLB!

  5. BurGi, if you look in the XML file Max linked to in his reply to me, I think the x and y values are the hit coordinates. The other files in the inning subdirectory contain various other information from Gameday, such as all the PITCHf/x stuff.

    Also European, btw.

  6. BurGi, try to point to
    it gets you all the data.

    BurGi and Mark, where are you from?

  7. Hi Max, I'm from northern England. I'm currently working on a Python script to extract the data from the Gameday site and get it into some CSV files, from which I can hopefully troll for interesting things to look at.

  8. Mark, if you know Python, you should definitely look at the link I suggested above. With that I've been able to get all the 2008 data... and I know close to nothing about Python.

  9. Thx Max!

    I'm from Austria!

    I already know a bit about Python and will look at the link soon when I've some time, thx!
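
For anyone following the thread, the extraction step discussed above (reading the x and y values from a Gameday XML file and writing them to CSV) might look roughly like the sketch below. Only the `x` and `y` attributes come from the comments; the element name `hip`, the `des` attribute, the helper names `hit_locations` and `to_csv`, and every value in `SAMPLE_XML` are assumptions made for illustration, so check them against a real file before relying on them.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout; the values are NOT real data.
SAMPLE_XML = """<hitchart>
  <hip des="Groundout" x="145.58" y="158.63" type="O" inning="1"/>
  <hip des="Single" x="172.69" y="112.45" type="H" inning="2"/>
</hitchart>"""


def hit_locations(xml_text, tag="hip"):
    """Collect x/y coordinates from every <tag> element (tag name is assumed)."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter(tag):
        x, y = element.get("x"), element.get("y")
        if x is None or y is None:
            continue  # skip entries without a recorded location
        rows.append({"des": element.get("des", ""), "x": float(x), "y": float(y)})
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["des", "x", "y"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(hit_locations(SAMPLE_XML)))
```

To process a downloaded file instead of the sample string, read its contents and pass the text to `hit_locations`.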