Wednesday, February 25, 2009

Before moving on

I'm happy with my debut in the blogosphere.
I got some feedback and - most importantly - criticism and suggestions.
I'd like to thank those of you who spread the news about the new kid on the block.

I will use this post to sum up what has emerged from the comments here and at Tango's blog.

1. What my analysis shows is the best (or at least a very good) alignment for maximizing the proportion of batted balls that are caught by the seven fielders. That's not the whole picture of defense.
The first baseman has to take the throws from the other infielders, so he's not completely free to choose his position on the field; my model doesn't account for this.

2. Not all batted balls are created equal. An up-the-middle groundball usually produces a single, while a grounder down either foul line is more likely to produce an extra-base hit. My analysis treats all the batted balls the same way.
Batted balls should be weighted according to how likely they are to become a single, a double and so on. Even such a weighting, though, would only improve a theoretical defensive alignment that ignores the game situation: sometimes you have to concede the single and guard the lines; in other situations a hit of any kind is all the opponent needs to win the game, so you don't care whether you give up a single or a double (obviously, in such a case, you wouldn't dream of leaving the left side of the infield unguarded).
On the same subject I'd like to reiterate that we have no information available on the speed of the batted balls: even if third basemen had as much range as shortstops, we should probably expect them to cover less ground, because balls reach them more quickly. I think handling this is just a matter of having the data, because cluster analysis can cope with more than two dimensions.

3. Not all hitters are created equal. A second baseman throwing from short right can easily beat the big men running to first; the specific batter I considered can move faster than Big Papi, Howard & co.
Utley's speed also makes him a bigger threat on the basepaths, should he decide to lay down a bunt to the left side.

4. Not all fielders are created equal. Fielders have different ranges, some move better to one side, some can't run backward, and so on. Obviously they should choose their position according to their strengths and weaknesses too.
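
The weighting idea in point 2 can be illustrated with a small Python sketch. This is only an illustration, not part of the original analysis: the function name `weighted_center` and the weights are hypothetical, standing in for whatever value (for example, expected bases) one decides to attach to each batted ball. A fielder placed at the weighted center is pulled toward the balls that would hurt the most.

```python
def weighted_center(balls):
    """Return the weighted mean position of a group of batted balls.

    balls: list of (x, y, weight) tuples. The weight is a hypothetical
    value for the damage a ball would do if it fell in for a hit.
    """
    total = sum(w for _, _, w in balls)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    cx = sum(x * w for x, _, w in balls) / total
    cy = sum(y * w for _, y, w in balls) / total
    return cx, cy


# Two made-up balls: the second one counts double, so the center
# moves two thirds of the way toward it.
example = [(0.0, 100.0, 1.0), (90.0, 100.0, 2.0)]
center = weighted_center(example)
```

With all weights equal to 1 this reduces to the ordinary (unweighted) center of the group.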

With my first couple of posts I intended to scratch the surface, and I'm really happy that people read them with interest. I never thought that a team could look at the graphs I posted and place its fielders in the very same spots. I'm not even sure that a team could learn anything new from such charts: a few readers of my posts weren't aware of Chase Utley's pull tendency, but the Rays were playing him shifted four months ago (actually it was their weird alignment that made me delve into the issue).

Anyway, a quick look at one of those charts won't hurt.

A few side notes: to draw the basepaths on my charts I used the coordinates suggested by Adler in Baseball Hacks; though they are pretty close, Mike Fast provided more accurate values on Tango's blog, and Peter Jensen is expected to publish normalized values at THT, since the coordinate system varies from park to park.

Monday, February 23, 2009

Refining the shift

I worked some more on the subject of my last post.
First of all, as I said at the end of that post, I reran the cluster analysis after removing the short grounders from my data set.
Here's the resulting chart.

No more fielders behind the rubber... I like it.
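
For those who want to try the filtering step, here is a minimal Python sketch of removing short grounders before clustering. Everything in it is an assumption made for illustration: the record layout (`x`, `y` in feet from home plate, `type` equal to `"GB"` for grounders), the function name, and above all the cutoff, which is set here to the distance from home plate to the rubber simply to have a number.

```python
import math

# Hypothetical cutoff, in feet from home plate (60.5 ft is the
# distance to the pitcher's rubber); the right value is a judgment call.
MIN_GROUNDER_FEET = 60.5


def drop_short_grounders(balls, min_distance=MIN_GROUNDER_FEET):
    """Keep every ball except grounders fielded closer than min_distance.

    balls: list of dicts with keys 'x', 'y' (feet, home plate at the
    origin) and 'type' ('GB' for groundballs; other values are kept).
    """
    kept = []
    for ball in balls:
        distance = math.hypot(ball["x"], ball["y"])
        if ball["type"] == "GB" and distance < min_distance:
            continue  # short grounder: pitcher/catcher territory
        kept.append(ball)
    return kept
```

Short fly balls and line drives are left in on purpose, since only the grounders near the mound were the problem.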

Then I tried to do things separately for infielders and outfielders. This way I have more confidence in constraining the clusters to have equal sizes. This makes sense especially for the outfielders.

Here is a first chart for infielders: I ran a model for four players.

I'm sorry for the questionable choice of colors, but I haven't fully grasped how plot customization works in the package I'm using (Mclust for R, for those interested).

And here are the outfielders (three for the moment).

Then I tried to move one player from the infield to the outfield. Following are the charts for three infielders and four outfielders, respectively.

Summing up.
Playing Utley shifted is the right thing to do. You don't need anybody playing near the third base bag. Putting a fielder in short right makes sense too: the statistical analysis puts him there to catch short flies, but (as managers who employ the shift know) he is mainly valuable for handling the grounders not collected by the infielders on that side.
When data on batted ball velocity become available, a new dimension can be added to the cluster analysis: balls along the lines, which reach the infielders more quickly, would be treated appropriately, thus producing clusters of different sizes (in the spatial dimensions considered here) for third basemen and first basemen.

Sunday, February 22, 2009

Chase-ing the FieldF/x

In the first inning of the last World Series the Rays fielders played Chase Utley extremely shifted, the way teams defend against guys like Big Papi, Giambi, or Howard. Chase tried to take advantage of the alignment by laying down a bunt, but it rolled foul outside the third base line; then he decided to ignore the opponents' placement and hit a ball that no shift can take care of.

I hadn't watched many Phillies games during the regular season (living in Europe I'm usually exposed to the Cubbies... unless my boss lets me sleep at my desk in the morning), so I didn't know whether it was commonplace for teams to defend shifted against Utley. From what I heard during the telecast, the Rays' alignment was pretty unusual for that guy.

Using data from both Gameday and Retrosheet, I tried to figure out an optimal positioning against Chase.
Here's what I've done.

First I plotted the location of every one of Utley's batted balls, using coordinates from Gameday.

I plotted grounders in red. All the GBs you see in the outfield have gone through the infield and - theoretically - could have been caught by a well positioned infielder (or one that happened to be where the ball was hit). That's why people working at Project Scoresheet used to record, for groundball hits, the place where the ball left the infield rather than the place where it was collected in the outfield (you can see this good habit in the Retrosheet files of the '80s). As we can see, that's not the case with Gameday stringers.

I corrected for this. I dusted off a couple of geometry books and projected every groundball that went to the outfield to the place it left the infield.
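
The geometry can be sketched in a few lines of Python. This is a reconstruction under stated assumptions, not necessarily the exact calculation behind the chart: coordinates are taken in feet with home plate at the origin and the y axis pointing to center field (not raw Gameday pixels), the grounder is assumed to travel in a straight line from home plate, and the edge of the infield is modeled as an arc of 95 feet radius centered on the pitcher's rubber. The function name is made up for the example.

```python
import math

RUBBER_Y = 60.5        # feet from home plate to the pitcher's rubber
INFIELD_RADIUS = 95.0  # assumed radius of the infield edge around the rubber


def project_to_infield_edge(x, y):
    """Move a groundball fielded in the outfield back to the infield edge.

    The ball is assumed to roll in a straight line from home plate (0, 0)
    through (x, y). Points already inside the infield arc are unchanged.
    """
    distance = math.hypot(x, y)
    if distance == 0:
        return (x, y)
    ux, uy = x / distance, y / distance  # unit vector of the ball's path
    # Intersect the ray t * (ux, uy) with the circle of radius
    # INFIELD_RADIUS centered at (0, RUBBER_Y): solve for t > 0.
    along = uy * RUBBER_Y
    edge = along + math.sqrt(along ** 2 - RUBBER_Y ** 2 + INFIELD_RADIUS ** 2)
    if distance <= edge:
        return (x, y)
    return (ux * edge, uy * edge)
```

Since home plate lies inside the arc, the ray crosses it exactly once, so the square root is always defined.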

Here's the new graph.

Finally I added some random noise, because the groundball hits (those that went through) were plotted on top of one another and it wasn't easy to distinguish (or count) them.
Here's something easier to read.
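
Adding noise of this kind (often called jittering) is easy to sketch in Python. The function name, the amount of noise and the fixed seed are all assumptions chosen for the example; the only requirement is that the noise be small compared with the distances that matter on the field.

```python
import random


def jitter(points, amount=2.0, seed=42):
    """Return a copy of points with uniform noise added to each coordinate.

    points: list of (x, y) tuples. Each coordinate is shifted by a random
    value between -amount and +amount. A fixed seed keeps the plot
    reproducible from one run to the next.
    """
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three hits recorded at exactly the same spot become three visible dots.
stacked = [(50.0, 120.0)] * 3
spread = jitter(stacked)
```

The jittered points are only for display; whether to cluster the jittered or the original coordinates is a separate choice.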

"You can observe a lot just by watching." In this case, just by watching you can see where batted balls tend to cluster.

Cluster analysis is a statistical tool that can help our eyes in this task.
I hoped that, with the help of cluster analysis, I could more or less identify where to place seven fielders against Chase Utley (pitcher and catcher are not so free to move around the field).

I must admit I didn't expect to be so lucky, and I don't expect to be when I perform similar analyses on other batters.
It turned out that I forgot to put a constraint on the number of clusters I wanted to find, but the algorithm I used chose exactly seven as the optimal result. Should that happen in every subsequent analysis I do, it would be a scientific demonstration that nine players on the field is the perfect number for baseball.

Here's a chart showing the defensive alignment that cluster analysis suggests for Utley. You put a player in the middle of each circle and he's responsible for every ball in that circle (or in the neighborhood, just look at the colors).
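
To give a feel for how "one fielder per cluster" works, here is a minimal Python sketch. Note that it swaps in plain k-means, a simpler technique than the model-based clustering used for the charts: k-means needs the number of clusters to be given in advance (it cannot "choose seven" on its own) and it treats every cluster as round. The function name and parameters are made up for the example.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: return k centers, one suggested spot per fielder.

    points: list of (x, y) tuples (more dimensions work too, as long as
    every point has the same length). k must not exceed len(points).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the batted balls
    for _ in range(iterations):
        # Assign every ball to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of the balls assigned to it.
        new_centers = []
        for i, group in enumerate(groups):
            if group:
                mean = tuple(sum(c) / len(group) for c in zip(*group))
                new_centers.append(mean)
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # nothing moved: converged
        centers = new_centers
    return centers
```

Calling `kmeans(balls, 7)` on a list of batted-ball locations would return seven candidate positions. The result depends on the random starting points, so it is worth running it with a few different seeds.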

Were it not for that third baseman (I suppose) just behind the pitcher's mound, the result would have looked too good to be true - I would have understood anyone accusing me of making up the data.

It looks like the Rays' alignment made sense. I'm curious about the midway (between infield and outfield) position where the second baseman usually plays in this kind of shift: the clustering algorithm places a player there to take care of short flies, while managers put the man there to tackle hard grounders too.

This analysis doesn't take into account the range a player can realistically cover. This leaves the player behind the pitcher's mound having to take care of all the infield grass; besides, you have to accept that, however well positioned, your pair on the right can't be counted on to scoop up every groundball.

I didn't expect to run a simple cluster analysis (well, it's not so simple, to be honest) and find results that make so much baseball sense.

Maybe if I remove from the data set the very short grounders (those that become infield hits or that are catcher/pitcher responsibility), I can even get that third baseman off the pitching mound!

Friday, February 20, 2009

Play ball!

A few of the guys I spent some time with at the Pitchf/x Summit last May in San Francisco, Prof Alan Nathan among others, spurred me to write articles in English.
It's been a very long time since then, but now I'm willing to give it a try.
Stay tuned.