Thursday, March 26, 2009

Another one bites the clutch

A lot of words have been spent on clutch hitting, many of them by people smarter than me.
Still, I would like to share some work I did a few years ago on my Italian website. That work went through some important refinements before giving birth to this post.

At the time I wrote the Italian article, the wonderful FanGraphs didn't have the Clutch stat yet, thus I was looking for some way to measure if (and possibly how much) a particular season by a particular player was clutch.
(Those were the times when A-Rod won the MVP award and many Sox fans - and Yankee haters - cried that Big Papi was more deserving due to his clutch performance).

The following was my line of thinking.

1. Using average Run Values for batting events, I can assign a value to each plate appearance of a particular player. This value measures the outcome of the at bat without taking into account the importance (leverage... more on that in a few moments) of the situation.

2. I also have Leverage, a number that measures the importance of the moment without looking at the outcome of the AB.

Suppose there's a player who is the clutchiest guy on the face of the earth. He may have whatever batting line, but he will get his greatest production (HRs) in the highest leverage situations and his least production (Ks) in the lowest leverage situations.
Take all the at bats of this player, put the run values in a spreadsheet column, then put the corresponding leverage index in the next column.
Sort your spreadsheet by the run value column; now sort it by the leverage column. Nothing changes for this particular guy.

Obviously such a player doesn't exist, but I can use the correlation between run values and leverage index to look at a player's clutchiness.
If a player has a correlation of zero, then he achieved his best and worst results randomly across the leverage spectrum; the closer to one, the more clutch he has been; the closer to minus one, the more of a choker he has been.
But how close to 1 (or -1) can a player get in a season (or a career)?
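The spreadsheet idea can be sketched in a few lines of Python. This is only an illustration: the run values per event, the event frequencies and the leverage numbers below are made-up placeholders, not the tables used for this post. Pairing the sorted columns gives the "perfect clutch" ceiling for a given batting line, and pairing them in opposite order gives the "perfect choke" floor.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical run values per event (placeholders, not the post's values).
RUN_VALUES = {"out": -0.27, "walk": 0.30, "single": 0.47, "home_run": 1.40}

rng = random.Random(0)
# A made-up season of 600 plate appearances with made-up event frequencies.
events = rng.choices(list(RUN_VALUES), weights=[68, 9, 17, 6], k=600)
run_values = [RUN_VALUES[e] for e in events]
# Made-up leverage values, one per plate appearance.
leverage = [rng.lognormvariate(0.0, 0.8) for _ in run_values]

# What the player "actually" did (random pairing in this toy example).
observed = pearson(run_values, leverage)
# Best outcomes paired with the highest leverage: the perfect clutch player.
ceiling = pearson(sorted(run_values), sorted(leverage))
# Best outcomes paired with the lowest leverage: the perfect choker.
floor = pearson(sorted(run_values), sorted(leverage, reverse=True))

print(f"observed {observed:.3f}, floor {floor:.3f}, ceiling {ceiling:.3f}")
```

With only four distinct run values against a continuous leverage scale, even the ceiling stays below 1 in this toy setup.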

This leads to the first enhancement I have made over my original article. Since at bat run values can only take a limited set of values, the correlation between RV and LI cannot be exactly one even for the hypothetical super clutch player. Thus, the confidence intervals I show for the correlation have been calculated using bootstrapping techniques.
The other improvement has to do with the way Leverage is calculated. At that time I used something like Keith Woolner's formula (Baseball Prospectus 2005), now it's something more like Tango's and Elan Fuld's (he uses the acronym PIOGO for his index, and his work can be found here).
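For readers unfamiliar with bootstrapping, here is a minimal sketch of one common variant, the percentile bootstrap: resample the plate appearances with replacement many times, recompute the correlation each time, and read the interval off the sorted results. The post does not say which bootstrap variant was used, so treat the percentile method, the function names and the synthetic data as assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    # Undefined when one column is constant; treated as 0 here for simplicity.
    return sxy / denom if denom > 0 else 0.0


def bootstrap_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the correlation of paired data."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Resample whole plate appearances so each (RV, LI) pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# Synthetic data (made up): 600 plate appearances with placeholder run values
# and leverage values that are unrelated to the outcomes.
rng = random.Random(42)
run_values = [rng.choice([-0.27, -0.27, -0.27, 0.30, 0.47, 1.40]) for _ in range(600)]
leverage_scores = [rng.gauss(0.0, 0.8) for _ in range(600)]

r = pearson(run_values, leverage_scores)
low, high = bootstrap_ci(run_values, leverage_scores)
print(f"r = {r:.3f}, 95% CI: {low:.3f} to {high:.3f}")
```

Resampling pairs rather than each column separately matters: shuffling the columns independently would destroy the very relationship being measured.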

Note: I realize that Tango's numbers for LI are now widely used and I would like to perform my analyses using his values, but I had a table with my own values that was already formatted as I needed. I checked some of my values with the corresponding ones on Tango's website and I didn't find big discrepancies (save for the fact that he chooses to put LI = 1 at the average leveraged situation, while I put LI = 1 at the beginning of the game).

There's a third refinement, which occurred to me when I was ready to post (thus forcing me to redo everything you will read from now on).
Run Values are centered at 0 and are additive: a natural scale is the right thing to use.
Leverage Index is a ratio, centered at one; a LI of 0.5 and a LI of 2 are conceptually equidistant from a LI of 1: we need a logarithmic transformation to get the right behaviour from the numbers (e.g. log(1)=0; log(0.5)=-0.69; log(2)=0.69).
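A tiny Python check of that symmetry, using a handful of example leverage values chosen only for illustration:

```python
import math

# Example leverage values: each one is half or double its neighbour.
leverage = [0.25, 0.5, 1.0, 2.0, 4.0]
log_leverage = [math.log(li) for li in leverage]

for li, log_li in zip(leverage, log_leverage):
    # On the raw scale 0.5 is 0.5 away from 1 while 2 is 1.0 away;
    # on the log scale both are the same distance from 0.
    print(f"LI = {li:<4}  raw distance = {abs(li - 1):.2f}  log distance = {abs(log_li):.2f}")
```

The log-transformed values are what would then be correlated with the run values.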

Well, let's look at some players.
FanGraphs has Pujols' 2006 as one of the best recent seasons for clutch hitting performance (Clutch = 3.25). The correlation between his RVs and the LI at which he compiled them is 0.05, with a 95% confidence interval of -0.02 - 0.13.
Such a "clutch" season might have occurred by chance, since the confidence interval contains zero.

Here's Big Papi in 2005, the clutch season par excellence (FanGraphs has it at a 3.31 Clutch score).
Correlation: 0.02 (95% confidence interval: -0.06 - 0.09).
We can make a couple of observations on these numbers. Even a very clutch season likely won't result in a correlation significantly different from zero (i.e. one whose 95% confidence interval doesn't contain zero), and should we find one we'd have to look at the second decimal place, at least.

And now, A-Rod (A-Choke?) 2008 (-3.16 by FanGraphs metric).
Correlation: -0.09 (95% confidence interval: -0.18 - 0.004).

It seems that it's really difficult to get a value significantly different from zero. Here we must note one important thing: to get a high correlation, a player has to overperform in high leverage situations AND underperform in low leverage situations. Players like Pujols or Ortiz, who always perform near excellence, have little room to overachieve in high leverage situations. That's why I believe that a correlation significantly different from zero on the positive side is noteworthy, even if we have to look at the second or third decimal place to see it. It's also no surprise that the one effect that comes close to significance (though it is minuscule) is on the negative side, for a very good hitter like Rodriguez.

In Weaver on Strategy, the Earl of Baltimore talked about Eddie Murray as a guy who always produced when the game was on the line, but tanked a bit in lopsided games. Let's have a look at his run value / leverage index correlation values.

Year - Correlation (95% Confidence Interval)
1977 - 0.004 (-0.09 - 0.07)
1978 - 0.06 (-0.003 - 0.12)
1979 - 0.02 (-0.06 - 0.08)
1980 - 0.04 (-0.03 - 0.11)
1981 - -0.07 (-0.19 - 0.01)
1982 - 0.01 (-0.09 - 0.09)
1983 - -0.05 (-0.14 - 0.02)
1984 - 0.07 (0.01 - 0.14)
1985 - 0.10 (0.03 - 0.16)
1986 - 0.01 (-0.08 - 0.09)
1987 - 0 (-0.08 - 0.06)
1988 - 0 (-0.08 - 0.07)
1989 - 0.01 (-0.07 - 0.08)
1990 - 0.08 (0.01 - 0.14)
1991 - 0.06 (-0.01 - 0.13)
1992 - 0.01 (-0.08 - 0.10)
1993 - -0.02 (-0.10 - 0.04)
1994 - 0.04 (-0.06 - 0.12)
1995 - -0.03 (-0.13 - 0.05)
1996 - -0.003 (-0.08 - 0.06)
1997 - 0.05 (-0.07 - 0.16)

There are a few seasons in which Eddie beat the zero, only one of them under Weaver (1985 - and the book came out one year earlier).
I repeat: when we find an effect, it is very small (so small we can doubt it really is an effect). Anyway, this player, who was perceived to be clutch by one of his managers, never had a choke season (defined as one for which the entire 95% CI is below zero).

The next natural step is to look at correlation values for a career.
Since I reported every season for Eddie Murray, here's his career clutch line:
Correlation: 0.02 (95% confidence interval: 0 - 0.03).

Here are the other players I mentioned so far.
Albert Pujols: Correlation: 0.02 (95% confidence interval: -0.01 - 0.04).
David Ortiz: Correlation: 0.03 (95% confidence interval: 0.004 - 0.05).
Alex Rodriguez: Correlation: -0.002 (95% confidence interval: -0.02 - 0.02).

Since we are looking at a very small effect, it's easier to find something statistically significant (if there is any effect) when looking at a career - i.e. at more observations.
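A back-of-the-envelope way to see this: if outcomes and leverage were independent, the correlation would have a standard error of roughly 1/sqrt(n), so a 95% interval would extend about 1.96/sqrt(n) on either side. This is a textbook large-sample approximation, not the bootstrap used for the numbers above, and the plate appearance counts below are round hypothetical figures.

```python
import math


def approx_half_width(n_pa, z=1.96):
    """Approximate half-width of a 95% interval for a near-zero correlation.

    Uses the large-sample approximation SE(r) ~ 1 / sqrt(n), which assumes
    the two variables are independent. It is not the bootstrap from the post.
    """
    return z / math.sqrt(n_pa)


# Hypothetical plate appearance counts, chosen only for illustration.
for label, n_pa in [("single season", 650), ("long career", 12000)]:
    print(f"{label:>13}: n = {n_pa:>5}, half-width ~ {approx_half_width(n_pa):.3f}")
```

Under these assumptions a correlation of 0.02 is lost in the noise of a single season but sits right at the edge of detectability over a long career.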

I'm tempted to write a few more lines of code in order to calculate correlation values with confidence intervals for every player in the Retrosheet era (for every single season, and for entire careers). Maybe I will do that. Would it be useful, and for what purpose? Surely you can't say that Ortiz's career values are better than Murray's: the respective confidence intervals overlap a lot, so it won't help to single out the clutchiest player ever.
I'm curious to see if there are any players who beat the zero correlation year in and year out (I doubt it) or players who show both clutch and choke seasons.


  1. If you're selecting the top clutch seasons using one metric from the thousands of player-seasons out there, isn't it almost guaranteed that you'll show clutch using another (presumably highly correlated) metric with 95% confidence? Obviously not, since you didn't, so my real question is if you corrected for this and, if so, how.

  2. I'm not sure I have understood your question.
    Are you saying that running multiple correlations I can get significance by chance?
    If this is your point, I haven't corrected for this, but I have looked at only a few (though selected) seasons.
    If I go on and run correlations for every player/season, I would definitely check if the number of significant occurrences exceeds the 5% value.

  3. Yes, I think you've got it. Basically, I'm saying if clutch doesn't exist and we've got 2000 player-seasons, we'd expect 100 of them to show significance at the 5% level. If we select the 2 seasons that show the highest amount of clutch using 1 statistic (that is, significant at about the 2/2000= 0.1% level using that statistic given randomness), shouldn't we expect they'd be significant at the 5% level using a different clutch statistic? If fangraphs's clutch and your clutch correlation are measuring roughly the same thing, an extreme outlier in one should be at least a moderate outlier in the other, no?

  4. I chose a few extreme seasons by FanGraphs metrics to look at what amount of correlation we can find in a "very clutch" season... and the answer seems to be very little.
    I don't get how you can say those 2 seasons are significant at the 0.1% level using FG's stat. They are two of the most extreme, but nothing says that those two are significantly different from every other season.