What do I mean?
I set the conditions in my previous posts (I - II).
Every MLB pitcher I considered has his own repertoire and pitch mix.
The Minimum Level of Predictability of a pitcher is the percentage of correct guesses that can be made on his selection if he chooses his pitches without being influenced at all by the situation (opponent, score, outs, inning, men on, etc.).
High Theoretical Predictability
pitcher | MLP | repertoire |
Tim Wakefield | 99.7% | KN-FB-CB |
Grant Balfour | 86.2% | FB-SL-CH-CB |
Jonathan Papelbon | 78.7% | FB-SL-CH-SF |
Neal Cotts | 76.7% | FB-SL-CH-CB |
Matt Thornton | 76.4% | FB-SL-CT-CH |
Brad Ziegler | 75.2% | FB-CB-SL-CH |
Dennis Sarfate | 69.7% | FB-CB-SL |
David Riske | 69.5% | FB-CH-SL |
Joe Beimel | 69.4% | FB-CB-CH-SL |
Daniel Cabrera | 68.9% | FB-SL-CH-CB |
Low Theoretical Predictability
pitcher | MLP | repertoire |
Andy Sonnanstine | 24.1% | FB-CT-CB-SL-CH |
R.A. Dickey | 24.2% | KN-FB-CB-CH-SL-SF |
Jorge Campillo | 26.9% | FB-SL-CH-CB-CT |
Shaun Marcum | 27.0% | FB-SL-CH-CB-CT-SF |
Bronson Arroyo | 27.2% | FB-CB-CH-SL-CT |
Carlos Villanueva | 27.2% | FB-SL-CH-CB |
Mike Mussina | 28.1% | FB-CB-SL-CH-SF |
Doug Davis | 28.7% | CT-FB-CB-CH-SL |
Lance Cormier | 29.2% | CB-SL-FB-CT-CH |
Jered Weaver | 29.7% | FB-CH-SL-CT-CB |
Some observations
The pitch repertoire is what comes out of MLBAM Gameday (not necessarily correct).
I showed the top 10 and bottom 10 because a lot of people like to see ranking tables, but I think in this case they don't make a lot of sense. Anyway, a quick look at them can't hurt.
First of all, my dataset has Wakefield throwing nearly 100% knuckleballs (thus yielding the highest predictability value). I don't know whether I did something wrong in importing the PITCHf/x data or whether the MLBAM classification algorithm marks every offering by Tim as a knuckler. In any case, Fangraphs (BIS data) has his split at 81% KN, 13% FB, 6% CB, and ESPN Insider at 82% KN, 13% FB, 5% CB; using these sources I would get predictability values of 67.7% and 69.2%, respectively.
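As a quick check, both Wakefield figures are consistent with computing the Minimum Level of Predictability as the sum of the squared pitch frequencies, i.e. the hit rate of a guesser who picks each pitch type in proportion to the pitcher's own mix. The sketch below assumes that formula; the function name is made up for illustration and is not taken from the earlier posts.

```python
def minimum_level_of_predictability(mix):
    """Sum of squared pitch shares (assumed formula, see note above).

    `mix` maps a pitch label to its share; shares may be percentages
    or raw counts, since they are normalised before squaring.
    """
    total = sum(mix.values())
    return sum((share / total) ** 2 for share in mix.values())


# Wakefield's splits as quoted in the text.
fangraphs = {"KN": 81, "FB": 13, "CB": 6}
espn = {"KN": 82, "FB": 13, "CB": 5}

print(f"Fangraphs split: {minimum_level_of_predictability(fangraphs):.1%}")  # 67.7%
print(f"ESPN split:      {minimum_level_of_predictability(espn):.1%}")       # 69.2%
```

Under the same assumption, a pitcher throwing a single pitch 100% of the time scores 100%, and one splitting evenly among n pitches scores 1/n.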
Wakefield leads to another point: knuckleballs have a very high degree of variability, so knowing that the pitch coming at you is a knuckleball won't tell you anything about the path the ball will travel.
You can make a similar case for Mariano Rivera's cutter (Mo is not in the top-ten list): it's not really a single pitch, because he varies the position of his fingers to obtain different cuts.
Analyzing how predictable Wakefield or Rivera are in their selection is pretty useless, as it is for guys like Balfour who throw (nearly) nothing but fastballs: hitters already know what is going to come at them - Balfour, Wakefield and Mo don't live on surprising opponents.
Further analyses on this subject would be more useful if done on pitchers with, say, three clearly distinguishable pitches (e.g. fastball, slider, changeup), none thrown more than three quarters of the time, and that's what I'll try to do in the future.
Now you might ask what all this effort in getting theoretical predictabilities is for.
I'd like to look at individual pitchers' predictabilities using more advanced statistical techniques, such as multinomial logistic regression: I would like to build models that help me predict what a pitcher will throw in a specific situation (e.g. against a right-hander, late in a close game, with a runner on first, nobody out and following a slider). But I needed to set a minimum usefulness for the models: if a model I build performs worse than the Minimum Level of Predictability, then it is useless.
On the other hand, if a model for a pitcher significantly outperforms the Minimum Level of Predictability, then maybe the pitcher is falling into patterns, and this fact could be exploited by the opposition.
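A minimal sketch of that comparison, under two loud assumptions: the baseline is computed as the sum of squared pitch frequencies (which matches the Wakefield figures quoted earlier), and both the pitch sequence and the "model" predictions are toy values invented for illustration, not real data or output of a real model.

```python
from collections import Counter


def baseline_from_sequence(pitches):
    """Minimum Level of Predictability from an observed pitch sequence
    (assumed formula: sum of squared relative frequencies)."""
    counts = Counter(pitches)
    total = len(pitches)
    return sum((n / total) ** 2 for n in counts.values())


def accuracy(actual, predicted):
    """Share of pitches where the predicted type matches the actual one."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical ten-pitch sequence and hypothetical model guesses.
actual = ["FB", "FB", "SL", "FB", "CH", "FB", "SL", "FB", "FB", "CH"]
predicted = ["FB", "FB", "SL", "FB", "FB", "FB", "SL", "FB", "FB", "FB"]

baseline = baseline_from_sequence(actual)
model_accuracy = accuracy(actual, predicted)

print(f"baseline: {baseline:.1%}, model: {model_accuracy:.1%}")
if model_accuracy <= baseline:
    print("model is no better than the minimum level: useless")
else:
    print("model beats the minimum level: possible pattern")
```

Whether a gap counts as "significant" would still need a proper test on enough pitches; ten pitches are far too few to conclude anything.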
As I said, I'll tackle the models in the future.
Good work again Max.
I'm guessing that the more specific you get, the lower most probabilities are likely to get. It will be interesting to see what effect predictability has on how a pitcher performs. I imagine sample sizes could be an issue here, though.