Here is my latest effort at THT.
Master of Fooling featuring Johan Santana and his feared change-up... and a few nice 3D charts.
Showing posts with label pitch selection. Show all posts
Saturday, July 4, 2009
Friday, June 19, 2009
Friday, March 6, 2009
Predictability - A baseline for future analyses
Here are the most and least predictable pitchers... in theory.
What do I mean?
I set the conditions in my previous posts (I - II).
Every MLB pitcher I considered has his own repertoire and his own mix of pitches.
The Minimum Level of Predictability of a pitcher is the percentage of correct guesses that can be made on his selection... if he chooses his pitches without being at all influenced by the situation (opponent, score, outs, inning, men on, etc.)
High Theoretical Predictability
pitcher | MLP | repertoire |
Tim Wakefield | 99.7% | KN-FB-CB |
Grant Balfour | 86.2% | FB-SL-CH-CB |
Jonathan Papelbon | 78.7% | FB-SL-CH-SF |
Neal Cotts | 76.7% | FB-SL-CH-CB |
Matt Thornton | 76.4% | FB-SL-CT-CH |
Brad Ziegler | 75.2% | FB-CB-SL-CH |
Dennis Sarfate | 69.7% | FB-CB-SL |
David Riske | 69.5% | FB-CH-SL |
Joe Beimel | 69.4% | FB-CB-CH-SL |
Daniel Cabrera | 68.9% | FB-SL-CH-CB |
Low Theoretical Predictability
pitcher | MLP | repertoire |
Andy Sonnanstine | 24.1% | FB-CT-CB-SL-CH |
R.A. Dickey | 24.2% | KN-FB-CB-CH-SL-SF |
Jorge Campillo | 26.9% | FB-SL-CH-CB-CT |
Shaun Marcum | 27.0% | FB-SL-CH-CB-CT-SF |
Bronson Arroyo | 27.2% | FB-CB-CH-SL-CT |
Carlos Villanueva | 27.2% | FB-SL-CH-CB |
Mike Mussina | 28.1% | FB-CB-SL-CH-SF |
Doug Davis | 28.7% | CT-FB-CB-CH-SL |
Lance Cormier | 29.2% | CB-SL-FB-CT-CH |
Jered Weaver | 29.7% | FB-CH-SL-CT-CB |
Some observations
The pitch repertoire is what comes out from MLBAM Gameday (not necessarily correct)
I showed the top-10 and bottom-10 because a lot of people like to see ranking tables, but I think in this case they don't make a lot of sense. Anyway, a quick look at them can't be harmful.
First of all, my dataset has Wakefield throwing nearly 100% knuckleballs (thus yielding the highest predictability value). I don't know whether I did something wrong in importing the PITCHf/x data or whether the MLBAM classification algorithm marks every offering by Tim as a knuckler. Anyway, FanGraphs (BIS data) has his split at 81% KN, 13% FB, 6% CB, and ESPN Insider at 82% KN, 13% FB, 5% CB (using these sources I would get predictability values of 67.7% and 69.2%, respectively).
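As a quick check of those two alternative figures, here is a minimal Python sketch. It assumes, as in my previous posts, that the hitter guesses in the same proportions the pitcher throws, so the predictability is just the sum of the squared pitch shares. The function name `wakefield_predictability` is mine, made up for illustration.

```python
def wakefield_predictability(shares):
    """Sum of squared pitch shares: the chance of a correct guess when the
    hitter guesses in the same proportions the pitcher throws."""
    return sum(share * share for share in shares)


# Wakefield's splits as quoted above, in KN, FB, CB order.
fangraphs_split = [0.81, 0.13, 0.06]
espn_split = [0.82, 0.13, 0.05]

print(f"FanGraphs split: {wakefield_predictability(fangraphs_split):.1%}")  # 67.7%
print(f"ESPN split:      {wakefield_predictability(espn_split):.1%}")       # 69.2%
```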
Wakefield leads to another point: knuckleballs have a very high degree of variability, so knowing that the pitch coming at you is called a knuckleball won't tell you anything about the path the ball will travel.
You can make a similar case for Mariano's cutter (Mo's out of the top ten list): it's not really a single pitch, because he varies the position of his fingers to obtain different cuts.
Analyzing how predictable Wakefield or Rivera are in their selection is pretty useless, as it is for guys like Balfour who throw (nearly) nothing but fastballs: hitters already know what is gonna come at them. Balfour, Wakefield and Mo don't live on surprising opponents.
Further analyses on this subject can be more useful if done on pitchers with, say, three very distinguishable pitches (e.g. Fastball, Slider, Changeup) none thrown more than 3/4 of the time, and that's what I'll try to do in the future.
Now you might ask what all this effort to get theoretical predictabilities is for.
I'd like to look at individual pitchers' predictabilities using advanced statistical techniques, such as multinomial logistic regression: I would like to build models to help me predict what a pitcher will throw in a specific situation (e.g. against a right-hander, late in a close game, with a runner on first, nobody out and following a slider). But I needed to set a minimum usefulness for the models: if a model I build performs worse than the Minimum Level of Predictability, then it is useless.
On the other hand, if a model for a pitcher significantly outperforms the Minimum Level of Predictability, then maybe the pitcher is falling into patterns, and this fact could be exploited by the opposition.
As I said, I'll tackle the models in the future.
Tuesday, March 3, 2009
Predictability (Play Ball!)
I found the answer to the predictability question from the last post with basic probability.
In such a hypothetical situation, what the pitcher is going to throw and what the hitter thinks is coming at him are independent events.
Thus we have four scenarios (the following example is for Pitcher B - the one who throws 90% Fastball, 10% slider).
- Pitcher B is going to throw a fastball (probability = 90%, or P(PF) = .9) and Hitter C is looking for a fastball (P(HF) = .9): the probability of this scenario is P(PF) * P(HF) = .9 * .9 = .81;
- Pitcher B is going to throw a fastball (P(PF) = .9) and Hitter C is looking for a slider (P(HS) = .1): the probability of this scenario is P(PF) * P(HS) = .9 * .1 = .09;
- Pitcher B is going to throw a slider (P(PS) = .1) and Hitter C is looking for a fastball (P(HF) = .9): the probability of this scenario is P(PS) * P(HF) = .1 * .9 = .09;
- Pitcher B is going to throw a slider (P(PS) = .1) and Hitter C is looking for a slider (P(HS) = .1): the probability of this scenario is P(PS) * P(HS) = .1 * .1 = .01;
The probability of a correct guess is the sum of the probabilities for first and last scenarios, i.e. 0.82.
You can be more general and calculate the Minimum Level of Predictability given any number of pitch types a pitcher has in his toolbox, any way he mixes them, and the information the hitter has on him (that might not be accurate).
It's as simple as:
Prob pitcher throws fastball * Prob hitter looks for fastball
+ Prob pitcher throws slider * Prob hitter looks for slider
+ Prob pitcher throws curve * Prob hitter looks for curve
+ and so on...
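Written compactly, the sum above looks like this. The symbols are mine and not part of the original notation: $p_i$ is the share of pitch type $i$ in the pitcher's mix, $h_i$ is the share of the hitter's guesses that go to that type, and $n$ is the number of pitch types. As in the scenarios above, pitch and guess are assumed independent.

```latex
\mathrm{MLP} = \sum_{i=1}^{n} p_i \, h_i ,
\qquad \text{and, when the hitter's information is accurate } (h_i = p_i), \qquad
\mathrm{MLP} = \sum_{i=1}^{n} p_i^{2}
```

For Pitcher B this gives $0.9^2 + 0.1^2 = 0.82$, the same figure obtained from the four scenarios.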
Obviously the Minimum Level of Predictability is lower for pitchers who
- have more pitches in their repertoire
- and use them in equal proportions.
Here are some combinations off the top of my head as examples:
# pitches in repertoire | selection percentages | MLP |
2 | 50-50 | 50% |
2 | 90-10 | 82% |
3 | 90-5-5 | 81.5% |
3 | 50-30-20 | 38% |
3 | 34-33-33 | 33% |
4 | 50-30-15-5 | 36.5% |
4 | 90-5-3-2 | 81.4% |
4 | 25-25-25-25 | 25% |
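The table can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated above (pitch and guess are independent); the function name `min_predictability` and its optional `hitter_mix` argument are my own naming. Leaving `hitter_mix` out means the hitter has accurate information and guesses in the pitcher's own proportions.

```python
def min_predictability(pitcher_mix, hitter_mix=None):
    """Chance that an independent guess matches the pitch.

    pitcher_mix: share of each pitch type actually thrown.
    hitter_mix:  share of guesses for each pitch type; defaults to the
                 pitcher's own mix (hitter with accurate information).
    """
    if hitter_mix is None:
        hitter_mix = pitcher_mix
    if len(pitcher_mix) != len(hitter_mix):
        raise ValueError("both mixes must cover the same pitch types")
    return sum(p * h for p, h in zip(pitcher_mix, hitter_mix))


# Selection percentages taken from the table above.
examples = [
    [50, 50], [90, 10], [90, 5, 5], [50, 30, 20],
    [34, 33, 33], [50, 30, 15, 5], [90, 5, 3, 2], [25, 25, 25, 25],
]
for percentages in examples:
    mix = [pct / 100 for pct in percentages]
    label = "-".join(str(pct) for pct in percentages)
    print(f"{label:>12}: {min_predictability(mix):.1%}")

# Hypothetical case of inaccurate information: the hitter expects 70-30
# while the pitcher actually throws 90-10.
print(f"stale report: {min_predictability([0.9, 0.1], [0.7, 0.3]):.1%}")
```

The last line illustrates the "information that might not be accurate" case: the hitter's guess rate comes from his own beliefs, not from the pitcher's real mix.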
I think that's enough theory (well, not so fast... look at the note below). Next time I'll write some real pitcher names.
Note
If you think that Pitcher B is never going to change his mix (90% FB, 10% SL), the guessing hitter will have more success if he always guesses fastball (he'll be correct 90% of the time instead of 82%).
Obviously, in such a case, after some time the pitcher will stop throwing fastballs to that hitter at all; then the hitter will adjust and look only for sliders. After some back-and-forth adjustments the pair will reach an equilibrium. Ideally that would be at 50-50, since it's the most unpredictable combo; but the pitcher, as we said in our example, might not have equal confidence in both pitches and/or one pitch might be more stressful for his body; thus he will reach a different equilibrium (90-10 in our example).
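The comparison in this note (90% versus 82%) can be checked with a short Python sketch. It assumes the pitcher never changes his mix, and the two function names are mine, chosen for illustration. Always guessing the most frequent pitch can never do worse than the roulette, because the sum of squared shares is at most the largest share.

```python
def roulette_guess_rate(mix):
    """Hitter guesses each pitch type in the pitcher's own proportions."""
    return sum(share * share for share in mix)


def always_guess_favorite_rate(mix):
    """Hitter always looks for the pitcher's most frequent pitch."""
    return max(mix)


pitcher_b = [0.9, 0.1]  # 90% fastball, 10% slider
print(f"roulette:        {roulette_guess_rate(pitcher_b):.0%}")        # 82%
print(f"always fastball: {always_guess_favorite_rate(pitcher_b):.0%}")  # 90%
```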
Monday, March 2, 2009
Predictability (ceremonial first pitch)
Can we measure how much a pitcher is predictable in choosing the type of pitch he will deliver?
Today I will introduce the Minimum Level of Predictability.
Let's say Pitcher A has two pitches in his arsenal, e.g. a fastball and a slider. He is so confident in both pitches that he delivers the fastball 50% of the time and the slider the other 50%. Suppose also he is perfectly random in his selection: no matter the count, the on-base situation, the score, the hitter, the last pitch he has thrown, every delivery is just like a coin toss.
Pitcher B has a fastball and a slider too, but he chooses the hard one 90% of the time. Though his predilection toward the fastball is huge, he doesn't care about the situation either: whatever is happening (or has just happened) on the field, his selection will be absolutely random (though heavily unbalanced).
Now we have Hitter C, who is an extreme guess hitter. Every time he's going to face a pitch he spins a roulette wheel in his head to decide what pitch to look for. He is not entirely devoted to Lady Luck: he talks a lot with the advance scouts and takes notes.
The scouts have told him that Pitcher A throws fastball-slider in a 50-50 proportion and Pitcher B throws the same combination but with a 90-10 ratio.
He sets up his mental roulette accordingly.
How many times will he correctly guess offerings from Pitcher A and from Pitcher B?
Short answer: 50% correct against Pitcher A, 82% correct against Pitcher B.
Long answer: ...will see you tomorrow night!
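While waiting for the long answer, the short answer can be checked by brute force. Below is a minimal Monte Carlo sketch in Python of Hitter C's mental roulette. The function name, the number of simulated pitches and the seed are arbitrary choices of mine; the only assumptions are the ones stated above: the pitcher picks at random according to his mix, and the hitter guesses at random according to the scouting report.

```python
import random


def simulate_guess_rate(pitcher_mix, hitter_mix, n_pitches=200_000, seed=2009):
    """Simulate independent pitch selections and guesses and return the
    fraction of pitches on which the guess matched the pitch."""
    rng = random.Random(seed)
    pitch_types = range(len(pitcher_mix))
    pitches = rng.choices(pitch_types, weights=pitcher_mix, k=n_pitches)
    guesses = rng.choices(pitch_types, weights=hitter_mix, k=n_pitches)
    matches = sum(pitch == guess for pitch, guess in zip(pitches, guesses))
    return matches / n_pitches


# Index 0 is the fastball, index 1 the slider.
rate_a = simulate_guess_rate([0.5, 0.5], [0.5, 0.5])
rate_b = simulate_guess_rate([0.9, 0.1], [0.9, 0.1])
print(f"Pitcher A: {rate_a:.1%}")
print(f"Pitcher B: {rate_b:.1%}")
```

With this many simulated pitches the two rates should land close to 50% and 82%; the exact digits depend on the seed.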