Prof Pepper's Assistant: Predictability - A baseline for future analyses

Here are the most and least predictable pitchers... in theory.
What do I mean?
I set the conditions in my previous posts (I - II).
Every MLB pitcher I considered has his own repertoire and his own pitch mix.

The Minimum Level of Predictability (MLP) of a pitcher is the percentage of correct guesses that can be made about his pitch selection... if he chooses his pitches without being influenced at all by the situation (opponent, score, outs, inning, men on base, etc.)

High Theoretical Predictability

pitcher	MLP	repertoire
Tim Wakefield	99.7%	KN-FB-CB
Grant Balfour	86.2%	FB-SL-CH-CB
Jonathan Papelbon	78.7%	FB-SL-CH-SF
Neal Cotts	76.7%	FB-SL-CH-CB
Matt Thornton	76.4%	FB-SL-CT-CH
Brad Ziegler	75.2%	FB-CB-SL-CH
Dennis Sarfate	69.7%	FB-CB-SL
David Riske	69.5%	FB-CH-SL
Joe Beimel	69.4%	FB-CB-CH-SL
Daniel Cabrera	68.9%	FB-SL-CH-CB

Low Theoretical Predictability

pitcher	MLP	repertoire
Andy Sonnanstine	24.1%	FB-CT-CB-SL-CH
R.A. Dickey	24.2%	KN-FB-CB-CH-SL-SF
Jorge Campillo	26.9%	FB-SL-CH-CB-CT
Shaun Marcum	27.0%	FB-SL-CH-CB-CT-SF
Bronson Arroyo	27.2%	FB-CB-CH-SL-CT
Carlos Villanueva	27.2%	FB-SL-CH-CB
Mike Mussina	28.1%	FB-CB-SL-CH-SF
Doug Davis	28.7%	CT-FB-CB-CH-SL
Lance Cormier	29.2%	CB-SL-FB-CT-CH
Jered Weaver	29.7%	FB-CH-SL-CT-CB

Some observations
The pitch repertoires are the ones reported by MLBAM Gameday (not necessarily correct).

I showed the top 10 and bottom 10 because a lot of people like to see ranking tables, but I think in this case they don't make a lot of sense. Anyway, a quick look at them can't hurt.

First of all, my dataset has Wakefield throwing nearly 100% knuckleballs (thus yielding the highest predictability value). I don't know whether I did something wrong importing the PITCHf/x data or whether the MLBAM classification algorithm marks every offering by Tim as a knuckler. Either way, Fangraphs (BIS data) has his split at 81% KN, 13% FB, 6% CB, and ESPN Insider at 82% KN, 13% FB, 5% CB (using these sources I would get predictability values of 67.7% and 69.2%, respectively).
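
The two alternative Wakefield figures above (67.7% and 69.2%) are consistent with computing the Minimum Level of Predictability as the sum of the squared pitch frequencies, that is, the hit rate of a guesser who picks each pitch type at random with the same frequencies the pitcher uses. The short Python sketch below reproduces them under that assumption. The function name is made up for illustration, and the authoritative definition remains the one given in the earlier posts.

```python
def minimum_level_of_predictability(mix):
    """Return the sum of squared pitch frequencies for a pitch mix.

    `mix` maps pitch type to its share of pitches thrown; shares must sum to 1.
    Assumption: the guesser picks each pitch type at random with the same
    frequencies the pitcher uses, independently of the pitcher's choice.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"pitch shares must sum to 1, got {total}")
    return sum(share ** 2 for share in mix.values())


# Wakefield's splits as quoted in the text
fangraphs = {"KN": 0.81, "FB": 0.13, "CB": 0.06}
espn = {"KN": 0.82, "FB": 0.13, "CB": 0.05}

print(f"{minimum_level_of_predictability(fangraphs):.1%}")  # 67.7%
print(f"{minimum_level_of_predictability(espn):.1%}")       # 69.2%
```

Note that under this reading the value is lower than what a hitter would get by always guessing the most frequent pitch (81% or 82% here), which is why it works as a minimum.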

Wakefield leads to another point: knuckleballs have a very high degree of variability, so knowing that the pitch coming at you is a knuckleball won't tell you much about the path the ball will travel.

You can make a similar case for Mariano Rivera's cutter (Mo is outside the top-ten list): it's not really a single pitch, because he varies the position of his fingers to obtain different cuts.

Analyzing how predictable Wakefield or Rivera are in their pitch selection is pretty useless, as it is for guys like Balfour who throw (nearly) nothing but fastballs: hitters already know what is coming at them. Balfour, Wakefield and Mo don't live on surprising opponents.

Further analyses on this subject will be more useful if done on pitchers with, say, three clearly distinguishable pitches (e.g. fastball, slider, changeup), none of them thrown more than 3/4 of the time, and that's what I'll try to do in the future.
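
A selection rule like this one is easy to write down. The sketch below is only an illustration: the helper `is_good_candidate` is hypothetical, the pitch mixes are made up (not real pitcher data), and the code checks only the number of pitch types and the 3/4 cap. Whether the pitches are truly distinguishable from one another is not something a frequency table can tell you.

```python
def is_good_candidate(mix, min_pitches=3, max_share=0.75):
    """True if the mix has at least `min_pitches` pitch types and no single
    pitch type exceeds `max_share` of the pitches thrown."""
    return len(mix) >= min_pitches and max(mix.values()) <= max_share


# Made-up mixes for illustration only (not real pitcher data)
example_mixes = {
    "pitcher_a": {"FB": 0.55, "SL": 0.25, "CH": 0.20},
    "pitcher_b": {"FB": 0.90, "SL": 0.07, "CH": 0.03},
    "pitcher_c": {"FB": 0.60, "CB": 0.40},
}

candidates = [name for name, mix in example_mixes.items() if is_good_candidate(mix)]
print(candidates)  # ['pitcher_a']
```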

Now you might ask what all this effort in computing theoretical predictabilities is for.

I'd like to look at individual pitchers' predictabilities using advanced statistical techniques, such as multinomial logistic regression. I would like to build models to help me predict what a pitcher will throw in a specific situation (e.g. against a right-hander, late in a close game, with a runner on first, nobody out and following a slider). But I needed to set a minimum usefulness for the models... if a model I build performs worse than the Minimum Level of Predictability, then it is useless.
On the other hand, if a model for a pitcher significantly outperforms the Minimum Level of Predictability, then maybe the pitcher is falling into patterns, and this fact could be exploited by the opposition.
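
The comparison between a model and the baseline could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the pitch sequence and the guesses are placeholders (not real pitches and not the output of an actual model), the baseline is taken as the sum of squared pitch frequencies (which matches the Wakefield figures quoted earlier in this post), and the z statistic is just one simple way of asking whether the model "significantly" beats the baseline.

```python
import math
from collections import Counter


def baseline_from_pitches(pitches):
    """Sum of squared observed pitch frequencies (assumed MLP definition)."""
    counts = Counter(pitches)
    n = len(pitches)
    return sum((c / n) ** 2 for c in counts.values())


def accuracy(pitches, guesses):
    """Share of pitches where the model's guess matched the pitch thrown."""
    hits = sum(p == g for p, g in zip(pitches, guesses))
    return hits / len(pitches)


def z_score(acc, baseline, n):
    """Rough one-sample z statistic for accuracy against the baseline rate.

    Undefined when the baseline is exactly 0 or 1.
    """
    return (acc - baseline) / math.sqrt(baseline * (1 - baseline) / n)


# Placeholder data for illustration only (not real pitches or model output)
thrown = ["FB", "FB", "SL", "FB", "CH", "FB", "SL", "FB", "FB", "CH"]
guessed = ["FB", "FB", "SL", "FB", "FB", "FB", "FB", "FB", "FB", "CH"]

base = baseline_from_pitches(thrown)
acc = accuracy(thrown, guessed)
z = z_score(acc, base, len(thrown))
print(f"baseline {base:.1%}, model {acc:.1%}, z = {z:.2f}")
```

With only ten pitches the z statistic means very little; it is there to show where a proper test would go. In a real analysis the model should also be scored on pitches it was not fitted on, otherwise it will look better than it is.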

As I said, I'll tackle the models in the future.

1 comment:

Anonymous, March 9, 2009 at 8:19 PM
Good work again Max.

I'm guessing that the more specific you get, the lower most probabilities are likely to get. It will be interesting to see what effect predictability has on how a pitcher performs. I imagine sample sizes could be an issue here though.

Prof Pepper's Assistant

Friday, March 6, 2009

Predictability - A baseline for future analyses
