Friday, March 6, 2009

Predictability - A baseline for future analyses

Here are the most and least predictable pitchers... in theory.
What do I mean?
I set the conditions in my previous posts (I - II).
Every MLB pitcher I considered has his own repertoire and pitch mix.

The Minimum Level of Predictability (MLP) of a pitcher is the percentage of correct guesses that can be made on his pitch selection if he chooses his pitches without being influenced at all by the situation (opponent, score, outs, inning, men on, etc.).

High Theoretical Predictability
Pitcher            MLP     Repertoire
Tim Wakefield      99.7%   KN-FB-CB
Grant Balfour      86.2%   FB-SL-CH-CB
Jonathan Papelbon  78.7%   FB-SL-CH-SF
Neal Cotts         76.7%   FB-SL-CH-CB
Matt Thornton      76.4%   FB-SL-CT-CH
Brad Ziegler       75.2%   FB-CB-SL-CH
Dennis Sarfate     69.7%   FB-CB-SL
David Riske        69.5%   FB-CH-SL
Joe Beimel         69.4%   FB-CB-CH-SL
Daniel Cabrera     68.9%   FB-SL-CH-CB


Low Theoretical Predictability
Pitcher            MLP     Repertoire
Andy Sonnanstine   24.1%   FB-CT-CB-SL-CH
R.A. Dickey        24.2%   KN-FB-CB-CH-SL-SF
Jorge Campillo     26.9%   FB-SL-CH-CB-CT
Shaun Marcum       27.0%   FB-SL-CH-CB-CT-SF
Bronson Arroyo     27.2%   FB-CB-CH-SL-CT
Carlos Villanueva  27.2%   FB-SL-CH-CB
Mike Mussina       28.1%   FB-CB-SL-CH-SF
Doug Davis         28.7%   CT-FB-CB-CH-SL
Lance Cormier      29.2%   CB-SL-FB-CT-CH
Jered Weaver       29.7%   FB-CH-SL-CT-CB

Some observations
The pitch repertoires are those reported by MLBAM Gameday (not necessarily correct).

I showed the top-10 and bottom-10 because a lot of people like to see ranking tables, but I think in this case they don't make a lot of sense. Anyway, a quick look at them can't hurt.

First of all, my dataset has Wakefield throwing nearly 100% knuckleballs (thus yielding the highest predictability value). I don't know whether I did something wrong in importing the pitchf/x data or the MLBAM classification algorithm marks every offering by Tim as a knuckler. Anyway, Fangraphs (BIS data) has his split at 81% KN, 13% FB, 6% CB, and ESPN Insider at 82% KN, 13% FB, 5% CB (using these sources I would get predictability values of 67.7% and 69.2%, respectively).
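
The two alternative values quoted above can be reproduced by summing the squared usage shares of each pitch type, which is the expected hit rate of a guesser who calls each pitch as often as the pitcher throws it. That reading is inferred from the numbers in this post rather than taken from the definition in the previous posts, so treat the function below (its name is mine) as a sketch under that assumption.

```python
def minimum_level_of_predictability(mix):
    """Expected share of correct guesses for a situation-independent pitch mix.

    Assumption (inferred, not the author's stated formula): the value is the
    sum of squared usage shares. `mix` maps a pitch label to its usage share,
    expressed as fractions that sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("usage shares should sum to 1")
    return sum(share ** 2 for share in mix.values())


# Wakefield's splits as quoted in the post.
fangraphs = {"KN": 0.81, "FB": 0.13, "CB": 0.06}
espn = {"KN": 0.82, "FB": 0.13, "CB": 0.05}

print(f"{minimum_level_of_predictability(fangraphs):.1%}")  # 67.7%
print(f"{minimum_level_of_predictability(espn):.1%}")       # 69.2%
```

Under the same assumption, a pitcher who throws a single pitch all the time scores 100%, and one who splits evenly among n pitches scores 1/n.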

Wakefield leads to another point: knuckleballs have a very high degree of variability, so knowing that the pitch coming at you is called a knuckleball won't tell you anything about the path the ball will travel.

You can make a similar case for Mariano's cutter (Mo is outside the top-ten list): it's not really a single pitch, because he varies the position of his fingers to obtain different cuts.

Analyzing how predictable Wakefield or Rivera are in their selection is pretty useless, as it is for guys like Balfour who throw (nearly) nothing but fastballs: hitters already know what is gonna come at them - Balfour, Wakefield and Mo don't live on surprising opponents.

Further analyses on this subject can be more useful if done on pitchers with, say, three clearly distinguishable pitches (e.g. fastball, slider, changeup), none thrown more than 3/4 of the time, and that's what I'll try to do in the future.

Now you might ask what all this effort to get theoretical predictabilities is for.

I'd like to look at individual pitchers' predictabilities using advanced statistical techniques, such as multinomial logistic regression. I would like to build models to help me predict what a pitcher will throw in a specific situation (e.g. against a right-hander, late in a close game, with a runner on first, nobody out and following a slider), but I needed to set a minimum usefulness for the models: if a model I build performs worse than the Minimum Level of Predictability, then it is useless.
On the other hand, if a model for a pitcher significantly outperforms the Minimum Level of Predictability, then maybe the pitcher is falling into patterns, and this fact could be exploited by the opposition.
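
The usefulness test described above can be sketched as a comparison between a model's hit rate on pitches it has not seen and the pitcher's baseline. Everything in the block is hypothetical: the function name, the toy pitch sequences and the 0.38 baseline are made up for illustration, and the z-score uses a normal approximation to the binomial, which is my choice of significance check rather than something stated in the post.

```python
import math


def compare_to_baseline(predicted, actual, baseline):
    """Return (hit_rate, z) for a model's guesses against a baseline rate.

    z is the one-sample z-score of the hit rate under the null hypothesis
    that each guess is correct with probability `baseline`.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if not 0.0 < baseline < 1.0:
        raise ValueError("baseline must be strictly between 0 and 1")
    n = len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    hit_rate = hits / n
    std_error = math.sqrt(baseline * (1 - baseline) / n)
    z = (hit_rate - baseline) / std_error
    return hit_rate, z


# Toy data, made up for illustration only (far too few pitches for a real test).
actual = ["FB", "FB", "SL", "CH", "FB", "SL", "FB", "CH", "FB", "SL"]
predicted = ["FB", "FB", "SL", "FB", "FB", "SL", "FB", "CH", "SL", "SL"]

# 0.38 stands in for the pitcher's Minimum Level of Predictability.
hit_rate, z = compare_to_baseline(predicted, actual, baseline=0.38)
print(f"hit rate {hit_rate:.0%}, z = {z:.2f}")
```

A hit rate at or below the baseline means the model adds nothing to knowing the pitch mix, while a clearly positive z on a large enough sample would hint that the pitcher is falling into patterns.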

As I said, I'll tackle the models in the future.

1 comment:

  1. Good work again Max.

    I'm guessing that the more specific you get the lower most probabilities are likely to get. It will be interesting to see what the effect of predictability has on how a pitcher performs. I imagine sample sizes could be an issue here though.
