Prof Pepper's Assistant: Arm 2.0

A few years ago I did some work on right fielders' arms on my Italian website. I won't go into the details of that work because it was very similar to what John Walsh did at THT. In fact, my first article and his first article saw the light on the very same day.
Conceptually our works were very similar: both used the run expectancy matrix to measure the runs an outfielder prevented (or allowed) with his arm relative to league average.
The main difference between our analyses was that I focused my research on a very limited subset of plays. I considered only singles fielded by right fielders with a man on first and second base empty. This choice, while leaving out a lot of opportunities in which an outfielder's arm is tested, was intended to compare the RFs on equal opportunities - I assumed that singles to right are very similar to one another (i.e. they are all fielded in a small area of the field), while that's not true of doubles (which may go into the gap or down the line) or fly outs (which can be made close to the infield or against the wall).
All the work was done using Retrosheet data.

Enter Gameday data.
In the following lines I'll try to see whether my assumption - that singles to right are created (nearly) equal (as far as "arm testing" goes) - is true, and whether the evaluation of outfielders' arms needs to (and can) be corrected using batted ball location data.

Again, I take every base/out situation (in my restricted example it's just 0, 1 or 2 outs with a man on 1st or men on 1st and 3rd) and calculate the runs prevented (allowed) on a single by a right fielder with an average arm.
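As a rough illustration of the bookkeeping described here, the sketch below values each play as the runs scored plus the run expectancy of the resulting base/out state, and credits a fielder with the difference between the league-average value for that starting situation and what actually happened. The run expectancy numbers, the function names and the toy plays are all placeholders of mine, not the figures or code used for this post.

```python
from collections import defaultdict

# Run expectancy by (outs, runners); runners is a 3-character string for 1B/2B/3B.
# PLACEHOLDER values for illustration only, not real league figures.
RUN_EXP = {
    (0, "12-"): 1.50,
    (0, "1-3"): 1.80,
    (1, "1--"): 0.55,
}

def end_value(outs, runners, runs_scored=0):
    """Runs scored on the play plus run expectancy of the resulting state."""
    if outs >= 3:
        return float(runs_scored)  # inning over, nothing more is expected
    return runs_scored + RUN_EXP[(outs, runners)]

def league_average(plays):
    """Mean end-of-play value for each starting situation.

    Each play is a tuple (fielder, start_situation, end_value).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for _, start, value in plays:
        totals[start][0] += value
        totals[start][1] += 1
    return {start: total / n for start, (total, n) in totals.items()}

def arm_runs(plays, fielder, avg):
    """Runs prevented (+) or allowed (-) relative to the average arm."""
    return sum(avg[start] - value for f, start, value in plays if f == fielder)

# Toy example: singles to right with a man on first and nobody out.
start = (0, "1--")
plays = [
    ("A", start, end_value(0, "12-")),  # runner held at second
    ("A", start, end_value(1, "1--")),  # runner thrown out at third
    ("B", start, end_value(0, "1-3")),  # runner takes third
    ("B", start, end_value(0, "1-3")),  # runner takes third
]
avg = league_average(plays)
```

With these made-up numbers fielder A comes out above average and fielder B below by the same amount, since the ratings are measured against the average of the very same plays.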
This time I add the location information. Ideally I want to have the average run value at every possible location where a right fielder can collect a single.
Obviously there can't be enough observations to cover every single point on the field, so I used loess smoothing.
Very quickly on loess smoothing: you estimate the (run) value of every point on the field using the run values of nearby points that have observations - nearer points carry more weight in the estimate.
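To make the idea concrete, here is a deliberately simplified sketch: a locally weighted average with tricube weights over the nearest observations. Real loess fits a local regression rather than a plain weighted mean, so this is only the degree-0 version of the technique; the function names and the span parameter are my own assumptions, not the settings used for the charts in this post.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at u >= 1."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def smooth_value(x, y, observations, span=0.5):
    """Locally weighted mean of the run values observed around (x, y).

    observations: non-empty list of (x, y, run_value) tuples.
    span: fraction of the observations used as the neighbourhood
          (an assumed tuning parameter).
    """
    dists = sorted(
        (math.hypot(ox - x, oy - y), value) for ox, oy, value in observations
    )
    k = max(2, math.ceil(span * len(dists)))
    nearest = dists[:k]
    # Bandwidth is the distance to the farthest neighbour we keep.
    bandwidth = nearest[-1][0] or 1.0
    weights = [tricube(d / bandwidth) for d, _ in nearest]
    total = sum(weights)
    if total == 0:
        # Every neighbour sits exactly on the bandwidth edge: plain mean.
        return sum(v for _, v in nearest) / len(nearest)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total
```

Calling smooth_value on a grid of field coordinates would give a surface like the ones charted in this post; a larger span gives a smoother surface that is less sensitive to single observations.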

The chart above shows how much the expected run value of a single with a man on first is increased (lighter blue) or decreased (darker blue) due to where the ball is collected by the right fielder.
It makes some sense. Maybe I was expecting something like the following (the greater the distance from home plate - and from third - the more likely the runners are to advance extra bases):

But looking back at the previous chart, I think I can give a reasonable explanation for the higher run value of the areas close to the infield: while the throwing distance is short when you field a ball in those zones, it's likely that you had to run many more yards to get there - and the runners were circling the bases in the meantime.

OK, so for every outfielder I adjust his expected run value according to where he fielded the singles hit to him.
Here's a table containing both unadjusted and adjusted arm run values for the right fielders with at least 30 chances (singles fielded with man on first and second base open) in 2008.

Player	N	arm	adj_arm	adjustment
Abreu Bobby	64	0.52	0.70	0.18
Church Ryan	40	0.10	0.39	0.29
Drew J.D.	46	1.12	0.47	-0.65
Dye Jermaine	51	-0.05	-0.22	-0.17
Ethier Andre	42	-0.42	-0.43	-0.02
Francoeur Jeff	43	-0.51	-0.56	-0.05
Fukudome Kosuke	45	-0.87	-0.90	-0.03
Giles Brian	53	1.23	1.24	0.00
Griffey Jr. Ken	31	0.29	0.25	-0.05
Gross Gabe	35	-1.01	-1.00	0.01
Guerrero Vladimir	32	0.85	0.79	-0.06
Guillen Jose	31	-0.68	-0.53	0.15
Gutierrez Franklin	50	-1.28	-1.59	-0.31
Hart Corey	53	0.66	-0.40	-1.06
Hawpe Brad	62	-0.02	-0.31	-0.29
Hermida Jeremy	60	-0.76	-0.74	0.02
Kearns Austin	34	-0.39	-0.52	-0.14
Ludwick Ryan	48	-0.52	-0.59	-0.06
Markakis Nick	60	0.53	0.35	-0.18
Nady Xavier	38	-0.06	-0.32	-0.26
Ordonez Magglio	63	0.95	0.88	-0.07
Pence Hunter	52	-2.30	-2.35	-0.05
Span Denard	32	-0.14	-0.01	0.12
Suzuki Ichiro	32	-1.67	-1.68	-0.00
Teahen Mark	33	1.81	1.68	-0.13
Upton Justin	43	1.59	1.59	-0.01
Winn Randy	43	1.40	0.99	-0.42
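The relationship between the three value columns can be sketched as follows. In the table the adjustment is adj_arm minus arm (up to rounding: for Abreu, 0.70 - 0.52 = 0.18). The sketch assumes each play carries the league-average value for its base/out situation plus a location delta read off the smoothed surface; the names and the sign convention (positive means runs prevented) are my assumptions.

```python
def arm_values(plays):
    """Return (arm, adj_arm, adjustment) for one fielder.

    plays: list of (observed_value, situation_avg, location_delta) where
      observed_value  = run value of what actually happened,
      situation_avg   = league-average value for the base/out situation,
      location_delta  = how much the spot where the ball was fielded raises
                        (+) or lowers (-) the expected value (assumed input).
    """
    arm = sum(avg - obs for obs, avg, _ in plays)
    adj_arm = sum((avg + delta) - obs for obs, avg, delta in plays)
    return arm, adj_arm, adj_arm - arm
```

Under these assumptions the adjustment is simply the sum of the location deltas, so a fielder whose singles were collected in spots where runners rarely advance ends up with a negative adjustment.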

In many cases the adjustment is quite small, but there are some exceptions.
J.D. Drew, for example, sees a significant change when taking into account the batted ball locations (his rating drops from 1.12 to 0.47): as we see in the following graph, he fielded a couple of singles very far away from home.

Corey Hart has even more balls collected in unusual (for a single) places.

Maybe the adjustment I applied using loess smoothing is way too big, since a couple of outlying observations can change a fielder's rating a lot. Anyway, I think the work I've done shows that some correction is due when evaluating outfielders' arms. It's possible that the couple of "long singles" against Drew are the product of some unusual event (the batter slipping while rounding first, or not running because of an injury, or whatever else); without looking at batted ball location data, we assume that those hits are just like any other single and that JD has an average chance of holding the runner at second or gunning him down at third. Obviously, that's not the case and, if we don't trust the adjustments I proposed, we should at least consider dropping the two outliers.
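One quick way to see how much a couple of plays drive a fielder's adjustment is to recompute it without the most extreme locations. The sketch below assumes the adjustment is the sum of per-play location deltas; the function name, the choice of dropping the two largest deltas in absolute value, and the example numbers are mine and purely illustrative.

```python
def adjustment_without_outliers(location_deltas, n_drop=2):
    """Sum of the location deltas after dropping the n_drop largest in magnitude."""
    by_size = sorted(location_deltas, key=abs)
    kept = by_size[:max(0, len(by_size) - n_drop)]
    return sum(kept)

# Made-up deltas: three ordinary singles and two very unusual ones.
deltas = [0.01, -0.02, -0.30, -0.35, 0.03]
full = sum(deltas)
trimmed = adjustment_without_outliers(deltas)
```

If full and trimmed differ a lot, as they do with these made-up numbers, the rating is resting on very few plays.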

As I did three years ago using Retrosheet data, I considered a limited subset of plays that test an outfielder's arm. The subset was intended to consist of very similar batted balls, but we ended up seeing at least some unusual observations. If we want a complete evaluation of the arms, I'm sure the locations will have a bigger impact: as I said at the beginning of this post, doubles go in the gap or down the line, flies are caught while calling off an infielder or against the wall, and this makes a huge difference in the chances an outfielder has of holding/killing the runner(s).

PS: as I was writing this, Pete Jensen published his normalization of Gameday coordinates at THT. While applying his correction would certainly affect the values of RF arms, I believe that what I wrote in the final paragraphs holds.

3 comments:

Will Dwinnell, March 14, 2009 at 11:18 AM
This is an interesting analysis. Thanks!
John Walsh, April 2, 2009 at 4:36 PM
Ciao Max,

Great stuff! The idea of using just one situation (single to RF) has its advantages, as you note, but also its disadvantages: it reduces the overall sample size significantly. Including the other opps will increase your sample by a factor of 3 or more.

Maybe a guy was unlucky on singles with a runner on 1B, but perhaps he got lucky on singles with a guy on 2B, etc.

Of course, maybe Drew is disadvantaged because of the large RF in Fenway -- in that case, you'd need to apply a park correction -- which I finally got around to doing this year.

It's good to see you writing to the American public. Forse dovrei cominciare a scrivere qualcosa per i nostri fratelli italiani?? (Maybe I should start writing something for our Italian brothers??)

Ciao,

John
Max, April 3, 2009 at 12:03 PM
Ciao John,
good point on the sample size.
The issue is very evident for the couple of players I highlighted: for them, a few outlying observations have a lot of influence.

What I'd like to do is incorporate more events, like you do.
My plan would be splitting the batting outcomes in FB hits, FB outs, GB hits. I wouldn't separate singles from doubles, since the spatial information should account for the different opportunities to advance (together with the outcome and the base/out situation).

Does it seem reasonable?

Talking about our "fratelli italiani" (or should we say "Fratelli d'Italia"?), I would like you to write a guest column either on my website or on the widely known PlayIt USA.
Check your mailbox in the next few hours.

Prof Pepper's Assistant

Wednesday, March 11, 2009

