Monday, September 16, 2013

Using analytics to plan a (sports) trip

My latest foray into the analytics of sports-related data has been a little different from usual.
The goal was to plan a trip around Europe to watch hockey games.

The EuroHockey website has schedules for just about every hockey league in the world (yes, not only European ones), so I grabbed data on the matches in the top continental leagues scheduled between September and December.

Then I opened my friend R.

As a first pass, I calculated the distance between each pair of games scheduled on consecutive days and kept only the pairs within a reasonable travel time (so, sorry Admiral, I'm not gonna come to Vladivostok).
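The post did this step in R. Below is a minimal Python sketch of the same idea. It makes three assumptions that are not from the original: straight-line (great-circle) distance stands in for travel time, the 800 km cut-off is arbitrary, and the `Game` record with its fields is hypothetical.

```python
import math
from datetime import date, timedelta
from typing import NamedTuple


class Game(NamedTuple):
    """Hypothetical game record; the real schedule data may look different."""
    gid: str
    day: date
    city: str
    lat: float  # venue latitude, degrees
    lon: float  # venue longitude, degrees


def haversine_km(a: Game, b: Game) -> float:
    """Great-circle distance between two venues, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def consecutive_day_pairs(games, max_km=800.0):
    """All (earlier, later, km) pairs of games on consecutive days within max_km."""
    by_day = {}
    for g in games:
        by_day.setdefault(g.day, []).append(g)
    pairs = []
    for day, todays in by_day.items():
        for a in todays:
            # Only look at games scheduled exactly one day later.
            for b in by_day.get(day + timedelta(days=1), []):
                km = haversine_km(a, b)
                if km <= max_km:  # drop pairs that are too far apart
                    pairs.append((a, b, km))
    return pairs


# Hypothetical fixtures, not the actual schedule.
games = [
    Game("g1", date(2013, 12, 1), "Prague", 50.0755, 14.4378),
    Game("g2", date(2013, 12, 2), "Vienna", 48.2082, 16.3738),
    Game("g3", date(2013, 12, 2), "Vladivostok", 43.1155, 131.8855),
]
pairs = consecutive_day_pairs(games)
for a, b, km in pairs:
    print(f"{a.city} -> {b.city}: {km:.0f} km")
```

With these made-up fixtures, only the Prague to Vienna pair survives. Vladivostok is thousands of kilometres past the cut-off.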

Then I recursively combined the pairs to obtain longer trips and added some subjective scores to rank the resulting trips to my liking.
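Here is one way the recursive combination could work, sketched in Python. It is a depth-first search that keeps extending a chain with any game on the following day. Each pair links day d to day d+1, so chains cannot loop and the recursion always terminates. The `(from_game, to_game, km)` tuple layout and the `min_games=3` threshold are illustrative assumptions.

```python
from collections import defaultdict


def build_trips(pairs, min_games=3):
    """Chain consecutive-day pairs into every trip of at least min_games games.

    pairs -- (from_game, to_game, km) tuples, to_game being the day after from_game
    Returns a list of (games_in_order, total_km) tuples.
    """
    nxt = defaultdict(list)
    for a, b, km in pairs:
        nxt[a].append((b, km))

    trips = []

    def extend(path, total_km):
        if len(path) >= min_games:
            trips.append((list(path), total_km))
        for b, km in nxt.get(path[-1], []):
            path.append(b)
            extend(path, total_km + km)  # recurse one day further
            path.pop()                   # backtrack and try the next option

    for start in sorted({a for a, _, _ in pairs}):
        extend([start], 0.0)
    return trips


# Hypothetical pairs: game ids and distances are made up.
pairs = [("A", "B", 250.0), ("B", "C", 300.0), ("B", "D", 120.0), ("C", "E", 400.0)]
trips = build_trips(pairs)
for games, km in trips:
    print(" -> ".join(games), f"({km:.0f} km)")
```

This enumerates every trip exhaustively. That is fine for a few months of schedules, but the number of trips grows quickly as more games are added.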

The main parameters I used to rank my trips were:
  • Short travel distances between cities;
  • A preference for trips with games from multiple leagues;
  • A ranking of the leagues (e.g. priority to KHL games);
  • Starting and ending points of the trip close to home, if possible (so that I can get there just by train/bus).
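The criteria above can be combined into a single score. This is a minimal Python sketch of one way to do it, not the scoring actually used in the post. The weights, the league ranks (only KHL first and Extraliga are taken from the post) and the `score_trip` signature are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical league priorities (KHL first, as in the post); others default to 1.0.
LEAGUE_RANK = {"KHL": 3.0, "Extraliga": 2.0}


@dataclass(frozen=True)
class Weights:
    """Hypothetical, hand-tuned weights for each ranking criterion."""
    per_rank_point: float = 2.0       # reward for league priority
    per_league: float = 5.0           # reward for each distinct league
    per_100km_travel: float = 1.0     # penalty for travel between games
    per_100km_from_home: float = 2.0  # penalty for getting there and back


def score_trip(leagues, legs_km, home_km, w=Weights()):
    """Score a trip; higher is better.

    leagues -- league of each game, in order
    legs_km -- distance of each leg between consecutive games
    home_km -- (home -> first venue, last venue -> home) distances
    """
    rank = sum(LEAGUE_RANK.get(lg, 1.0) for lg in leagues)
    variety = len(set(leagues))
    return (w.per_rank_point * rank
            + w.per_league * variety
            - w.per_100km_travel * sum(legs_km) / 100
            - w.per_100km_from_home * sum(home_km) / 100)


mixed = score_trip(["KHL", "Extraliga"], [250.0], (100.0, 100.0))
single = score_trip(["Extraliga", "Extraliga"], [250.0], (100.0, 100.0))
print(mixed, single)
```

A weighted sum keeps the subjective trade-offs in one place, so changing your priorities only means changing the weights.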
After I let R do its job, some interesting solutions came up, but in the end I settled on the one mapped below.
Clicking on the icons shows the games (and make sure to zoom in on Prague, as I'll be there for both KHL and Extraliga games).


So, starting on December 1, I'll travel through 5 European countries to catch 8 games in 8 days, featuring 14 teams from 5 leagues in 7 different cities.

Maybe I'll blog about it. Or maybe not.

Tuesday, February 12, 2013

Baseball effect on hockey

In case you missed it, there is a website named Open Source Sports that hosts databases for various sports, including Lahman's database for baseball.

Since on this blog I have dealt with basketball, soccer and football in addition to baseball, it's time for a short post on hockey. (The fact that the Hockey DB is one of those currently available at OSS helps too.)

This post is based on a single query on the Master table, the one containing players' bio info.
It reports something that I believe is widely known, so consider it just a warm-up with the hockey database.

Percentage of left-shooting skaters by country

Slovakia       80
Sweden         78
Finland        76
Russia         75
Czech Republic 68
Canada         63
USA            55

The above numbers are for countries with at least 40 skaters in the database.
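For anyone who wants to reproduce the table, a single query of roughly this shape should do it. This is a sketch under assumptions. The column names `birthCountry`, `shootCatch` and `pos`, the `'G'` code for goalies, and the in-memory rows are assumptions or made up, so check them against the actual OSS Master table before running it there.

```python
import sqlite3

# Assumed column names; verify against the actual OSS hockey Master table.
QUERY = """
SELECT birthCountry,
       ROUND(100.0 * SUM(shootCatch = 'L') / COUNT(*)) AS pct_left,
       COUNT(*) AS skaters
FROM Master
WHERE pos IS NOT NULL AND pos <> 'G'      -- skaters only, no goalies
  AND shootCatch IN ('L', 'R')
GROUP BY birthCountry
HAVING COUNT(*) >= 40                     -- at least 40 skaters per country
ORDER BY pct_left DESC
"""

# Tiny synthetic table so the query can run; these rows are made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Master (playerID TEXT, pos TEXT, shootCatch TEXT, birthCountry TEXT)")
rows = ([(f"a{i}", "C", "L" if i < 40 else "R", "CountryA") for i in range(50)]
        + [(f"b{i}", "D", "R", "CountryB") for i in range(10)]   # too few skaters
        + [(f"g{i}", "G", "L", "CountryA") for i in range(5)])   # goalies, excluded
con.executemany("INSERT INTO Master VALUES (?, ?, ?, ?)", rows)
result = con.execute(QUERY).fetchall()
print(result)
```

In SQLite, `shootCatch = 'L'` evaluates to 1 or 0, so summing it counts the left-shooters.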

So, while among Europeans the percentage of right-handed shooters is slightly above the share of left-handed people in the population (which should be on the order of 15%), North Americans lean much more to the right.
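The comparison above is simple arithmetic: the right-shooting share is the complement of the left-shooting percentages in the table. This snippet uses the post's own numbers and the rough 15% left-handed figure.

```python
# Left-shooting percentages from the table above.
LEFT_PCT = {"Slovakia": 80, "Sweden": 78, "Finland": 76, "Russia": 75,
            "Czech Republic": 68, "Canada": 63, "USA": 55}
LEFTY_SHARE = 15  # rough share of left-handed people, as quoted in the post

right_pct = {c: 100 - p for c, p in LEFT_PCT.items()}       # right-shooters
excess = {c: r - LEFTY_SHARE for c, r in right_pct.items()}  # points above 15%

for c in LEFT_PCT:
    print(f"{c:15} right-shooting {right_pct[c]:2}%, {excess[c]:2} points above the lefty share")
```

The European excess runs from 5 points (Slovakia) to 17 (Czech Republic). Canada and the USA sit at 22 and 30 points.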

One likely explanation is that Americans grow up playing baseball, where a right-handed batter stands on the same side as a right-shooting hockey player. The USA being more extreme than Canada would support this.

An American friend of mine who coached Team Sweden (baseball) said everyone seemed to bat left-handed over there, which would support the same case from the other direction.

Wednesday, January 16, 2013

Icing the kicker?

What do I do when there's no baseball around?
Sometimes I do what Rogers Hornsby did and just stare out the window, waiting for spring. Other times I watch hockey (KHL so far this year) or even football.

Last weekend I happened to watch the Seahawks @ Falcons game and couldn't help noticing the timeout called by Seattle's head coach just before Atlanta kicked the decisive field goal.

Once I understood that it was done just to disrupt the kicker's concentration (I'm not a football expert), I decided to take a statistical look at the issue (stats being something I know better).

The data

I found the following sources for play-by-play NFL data, both going back to 2002:
  • http://www.advancednflstats.com/2010/04/play-by-play-data.html
  • http://www.armchairanalysis.com/nfl-play-by-play-data.php

but neither has explicit information on when timeouts were called.
However, the play-by-play data at the latter link is already partly parsed and readier to use, so I went with that.

Preparation

To identify when the defensive team called a timeout before a field goal, I compared the timeouts remaining on each field goal play with those remaining on the play immediately preceding it. When there was a difference, I classified the action as "icing the kicker". Note that in some instances the difference in timeouts left might have been due to a lost challenge on the previous play.
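The detection rule can be sketched in a few lines of Python. The `Play` record, its field names and the `"field_goal"` label are hypothetical simplifications; the actual play-by-play files use their own columns and codes. The check looks for a drop in the count, not just any difference. That choice is an assumption: timeouts only decrease within a half.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """Hypothetical, simplified play-by-play row (real data has many more columns)."""
    game_id: int
    play_type: str          # assumed labels, e.g. "field_goal", "run", "pass"
    def_timeouts_left: int  # timeouts left for the kicking team's opponent


def flag_icing(plays):
    """Return the field-goal plays on which the defense had fewer timeouts
    left than on the immediately preceding play of the same game.

    As the post notes, a lost challenge on the previous play would also
    show up here as a spurious "icing"."""
    iced = []
    for prev, cur in zip(plays, plays[1:]):
        if (cur.play_type == "field_goal"
                and cur.game_id == prev.game_id
                and cur.def_timeouts_left < prev.def_timeouts_left):
            iced.append(cur)
    return iced


plays = [
    Play(1, "pass", 3),
    Play(1, "field_goal", 2),  # one timeout fewer than before: flagged
    Play(1, "run", 2),
    Play(1, "field_goal", 2),  # unchanged: not flagged
    Play(2, "field_goal", 3),  # first play of a new game: not compared
]
iced = flag_icing(plays)
print(len(iced), "iced attempt(s)")
```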

I wanted to use the stadium as one of the predictors of field goal success, but the data I used named stadiums in many different ways (typos included), so I used the home-team/season combination instead. Note that this treats games played at Wembley no differently from those played at home by the Dolphins, the Saints, the Buccos or the 49ers.

Variables tested

I threw the following variables into my model (for those interested, a multilevel multivariable logistic regression):
  • the identity of the kicker;
  • distance (modeled linearly: it's a lazy choice, but not completely off the charts);
  • wind speed (no direction, as I would have needed to know the orientation of the field);
  • temperature;
  • being at home (for the kicker);
  • the "icing the kicker" dummy variable.
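One plausible way to write this model out is shown below. The post does not say which terms were random and which were fixed. This version assumes random intercepts for the kicker ($u_{k[i]}$) and for the home-team/season combination standing in for the stadium ($v_{s[i]}$), with the remaining variables entered linearly. Here $y_i = 1$ if field goal attempt $i$ is good.

```latex
\operatorname{logit} \Pr(y_i = 1) = \beta_0
  + \beta_1\,\mathrm{dist}_i
  + \beta_2\,\mathrm{wind}_i
  + \beta_3\,\mathrm{temp}_i
  + \beta_4\,\mathrm{home}_i
  + \beta_5\,\mathrm{iced}_i
  + u_{k[i]} + v_{s[i]},
\qquad
u_k \sim \mathcal{N}(0, \sigma_u^2), \quad
v_s \sim \mathcal{N}(0, \sigma_v^2)
```

Under this form, the kicker rankings come from the estimated $u_k$.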


Results

Here's what I got. 
  • A 13% reduction in success for every 5 additional yards of distance;
  • A 3% reduction in success for every additional 5 mph of wind;
  • Around a 1% increase in success for every 5 degrees (F) of temperature;
  • No effect for being at home.
All of the above seem to make sense. Here are also the best and worst kickers according to the model.

Best:
  1. Stover, Matt
  2. Gould, Robbie
  3. Kasay, John
  4. Akers, David
  5. Graham, Shayne
Worst:
  1. Peterson, Todd
  2. Hall, John
  3. Christie, Steve
  4. Gramatica, Martin
  5. Tynes, Lawrence
And here I need the help of knowledgeable NFL fans to tell me whether the two lists pass the sniff test (though, as far as I know, the first name seems OK).

The "icing"

Finally, what about the "icing the kicker"?
Though the point estimate hints at a possible effect (-3%), the uncertainty is fairly large (from -8% to +1%).
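The reasoning here is that an interval which includes zero does not rule out "no effect". This Python sketch shows how such an interval can be built. It takes a log-odds coefficient and its standard error, forms a Wald interval on that scale, and converts each end into a change in success probability at a baseline rate. The coefficient, standard error and 85% baseline are made-up inputs, not the fitted values from the post.

```python
import math


def logistic(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def effect_interval(beta, se, baseline_p, z=1.96):
    """Change in success probability implied by a log-odds coefficient,
    with a Wald interval computed on the log-odds scale.

    Returns (lower, point, upper) as probability differences."""
    base_logit = math.log(baseline_p / (1.0 - baseline_p))
    lower, point, upper = (logistic(base_logit + b) - baseline_p
                           for b in (beta - z * se, beta, beta + z * se))
    return lower, point, upper


# Hypothetical inputs, NOT the fitted values from the post.
low, point, high = effect_interval(beta=-0.2, se=0.15, baseline_p=0.85)
print(f"{point:+.1%} ({low:+.1%} to {high:+.1%})")
print("interval covers zero:", low <= 0.0 <= high)
```

The interval is symmetric on the log-odds scale but not on the probability scale. That is one reason an interval like "-8% to +1%" need not be centred on its point estimate.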

For now I would dismiss the idea that icing has any influence on the outcome, though further analysis might be in order to look at its effect on particular kickers.

But, hey, the baseball season is approaching, so maybe someone else should look at this...