Over the last season or so, I’ve been considering the impact of statistics on the sport of soccer, and especially how we might use the numbers to characterize Major League Soccer and its teams. If you’ve read this blog for a while, you may remember a post from last year that talked about SABR and the way that baseball uses statistical probabilities to predict future behavior. There has even been a full-length motion picture devoted to Billy Beane, the general manager who popularized the Moneyball approach to that sport.
I’m still wrapping my head around how best to use the pure numbers, but recently I thought I might try using data to model something most blogs tackle: Power Rankings. Our sister site, EPL Talk, has had an Alternative Premier League table, which has some similarities to this. I’ve also traded emails with my colleague Matt Hackenmiller about these types of issues. Most of what has been done uses weighted systems that focus solely on parsing the results.
As far as Power Rankings go, just about every sports site or blog (even Major League Soccer itself) has one. The challenge I gave myself was to develop a system that would shed a different light on the teams, and hopefully give our readers a representation that more closely fits the landscape of our top division. I’m not sure how other sites develop their rankings. Opinion can lead to bias, whether based on past performance or personal loyalties. While we as human beings have the best intentions for objectivity, we can’t get it right every time. Raw data carries much less bias. Now, bias can still rear its ugly head through the manipulation and interpretation of that data, and I will admit upfront that these rankings are produced through my own decisions about what makes a team powerful. But I am not using any results from 2011 or before to influence this data. It is based solely on 2012 match results and statistics compiled from the MLSSoccer.com website.
Next I’ll try to give a little broad insight into the way I’m maneuvering through the data to come up with a ranking. For starters, a sizable weight in these rankings is based on pure results. I have modified the standard 3-1-0 point system to further differentiate results. Take two separate victories at the Home Depot Center: Real Salt Lake’s away victory over the Galaxy carries more weight than Beckham & Co.’s home gashing of D.C. United. Additionally, since some teams have played fewer matches, the results component is calculated per match played. The remainder of the ranked score is statistical in nature. I think of the statistical portion as adding color to the results, rewarding the teams that tend to play the better football. This statistical component factors in each team’s shooting and passing numbers relative to its opponent’s in each match.
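For readers who think better in code, here is a rough sketch of how a score along these lines could be put together. To be clear, this is not my actual formula: the point values, the 70/30 split between results and statistics, and every name in it (RESULT_POINTS, Match, power_score, and so on) are made-up assumptions chosen only to illustrate the idea of venue-adjusted results, per-match averaging, and a shooting and passing comparison against the opponent.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative point values only. The real system modifies
# 3-1-0 so that away results count for more, but its numbers are not published.
RESULT_POINTS = {
    ("away", "win"): 3.5, ("home", "win"): 3.0,
    ("away", "draw"): 1.25, ("home", "draw"): 1.0,
    ("away", "loss"): 0.0, ("home", "loss"): 0.0,
}
MAX_POINTS = max(RESULT_POINTS.values())

# ASSUMPTION: hypothetical split between the results and statistical components.
RESULTS_WEIGHT = 0.7
STATS_WEIGHT = 0.3


@dataclass
class Match:
    venue: str            # "home" or "away"
    goals_for: int
    goals_against: int
    shots_for: int
    shots_against: int
    passes_for: int
    passes_against: int


def outcome(match: Match) -> str:
    """Classify a match as a win, draw, or loss from the goal totals."""
    if match.goals_for > match.goals_against:
        return "win"
    if match.goals_for == match.goals_against:
        return "draw"
    return "loss"


def share(own: int, opponent: int) -> float:
    """Fraction of a stat belonging to the team; 0.5 if neither side has any."""
    total = own + opponent
    return own / total if total else 0.5


def results_score(matches: list[Match]) -> float:
    """Average venue-adjusted points per match, scaled to the range 0..1."""
    points = sum(RESULT_POINTS[(m.venue, outcome(m))] for m in matches)
    return points / (len(matches) * MAX_POINTS)


def stats_score(matches: list[Match]) -> float:
    """Average of shot share and pass share versus the opponent, per match."""
    per_match = [
        (share(m.shots_for, m.shots_against)
         + share(m.passes_for, m.passes_against)) / 2
        for m in matches
    ]
    return sum(per_match) / len(per_match)


def power_score(matches: list[Match]) -> float:
    """Blend the results and statistical components into a single score."""
    if not matches:
        raise ValueError("a team needs at least one match to be ranked")
    return (RESULTS_WEIGHT * results_score(matches)
            + STATS_WEIGHT * stats_score(matches))
```

Because both components are averaged per match, a team with two matches played can be compared directly with a team that has played three. Sorting the teams by power_score from highest to lowest would then produce the ranking.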
There may be better methods to do this, and this algorithm will probably evolve as I decide that other factors demonstrate a level of dominance or weakness in the league. While I’m not going to go into full detail, I will disclose any changes I make to the method. I may describe more of this process as the season progresses, but here is my first publication of the MajorLeagueSoccerTalk.Com Statistical Power Rankings.
I’ve put our MLSTalk rankings on the left, and placed the single-table point totals on the right. The only reason I use a single table is to make comparison across the entire league easier.
Sporting Kansas City has dominated the first three weeks, and this calculation continues to support that conclusion. But one thing these rankings could help clarify is the current logjam at 6 points. They show that a) Vancouver’s performances have not been all that impressive statistically, and thus their ascension to 2nd in table points may be misleading, and b) Seattle and Houston are the strongest of the teams at 6 points. In fact, Houston’s ranking was helped by dominating possession in their lone loss at Seattle.
A portion of the disparity between the two systems can be attributed to some teams having played fewer games thus far. Seattle, Chicago, and Los Angeles have played only 2 matches, and the per-match nature of the MLSTalk rankings more fairly reflects their point totals thus far.
One thing that I will certainly consider as the season wears on is a strength-of-schedule component. There has been a lot of interconference play to start the 2012 season, and thus the unbalanced schedule’s effects will probably not be felt until summer approaches.
So what do you think? Does this seem an accurate reflection of the first three weeks? If you have suggestions on what might move this discussion further along, please leave a comment. Or if you have some ideas for what kind of theoretical questions might be answered through stats (e.g., who is the most direct team in MLS), throw them in here. It will be interesting to see the progression as teams like Red Bull New York (RNY in the chart) try to build off recent results to climb the standings, while others like the Chicago Fire and San Jose Earthquakes do their best to maintain their opening successes.