MLS Power Rankings: A Better Statistical Approach

Over the last season or so, I’ve been considering the impact of statistics on the sport of soccer, and especially how we might use the numbers to characterize Major League Soccer and its teams. If you’ve read this blog for a while, you may remember a post last year that talked about SABR and the way baseball uses statistical probabilities to predict future behavior. There has even been a full-length motion picture devoted to Billy Beane, the man who developed the Moneyball approach to that sport.

I’m still wrapping my head around how best to utilize the pure numbers, but recently I thought I might try to use data to model something most blogs tackle: Power Rankings. Our sister site, EPL Talk, has had an Alternative Premier League table, which has some similarities to this. I’ve also traded emails with my colleague Matt Hackenmiller about these types of issues. Most of what has been done uses weighted systems that focus solely on parsing the results.

And as far as Power Rankings go, just about every sports site or blog (even Major League Soccer itself) has one. The challenge I gave myself was to develop a system that would shed a different light on the teams, and hopefully give our readers a representation that more closely fits the landscape of our top division. I’m not sure how other sites develop their rankings. Opinion can lead to bias, whether based on past performance or personal loyalties. While we as human beings have the best intentions for objectivity, we can’t get it right every time. Much less bias is found in pure raw data. Now, bias can still rear its ugly head through the manipulation and interpretation of that data, and I will admit upfront that these rankings are produced through my own decisions about what makes a team powerful. But I am not using any results from 2011 or before to influence this data; it is based on 2012 match results and statistics compiled from the website.

Next I’ll try to give a little broad insight into the way I’m maneuvering through the data to come up with a ranking. For starters, a sizable weight in these rankings is based on pure results. I have modified the standard 3-1-0 point system to further differentiate results. Take two separate victories at the Home Depot Center: Real Salt Lake’s away victory over the Galaxy carries more weight than Beckham & Co.’s home gashing of D.C. United. Additionally, since some teams have played fewer matches, the results component is calculated per match played. The remainder of the ranked score is statistical in nature. I think of the statistical portion as adding color to the results, rewarding the teams that tend to play the better football. This statistical component compares each team’s shooting and passing numbers with its opponent’s in each match.
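For readers who like to see this kind of thing written down, here is a minimal Python sketch of how a score along these lines could be put together. It is not the actual formula behind these rankings. The point values in `RESULT_POINTS`, the 70/30 split in `RESULTS_WEIGHT`, the `Match` fields, and the use of simple shot and pass shares are all assumptions made for illustration, and the match data at the bottom is invented.

```python
from dataclasses import dataclass

# Hypothetical point values: an away result is worth a bit more than the
# same result at home. The post does not publish its real numbers.
RESULT_POINTS = {
    ("away", "win"): 3.5,
    ("home", "win"): 3.0,
    ("away", "draw"): 1.25,
    ("home", "draw"): 1.0,
    ("away", "loss"): 0.0,
    ("home", "loss"): 0.0,
}

# Assumed share of the final score that comes from pure results.
RESULTS_WEIGHT = 0.7


@dataclass
class Match:
    venue: str           # "home" or "away"
    goals_for: int
    goals_against: int
    shots_for: int
    shots_against: int
    passes_for: int
    passes_against: int


def outcome(match):
    """Return "win", "draw" or "loss" from the team's point of view."""
    if match.goals_for > match.goals_against:
        return "win"
    if match.goals_for == match.goals_against:
        return "draw"
    return "loss"


def share(ours, theirs):
    """Fraction of a stat that belongs to the team (0.5 if nothing was recorded)."""
    total = ours + theirs
    return ours / total if total else 0.5


def results_score(matches):
    """Modified points per match played, scaled to the range 0..1."""
    best = max(RESULT_POINTS.values())
    earned = sum(RESULT_POINTS[(m.venue, outcome(m))] for m in matches)
    return earned / (len(matches) * best)


def stats_score(matches):
    """Average of shot share and pass share against each opponent."""
    per_match = [
        (share(m.shots_for, m.shots_against)
         + share(m.passes_for, m.passes_against)) / 2
        for m in matches
    ]
    return sum(per_match) / len(per_match)


def power_score(matches):
    """Blend the results and statistical components into one number."""
    if not matches:
        raise ValueError("a team needs at least one match to be ranked")
    return (RESULTS_WEIGHT * results_score(matches)
            + (1 - RESULTS_WEIGHT) * stats_score(matches))


# Invented example: one away win and one home draw.
example = [
    Match("away", 2, 1, shots_for=12, shots_against=9,
          passes_for=410, passes_against=380),
    Match("home", 0, 0, shots_for=8, shots_against=8,
          passes_for=350, passes_against=350),
]
print(round(power_score(example), 3))
```

Because both components are averaged over matches played, a team with two games is scored on the same footing as a team with three.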

There may be better methods to do this, and this algorithm will probably evolve as I decide that other factors demonstrate a level of dominance or weakness in the league. While I’m not going to go into full detail, if I make changes, I will disclose where the method has adapted. I may describe more of this process as the season progresses, but here is my first publication of the MajorLeagueSoccerTalk.Com Statistical Power Rankings.

I’ve put our MLSTalk rankings on the left, and placed the single table point totals on the right. The only reason I use a single table is to make a better comparison for the entire league.

Sporting Kansas City has dominated the first three weeks, and this calculation continues to support that conclusion. But one thing that this could help to clarify is the current logjam at 6 points. These rankings show that a) Vancouver’s performances have not been all that impressive statistically, and thus their ascension to 2nd in table points may be misleading, and b) Seattle and Houston are the strongest of those teams at 6 points. In fact, Houston was helped by dominating possession in their lone loss at Seattle.

A portion of the disparity between the two systems can be attributed to teams playing fewer games thus far. Seattle, Chicago, and Los Angeles have played only 2 matches, and the per-game nature of the MLSTalk rankings accurately reflects their point totals thus far.

One thing that I will certainly consider as the season wears on is a strength of schedule component. There has been a lot of interconference play to start the 2012 season, and thus the unbalanced schedule’s effects will probably not be felt until summer approaches.
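One simple way such a component might work, purely as a sketch, is to average the points per game of the opponents a team has already faced. Nothing like this is part of the current rankings. The function names, the shape of the `table` dictionary, and the teams "A", "B" and "C" are all hypothetical.

```python
def points_per_game(points, played):
    """Table points divided by matches played (0.0 before the first match)."""
    return points / played if played else 0.0


def strength_of_schedule(opponents, table):
    """Average points per game of the opponents faced so far.

    `table` maps a team name to a (points, matches_played) pair.
    A higher value suggests a tougher run of opponents.
    """
    if not opponents:
        return 0.0
    total = sum(points_per_game(*table[name]) for name in opponents)
    return total / len(opponents)


# Invented table: team -> (points, matches played).
table = {"A": (6, 2), "B": (3, 3), "C": (0, 2)}
print(strength_of_schedule(["A", "B"], table))
```

A value like this could then nudge the per-match results score up or down, though how much weight it deserves is an open question this early in the season.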

So what do you think? Does this seem an accurate reflection of the first three weeks? If you have suggestions on what you think might move this discussion further along, please leave a comment. Or if you have some ideas for what kind of theoretical questions might be answered through stats (e.g., who is the most direct team in MLS), throw them in here. It will be interesting to see the progression as teams like Red Bull New York (RNY in the chart) try to build off recent results to climb the standings, while others like the Chicago Fire and San Jose Earthquakes do their best to maintain their opening successes.

6 thoughts on “MLS Power Rankings: A Better Statistical Approach”

  1. Looks good to me, although Dallas is definitely too low, but that’s
    a fluke of being so early in the season. To me, Power Rankings are
    a measure of “who would you least want to play this week?” And as
    that depends on form (players getting healthy, trades, returning
    call-ups, teams just getting better), as the season progresses, I
    feel like there should be a fall-off in the importance of
    stats/results from earlier games. Maybe a 7-game window or
    something. Because let’s be honest, when you go to calculate the
    Power Rankings in August, none of the games so far should really
    have any meaning. If overall season ranking is what you want, just
    go look at the table.

  2. I like where you are going with this… I would say for now, some
    of the teams’ positions on the list don’t pass the ‘eyeball’ test;
    however, I look forward to seeing this in a couple of weeks when
    the sample size is increased. Finally, I wonder if you do or will
    weight for the unbalanced schedule? Would be tough to do, I’m
    sure… however, if it is results-oriented, it may bias the Eastern
    teams a bit as they are perceived to have the weaker schedules.

  3. I like the idea here, but I’m curious what statistics you are using
    that factor into your number. For one thing, there is no direct
    relationship between possession or number of passes completed and
    being the dominant team. I quote Jonathan Wilson from his recent
    piece in the Guardian on the trickiness of stats and Sunderland:
    “As Barcelona pass teams into submission on a regular basis, it has
    become common to look at pass-completion rates and nod approvingly
    as they stretch beyond 90%. Yet Zambia won the African Cup of
    Nations in February with the lowest pass-completion rate of any of
    the 16 sides in the tournament. What is even more baffling is that,
    to the naked eye, they appeared a cohesive side who used the ball
    well. The issue was that they got the ball forward quickly, looking
    for raking forward passes out of defence that, being high-risk,
    often went astray. If they did find a man, though, a smart
    interchange between the astute Christopher Katongo, the rapid
    Rainford Kalaba and the intelligent Emmanuel Mayuka was often
    enough to undo opponents.”
    The same can be said for shots on goal. The team that takes more
    shots isn’t necessarily taking good shots. Soccer is a very hard
    game to quantify on a statistical basis. Perhaps it can be done,
    but not so far…
