Using ELO ratings for match result prediction in association football

I occasionally comment on a scientific paper that is of interest to me. This time, it was:

Lars Magnus Hvattum and Halvard Arntzen (2010) Using ELO ratings for match result prediction in association football, International Journal of Forecasting, 26(3), 460-470 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

ELO ratings were originally designed for chess. In brief, if you play an opponent who is a lot weaker than you and you lose, your rating will drop by much more than if you were beaten by a player who is only slightly weaker than you. Similarly, if you beat a player who is a lot stronger than you, your rating will increase by much more than if you beat a player who is only slightly stronger than you. It is usually (if not always) a zero-sum system, so whatever points you gain (or lose), the same number of points is lost (or won) by your opponent. There are a number of variations on how ratings are updated, but once you have a rating you can compare yourself to another player (or team) to decide who is the better player.
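To make the mechanics concrete, here is a minimal sketch of a basic, chess-style ELO update in Python. The K factor of 20 and the example ratings are my own illustrative assumptions; the paper and the football ratings site each use their own variants.

```python
# A minimal, chess-style Elo update. The K factor and example ratings are
# illustrative assumptions, not the values used in the paper.
def expected_score(rating_a, rating_b):
    """Expected score for side A against side B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=20):
    """Return updated ratings; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta   # zero sum: B loses what A gains

# A stronger team (1600) only draws with a weaker one (1400) and so loses points.
print(elo_update(1600, 1400, 0.5))
```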

The idea behind ELO ratings has since been applied to a number of sports. Relevant to this blog post, it has been applied to football. In fact, there is a web site that maintains a list of ELO ratings for world football. At the time of writing, the latest match played (on 15th Aug 2012) was a friendly between Albania and Moldova. The match finished 0-0; Moldova gained 5 points and Albania lost 5. Their ratings are now 1461 (Albania) and 1376 (Moldova). Because Albania were the higher-ranked team, they lost points when they could only manage a draw against a lower-ranked team. They also had home advantage, though I am not sure that this is a factor in this model. Looking at the web page, I don’t think so.

In the paper, which is the subject of this post, the authors define two ELO models to rate teams. The first uses a basic ELO calculation to work out ratings, and then uses the difference between the two teams' ratings as a covariate (a variable that could be a predictor of the outcome under investigation) in a regression model. This seems reasonable, as the ELO rating is a measure of the current strength of the team. The second ELO model additionally takes into account the margin of victory; that is, a 3-0 win counts for more than a 2-1 win.
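As a rough illustration of the margin-of-victory idea, the sketch below scales the K factor by the goal difference. The multipliers are illustrative (similar in spirit to those used by the world football ELO site); the exact adjustment in the paper's second model may well differ.

```python
# A rough margin-of-victory adjustment: scale the K factor by the goal
# difference. The multipliers are illustrative assumptions, not the exact
# scheme used in the paper.
def k_with_margin(base_k, goal_diff):
    if goal_diff <= 1:
        return base_k
    if goal_diff == 2:
        return base_k * 1.5
    return base_k * (1.75 + (goal_diff - 3) / 8.0)

def elo_update_with_margin(rating_a, rating_b, goals_a, goals_b, base_k=20):
    score_a = 1.0 if goals_a > goals_b else 0.5 if goals_a == goals_b else 0.0
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k_with_margin(base_k, abs(goals_a - goals_b)) * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# A 3-0 win moves the ratings more than a 2-1 win between the same teams.
print(elo_update_with_margin(1500, 1500, 3, 0))
print(elo_update_with_margin(1500, 1500, 2, 1))
```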

To compare the two ELO models, the authors define another six prediction methods. The first (which they call UNI) simply chooses an outcome at random. The second (FRQ) uses the observed frequencies of home wins, draws and away wins: for example, if two thirds of matches over a given period ended in a home win, then FRQ predicts a home win for two thirds of the matches it is asked about. The authors note that UNI and FRQ are quite naive prediction methods. Two other benchmarks are derived from a paper by John Goddard (Goddard J. (2005) Regression models for forecasting goals and match results in association football, International Journal of Forecasting, 21(2), 331-340 (doi)). The fifth comparative model is based on bookmakers' odds, taking the average of a set of odds from various bookmakers. The final model is almost the same as the fifth, but uses the maximum odds instead of the average.
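For the odds-based benchmarks, the essential step is turning decimal odds into outcome probabilities. Here is a hedged sketch; the odds are invented, and the paper's exact treatment of the bookmaker's margin (the over-round) may differ.

```python
# Turning decimal bookmaker odds into outcome probabilities: take
# reciprocals, then normalise away the bookmaker's margin (the over-round).
# The odds below are invented for illustration.
def implied_probabilities(home_odds, draw_odds, away_odds):
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    total = sum(raw)                  # > 1 because of the over-round
    return [p / total for p in raw]

print(implied_probabilities(2.10, 3.30, 3.60))
# -> roughly [0.45, 0.29, 0.26]; a home win is the most likely outcome
```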

The paper also uses various staking strategies: a fixed bet of one unit; a stake in inverse proportion to the odds, so that a winning bet returns one unit; and stakes based on the Kelly system, which is an optimal staking strategy if you know the exact probabilities.
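For reference, here is a minimal sketch of the Kelly stake for a single bet at decimal odds, using an estimated win probability; the exact formulation used in the paper may differ.

```python
# Kelly stake for a single bet at decimal odds, given an estimated win
# probability. The example numbers are invented for illustration.
def kelly_fraction(prob, decimal_odds):
    """Fraction of the bankroll to stake; 0 if the bet offers no edge."""
    b = decimal_odds - 1.0                       # net winnings per unit staked
    return max((prob * b - (1.0 - prob)) / b, 0.0)

# A 40% chance at decimal odds of 3.0 suggests staking about 10% of the bankroll.
print(kelly_fraction(0.40, 3.0))   # about 0.1
```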

The various prediction models and staking plans were tested on eight seasons of data (2000-2001 to 2007-2008). Data from previous seasons was used to train the various models.

So, what were the results? The two ELO models were found to be significantly worse than two of the other six models (the ones based on bookmakers' odds), but better than the remaining four. The conclusion is that this is a promising result, but that perhaps more covariates are required to predict even more accurately.


Odds-setters as forecasters: The case of English Football

I sometimes comment on a scientific paper that has caught my eye. This time, it was (comments on other papers are also available):

David Forrest, John Goddard and Robert Simmons (2005) Odds-setters as forecasters: The case of English football, International Journal of Forecasting, 21(3), 551-564 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

The premise of this paper is that the odds set by bookmakers are in fact predictions made by experts. That is not a bad assumption, given that the odds that are set equate to real money coming in, or going out. The work was also motivated by a previous study which concluded that a statistical model performed better than newspaper forecasters.

But is it true that bookmakers odds are good predictors?

To answer this question, the authors took almost 10,000 games played between 1998 and 2003 and compared the bookmakers' predictions with forecasts made by a statistical model. The model they use is fairly complex (in my opinion), although understandable (and, importantly, reproducible) to the interested, and mathematically competent, reader. The model also looks back over fifteen seasons, so if you wanted to implement it, there is quite a lot of data that needs to be collected.

The comparisons between the different forecasting methods are quite interesting. Over the five seasons considered, 1998-1999 (1944 matches), 1999-2000 (1946 matches), 2000-2001 (1946 matches), 2001-2002 (1946 matches) and 2002-2003 (1945 matches), the statistical model performed better in the earlier seasons, but by the end of the period under investigation the odds-setters were performing better. Looking at just the odds-setters, they did worse in the two seasons at either end of the period (that is, 1998-1999 and 2002-2003). This is presumed to be due to there being more noise in the match results in those seasons.

If you look at the data another way, there is little to choose between the two prediction methods in their home, draw and away predictions.

A follow-up comparison attempts to estimate whether the odds-setters are capturing all the information contained in the model (if you look at the paper, the model contains many elements, such as whether a team is still in the FA Cup, whether the match is important to a particular team, attendance information, etc.). All the odds-setters provide is one figure (the odds), and that has to incorporate everything in order to provide a prediction. The figures presented suggest that, over time, the odds-setters get better at incorporating information into their odds. It is also shown that incorporating the bookmakers' odds into the statistical model gives superior performance, suggesting that the bookmakers are privy to information that is not included in the benchmark model. This was not the case in the earlier paper, where this test failed when considering tipsters.

Next, the paper investigates whether indiscriminate betting (placing identical stakes on any outcome) produces better or worse returns than following the benchmark model. Indiscriminate betting, perhaps not surprisingly, produced a loss at roughly the rate a bookmaker might expect in the long term (termed the over-round), around 12%. By comparison, if you place a bet on the outcome with the highest expected return, as indicated by the benchmark model, you would do much better: over the five seasons you would have lost only 0.2%.
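Here is a hedged sketch of the "bet on the highest expected return" rule, combining a model's probabilities with decimal odds; all numbers are invented for illustration.

```python
# Pick the outcome with the highest expected return, combining the model's
# probabilities with decimal odds. All numbers here are invented.
def best_value_bet(model_probs, decimal_odds):
    """Both arguments are dicts keyed by 'home', 'draw' and 'away'."""
    expected = {o: model_probs[o] * decimal_odds[o] - 1.0 for o in model_probs}
    best = max(expected, key=expected.get)
    return best, expected[best]

print(best_value_bet({'home': 0.50, 'draw': 0.28, 'away': 0.22},
                     {'home': 2.05, 'draw': 3.40, 'away': 3.80}))
# -> roughly ('home', 0.025): only the home win offers a positive expected return
```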

A possible downside is considered next. The benchmark model is the same for every season, whereas the odds-setters are (probably) learning each season or, perhaps, using different people/methods to set the odds. As it turned out, though, refining the model as the simulation progressed from season to season (or even within a season) did not give superior predictions.

One of the conclusions of the paper is that experts (in this case odds-setters) are actually quite good at predicting the outcome of matches. This is attributed to a variety of factors, including the fact that their livelihood depends on it, and that the sector has become much more competitive (due to technology and tax reforms), requiring them to improve in order to stay in business.

My conclusion? Use the model and incorporate the bookmakers' odds: that gives you the best predictions, although it does require quite a lot of data to be collected. Alternatively, develop your own prediction methodology and see how it performs.

A compound framework for sports results prediction: A football case study

The latest paper that caught my attention, and that I thought I would comment on, is below (other publications I have commented on can be seen here).

Byungho Min, Jinhyuck Kim, Chongyoun Choe, Hyeonsang Eom and R.I. (Bob) McKay (2008) A compound framework for sports results prediction: A football case study, Knowledge-Based Systems, 21(7), 551-562 (doi).

You might be able to download a copy of the paper from here. Note that this link may not be permanent and it may not be an exact copy of the paper that was published (although it does look like it).

The paper presents a framework which is designed to predict the outcome of football matches. They call their system FRES (Football Result Expert System).

The authors note that most previous research focuses on a single factor when predicting the outcome of a football match, and the factor used is usually score data. Even when other factors are taken into account, the score tends to dominate the prediction process.

Within FRES, two machine learning methodologies are utilised: a rule-based system and Bayesian networks. The paper describes how they are used within FRES in enough detail to allow readers to reproduce the system (as all good scientific papers should do).

FRES is tested on the 2002 World Cup tournament. Most football prediction systems are tested on league competitions, where teams (typically) play a double round robin tournament. Testing the approach on the 2002 World Cup means that the system cannot easily be compared with other systems. Where previous approaches have been tested on other tournaments (for example, previous World Cups), not all the data was available to enable FRES to make those predictions. In the words of the authors, “In the case of the few works which predict a tournament such as the World Cup, the available evaluation was conducted with old data, such as the World Cup 1994, 1998, which would unfairly hobble FRES, since some of the data it relies on are not available for these earlier tournaments.”

Although it is not a scientific term (at least not one I am familiar with!), I do like the phrase “unfairly hobble”.

In order to provide some sort of comparison with FRES, the paper implements two other systems, a historic predictor and a discounted historic predictor.
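The paper defines these comparison predictors precisely; purely as an illustration, the sketch below shows one plausible reading of a "discounted historic predictor", where recent results are weighted more heavily than older ones. The discount factor and the encoding of the history are my own assumptions, not necessarily those used in the paper.

```python
# One plausible reading of a "discounted historic predictor": weight recent
# results more heavily than older ones. The discount factor and the encoding
# of history are assumptions made purely for illustration.
def discounted_history(results, discount=0.9):
    """results: past outcomes for team A, oldest first, each one of
    'win', 'draw' or 'loss'. Returns discounted outcome frequencies."""
    weights = {'win': 0.0, 'draw': 0.0, 'loss': 0.0}
    w = 1.0
    for outcome in reversed(results):     # the most recent result gets weight 1
        weights[outcome] += w
        w *= discount
    total = sum(weights.values()) or 1.0
    return {k: v / total for k, v in weights.items()}

# Two recent wins outweigh an older defeat, so a win is the predicted outcome.
print(discounted_history(['loss', 'draw', 'win', 'win']))
```

Setting the discount to 1.0 recovers a plain (undiscounted) historic predictor of the same form.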

FRES was able to predict six of the top eight countries in the tournament; the other predictors managed five. Moreover, various statistical tests are conducted which confirm that FRES is statistically better than the other two methods.

One thing I like about the FRES system is that it has a look-ahead mechanism. Based on this, England do not rate very highly because, due to the draw, there is a high probability that they will meet Brazil in the quarter-finals. Turkey, on the other hand, are rated more highly due to their perceived easier draw.

It would be useful to have FRES tested on league competitions, so that better comparisons could be made with more prediction systems that have been reported in the scientific literature. Perhaps the authors are working on that now? It would, for example, be interesting to see if it beats a random method, or a method which always predicts a home win (as the authors did in the paper I discussed a few days ago).