Using ELO ratings for match result prediction in association football

I occasionally comment on a scientific paper that is of interest to me. This time, it was:

Lars Magnus Hvattum and Halvard Arntzen (2010) Using ELO ratings for match result prediction in association football, International Journal of Forecasting, 26(3), 460-470 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

ELO ratings were originally designed for chess. In brief, if you lose to an opponent who is a lot weaker than you, your rating falls by much more than if you were beaten by a player who is only slightly weaker than you. Similarly, if you beat a player who is a lot stronger than you, your rating increases by much more than if you beat a player who is only slightly stronger than you. It is usually (if not always) a zero-sum system, so whatever points you gain (or lose), the same number of points is lost (or won) by your opponent. There are a number of variations in how ratings are updated, but once you have a rating you can compare yourself to another player (or team) to decide who is the better player.
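
To make that a little more concrete, here is a minimal Python sketch of the standard ELO update (my own illustration, not taken from the paper; the K factor of 32 and the 400-point scale are the conventional chess choices and are assumptions here):

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return the updated ratings after one match.

    score_a is 1 for a win by A, 0.5 for a draw and 0 for a loss.
    The update is zero sum: whatever A gains, B loses.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + change, rating_b - change


# A strong player (1800) losing to a much weaker one (1400) sheds far more
# points than they would against a near-equal opponent.
print(elo_update(1800, 1400, 0))
```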

The idea behind ELO ratings has now been used for a number of sports and, relevant to this blog post, it has been applied to football. In fact, there is a web site that maintains a list of ELO ratings for world football. At the time of writing, the latest match played (on 15th Aug 2012) was a friendly between Albania and Moldova. The score was 0-0, with Moldova gaining 5 points and Albania losing 5 points. Their ratings now stand at 1461 (Albania) and 1376 (Moldova). The fact that Albania were the higher-ranked team meant that they lost points when they could only manage a draw against a lower-ranked side. They also had home advantage, though I am not sure whether this is a factor in this model; looking at the web page, I don’t think so.

In the paper that is the subject of this post, the authors define two ELO models to rate teams. The first uses a basic ELO scheme to work out ratings and then uses the difference between the two teams’ ratings as a covariate (a variable that may help predict the outcome under investigation) in a regression model. This seems reasonable, as the ELO rating is a measure of the current strength of a team. The second ELO model additionally takes the margin of victory into account; that is, a 3-0 win counts for more than a 2-1 win.
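
The paper fits its own parameters, so the following is only a hedged sketch of the general idea behind the second model: scale the K factor by the margin of victory, and then use the rating difference as the covariate for the regression model mentioned above. The functional form and the constants below are assumptions of mine for illustration, not the values estimated in the paper.

```python
def k_factor(goal_diff, k0=20, lam=1.0):
    """A margin-of-victory K factor: bigger wins move the ratings more.

    The form k0 * (1 + |goal_diff|) ** lam and the constants are illustrative
    assumptions; the paper estimates its own adjustment.
    """
    return k0 * (1 + abs(goal_diff)) ** lam


def elo_goals_update(rating_h, rating_a, goals_h, goals_a):
    """Update the home and away ratings using the goal-difference-scaled K."""
    exp_h = 1.0 / (1.0 + 10 ** ((rating_a - rating_h) / 400.0))
    score_h = 1.0 if goals_h > goals_a else 0.5 if goals_h == goals_a else 0.0
    change = k_factor(goals_h - goals_a) * (score_h - exp_h)
    return rating_h + change, rating_a - change


# A 3-0 win moves two equally rated teams further apart than a 2-1 win does;
# the resulting rating difference is what would feed the regression model.
print(elo_goals_update(1500, 1500, 3, 0))
print(elo_goals_update(1500, 1500, 2, 1))
```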

To compare the two ELO models, the authors define another six prediction methods. The first (which they call UNI) simply chooses an outcome at random. The second (FRQ) uses the historical frequency of home wins, draws and away wins to make its prediction: for example, if two thirds of past matches resulted in a home win, then FRQ would predict a home win for two thirds of the matches it forecasts. The authors note that UNI and FRQ are quite naive prediction methods. Two other benchmarks are derived from a paper by John Goddard (Goddard J. (2005) Regression models for forecasting goals and match results in association football, International Journal of Forecasting, 21(2), 331-340 (doi)). The fifth comparative model is based on bookmakers’ odds, taking the average of a set of odds from various bookmakers. The final model is almost the same as the fifth, but uses the maximum odds instead of the average.
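
For what it is worth, the two naive baselines are easy to sketch. This is just my reading of their descriptions, not the authors’ code:

```python
import random

OUTCOMES = ["home", "draw", "away"]


def uni_predict():
    """UNI: each of the three outcomes is equally likely."""
    return random.choice(OUTCOMES)


def frq_predict(history):
    """FRQ: predict outcomes in proportion to their observed frequency.

    history is a list of past results, e.g. ["home", "draw", "away", ...].
    """
    weights = [history.count(outcome) for outcome in OUTCOMES]
    return random.choices(OUTCOMES, weights=weights, k=1)[0]


# Illustrative history only: two thirds home wins, so FRQ predicts a home win
# for roughly two thirds of matches over the long run.
past_results = ["home"] * 60 + ["draw"] * 15 + ["away"] * 15
print(uni_predict(), frq_predict(past_results))
```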

The paper also uses various staking strategies: a fixed bet of one unit, a stake sized in proportion to the odds so that a winning bet returns one unit, and stakes based on the Kelly system, which is an optimal staking method if you know the exact probabilities.
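
As a rough sketch of the three staking rules as I understand them (the paper’s exact conventions, for example whether “return one unit” means the total return or the net profit, may differ from what I assume here):

```python
def unit_stake():
    """Fixed bet: always stake one unit."""
    return 1.0


def unit_return_stake(decimal_odds):
    """Stake in inverse proportion to the odds, so a winning bet returns one
    unit in total (assumption: 'return' means total return, not net profit)."""
    return 1.0 / decimal_odds


def kelly_stake(prob, decimal_odds, bankroll=1.0):
    """Kelly criterion: the bankroll fraction that maximises long-run growth,
    assuming prob really is the true probability of the outcome."""
    b = decimal_odds - 1.0                      # net odds
    fraction = (b * prob - (1.0 - prob)) / b
    return max(0.0, fraction) * bankroll        # no bet if the edge is negative


# Example: we believe the home win has probability 0.5 and the odds are 2.20.
print(unit_stake(), unit_return_stake(2.20), kelly_stake(0.5, 2.20))
```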

The various prediction models and staking plans were tested on eight seasons of data (2000-2001 to 2007-2008). Data from previous seasons was used to train the various models.

So, what were the results? The two ELO models were found to be significantly worse than two of the other six methods (those based on bookmaker odds), but better than the remaining four. The conclusion is that this is a promising result, but that perhaps more covariates are required to predict even more accurately.

 

Odds-setters as forecasters: The case of English Football

I sometimes comment on a scientific paper that has caught my eye. This time, it was (comments on other papers are also available):

David Forrest, John Goddard and Robert Simmons (2005) Odds-setters as forecasters: The case of English football, International Journal of Forecasting, 21(3), 551-564 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

The premise of this paper is that the odds set by bookmakers are in fact predictions made by experts. That is not a bad assumption, given that the odds that are set equate to real money coming in, or going out. The work was also motivated by the fact that a previous study had concluded that a statistical model performed better than newspaper forecasters.

But is it true that bookmakers odds are good predictors?

To answer this question, the authors took almost 10,000 games played between 1998 and 2003 and compared the bookmakers’ predictions with forecasts made by a statistical model. The model they use is fairly complex (in my opinion), although understandable (and, importantly, reproducible) to the interested, and mathematically competent, reader. The model also looks back over fifteen seasons, so if you wanted to implement it there is quite a lot of data that needs to be collected.

The comparisons between the different forecasting methods are quite interesting. Over the five seasons considered, 1998-1999 (1944 matches), 1999-2000 (1946 matches), 2000-2001 (1946 matches), 2001-2002 (1946 matches) and 2002-2003 (1945 matches), the statistical model performed better in the earlier seasons, but by the end of the period under investigation the odds-setters were performing better. Looking at just the odds-setters, they did worse in the two seasons at either end of the period under investigation (that is, 1998-1999 and 2002-2003), which is presumed to be due to there being more noise in the match results.

If you look at the data another way, there is little to choose between the two prediction methods on the home, draw and away predictions.

A follow-up comparison attempts to estimate whether the odds-setters are capturing all the information contained in the model (if you look at the paper, the model contains many elements, such as whether a team is still in the FA Cup, whether the match is important to a particular team, attendance information and so on). All the odds-setters have is one figure (the odds), and that has to incorporate everything in order to provide a prediction. The figures presented suggest that, over time, the odds-setters get better at incorporating information into their odds. It is also shown that incorporating the bookmakers’ odds into the statistical model gives superior performance, suggesting that the bookmakers are privy to information that is not included in the benchmark model. This was not the case in the earlier paper, where the same test failed when considering tipsters.

Next, the paper investigates whether indiscriminate betting (placing bets, using identical stakes, on any outcome) produces better or worse returns than following the benchmark model. Indiscriminate betting, perhaps not surprisingly, produced a loss at the rate a bookmaker might expect in the long term (termed the over-round), around 12%. By comparison, if you place a bet on the outcome with the highest expected return, as indicated by the benchmark model, you would do much better: over the five seasons you would have lost only 0.2%.
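
The rule of betting on the outcome with the highest expected return can be sketched as follows. The probabilities would come from the benchmark model and the odds from the bookmaker; the numbers below are purely hypothetical.

```python
def best_value_bet(probs, odds):
    """Pick the outcome whose expected return per unit staked (p * odds - 1)
    is the largest; return None if no outcome has a positive expectation.

    Whether the paper bets when every expected return is negative is an
    assumption on my part.
    """
    expected = {outcome: probs[outcome] * odds[outcome] - 1.0 for outcome in probs}
    best = max(expected, key=expected.get)
    return best if expected[best] > 0.0 else None


model_probs = {"home": 0.50, "draw": 0.28, "away": 0.22}   # hypothetical model output
book_odds = {"home": 1.95, "draw": 3.40, "away": 4.60}     # hypothetical prices
print(best_value_bet(model_probs, book_odds))              # -> "away" in this example
```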

A possible downside is considered next. The benchmark model is the same for every season, whereas the odds-setters are (probably) learning each season or, perhaps, using different people and methods to set the odds. As it turned out, though, refining the model as the simulation progressed from season to season, or even as a season progressed, did not give superior predictions.

One of the conclusions of the paper is that experts (in this case odds-setters) are actually quite good at predicting the outcome of matches. This is attributed to a variety of factors, including the fact that their livelihood depends on it, and that the sector has become much more competitive (due to technology and tax reforms), which has required the odds-setters to become better in order to stay in business.

My conclusion? Use the model, and incorporate the bookmakers’ odds – that gives you the best predictions, although it is quite a lot of data to collect. Alternatively, develop your own prediction methodology and see how it performs.

A compound framework for sports results prediction: A football case study

The latest paper that caught my attention, and that I thought I would comment on, is the following (other publications I have commented on can be seen here).

Byungho Min, Jinhyuck Kim, Chongyoun Choe, Hyeonsang Eom and R.I. (Bob) McKay (2008) A compound framework for sports results prediction: A football case study, Knowledge-Based Systems, 21(7), 551-562 (doi).

You might be able to download a copy of the paper from here. Note that this link may not be permanent and it may not be an exact copy of the paper that was published (although it does look like it).

The paper presents a framework which is designed to predict the outcome of football matches. They call their system FRES (Football Result Expert System).

The authors note that most previous research focuses on a single factor when predicting the outcome of a football match, and the main factor used is usually the score data. Even when other factors are taken into account, the scores tend to dominate the prediction process.

Within FRES, two machine learning methodologies are utilised: a rule-based system and Bayesian networks. The paper describes how they are used within FRES in enough detail to allow readers to reproduce the system (as all good scientific papers should).
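
I have not tried to reproduce FRES, but the general flavour of combining a probabilistic estimate with a rule-based layer can be sketched as follows. The rules, factors and numbers here are simplistic stand-ins of my own; the networks and rules described in the paper are far richer.

```python
def base_probabilities(strength_home, strength_away):
    """A crude probabilistic estimate from relative strength; a stand-in for
    the Bayesian-network component, not the paper's actual networks."""
    p_home = strength_home / (strength_home + strength_away)
    return {"home": 0.8 * p_home, "draw": 0.2, "away": 0.8 * (1.0 - p_home)}


def apply_rules(probs, context):
    """A stand-in rule-based layer: nudge the probabilities with contextual
    facts, then renormalise so they still sum to one."""
    adjusted = dict(probs)
    if context.get("key_striker_injured_home"):
        adjusted["home"] *= 0.85
    if context.get("must_win_for_away"):
        adjusted["away"] *= 1.10
    total = sum(adjusted.values())
    return {outcome: p / total for outcome, p in adjusted.items()}


print(apply_rules(base_probabilities(1.3, 1.0), {"key_striker_injured_home": True}))
```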

FRES is tested on the 2002 World Cup tournament. Most football prediction systems are tested on league competitions, where teams (typically) play a double round-robin tournament. Testing the approach on the 2002 World Cup means that the system cannot easily be compared to other systems. Where previous approaches have been tested on other tournaments (for example, earlier World Cups), not all the data was available to enable FRES to make those predictions. In the words of the authors, “In the case of the few works which predict a tournament such as the World Cup, the available evaluation was conducted with old data, such as the World Cup 1994, 1998, which would unfairly hobble FRES, since some of the data it relies on are not available for these earlier tournaments.”

Although not a scientific term (at least not one I am familiar with!), I do like the term unfairly hobble.

In order to provide some sort of comparison with FRES, the paper implements two other systems, a historic predictor and a discounted historic predictor.

FRES was able to predict six of the top eight countries in the tournament, whereas the other two predictors managed five. Moreover, various statistical tests are conducted which confirm that FRES is statistically better than the other two methods.

One thing I like about the FRES system is that it has a lookahead mechanism. Based on this, England do not rate very highly because, due to the draw, there is a high probability that they will meet Brazil in the quarter-finals. Turkey, on the other hand, are rated more highly due to their perceived easier draw.

It would be useful to see FRES tested on league competitions, so that better comparisons could be made with more of the prediction systems that have been reported in the scientific literature. Perhaps the authors are working on that now? It would, for example, be interesting to see whether it beats a random method, or a method which always predicts a home win (as the authors did in the paper I discussed a few days ago).

 

Sports Forecasting: A Comparison of the Forecast Accuracy of Prediction Markets, Betting Odds and Tipsters

In some of my posts I comment on a scientific paper that has caught my eye. There is no particular reason for the papers that I choose; they are just of interest to me. In this post, the paper that caught my eye was the following (comments on other papers can be seen here).

Martin Spann and Bernd Skiera (2009) Sports forecasting: a comparison of the forecast accuracy of prediction markets, betting odds and tipsters, Journal of Forecasting, 28(1), 55-72 (doi).

This paper looks at three different prediction methods and assesses their effectiveness in predicting the outcome of matches in the German premier football league. The three methods investigated are prediction markets, tipsters and betting odds.

Prediction Markets are based on various people taking a stance on the same event and being willing to back their hunch by paying (or collecting) money should their hunch be wrong (or right). Given that a number of people are taking a stance on the same event, the market can be seen as a predictive model of that event.

Tipsters are (or should be) experts who publish their predictions in newspapers, on web sites and so on. The advice from tipsters is often based on their expertise rather than on some system or formal model. The paper (citing Forrest and Simmons, 2000 as its source) says that tipsters can often beat a random selection method, but do worse than simply choosing a home win every time. It also cites Andersson et al., saying that soccer experts often do worse than people who are less well informed about the game.

Betting Odds, in previous work, have been found to be a good forecasting method (not surprising, I suppose, seeing as the bookmakers rely on setting the correct prices to make their living). Of course, the bookmakers can change their odds, but when they publish fixed odds (on, say, a special betting coupon), these can be seen as a prediction of the match outcome.
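
Reading fixed odds as a forecast means converting the prices into probabilities and stripping out the bookmaker’s margin (the over-round). A minimal sketch, assuming decimal odds:

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for the three outcomes into outcome probabilities.

    The raw inverse odds sum to more than one (the over-round); normalising
    them is the simplest way to read the prices as a forecast.
    """
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


# The inverse odds here sum to roughly 1.07, i.e. an over-round of about 7%.
print(implied_probabilities({"home": 1.90, "draw": 3.30, "away": 4.20}))
```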

The games forecast in this paper are those from the German premier league over three seasons (1999-2000, 2000-2001 and 2001-2002). The number of games predicted by each method varied (prediction markets and betting odds = 837, tipsters = 721, and prediction markets, betting odds and tipsters together = 678). The numbers differ simply because of the data that was available; where a figure covers two or three methods, it is the intersection of the games that those methods were able to predict.

To evaluate each method, the authors calculate the percentage of correct predictions. They also calculate the root mean squared error, as well as the amount of money each method would have won (three figures are given: with a 25% fee, with a 12% fee and with no fee). Comparisons are also made with a random selection policy and with a naive selection policy that simply assumes a home win every time.
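
A sketch of how those evaluation measures might be computed (my own reading; in particular, the way the fee is applied to winnings is an assumption):

```python
import math


def hit_rate(picks, results):
    """Fraction of matches where the predicted outcome was correct."""
    return sum(p == r for p, r in zip(picks, results)) / len(results)


def rmse(prob_forecasts, results):
    """Root mean squared error of the forecast probabilities against the
    0/1 indicators of the actual outcomes."""
    sq_errors = [(probs[outcome] - (1.0 if outcome == result else 0.0)) ** 2
                 for probs, result in zip(prob_forecasts, results)
                 for outcome in probs]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def flat_bet_return(picks, results, odds, fee=0.0):
    """Return per unit staked from betting one unit on every pick, with a
    proportional fee deducted from winnings (assumed fee treatment)."""
    winnings = sum(price * (1.0 - fee)
                   for pick, result, price in zip(picks, results, odds)
                   if pick == result)
    return winnings / len(picks) - 1.0
```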

So, what did the authors find? Over the 837 games, the prediction market and the betting odds were able to predict 52.69% and 52.93% of games correctly, respectively. With no fee, this would have returned a profit of 12.30% and 11.92% respectively. The naive model (always pick a home win) predicted 50.42% correctly and returned a profit of 11.79%. The random method only managed to predict 37.98% of games correctly.

If we look at the 678 games that all three methods could predict, then the percentages of correct predictions were 54.28% (prediction market), 53.69% (betting odds), 42.63% (tipsters), 50.88% (naive model) and 37.98% (random). The returns (assuming no fee) were 16.20% (prediction market), 13.49% (betting odds), -0.19% (tipsters) and 12.44% (naive model).

I’m not sure why, but profit information is not given for the random model; it would almost certainly result in a loss.

A further test is also carried out: only games where the methods agree on the selection are bet upon. For example, of the 678 games, there are 380 where all three methods agree on the result. Betting only on those games gives a correct prediction percentage of 57.11%, higher than any of the methods used in isolation and betting on every game. The returns would be 13.86% (no fee), 1.66% (12% fee) and -8.72% (25% fee).
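
The “only bet when the methods agree” rule is simple to state in code (an illustration, not the authors’ implementation):

```python
def consensus_picks(market, odds_based, tipsters):
    """Return a pick only for matches where all three methods agree.

    Each argument maps a match identifier to a predicted outcome.
    """
    return {match: outcome
            for match, outcome in market.items()
            if outcome == odds_based.get(match) == tipsters.get(match)}


predictions = (
    {"A v B": "home", "C v D": "draw"},   # prediction market
    {"A v B": "home", "C v D": "away"},   # betting odds
    {"A v B": "home", "C v D": "away"},   # tipsters
)
print(consensus_picks(*predictions))      # only "A v B" is a unanimous pick
```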

The authors conclude that the prediction market and the betting odds provide the best indication of the outcome. They agree with previous work that tipsters are generally quite poor at prediction.

 

 

Can Forecasters Forecast Successfully?: Evidence from UK Betting Markets

I occasionally blog on a paper that is of interest. Well, of interest to me. The latest paper to catch my eye is the following (other papers I have commented on can be seen here).

Leighton Vaughan Williams (2000) Can Forecasters Forecast Successfully?: Evidence from UK Betting Markets, Journal of Forecasting, 19(6), 505-513 (doi).

The reason that this paper was of interest is that I was reading Leighton’s book (Betting to Win: A Professional Guide to Profitable Betting, High Stakes Publishing, 2002, ISBN: 1-84344-015-6), in which this paper is mentioned. Many (many, many) years ago, long before I was an academic, I went through a phase of collecting all sorts of horse racing systems. My idea was to test them all out to see if any of them worked. I never actually placed a bet, and I never really tested any of the systems, as they generally involved too much time collecting and processing the data.

Since then I have still thought that it would be interesting to look at various horse racing systems to see if they worked.

This is what this paper does. Unlike my idea, though, it takes tips from services that you subscribe to, either by paying money or by contacting them via a premium rate phone service. This seems a lot more sensible than having to enter all the data yourself.

This paper looks at the performance of tipping services, with the analysis being carried out in 1995. Five services were compared. Four of these were subscription based: a fee is paid and you receive tips at various times. In 1995, these services cost at least 99 GBP per month, which seems a lot to me now, let alone in 1995. The fifth service was a premium rate phone number, where you phone up to receive the tip and the cost of the call effectively covers the cost of the service. These five services were chosen as they were amongst the top tipping services as assessed by the Racing Information Database (I have tried to google this and am not sure it is still in existence, but I would be willing to be corrected and to update this post with a link).

The paper goes through each of the tipping services and evaluates how many tips were provided (and over what period – some, for example, were analysed over three months, others over six; I suspect the period was chosen to ensure that a sufficient number of tips were analysed, as not all services provide tips at the same rate), any conditions associated with the service (for example, only bet if a certain price is available), the profit (or loss) from investing in that service, and so on.

The good news is that all of the tipping services produced a pre-tax profit when used with the relevant staking/price plans, although Leighton makes the point that none of these profits could be said to be significant. It was also interesting to note that increased profits could have been achieved if some of the less well supported tips had been ignored. Of course, that is a hindsight examination, and the obvious question is: when betting in real time, which tips do you ignore and which do you actively bet upon? There is also evidence that you should use a variable staking plan rather than a flat stakes method.

If you use a tipping service, there are also other factors to take into account. There is an upfront investment (which you may never recover) and, unlike in an academic study, you will probably only choose one service, so which one do you choose? There is also (as pointed out by Leighton, in chapter 20 of his book) the fact that you have to take what the tipping services advertise with a pinch of salt. As an example, a service might say only bet if you can get a price of 4-1 or better. What happens if that price is almost impossible (or even actually impossible) to get? Will the service still include that selection in its results if the horse wins? And if a price of 6-1 was available (even for a few seconds), would the service record that as the price you could have got, even though, unless you were very quick, or very lucky, you would have struggled to get on at 6-1?

It is twelve years since the paper was published, and seventeen years since the analysis was carried out, and things have moved on. Tipping services (I suspect) come and go, technology has moved on, the tax regime has changed and there are now many other ways to bet which were not so prominent at the time; I am thinking specifically of spread betting and betting exchanges. These have, undoubtedly, made a big difference to the industry.

I am not up to date with the scientific literature in the area of sports forecasting and I suspect that there are many papers out there that provide various comparisons and analysis. If you know of any, or know of a good review paper in this area, I would be very interested if you could post a comment giving the reference.

I would also be interested in hearing from any professional tipping services (no matter what sport, but UK based, as I don’t claim to understand American sports or markets) who wish to subject their service to scientific analysis. Note that this is not an open invitation to advertise your service on this blog. I get enough spam as it is (and moderate it out), and I don’t want the comments box filled up with lightly disguised adverts for web sites that claim to make millionaires of their subscribers. But serious enquiries are welcome.