I sometimes comment on a scientific paper that has caught my eye. This time, it was (comments on other papers are also available):
The premise of this paper is that the odds set by bookmakers are in fact predictions by experts. That is not a bad assumption, given that the odds that are set equate to real money coming in or going out. The paper was also motivated by a previous study which had concluded that a statistical model performed better than newspaper forecasters.
But is it true that bookmakers’ odds are good predictors?
To answer this question, the authors took almost 10,000 games, played between 1998 and 2003, and compared the bookmakers’ predictions with forecasts made by a statistical model. The model they use is fairly complex (in my opinion), although understandable (and, importantly, reproducible) to the interested, and mathematically competent, reader. The model also looks back over fifteen seasons, so if you wanted to implement it, there is quite a lot of data that needs to be collected.
The comparisons between the different forecasting methods are quite interesting. Over the five seasons considered, 1998-1999 (1944 matches), 1999-2000 (1946 matches), 2000-2001 (1946 matches), 2001-2002 (1946 matches) and 2002-2003 (1945 matches), the statistical model performed better in the earlier seasons, but by the end of the period under investigation the odds-setters were performing better. Taken on their own, the odds-setters did worse in the two seasons at either end of the period (that is, 1998-1999 and 2002-2003). This is presumed to be due to more noise in the match results in those seasons.
If you look at the data another way, splitting it into home, draw and away predictions, there is little to choose between the two prediction methods.
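To give a feel for how two sets of probability forecasts can be compared, here is a minimal Python sketch. I have not said above which measure the paper uses, so treat the choice here as my assumption: it scores a forecaster by the geometric mean of the probability it gave to the result that actually happened (higher is better). The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean_probability(forecasts, outcomes):
    """Geometric mean of the probability each forecast gave to the observed result.

    forecasts: list of dicts mapping "H", "D", "A" to probabilities.
    outcomes:  list of observed results, each one of "H", "D", "A".
    """
    log_sum = sum(math.log(probs[result]) for probs, result in zip(forecasts, outcomes))
    return math.exp(log_sum / len(outcomes))


# Toy data (made up for illustration): three matches, two forecasters.
results = ["H", "D", "A"]
model = [
    {"H": 0.50, "D": 0.30, "A": 0.20},
    {"H": 0.40, "D": 0.30, "A": 0.30},
    {"H": 0.45, "D": 0.25, "A": 0.30},
]
bookmaker = [
    {"H": 0.55, "D": 0.25, "A": 0.20},
    {"H": 0.35, "D": 0.35, "A": 0.30},
    {"H": 0.40, "D": 0.25, "A": 0.35},
]

print("model:    ", geometric_mean_probability(model, results))
print("bookmaker:", geometric_mean_probability(bookmaker, results))
```

Whichever forecaster has the higher score put more probability, on average, on what actually happened. With only three matches this means nothing, of course; the paper works with nearly 2,000 matches per season.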
A follow-up comparison attempts to estimate whether the odds-setters are capturing all the information contained in the model (if you look at the paper, the model contains many elements, such as whether a team is still in the FA Cup, whether the match is important to a particular team, attendance information etc.). All the odds-setters have is one figure (the odds), and that has to incorporate everything in order to provide a prediction. The figures presented suggest that, over time, the odds-setters get better at incorporating information into their odds. It is shown that incorporating the bookmakers’ odds into the statistical model gives superior performance, suggesting that the bookmakers are privy to information that is not included in the benchmark model. This was not the case in the earlier paper, where this test failed when considering tipsters.
Next, the paper investigates whether indiscriminate betting (placing bets, using identical stakes, on any outcome) produces better or worse returns than following the benchmark model. Indiscriminate betting, perhaps not surprisingly, produced a loss of around 12%, which is the rate a bookmaker might expect to make in the long term (termed the over-round). By comparison, if you place a bet on the outcome with the highest expected return, as indicated by the benchmark model, then you would do much better. Over the five seasons you would have lost only 0.2%.
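The arithmetic behind the over-round and the "highest expected return" rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses decimal odds, the common definition of the over-round as the sum of the implied probabilities minus one, and made-up odds and model probabilities. None of the numbers or function names come from the paper.

```python
def over_round(decimal_odds):
    """Bookmaker margin: sum of implied probabilities (1 / odds) minus 1."""
    return sum(1.0 / o for o in decimal_odds) - 1.0


def expected_returns(decimal_odds, model_probs):
    """Expected profit per unit stake on each outcome, using the model's probabilities."""
    return [p * o - 1.0 for p, o in zip(model_probs, decimal_odds)]


def best_bet(decimal_odds, model_probs):
    """Index and expected return of the outcome the model rates as best value."""
    returns = expected_returns(decimal_odds, model_probs)
    idx = max(range(len(returns)), key=returns.__getitem__)
    return idx, returns[idx]


# Hypothetical match: decimal odds for home, draw, away.
odds = [2.0, 3.0, 3.5]
# Hypothetical model probabilities for the same three outcomes.
probs = [0.45, 0.25, 0.30]

print("over-round:", over_round(odds))       # roughly 0.119, i.e. about 12%
print("best bet:  ", best_bet(odds, probs))  # index 2 (away), expected return about 0.05
```

In this toy example the bookmaker's implied probabilities add up to about 112%, so backing outcomes blindly loses money on average. The model thinks the away win is underpriced, so that is the bet the "highest expected return" rule would pick.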
A possible downside is considered next. The benchmark model is the same for every season, whereas the odds-setters are (probably) learning each season or, perhaps, using different people or methods to set the odds. In fact, refining the model as the simulation progressed from season to season, or even as a season progressed, did not give superior predictions.
One of the conclusions of the paper is that experts (in this case odds-setters) are actually quite good at predicting the outcome of matches. This is attributed to a variety of factors, including the fact that their livelihood depends on it and that the sector has become much more competitive (due to technology and tax reforms), which has required them to improve in order to stay in business.
My conclusion? Use the model and incorporate bookmakers’ odds, since that gives you the best predictions. But it is quite a lot of data to collect. Alternatively, develop your own prediction methodology and see how it performs.