Prediction of sporting events: A Scientific Approach

My final year undergraduate dissertation project (many years ago) attempted to predict the outcome of horse races using Neural Networks. I briefly blogged about it in June 2009 (http://graham-kendall.com/blog/?p=8/).

The result of the project was (in my view) encouraging but was lacking in a couple of areas. The data was incomplete (the starting prices were not available so I had to make some assumptions and it would have been more useful to have studied a greater number of races). I would also have liked to have tried some other prediction methods, beyond just neural networks.

Since doing that project I have maintained an interest in predicting sporting events, although sports scheduling (e.g. 10.1016/j.cor.2009.05.013 and 10.1057/palgrave.jors.2602382) seems to have taken up more of my time. But I have always wanted to return to prediction, utilising Operations Research methodologies. As such, I maintain a database of any literature that I see on the topic. This includes the scientific literature, as well as any newspaper cuttings, useful web sites etc.

 

One of the problems that serious sports forecasters face is being taken seriously. A quick Google search for sports prediction (or many other similar terms) will bring up many sites offering services that (supposedly) enable you to make money. The services typically involve investing in some system, or subscribing to a service where you are sent predictions for you to back in whatever way you see fit.

Of course, if we were sceptical, we might assume that many of these services are really there to make money for the people selling the service, rather than those who are buying. I am sure that there are some services out there that make money for both the seller and the buyer, but the challenge is to find out which services offer value for money before you go bankrupt in the process!

 

Unfortunately, there are not that many scientific papers that consider how to predict the outcome of sporting events, at least as a way to return a monetary profit. There are some, of course. For example, an article that appeared last year in the International Journal of Forecasting

 

S. Lessmann, M-C. Sung, and J.E.V. Johnson (2010) Alternative methods of predicting competitive events: An application in horserace betting markets, 26:518-536, DOI: 10.1016/j.ijforecast.2009.12.013

 

considered how to predict horse races. The motivation of the article was actually to try and predict competitive events such as political elections and (of course) sporting events, although the paper was really a large-scale (1000 races, 12,092 runners) study of horse racing. The paper concluded that their proposed model was able to provide an increase in wealth of just over 528% if using a Kelly (Kelly, J. L. (1956). A new interpretation of information rate. The Bell System Technical Journal, 35, 917–926) strategy, with reinvestment.
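For readers who have not met the Kelly strategy, here is a minimal sketch of the idea in Python. It is not the model or the staking code from the paper; the function names, the use of decimal odds and the example numbers are all my own assumptions for illustration. The Kelly fraction is the share of the current bankroll to stake on a bet, given your estimated win probability and the odds on offer, and "with reinvestment" simply means each stake is calculated from the bankroll as it stands after the previous bet.

```python
def kelly_fraction(p_win, decimal_odds):
    """Fraction of the bankroll to stake under the Kelly criterion.

    p_win        -- our own estimate of the probability of winning (assumption)
    decimal_odds -- total return per unit staked, e.g. 3.0 means 2/1
    """
    b = decimal_odds - 1.0  # net profit per unit staked if the bet wins
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1")
    f = (b * p_win - (1.0 - p_win)) / b
    return max(f, 0.0)  # never bet when the edge is negative


def run_bets(bets, bankroll=1.0):
    """Apply Kelly staking with reinvestment to a sequence of settled bets.

    bets -- iterable of (p_win, decimal_odds, won) tuples (hypothetical data)
    """
    for p_win, decimal_odds, won in bets:
        stake = bankroll * kelly_fraction(p_win, decimal_odds)
        if won:
            bankroll += stake * (decimal_odds - 1.0)
        else:
            bankroll -= stake
    return bankroll


if __name__ == "__main__":
    # Two made-up bets at 2/1 where we believe the true chance is 50%.
    print(kelly_fraction(0.5, 3.0))
    print(run_bets([(0.5, 3.0, True), (0.5, 3.0, False)]))
```

Note that the stake depends entirely on the quality of the probability estimate, which is exactly where the prediction model earns (or loses) its money.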

 

Considering other sports, such as football (the UK version), a couple of examples of predicting matches can be found in Economics, Management and Optimization in Sports, Springer, 2004, ISBN: 3-540-20712-0

In Using Statistics to Predict Scores in English Premier League Soccer (John S. Croucher, pp 43-57), various models are presented that attempt to fit the number of goals scored by each team. The best model found had a Poisson distribution.
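To give a feel for what a Poisson model of goals looks like, here is a small sketch in Python. It is not Croucher's fitted model: the scoring rates are invented, and I am assuming (for simplicity) that the two teams' goals are independent, which a serious model may well not do. Given an average scoring rate for each team, it sums the probabilities of every scoreline up to a cut-off to obtain the chance of a home win, a draw and an away win.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals follow a Poisson(rate) law."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    home_rate, away_rate -- assumed mean goals per match for each team
    max_goals            -- scorelines above this are ignored (tiny tail)
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence assumption: joint probability is the product.
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    # Hypothetical rates: home side averages 1.5 goals, away side 1.1.
    print(outcome_probabilities(1.5, 1.1))
```

In practice the rates would be estimated from past results (attack and defence strengths, home advantage and so on) rather than typed in by hand.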

Another paper from the same book (Modelling and Forecasting Match Results in the English Premier League and Football League (Stephen Dobson and John Goddard, pp 59-77)) considers about 30 seasons of data. This paper also uses a statistical method, but assigns probabilities to win, lose or draw, rather than trying to predict the number of goals scored. This paper also provides a good overview of previous work in this area.

 

I suppose the stock market is one area that has been widely studied with respect to prediction, with an eye on turning a profit. There have been hundreds (if not thousands) of papers that look at ways to predict stock prices, interest rates, inflation etc.

 

Perhaps not surprisingly, there has been limited reporting in the scientific literature as to whether anybody has made (or makes) money from the methodologies that they have developed. After all, if you have a successful system, why tell everybody about it? (This is also one of the major arguments against buying a system or tips from a service on the internet.)

 

What I would actually like to see is a lot more scientific papers not only reporting their predictive systems but also how much money was made, over what period of time and if the system is in daily/weekly use at the time of writing.

Of course, the system needs to be reproducible (as should all good scientific work).

However, if the system is successful, the author(s) might be unwilling to reveal its secrets but might still want to let the world know about its effectiveness. Under these circumstances I have a few ideas as to how this could be done.

  1. In the run up to publishing the paper, the authors make a series of predictions and lodge them with a reputable source. This could be another scientist, a lawyer, or even a web site where the date and time of publication can be verified. The important thing is to ensure that the predictions can be verified as being made in advance of the event. If these predictions were made over a period of (say) six months, then this could form part of the results presented in a paper.
  2. It is, of course, understandable that authors do not want to publish the full details of their winning methodology in a scientific paper but, as scientists, we like to publish our work. The scientific community should be understanding of this, in the same way that it accepts that certain factors must sometimes be kept confidential due to commercial sensitivities. Therefore, the general methodology could be described while omitting key points (and being upfront about that). If combined with 1), above, this could still make a contribution to the scientific literature.
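One possible way of implementing idea 1) without revealing the predictions themselves until after the event is to lodge a cryptographic fingerprint (a hash) of them instead. This is only a sketch of that idea, not something I have seen used in the sports forecasting literature; the function names and the example prediction are hypothetical. The fingerprint still has to be lodged somewhere that provides a trustworthy date and time. After the event, the authors reveal the predictions and the random salt, and anybody can check that they match the fingerprint.

```python
import hashlib
import json
import secrets
from typing import Optional, Tuple


def _digest(predictions: dict, salt: str) -> str:
    # Serialise with sorted keys so the same predictions always hash the same.
    payload = json.dumps(predictions, sort_keys=True) + salt
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def commit(predictions: dict, salt: Optional[str] = None) -> Tuple[str, str]:
    """Return (digest, salt). Lodge the digest now; keep the salt private."""
    if salt is None:
        # The random salt stops anyone guessing the predictions by brute force.
        salt = secrets.token_hex(16)
    return _digest(predictions, salt), salt


def verify(predictions: dict, salt: str, digest: str) -> bool:
    """Check revealed predictions and salt against the lodged digest."""
    return _digest(predictions, salt) == digest


if __name__ == "__main__":
    # Hypothetical prediction, made before the match is played.
    picks = {"Team A v Team B": "home win"}
    digest, salt = commit(picks)
    print(digest)                       # this is what gets lodged
    print(verify(picks, salt, digest))  # checked after the event
```

A scheme like this only shows that the predictions were not changed afterwards. It does not stop somebody lodging many different sets of predictions and revealing only the best one, so the lodging still needs to be done openly and consistently.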

 

An attractive alternative would be to run a prediction competition (see Kaggle, who are doing excellent work in this area), where competitors are given a set of data and asked to provide predictions on the outcome of sporting events; ideally those that have not taken place yet.

 

In summary, I would really like to see more reporting (on a scientific basis) of sports predictions which are unashamedly about trying to return a profit, as this is an under-represented area at the moment. Why not have a go?

 

Note: I entered this blog entry into the INFORMS blog competition. The March 2011 competition was O.R. and Sports.