Time to switch to Java for a football prediction project

Programming Books (Clive Darra, Creative Commons)

I have decided that it is time to switch to Java for a football prediction project that I have been planning for some time.

I want to do the project justice so I thought I would start from the most basic decision. What programming language should I use?

The Problem with C++

For the past 20 years, I have been using C++ (before that I was using all sorts of mainframe languages). I can do what I need to do with C++, but I have never been entirely happy with it.

As a language, I quite like it, but it is all the stuff that goes with it that has always been frustrating. And most of that stuff is around Visual Studio (VS), MFC (Microsoft Foundation Classes) and Templates. Don’t get me wrong, this is not a Microsoft bashing exercise. I know of many people who use VS and are very happy with it, develop great applications and know how to use the tools that are available.

But I have never really got my head around it. I run into all sorts of problems with namespaces, linking errors, deciding whether or not to use MFC (and then regretting it), whether to rely on the classes available in VS or avoid them so that I could port my code to another compiler should I wish to do so; and the list goes on.

The end result is that I have never really developed as a C++ programmer. I can do all the usual stuff (classes, inheritance, operator overloading, polymorphism) but I have never been able to get to grips with Windows (Graphical User Interface – GUI) programming and so have always stuck with a Command Line Interface (CLI).

To be honest, using a CLI has served me well and I have managed to churn out some nice programs. But for my football prediction project, I really want to develop some sort of GUI. The question I asked is, should I have another go at getting to grips with MFC, Windows programming and all that is needed to get an application developed in VS to display a window on the screen, or do I change to something else, in the hope that I will find it a little easier to understand?

So why Java?

After a lot of soul searching, and Googling, I decided that it was time to switch to Java. I have written a couple of programs in Java before, but nothing too much beyond “Hello World”.

But it is like C (syntax-wise) and like C++ (class-wise), it is well supported, many of my academic colleagues use it, it is platform independent and, I hope, GUI programming is a little easier than in VS.

More questions than answers

Of course, making the decision to use Java just raises a whole load more questions. What IDE should I use (I have chosen Eclipse), how big is the learning curve, can I easily access Excel files, can I (should I) use MySQL, what Java tools are available to support developing a GUI, etc.

All these questions will have to wait. For now, I have to learn the basics of Java and get my head around the Eclipse IDE (Integrated Development Environment).

Wish me luck.


Football fixture forecasting. Are you any good?

Chelsea have produced artist's impressions of how a new stadium at Battersea Power Station would look. Photograph: Chelsea/PA (19 Jun 2013: downloaded via Google where the image was labeled as free to reuse)

Football fixture forecasting is something I have expressed an interest in recently. Actually, this blog contains a forecasting category, that you might be interested in.

In my last post, I mentioned a crowdfunding project that I am trying to get off the ground. This project aims to investigate football fixture forecasting, utilising methodologies such as Artificial Neural Networks and algorithms based on Darwin’s principles of natural evolution (survival of the fittest).

Doing research for this project, I looked around a few football forums and found that there is a brisk business in football forecasting competitions (usually just for fun).

For example:

  • FootballForums.net have a Football Forum thread, which can be accessed here.
  • Total Football Forum also has a prediction thread.

I mention these now, as they might be of interest as the new season approaches.

Of course, one of football’s best-known pundits (Mark Lawrenson) makes predictions every week (on Radio Five Live I believe) and these are tracked at various web sites including MyFootballFacts.com. This site does a great job of showing the actual league tables and what would have happened if all Lawro’s predictions had come true. It’s interesting to see the fairly significant differences between the league table based on his predictions and the one based on actual results. I guess it is not that surprising that there are large differences as, if Lawro were an excellent predictor all the time, the bookies would be bankrupt.

I think I am also right in saying that Lawro tries to predict the actual score. This has got to be difficult. If I ever get my project off the ground, I think I’ll just try to predict whether the home team will Win, Lose or Draw.


Crowdfunding: A new model to fund research?

Crowdfunding: Money from many contributors? (picture courtesy of freefoto.com (by Ian Britton))

A few weeks ago I came across something called crowdfunding. I have known about crowdsourcing for a while, but crowdfunding had escaped me.

I am not sure it is a good idea, but I thought I would try it out (see my project here, which develops my football prediction project; for more general posts on forecasting, see here).

Of course, it would be nice to get the project funded, but it is also an experiment as, if crowdfunding does take off, I want to have some knowledge and experience of the medium.

The idea is that you get a crowd of people to fund a project by contributing small amounts in the hope that you can reach the goal required for you to carry out the project. In order to motivate people to contribute, you offer a series of rewards with the rewards getting better the more that is contributed.

The research community has recently started to show an interest in this model of raising funds for research. It is still in the very early stages, but I believe that it will become a more prominent feature of the funding landscape in the coming months/years.

If you are interested, there have been a limited number of scientific publications on crowdfunding. This is certainly not a complete list, but here are some publications I found via a quick search on Web of Knowledge.

  • Finding philanthropy: Like it? Pay for it (link)
  • Crowd-Funded Micro-Grants for Genomics and “Big Data”: An Actionable Idea Connecting Small (Artisan) Science, Infrastructure Science, and Citizen Philanthropy (link)
  • Strapped for funding, medical researchers pitch to the crowd (vol 18, pg 1307, 2012) (link)
  • Crowd-funding: transforming customers into investors through innovative service platforms (link)

If you are interested in crowdfunding platforms, there are many listed on Wikipedia.

Using ELO ratings for match result prediction in association football

I occasionally comment on a scientific paper that is of interest to me. This time, it was:

Lars Magnus Hvattum and Halvard Arntzen (2010) Using ELO ratings for match result prediction in association football, International Journal of Forecasting, 26(3), 460-470 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

ELO ratings were originally designed for Chess. In brief, if you play an opponent who is a lot weaker than you and you lose, your rating would reduce a lot more than if you were beaten by a player who is only slightly weaker than you. Similarly, if you beat a player who is a lot stronger than you, your rating will increase a lot more than if you beat a player who is only slightly stronger than you. It is usually (if not always) a zero sum game, so that whatever points you gain (or lose), the same number of points is lost (or won) by your opponent. There are a number of variations as to how ratings are updated, but once you have a rating you can compare yourself to another player (or team) to decide who is the better player.
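To make the description above concrete, here is a minimal sketch of the textbook chess-style update. It is not taken from the paper or from the world football ratings site; the 400-point scale and the K factor of 20 are conventional values that I am assuming purely for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=20.0):
    """Return both new ratings after one game; score_a is A's actual score.

    k (how fast ratings move) is an assumed value, not one from the paper.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    # Zero sum: whatever A gains (or loses), B loses (or gains).
    return rating_a + change, rating_b - change


# Losing to a much weaker opponent costs more than losing to a slightly weaker one.
print(update(1600, 1200, 0.0))
print(update(1600, 1550, 0.0))
```

A draw is scored as 0.5, which is why a higher-rated team that only manages a draw still loses points.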

The idea behind ELO ratings has now been used for a number of sports. Relevant to this blog post, it has been applied to football. In fact, there is a web site that maintains a list of ELO ratings for world football. At the time of writing, the latest match played (on the 15th Aug 2012) was a friendly between Albania and Moldova. The score was 0-0 and the teams gained 5 points (Moldova) and lost 5 points (Albania) respectively. Their points are now 1461 (Albania) and 1376 (Moldova). The fact that Albania were higher ranked meant they lost points against a lower ranked team when they could only manage a draw. They also had home advantage, though I am not sure that this is a factor in this model. Looking at the web page, I don’t think so.

In the paper, which is the subject of this post, the authors define two ELO models to try and rate teams. The first uses a basic ELO to work out ratings and then uses the difference between the two ratings as a covariate (a variable that could be a predictor of the outcome under investigation) in a regression model. This seems reasonable, as the ELO rating is a measure of the current strength of the team. A second ELO model is also used, which additionally takes into account the margin of victory. That is, a 3-0 win is better than a 2-1 win.

To compare the two ELO models the authors define another six prediction methods. The first (which they call UNI) simply chooses an outcome at random. The second (FRQ) uses the frequency of home wins, draws and away wins to make its prediction. For example, if two thirds of a range of past matches resulted in a home win, then FRQ would predict home wins for two thirds of the matches it forecasts. It is noted that UNI and FRQ are quite naive prediction methods. Two other benchmarks are derived from a paper by John Goddard (Goddard J. (2005) Regression models for forecasting goals and match results in association football, International Journal of Forecasting, 21(2), 331-340 (doi)). The fifth comparative model is based on bookmakers’ odds, by taking the average of a set of odds from various bookmakers. The final model is almost the same as the fifth, but instead of using the average odds, the maximum odds are used.
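As a rough illustration of the simpler benchmarks, the sketch below derives FRQ-style probabilities from past results and turns averaged (or maximum) bookmaker odds into probabilities. The function names, the 'H'/'D'/'A' coding and the normalisation of inverse odds are my own assumptions; the paper may well handle the details differently.

```python
from collections import Counter

OUTCOMES = ("H", "D", "A")  # home win, draw, away win


def frq_probabilities(past_results):
    """FRQ-style benchmark: probabilities equal to historical frequencies."""
    counts = Counter(past_results)
    return {o: counts[o] / len(past_results) for o in OUTCOMES}


def mean(values):
    return sum(values) / len(values)


def combine_odds(bookmakers, how=mean):
    """Combine several bookmakers' decimal odds per outcome (how = mean or max)."""
    return {o: how([book[o] for book in bookmakers]) for o in OUTCOMES}


def implied_probabilities(decimal_odds):
    """Invert the odds and rescale so that the probabilities sum to 1
    (one common way of removing the bookmaker's margin; an assumption here)."""
    inverse = {o: 1.0 / price for o, price in decimal_odds.items()}
    total = sum(inverse.values())
    return {o: value / total for o, value in inverse.items()}


books = [{"H": 2.0, "D": 3.2, "A": 3.6}, {"H": 2.1, "D": 3.0, "A": 3.8}]
print(frq_probabilities(list("HHHHDA")))
print(implied_probabilities(combine_odds(books, mean)))
print(implied_probabilities(combine_odds(books, max)))
```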

The paper also uses various staking strategies: a fixed bet of one unit; a stake in proportion to the odds, sized to return one unit; and stakes based on the Kelly system, which is an optimal staking system if you know the exact probabilities.
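The three staking plans can be sketched as follows, using decimal odds. Reading "to return one unit" as a payout of one unit including the stake is my assumption, and the Kelly formula shown is the standard one for a single bet, not necessarily the exact variant used in the paper.

```python
def fixed_stake():
    """Always bet one unit."""
    return 1.0


def stake_to_return_one_unit(decimal_odds):
    """Stake whose payout (stake included) is one unit if the bet wins.

    This interpretation of the plan is an assumption.
    """
    return 1.0 / decimal_odds


def kelly_fraction(probability, decimal_odds):
    """Fraction of the bankroll to stake under the Kelly criterion.

    Returns 0 when the bet has no positive expected value.
    """
    edge = probability * decimal_odds - 1.0
    return max(0.0, edge / (decimal_odds - 1.0))


print(stake_to_return_one_unit(2.5))   # stake for a one-unit payout at odds of 2.5
print(kelly_fraction(0.5, 2.5))        # positive edge, so part of the bankroll is staked
print(kelly_fraction(0.4, 2.0))        # negative edge, so nothing is staked
```

Kelly only helps if the probability you feed in is right, which is exactly the caveat about knowing the exact probabilities.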

The various prediction models and staking plans were tested on eight seasons of data (2000-2001 to 2007-2008). Data from previous seasons was used to train the various models.

So, what were the results? The two ELO models were found to be significantly worse than two of the other six models (those based on bookmaker odds), but the ELO models were better than the other four models. The conclusion is that this is a promising result but perhaps more covariates are required to predict even more accurately.


Odds-setters as forecasters: The case of English Football

I sometimes comment on a scientific paper that has caught my eye. This time, it was (comments on other papers are also available):

David Forrest, John Goddard and Robert Simmons (2005) Odds-setters as forecasters: The case of English football, International Journal of Forecasting, 21(3), 551-564 (doi).

This paper falls into the broader categories of Football, Forecasting and Sport, which I have blogged about in various ways.

The premise of this paper is that the odds set by bookmakers are in fact predictions by experts. Not a bad assumption to make, given that the odds that are set equate to real money coming in, or going out. It was also motivated by the fact that a previous study had concluded that a statistical model performed better than newspaper forecasters.

But is it true that bookmakers’ odds are good predictors?

To answer this question, the authors took almost 10,000 games, between 1998 and 2003, and compared the bookmakers’ predictions with forecasts made by a statistical model. The model they use is fairly complex (in my opinion) although understandable (and, importantly, reproducible) to the interested, and mathematically competent, reader. The model also looks back over fifteen seasons, so if you wanted to implement it, there is quite a lot of data that needs to be collected.

The comparisons between the different forecasting methods are quite interesting. Over the five seasons considered, 1998-1999 (1944 matches), 1999-2000 (1946 matches), 2000-2001 (1946 matches), 2001-2002 (1946 matches) and 2002-2003 (1945 matches), the statistical model performed better in the earlier seasons, but by the end of the period under investigation the odds-setters were performing better. Looking at just the odds-setters, they did worse in the two seasons at the ends of the period under investigation (that is, 1998-1999 and 2002-2003). This is presumed to be due to there being more noise in the match results.

If you look at the data another way, there is little to choose between the two prediction methods on their home, draw and away predictions.

A follow-up comparison attempts to estimate if the odds-setters are capturing all the information contained in the model (if you look at the paper, the model contains many elements such as whether a team is still in the FA Cup, if the match is important to a particular team, attendance information etc.). All the odds-setters have is one figure (the odds) and that has to incorporate everything in order to provide a prediction. The figures presented suggest that, over time, the odds-setters get better at incorporating information into their odds. It is shown that incorporating the bookmakers’ odds into the statistical model gives superior performance, suggesting that the bookmakers are privy to information that is not included in the benchmark model. This was not the case in the earlier paper, where this test failed when considering tipsters.

Next the paper investigates if indiscriminate betting (placing bets, using identical stakes, on any outcome) on matches produces better or worse returns than if you follow the benchmark model. Indiscriminate betting, maybe not surprisingly, produced a loss at a rate that a bookmaker might expect in the long term (termed the over-round), around 12%. By comparison, if you place a bet on the outcome with the highest expected return, as indicated by the benchmark model, then you would do much better. Over the five seasons you would have lost only 0.2%.
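The two ideas in this comparison, the over-round and betting on the outcome with the highest expected return, can be sketched as below. The odds and model probabilities are made-up numbers for illustration only, and the function names are hypothetical rather than anything defined in the paper.

```python
def over_round(decimal_odds):
    """Bookmaker's margin: how far the inverse odds sum exceeds 1."""
    return sum(1.0 / price for price in decimal_odds.values()) - 1.0


def best_value_outcome(model_probabilities, decimal_odds):
    """Outcome with the highest expected return per unit staked, according to
    the model's probabilities, together with that expected return."""
    returns = {o: model_probabilities[o] * decimal_odds[o] - 1.0
               for o in decimal_odds}
    best = max(returns, key=returns.get)
    return best, returns[best]


# Illustrative (invented) odds and model probabilities for one match.
odds = {"H": 2.0, "D": 3.2, "A": 3.2}
model = {"H": 0.45, "D": 0.25, "A": 0.30}

print(over_round(odds))                 # margin built into these odds
print(best_value_outcome(model, odds))  # the least-bad bet according to the model
```

Note that the best available bet can still have a negative expected return, which is consistent with the model-guided strategy doing much better than indiscriminate betting yet still ending up slightly behind.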

A possible downside is next considered. The benchmark model is the same for every season, whereas the odds-setters are (probably) learning each season or, perhaps, using different people/methods to set the odds. Actually, the model did not give superior predictions by refining it as the simulation progressed from season to season, or even as the season progressed.

One of the conclusions of the paper is that experts (in this case odds-setters) are actually quite good at predicting the outcome of matches. This is attributed to a variety of factors, including the fact that their livelihood depends on it and that the sector has become much more competitive (due to technology and tax reforms), which has required them to become better in order to remain in business.

My conclusion? Use the model, and incorporate bookmakers’ odds, as that gives you the best predictions. But it is quite a lot of data to collect. Alternatively, develop your own prediction methodology and see how it performs.

A compound framework for sports results prediction: A football case study

The latest paper that caught my attention, and that I thought I would comment on, is (other publications I have commented on can be seen here):

Byungho Min, Jinhyuck Kim, Chongyoun Choe, Hyeonsang Eom and R.I. (Bob) McKay (2008) A compound framework for sports results prediction: A football case study, Knowledge Based Systems, 21(7), 551-562 (doi).

You might be able to download a copy of the paper from here. Note that this link may not be permanent and it may not be an exact copy of the paper that was published (although it does look like it).

The paper presents a framework which is designed to predict the outcome of football matches. They call their system FRES (Football Result Expert System).

The authors note that most previous research focuses on a single factor when predicting the outcome of a football match, and the main factor that is used is usually the score data. Even when other factors are taken into account, the score tends to still dominate the prediction process.

Within FRES, two machine learning methodologies are utilised, a rule-based system and Bayesian networks. The paper describes how they are used within FRES in enough detail to allow readers to reproduce the system (as all good scientific papers should).

FRES is tested on the 2002 World Cup tournament. Most football prediction systems are tested on league competitions, where teams (typically) play a double round robin tournament. Testing their approach on the 2002 World Cup means that the system cannot easily be compared to other systems. Where previous approaches have been tested on other tournaments (for example, previous World Cups) not all the data was available to enable FRES to make those predictions. In the words of the authors, “In the case of the few works which predict a tournament such as the World Cup, the available evaluation was conducted with old data, such as the World Cup 1994, 1998, which would unfairly hobble FRES, since some of the data it relies on are not available for these earlier tournaments.”

Although not a scientific term (at least not one I am familiar with!), I do like the term unfairly hobble.

In order to provide some sort of comparison with FRES, the paper implements two other systems, a historic predictor and a discounted historic predictor.

FRES was able to predict six countries out of the top eight in the tournament. The other predictors were able to predict five. Moreover, various statistical tests are conducted which confirm that FRES is statistically better than the other two methods.

One thing I like about the FRES system is that it has a lookahead mechanism. Based on this, England does not rate very highly as, due to the draw, there is a high probability that they will meet Brazil in the quarter finals. Turkey, on the other hand, are rated more highly due to the perceived easier draw.

It would be useful to have FRES tested on league competitions, so that better comparisons could be made with more prediction systems that have been reported in the scientific literature. Perhaps the authors are working on that now? It would, for example, be interesting to see if it beats a random method, or a method which always predicts a home win (as the authors did in the paper I discussed a few days ago).


Sports Forecasting: A Comparison of the Forecast Accuracy of Prediction Markets, Betting Odds and Tipsters

In some of my posts I comment on a scientific paper that has caught my eye. There is no particular reason for the papers that I choose, they are just of interest to me. In this post, the paper that caught my eye was (comments on other papers can be seen here).

Martin Spann and Bernd Skiera (2009) Sports forecasting: a comparison of the forecast accuracy of prediction markets, betting odds and tipsters, Journal of Forecasting, 28(1), 55-72 (doi).

This paper looks at three different prediction methods, and assesses their effectiveness in predicting the outcome of premier league matches from the German football league. The three methods that are investigated are prediction markets, tipsters and betting odds.

Prediction Markets are based on various people taking a stance on the same event and willing to back their hunch by paying (or collecting) money should their hunch be wrong (or right). Given that a number of people are taking a stance on the same event, it can be seen as a predictive model of the event.

Tipsters are (or should be) experts who publish their predictions in newspapers, on web sites etc. The advice from tipsters is often based on their expertise, rather than on applying some system or formal model. The paper (citing Forrest and Simmons, 2000 as its source) says that tipsters can often beat a random selection method, but do worse than simply choosing a home win every time. It also cites Andersson et al., saying that soccer experts often do worse than people who are less well informed about the game.

Betting Odds, in previous work, have been found to be a good forecasting method (not surprising I suppose, seeing as the bookmakers rely on setting the correct prices to make their living). Of course, the bookmakers can change their odds, but when they publish fixed odds (on, say, a special betting coupon), this can be seen as a prediction of the match outcome.

The games that are forecast in this paper are those from the German premier league from three seasons (1999-2000, 2000-2001 and 2001-2002). The number of games predicted by each method varied (Prediction Markets and Betting Odds = 837, Tipsters = 721 and Prediction Markets, Betting Odds and Tipsters = 678). The numbers vary simply because of the data that was available; where a figure is given for two or three methods together, it is the intersection of the games that all of those methods were able to predict.

To evaluate each method, the authors calculate the percentage of correct predictions. They also calculate the root mean squared error, as well as the amount of money that each method would have won (three figures are given: a 25% fee, a 12% fee and no fee). Comparisons are also made with a random selection policy as well as a naive selection policy, which simply assumes a home win.
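The first two evaluation measures are straightforward to compute. The sketch below shows one way of doing so; scoring the root mean squared error against 0/1 indicators for each outcome is my assumption, as the exact definition is in the paper and may differ.

```python
import math


def hit_rate(predictions, results):
    """Share of games where the predicted outcome matched the actual result."""
    correct = sum(1 for p, r in zip(predictions, results) if p == r)
    return correct / len(results)


def rmse(probability_forecasts, results):
    """Root mean squared error between forecast probabilities and 0/1
    indicators of what actually happened (assumed definition)."""
    squared_errors = []
    for probabilities, result in zip(probability_forecasts, results):
        for outcome, probability in probabilities.items():
            actual = 1.0 if outcome == result else 0.0
            squared_errors.append((probability - actual) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


results = ["H", "A", "D", "A"]
naive = ["H"] * len(results)  # naive policy: always predict a home win
print(hit_rate(naive, results))
print(rmse([{"H": 0.5, "D": 0.5, "A": 0.0}], ["H"]))
```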

So, what did the authors find? Over the 837 games, the prediction market and betting odds were able to predict 52.69% and 52.93% of games respectively. If there was no fee this would have returned a profit of 12.30% and 11.92% respectively. The naive model (pick home wins) predicted 50.42% correctly and returned a profit of 11.79%. The random method only managed to predict 37.98% of games correctly.

If we look at the 678 games that all three methods could predict, then the percentages of correct predictions were 54.28% (prediction market), 53.69% (betting odds), 42.63% (tipsters), 50.88% (naive model) and 37.98% (random). The returns (assuming no fee) were 16.20% (prediction market), 13.49% (betting odds), -0.19% (tipsters) and 12.44% (naive model).

I’m not sure why, but profit information is not given for the random model but it would almost certainly result in a loss.

A further test is also carried out. Only games where methods agree on the selection are bet upon. For example, of the 678 games, there are 380 games where the three methods agree on the result. If we only bet on those games, we get a correct prediction percentage of 57.11%, higher than any of the methods used in isolation, and betting on every game. The profit return would be 13.86% (no fee), 1.66% (12% fee) and -8.72% (25% fee).
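The agreement test amounts to filtering the games before scoring them. A small sketch, using invented predictions for five games rather than the paper's data, and hypothetical function names:

```python
def agreed_games(*method_predictions):
    """Indices of games where every method predicts the same outcome."""
    return [i for i, picks in enumerate(zip(*method_predictions))
            if len(set(picks)) == 1]


def hit_rate_on(indices, predictions, results):
    """Share of correct predictions, restricted to the selected games."""
    correct = sum(1 for i in indices if predictions[i] == results[i])
    return correct / len(indices)


# Invented example data: one entry per game, 'H', 'D' or 'A'.
market = list("HHDAH")
odds = list("HDDAH")
tipster = list("HADAA")
results = list("HHDAA")

selected = agreed_games(market, odds, tipster)
print(selected)
print(hit_rate_on(selected, market, results))
```

The trade-off is fewer bets: in the paper only 380 of the 678 games qualify.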

The authors conclude that the prediction market and the betting odds provide the best indication of the outcome. They agree with previous work that tipsters are generally quite poor at prediction.



Can Forecasters Forecast Successfully?: Evidence from UK Betting Markets

I occasionally blog about a paper that is of interest. Well, of interest to me. The latest paper to catch my eye is (other papers I have commented on can be seen here).

Leighton Vaughan Williams (2000) Can Forecasters Forecast Successfully?: Evidence from UK Betting Markets, Journal of Forecasting, 19(6), 505-513 (doi).

The reason that this paper was of interest is that I was reading Leighton’s book (Betting to Win: A Professional Guide to Profitable Betting, High Stakes Publishing, 2002, ISBN: 1-84344-015-6) and this paper was mentioned. Many (many, many) years ago, long before I was an academic, I went through a phase where I collected all sorts of horse racing systems. My idea was to test them all out to see if any of them worked. I never actually placed a bet and I never really tested any systems, as they generally involved too much time to collect all the data, process the data etc.

Since then I have still thought that it would be interesting to look at various horse racing systems to see if they worked.

This is what this paper does. Unlike my idea though, it takes tips from services that you subscribe to, either by paying money or by contacting them using a premium rate phone service. This seems a lot more sensible, rather than having to enter all the data yourself.

This paper looks at the performance of tipping services, with the analysis being carried out in 1995. Five services were compared. Four of these were subscription based. That is, a fee is paid, and you get tips at various times. In 1995, these services cost at least 99 GBP per month, which seems a lot to me now, let alone in 1995. The other service was a premium rate phone number, where you phone up to receive the tip and the cost of the phone call effectively covers the cost of the service. These five services were chosen as they were amongst the top tipping services as assessed by the Racing Information Database (I have tried to google this and am not sure that it is in existence anymore, but would be willing to be corrected, and update this post to provide a link).

The paper goes through each of the tipping services and evaluates how many tips were provided (and over what period – some, for example, were analysed over three months, others over six months – I think the period was probably chosen to ensure that a sufficient number of tips were analysed as not all services provide tips at the same interval), any conditions associated with the service (for example, only bet if a certain price is available), the profit (or loss) from investing in that service etc.

The good news is that all the tipping services produced a pre-tax profit when used with the relevant staking/price plans. Leighton also makes the point that none of these profits could be said to be significant. It was also interesting to note that increased profits could have been achieved if some of the lesser supported tips were ignored. Of course, this would be a hindsight examination and the obvious question would be, when in play, what tips do you ignore, and what ones do you actively bet upon? There is also evidence that you should use a variable staking plan, rather than a flat stakes method.

If you use a tipping service, there are also other factors to take into account. There is an upfront investment (which you may never recover). Unlike an academic study, you will probably only choose one, and which one do you choose? There is also (as pointed out by Leighton, in chapter 20 of his book) the fact that you have to take what the tipping services advertise with a pinch of salt. As an example, a service might only say bet if you can get a price of 4-1 or better. What happens if that price is almost impossible (or even actually impossible) to get? Will the service still include that in their results if the horse should win? And what if there was a price available (even for a few seconds) at 6-1? Would the service return that as the price you could have got even though, unless you were very quick, or very lucky, you would have struggled to get on at 6-1?

It is twelve years since the paper was published, and seventeen years since the analysis was carried out and things have moved on. Tipping services (I suspect) come and go, technology has moved on, the tax regime has changed and there are now many other ways to bet which were not so predominant at the time. I am thinking specifically of spread betting and betting exchanges. These have, undoubtedly, made a big difference to the industry.

I am not up to date with the scientific literature in the area of sports forecasting and I suspect that there are many papers out there that provide various comparisons and analysis. If you know of any, or know of a good review paper in this area, I would be very interested if you could post a comment giving the reference.

I would also be interested in hearing from any professional tipping services (no matter what sport, but UK based as I don’t claim to understand American sports or markets) who wish to subject their service to scientific analysis. Note, this is not an open invite to advertise your service on this blog. I get enough spam as it is (and moderate it out) and I don’t want the comments box filled up with lightly disguised adverts to various web sites that claim to make millionaires from people who subscribe. But serious enquiries are welcomed.


Prediction of sporting events: A Scientific Approach

My final year undergraduate dissertation project (many years ago) attempted to predict the outcome of horse races using Neural Networks. I briefly blogged about it in June 2009 (http://graham-kendall.com/blog/?p=8/).

The result of the project was (in my view) encouraging but was lacking in a couple of areas. The data was incomplete (the starting prices were not available so I had to make some assumptions and it would have been more useful to have studied a greater number of races). I would also have liked to have tried some other prediction methods, beyond just neural networks.

Since doing that project I have maintained an interest in predicting sporting events, although sports scheduling (e.g. 10.1016/j.cor.2009.05.013 and 10.1057/palgrave.jors.2602382) seems to have taken up more of my time. But I have always wanted to return to prediction, utilising Operations Research methodologies. As such, I maintain a database of any literature that I see on the topic. This includes the scientific literature, as well as any newspaper cuttings, useful web sites etc.


One of the problems that serious sports forecasters face is being taken seriously. A quick google for sports prediction (or many other similar terms) will bring up many sites offering services that (supposedly) enable you to make money. The services typically involve investing in some system, or subscribing to a service where you are sent the predictions for you to back in whatever way you see fit.

Of course, if we were sceptical, we might assume that many of these services are really there to make money for the people selling the service, rather than those who are buying. I am sure that there are some services out there that make money for both the seller and the buyer, but the challenge is to find out which services offer value for money before you go bankrupt in the process!


Unfortunately, there are not that many scientific papers that consider how to predict the outcome of sporting events, at least as a way to return a monetary profit. There are some, of course. For example an article that appeared last year in the International Journal of Forecasting


S. Lessmann, M-C. Sung, and J.E.V. Johnson (2010) Alternative methods of predicting competitive events: An application in horserace betting markets, 26:518-536, DOI: 10.1016/j.ijforecast.2009.12.013


considered how to predict horse races. The motivation of the article was actually to try and predict competitive events such as political elections and (of course) sporting events, although the paper was really a large scale (1000 races, 12,092 runners) study. The paper concluded that their proposed model was able to provide an increase in wealth of just over 528% if using a Kelly (Kelly, J. L. (1956). A new interpretation of information rate. The Bell System Technical Journal, 35, 917–926) strategy, with reinvestment.


Considering other sports, such as football (the UK version), a couple of examples of predicting matches can be found in Economics, Management and Optimization in Sports, Springer, 2004, ISBN: 3-540-20712-0

In Using Statistics to Predict Scores in English Premier League Soccer (John S. Croucher, pp 43-57), various models are presented that attempt to fit the number of goals scored by each team. The best model found had a Poisson distribution.
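
To show what a Poisson model of goals looks like in practice, here is a small Java sketch that computes the probability of a team scoring exactly k goals, given an average scoring rate. The rate of 1.5 goals per match is a made-up figure, and the class and method names are hypothetical; none of this is taken from the chapter itself.

```java
// Sketch only: Poisson probability of scoring exactly `goals` goals.
// The mean used in main() is an invented example value.
final class PoissonGoalsSketch {

    /** P(X = goals) = e^(-mean) * mean^goals / goals! */
    static double probability(int goals, double meanGoals) {
        if (goals < 0 || meanGoals <= 0.0) {
            throw new IllegalArgumentException("goals must be >= 0 and mean must be > 0");
        }
        double p = Math.exp(-meanGoals);
        for (int i = 1; i <= goals; i++) {
            p *= meanGoals / i;   // builds mean^goals / goals! one factor at a time
        }
        return p;
    }

    public static void main(String[] args) {
        double mean = 1.5; // hypothetical average goals per match
        for (int k = 0; k <= 4; k++) {
            System.out.printf("P(%d goals) = %.3f%n", k, probability(k, mean));
        }
    }
}
```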

Another paper from the same book (Modelling and Forecasting Match Results in the English Premier League and Football League (Stephen Dobson and John Goddard, pp 59-77)) considers about 30 seasons of data. This paper also uses a statistical method, but assigns probabilities to win, lose or draw, rather than trying to predict the number of goals scored. This paper also provides a good overview of previous work in this area.


I suppose the stock market is one area that has been widely studied with respect to prediction, with an eye on turning a profit. There have been hundreds (if not thousands) of papers that look at ways to predict stock prices, interest rates, inflation etc.


Perhaps not surprisingly, there has been limited reporting in the scientific literature as to whether anybody has made (or makes) money from the methodologies that they have developed. After all, if you have a successful system, why tell everybody about it? (The same argument raises the question of why you would buy a system or tips from a service on the internet.)


What I would actually like to see is a lot more scientific papers not only reporting their predictive systems but also how much money was made, over what period of time and if the system is in daily/weekly use at the time of writing.

Of course, the system needs to be reproducible (as should all good scientific writing).

However, if the system is successful, the author(s) might be unwilling to reveal its secrets but might still want to let the world know about its effectiveness. Under these circumstances I have a few ideas as to how this could be done.

  1. In the run up to publishing the paper, the authors make a series of predictions and lodge them with a reputable source. This could be another scientist, a lawyer, or even published on a web site that can be verified from a date/time point of view. The important thing is to ensure that the predictions can be verified as being made in advance of the event. If these predictions were made over a period of (say) six months, then this could form part of the results presented in a paper.
  2. It is, of course, understandable that authors do not want to publish the full details of their winning methodology in a scientific paper but, as scientists, we like to publish our work. The scientific community should be understanding of this, in the same way that it accepts that sometimes certain factors must be kept confidential due to commercial sensitivities. Therefore, the general methodology could be described while omitting key points (and being upfront about that). If combined with 1), above, this could still make a contribution to the scientific literature.


An attractive alternative would be to run a prediction competition (see Kaggle, who are doing excellent work in this area), where competitors are given a set of data and asked to provide predictions on the outcome of sporting events; ideally those that have not taken place yet.


In summary, I would really like to see more reporting (on a scientific basis) of sports predictions which are unashamedly about trying to return a profit, as this is an under-represented area at the moment. Why not have a go?


Note: I entered this blog entry into the INFORMS blog competition. The March 2011 competition was O.R. and Sports.

Football Prediction: A decision to be made

Today I have been working on my research that is investigating if it is possible to predict the outcome of football matches. The measures I will eventually use, to see if the predictions can be considered successful, will include whether it can make money at the bookmakers, whether it is more successful than other tipsters etc.

One of the functions I have in my system is to be able to generate the league table for a given date. That is, taking into account the fixtures played to date, generate the league table for any point in the season.

I believe that my function is working correctly and today I was carrying out some tests to see if the league tables I generated were:

  1. Correct at the end of the season. That is, taking into account every match played, is my input data correct and does my algorithm process that data correctly?
  2. Correct for a given date. That is, does my algorithm generate the correct table for that point in the season?
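
The idea of building a table for any point in the season can be sketched in a few lines of Java. This is not my actual function, just a minimal illustration: the class, field and method names are invented, the tie-breakers (points, then goal difference, then goals scored) are an assumption, points deductions are ignored, and a team that has not yet played by the given date simply does not appear.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: league table from all results played on or before a date.
final class LeagueTableSketch {

    static final class Result {
        final LocalDate date; final String home, away; final int homeGoals, awayGoals;
        Result(LocalDate date, String home, String away, int homeGoals, int awayGoals) {
            this.date = date; this.home = home; this.away = away;
            this.homeGoals = homeGoals; this.awayGoals = awayGoals;
        }
    }

    static final class Row {
        final String team;
        int played, points, goalsFor, goalsAgainst;
        Row(String team) { this.team = team; }
        int goalDifference() { return goalsFor - goalsAgainst; }
    }

    static List<Row> tableAsOf(List<Result> results, LocalDate asOf) {
        Map<String, Row> rows = new LinkedHashMap<>();
        for (Result r : results) {
            if (r.date.isAfter(asOf)) continue;          // fixture not yet played
            Row home = rows.computeIfAbsent(r.home, Row::new);
            Row away = rows.computeIfAbsent(r.away, Row::new);
            home.played++; away.played++;
            home.goalsFor += r.homeGoals; home.goalsAgainst += r.awayGoals;
            away.goalsFor += r.awayGoals; away.goalsAgainst += r.homeGoals;
            if (r.homeGoals > r.awayGoals) home.points += 3;        // home win
            else if (r.homeGoals < r.awayGoals) away.points += 3;   // away win
            else { home.points++; away.points++; }                  // draw
        }
        List<Row> table = new ArrayList<>(rows.values());
        // Highest points first, then goal difference, then goals scored (assumed order).
        table.sort(Comparator.comparingInt((Row row) -> row.points)
                .thenComparingInt(Row::goalDifference)
                .thenComparingInt(row -> row.goalsFor)
                .reversed());
        return table;
    }
}
```

Calling tableAsOf with the last day of the season covers check 1, and calling it with any earlier date covers check 2.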

I initially thought that point 2 would be very time consuming to check but I found a very useful web site. http://www.statto.com is not only a very useful web site (for all sorts of reasons) but one of the facilities it offers is to generate a league table for a particular point in the season.

When doing my checks (and there are still a lot more to do), I have found some problems where my generated tables are not correct. This is almost certainly down to my inputting the results incorrectly, so I need to check all those.

However, my checks also highlighted another problem. Actually, I knew this was something that I needed to address but I had not really thought it through.

The problem arises when teams have points taken away. I knew that this happened and I had yet to include it in my system so I was expecting the tables not to match exactly.

However, I had assumed that the points were deducted at the start of the season but this does not seem to be the case. It appears that the points can be deducted at any time in the season.
This is not too much of an issue. It just makes the programming more complex than I had hoped.

The real issue is what do I do when a team has had points deducted?

Let me give you an example. A team has won 3 games and drawn 2. That means that they have received 11 points (you get 3 points for a win and 1 point for a draw). But, if they have had 10 points deducted then they will only have 1 point. This obviously affects their league position. If I am using the league position as one of the contributory factors in my predictive model, is this fair – or should I ignore the points deduction for the purposes of prediction?
On the other hand, their league position, with the points deduction, may affect the way they play, and could be a factor in the prediction.
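
One way to keep both options open is to store each deduction with its date and let the caller decide whether to apply it. The Java sketch below works through the example above; the names, the dates and the boolean switch are all assumptions for illustration, and the win and draw counts are taken to be those achieved up to the date being queried.

```java
import java.time.LocalDate;

// Sketch only: points total with an optional, dated deduction.
final class PointsDeductionSketch {

    /** 3 points for a win, 1 for a draw. */
    static int earnedPoints(int wins, int draws) {
        return 3 * wins + draws;
    }

    /**
     * @param deductionDate   date the deduction took effect (hypothetical input)
     * @param asOf            date for which the table is being generated
     * @param applyDeductions false gives the "on the pitch" total for prediction
     */
    static int pointsAsOf(int wins, int draws, int deduction, LocalDate deductionDate,
                          LocalDate asOf, boolean applyDeductions) {
        int points = earnedPoints(wins, draws);
        // Only deduct if requested and the deduction has happened by the query date.
        if (applyDeductions && !deductionDate.isAfter(asOf)) {
            points -= deduction;
        }
        return points;
    }

    public static void main(String[] args) {
        LocalDate deducted = LocalDate.of(2010, 10, 1);  // invented date
        LocalDate asOf = LocalDate.of(2010, 11, 1);      // invented date
        System.out.println(pointsAsOf(3, 2, 10, deducted, asOf, true));   // prints 1
        System.out.println(pointsAsOf(3, 2, 10, deducted, asOf, false));  // prints 11
    }
}
```

With this shape, the official table and the table ignoring deductions can both be generated and fed to the model, leaving the choice between them to be tested rather than decided up front.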

I’m not quite sure what I am going to do yet.