Jan 27 2012

Twitter: Identifying Potential Followers

Posted by gxkendall in Tweet, Twitter

In my previous twitter posts I have discussed various things that I have done (or am thinking of doing) to try and get a twitter service up and running, where I tweet on a certain subject for 24 hours, and then I change to something else. That is now all working and seems to be standing up pretty well. You can also subscribe to the service, and I’ll tell you when I start tweeting about your subject(s) of interest. In my last post I was saying that registrations are slow; in fact non-existent.

I mentioned a few ideas about how to attract new followers. One of them was ‘I have been investigating how to collect potential twitter users and then tweet them directly about the service. In fact, the twitter search API provides this type of functionality and I’ll be looking at this soon.’ Over the past couple of days I have been working on this and I think I have a system that is now just about up and running. It works like this.

I have a cron job (that is, a piece of PHP that runs every so often) that tries to identify potential followers. It does this by looking for previous tweets that include the term that I am currently tweeting about. That might be, for example, ‘Vehicle Routing’, ‘Healthcare’ or ‘Bin Packing’. The twitter API (Application Programming Interface) actually provides this functionality via its search engine (see here). Once you have received a number of recent tweets that mention your search terms, it’s an easy task to parse that text to be left with just the twitter usernames.
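A minimal sketch of the parsing step, assuming the search results have already been decoded into a PHP array. The field names `from_user` and `text`, and the function name `extractUsernames`, are illustrative assumptions rather than the actual shape of the twitter API response.

```php
<?php
// Hypothetical shape: each tweet is ['from_user' => ..., 'text' => ...].
// The real search API response may use different field names.

/**
 * Return the distinct usernames of people whose tweets mention $term.
 */
function extractUsernames(array $tweets, string $term): array
{
    $users = [];
    foreach ($tweets as $tweet) {
        // Case-insensitive check that the tweet really mentions the term.
        if (stripos($tweet['text'], $term) === false) {
            continue;
        }
        // Key by lower-cased name so 'Alice' and 'alice' are counted once.
        $users[strtolower($tweet['from_user'])] = $tweet['from_user'];
    }
    return array_values($users);
}

$tweets = [
    ['from_user' => 'alice', 'text' => 'New paper on bin packing heuristics'],
    ['from_user' => 'bob',   'text' => 'Bin Packing is NP-hard'],
    ['from_user' => 'Alice', 'text' => 'More bin packing results'],
    ['from_user' => 'carol', 'text' => 'Off to the shops'],
];
$found = extractUsernames($tweets, 'bin packing');
```

Here `$found` holds two usernames: carol is dropped because her tweet does not mention the term, and the two spellings of alice collapse into one entry.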

Once I have the usernames, I store them in a database. As well as the usernames, I also store the time I last saw a tweet from that person, the number of messages I have seen from that person, the last time I tweeted them and the subject that I was tweeting about at the time. When I add a new person to the database, I set their last contact date as four weeks ago (for reasons that will become clear in a moment).

I run this cron job every four hours, so that I do not keep picking up the same messages again and again (and thus incrementing the count of the number of times I have seen the user, when it is, in fact, the same message).
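The bookkeeping described above might look something like the following sketch. It uses an in-memory array as a stand-in for the database table, and the field names (`timesSeen`, `lastSeen`, `lastContacted`, `subject`) and the function name `recordSighting` are assumptions made for illustration.

```php
<?php
/**
 * Record one sighting of $username tweeting about $subject at time $now.
 * $db is an in-memory stand-in for the database table, keyed by username.
 */
function recordSighting(array $db, string $username, string $subject, int $now): array
{
    $fourWeeks = 4 * 7 * 24 * 60 * 60; // four weeks, in seconds

    if (!isset($db[$username])) {
        $db[$username] = [
            'timesSeen'     => 0,
            // New users get a last contact date of four weeks ago.
            'lastContacted' => $now - $fourWeeks,
        ];
    }
    $db[$username]['timesSeen']++;
    $db[$username]['lastSeen'] = $now;
    $db[$username]['subject']  = $subject;
    return $db;
}

$now = 1327622400; // 27 Jan 2012 00:00:00 UTC
$db  = [];
$db  = recordSighting($db, 'alice', 'Bin Packing', $now);
$db  = recordSighting($db, 'alice', 'Bin Packing', $now + 4 * 3600);
```

After the two runs, four hours apart, alice has been seen twice and her last contact date is still the back-dated one.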

I have had this running for a couple of days now, so now I can start contacting people who might be interested in my twitter feed.

To do this, I have another cron job. At the moment, this runs every thirty minutes, but that is because I am still testing it. It will eventually go to about four hours as I don’t want my twitter feed to get too full of these messages. When I run the job, I extract all the records for the current subject and sort them by the number of times I have seen that person. This is on the basis that the more somebody has tweeted about the subject themselves, the more likely it is that they will be interested in what I have to say. I then go through each of these users and check the following:

  1. Do they follow me already? If they do, I ignore them as it is pointless asking somebody to follow you who already is.
  2. Have I tweeted them in the past 30 days? If I have, I ignore them, as I don’t want to annoy people by continually tweeting them. Hopefully, they won’t mind once a month, but if I do get complaints, I’ll set up an ignore list, so I don’t tweet people on that list.
  3. Have I seen them at least twice? I want to know that they have tweeted at least a couple of times on a given subject. Even so, this is not foolproof. I saw somebody tweet twice about bin packing. They were actually tweeting that they have been packing for their holidays!

If I can find somebody who fulfills the three criteria above, I send them a tweet. Hopefully they will be interested enough to follow me.
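The three checks and the sort order can be sketched as below. The record layout, the follower list passed in as a plain array, and the function name `selectCandidates` are all hypothetical; a real version would read from the database and ask twitter who follows the account.

```php
<?php
/**
 * Pick the users worth tweeting, most frequently seen first.
 * Each user is a hypothetical record:
 *   ['name' => string, 'timesSeen' => int, 'lastContacted' => int (Unix time)]
 * $followers is the list of usernames already following the account.
 */
function selectCandidates(array $users, array $followers, int $now): array
{
    $thirtyDays = 30 * 24 * 60 * 60;

    $eligible = array_filter($users, function (array $u) use ($followers, $now, $thirtyDays) {
        $alreadyFollows    = in_array($u['name'], $followers, true);
        $contactedRecently = ($now - $u['lastContacted']) < $thirtyDays;
        $seenEnough        = $u['timesSeen'] >= 2;
        return !$alreadyFollows && !$contactedRecently && $seenEnough;
    });

    // Most sightings first.
    usort($eligible, function (array $a, array $b) {
        return $b['timesSeen'] <=> $a['timesSeen'];
    });
    return $eligible;
}

$now = 1327622400; // 27 Jan 2012 00:00:00 UTC
$day = 86400;
$users = [
    ['name' => 'alice', 'timesSeen' => 3, 'lastContacted' => $now - 40 * $day],
    ['name' => 'bob',   'timesSeen' => 5, 'lastContacted' => $now - 40 * $day], // already follows
    ['name' => 'carol', 'timesSeen' => 1, 'lastContacted' => $now - 40 * $day], // seen only once
    ['name' => 'dave',  'timesSeen' => 4, 'lastContacted' => $now - 5 * $day],  // tweeted recently
    ['name' => 'erin',  'timesSeen' => 6, 'lastContacted' => $now - 31 * $day],
];
$picked = selectCandidates($users, ['bob'], $now);
```

With this data only erin and alice survive, in that order. Note that, with exactly these thresholds, a user whose last contact date was back-dated by four weeks (28 days) would only pass the 30-day check two days after being added.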

So, this is now live. I still need to put in a few things. For example, I need to keep the database tidy, so I need to have some way of deleting records, but that can wait for a while.

It’ll be interesting to see if this actually just annoys people, or does increase my number of followers. I’ll let you know.

 

Jan 24 2012

Twitter: Registrations are slow!

Posted by gxkendall in Twitter

Following my last post, my twitter system has been up and running for a few weeks now. Every day I switch domains and start tweeting about something else. The system seems to work well, although I have not had anybody register yet. However, I know that people are looking at the service as I can see the number of hits on my various web pages.

I suppose people can just search twitter, so they don’t need to register. But the beauty of the system is that people don’t actually need to follow me on Twitter. They can register and then I will send them an email when I tweet about the subject they are interested in and, optionally, send them an email of the actual tweets. Of course, I’d rather that they follow me but they can, if they wish, not even go near twitter, and just register and then wait for the emails, or just do a search on twitter at the end of the day to see what I have had to say.

Having said that, when I do tweet about a certain subject, that is all I do. Unless you see the change domain tweet (which happens at 6am UK time), you are unlikely to be aware of the service. So, I have a few plans to try and raise the profile of the service.

  • This is obviously one way; blog about it.
  • I have been investigating how to collect potential twitter users and then tweet them directly about the service. In fact, the twitter search API provides this type of functionality and I’ll be looking at this soon.
  • I think I might do a few more general tweets throughout the day, just linking to the service, rather than having all the tweets simply about the domain.
  • I might also try and interact a little more with the publishers. INFORMS were good enough to retweet quite a few tweets, but I could do with more of that from the other publishers. It’ll be interesting anyway, just to see if that has any effect.
  • And, of course, I have to make the tweets as interesting as possible. That is difficult as what is interesting to one person might not be interesting to another. I think the best solution is to have really targeted domains so that it might only be of interest to a small number of people, but they will find most of the tweets useful.

Finally, in case you are interested, the domains I currently tweet about are Bin Packing, Gambling, Genetic Algorithms, Healthcare, Timetabling and Vehicle Routing. More details are available here and I am open to suggestions for other areas of interest.

Dec 31 2011

Beta Test for Twitter Service

In my last two posts (see here and here) I outlined a few ideas I had for improving the twitter feeds that I have had running for about a year. The original idea was just to tweet every so often (about ten times a day), with each tweet being a random scientific publication. This worked pretty well but I thought that it might be more interesting/useful if I tweeted on a particular subject over a 24 hour period. That way, followers would only need to look at my twitter feed when I was tweeting on a subject of interest to them. In fact, they would not even need to follow me. They could just subscribe to the service and, when I was tweeting about a subject they wanted to follow, see the tweets simply by using the search options on Twitter.

I have spent a few hours getting an initial implementation.

As it stands the system:

  • Enables you to register;
  • You can choose which subjects you are interested in;
  • When I start tweeting about those subjects I will send you an email and/or a tweet (you choose). That is, just as I start a 24 hour tweeting session on your subject of interest, I let you know;
  • I did implement functionality where I would email/tweet you whenever I did an individual tweet on a subject that interested you, but I disabled it, on the basis that I might be tweeting too often;
  • Although it is all in the background, there has been a lot done to ensure that you confirm who you are when you register, else somebody could register for you. What happens now is that I send you an email and you have to click on a link before your registration is confirmed.

I am now welcoming people to register, in order to try out the system. I’ll class it as beta (just in case it all goes wrong!) but I’ll try to ensure that I don’t ask everybody to register again.

Of course, if you do use the system, I’d welcome any comments you have.

Like all these things, it is never finished and I have some things that I need/want to do.

  • The most important is to get the database populated with better quality entries. The database is not too bad at the moment, but I have started the (long, very long) process of collecting more scientific publications, which are even more relevant than the ones I have there at the moment;
  • I would like to interact with the various publishers more than I do at the moment (which is not at all, except for the occasional retweet from INFORMS (thank you));
  • As well as tweeting on scientific publications, it would also be useful to have more generic tweets on a particular subject. But this is quite difficult to do; or at least collect good quality tweets;
  • Some of the subjects do need a little refining. For example, sport is a tough one as it covers so many different areas (e.g. different sports as well as injuries, predictions, gambling etc.). Surveys is also a tough one as it is not immediately obvious what a ‘survey’ is. Anyhow, I am sure that the subjects will evolve over time and become more coherent;
  • One of the problems I recognise is that if you are interested in one topic out of seven (and more in the future) you won’t see many tweets as it takes a while for each subject to come round. Therefore, in the future, there might be a case for refining the system so that we have a specific twitter account for a given topic.

In implementing this system, I have faced several challenges (such as how to register people, how to maintain a database, how to automate tweets). I will discuss some of these issues in later posts, as I am sure that other people face similar challenges and, perhaps (quite likely) I have not done things in the best way possible so I might be able to learn how to do something better.

 

Dec 16 2011

Improved tweeting

Posted by gxkendall in bibtex, Citations, database, doi, PHP, Tweet, Twitter

In my last post I set out a few ideas about some of the improvements I was thinking about making to improve my twitter feed. In essence I was toying with the idea of tweeting on certain subjects for a 24 hour period.

Over the past few days I have been working on a basic implementation for the ideas I mentioned.

The system, which is now live, chooses randomly from a set of domains and then tweets about that domain for the next 24 hours (I may actually change this so that it tweets for a varying (random) amount of time for each domain). I did face a number of issues when I was implementing the system.

By far the biggest was populating the database of tweets. It is not exactly complicated but very time consuming. I’ll outline the challenges that I faced in a future blog. Indeed, I am still facing problems as I write this and this is an area that I need to revisit as the quality of the tweets is really down to the quality of the database. My current thinking is to start the database again, from scratch, but at the moment it is serving its purpose.

Another problem I faced was that part way through implementing this new system, my old system stopped tweeting. I am not sure if this was something that I changed (I don’t think so) or whether it was a change made by twitter. It actually took me a long time to resolve this and I eventually had to bite the bullet and access the twitter system via the OAuth mechanism, which is the recommended way. So I am now doing it the way they want me to, rather than the legacy method that I was using. So, if nothing else, I have made a change that was probably long overdue and that, hopefully, will stand the test of time.

The new system is far from complete. I still want to add the functionality where people can register for domains of interest to them so that I can inform them when I am about to tweet about a subject in which they are particularly interested. I also want to add other types of tweet (not just journal articles) that I hope will inform people (and publishers) about what I am doing; but more of that later.

But, the basic system is up and running and I have started to write some of the supporting web pages. If you are interested, take a look at http://www.graham-Kendall.com/twitter, although these are also earmarked for considerable updating.

I’ll let this new system bed in for a while and then do some more development. But, I have to say, that I am pleased with the system so far. I think that it is a lot more useful than just randomly tweeting scientific papers. At least now, if you are interested in a certain topic you can just keep an eye out over a 24 hour period, or just do a search (each tweet is tagged with #orj and #XXX, where XXX is a three character code representing the domain).
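As an illustration of the tagging scheme, the sketch below appends #orj and a three character domain code to a tweet and trims the text so that the result fits in 140 characters. The function name `buildTweet` and the code `bpp` are made up for the example, and `strlen` counts bytes, so it assumes plain ASCII text.

```php
<?php
/**
 * Build a tweet tagged with #orj and a three-character domain code,
 * trimming the text so the whole thing fits in $limit characters.
 */
function buildTweet(string $text, string $domainCode, int $limit = 140): string
{
    $tags = ' #orj #' . $domainCode;
    $room = $limit - strlen($tags);
    if (strlen($text) > $room) {
        // Cut the text and leave space for a three-dot ellipsis.
        $text = rtrim(substr($text, 0, $room - 3)) . '...';
    }
    return $text . $tags;
}

// 'bpp' is a hypothetical code for the Bin Packing domain.
$tweet = buildTweet('A new heuristic for two-dimensional bin packing', 'bpp');
```

Anybody interested in one domain could then search for the domain tag, or for #orj to see everything.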

Dec 08 2011

Improving/targeting my Twitter Feed

Posted by gxkendall in PHP, SQL, Tweet, Twitter

If you subscribe to my twitter feed you’ll see that I tweet journal articles on a regular basis. In fact, at a recent conference, I was taken to task (in a nice way) about how I could tweet so often and, more to the point, how I could have tweeted whilst giving a presentation!

Actually, this is an automated procedure (punctuated with personal tweets on a regular basis). The automated tweets are proving to be popular, at least from the Re-Tweets and the favourite’d tweets.

However, I am aware that not all my tweets are of interest to everybody, all the time. That is, some people might be interested in the Vehicle Routing Problem but don’t really care about Personnel Rostering. There is not a lot I can do about that (if you follow me, you follow me – there is nothing selective about it). I suppose I could set up an individual account for each domain (Vehicle Routing, Travelling Salesman, Personnel Rostering etc.) but that seems a little excessive; not least of all from a maintenance point of view on my part!

Thinking about this, I came up with the idea that I could tweet about one particular topic in a 24 hour period and then change topic for the next 24 hours; and so on. In this way, you would only need to pay particular attention to my twitter feed when you knew I was tweeting about something that you were interested in. The idea is still very much in my head at the moment but I have some ideas about how I would implement this (with a sort of draft almost done – but much work to do), which I’ll share with you in due course.

A further stage would be to allow people to register for certain domains and, when that domain is selected to be tweeted for the next 24 hours, I could @message and/or email the person in order to tell them to pay extra attention to my twitter feed for the next 24 hours. This would test my PHP/SQL skills beyond what they are at the moment, but that is not necessarily a bad thing.

As I say, mostly still in the idea stage at the moment (with some test functionality under development), but definitely something I want to take forward.

More soon.

Sep 06 2011

Tracking Paper Downloads: Database

In my last post I outlined a few thoughts about tracking downloads of papers from the MISTA web site. Of course, the ideas can be used on any web site but I am particularly interested in MISTA at the moment.

I have now started to develop the database, which will be a MySQL database, updated via PHP.

The database design is still very much work in progress but my initial thoughts are to hold the following fields.

The first table is the paper downloads table. This will hold the following:

id: Auto incrementing index just to track the number of downloads.

bibtex: This is the bibtex key of the paper that was requested. In the future I might use the doi (Digital Object Identifier) but bibtex is the best thing for me to uniquely identify a paper at the moment.

whenRequested: This will be a time stamp indicating when the request was received.

whenRetrieved: This is a time stamp indicating when the paper was actually downloaded.

accessCode: This will be a link between when the paper is requested and when it is retrieved. I will talk more about this in a later blog.

givenName: This is the given name of the person requesting the paper. As I said on my previous blog, I may not actually use this.

familyName: This is the family name of the person requesting the paper. As I said on my previous blog, I may not actually use this.

affiliation: This is the affiliation (university or company) of the person requesting the paper. As I said on my previous blog, I may not actually use this.

email: This is the email address of the person requesting the paper. This field will definitely be used.

retrieved: This is a boolean flag, indicating if the paper has been retrieved. I could use the retrieval date for this so I suppose I am breaking at least one of the rules for defining a database, but I think a boolean flag is useful. I will outline the use of this flag in a later blog.

 

There will be another table (papers). This will hold three fields:

bibtex: This is a unique identifier (for this table) which links it to the downloads table (above). Again, I could use the doi but, for now, I will use the bibtex key.

title: This is the title of the paper.

timesDownloaded: This will maintain a count of the number of times that the paper has been downloaded. I could get it from the downloads table but having it stored in this table means that it is much quicker to access.
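One possible rendering of the two tables as MySQL statements, held in PHP strings ready to be run against the database. The column names come from the descriptions above, but the table name `downloads`, every column type and length, and the choice of keys are assumptions, since the design is still a work in progress.

```php
<?php
// Assumed types and lengths throughout; only the column names are from the post.
$createDownloads = <<<'SQL'
CREATE TABLE downloads (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    bibtex        VARCHAR(100) NOT NULL,
    whenRequested DATETIME NOT NULL,
    whenRetrieved DATETIME NULL,
    accessCode    CHAR(32) NOT NULL,
    givenName     VARCHAR(100) NULL,
    familyName    VARCHAR(100) NULL,
    affiliation   VARCHAR(200) NULL,
    email         VARCHAR(254) NOT NULL,
    retrieved     BOOLEAN NOT NULL DEFAULT FALSE
)
SQL;

$createPapers = <<<'SQL'
CREATE TABLE papers (
    bibtex          VARCHAR(100) NOT NULL PRIMARY KEY,
    title           VARCHAR(500) NOT NULL,
    timesDownloaded INT UNSIGNED NOT NULL DEFAULT 0
)
SQL;
```

The optional fields (givenName, familyName, affiliation) are nullable here because they may not end up being used, and whenRetrieved stays NULL until the paper is actually downloaded.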

 

These are my thoughts so far. As I say, very much work in progress and I have no doubts that it will change but, at least, it’s a start.

 

 

Sep 03 2011

Tracking Paper Downloads for MISTA

Posted by gxkendall in Conference, MISTA, Scheduling

The fifth MISTA conference (www.mistaconference.org) has just taken place in Arizona. The web site, I think, looks pretty good but there is a lot more that I would like to do with it. For example, I should have all the papers available for download and, over the next few months, I am going to put in the effort required to make them all available. I guess that there will be at least 500 of them (including abstracts) which, I believe, is a useful resource for the scheduling community.

But, when I do make them available I would like to do a few things:

  1. I would like to know how many times a paper is being downloaded. This is useful information for the authors as well as for the MISTA organisers.
  2. I would like to collect email addresses as potential conference delegates. I know that people may not like this sort of thing but I think it is acceptable as long as we are up front about it and, in any case, they are getting the paper for free.

So, for a few days now, I have been sketching out a few ideas as to how I could get this to work. I think I would have the same look/feel as my own publications (see http://graham-kendall.com/publications/index.php?type=all) where I list my publications but, for each one, you can go to another page and see all the details about that paper; including being able to download it.

The difference with the MISTA web site would be the fact that when you wanted to download a paper a couple of things would happen. Firstly you would be asked for your email address (and perhaps name and company/institution – but that might be a bit much). It would also say that you would be emailed about future MISTA conferences and that obtaining the paper means that you agree with this. Once you had entered all the required information, you would be sent an email, with a link in it which would enable you (for one time only) to download the paper.
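The one-time link could be built around a random access code, roughly as sketched below. This is an illustration under assumptions rather than a finished design: the function names are hypothetical, an array stands in for the database, and `kendall2011` is a made-up bibtex key.

```php
<?php
/** Create an unguessable code to embed in the emailed download link. */
function newAccessCode(): string
{
    return bin2hex(random_bytes(16)); // 32 hexadecimal characters
}

/**
 * Redeem a code: succeeds the first time only.
 * $requests is an in-memory stand-in for the stored requests, keyed by code.
 * Returns the bibtex key of the paper to serve, or null if the link is invalid.
 */
function redeem(array &$requests, string $code): ?string
{
    if (!isset($requests[$code]) || $requests[$code]['retrieved']) {
        return null; // unknown code, or the link has already been used
    }
    $requests[$code]['retrieved'] = true;
    return $requests[$code]['bibtex'];
}

$requests = [];
$code = newAccessCode();
$requests[$code] = [
    'bibtex'    => 'kendall2011',          // hypothetical bibtex key
    'email'     => 'reader@example.org',
    'retrieved' => false,
];
$first  = redeem($requests, $code); // the paper's key
$second = redeem($requests, $code); // null: the link only works once
```

Using `random_bytes` (PHP 7 or later) rather than a counter or timestamp means that a link cannot be guessed from somebody else's link.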

That’s the idea. Now all I need to do is design the database and write the associated PHP scripts. Oh, and get all the papers in a form that they can be downloaded, which is actually the most time consuming part.

Views welcome.

 

Mar 26 2011

Prediction of sporting events: A Scientific Approach

My final year undergraduate dissertation project (many years ago) attempted to predict the outcome of horse races using Neural Networks. I briefly blogged about it in June 2009 (http://graham-kendall.com/blog/?p=8/).

The result of the project was (in my view) encouraging but was lacking in a couple of areas. The data was incomplete (the starting prices were not available so I had to make some assumptions and it would have been more useful to have studied a greater number of races). I would also have liked to have tried some other prediction methods, beyond just neural networks.

Since doing that project I have maintained an interest in predicting sporting events, although sports scheduling (e.g. 10.1016/j.cor.2009.05.013 and 10.1057/palgrave.jors.2602382) seems to have taken up more of my time. But I have always wanted to return to prediction, utilising Operations Research methodologies. As such, I maintain a database of any literature that I see on the topic. This includes the scientific literature, as well as any newspaper cuttings, useful web sites etc.

 

One of the problems that serious sports forecasters face is being taken seriously. A quick google for sports prediction (or many other similar terms) will bring up many sites offering services that (supposedly) enable you to make money. The services typically involve investing in some system, or subscribing to a service where you are sent the predictions for you to back in whatever way you see fit.

Of course, if we were sceptical, we might assume that many of these services are really there to make money for the people selling the service, rather than those who are buying. I am sure that there are some services out there that make money for both the seller and the buyer, but the challenge is to find out which services offer value for money before you go bankrupt in the process!

 

Unfortunately, there are not that many scientific papers that consider how to predict the outcome of sporting events, at least as a way to return a monetary profit. There are some, of course. For example an article that appeared last year in the International Journal of Forecasting

 

S. Lessmann, M-C. Sung, and J.E.V. Johnson (2010) Alternative methods of predicting competitive events: An application in horserace betting markets, 26:518-536, DOI: 10.1016/j.ijforecast.2009.12.013

 

considered how to predict horse races. The motivation of the article was actually to try and predict competitive events such as political elections and (of course) sporting events, although the paper was really a large scale (1000 races, 12,092 runners) study. The paper concluded that their proposed model was able to provide an increase in wealth of just over 528% if using a Kelly (Kelly, J. L. (1956). A new interpretation of information rate. The Bell System Technical Journal, 35, 917–926) strategy, with reinvestment.
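For readers who have not met it, the standard Kelly formula for a single win/lose bet stakes the fraction f = (bp − q)/b of the current bank, where p is the estimated probability of winning, q = 1 − p, and b is the net odds. The sketch below shows only that textbook formula; it is not the model from the paper, and the function name is hypothetical.

```php
<?php
/**
 * Kelly fraction for a single win/lose bet.
 * $p is the estimated probability of winning; $b is the net odds
 * (profit per unit staked, so odds of 4/1 mean $b = 4.0).
 */
function kellyFraction(float $p, float $b): float
{
    $q = 1.0 - $p;
    $f = ($b * $p - $q) / $b;
    return max(0.0, $f); // do not bet at all when the edge is negative
}

// A horse at 4/1 that we believe wins one race in four.
$stake = kellyFraction(0.25, 4.0);
```

In this example the stake works out at 0.0625, or 6.25% of the bank. "With reinvestment" means that the fraction is applied to the current bank after each race, rather than to the starting bank.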

 

Considering other sports, such as football (the UK version), a couple of examples of predicting matches can be found in Economics, Management and Optimization in Sports, Springer, 2004, ISBN: 3-540-20712-0

In Using Statistics to Predict Scores in English Premier League Soccer (John S. Croucher, pp 43-57), various models are presented that attempt to fit the number of goals scored by each team. The best model found had a Poisson distribution.

Another paper from the same book (Modelling and Forecasting Match Results in the English Premier League and Football League (Stephen Dobson and John Goddard, pp 59-77)) considers about 30 seasons of data. This paper also uses a statistical method, but assigns probabilities to win, lose or draw, rather than trying to predict the number of goals scored. This paper also provides a good overview of previous work in this area.

 

I suppose, the stock market is one area that has been widely studied with respect to prediction, with an eye on turning a profit. There have been hundreds (if not thousands) of papers that look at ways to predict stock prices, interest rates, inflation etc.

 

Maybe not surprisingly, there has been limited reporting in the scientific literature as to whether anybody has made (or makes) money from the methodologies that they have developed. After all, if you have a successful system, why tell everybody about it (which is one of the major arguments against buying a system/tips from a service on the internet)?

 

What I would actually like to see is a lot more scientific papers not only reporting their predictive systems but also how much money was made, over what period of time and if the system is in daily/weekly use at the time of writing.

Of course, the system needs to be reproducible (as should all good scientific writing).

However, if the system is successful, the author(s) might be unwilling to reveal its secrets but might still want to let the world know about its effectiveness. Under these circumstances I have a few ideas as to how this could be done.

  1. In the run up to publishing the paper, the authors make a series of predictions and lodge them with a reputable source. This could be another scientist, a lawyer, or even a web site where the predictions are published and can be verified from a date/time point of view. The important thing is to ensure that the predictions can be verified as being made in advance of the event. If these predictions were made over a period of (say) six months, then this could form part of the results presented in a paper.
  2. It is, of course, understandable that authors do not want to publish the full details of their winning methodology in a scientific paper but, as scientists, we like to publish our work. The scientific community should be understanding of this, in the same way that they accept that sometimes certain factors must be kept confidential due to commercial sensitivities. Therefore, the general methodology could be described but omitting key points (and being upfront about that) and, if combined with 1), above, this could still make a contribution to the scientific literature.

 

An attractive alternative would be to run a prediction competition (see Kaggle, who are doing excellent work in this area), where competitors are given a set of data and asked to provide predictions on the outcome of sporting events; ideally those that have not taken place yet.

 

In summary, I would really like to see more reporting (on a scientific basis) of sports predictions which are unashamedly about trying to return a profit, as this is an under-represented area at the moment. Why not have a go?

 

Note: I entered this blog entry into the INFORMS blog competition. The March 2011 competition was O.R. and Sports.

Jan 13 2011

Tweeting from PHP

Posted by gxkendall in PHP, SQL, Tweet

For a while I have been looking at ways of tweeting regularly. Not to annoy people, but just to have a presence a few times a day. Of course, I hope the tweets are also informative.

A quick google will find many tools that are available to do this but, in my view, they all have their shortcomings. Some want payment, some need you to have your computer on all the time, some are difficult to use, others only have limited functionality and others did not seem to work at all.

Another problem I found was that there was limited scheduling available. For example, you could send the same tweet every n hours or days. You generally had to pay to get anything more sophisticated.

At one point I was running a number of automatic tweet services, just to try and get the service I wanted but it was not really working and took a long time to maintain.

So what did I actually want?

Nothing fancy. I just wanted to send interesting tweets, at “random” times and be able to configure how many I send each day. I was thinking one every couple of hours; nothing too much.
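Picking a configurable number of “random” times each day could be sketched as follows. The function name and the default window of 08:00 to 22:00 are assumptions for illustration; nothing above says which hours the tweets should fall in.

```php
<?php
/**
 * Choose $count distinct random times between $startHour and $endHour,
 * returned in chronological order as "HH:MM" strings.
 * $count must not exceed the number of minutes in the window.
 */
function randomTweetTimes(int $count, int $startHour = 8, int $endHour = 22): array
{
    $times = [];
    while (count($times) < $count) {
        // Pick a minute of the day; using it as the key avoids duplicates.
        $minute = mt_rand($startHour * 60, $endHour * 60 - 1);
        $times[$minute] = sprintf('%02d:%02d', intdiv($minute, 60), $minute % 60);
    }
    ksort($times);
    return array_values($times);
}

$schedule = randomTweetTimes(6); // six tweets at random times today
```

A daily cron job could generate the schedule once, and a more frequent job could then send a tweet whenever the current time matches an entry.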

After some recent successes in learning more about PHP and SQL I decided to investigate if there was a better option: a solution where everything could be hosted on my own domain so that I was in full charge of the system, and did not have to rely on (or log on to, or pay for) other services.

After some searching I found a PHP class at http://www.phpclasses.org/. The class in question is called twitter-auto-publish. This provides a simple set of tools that enables you to easily post a tweet, as well as providing other functionality (which I have yet to explore).

To post a tweet is very simple. The following lines of PHP are all that are required (and the presence of the required libraries – but that is just a simple case of copying the files to a relevant folder).

@session_start();

// The two library files that come with the twitter-auto-publish class
require_once("../php/twitter-auto-publish/openinviter_base.inc.php");
require_once("../php/twitter-auto-publish/twitter_auto_publish.inc.php");

$user = 'your user name';
$pass = 'your password';
$msg  = 'your tweet'; // this is the tweet you wish to publish

$rochak = new twitter_auto_publish();
$rochak->login($user, $pass);

echo 'Tweeted: ' . $msg;
$rochak->updateTwitterStatus($msg);

$rochak->logout();

Most of it is quite obvious as to what it does, perhaps not how it does it though! The important thing, as far as I am concerned, is that this gives me the basis on which to tweet. Now I have that, I am able to start thinking about how to exploit this to develop a system that does what I want it to.

More later.

Dec 21 2010

Displaying BibTeX on web site

Posted by gxkendall in bibtex, Citations, scientific literature

For a long time I have been wanting to automate the way that I display my publications on my web site. There are facilities such as bib2html. They are very good at what they do but they never did exactly what I wanted. In fact, at the ALIO-INFORMS conference in June I recall having long conversations (yes, plural) with a good friend of mine about the best way to take a bibTeX file and create a list of publications that is suitable to display on the web. What came across was that we both had slightly different requirements and none of the “off the shelf” solutions completely fitted the bill.

Then I came across the web site by Andreas Classen. This was the closest I had come across that did everything I needed, largely due to the fact that the scripts could be parameterised.

If there is a downside it is that you need to be able to run PHP on the server which serves your web pages. I suppose you also need to understand PHP, but you can get the scripts running without an in-depth understanding. I know, as I didn’t understand PHP, but I got them running!

Once I had got the scripts up and running, I took the opportunity to learn PHP, as it was a language that I had never used before. If you have used almost any other language (C++, Java etc.) you won’t have any problems learning PHP. Of course, it’s slightly different as you are now dealing with a server side language, rather than a general purpose language. Still, a quick google of any issues that you are unsure of usually brings up a solution.

Once I had got to grips with PHP (but I am still far from expert) I decided to start changing things for myself.

If you take a look at http://www.graham-kendall.com/publications/, you see the end result. It is still very much Work In Progress (in that many of my publications are still not correct as I need to overhaul my bibTeX file) but the things to take a look at, in the context of the main message of this blog, are:

  1. As well as displaying all publications, you can view just journals, just conference proceedings, just book chapters etc.
  2. If you look around my site, you can see that it is possible to view different types of papers (e.g. hyper-heuristics, sports scheduling, cutting and packing etc.). This is done by simply searching through the title, abstract and keywords and, if a match is found, then that publication is displayed.
  3. Each publication leads to a separate page where you can download the file (assuming a PDF is available), look at the abstract, go directly to where the publication is held (via the doi) etc.
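The topic search described in point 2 can be sketched as a simple filter. This is an illustration only: it assumes each bibTeX entry has already been parsed into an array with `title`, `abstract` and `keywords` strings, and the function name is hypothetical rather than taken from the scripts I am using.

```php
<?php
/**
 * Return the publications whose title, abstract or keywords mention $topic.
 * Each publication is a hypothetical array holding those three string fields.
 */
function publicationsOnTopic(array $publications, string $topic): array
{
    return array_values(array_filter($publications, function (array $pub) use ($topic) {
        foreach (['title', 'abstract', 'keywords'] as $field) {
            // Case-insensitive substring match; missing fields are skipped.
            if (isset($pub[$field]) && stripos($pub[$field], $topic) !== false) {
                return true;
            }
        }
        return false;
    }));
}

$publications = [
    ['title' => 'A Hyper-heuristic for Timetabling', 'abstract' => '...', 'keywords' => 'timetabling'],
    ['title' => 'Scheduling English Football Fixtures', 'abstract' => '...', 'keywords' => 'sports scheduling'],
    ['title' => 'Shelf Space Allocation', 'abstract' => 'We apply hyper-heuristics to ...'],
];
$matches = publicationsOnTopic($publications, 'hyper-heuristic');
```

The first and third entries match here, the third through its abstract alone. A plain substring match is crude, so a paper can appear under a topic it only mentions in passing.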

I still want to do some work on the scripts, but at least I now have the ability to do almost anything I want. The main effort at the moment though is to get the underlying bibTeX correct, so that all my publications display correctly.

But, once it is all up and running, then the only maintenance required is to keep the bibTeX file up to date.

So, thank you Andreas. You provided the inspiration to enable me to do something that I have wanted to do for ages.