Bibtex: Display papers by a given author

For a while I have been blogging on various bibtex related topics. My interest in parsing bibtex files started a long time ago, when I was looking for a way to automatically generate my publications web page, rather than having to update the raw HTML every time I published a new paper. There are a few free options around which enable you to do this but all those I looked at had some shortcomings, as far as I was concerned.

In the end, I found a PHP system from Andreas Classen. This system does just about everything I want it to do and, being written in PHP, it is also extendible. The other benefit, although it did not seem like it at the time, was the fact that it forced me to learn PHP. I am really glad I did now as it has proven to be invaluable in many projects that I am working on.

I have already made a number of changes to the system supplied by Andreas, just so that I can do things that were not available out of the box. Not that this was a problem. Andreas was not selling a software package and it was good of him to make the software he had written open source.

The approach I have taken is to split lots of the functions into smaller functions and also put them into a class just to try and make things easier and tidier. The overall aim being that I should be able to string these functions together (in true mash up style) to provide additional functionality.

As an example of this, I have recently added some functionality (albeit, quite specific)  that was not available before now in the original system, and would not have been easy to incorporate into the software as it arrived.

This new functionality enables you to show all the papers that have been published by a single author. That is, I take a bibtex file and generate one record for each author/paper pair. So if you have a paper that is written by two authors, this will generate two records; one for each author.

The process is as follows:

  1. Read in the bibtex file into an array.
  2. Process each record and extract each author as a separate record (along with some information about the paper). All of this is placed into another array.
  3. Sort this new array by the author family name.
  4. Once you have this array, you can display it as you see fit.
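The four steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the PHP code the system is actually written in. The entry layout (a list of dictionaries with `author` and `title` keys) and the function names are assumptions made for the example, and the family name is guessed as either the part before a comma or the last word, which is too simple for many real bibtex names.

```python
import re


def family_name(author):
    """Very rough family-name guess: 'Family, Given' or 'Given Family'."""
    author = author.strip()
    if "," in author:
        return author.split(",")[0].strip()
    return author.split()[-1]


def author_records(entries):
    """Step 2: build one record per (author, paper) pair."""
    records = []
    for entry in entries:
        # bibtex separates authors with the word 'and'
        for author in re.split(r"\s+and\s+", entry["author"].strip()):
            records.append({
                "author": author.strip(),
                "family": family_name(author),
                "title": entry["title"],
            })
    # Step 3: sort by family name (case-insensitive), then by title
    records.sort(key=lambda r: (r["family"].lower(), r["title"].lower()))
    return records


# Step 1 is assumed to have produced this array already (made-up data)
entries = [
    {"author": "Smith, Jane and Alan Jones", "title": "Paper A"},
    {"author": "Alan Jones", "title": "Paper B"},
]

# Step 4: display the records however you see fit
for record in author_records(entries):
    print(record["family"], "-", record["title"])
```

Because every record carries the family name, grouping the sorted list by that field is also enough to build a per-author menu.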

You can see what I do with it here. Note, this is not a page at my web site, but this functionality was developed for the MISTA conference series that I have chaired since 2003.

If you look at this page, you’ll also see I generate a separate menu for each author so that somebody could explore a specific author without having to continually scroll through the page. It also makes it easier to see how many publications a particular person has published.

Whether the system is really needed for the MISTA conference series is open to debate. The conference has published around 500 papers so it is possible to simply scan the papers, but as the conference publishes even more papers it will become increasingly useful, rather than having to look through every paper.

Probably of more use, at least to me, is that the various PHP support functions that I have developed now enable me to develop even more bibtex functionality, which makes various other projects I have in mind a realistic possibility.

 

New Blog Theme

Genting Highlands Cable Car

I am impressed with WordPress. It tends to do what it says on the tin and the range of plugins and themes you can get enable you to add almost any sort of functionality.

Yesterday I decided to bite the bullet and install another theme. Actually, installing a theme is very easy. It really is a case of downloading it, activating it, and then seeing how it looks. In fact, you can even do a live preview so that you can see what it looks like before you activate it.

The problems potentially start after you activate the new theme. The type of issues that you might have to resolve are things like re-installing (or more accurately replacing) your widgets. This is quite easy if you can recall what widgets you had previously (so check beforehand). If you have a text widget where you have written your own html, javascript etc., it is best to make sure that you copy that text before you activate the new theme.

The other issue you might have is if you have updated any WordPress PHP scripts to add increased functionality, fix errors, or just change small things to make the theme look slightly better. In fact, I try not to amend the PHP scripts for exactly these reasons, but sometimes you just have to, to get the features that you need.

I decided to install the WordPress Twenty Eleven theme.  The previous one I was using was JaS Personal Publisher. I was actually very happy with it but I just wanted a change, as well as getting used to installing themes so that I could do it more easily in the future.

The only real problem I had was the fact that on smaller devices (phones, ipads, when you make the browser window smaller), the side bar goes below the main display. Apparently this is a design feature (and I’ve worked in the IT industry long enough to know what this means!).

It took a while, but I eventually found a fix for this feature. In essence, you delete a few lines from the style.css file and this forces a horizontal scroll bar, rather than moving the sidebar downwards.

The only piece of advice, which I was not aware of before, is to create a child theme. This inherits everything (in the same way that inheritance works in Object Oriented programming) from the main theme but if you ever get an update, then your changes are not overwritten.

Once I had the basics up and running, it was a case of making sure that all the settings were correct and uploading some header pictures to replace the standard ones that are supplied. I have gone with a loose travel theme, where the pictures are of modes of transport at the moment. Things like planes, trams, cars and even cable cars. Well, transport is a big Operational Research issue, which is one of my main research interests.

The final thing I added was a couple of plugins that I downloaded. One enables people to vote whether they like/dislike a particular post. The other enables readers to share the post to various social networking platforms such as facebook and twitter.

I do like the look of the new theme, but I don’t think that it is perfect and I am still on the look out for something that looks good and provides all the functionality that I think I need.

The search continues but, for now, I am very happy with the new look and functionality of my blog pages.

Informing publishers of my tweeting activities

In my previous posts on twitter I have been describing (in general terms, I will do some posts on some of the more technical aspects soon) how I have automated some of my tweets. The system, in brief, tweets scientific articles on a random basis – doing about twenty tweets a day. Each 24 hours, I change the topic so that if people are interested they can just follow me when I am tweeting about a topic that interests them.

INFORMS, in particular, have been kind enough to retweet some of my tweets. I have not checked, but I suspect that the articles they retweet are those that are published in INFORMS journals!

This made me think that I should be a little more proactive in telling the publishers what a good job I am doing for them. As I update my database with articles that I tweet, I have started to add a field which indicates which publisher they come from. At the moment, the publishers I have on file are IEEE, INFORMS, Taylor & Francis, Science Direct and Wiley. All of these also have a twitter account, with the exception of Wiley. Well, maybe they do, but I cannot find it.

Over the past couple of days I have been implementing a system that chooses one of these publishers at random and also chooses a time interval to query. For example, it might be the last 3 months, the last 7 days, the last 11 months etc. It is now a simple matter to run an SQL query from PHP to extract how many tweets I have done for that publisher over the relevant time period. I can then format a tweet so that it says something like

@TandFRef You may have noticed us tweeting your papers? We have tweeted 147 of your papers in the last 29 days

I can then use my automatic tweeting system to post the tweet to twitter. To try and keep things looking fresh, the format of the message I post is randomly selected from a number of templates I have defined. Along with the variation in the intervals I use, I hope that the publishers will find it useful and not too repetitive.
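To make the idea concrete, here is a rough sketch of the random publisher, random interval and random template steps. It is written in Python with an in-memory SQLite database purely for illustration; the real system uses PHP and its own database. The table name `tweets`, its columns, the second template and the 30-day approximation of a month are all assumptions made for the example. Only the @TandFRef handle and the wording of the first template come from the sample tweet above.

```python
import random
import sqlite3

TEMPLATES = [
    "{handle} You may have noticed us tweeting your papers? "
    "We have tweeted {count} of your papers in the last {n} {unit}",
    # Second template is invented, to show the random choice
    "{handle} In the last {n} {unit} we have tweeted {count} of your papers",
]


def count_tweets(conn, publisher, days):
    """Count the tweets stored for one publisher in the last `days` days."""
    row = conn.execute(
        "SELECT COUNT(*) FROM tweets "
        "WHERE publisher = ? AND tweeted_at >= datetime('now', ?)",
        (publisher, f"-{days} days"),
    ).fetchone()
    return row[0]


def publisher_tweet(conn, handles, rng=random):
    """Build one message for a randomly chosen publisher and interval."""
    publisher = rng.choice(sorted(handles))
    unit = rng.choice(["days", "months"])
    n = rng.randint(2, 30) if unit == "days" else rng.randint(2, 11)
    days = n if unit == "days" else n * 30  # a month is approximated as 30 days
    count = count_tweets(conn, publisher, days)
    template = rng.choice(TEMPLATES)
    return template.format(handle=handles[publisher], count=count, n=n, unit=unit)


# Tiny demonstration database (hypothetical schema and data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (publisher TEXT, tweeted_at TEXT)")
conn.execute(
    "INSERT INTO tweets VALUES ('Taylor & Francis', datetime('now', '-1 days'))"
)
conn.execute(
    "INSERT INTO tweets VALUES ('Taylor & Francis', datetime('now', '-400 days'))"
)
print(publisher_tweet(conn, {"Taylor & Francis": "@TandFRef"}))
```

Passing the publisher and the interval as query parameters, rather than building the SQL string by hand, keeps the query safe whatever the publisher name contains.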

Of course, one of the ideas behind this is to raise my profile with the publishers, and also on twitter generally but, in doing so, I hope that people find the information useful. I am conscious that it could become intrusive though, so I only do a couple of these tweets a day. Hopefully the publishers will not mind. If they do I can, of course, remove them from the service.

In the future, I am thinking of extending the system so that I can tell people how many tweets I have done on (say) vehicle routing in the past n days/months. I could even combine it with the publisher information so that I can tell the publishers how many of their articles I have tweeted over the past few days/months on a given topic. But I’ll let this new system bed in first.

Some of the Challenges in Parsing Bibtex Authors

In previous posts (see here) I have been talking about a system that I have been working on that enables the publications area of my web site to be driven by a bibtex file. It all seems to work pretty well and maintaining my web site is now a lot easier than it used to be. Another advantage (and one not to be sniffed at) is the fact that my web site has a uniform presentation. I also hope that once I have the various functions that I plan to implement I will be able to do things like present my papers sorted by journal name (so that you can see how many papers I have published in a given journal) and also publish a list of my co-authors. All of this, and more, should be quite easy once the main foundations are in place.

As I progress with this project, and I try to extend the system, there are various challenges that I keep coming up against. One of these is parsing author names.

The system I am developing is based on the one developed by Andreas Classen. It actually does a fantastic job of parsing author names. You simply pass a string of authors from the bibtex file to a function called formatAuthors, and you are returned a string that presents the authors in a standard way. Indeed, this is the method I use on my current web site (see here, but bear in mind the method may have changed by the time you read this).

I have recently been trying to write my own parser, for a research idea that I have. It is not easy! Just to give you a few examples of the issues that have to be addressed:

  • In bibtex, you have various ways that you can specify names. Norman Walsh does a better job than I could in describing some of the ways that names can be provided to bibtex.
  • The formatAuthors function provided by Andreas, as I said, does an excellent job but it is lacking in some ways.  For example, it does not deal with people who have two family names, such as some people from Belgium and Holland who are often called “van xxxxx” or “de xxxxx”.
  • When I was experimenting recently, I saw an author who was called “Billy James III”. The “III” causes problems, in the same way that the Belgium/Holland names do. There would be similar problems with people who have “jr” at the end of their name.

None of these problems are insurmountable. Indeed, the bibtex style files do a great job of handling all types of names. The challenge for me is to try and develop a parser that is able to deal with anything I throw at it, and which will deal with thousands of names, rather than the ones I can easily check from the bibtex file that comprises just my own publications.
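As a small illustration of the two problem cases above, the following Python sketch splits a comma-free name into given, family and suffix parts. It is not the formatAuthors function from Andreas, and it is nowhere near a full bibtex name parser. The suffix list is an assumption, as is the rule that the family name starts at the first word beginning with a lower-case letter (which is how particles such as "van" or "de" are usually written). The name "Jan van der Berg" is made up for the example.

```python
# Suffixes recognised by this sketch (an assumed, incomplete list)
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}


def split_name(name):
    """Split a 'Given von Family Suffix' name into its parts.

    Only the comma-free form is handled; this is an illustration only.
    """
    tokens = name.split()
    suffix = ""
    if len(tokens) > 2 and tokens[-1].lower() in SUFFIXES:
        suffix = tokens.pop()

    # The family name starts at the first lower-case word (the particle),
    # or is just the last word when there is no particle.
    start = len(tokens) - 1
    for i, token in enumerate(tokens[:-1]):
        if token[0].islower():
            start = i
            break

    return {
        "given": " ".join(tokens[:start]),
        "family": " ".join(tokens[start:]),
        "suffix": suffix,
    }


print(split_name("Billy James III"))
print(split_name("Jan van der Berg"))
```

A sketch like this still fails on plenty of real names (a family name with a capitalised particle, for example), which is exactly why checking against thousands of names matters.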

Anything that I do, will be based on the formatAuthors function from Andreas but I think it needs a little tweaking just to try and deal with a few more cases.

 

Update: Displaying bibtex on web site

A while ago (see here for my Bibtex posts) I commented that I was working on a system where I could take a bibtex file and display that on my web site. The result can be seen here. The system works pretty well in that my web site (at least this part of it) is driven from a bibtex file.

I have also implemented a system where you can freely download my papers, but you have to supply your email address. This is for a number of reasons. Firstly it is useful to know how many of my papers are being downloaded. Secondly, it is interesting to know what papers people are interested in. Thirdly, it might be useful to collect email addresses to let people know about conferences, new publications etc.

Since the system went live, 108 of my papers have been downloaded. Some of the downloads were done by me, just testing the system or making sure newly added papers could be downloaded, but around 100 downloads is pretty good.

In my previous post I said that all I needed to do was keep my bibtex file up to date. Actually, I have a few ideas as to what I want to do with the system. I have started to work on some of those ideas, which I’ll explain in a future blog.

One of the reasons that I want to extend the system, and also make it easier to maintain, is for a research project that I have in mind. If I decide to go ahead with that, I’ll need to do a lot more bibtex manipulation than I do at the moment and anything that I can do to make my life that little bit easier will be well worth the implementation effort that is required.

Whatever I do, I still owe a big debt to Andreas who was good enough to provide the code that I initially used, and still draw on very heavily.

 

Twitter: Identifying Potential Followers

In my previous twitter posts (you can see my series of Twitter posts here) I have discussed various things that I have done (or am thinking of doing) to try and get a twitter service up and running, where I tweet on a certain subject for 24 hours, and then I change to something else. That is now all working and seems to be standing up pretty well. You can also subscribe to the service, and I’ll tell you when I start tweeting about your subject(s) of interest. In my last post I was saying that registrations are slow; in fact non-existent.

I mentioned a few ideas about how to attract new followers. One of them was ‘I have been investigating how to collect potential twitter users and then tweet them directly about the service. In fact, the twitter search API provides this type of functionality and I’ll be looking at this soon.‘ Over the past couple of days I have been working on this and I think I have a system that is now just about up and running. It works like this.

I have a cron job (that is a piece of PHP that runs every so often) that tries to identify potential followers. It does this by looking for previous tweets that include the term that I am currently tweeting about. That might be, for example, ‘Vehicle Routing‘, ‘Healthcare‘ or ‘Bin Packing‘. The twitter API (Applications Programming Interface) actually provides this functionality via its search engine (see here). Once you have received a number of recent tweets that mention your search terms, it’s an easy task to parse that text to be left with just the twitter usernames.
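The counting step might look something like the sketch below. It is written in Python for illustration, and it deliberately makes no call to the twitter API. Instead it assumes the search results have already been fetched and reduced to a list of dictionaries with `user` and `text` keys; that shape, like the function name, is an assumption for the example and not the format twitter actually returns.

```python
from collections import Counter


def count_users(results, term):
    """Count how many tweets mentioning `term` each user has sent."""
    term = term.lower()
    seen = Counter()
    for tweet in results:
        if term in tweet["text"].lower():
            # Normalise the username so '@Alice' and 'alice' count as one
            seen[tweet["user"].lstrip("@").lower()] += 1
    return seen


# Made-up search results
results = [
    {"user": "@Alice", "text": "New paper on vehicle routing with time windows"},
    {"user": "alice", "text": "Vehicle Routing benchmarks updated"},
    {"user": "bob", "text": "Nothing to do with the topic"},
]

print(count_users(results, "Vehicle Routing"))
```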

Once I have the usernames, I store them in a database. As well as the usernames, I also store the time I last saw a message from that person, the number of messages I saw from that person, the last time I tweeted them and the subject that I was tweeting about at the time. When I add a new person to the database, I set their last contact date as four weeks ago (for reasons that will become clear in a moment).

I run this cron job every four hours, so that I do not keep picking up the same messages again and again (and thus incrementing the count of the number of times I have seen the user, when it is, in fact, the same message).

I have had this running for a couple of days now, so now I can start contacting people who might be interested in my twitter feed.

To do this, I have another cron job. At the moment, this runs every thirty minutes, but that is because I am still testing it. It will eventually go to about four hours as I don’t want my twitter feed to get too full of these messages. When I run the job, I extract all the users recorded for the current subject and sort them by the number of times I have seen each person. This is on the basis that the more somebody has tweeted about the subject themselves, the more likely it is that they will be interested in what I have to say. I then go through each of these users and check the following:

  1. Do they follow me already? If they do, I ignore them as it is pointless asking somebody to follow you who already is.
  2. Have I tweeted them in the past 30 days? If I have, I ignore them, as I don’t want to annoy people by continually tweeting them. Hopefully, they won’t mind once  a month, but if I do get complaints, I’ll set up an ignore list, so I don’t tweet people on that list.
  3. Have I seen them at least twice? I want to know that they have tweeted at least a couple of times on a given subject. Even so, this is not foolproof. I saw somebody tweet twice about bin packing. They were actually tweeting that they have been packing for their holidays!

If I can find somebody who fulfills the three criteria above, I send them a tweet. Hopefully they will be interested enough to follow me.
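The three checks can be expressed as a single filter over the stored records. The sketch below is in Python rather than the PHP used by the cron job, and the field names (`username`, `times_seen`, `last_contacted`) and the function name are hypothetical; only the rules themselves come from the list above.

```python
from datetime import datetime, timedelta


def users_to_contact(candidates, followers, now=None):
    """Return usernames that pass the three checks, most-seen first."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=30)
    ranked = sorted(candidates, key=lambda c: c["times_seen"], reverse=True)
    return [
        c["username"]
        for c in ranked
        if c["username"] not in followers   # 1. not already a follower
        and c["last_contacted"] <= cutoff   # 2. not tweeted in the past 30 days
        and c["times_seen"] >= 2            # 3. seen at least twice
    ]


# Made-up records for the current subject
now = datetime(2012, 1, 31)
candidates = [
    {"username": "a", "times_seen": 5, "last_contacted": now - timedelta(days=40)},
    {"username": "b", "times_seen": 1, "last_contacted": now - timedelta(days=40)},
    {"username": "c", "times_seen": 3, "last_contacted": now - timedelta(days=5)},
    {"username": "d", "times_seen": 2, "last_contacted": now - timedelta(days=31)},
    {"username": "e", "times_seen": 9, "last_contacted": now - timedelta(days=90)},
]

print(users_to_contact(candidates, followers={"e"}, now=now))
```

An ignore list, if one is ever needed, would simply be a fourth condition of the same kind as the follower check.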

So, this is now live. I still need to put in a few things. For example, I need to keep the database tidy, so I need to have some way of deleting records, but that can wait for a while.

It’ll be interesting to see if this actually just annoys people, or does increase my number of followers. I’ll let you know.

 

Twitter: Registrations are slow!

Following my last post (or take a look at all my Twitter posts here), my twitter system has been up and running for a few weeks now. Every day I switch domains and start tweeting about something else. The system seems to work well, although I have not had anybody register yet. However, I know that people are looking at the service as I can see the number of hits on my various web pages.

I suppose people can just search twitter, so they don’t need to register. But the beauty of the system is that people don’t actually need to follow me on Twitter. They can register and then I will send them an email when I tweet about the subject they are interested in and, optionally, send them an email of the actual tweets. Of course, I’d rather that they follow me but, if they wish, they need not even go near twitter. They can just register and then wait for the emails, or do a search on twitter at the end of the day to see what I have had to say.

Having said that, when I do tweet about a certain subject, that is all I do. Unless you see the change domain tweet (which happens at 6am UK time), you are unlikely to be aware of the service. So, I have a few plans to try and raise the profile of the service.

  • This is obviously one way; blog about it.
  • I have been investigating how to collect potential twitter users and then tweet them directly about the service. In fact, the twitter search API provides this type of functionality and I’ll be looking at this soon.
  • I think I might do a few more general tweets throughout the day, just linking to the service, rather than having all the tweets simply about the domain.
  • I might also try and interact a little more with the publishers. INFORMS were good enough to retweet quite a few tweets, but I could do with more of that from the other publishers. It’ll be interesting anyway, just to see if that has any effect.
  • And, of course, I have to make the tweets as interesting as possible. That is difficult as what is interesting to one person might not be interesting to another. I think the best solution is to have really targeted domains so that it might only be of interest to a small number of people, but they will find most of the tweets useful.

Finally, in case you are interested, the domains I currently tweet about are Bin Packing, Gambling, Genetic Algorithms, Healthcare, Timetabling and Vehicle Routing. More details are available here and I am open to suggestions for other areas of interest.

Beta Test for Twitter Service

In my last two posts (see here and here – or here for all my Twitter blogs) I outlined a few ideas I had for improving the twitter feeds that I have had running for about a year. The original idea was just to tweet every so often (about ten times a day), with each tweet being a random scientific publication. This worked pretty well but I thought that it might be more interesting/useful if I tweeted on a particular subject over a 24 hour period. That way, followers would only need to look at my twitter feed when I was tweeting on a subject of interest to them. In fact, they would not even need to follow me, they could just subscribe to the service and when I was tweeting about a subject they wanted to follow, they can do this, simply by using the search options on Twitter.

I have spent a few hours getting an initial implementation.

As it stands the system:

  • Enables you to register;
  • You can choose which subjects you are interested in;
  • When I start tweeting about those subjects I will send you an email and/or a tweet (you choose). That is, just as I start a 24 hour tweeting session on your subject of interest, I let you know;
  • I did implement functionality where I would email/tweet you whenever I did an individual tweet on a subject that interested you, but I disabled it, on the basis that I might be tweeting too often;
  • Although it is all in the background, there has been a lot done to ensure that you confirm who you are when you register, else somebody could register for you. What happens now is that I send you an email and you have to click on a link before your registration is confirmed.
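The confirmation step in the last point is commonly done with a random, single-use token embedded in the emailed link. The sketch below shows that general technique in Python; it is not a description of how the registration is actually implemented here. The in-memory `pending` and `confirmed` stores stand in for database tables, and the function names are made up for the example.

```python
import secrets

pending = {}       # token -> email address, waiting for confirmation
confirmed = set()  # email addresses that have clicked their link


def register(email):
    """Record a pending registration and return the token for the link."""
    token = secrets.token_urlsafe(32)  # hard-to-guess random token
    pending[token] = email
    return token  # would be embedded in the link sent by email


def confirm(token):
    """Confirm a registration; each token can only be used once."""
    email = pending.pop(token, None)
    if email is None:
        return False
    confirmed.add(email)
    return True


token = register("reader@example.com")
print(confirm("not-the-right-token"))  # somebody guessing gets nowhere
print(confirm(token))                  # the real link confirms the address
```

Because only the owner of the mailbox sees the token, nobody else can complete a registration on their behalf.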

I am now welcoming people to register, in order to try the system. I’ll class it as beta (just in case it all goes wrong!) but I’ll try to ensure that I don’t ask everybody to register again.

Of course, if you do use the system, I’d welcome any comments you have.

Like all these things, it is never finished and I have some things that I need/want to do.

  • The most important is to get the database populated with better quality entries. The database is not too bad at the moment, but I have started the (long, very long) process of collecting more scientific publications, which are even more relevant than the ones I have there at the moment;
  • I would like to interact with the various publishers more than I do at the moment (which is not at all, except for the occasional retweet from INFORMS (thank you));
  • As well as tweeting on scientific publications, it would also be useful to have more generic tweets on a particular subject. But this is quite difficult to do; or at least collect good quality tweets;
  • Some of the subjects do need a little refining. For example, sport is a tough one as it covers so many different areas (e.g. different sports as well as injuries, predictions, gambling etc.). Surveys is also a tough one as it is not immediately obvious what a ‘survey’ is. Anyhow, I am sure that the subjects will evolve over time and become more coherent;
  • One of the problems I recognise is that if you are interested in one topic out of seven (and more in the future) you won’t see many tweets as it takes a while for each subject to come round. Therefore, in the future, there might be a case for refining the system so that we have a specific twitter account for a given topic.

In implementing this system, I have faced several challenges (such as how to register people, how to maintain a database, how to automate tweets). I will discuss some of these issues in later posts, as I am sure that other people face similar challenges and, perhaps (quite likely) I have not done things in the best way possible so I might be able to learn how to do something better.

 

Improved Automatic Tweeting

In my last post I set out a few ideas about some of the improvements (you can see my complete series of Twitter posts here) I was thinking about making to improve my twitter feed. In essence I was toying with the idea of tweeting on certain subjects for a 24 hour period.

Over the past few days I have been working on a basic implementation for the ideas I mentioned.

The system, which is now live, chooses randomly from a set of domains and then tweets about that domain for the next 24 hours (I may actually change this so that it tweets for a varying (random) amount of time for each domain). I did face a number of issues when I was implementing the system.

By far the biggest was populating the database of tweets. It is not exactly complicated but it is very time consuming. I’ll outline the challenges that I faced in a future blog. Indeed, I am still facing problems as I write this and this is an area that I need to revisit as the quality of the tweets is really down to the quality of the database. My current thinking is to start the database again, from scratch, but at the moment it is serving its purpose.

Another problem I faced was that part way through implementing this new system, my old system stopped tweeting. I am not sure if this was something that I changed (I don’t think so) or whether it was a change made by twitter. It actually took me a long time to resolve this and I eventually had to bite the bullet and access the twitter system via the OAuth mechanism, which is the recommended way. So I am now doing it the way they want me to, rather than the legacy method that I was using. So, if nothing else, I have made a change that was probably long overdue and that, hopefully, will stand the test of time.

The new system is far from complete. I still want to add the functionality where people can register for domains of interest to them so that I can inform them when I am about to tweet about a subject in which they are particularly interested. I also want to add other types of tweet (not just journal articles) that I hope will inform people (and publishers) about what I am doing; but more of that later.

But, the basic system is up and running and I have started to write some of the supporting web pages. If you are interested, take a look at http://www.graham-Kendall.com/twitter, although these are also earmarked for considerable updating.

I’ll let this new system bed in for a while and then do some more development. But, I have to say, that I am pleased with the system so far. I think that it is a lot more useful than just randomly tweeting scientific papers. At least now, if you are interested in a certain topic you can just keep an eye out over a 24 hour period, or just do a search (each tweet is tagged with #orj and #XXX, where XXX is a three character code representing the domain).
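For anyone curious about the mechanics, choosing a domain and tagging each tweet could be sketched along these lines. This is a Python illustration, not the live PHP code. The three-character codes are invented for the example (the real codes are not listed in this post), avoiding an immediate repeat of the previous domain is an assumption, and the 140 character limit is the tweet length limit that applied at the time of writing.

```python
import random

# Domain -> three-character code. The codes are invented for this example.
DOMAINS = {
    "Bin Packing": "bpk",
    "Timetabling": "ttb",
    "Vehicle Routing": "vrp",
}


def pick_domain(previous=None, rng=random):
    """Pick a domain at random, avoiding an immediate repeat (an assumption)."""
    choices = [d for d in DOMAINS if d != previous]
    return rng.choice(choices)


def tag_tweet(text, domain, limit=140):
    """Append '#orj #XXX', shortening the text if it would not fit."""
    tags = f" #orj #{DOMAINS[domain]}"
    room = limit - len(tags)
    if len(text) > room:
        text = text[: room - 3].rstrip() + "..."
    return text + tags


domain = pick_domain(previous="Bin Packing")
print(tag_tweet("A made-up article title for the example", domain))
```

Searching twitter for the two tags together then picks out exactly the tweets for one domain.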

Improving/targeting my Twitter Feed

If you subscribe to my twitter feed, you’ll see that I tweet journal articles on a regular basis. In fact, at a recent conference, I was taken to task (in a nice way) about how I could tweet so often and, more to the point, how I could have tweeted whilst giving a presentation!

Actually, this is an automated procedure (punctuated with personal tweets on a regular basis). The automated tweets are proving to be popular, at least from the Re-Tweets and the favourite’d tweets.

However, I am aware that not all my tweets are of interest to everybody, all the time. That is, some people might be interested in the Vehicle Routing Problem but don’t really care about Personnel Rostering. There is not a lot I can do about that (if you follow me, you follow me – there is nothing selective about it). I suppose I could set up an individual account for each domain (Vehicle Routing, Travelling Salesman, Personnel Rostering etc.) but that seems a little excessive; not least of all from a maintenance point of view on my part!

Thinking about this, I came up with the idea that I could tweet about one particular topic in a 24 hour period and then change topic for the next 24 hours; and so on. In this way, you would only need to pay particular attention to my twitter feed when you knew I was tweeting about something that you were interested in. The idea is still very much in my head at the moment but I have some ideas about how I would implement this (with a sort of draft almost done, but much work still to do), which I’ll share with you in due course.

A further stage would be to allow people to register for certain domains and, when that domain is selected to be tweeted for the next 24 hours, I could @message and/or email the person in order to tell them to pay extra attention to my twitter feed for the next 24 hours. This would test my PHP/SQL skills beyond what they are at the moment, but that is not necessarily a bad thing.

As I say, mostly still in the idea stage at the moment (with some test functionality under development), but definitely something I want to take forward.

More soon, but you might also be interested in my series of Twitter posts, available here.