
In previous posts (see here) I have been describing a system I am working on that lets the publications area of my web site be driven by a BibTeX file. It all seems to work pretty well, and maintaining my web site is now a lot easier than it used to be. Another advantage (and one not to be sniffed at) is that my web site has a uniform presentation. Once I have implemented the various functions I have planned, I also hope to be able to present my papers sorted by journal name (so that you can see how many papers I have published in a given journal) and to publish a list of my co-authors. All of this, and more, should be quite easy once the main foundations are in place.

As I progress with this project and try to extend the system, I keep coming up against various challenges. One of these is parsing author names.

The system I am developing is based on the one developed by Andreas Classen, which does a fantastic job of parsing author names. You simply pass a string of authors from the BibTeX file to a function called formatAuthors, and it returns a string that presents the authors in a standard way. Indeed, this is the method I use on my current web site (see here, but bear in mind the method may have changed by the time you read this).
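To give a flavour of what such a function does, here is a minimal sketch in Python. It is not Andreas's formatAuthors code. The name format_authors, the "F. Last" output style and the sample names are all assumptions made purely for illustration, and the sketch only copes with the two simplest ways of writing a name ("Last, First" and "First Last").

```python
import re


def format_authors(author_field: str) -> str:
    """Turn a BibTeX author field into a string such as 'J. Smith, J. A. Doe and B. Brown'.

    Illustrative sketch only: the output style is an assumption, and only
    the 'Last, First' and 'First Last' forms are handled.
    """
    # BibTeX separates the individual names with the word "and".
    names = [n.strip() for n in re.split(r"\s+and\s+", author_field.strip()) if n.strip()]

    formatted = []
    for name in names:
        if "," in name:
            # "Last, First" form: everything before the comma is the surname.
            last, first = [part.strip() for part in name.split(",", 1)]
        else:
            # "First Last" form: assume the final word is the surname.
            parts = name.split()
            first, last = " ".join(parts[:-1]), parts[-1]
        # Reduce each given name to an initial.
        initials = " ".join(word[0] + "." for word in first.split())
        formatted.append(f"{initials} {last}".strip())

    if len(formatted) <= 1:
        return "".join(formatted)
    # Commas between names, with "and" before the final one.
    return ", ".join(formatted[:-1]) + " and " + formatted[-1]


print(format_authors("Smith, Jane and John A. Doe and Brown, Bob"))
```

With the made-up input above, this prints "J. Smith, J. A. Doe and B. Brown". Anything more unusual than these two forms would need extra handling.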

I have recently been trying to write my own parser, for a research idea that I have. It is not easy! Just to give you a few examples of the issues that have to be addressed:

None of these problems is insurmountable. Indeed, the BibTeX style files do a great job of handling all types of names. The challenge for me is to develop a parser that can deal with anything I throw at it, and which will cope with thousands of names, rather than just the ones I can easily check in the BibTeX file that comprises my own publications.
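As one illustration of the kind of case a general parser has to handle, BibTeX only treats the word "and" as a separator when it sits outside braces, so a name such as {Barnes and Noble} stays in one piece. The sketch below shows one way that rule could be followed in Python. The function name split_authors and the sample input are assumptions for illustration, and the sketch ignores escaped braces such as \{.

```python
def split_authors(author_field: str) -> list:
    """Split a BibTeX author field into names, ignoring any 'and' inside braces.

    Illustrative sketch only: escaped braces (for example \\{) are not handled.
    """
    names = []
    current = []  # words of the name currently being collected
    depth = 0     # how many braces are currently open

    for token in author_field.split():
        if depth == 0 and token == "and":
            # A top-level "and" ends the current name.
            if current:
                names.append(" ".join(current))
            current = []
        else:
            current.append(token)
            # Track brace nesting so an "and" inside {...} is kept as text.
            depth += token.count("{") - token.count("}")

    if current:
        names.append(" ".join(current))
    return names


print(split_authors("{Barnes and Noble} and Smith, Jane"))
```

For the made-up input above, this gives two names, "{Barnes and Noble}" and "Smith, Jane", where a naive split on "and" would give three.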

Anything that I do will be based on the formatAuthors function from Andreas, but I think it needs a little tweaking to deal with a few more cases.