In a previous blog I mentioned two examination timetabling datasets. The Carter dataset and the ITC (International Timetabling Competition) dataset. I thought it would be worthwhile describing these datasets for anybody that is interested, as they are the datasets that everybody in the scientific community uses in order to test any proposed algorithms.
In this blog, I’ll discuss the Carter dataset and save the discussion of the ITC datasets for a later blog.
The Carter dataset were introduced in 1996 by Carter, Laporte and Lee in a paper published in the Journal of the Operational Research Society.
- M.W. Carter, G. Laporte, S. Lee, (1996) Examination timetabling: Algorithmic strategies and applications, Journal of the Operational Research Society 47(3), 373 – 383 (doi:10.1057/jors.1996.37)
In this paper they made the observation that “One of the major drawbacks of most articles in the timetabling literature is that testing is limited to randomly generated problems and perhaps to one practical example.“
They go on to say “In this study, in addition to running randomly generated problems, we carried out a comparison of thirteen unconstrained real-life examples.“
The thirteen problems were taken from Canadian institutions (10), the London School of Economics (1), King Fahd University of Petroleum and Minerals (1) and Purdue University (1).
One of the main characteristics of these problems is that the objective we are trying to minimise is quite simple. We are trying to ensure that the spread of examinations for a given student is as wide as possible. That is, if a student is sitting one exam followed immediately by another, the penalty is greater than if the student has an exam, a rest period, and then another examination. If the student has two rest periods, this is even better, and so on, until there are five rest periods. At this point anything greater than or equal to five rest periods incurs no penalty.
The formulation for the Carter dataset does not capture some aspects such as:
- The room capacities for the examination rooms (which is why is it considered an uncapacitated problem).
- The fact that two consecutive examinations, which are on different days, is better than two consecutive examinations on the same day. Both of these scenarios would give the same penalty cost using the usual objective function used in the Carter dataset, even though the student would have the evening (indeed, all night) to revise, as opposed to no time at all if the examinations were truly consecutive. Indeed, each instance in the dataset just has a number of timeslots. There is concept of different days.
However, capturing all the complexities inherent in the real world was not the point of the 1996 paper.
The point was, for the first time, Carter et al. provided the scientific community with a standard set of problems and these thirteen instances have been the bedrock of examination timetabling research since that time.
If you decide to use this dataset you should be aware that there was some confusion about five of the instances as there were two different versions available. The recent paper by Qu et al. provides the definitive source for the variants.
- R. Qu, E. K. Burke, B. McCollum, L. T. G. Merlot and S. Y. Lee (2009) A survey of search methodologies and automated system development for examination timetabling, 12(1):55-89 (doi:10.1007/s10951-008-0077-5)