Everybody wins. The scientific publication process is not ideal for finding the best bioinformatics methodology for a given problem. Most predictions are not performed blind, as our data sets are so small that separating them into several disjoint sets for training and testing purposes is not possible or sensible. The structural biology community has started to tackle these problems by establishing a competition called Critical Assessment of protein Structure Prediction (CASP). For example, the solution of the 3D structure of a protein is announced, but the data are withheld for a couple of months to give computational groups time to submit a prediction, which is then evaluated by an independent team. A concluding conference crowns the best prediction groups. In recent years, systems biology and sequence interpretation have produced sufficient data to make similar challenges possible.
The competitions might be seen as a sport but may just as well be considered a better way of finding out what works and how far we are from achieving our lofty goals in the community. For some challenges, all groups fail to predict anything useful; for others, most groups get something right. Homology modelling is now one of the problems considered to be solved for a wide range of sequences with reasonable sequence similarities over domains.
Autumn is the season for bioinformatics prediction challenges: DREAM (focused on systems biology), CASP (protein structure prediction) and now CAGI (genome interpretation) are all currently open; only CAPRI, for interacting proteins, has closed recently. Steven Brenner and John Moult, the key organizers of CAGI, are rooted in structural biology and have been active in CASP for many years. Luckily, the interpretation challenge organizers focus on useful challenges and not simply on prizes like the “CAGI Molly” in 2010 shown below. People with a better sense of English than mine can also comment on the acronym.
The CAGI challenges were announced last weekend and include challenges relevant to our EuroEPINOMICS efforts, in particular the identification of causal variants in case/control exome sequencing studies of Crohn’s disease and in familial studies of lipid metabolism disorders and congenital glaucoma, the latter using whole-genome sequencing.