Do you still draw your pedigrees by hand? Or do you generate them on some website, take a screenshot (touched up in Photoshop), paste it into a PowerPoint file that you convert to PDF, and email it to a colleague, who then tries to extract the information into a text file representing the pedigree structure in a computer-readable format? Continue reading
Program completed. On Sunday, we finished our EuroEPINOMICS next generation sequencing (NGS) bioinformatics meeting. After working through the command line, running scripts, and staring at black screens with white cursors, we completed our four-day course by looking at the more user-friendly, web-based tools that the NGS world has to offer, including Galaxy, Varbank, and Ingenuity. The general consensus among the participants was that this was the bioinformatics meeting we needed in order to understand the data we generate and deal with on a day-to-day basis. These were my favorite sound bites of our meeting. Continue reading
Lessons. Today was the first day of our bioinformatics workshop in Leuven, Belgium. We started out with some basic command line programming and eventually moved on to working with RStudio. What is this all about? It’s about getting a basic understanding of what your computer does and how it handles files. It’s about good data and bad data and losing the fear of the command line. We collected the participants’ take-home messages from today. Continue reading
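To give a flavor of the kind of command-line basics such a first workshop day covers, here is a minimal sketch: creating a small tab-separated file and inspecting it with standard Unix tools. The file name and contents are made up for illustration; this is not material from the actual course.

```shell
# Write a tiny tab-separated file (hypothetical example data).
printf 'sample\tgene\ns1\tSCN1A\ns2\tKCNQ2\n' > variants.tsv

# How many lines does it have, header included?
wc -l < variants.tsv

# Extract only the second column (the gene names).
cut -f2 variants.tsv
```

Three commands like these — redirecting output to a file, counting lines, and pulling out a column — already cover much of the day-to-day file handling that makes the command line less frightening.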
Join the genome hacking league. We are preparing a EuroEPINOMICS bioinformatics workshop in Leuven and I really, really encourage you to join us, as there are a handful of places left. This will be the workshop that I always wanted to attend, but never got a chance to take part in. And yes, there is a final exam, but there is a chance that you might pass it. If you’re worried, skip ahead two paragraphs.
Sequence databases are not the only repositories that see exponential growth. The internet helps companies collect information in unprecedented orders of magnitude, which has spurred the development of new software solutions. “Big data” is the term that stuck, and it breathed new life into data analysis. Widespread coverage ensued, including a series of blog posts published by the New York Times. Data produced by sequencing is big: current hard drives are too slow for raw data acquisition in modern sequencers, and we have to ship the disks because we lack the bandwidth to transmit the data via the internet. But we process the data only once, and a couple of years from now it can be reproduced with ease.
Large-scale data collection is once again hailed as the next big thing and spiced with calls for a revolution in science. In 2008, Wired even announced the end of theory. The last time I checked, though, experimental scientists still make good use of hypotheses and targeted experiments under the scientific method. A TEDMED12 presentation by Atul Butte, bioinformatician at Stanford, is symptomatic in its revolutionary language and caused concern with Florian Markowetz, bioinformatician at the Cancer Center in Cambridge, UK (and a Facebook friend of mine). Florian complains, and explains that quantitative changes in the data do not lead to a new quality of science, and calls for better theories and model development. He’s right, although the issue of data acquisition and source material would have deserved more attention (what can you expect from a mathematician).
We don’t know what to expect from, say, exome sequencing for a particular disease, and the only way to find out is to do the experiment, look at the data, come up with guesstimates, and confirm your findings in the next round. Current data gathering and analysis projects in the life sciences won’t be classified as big data by the next sweep of scientists anyway. They are mere community technology exploration projects using ad hoc solutions.