Exomes on Twitter. Two different trains of thought eventually prompted me to write this post. First, a report of a father identifying the mutation responsible for his son’s disease pretty much dominated the exome-related twittersphere. In Hunting down my son’s killer, Matt Might describes his family’s journey that finally led to the identification of the gene coding for N-Glycanase 1 as the cause of his son’s disease, West Syndrome with associated features such as liver problems. The exome sequencing that led to the discovery was part of a larger program to identify the genetic basis of unknown, putatively genetic disorders, reported in an open-access paper by Anna Need and colleagues. This paper is an interesting proof-of-principle study suggesting that exome sequencing is ready for prime time. Need and colleagues suggest exome sequencing can find causal mutations in up to 50% of patients. By the way, a gene that turned up again was SCN2A, in a patient with severe intellectual disability, developmental delay, infantile spasms, hypotonia and minor dysmorphisms. This represents a novel SCN2A-related phenotype, expanding the spectrum to severe epileptic encephalopathies.
The exome consult. My second experience last week was my first “exome consult”. A colleague asked me to look at a patient’s gene list to see whether any of the genes identified (there were 300+ genes) might be related to the patient’s epilepsy phenotype. Since I wasn’t sure how best to handle this, I wrote a small R script to run an automated PubMed search for combinations of each gene with 20 search terms. Nothing really convincing came up, except the realisation that this is an issue we will increasingly face in the future: working our way through exome datasets after the first “flush” of data analysis did not reveal convincing results. Two things that came to my mind were bioinformatic literacy, as something that we need to improve, and Program or Be Programmed, a book by Douglas Rushkoff on the “Ten Commands for a Digital Age”. In his book, he basically points out that in the future, understanding IT rather than simply using it will be crucial.
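For readers who want to try something similar, here is a minimal sketch of the idea in Python. It is not the R script mentioned above. It pairs every gene with every search term and builds a query URL for the NCBI E-utilities ESearch endpoint. The gene symbols, the search terms, the helper names `build_queries` and `hit_count`, and the restriction to the Title/Abstract field are all assumptions made for illustration.

```python
import json
import time
import urllib.request
from itertools import product
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint for PubMed queries.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_queries(genes, terms):
    """Return one (gene, term, url) tuple per gene-term combination."""
    queries = []
    for gene, term in product(genes, terms):
        params = {
            "db": "pubmed",
            # Assumption: restrict both words to title/abstract to cut noise.
            "term": f'{gene}[Title/Abstract] AND "{term}"[Title/Abstract]',
            "retmode": "json",
            "retmax": 0,  # only the hit count is needed, not the article IDs
        }
        queries.append((gene, term, f"{ESEARCH_URL}?{urlencode(params)}"))
    return queries


def hit_count(url):
    """Fetch one ESearch result and return the number of matching articles."""
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    return int(data["esearchresult"]["count"])


def rank_genes(genes, terms, pause=0.4):
    """Sum hit counts per gene across all terms, highest first (network)."""
    totals = {gene: 0 for gene in genes}
    for gene, _term, url in build_queries(genes, terms):
        totals[gene] += hit_count(url)
        time.sleep(pause)  # stay below NCBI's limit for requests without a key
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical input: two gene symbols and three phenotype terms.
    example_genes = ["SCN2A", "NGLY1"]
    example_terms = ["epilepsy", "seizure", "infantile spasms"]
    print(len(build_queries(example_genes, example_terms)), "queries to run")
```

With 300 genes and 20 terms this comes to 6,000 queries, so the pause between requests matters. A high hit count is only a pointer to the literature worth reading and says nothing about whether a variant in that gene is causal.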
The cost of interpretation is rising. The Genome Center in Nijmegen suggests on their homepage that by the year 2020, whole-genome sequencing will be a standard tool in medical research. What this webpage does not say is that by 2020, 95% of the effort will go not into the technical aspects of data generation, but into data interpretation. For biotechnology, interpretation will be the largest market segment.
By 2020, probably more than 10 million genomes will have been sequenced. Data interpretation rather than data generation will represent the most pressing issue.
So, what about epilepsy? “50% of cases to be identified” sounds good for any grant proposal that I would write, but this might be a clear overestimate. Need and colleagues used a highly selected patient population, and even for the variants they identified, causality is sometimes difficult to assess. We may be much further away from clinical exome sequencing in the epilepsies than we would like to admit. The only reference points we have for seizure disorders to date are large datasets for patients with autism and intellectual disability. While some genes with overlapping phenotypes can be identified, we would virtually be drowning in exome data without being capable of making sense of it.
10,000 exomes now. I would like to predict that after the low-hanging fruit of monogenic disorders has been identified, 10,000 or more “epilepsy exomes” will have to be collected before significant progress is made. It is, therefore, crucial not to be tempted by the wishful thinking that particular epilepsy subtypes, such as epileptic encephalopathies or other severe epilepsies, necessarily have to be monogenic. Much of the genetic architecture of the epilepsies might be more complex than anticipated, requiring larger cohorts and unanticipated perseverance.