First of its kind. In 2010, a virtually unknown gene became the first epilepsy gene to be discovered through massively parallel sequencing techniques. This gene, TBC1D24, was found in two recessive families with different types of epilepsy. Afterwards, things went quiet around this gene, with no further findings. Now, a recent paper reports on a third family with a mutation in this gene and a complex phenotype of epileptic encephalopathy and movement disorders. As the mutation is located in an alternative exon of this gene, it raises important issues about how we identify and interpret mutations.
TBC1D24. Little is known about the TBC1 domain family, member 24 protein other than that it interacts with ARF6, a small guanine nucleotide-binding protein that is involved in vesicle recycling and axon guidance. A role for TBC1D24 in these processes can therefore be inferred. In 2010, mutations in the TBC1D24 gene were found in an Italian family with familial myoclonic infantile epilepsy (FIME) and in an Arab family with focal epilepsy and intellectual disability. The two phenotypes differ, possibly because the Italian family was compound heterozygous, whereas the Arab family was homozygous for a loss-of-function mutation. Either way, the full phenotypic spectrum is unknown. No further report on TBC1D24 was published in the meantime, and only four citations on this gene are found in PubMed. Now, Guven and Tolun find TBC1D24 as a candidate gene in a recessive Turkish family with five affected family members.
Alternative exons. Many genes in the human genome undergo alternative splicing, which results in various protein isoforms. If a potentially causative mutation is found in only one specific isoform, the finding is difficult to interpret when a biological model system is lacking. Guven and Tolun sequenced a newly described alternative exon of TBC1D24 after exome sequencing and candidate gene sequencing failed to identify a causative mutation in a recessive family showing linkage to chromosome 16p. This alternative exon, exon 3, was found to be present in all major isoforms expressed in brain, but is lacking from the TBC1D24 isoform that is expressed in liver and muscle.
Annotations, references, databases. The authors’ ordeal to find a mutation missed by exome sequencing and candidate gene sequencing reminded me of one of our EuroEPINOMICS trios that I presented at the ECE2012 in London. We had identified a compound heterozygous mutation in CNTNAP2. Recessive mutations in this gene are known to cause severe epilepsy, and one allele was considered “damaging” by the prediction programs. The other allele, however, was found only in an alternative exon predicted by Ensembl, but not by UCSC or RefGene. In the absence of further functional studies, a finding like this is difficult, if not impossible, to interpret, especially for a gene like CNTNAP2, which represents the “largest gene of the human genome” (CNTNAP2 is the largest gene by chromosomal region covered; the largest gene in the human genome by transcript size is Titin). Therefore, expect that the discussion on a possible role of CNTNAP2 variants is not over. This gene is large enough to confuse researchers over and over again, especially as heterozygous mutations and deletions are known risk factors for neuropsychiatric disorders. A thorough evaluation of the expressed isoforms of a gene will undoubtedly help, although this can be challenging depending on the complexity of the gene's transcriptional landscape. While Guven and Tolun were not able to assess gene expression in their patients directly, they looked at available tissues from various brain regions. The alternative exon affected by their mutation is in fact part of the most prominent isoforms of TBC1D24.
A systematic analysis of compound heterozygous mutations. …is something that we would like to see, but that is out of reach for many reasons, including the annotation issues mentioned above. While many papers casually refer to the lack of compound heterozygous mutations in their sample, they often fail to mention that a statement like this is subjective and highly dependent on the filter and annotation settings used. In contrast to the calling of de novo mutations and recessive homozygous mutations, compound heterozygous calling is much more vulnerable to changes in the filter settings and the calling algorithm used, because both alleles have to survive every filter for the pair to be reported. Therefore, while I am deeply impressed by publications that manage to find causative compound heterozygous mutations, a systematic analysis of the burden of these mutations will be difficult, if not impossible.
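To make this dependence on settings concrete, here is a minimal Python sketch of compound heterozygous calling in a single trio. It is an illustration under stated assumptions, not the pipeline used in any of the studies discussed here: the `Variant` record, the function `compound_het_genes`, the gene names, the gene model labels `model_x` and `model_y`, and all frequencies are hypothetical toy values invented for this example. A gene is reported only if the child carries at least one qualifying heterozygous variant inherited from the mother and at least one inherited from the father.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    name: str
    child: str               # genotype: "het", "hom" or "ref"
    mother: str
    father: str
    pop_freq: float          # population allele frequency
    annotated_in: frozenset  # gene models that place the variant in an exon


def compound_het_genes(variants, max_freq, gene_model):
    """Return genes with >=1 maternal and >=1 paternal qualifying het variant."""
    maternal = defaultdict(list)
    paternal = defaultdict(list)
    for v in variants:
        # Both alleles must pass every filter, so each filter acts twice.
        if v.child != "het" or v.pop_freq > max_freq:
            continue
        if gene_model not in v.annotated_in:
            continue
        if v.mother == "het" and v.father == "ref":
            maternal[v.gene].append(v.name)
        elif v.father == "het" and v.mother == "ref":
            paternal[v.gene].append(v.name)
    return {g: (maternal[g], paternal[g]) for g in maternal if g in paternal}


# Toy data (hypothetical): GENE_A has one allele that is both more common
# and only exonic according to one of the two gene models.
BOTH = frozenset({"model_x", "model_y"})
VARIANTS = [
    Variant("GENE_A", "a1", "het", "het", "ref", 0.0005, BOTH),
    Variant("GENE_A", "a2", "het", "ref", "het", 0.008, frozenset({"model_x"})),
    Variant("GENE_B", "b1", "het", "het", "ref", 0.0001, BOTH),
    Variant("GENE_B", "b2", "het", "ref", "het", 0.0002, BOTH),
]

for max_freq, model in [(0.01, "model_x"), (0.001, "model_x"), (0.01, "model_y")]:
    hits = sorted(compound_het_genes(VARIANTS, max_freq, model))
    print(f"max_freq={max_freq}, gene model={model}: {hits}")
```

In this toy trio, GENE_A is called only with the lenient frequency cutoff and the gene model that contains the alternative exon. Tightening the cutoff or switching the gene model removes one allele, and with it the whole compound heterozygous call, while GENE_B survives all three settings.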
How does this affect the trio exome studies in EuroEPINOMICS? Exome data is alive. It needs to be reanalyzed from time to time by different people with different ideas and concepts. There is no single way of doing this, but rather many possibilities with different rationales. When we invited the EuroEPINOMICS partners at the end of 2012 to contribute to the analysis of exome data, an open, collaborative analysis was the idea that we had in mind. After a “first wash” of analysis, curation of exome data for easy future access and analysis might be almost as important as escalating the analysis by switching to genomes. In other words, imagine a pie chart of heritability explained by different analysis methods. Now envision that the segment for “secondary analysis” might be as large as, or even larger than, the additional benefit of genome sequencing.