The omics flood. Large amounts of sequence data are produced every day, and we can use the genetic information of several thousand individuals as controls for any present-day genetic study. However, much of the research on "traditional" epilepsy genes was performed before the genomic era and often included only limited control cohorts. This raises the question of whether a closer look at the currently available data might provide additional information. Now, a recent paper in the Journal of Neurogenetics investigates the presence of reported epilepsy mutations in large, publicly available datasets. And the results are surprising.
EVS and 1000 Genomes. We can currently query exome data from ~13,000 individuals through the 1000 Genomes Project and the Exome Variant Server (EVS). The EVS includes samples sequenced in studies of heart, lung, and blood disorders; these individuals are usually not screened for epilepsy or other neurological disorders. However, it is reasonable to assume that individuals with severe epileptic encephalopathies would not usually be included in these studies. Cherepanova and colleagues queried both data sources for variants previously reported to be associated with epilepsy and compiled a map of genetic variation in known epilepsy-associated genes.
Candidates. Cherepanova and colleagues included 19 genes and 280 reported variants in their study. They excluded indels and splice-site mutations and queried EVS and 1000 Genomes for the remaining 208 variants. Of these, 7 variants were present in the control datasets, including variants in SCN1A, SCN1B and EFHC1. In addition, they found a plethora of variants in these datasets that computational methods predict to be pathogenic. Two findings of this study are particularly perplexing: (1) the presence of a pathogenic SCN1A de novo mutation in controls and (2) the amount of variation found in EFHC1.
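The filtering step described above (drop indels and splice-site variants, then look up the rest in the control data) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` record, the `kind` labels and the `query_controls` helper are hypothetical, and a real analysis would match variants at the genomic level (chromosome, position, alleles) against EVS or 1000 Genomes files rather than by protein change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for a reported variant."""
    gene: str
    change: str  # protein-level change, e.g. "R1596C"
    kind: str    # assumed labels: "missense", "nonsense", "indel", "splice"


# Variant classes excluded before querying the control data.
EXCLUDED_KINDS = {"indel", "splice"}


def query_controls(reported, control_observed):
    """Return (queried, in_controls).

    queried: reported variants left after dropping indels and splice-site variants.
    in_controls: the subset of queried variants also seen in the controls,
    where control_observed is assumed to be a set of (gene, change) pairs.
    """
    queried = [v for v in reported if v.kind not in EXCLUDED_KINDS]
    in_controls = [v for v in queried if (v.gene, v.change) in control_observed]
    return queried, in_controls


if __name__ == "__main__":
    # Toy, made-up data: gene and variant names are placeholders, not real findings.
    reported = [
        Variant("GENE_A", "A100V", "missense"),
        Variant("GENE_A", "c.123delA", "indel"),
        Variant("GENE_B", "R200X", "nonsense"),
        Variant("GENE_B", "c.456+1G>A", "splice"),
    ]
    controls = {("GENE_B", "R200X")}
    queried, hits = query_controls(reported, controls)
    print(f"{len(queried)} of {len(reported)} variants queried, {len(hits)} found in controls")
    # prints: 2 of 4 variants queried, 1 found in controls
```

In the study, this kind of filter left 208 of the 280 reported variants to be queried; the toy data above only illustrate the logic and do not reproduce those numbers.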
SCN1A R1596C. This variant was previously reported as a de novo mutation in a patient with SIMFE (Severe Infantile Multifocal Epilepsy). Its de novo status and its location in a highly conserved region led to the conclusion that it was pathogenic. Now the same variant has been found in a single European American individual in the Exome Variant Server. There are several possible explanations for this observation. First, the EVS call might be a false positive, i.e. a variant was called even though it is not present. While this was a valid concern for high-throughput sequence data in the past, exome sequencing and variant-calling algorithms have improved considerably. Therefore, a purely technical artifact is possible, but not very likely. Second, an individual with severe epilepsy may have been included, either on purpose or by accident. Because the data are anonymized, this is impossible to trace back. Third, the mutation may not be pathogenic and may only represent a low-frequency mutation hotspot. The original variant reported by Harkin and collaborators was de novo, which excludes the possibility that the patient inherited a low-frequency population variant. The segregation of the EVS variant, however, cannot be assessed. In addition, the reported variant was found in an atypical phenotype (SIMFE) rather than in Dravet Syndrome. Whereas an SCN1A mutation can be found in >80% of patients with Dravet Syndrome, and novel technologies are identifying mutations even in previously mutation-negative patients, the likelihood that the disease has a different cause may be higher in atypical phenotypes.
EFHC1 – the chameleon. The EFHC1 gene is the big loser of the study by Cherepanova and colleagues. EFHC1 was initially reported in families with Juvenile Myoclonic Epilepsy and subsequently in other epilepsies, including recessive mutations in epileptic encephalopathies. Three reported EFHC1 mutations were found in EVS at low frequency. Still, each of these variants was present in 40–100 individuals, making a purely technical artifact unlikely. These three variants (F229L, P77T, R221H) had been reported in both affected and unaffected individuals in the initial study and might represent susceptibility variants. In addition, EFHC1 is the only gene in the study with multiple truncating mutations identified in EVS, found in four individuals in total. In summary, some reported epilepsy-associated EFHC1 variants turned out to be low-frequency population variants, and the spectrum of variants in the population also includes truncating mutations. These findings make EFHC1 mutations difficult to interpret, and our limited understanding of the gene's biology adds to the difficulty.
Conclusions. There are two main conclusions from the study by Cherepanova and colleagues. First, in known epilepsy genes such as SCN1A, some of the reported variants may need to be revisited. Assessing the de novo status of a putatively causative variant is virtually mandatory. SCN1A is the epilepsy gene with the highest number of reported mutations. Therefore, it will naturally be the first gene in which the additional complexity introduced by the available omics data becomes apparent. The situation is even more complex for the p.Arg1912X variant, which is present in 1000 Genomes and was previously reported in four patients with Dravet Syndrome. In two of these patients the variant was de novo, in one it was paternally inherited, and in the fourth its segregation was unknown. Accordingly, this variant may show variable penetrance, but it might also represent a low-frequency mutation hotspot. The second main conclusion concerns the spectrum of variants in epilepsy genes whose role is less certain. EFHC1 is a prime example of a gene in which distinguishing pathogenic from normal variation is difficult and no meaningful biological assay for epilepsy-related functional defects is available. Accordingly, novel concepts are needed to tell pathogenic variants from genomic noise in these genes.