A new beast. Rare genetic variants probably account for a significant fraction of the genetic liability to many common and rare disorders. Rare variants occupy the liability space between monogenic variants and common genetic variants. Their existence has often been postulated, and genetic investigations looking at copy number variants have elucidated some examples of rare variants. These rare variants appear to carry particular properties that are quite unexpected including the way that these variants run in families. Now, in a recent paper in the European Journal of Human Genetics, we have developed a model of the way rare variants behave in families. And there is a lot of misbehaving.
The odds ratio. Modern genetic studies use a case-control design, i.e. the frequency of genetic variants is compared between patients with disease and controls. A variant is associated with disease when it is found to be significantly more frequent in cases than in controls. One measure of this association is the odds ratio. In most situations applicable to epilepsy genetics, the odds ratio approximates the risk ratio. For example, a risk ratio of 5 means that an individual with this variant is five times more likely to develop the disease than an individual without this variant. So far, so good.
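To make the link between the two measures concrete, here is a minimal Python sketch. The function names and all numbers are hypothetical and chosen only for illustration; none of them come from the paper. It computes an odds ratio from case-control counts and shows that the odds ratio stays close to the risk ratio as long as the disease is uncommon in both groups.

```python
def odds_ratio_from_counts(case_carriers, case_noncarriers,
                           control_carriers, control_noncarriers):
    """Odds ratio from a 2x2 case-control table (cross-product ratio)."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


def risk_ratio(risk_carriers, risk_noncarriers):
    """Ratio of disease risks in carriers versus non-carriers."""
    return risk_carriers / risk_noncarriers


def odds_ratio_from_risks(risk_carriers, risk_noncarriers):
    """Odds ratio computed from the same two disease risks."""
    odds_carriers = risk_carriers / (1 - risk_carriers)
    odds_noncarriers = risk_noncarriers / (1 - risk_noncarriers)
    return odds_carriers / odds_noncarriers


# Hypothetical study: 12 of 1000 cases and 2 of 1000 controls carry the variant.
print(odds_ratio_from_counts(12, 988, 2, 998))

# Uncommon disease (0.5% baseline risk): odds ratio and risk ratio nearly agree.
print(risk_ratio(0.025, 0.005), odds_ratio_from_risks(0.025, 0.005))

# Common disease (10% baseline risk): the odds ratio overstates the risk ratio.
print(risk_ratio(0.5, 0.1), odds_ratio_from_risks(0.5, 0.1))
```

In the second example both measures are close to 5, whereas in the third the risk ratio is still 5 but the odds ratio is 9. This is why the approximation is only safe for diseases that are rare in both carriers and non-carriers.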
The land of rare variants. Odds ratios and risk ratios do not really align with some of our concepts for epilepsy genes. Traditionally, we have traced dominant epilepsy genes in large families. Some of the probands with microdeletions at 15q13.3, 15q11.2 and 16p13.11 came from such families, which gave us the opportunity to follow these variants in the respective families. At least for the 15q13.3 microdeletion, you would expect a clear segregation with the epilepsy phenotype, i.e. phenotypes and microdeletions run together. However, the picture seen in families with these variants is perplexing. For all three microdeletions, there is a colorful pattern that runs against our intuitive assumptions: there are unaffected carriers, patients with epilepsy not carrying the familial variant and de novo events in branches of families that otherwise look nicely monogenic. In brief, if you see the pedigrees, you think that you are looking at random noise. What has happened? You have just crossed the boundary to the land of rare genetic variants.
Genetic relevance. It is easy to dismiss rare genetic variants as risk factors that simply do not segregate, as opposed to the traditional monogenic variants that we know. However, this notion does not take into account that these variants are genetic risk factors and that some association with disease is likely to be seen in families. Therefore, we set out to develop a measure to describe the behavior of these variants in families. One component of this description is the concept of penetrance, i.e. the probability of having a phenotype if an individual carries the variant. For example, the penetrance of the IGE microdeletion triad 15q11.2, 16p13.11 and 15q13.3 is between 5% and 20%. The other component describing the trajectory of these variants in families needed to be developed: the probability that an affected family member carries the familial variant. In analogy to the concept of penetrance, I thought about calling this parameter “relevance”, but we eventually abandoned this idea. For the sake of simplification, however, I will use this term in this post. To summarize: Penetrance describes the probability of having disease if an individual has the genetic variant. Relevance describes the probability of having the variant if an individual is affected. For monogenic disorders, the issue of genetic relevance usually doesn’t come up. All affected family members are found to have the variant. We reserve the term phenocopies for the rare affected family member who does not carry the familial mutation. This might apply, for example, to a patient with epilepsy due to traumatic brain injury in a large GEFS+ family. In the land of rare variants, however, phenocopies are the rule, not the exception, and we thought that we could provide a framework to describe this.
Mating types. Ruth Ottman and I had pondered the issue of segregation of rare variants for almost two years before Sue Hodge came up with the suggestion to look at mating types using a two-locus model. In brief, we suggest that the phenotype in families with rare microdeletions is fully captured by the interaction of two genetic variants. These variants, G and H, are the rare variant and “everything else”. We are fully aware that lumping all the remaining risk into a single factor may not accurately describe the disease, but we needed a first approach to tackle this issue. Mating types are the combinations of all alleles for G and H in a family. We then modeled the distribution of all alleles of G and H in a family with two siblings. These possible combinations and their frequencies were then fed through a “penetrance filter”, i.e. an additional matrix that allowed us to determine how likely each combination of alleles was to result in disease. The penetrance of the rare variant was known, and the penetrance for the other variant and the combinations of all alleles could be derived from the other parameters. Therefore, we ended up with a constellation that allowed us to estimate the probability that an affected sibling of a variant-carrying proband also carries the variant. And this is a measure of genetic relevance.
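The general idea of enumerating mating types and passing them through a penetrance filter can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the model from the paper: G and H are treated as unlinked biallelic loci in Hardy-Weinberg equilibrium, carrying at least one copy counts as "carrier", and the allele frequencies and penetrance matrix are made-up numbers. The function names are hypothetical as well.

```python
from itertools import product


def genotype_probs(q):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of an allele with frequency q."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def child_carrier_prob(mother, father):
    """Probability that a child inherits at least one copy, given parental copy numbers."""
    return 1 - (1 - mother / 2) * (1 - father / 2)


def relevance(q_g, q_h, pen):
    """P(second sib carries G | first sib is affected and carries G, second sib is affected).

    pen[g][h] is an assumed probability of disease for carrier status
    g (rare variant G) and h (background factor H), each 0 or 1.
    """
    g_probs, h_probs = genotype_probs(q_g), genotype_probs(q_h)
    num = den = 0.0
    # A mating type is one combination of parental genotypes at both loci.
    for gm, gf, hm, hf in product(range(3), repeat=4):
        p_mating = g_probs[gm] * g_probs[gf] * h_probs[hm] * h_probs[hf]
        cg = child_carrier_prob(gm, gf)  # child carries G
        ch = child_carrier_prob(hm, hf)  # child carries H
        # "Penetrance filter": probability that a child is affected, split by G status.
        affected_carrier = cg * (ch * pen[1][1] + (1 - ch) * pen[1][0])
        affected_noncarrier = (1 - cg) * (ch * pen[0][1] + (1 - ch) * pen[0][0])
        # Given the parents, the two sibs are independent draws.
        num += p_mating * affected_carrier * affected_carrier
        den += p_mating * affected_carrier * (affected_carrier + affected_noncarrier)
    return num / den


# Illustrative penetrance matrix (rows: G status, columns: H status); not from the paper.
illustrative_pen = [[0.002, 0.05], [0.02, 0.30]]
print(round(relevance(0.001, 0.05, illustrative_pen), 3))
```

Two sanity checks help to see what the sketch does. If G has no effect on risk, the result stays close to 0.5, the value expected from transmission alone. If only G carriers can become affected, the result is 1, the monogenic situation without phenocopies.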
Segregation 2.0. Once we found a way to describe the probability that the affected sibling of a variant-carrying proband carries the familial variant, we could plot this measure against the odds ratio. One important ingredient of this calculation is lambda, the sib recurrence risk. In brief, if a disease has a high sib recurrence risk, the disease is likely to have a strong genetic contribution and the relative contribution of a rare variant is actually reduced. “Everything else” is stronger in these situations and the sibling of a proband is more likely to have the disease for reasons other than the familial variant. Therefore, we have stratified the calculation by lambda. First, let’s look at the infamous 15q13.3 variant. With an odds ratio estimate of 68, this variant will occur in >90% of affected siblings (Figure). When critically reviewing the existing pedigree data, we find that this estimate is compatible with the current pedigrees once de novo variants are removed. For rare variants with a smaller odds ratio, the situation gets interesting. For example, variants with an odds ratio of 2 are only found in ~65% of affected siblings. Taking into account that 50% transmission is expected by chance, this is an increase of only 15 percentage points, i.e. up to 1/3 of all affected sibs will not carry the variant. This is what we see in the pedigrees of patients with 15q11.2 microdeletions. There is also a strong effect of lambda, i.e. in a more genetic disease, the variant loses some of its “relevance”.
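A back-of-the-envelope version of this relationship can be written down in Python. This is a deliberately crude sketch and not the calculation from the paper: it assumes that the variant was inherited from a single heterozygous parent (so every sib has a 50% chance of carrying it), it treats the odds ratio as a risk ratio, and it ignores “everything else” entirely, so it cannot show the effect of lambda. The function name is hypothetical.

```python
def carrier_fraction(risk_ratio, transmission=0.5):
    """Fraction of affected sibs expected to carry the familial variant.

    Assumes carriers have `risk_ratio` times the disease risk of non-carrier
    sibs and that each sib inherits the variant with probability `transmission`.
    """
    carrier_weight = transmission * risk_ratio
    noncarrier_weight = (1 - transmission) * 1.0
    return carrier_weight / (carrier_weight + noncarrier_weight)


for rr in (1, 2, 68):
    print(rr, round(carrier_fraction(rr), 3))
```

With a risk ratio of 1 the sketch returns 0.5, with 2 it returns 0.667 and with 68 it returns 0.986. These values are in the same range as the ~65% and >90% quoted above, but they should be read as a plausibility check only, since the stratification by lambda is missing.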
Hold your colour. This track by Pendulum, a Perth-based drum ’n’ bass combo, is one of my top energizers when I almost fall asleep in front of the computer. And putting this paper together and getting it published took quite some energy. This paper originated as a simple sketch on a napkin at the pool bar during the European Congress of Epileptology in Rhodes in 2010 and has kept us busy ever since. I remember several Skype teleconferences with my baby daughter snuggled into the baby carrier (she wouldn’t sleep otherwise back then), editors who invited us to submit and then rejected our paper, as well as countless hours trying to understand complicated Bayesian formulae in Excel tables. Also, there was a bad period in the winter of 2010 when we had to cancel several consecutive teleconferences due to technical problems with Skype. I would like to thank Ruth and Sue for their patience and enthusiasm during this project. It was great fun and I think we need more projects like this.
The Epilepsy Genetic Epidemiology Programme. There won’t be an easy answer to the genetics of epilepsies, even in multiplex families. Even though massively parallel sequencing technologies carry great potential to identify monogenic variants, recent studies in autism, the paradigmatic neurodevelopmental disorder where research is always one step ahead, suggest that causative variants can be identified in as few as 10% of patients. The rest is complex genetics, and there is no reason to believe that the genetic architecture of the epilepsies should be different. Claims that the “bulk of causative genetic variants” can be identified through genome sequencing almost appear delusional in 2012. Multiplex families, however, offer an interesting possibility to generate models for the interaction of rare genetic variants in families. These models can then be compared to the existing genetic data in these families. I believe that this is a promising lead for future research. Anybody interested?