What is missing? The catchy term “missing heritability” refers to a long-standing issue in human genetics that is particularly relevant to common diseases that are thought to have complex genetic architecture. Even though we know several thousand risk factors for common diseases, the sum of all these risk factors only explains a small proportion of the genetic risk for disease. Where is all the remaining genetic disease risk hidden? A recent publication in PLOS Genetics suggests that known association peaks in genome-wide association studies (GWAS) may harbor more than one risk variant, turning GWAS peaks into mountain ranges. This publication also provides an interesting state-of-the-art review of the role of common and rare variants with respect to missing heritability. Let’s turn back the clock and start with the decade-old debate on common versus rare variant models of human disease.
Common versus rare. The field of genetic epidemiology is a hybrid of two areas that have traditionally used very different concepts. Human genetics usually deals with causative genetic variants that explain or almost explain a particular phenotype. Epidemiology, in contrast, often deals with risk factors. For example, smoking is a risk factor for various cancers. It does not determine the disease on its own, but adds to the overall risk. The analysis of risk factors comes with its own terminology including odds ratios, relative risks, confidence intervals and, in some cases, quite a lot of statistics. These two different concepts – causative mechanisms versus statistical risk – collide when genetics and epidemiology are put together. And this conflict broke out a decade ago, when the discussion arose about the best way forward to analyze human disease. In principle, for most common disorders, we have little idea what the genetic architecture might be like. On the one hand, most of the risk may come from common variants, single nucleotide polymorphisms (SNPs) that are present in the general population, but are slightly more common in patients. Alternatively, a combination of rare variants might be very important. These discussions may almost become ideological, as the decision for one or the other alternative is important for study design. For the analysis of rare variants, deep sequencing technologies are required. The analysis of common variants, in contrast, requires very large patient cohorts and only limited genotyping. With the advent of high-throughput sequencing, rare variant studies were given preference after the first wave of genome-wide association studies gave us only limited information with respect to genetic architecture.
SNP-based heritability. Common diseases, in contrast to rare diseases such as epileptic encephalopathies, are unlikely to have a single common and strong genetic risk factor. Genetic risk factors for common diseases may either be rare and strong or common and weak. Either way, large sample sizes are needed to identify these risk factors. In epilepsy research, Idiopathic (Genetic) Generalized Epilepsy (IGE/GGE) and Temporal Lobe Epilepsy (TLE) are usually assumed to have the genetic architecture of common diseases. Even though we currently live in the era of next-gen sequencing, these technologies have not yet had their major breakthrough in the analysis of common diseases. In contrast, genome-wide association studies (GWAS) have had an unlikely comeback. Currently, there is some evidence that the total sum of common genetic variants may explain a significant proportion of the genetic risk for common diseases and that this model might also be applicable to neurodevelopmental disorders. However, the statistical methods applied to produce these results are blind to the identity of the contributing variants: they simply state that there is a collective contribution of all common variants.
Mountain ranges. Gusev and collaborators asked the following question: based on the observations that (a) common variants in general contribute significantly to the overall heritability and (b) known association hits in specific genes only contribute very little, might it be possible that other variants in the proximity of association peaks also contribute to disease risk? Let’s take one step back to explain this a bit more thoroughly: If a common genetic variant X is a strong risk factor for a disease, neighboring genetic markers in the human genome also appear associated with the disease, as these variants are likely to be inherited together with X. With increasing distance, recombination of the genome over evolutionary time is increasingly likely, and the statistical connection between the markers drops. Nevertheless, some association peaks in GWAS can have relatively broad shoulders. The relevance of the markers on the slopes of the association mountains is usually ignored and attributed to the risk of the main marker. However, given the missing heritability, could some of these markers contribute independently?
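A toy calculation of the shoulders. The drop in statistical connection with distance can be illustrated with the textbook expectation that the linkage disequilibrium coefficient D shrinks by a factor of (1 - c) per generation, where c is the recombination fraction between two markers. The sketch below is only an illustration under strong assumptions (random mating, no selection or drift, constant allele frequencies), and the function names, allele frequency and number of generations are hypothetical choices, not values from the publication.

```python
def ld_after_generations(d0, recomb_fraction, generations):
    """Expected LD coefficient D after a number of generations of random mating."""
    return d0 * (1.0 - recomb_fraction) ** generations


def r_squared(d, freq_a, freq_b):
    """Squared correlation between two biallelic markers, given D and allele frequencies."""
    return d * d / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))


def decay_profile(freq=0.2, generations=1000,
                  recomb_fractions=(0.0, 1e-4, 1e-3, 1e-2)):
    """r^2 between a risk variant and markers at increasing recombination distance.

    Assumption: both markers share the same allele frequency and start in
    perfect LD, so the starting D equals freq * (1 - freq).
    """
    d0 = freq * (1 - freq)
    return [
        r_squared(ld_after_generations(d0, c, generations), freq, freq)
        for c in recomb_fractions
    ]


if __name__ == "__main__":
    fractions = (0.0, 1e-4, 1e-3, 1e-2)
    for c, r2 in zip(fractions, decay_profile(recomb_fractions=fractions)):
        print(f"recombination fraction {c:g}: r^2 = {r2:.3f}")
```

Markers that rarely recombine with the risk variant keep a high r^2 and form the shoulders of the peak, while more distant markers lose the signal.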
Local heritability. In parallel to studies using variance-component analyses to estimate SNP-based heritability, Gusev and collaborators estimated the local heritability at known GWAS loci for nine phenotypes analyzed in the Wellcome Trust Case Control Consortium (WTCCC). In various analyses, they find that the local heritability estimated through SNPs in the region exceeds the heritability explained by the main marker, suggesting multiple risk markers in these regions. The authors then go on to show that this risk is likely due to common and not rare variants. In summary, there is some evidence that some missing heritability may be explained by the risk of additional common variants in proximity to the main association peak.
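A toy locus with two risk variants. To make the comparison between the main marker and the whole region concrete, here is a minimal simulation sketch. It is not the variance-component method used in the publication: it assumes a quantitative trait instead of case-control data, a locus with only two SNPs in partial linkage disequilibrium, and made-up effect sizes. All names and parameters are hypothetical. The variance explained by the best single SNP is compared with the variance explained by both SNPs jointly, using the standard two-predictor formula for R^2.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_locus(n=5000, freq=0.3, effect=0.3, seed=1):
    """Return (variance explained by lead SNP, variance explained by both SNPs)."""
    rng = random.Random(seed)
    snp1, snp2, trait = [], [], []
    for _ in range(n):
        g1 = g2 = 0
        for _ in range(2):  # two haplotypes per individual
            a1 = rng.random() < freq
            # Assumption: the second allele copies the first half of the time,
            # which puts the two SNPs in partial LD.
            a2 = a1 if rng.random() < 0.5 else rng.random() < freq
            g1 += a1
            g2 += a2
        snp1.append(g1)
        snp2.append(g2)
        # Both SNPs carry an independent effect on the trait.
        trait.append(effect * g1 + effect * g2 + rng.gauss(0.0, 1.0))

    r1, r2 = pearson(snp1, trait), pearson(snp2, trait)
    r12 = pearson(snp1, snp2)
    lead = max(r1 * r1, r2 * r2)
    # R^2 of a regression on both SNPs, expressed through the correlations.
    joint = (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)
    return lead, joint


if __name__ == "__main__":
    lead, joint = simulate_locus()
    print(f"lead SNP explains {lead:.3f}, both SNPs explain {joint:.3f}")
```

In this setup, the lead SNP already absorbs part of its neighbor's effect through linkage disequilibrium, yet the region as a whole still explains more variance than the lead SNP alone, which is the pattern the local heritability estimates point to.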
Lessons for EuroEPINOMICS. Admittedly, we don’t perform genome-wide association studies in EuroEPINOMICS. Nonetheless, it is important to keep an eye on what is happening in the field of common variant research, as we might soon be asking similar questions. For example, when looking for the genetic cause in the 70% of patients with epileptic encephalopathies without explanatory de novo mutations, related concepts might soon be discussed. This might be particularly true as the sample size of sequenced trios increases, which will allow us to perform this kind of statistical analysis in epileptic encephalopathies.