My untested assumption. Recently, I have boasted quite a bit about the power of the trio design, i.e. the inclusion of patients and parents in the analysis of rare genetic variants. Rare variants, in contrast to monogenic variants that arise de novo, are usually transmitted from unaffected parents and are the big unknown of modern-day genetic studies. Much of the missing heritability may be accounted for by rare variants, but distinguishing these variants from genomic noise is difficult. Power calculations for association studies usually suggest that thousands, if not tens of thousands, of patients are necessary to identify these variants with sufficient statistical certainty, a sample size that the field of epilepsy research may never reach. So what about switching to parent-offspring trios? Would this help us? Follow me on a brief statistical journey through the land of rare variants.
TDT. In the early days of association studies, many papers used the so-called Transmission Disequilibrium Test (TDT). This is a statistical test for genotyped parent-offspring trios, which assesses the “over-transmission” of an allele to affected offspring. Over-transmission basically means that a parent who carries both the risk allele and its non-risk counterpart passes the risk allele on to the affected child more often than the non-risk allele. For example, if an A-allele in CACNA1H is the risk variant, it would be transmitted more than 50% of the time to affected children. For neutral alleles, in contrast, the average transmission rate would be 50%. Personally, I find the concept of the TDT mind-blowing, as I somehow have the surreal mental image that the genetic risk factors actively make the decision to over-segregate and then crowd into the affected offspring. To me, the concept of “over-transmission” has some eerie active connotation that I really struggle with. However, there is no active decision involved. If the allele is a risk allele, we expect a higher frequency in cases and a lower frequency in unaffected parents. Therefore, the allele needs to over-segregate to make up for the difference in frequency. But how can we compare the TDT with association studies?
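To make the idea of over-transmission concrete, here is a minimal sketch of the classical TDT statistic, which compares how often heterozygous parents transmit versus do not transmit the allele of interest. The counts (60 transmitted, 40 untransmitted) and the function name `tdt` are hypothetical and only serve as an illustration; the p-value uses the usual chi-square approximation with one degree of freedom.

```python
import math

def tdt(transmitted, untransmitted):
    """Classical TDT: (b - c)^2 / (b + c), chi-square with 1 degree of freedom.

    transmitted   -- times heterozygous parents passed on the allele of interest
    untransmitted -- times they passed on the other allele instead
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, not taken from any real study
chi2, p_value = tdt(transmitted=60, untransmitted=40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under the null hypothesis, both counts are expected to be equal, so the statistic only grows when one allele is passed on more often than the other.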
TDT vs. case control. For quite a while, I had difficulties understanding how to translate the TDT into a measure we use in association studies. Association studies investigate a cohort of cases and compare allele frequencies to a control cohort. Measures derived from association studies are usually the odds ratio and/or the risk ratio. I finally came across a paper in the International Journal of Epidemiology by Ahsan and collaborators that helped me fill in the gaps. In brief, the ratio of transmitted to untransmitted alleles can be used as an approximation for the risk ratio. This means that for an allele that is transmitted to the affected offspring 10 out of 12 times, the risk ratio is 10/(12-10) = 5. And this risk ratio of 5 is the value I used for my small simulation experiment. This value roughly corresponds to the risk conferred by the 16p13.11 and 15q11.2 microdeletions and is probably at the lower end of risk estimates for rare variants that we will be able to pin down.
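The arithmetic behind this approximation can be written down in a few lines. This is only a sketch of the ratio described above; the function names are my own, and the reverse step (from risk ratio back to an expected transmission rate) is simple algebra on the same ratio rather than something taken from the paper.

```python
def risk_ratio_from_transmissions(transmitted, informative):
    """Approximate risk ratio as transmitted / untransmitted alleles."""
    untransmitted = informative - transmitted
    if untransmitted <= 0:
        raise ValueError("need at least one untransmitted allele")
    return transmitted / untransmitted

def transmission_rate_from_risk_ratio(risk_ratio):
    """Invert the ratio: if T/U = RR, then T/(T+U) = RR / (1 + RR)."""
    return risk_ratio / (1 + risk_ratio)

rr = risk_ratio_from_transmissions(transmitted=10, informative=12)
rate = transmission_rate_from_risk_ratio(rr)
print(f"risk ratio = {rr:.1f}, expected transmission rate = {rate:.3f}")
```

In other words, a risk ratio of 5 corresponds to the risk allele being passed on in roughly 5 out of 6 informative transmissions, compared to 1 out of 2 for a neutral allele.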
The comparison. I simulated case-control studies and trio studies for different sample sizes using a risk ratio of 5 and a risk variant frequency of 1% in patients. For the case-control studies, I used a simple Fisher's exact test; for the trios, I used a one-sample proportion test against an expected transmission rate of 50%. Then I plotted the sample size against the p-value. I took the liberty of using the negative logarithm of the p-value to get larger numbers, as in a Manhattan plot: the higher the number, the more significant the p-value. What does this comparison look like?
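A comparison of this kind can be sketched as follows. This is not the original simulation: it plugs in expected counts instead of drawing random samples, it assumes that the variant frequency in controls is the patient frequency divided by the risk ratio (0.2%), it assumes that the number of untransmitted alleles in the trios equals that expected control count, and it swaps the one-sample proportion test for an exact two-sided binomial test. All function names are hypothetical.

```python
import math

def binomial_two_sided_p(transmitted, informative):
    """Exact two-sided binomial test against a 50% transmission rate."""
    k = max(transmitted, informative - transmitted)
    tail = sum(math.comb(informative, i) for i in range(k, informative + 1))
    return min(1.0, 2 * tail / 2 ** informative)

def fisher_two_sided_p(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided Fisher's exact test on a 2x2 carrier table."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = math.comb(total, n_cases)

    def pmf(k):
        # Hypergeometric probability of k carriers among the cases
        return (math.comb(carriers, k)
                * math.comb(total - carriers, n_cases - k) / denom)

    observed = pmf(case_carriers)
    lo, hi = max(0, carriers - n_controls), min(carriers, n_cases)
    probs = [pmf(k) for k in range(lo, hi + 1)]
    # Sum all tables that are at most as likely as the observed one
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

def compare(n, case_freq=0.01, risk_ratio=5.0):
    """Return (-log10 p for n trios, -log10 p for n cases vs. n controls)."""
    transmitted = round(case_freq * n)                  # carriers among patients
    untransmitted = round(case_freq / risk_ratio * n)   # assumed pseudo-controls
    trio_p = binomial_two_sided_p(transmitted, transmitted + untransmitted)
    cc_p = fisher_two_sided_p(transmitted, n, untransmitted, n)
    return -math.log10(trio_p), -math.log10(cc_p)

for n in (500, 1000, 2000, 5000):
    trio, case_control = compare(n)
    print(f"n = {n:5d}  trios: {trio:5.2f}  case-control: {case_control:5.2f}")
```

Using expected counts keeps the sketch deterministic; a proper simulation would draw the carrier counts at random and average over many replicates.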
More of the same. I was surprised to find that there is not really much of a difference. The statistical significance obtained from 1000 trios is approximately the same as that obtained from 1000 cases compared to 1000 controls. For everybody who knows how time-consuming it is to recruit trios, this is a major disappointment. Since control populations are readily available for most studies, a complete trio yields about as much statistical information as a single patient. But is this really true?
Using available data. The growing body of data on trio exome sequencing in the epilepsy field will collectively hit the 1000 trio mark later this year or early next year. This data has been generated and is available in principle. In contrast to case-control studies, trio association studies are resistant to population stratification, which is still a major concern for rare variants. Population stratification refers to differences in allele frequencies between the populations that are compared in an association study. For example, if you perform an association study comparing cases from Northern Europe to controls from Southern Europe, the association study will probably find significant differences in variant frequencies of the lactase gene. This difference has nothing to do with the disease; it simply reflects a variant with a North-South gradient. For rare variants that only occur in a small fraction of a cohort, these issues might be even more important and impossible to disentangle. Trio studies might help in this respect, as transmitted and untransmitted alleles come from the same parents. In an upcoming post, I will try to find a way to assess the statistical power to identify different variants.
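The lactase example can be turned into a small numeric sketch. The carrier counts below are hypothetical and chosen only to show the mechanism: a variant with no effect on disease still produces an odds ratio well above 1 when cases and controls come from populations with different variant frequencies.

```python
def odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Odds ratio for carrying the variant in cases versus controls."""
    case_odds = case_carriers / (n_cases - case_carriers)
    control_odds = control_carriers / (n_controls - control_carriers)
    return case_odds / control_odds

# Hypothetical numbers: 1000 cases from the North, 1000 controls from the South.
# The variant is neutral, but 30% of Northerners and 10% of Southerners carry it.
spurious = odds_ratio(case_carriers=300, n_cases=1000,
                      control_carriers=100, n_controls=1000)
print(f"odds ratio from stratification alone: {spurious:.2f}")

# Within a trio, a neutral variant is still transmitted 50% of the time,
# regardless of how common it is in the family's population of origin.
expected_neutral_transmission = 0.5
```

The case-control comparison picks up the frequency gradient, while the within-family comparison does not, because each child is compared only to the alleles its own parents could have transmitted.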