# The genetic architecture toolkit – modeling polygenic disease with rare variants

Architecture. Even though we often write about novel gene findings in the epilepsies, we assume that most epilepsies are complex genetic or polygenic. Polygenic inheritance suggests the genetic architecture is composed of multiple interacting genetic risk factors, each contributing a small proportion to the disease risk. However, when using the phrase genetic architecture, sometimes I am not quite sure what I actually mean by this. For example, how many genes are needed? This is why I wanted to build a model genetic architecture and explore what happens if we build a genetic disease solely from rare risk variants. Follow me to a brief back-of-the-envelope calculation of how this might work.

Rare variants. With the advent of exome sequencing, we have discovered the plethora of rare genetic variants contained within our genome. It is assumed that these variants might be particularly relevant in complex genetic disorders, but pinpointing individual variants is difficult, as very large sample sizes would be needed. Our model genetic architecture is made up of such rare variants, each present in 1% of individuals and conferring a relative risk (RR) of 5, which for an uncommon disease is roughly equivalent to the odds ratio (OR). The first question is: how many of these variants does an individual need to be affected? The straightforward answer is that we don’t know. However, we can set a probability threshold, assuming that an individual is affected once the variants add up to a risk of 95%.

Figure. A histogram showing the distribution of 130 rare variants in a population of 10,000 individuals. The x-axis shows the number of variants per individual; the y-axis shows the number of people with a certain number of risk variants. The number of 130 variants was chosen as it leads to ~1% of the population having 5 or more variants, which is the number an individual needs to be affected with a probability of 95% or higher. In summary, 130 variants are needed for a 1% disease with 1% variants that have an odds ratio of 5.

The magic 2000. We would assume that the variants add up nicely and that two variants with a relative risk of 5 each would confer a 25-fold risk (5×5). The probability threshold corresponds to the penetrance, i.e. the probability that a carrier is affected, which can be calculated from the increase in risk and the population frequency of the disease. Our model disease affects 1% of the population. In a nutshell, an RR or OR of 5 only results in a very small penetrance of ~5% (a 1% baseline risk increased five-fold). For a penetrance of 95%, an individual needs 4-5 of these variants. Translated back, you would arrive at a penetrance of 95% with a single variant if this variant had an OR of ~2000. This is way beyond any rare risk factor identified so far and is in the range of what you would see in monogenic disorders. To summarize, each individual would need 5 or more variants to be affected.
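
The penetrance arithmetic above can be sketched in a few lines of Python. This is only one way to do the back-of-the-envelope calculation: it assumes that each variant multiplies the baseline odds (rather than the raw probability) by the OR, so that the result always stays below 100%. The function names `penetrance`, `variants_needed` and `single_variant_or` are made up for this illustration.

```python
def penetrance(odds_ratio, n_variants=1, prevalence=0.01):
    """Probability of being affected when carrying n_variants risk variants.

    Assumption (not stated in the text): each variant multiplies the
    baseline odds of disease by odds_ratio.
    """
    baseline_odds = prevalence / (1 - prevalence)
    odds = baseline_odds * odds_ratio ** n_variants
    return odds / (1 + odds)


def variants_needed(odds_ratio, target=0.95, prevalence=0.01):
    """Smallest number of variants that reaches the target penetrance."""
    n = 0
    while penetrance(odds_ratio, n, prevalence) < target:
        n += 1
    return n


def single_variant_or(target=0.95, prevalence=0.01):
    """Odds ratio a single variant would need to reach the target penetrance."""
    target_odds = target / (1 - target)
    baseline_odds = prevalence / (1 - prevalence)
    return target_odds / baseline_odds


if __name__ == "__main__":
    print(f"Penetrance of one OR-5 variant: {penetrance(5):.3f}")
    print(f"Penetrance of four OR-5 variants: {penetrance(5, 4):.3f}")
    print(f"Penetrance of five OR-5 variants: {penetrance(5, 5):.3f}")
    print(f"OR-5 variants needed for 95%: {variants_needed(5)}")
    print(f"Single-variant OR needed for 95%: {single_variant_or():.0f}")
```

Under this odds-based assumption, four variants fall somewhat short of 95% and five exceed it, which matches the "4-5 variants" estimate, and the single-variant odds ratio comes out just under 2000.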

The population. Given that each of these variants only occurs in 1% of individuals, how many of these variants are needed in the population to make 1% of all individuals affected? If we play around with binomial distributions that show the overall distribution of multiple variants with a probability of 1% each, we find the following: we would need ~130 variants in our model population for 1% of individuals to carry 5 or more of them, which is the number an individual needs to be affected. To conclude: in a simple model genetic architecture, we need 130 variants with a frequency of 1% each and an OR of 5 each to have 1% of the population affected. If our model genetic architecture consisted of variants with an OR of 2, we would need ~480 of these variants.
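
For readers who want to play around with the binomial distributions themselves, here is a minimal sketch using only the Python standard library. It assumes that the variants are independent and that each individual carries any given variant with a probability of 1%. The helper names `tail_probability` and `variants_in_population` are hypothetical, and the threshold of 11 variants for an OR of 2 is my own assumption, derived by multiplying odds until a risk of 95% is reached.

```python
from math import comb


def tail_probability(n_variants, threshold, freq=0.01):
    """P(X >= threshold) for X ~ Binomial(n_variants, freq).

    X is the number of risk variants carried by one individual,
    assuming independent variants with the same carrier frequency.
    """
    below = sum(
        comb(n_variants, k) * freq**k * (1 - freq) ** (n_variants - k)
        for k in range(threshold)
    )
    return 1 - below


def variants_in_population(threshold, target=0.01, freq=0.01):
    """Smallest number of variants for which at least `target` of the
    population carries `threshold` or more of them."""
    n = threshold
    while tail_probability(n, threshold, freq) < target:
        n += 1
    return n


if __name__ == "__main__":
    # OR of 5: an individual needs 5 or more variants.
    print(f"P(>=5 of 130 variants): {tail_probability(130, 5):.4f}")
    print(f"Variants needed (threshold 5): {variants_in_population(5)}")
    # OR of 2: assumed threshold of 11 variants.
    print(f"P(>=11 of 480 variants): {tail_probability(480, 11):.4f}")
```

Both tail probabilities land close to 1%, in line with the ~130 and ~480 variants quoted above. Changing `freq` or the threshold shows how quickly the required number of variants grows for rarer or weaker risk factors.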

From model to reality. Even though this little model has many flaws, I believe it is interesting to look at genetic architecture from this perspective. Many of the risk variants that make up the polygenic architecture of the epilepsies might be rarer and weaker with respect to the odds ratio. However, these brief calculations might give you a benchmark on the number of variants that can be expected.