# Initialisation of the genetic system

## Introduction

The genetic control of individual functional traits is modelled by an additive linear relationship between the allelic effect (i.e. "allele dose") and the phenotypic value of the trait (Liu, 1998). In the simulation, the genetic part of each trait is determined by a given initial number of loci. The contribution of a locus to the phenotype is proportional to the effects of its alleles, which do not change during the course of the simulation, and to the frequencies of those alleles, which do change during the course of the simulation.

The contribution of each locus to the phenotypic value of the individual is independent of the other loci, i.e. there is no epistasis between loci. However, loci can be linked with a user-defined recombination fraction.

Each locus initially has 2 alleles, and this number is kept constant during the simulation. This means that there is neither mutation nor immigration of new alleles at a particular locus. However, gene flow of the known alleles per locus between populations can be simulated.

To obtain the actual phenotypic value of a trait, a random (environmental) component is added to the value determined by the genetic system. A user-defined initial heritability determines the additive genetic variance as a fraction of the total phenotypic variance. During the course of the simulation genetic variation can be lost, resulting in a reduction of the heritability of the trait.
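The variance split implied by the initial heritability can be sketched as below. This is a minimal illustration, assuming the narrow-sense definition h² = V_A / V_P and, for the second step, an environmental variance that stays fixed while additive variance is lost; the function names `variance_components` and `heritability` are hypothetical and not part of ForGEM.

```python
def variance_components(h2, var_p):
    """Split the total phenotypic variance var_p into additive genetic (V_A)
    and environmental (V_E) parts, given the initial heritability h2."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    var_a = h2 * var_p          # V_A = h2 * V_P
    var_e = var_p - var_a       # V_E = (1 - h2) * V_P
    return var_a, var_e


def heritability(var_a, var_e):
    """Narrow-sense heritability h2 = V_A / (V_A + V_E)."""
    return var_a / (var_a + var_e)


# Initial state: h2 = 0.4 of a phenotypic variance of 10
var_a, var_e = variance_components(0.4, 10.0)
# Assumption: V_E stays fixed while the additive variance is halved
print(heritability(var_a / 2, var_e))  # prints 0.25
```

Losing half of the additive variance thus lowers the heritability from 0.4 to 0.25 in this example.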

Thus, the following aspects of the genetic model need to be quantified for the initialisation of the ForGEM model:

a. the initial frequencies of the alleles for each locus contributing to the phenotypic values of the trait

b. the initial genetic and non-genetic variances

c. the allelic effects or 'allele dose' of each allele

If measured or otherwise observed values for these aspects are not available, they are determined by statistical methods during the initialisation of the model. These methods are described below. Observed values can always be used to overrule the statistically derived values.

Different evolutionary forces, such as selection, random genetic drift, migration and mutation, act upon these frequencies and modify them through time. These genetic processes are all modelled in detail in the simulation.

## Initial allele frequencies

For polymorphic loci in trees, the distribution of allele frequencies over all loci that determine a trait is typically U-shaped. This means that alleles are either very common (allele frequency approaching unity) or very rare (allele frequency approaching zero), and rarely have a frequency in the population of around 0.5 (e.g. Hamrick, 2004; Chakraborty et al., 1980). In the absence of observations on the frequency distribution of alleles of adaptive traits with which to initialise the genetic model, the equilibrium allele frequency distribution of neutral traits is used (Crow and Kimura, 1970).

This equilibrium distribution of allele frequencies (*x*) can be expressed as (Nei, 1987, p. 367):

$$\phi(x) = \frac{\Gamma(M+M')}{\Gamma(M)\,\Gamma(M')}\,(1-x)^{M-1}\,x^{M'-1} \qquad (1)$$

where:

*M = 4Ne v*

*M' = M / (k - 1)*

*Ne* is the effective population size, *v* is the mutation rate per locus and per generation, *k* is the number of alleles per locus, and Γ() is the gamma function.

*M* can also be estimated from the average heterozygosity (*H*). If a large number of loci are examined, then: *M = H/(1-H)*
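As a worked example, the values used in Figure 1 (*H* = 0.25, *k* = 2) give the parameters of Eqn. 1 directly from the definitions above:

$$M = \frac{H}{1-H} = \frac{0.25}{0.75} = \frac{1}{3}, \qquad M' = \frac{M}{k-1} = \frac{1}{3}$$

$$\phi(x) = \frac{\Gamma(2/3)}{\Gamma(1/3)^2}\,(1-x)^{-2/3}\,x^{-2/3} \approx 0.189\,\bigl[x(1-x)\bigr]^{-2/3}$$

Both exponents are negative, so *φ(x)* diverges at *x* = 0 and *x* = 1, which produces the U shape. For *k* = 4 the same *H* gives *M'* = 1/9, so the exponent of *x* becomes -8/9 and more of the mass lies close to *x* = 0.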

Figure 1 presents an example of the shape of this equation for different values of H and k.

*Figure 1. Example of the equilibrium distribution of allele frequencies in a population for neutral traits, with H = 0.25 and k = 2 (typical values for isozyme data).*

Note that in Figure 1 the number of loci is unspecified, as Eqn. 1 represents the distribution of allele frequencies over a very large number of loci. To arrive at initial allele frequencies for an actual number of loci (e.g. 5), it is most convenient to calculate the cumulative distribution of φ(x) and to determine the allele frequencies for the actual number of loci at quantile values of that cumulative distribution, i.e. at every 20% quantile in the case of 5 loci.

To obtain the cumulative distribution of *φ(x)*, *φ(x)dx* is numerically integrated between 0 and 1 (the extremes themselves are excluded because *φ(x)* → ∞ as *x* → 0 or *x* → 1). This gives the cumulative distribution function *p*:

$$p(x) = \int_0^x \phi(t)\,dt \qquad (2)$$

To compute the allele frequencies for a given number of loci, the inverse of the integral of φ(x) is required. This inverse can be obtained by linear interpolation after evaluating φ(x) over a large number of x values.

As an example, to choose the 5 initial allele frequencies (nLoci = 5; k = 2 or k = 4), equally spaced points are selected in the first half of the cumulative distribution of *φ(x)*; the other half is determined by the other allele. Examples of Eqn. 2 for nLoci = 5 and k = 2 or k = 4 are presented in Figure 2.

*Figure 2. Cumulative distribution of φ(x) and relative allele frequencies (x). Dots indicate interpolated values for a 5-locus genetic system (the allele frequencies > 0.5 are 1 minus the allele frequencies < 0.5).*

These 5 points are the 10th, 20th, 30th, 40th and 50th percentiles of the cumulative distribution. For H = 0.25 and k = 2, the corresponding allele frequencies are 0.006, 0.044, 0.141, 0.299 and 0.499. In this way, 5 allele frequencies are obtained that take into account the natural distribution of frequencies, with many loci with low (or high) allele frequencies and few loci with allele frequencies around 0.5.
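The quantile procedure described above (numerical integration of φ(x), then inversion by linear interpolation) can be sketched in Python as follows. This is a sketch under stated assumptions, not the ForGEM implementation: the log-spaced grid, the cut-off `eps`, the leading-order approximation of the excluded tail mass and all function names are hypothetical choices made for illustration.

```python
import bisect
import math


def phi(x, M, Mp):
    """Equilibrium density of Eqn. 1 at allele frequency 0 < x < 1 (Mp is M')."""
    log_norm = math.lgamma(M + Mp) - math.lgamma(M) - math.lgamma(Mp)
    return math.exp(log_norm + (M - 1) * math.log1p(-x) + (Mp - 1) * math.log(x))


def cumulative_phi(M, Mp, eps=1e-12, n=4000):
    """Tabulate p(x), the integral of phi from 0 to x, on a grid excluding 0 and 1."""
    # Log-spaced points from eps to 0.5, mirrored to cover 0.5 to 1 - eps
    step = (math.log(0.5) - math.log(eps)) / n
    lower = [math.exp(math.log(eps) + i * step) for i in range(n + 1)]
    xs = lower + [1.0 - x for x in reversed(lower[:-1])]
    dens = [phi(x, M, Mp) for x in xs]
    # Leading-order mass of the excluded tails (0, eps) and (1 - eps, 1)
    norm = math.exp(math.lgamma(M + Mp) - math.lgamma(M) - math.lgamma(Mp))
    left_tail = norm * eps ** Mp / Mp
    right_tail = norm * eps ** M / M
    ps = [left_tail]
    for i in range(1, len(xs)):
        # Trapezoidal rule on the non-uniform grid
        ps.append(ps[-1] + 0.5 * (dens[i - 1] + dens[i]) * (xs[i] - xs[i - 1]))
    total = ps[-1] + right_tail  # close to 1; dividing removes small numerical error
    return xs, [p / total for p in ps]


def inverse_cumulative(target, xs, ps):
    """Invert p(x) by linear interpolation between tabulated points."""
    i = bisect.bisect_left(ps, target)
    if i == 0:
        return xs[0]
    if i == len(ps):
        return xs[-1]
    x0, x1, p0, p1 = xs[i - 1], xs[i], ps[i - 1], ps[i]
    return x0 + (target - p0) * (x1 - x0) / (p1 - p0)


def initial_allele_frequencies(n_loci, H, k):
    """Allele frequencies at equally spaced quantiles of the first half of p(x)."""
    M = H / (1.0 - H)   # estimate of M from the mean heterozygosity
    Mp = M / (k - 1)    # M'
    xs, ps = cumulative_phi(M, Mp)
    targets = [0.5 * (i + 1) / n_loci for i in range(n_loci)]  # 0.1, ..., 0.5 for 5 loci
    return [inverse_cumulative(t, xs, ps) for t in targets]
```

Calling `initial_allele_frequencies(5, 0.25, 2)` returns five increasing frequencies that can be compared with the values quoted above; the second allele at each locus then has frequency 1 - x. The grid is log-spaced because φ(x) diverges at both ends, where a uniform grid would need very many points to capture the mass.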

## Initialising allelic effects

For the ForGEM model, it is necessary to assign allelic effects to each of the alleles that make up the genotype of an individual tree. Allelic effects are kept constant during the entire simulation. If information is lacking on the actual number of loci, the number of alleles and the allelic effects that determine quantitative phenotypic traits, a statistical approach is taken. This is done by designing, for each trait, a genotype distribution over the population such that the observed mean and variance of the phenotypic trait in the population are attained, under the constraint that the allele frequencies follow the U-shaped initial distribution. If information becomes available on the QTLs or candidate genes of the phenotypic traits considered, this statistical procedure can be replaced by actual data on the genetic make-up of these traits for a particular population.

The approach followed in ForGEM to obtain the observed mean phenotypic value is:

- assign initially arbitrary allelic effects of *i* = +1 and *j* = -1 to each of the alleles
- calculate the mean and variance under the constraint of the U-shaped distribution of allele frequencies
- scale the allelic effects such that the distribution of phenotypic values over all possible genotypes is normalised (mean equals zero, variance equals unity)
- multiply by the standard deviation and add the mean of the functional trait in question

The mean and variance of a genotype are:

This assignment of +1 and -1 values can be done for all alleles in a multi-locus, two-allele system.

The following steps are made to arrive at a mean of zero, and a variance of unity for the whole population.

First, the expectation is made zero by applying an offset to the sum of the individual effects.

This leads to a large number of possible allelic values. Arbitrarily, the first combination of allelic effects that yields the lowest expectancy (*m*) is selected. In the example above this is:

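| | a | b | c | d | e | A | B | C | D | E | m | var | c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| frequency (q for a to e, p for A to E) | 0.006 | 0.044 | 0.141 | 0.299 | 0.499 | 0.994 | 0.956 | 0.859 | 0.701 | 0.501 | | | |
| allelic effect | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 3.0306 | 2.5001 | -1.0028 |

| | a | b | c | d | e | A | B | C | D | E | m | var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| frequency | 0.006 | 0.044 | 0.141 | 0.299 | 0.499 | 0.994 | 0.956 | 0.859 | 0.701 | 0.501 | | |
| allelic effect | 0.002848 | 0.002848 | 0.002848 | 0.002848 | 0.002848 | -0.002848 | -0.002848 | -0.002848 | -0.002848 | -0.002848 | -0.008606 | 0.000041 |

| | a | b | c | d | e | A | B | C | D | E | m | var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| frequency | 0.006 | 0.044 | 0.141 | 0.299 | 0.499 | 0.994 | 0.956 | 0.859 | 0.701 | 0.501 | | |
| allelic effect | 0.447213595 | 0.447213595 | 0.447213595 | 0.447213595 | 0.447213595 | -0.447213595 | -0.447213595 | -0.447213595 | -0.447213595 | -0.447213595 | 0 | 1 |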

The population values can then be obtained by multiplying by the observed standard deviation and adding the observed mean.

The allelic effect thus depends on the number of loci.
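The normalisation step can be illustrated with a small sketch. It assumes, as one possible reading of the procedure above, that the genotypic value is the sum of one allelic effect per locus and that every possible allele combination is weighted equally rather than by allele frequency; under that assumption the scaled effects come out as ±1/√nLoci, i.e. ±0.447213595 for 5 loci, as in the last table. The function names `standardise_effects` and `phenotype_from_effects` are hypothetical.

```python
import itertools
import math


def standardise_effects(n_loci, effects=(1.0, -1.0)):
    """Offset and scale the two allelic effects so that genotypic values over all
    possible allele combinations have mean 0 and variance 1.

    Assumption: one allele per locus enters the sum, and every combination is
    weighted equally (not by allele frequency).
    """
    values = [sum(combo) for combo in itertools.product(effects, repeat=n_loci)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    offset = mean / n_loci         # per-allele offset that brings the mean to zero
    scale = 1.0 / math.sqrt(var)   # shifting does not change the variance
    return tuple((e - offset) * scale for e in effects)


def phenotype_from_effects(summed_effect, trait_mean, trait_sd):
    """Map a standardised genotypic value onto the observed trait scale."""
    return trait_mean + trait_sd * summed_effect


e_plus, e_minus = standardise_effects(5)
print(round(e_plus, 9), round(e_minus, 9))  # 0.447213595 -0.447213595
```

Because the enumeration grows as 2^nLoci, a version for many loci would compute the mean and variance analytically instead of listing every combination.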

## Initialising phenotypic values

For each diploid individual and each locus, two random draws are made, using the allele frequencies above as probabilities. Each draw results in a particular allele.
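A minimal sketch of these draws is given below, assuming Hardy-Weinberg proportions (the two draws at a locus are independent) and ignoring linkage. The allele labels "A"/"a", the per-locus dictionary of effects and the function names are hypothetical. Following the additive model of the Introduction, the genotypic value is the sum of the effects of both alleles at every locus.

```python
import random


def draw_genotype(freqs_A, rng):
    """Draw two alleles per locus for one diploid individual.

    freqs_A[l] is the frequency of allele "A" at locus l; allele "a" has
    frequency 1 - freqs_A[l]. Each draw is independent of the other.
    """
    return [tuple("A" if rng.random() < p else "a" for _ in range(2)) for p in freqs_A]


def genotypic_value(genotype, effects):
    """Sum the allelic effects of both alleles at every locus (additive model)."""
    return sum(effects[l][allele] for l, pair in enumerate(genotype) for allele in pair)


rng = random.Random(42)  # seeded for reproducibility
freqs_A = [0.994, 0.956, 0.859, 0.701, 0.501]  # frequencies of A to E in the example
effects = [{"A": -0.447213595, "a": 0.447213595}] * len(freqs_A)
genotype = draw_genotype(freqs_A, rng)
print(genotype, genotypic_value(genotype, effects))
```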