# Initial allele frequencies

For polymorphic loci in trees, the distribution of allele frequencies over all loci that determine a trait is typically U-shaped. This means that alleles are either very common (allele frequency approaching unity) or very rare (allele frequency approaching zero), but rarely have a frequency in the population of around 0.5 (e.g. Hamrick, 2004; Chakraborty et al., 1980). In the absence of observations on the frequency distribution of alleles of adaptive traits to initialise the genetic model, the equilibrium allele frequency distribution of neutral traits is used (Crow and Kimura, 1970).

This equilibrium distribution of allelic frequencies (*x*) can be expressed as (Nei, 1987, p. 367):

$$\phi(x) = \frac{\Gamma(M+M')}{\Gamma(M)\,\Gamma(M')}\,(1-x)^{M-1}\,x^{M'-1} \qquad (1)$$

where:

*M = 4Ne v*

*M' = M / (k - 1)*

*Ne* is the effective population size, *v* is the mutation rate per locus and per generation, *k* is the number of alleles per locus, and Γ() is the gamma function.

*M* can also be estimated from the average heterozygosity (*H*). If a large number of loci are examined, then: *M = H/(1-H)*
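The definitions above can be turned into a small numerical sketch. The function names `m_from_heterozygosity` and `phi` are hypothetical and chosen only for illustration; the code assumes *M* is estimated from *H* as described and evaluates Eqn. 1 directly with the standard-library gamma function.

```python
import math


def m_from_heterozygosity(H):
    """Estimate M from the average heterozygosity, M = H / (1 - H)."""
    return H / (1.0 - H)


def phi(x, M, k):
    """Equilibrium density of allele frequency x (Eqn. 1), for 0 < x < 1."""
    M_prime = M / (k - 1)  # M' = M / (k - 1)
    norm = math.gamma(M + M_prime) / (math.gamma(M) * math.gamma(M_prime))
    return norm * (1.0 - x) ** (M - 1.0) * x ** (M_prime - 1.0)


M = m_from_heterozygosity(0.25)  # H = 0.25, the value used in Figure 1
for x in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"x = {x:4.2f}  phi(x) = {phi(x, M, k=2):.3f}")
```

For *k* = 2 the two exponents are equal, so the density is symmetric around 0.5 and is lowest there, which is the U-shape described above.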

Figure 1 presents an example of the shape of this equation for different values of H and k.

*Figure 1. Example of the equilibrium distribution of allele frequencies in a population for neutral traits. H = 0.25 and k = 2 (typical values for isozyme data).*

Note that in Figure 1 the number of loci is undetermined, as Eqn. 1 represents the distribution of allele frequencies over a very large number of loci. To arrive at initial allele frequencies for an actual number of loci (e.g. 5), it is most convenient to calculate the cumulative distribution of *φ(x)* and to determine the allele frequencies at the quantile values of that cumulative distribution, i.e. at every 20% quantile in the case of 5 loci.

To obtain the cumulative distribution of *φ(x)*, *φ(x)dx* is numerically integrated between 0 and 1 (the extremes are excluded because *φ(x)*→∞ when x→0 or x→1).

The cumulative distribution function, *p*, of *φ(x)* is then computed up to each frequency *x'*.
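The two expressions referred to here are not reproduced in the text. One form consistent with the description is sketched below. It assumes a small cutoff ε at each end of the interval and normalisation by the truncated integral *A*; both symbols are introduced here for illustration only. Taking the second expression to be the Eqn. 2 cited further on is likewise an assumption.

$$A = \int_{\varepsilon}^{1-\varepsilon} \phi(x)\,dx$$

$$p(x') = \frac{1}{A}\int_{\varepsilon}^{x'} \phi(x)\,dx \qquad (2)$$

With this normalisation *p* rises from 0 at *x'* = ε to 1 at *x'* = 1 - ε, so its quantiles can be read off directly.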

To compute the allele frequencies for a given number of loci, the inverse of the integral of *φ(x)* is required, where *φ(x)* is the distribution of allele frequencies of all loci in a population. This inverse can be obtained by linear interpolation after evaluating *φ(x)* over a large number of x-values.
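The numerical integration and inverse interpolation described above might look roughly as follows. This is a minimal sketch rather than the model's own implementation: the midpoint rule on a uniform grid, the grid size `n_cells` and the function names are all assumptions made for illustration. Evaluating *φ(x)* at cell midpoints keeps the endpoints 0 and 1 out of the calculation.

```python
import math
from bisect import bisect_left


def phi(x, M, M_prime):
    """Equilibrium density of allele frequency x (Eqn. 1), for 0 < x < 1."""
    norm = math.gamma(M + M_prime) / (math.gamma(M) * math.gamma(M_prime))
    return norm * (1.0 - x) ** (M - 1.0) * x ** (M_prime - 1.0)


def initial_frequencies(H, k, quantiles, n_cells=100_000):
    """Allele frequencies at the given quantiles of the cumulative phi(x).

    n_cells is an assumed grid resolution, not a value from the text.
    """
    M = H / (1.0 - H)
    M_prime = M / (k - 1)
    h = 1.0 / n_cells

    # cumulative[i] approximates the integral of phi from 0 to i * h,
    # using midpoints so that x = 0 and x = 1 are never evaluated.
    cumulative = [0.0]
    for i in range(n_cells):
        cumulative.append(cumulative[-1] + phi((i + 0.5) * h, M, M_prime) * h)

    # Normalise so that the cumulative distribution ends at exactly 1.
    total = cumulative[-1]
    cumulative = [c / total for c in cumulative]

    # Invert by linear interpolation between neighbouring grid points.
    freqs = []
    for q in quantiles:
        j = min(max(bisect_left(cumulative, q), 1), n_cells)
        lo, hi = cumulative[j - 1], cumulative[j]
        frac = (q - lo) / (hi - lo)
        freqs.append((j - 1 + frac) * h)
    return freqs


freqs = initial_frequencies(H=0.25, k=2, quantiles=[0.1, 0.2, 0.3, 0.4, 0.5])
print([round(f, 3) for f in freqs])
```

Because the density is unbounded near 0 and 1, the lowest quantiles are sensitive to the grid resolution and to how the extremes are excluded, so the digits obtained this way may differ slightly from values computed with another scheme.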

As an example, to choose 5 initial allelic frequencies (nLoci = 5; k = 2 or k = 4), equally spaced points in the first half of the cumulative distribution of *φ(x)* are selected; the other half is determined by the other allele. Examples of Eqn. 2 for nLoci = 5 and k = 2 or k = 4 are presented in Figure 2.

*Figure 2. Cumulative distribution of φ(x) and relative allelic frequencies (x). Dots indicate interpolated values for a 5-locus genetic system (the allele frequencies > 0.5 are 1 minus the allele frequencies < 0.5).*

These 5 points are the 10th, 20th, 30th, 40th and 50th percentiles of the cumulative distribution. For H = 0.25 and k = 2, their relative frequencies are 0.006, 0.044, 0.141, 0.299 and 0.499. In this way, 5 allelic frequencies are obtained that take into account the natural distribution of frequencies, with many loci having low (or high) allele frequencies and few loci having allele frequencies around 0.5.