The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Currently there are no user-friendly software that can handle mixed-ploidy datasets for haplo-diploid animals (e.g. ants, bees, wasps) aiming to compute basic population genetic statistics. Researchers working with such animals have either 1) sequenced only females or 2) transform the male genotypes from haploid to diploid. These solutions do not come without challenges, focusing only in females is easier when working with social organisms (e.g. honeybees, ants, social wasps) but more difficult when working with solitary ones (e.g. 70% of all bee species are solitary). Furthermore neglecting males can hinder the capacity to better understand some questions such as sex-biased dispersion. The second solution involves changing the genotype and allele frequencies, biasing any population genetic statistic.
We developed a R framework to compute some population genetic statistics of haploid-diploid data without any data transformation. Furthermore we relaxed Hardy-Weinber model in order to accommodate for the differences in ploidy in the population and the the particular reproductive cycle of haplo-diploids (diploids mate with haploids to produce other diploids but produce haploids through asexual reproduction).
This repository contains: 1. R scripts to estimate population genetic summary statistics from haplo-diploid systems (Script_folder) 2. Fake data (.vcf and .txt) to test scripts (Fake_data) 3. A theoretical explanation of the model (see below)
R/HaploDip_GenoFreq.R: Functions to calculate genotype
frequencies and FIS (not agnostic to ploidy)R/Fst_HapDip.R: Functions to calculate FST
from allele frequency data (agnostic to ploidy)R/Haplo-Dip_Sex_Ref_Allele.R: Functions to calculate
reference allele frequencies (0) in each sex across populations
(agnostic to ploidy)R/HaploDip_Neis_H.R: Functions to calculate Nei’s H, a
genetic diversity metric (agnostic to ploidy)unfinished/Watternson_Theta.R: script to calculate
Watterson’s Theta (agnostic to ploidy) unfinishedR/helper_functions.R: Helper function for loading
dataImport vcf data
vcf.data <- vcf2GT("path/to/input.vcf")
Create matrix with sites as rows and samples as columns
gt_matrix <- vcf.data$gt_matrix
contig/scafold/chromosome names
contigs <- vcf.data$contig_vector
site positions
pos <- vcf.data$positions
Import population file with two columns: 1st with sample names
(equal to gt_matrix columns) and 2nd with population names. Column names
must be ID and Pop
Pop.File <- read.csv("path/to/input.txt")
sanity check
colnames(gt_matrix) == Pop.File$ID
Calculate Nei’s H by population within an adjustable (e.g. 10k
bp) window size, within each contig
Neis_h <- compute_Hs_W(geno.data = gt_matrix, pop.file = Pop.File, contigs = contigs, positions = pos, window.size = 10000)
Summarise Nei’s H. For each population returns mean and sd,
weighted by the number of sites at each window
summary_H_pop <- summarize_NeisH(Neis_h)
In a haplo-diploid system where one sex is haploid and another diploid (e.g. hymentoptera insects) and assuming 1. equal sex ratio 2. random mating 3. infinite population size (no drift) 4. equal probability of reproduction (no selection), 5. alleles don`t change from individual to gene pool (no mutation) 6. no addition of gametes from other gene pools (no gene flow) 7. equal allele frequencies across sexes 8. no overlapping generations
Let p be the frequency of the allele “A” and q be the frequency of
the allele “a” of a bi-allelic marker so that p + q = 1, then:
f(AA) = p² x 1/2 = p²/2
f(aa) = q² x 1/2 = q²/2
f(Aa) = 2pq x 1/2 = pq
f(A) = p x 1/2 = p/2
f(a) = q x 1/2 = q/2
f(AA) + f(Aa) + f(aa) + f(A) + f(a) = p²/2 + pq + q²/2 + p/2 + q/2
=
= 1/2 x (p² + 2pq + q² + p + q) =
= 1/2 x (p² + 2pq + q²) + 1/2 x (p + q) =
= 1/2 x (p + q)² + 1/2 x (1) =
= 1/2 x (1)² + 1/2 =
= 1/2 + 1/2 =
= 1
Let N(X) be the number of individuals with genotype X, and N.dip and
N.haplo being the number of diploid and haploid individuals
respectively, then:
p = ( 2x N(AA) + N(Aa) + N(A) ) / (2x N.dip + N.haplo)
q = ( 2x N(aa) + N(Aa) + N(a) ) / (2x N.dip + N.haplo
In order to calculate genotype frequencies on next generation:
if offspring is diploid, then:
| Mating pair | Offspring genotype distribution | Mating pair frequency
| |————-|———————————|———————–| | AA x A | AA (100%) | 2x ( f(AA) x f(A)
) | | AA x a | Aa (100%) | 2x ( f(AA) x f(a) ) | | Aa x A | Aa (50%) +
AA (50%) | 2x ( f(Aa) x f(A) ) | | Aa x a | Aa (50%) + aa (50%) | 2x (
f(Aa) x f(a) ) | | aa x A | Aa (100%) | 2x ( f(aa) x f(A) ) | | aa x a |
aa (100%) | 2x ( f(aa) x f(a) ) |
if offspring is haploid, then:
| Parent genotype | Offspring genotype distribution | Parent frequency |
|—————–|———————————|——————| | AA | A (100%) | f(AA) | | Aa | A (50%) + a
(50%) | f(Aa) | | aa | a (100%) | f(aa) |
The frequency of each genotype of next generation are:
f(AA)t+1 = 2x (f(AA) x f(A)) + 2x (f(Aa) x f(A))/2 = 2x (p²/2 x p/2) +
(pq x p/2) = 2x p³/4 + p²q/2 = p³/2 + p²q/2 = p²/2 x (p + q) = p²/2 x 1
= p²/2
f(aa)t+1 = 2x (f(aa) x f(a)) + 2x (f(Aa) x f(a))/2 = 2x (q²/2 x q/2) + (pq x q/2) = 2x q³/4 + q²p/2 = q³/2 + q²p/2 = q²/2 x (q + p) = q²/2 x 1 = q²/2
f(Aa)t+1 = 2x (f(AA) x f(a)) + 2x (f(Aa) x f(a))/2 + 2x (f(Aa) x f(A))/2 + 2x(f(aa) x f(a)) = 2x(p²/2 x q/2) + (pq x q/2) + (pq x p/2) + 2x(q²/2 + p/2) = = p²q/2 + pq²/2 + p²q/2 + pq²/2 = 1/2 x (p²q + pq² + p²q + pq²) = 1/2 x (2p²q + 2pq²) = 2pq/2 x (p + q) = pq x (1) = pq
f(A)t+1 = f(AA) + f(Aa)/2 = p²/2 + pq/2 = p/2 x (p + q) = p/2 x (1) = p/2
f(a)t+1 = f(aa) + f(Aa)/2 = q²/2 + pq/2 = q/2 x (p + q) = q/2 x (1) = q/2
| Genotype | Frequency at t0 | Frequency at t+1 |
|---|---|---|
| AA | p²/2 | p²/2 |
| Aa | pq | pq |
| aa | q²/2 | q²/2 |
| A | p/2 | p/2 |
| a | q/2 | q/2 |
Since, the genotype frequencies in t+1 are the same as in t0, no evolution as occurred.
Let f be the probability of a individual’s allele be identical by descent with its other allele and (1-f) the probability that is not. Let XX’ be the genotype frequency of XX under inbreeding
AA’ = fp + (1-f) x p²/2 = fp + p²/2 -fp²/2 = p²/2 + fp(1-p/2) aa’ = fq + (1-f) x q²/2 = fq + p²/2 -fq²/2 = q²/2 + fq(1-q/2) A’ = p/2 and a’ = q/2 Aa’ = 1 - (AA’+ aa’ + A’ + a’) = 1 - [p²/2 + fp(1-p/2) + q²/2 + fq(1-q/2) + p/2 + q/2] = 1 - [p²/2 + q²/2 + p/2 + q/2 + fp + fq - fp²/2 - fq²/2] = 1 - [(1 - pq) + f(-p²/2 -q²/2 + p + q)] = 1 - [1- pq + f(1 - p²/2 - q²/2)] = pq - f(1 - p²/2 - q²/2)
| Genotype | Frequency with inbreeding |
|---|---|
| AA | p²/2 + fp(1-p/2) |
| Aa | pq - f(1- p²/2 -q²/2) |
| aa | q²/2 + fq(1-p/2) |
| A | p/2 |
| a | q/2 |
| Genotype | f = 0 | f = 1 |
|---|---|---|
| AA | p²/2 + 0 | p |
| Aa | pq - 0 | -p/2 -q/2 |
| aa | q²/2 + 0 | q |
| A | p/2 | p/2 |
| a | q/2 | q/2 |
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.