Alright guys, welcome to the tutorial/breakdown on IPRB, or Introduction to Mendelian Inheritance.
This one is a bit tricky and involves some math, which I will lay out for you guys as clearly as possible.
Given Problem
Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.
Example input: 2 2 2
Example output: 0.78333
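As a sanity check on the problem statement, the random-mating process can also be simulated directly. This sketch is not part of the original solution: `simulate_iprb`, the genotype strings "AA"/"Aa"/"aa" and the trial count are assumptions made for illustration. It draws two distinct organisms, lets each parent pass one random allele to the child, and counts how often the child carries a dominant allele:

```python
import random


def simulate_iprb(k, m, n, trials=100_000, seed=0):
    """Estimate the IPRB probability by simulating random matings.

    Assumption (hypothetical model): the population is a list of genotype
    strings, two distinct organisms are drawn without replacement, and each
    parent passes one of its two alleles to the child with equal probability.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    population = ["AA"] * k + ["Aa"] * m + ["aa"] * n
    dominant = 0
    for _ in range(trials):
        parent1, parent2 = rng.sample(population, 2)  # two distinct organisms
        child = rng.choice(parent1) + rng.choice(parent2)  # one allele from each parent
        if "A" in child:  # at least one dominant allele -> dominant phenotype
            dominant += 1
    return dominant / trials


print(simulate_iprb(2, 2, 2))  # an estimate near the example output 0.78333
```

With enough trials the estimate lands close to the example output; the exact formula worked out later in this post removes the sampling noise.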
Source Code
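```python
def runIprb(inputFile):
    with open(inputFile, 'r') as fi:  # open the dataset file
        activityFile = fi.readline().split()  # read the single line "k m n" and split it into three strings
    k, m, n = float(activityFile[0]), float(activityFile[1]), float(activityFile[2])
    total = k + m + n  # size of the whole population

    # probability of drawing each parent pairing that can produce a recessive (aa) child
    r_r = (n / total) * ((n - 1) / (total - 1))  # aa x aa
    d_d = (m / total) * ((m - 1) / (total - 1))  # Aa x Aa
    d_r = (m / total) * (n / (total - 1)) + (n / total) * (m / (total - 1))  # Aa x aa, in either draw order

    # probability of a recessive child, then its complement
    k_total = r_r + d_d * 1/4 + d_r * 1/2
    k_total = 1 - k_total
    return str(k_total)
```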
Alright let’s start breaking this down!
Data Input and Filtering
```python
def runIprb(inputFile):
    with open(inputFile, 'r') as fi:  # open the dataset file
        activityFile = fi.readline().split()  # read the single line "k m n" and split it into three strings
    k, m, n = float(activityFile[0]), float(activityFile[1]), float(activityFile[2])
    total = k + m + n
```
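So here we have the initial data entry and filtering process. As usual, the function receives the location of the dataset file through the “inputFile” variable. The file is opened in the second line, and in the third line its single line of data is read and split into a list of three strings. The fourth line converts those strings into floats so that the divisions later on are floating-point divisions (this matters in Python 2, where dividing two integers truncates the result).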
The first item in the list (k) is the number of homozygous dominant individuals in the population, the second (m) is the number of heterozygous individuals, and the third (n) is the number of homozygous recessive individuals.
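If you want to try the parsing step without a file, here is a minimal sketch that works on a string instead. `parse_counts` is a hypothetical helper, not part of the original code; it returns integers rather than floats, which is fine in Python 3 because `/` always performs true division:

```python
def parse_counts(text):
    """Split a dataset line like "2 2 2" into the three counts (k, m, n).

    Hypothetical helper: reads from a string rather than a file and rejects
    input that does not contain exactly three numbers.
    """
    fields = text.split()  # splits on any whitespace, including the trailing newline
    if len(fields) != 3:
        raise ValueError(f"expected 3 numbers, got {len(fields)}")
    k, m, n = (int(field) for field in fields)
    return k, m, n


k, m, n = parse_counts("2 2 2\n")
print(k, m, n)  # 2 2 2
```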
Remember, the goal is to find the probability that the offspring of two randomly chosen members has the dominant phenotype, which requires at least one dominant allele.
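To see where the "at least one dominant allele" rule and the 1/4 and 1/2 factors used later come from, you can enumerate the Punnett square for a single pair of parents. This is a small illustrative sketch; the genotype strings and `dominant_offspring_prob` are assumptions, not part of the original code:

```python
from fractions import Fraction
from itertools import product


def dominant_offspring_prob(parent1, parent2):
    """Probability that a child of two genotypes (e.g. "Aa", "aa") carries at
    least one dominant allele "A", found by enumerating the 2x2 Punnett square."""
    squares = [a + b for a, b in product(parent1, parent2)]  # 4 equally likely children
    dominant = sum(1 for child in squares if "A" in child)
    return Fraction(dominant, len(squares))


for pair in [("AA", "aa"), ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]:
    print(pair, dominant_offspring_prob(*pair))
```

It reports 1 for AA × aa, 3/4 for Aa × Aa, 1/2 for Aa × aa and 0 for aa × aa, so the chance of a recessive (aa) child for those pairings is 0, 1/4, 1/2 and 1 respectively.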
The “total” value (k + m + n, the size of the whole population) is used in the probability formulas below.
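```python
r_r = (n / total) * ((n - 1) / (total - 1))  # homozygous recessive x homozygous recessive
d_d = (m / total) * ((m - 1) / (total - 1))  # heterozygous x heterozygous
d_r = (m / total) * (n / (total - 1)) + (n / total) * (m / (total - 1))  # heterozygous x homozygous recessive, either draw order
```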
So, here we go, the main gist of the program, the math behind figuring out the various probabilities.
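These three lines define the probability of drawing each parent combination that can produce a homozygous recessive child. In the variable names, r stands for a homozygous recessive parent and d for a heterozygous parent (one that carries a dominant allele): r_r is recessive × recessive, d_d is heterozygous × heterozygous, and d_r is heterozygous × recessive. Pairings that involve a homozygous dominant parent are left out, because every child of such a parent carries a dominant allele.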
Here are the formulas we use in this problem:
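$$ r_r = \frac{n}{\text{total}} \cdot \frac{n-1}{\text{total}-1} $$
$$ d_d = \frac{m}{\text{total}} \cdot \frac{m-1}{\text{total}-1} $$
$$ d_r = \frac{m}{\text{total}} \cdot \frac{n}{\text{total}-1} + \frac{n}{\text{total}} \cdot \frac{m}{\text{total}-1} $$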
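In these equations:
m = number of heterozygous organisms
n = number of homozygous recessive organisms
total = k + m + n, the size of the whole population
The first factor in each product is the chance of the first pick. The second factor uses total − 1 (and n − 1 or m − 1 when both parents come from the same group) because the second organism is drawn from those that remain. d_r has two terms because the heterozygous parent can be picked either first or second.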
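The formulas can be checked with exact fractions instead of floats. The sketch below uses a hypothetical helper, `pairing_probabilities`, built on Python's `fractions.Fraction`; it applies the same without-replacement logic to every unordered pairing of genotype classes. The pairings not covered by r_r, d_d and d_r all include a homozygous dominant parent, and all six probabilities add up to 1:

```python
from fractions import Fraction


def pairing_probabilities(k, m, n):
    """Exact probability of each unordered pair of genotype classes when two
    distinct organisms are picked without replacement.

    Hypothetical labels: "AA" = homozygous dominant (k), "Aa" = heterozygous (m),
    "aa" = homozygous recessive (n).
    """
    total = k + m + n
    counts = {"AA": k, "Aa": m, "aa": n}
    classes = list(counts)
    probs = {}
    for i, first in enumerate(classes):
        for second in classes[i:]:
            if first == second:
                # same class: the second pick comes from the remaining count - 1
                p = Fraction(counts[first], total) * Fraction(counts[first] - 1, total - 1)
            else:
                # different classes: either one can be picked first, so two ordered terms
                p = (Fraction(counts[first], total) * Fraction(counts[second], total - 1)
                     + Fraction(counts[second], total) * Fraction(counts[first], total - 1))
            probs[(first, second)] = p
    return probs


probs = pairing_probabilities(2, 2, 2)
print(probs[("aa", "aa")], probs[("Aa", "Aa")], probs[("Aa", "aa")])  # 1/15 1/15 4/15
print(sum(probs.values()))  # 1
```

Here ("aa", "aa"), ("Aa", "Aa") and ("Aa", "aa") correspond to r_r, d_d and d_r.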
So these formulas give the probability of drawing each of the three parent pairings that can produce a recessive child, based on the composition of the parent population.
Now we need to figure out the probability that two randomly selected organisms have offspring with at least one dominant allele (two dominant alleles count as well).
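```python
k_total = r_r + d_d * 1/4 + d_r * 1/2  # probability of a homozygous recessive child
k_total = 1 - k_total  # complement: probability of at least one dominant allele
```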
Here goes the last bit of math. The easiest route to k_total is through its complement. First, add up the probability of getting a homozygous recessive child: all of r_r (aa × aa always gives aa), a fourth of d_d (Aa × Aa gives aa one time in four) and half of d_r (Aa × aa gives aa half the time). Subtracting that sum from 1 then gives the probability that the offspring carries one or two copies of the dominant allele.
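Plugging the example input k = m = n = 2 (so total = 6) into these formulas by hand reproduces the expected output:

$$ r_r = \frac{2}{6}\cdot\frac{1}{5} = \frac{1}{15}, \qquad d_d = \frac{2}{6}\cdot\frac{1}{5} = \frac{1}{15}, \qquad d_r = \frac{2}{6}\cdot\frac{2}{5} + \frac{2}{6}\cdot\frac{2}{5} = \frac{4}{15} $$
$$ 1 - \left(\frac{1}{15} + \frac{1}{4}\cdot\frac{1}{15} + \frac{1}{2}\cdot\frac{4}{15}\right) = 1 - \frac{4 + 1 + 8}{60} = 1 - \frac{13}{60} = \frac{47}{60} \approx 0.78333 $$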
```python
return str(k_total)
```
After that, just return the result as a string. Hope this helps!