Welcome to the first Rosalind Tutorial!
In the DNA problem, the objective is to walk through a DNA string and count how many times each of the four bases (nucleotides) occurs.
Input : a string of DNA
Output : four integers giving the counts of ‘A’, ‘C’, ‘G’ and ‘T’, in that order
Example:
Input : AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Output : 20 12 17 21
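Before walking through any solution, the example can be checked by hand with a few lines of Python. This is only a quick sanity check using the built-in `str.count`, not the approach this tutorial takes; the variable names are mine.

```python
# Sample dataset from the example above.
dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# str.count returns how many times a substring occurs in the string.
# Iterating over "ACGT" keeps the counts in the order the problem asks for.
counts = [dna.count(base) for base in "ACGT"]

# Join the four numbers with single spaces, matching the expected output format.
result = " ".join(str(n) for n in counts)
print(result)  # 20 12 17 21
```

The four counts add up to 70, which is the length of the sample string, so every character is accounted for.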
The problem statement is clear enough on its own, but let's walk through the way that I solved it, step by step. Here is the entire code:
def runDna(inputFile):
    fi = open(inputFile, 'r')  # open the input file for reading
    inputData = fi.read()      # read the whole file into one string
    aCount, gCount, tCount, cCount = 0, 0, 0, 0
    for k in inputData:
        if k == "A":
            aCount += 1
        if k == "G":
            gCount += 1
        if k == "T":
            tCount += 1
        if k == "C":
            cCount += 1
    # report the counts in the order the problem asks for: A, C, G, T
    return (str(aCount) + " " + str(cCount) + " " + str(gCount) + " " + str(tCount))
Let's start by taking a look at line 1 of the code.
Setup
def runDna(inputFile):
I have set up each of the Rosalind problems as a function in a library that I am building, so that the solutions can be downloaded as one package. Each function is given the name/location of its input file, which lives outside the library but in a location known to the script running the interface (check out “runLib” to understand where it pulls its data from).
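To make the "function plus file path" idea concrete, here is a minimal sketch of a caller handing a path to the function. It does not show how “runLib” works; the temporary folder, the file name `rosalind_dna.txt` and the short test string are assumptions made up for this illustration, and the function body is a condensed copy of the tutorial's logic.

```python
import os
import tempfile

def runDna(inputFile):
    # Condensed copy of the tutorial's function, so this example runs on its own.
    fi = open(inputFile, 'r')
    inputData = fi.read()
    fi.close()
    aCount, gCount, tCount, cCount = 0, 0, 0, 0
    for k in inputData:
        if k == "A":
            aCount += 1
        if k == "G":
            gCount += 1
        if k == "T":
            tCount += 1
        if k == "C":
            cCount += 1
    return (str(aCount) + " " + str(cCount) + " " + str(gCount) + " " + str(tCount))

# Hypothetical caller: create a small input file, then pass only its path.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "rosalind_dna.txt")  # made-up file name
    with open(path, "w") as out:
        out.write("AACGTT\n")  # made-up test string, not a Rosalind dataset
    answer = runDna(path)

print(answer)  # 2 1 1 2
```

The function never needs to know where the file came from, which is what lets the library and the data live in separate places.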
Data Input and Initialization
fi = open(inputFile, 'r')  # open the input file for reading
inputData = fi.read()      # read the whole file into one string
aCount, gCount, tCount, cCount = 0, 0, 0, 0
The first two lines are the standard way to read in a data file. Later on, there will be instances where we need more complicated ways to input data, typically by filtering the input and giving it more structure, but in this initial assignment, reading the entire file into a single variable is sufficient.
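One thing the two-line version leaves out is closing the file. A common alternative is a `with` block, sketched below; the helper name `read_dna` and the throwaway file are assumptions for illustration and are not part of the library described here.

```python
import os
import tempfile

def read_dna(path):
    """Return the file's contents as one string, minus surrounding whitespace."""
    # The with-block closes the file automatically, even if read() raises.
    with open(path, "r") as handle:
        return handle.read().strip()

# Demonstration with a throwaway file standing in for a downloaded dataset.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AGCTTTTCATTC\n")
    tmp_path = tmp.name

sequence = read_dna(tmp_path)
os.remove(tmp_path)  # clean up the throwaway file
print(sequence)  # AGCTTTTCATTC
```

Stripping the trailing newline is optional for this problem, since a newline matches none of the four bases and would not be counted anyway.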
The third line initializes four counters, one for each of the four base letters in DNA: “A”, “G”, “T” and “C”. These will be used in the following for loop.
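For comparison, the same four counters could be kept in a single dictionary instead of four separate variables. This is an alternative sketch, not the approach used in this tutorial, and the short test string is made up.

```python
# One dictionary entry per base, each starting at zero.
counts = {"A": 0, "C": 0, "G": 0, "T": 0}

for base in "GATTACA":  # made-up test string
    # Skip anything that is not one of the four bases (for example a newline).
    if base in counts:
        counts[base] += 1

# Report the counts in the order the problem asks for: A, C, G, T.
summary = " ".join(str(counts[base]) for base in "ACGT")
print(summary)  # 3 1 1 2
```

Four separate variables are just as valid for a problem this small; the dictionary mainly saves typing once the alphabet grows.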