Welcome to the first Rosalind Tutorial!

In the DNA problem, the objective is to walk through a DNA string and count how many times each of the four bases (nucleotides) occurs.
Input : a string of DNA
Output : four integers giving the counts of ‘A’, ‘C’, ‘G’ and ‘T’, in that order

Example:
Input    : AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Output    : 20 12 17 21
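
As a quick sanity check on the example, here is a minimal sketch that counts the bases with Python's built-in str.count. This is not the solution walked through in this tutorial, just a short way to confirm the expected numbers; the variable names are my own.

```python
# Sample dataset from the example above.
dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# str.count returns how many times a substring occurs in the string.
counts = [dna.count(base) for base in "ACGT"]

# Join the four counts in A, C, G, T order, separated by spaces.
result = " ".join(str(n) for n in counts)
print(result)  # 20 12 17 21
```

The four counts add up to 70, which is the length of the input string.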

This problem is well laid out and does not need much further explanation, but let's take a look at the way that I solved it, step by step. Here is the entire code:

def runDna(inputFile):
    fi = open(inputFile, 'r') #opens the input file for reading
    inputData = fi.read() #reads the entire file into a single string
    aCount, gCount, tCount, cCount = 0, 0, 0, 0

    for k in inputData:
        if k == "A":
            aCount += 1
        if k == "G":
            gCount += 1
        if k == "T":
            tCount += 1
        if k == "C":
            cCount += 1

    return (str(aCount) + " " + str(cCount) + " " + str(gCount) + " " + str(tCount))

Let's start by taking a look at line 1 of the code.

Setup

def runDna(inputFile):

I have set up each of the Rosalind problems as a function in a library that I am building, so that all of the solutions come in a nice, downloadable package. Each function solves one problem and is given the name/location of its input file, which lives outside the library but in a location known to the script running the interface (check out “runLib” to understand where it pulls its data from).
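
To make the idea concrete, here is a rough sketch of how a driver script could hand a file location to one of these functions. Everything in it is hypothetical: run_dna is a stand-in, and PROBLEMS and run_problem are names I made up for illustration. It is not the actual “runLib” code.

```python
import os
import tempfile

def run_dna(input_file):
    """Stand-in library function: takes a file path, returns the answer as a string."""
    with open(input_file, "r") as handle:
        dna = handle.read()
    return " ".join(str(dna.count(base)) for base in "ACGT")

# Hypothetical registry: problem ID -> function that accepts an input file path.
PROBLEMS = {"DNA": run_dna}

def run_problem(problem_id, input_file):
    """Look up the function for a problem and pass it the input file location."""
    return PROBLEMS[problem_id](input_file)

# Demo: write a small input file, then let the driver dispatch to the function.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AACGT\n")
    sample_path = tmp.name

answer = run_problem("DNA", sample_path)
os.remove(sample_path)  # clean up the temporary input file
print(answer)  # 2 1 1 1
```

The point of the design is that the functions never need to know where the data lives; the caller decides that and passes the path in.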

Data Input and Initialization

fi = open(inputFile, 'r') #opens the input file for reading
inputData = fi.read() #reads the entire file into a single string
aCount, gCount, tCount, cCount = 0, 0, 0, 0

The first two lines are a simple, common way to read in a data file. Later on, there will be instances where we need more complicated ways to read data, typically by filtering the input and giving it more structure, but in this initial assignment, reading the entire file into a single variable is sufficient.
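
As an aside, the same "read everything into one variable" step can also be written with a with block, which closes the file automatically. The sketch below is an alternative under my own assumptions, not the code used in this tutorial; read_dataset and the temporary demo file are made-up names for illustration.

```python
import os
import tempfile

def read_dataset(input_file):
    """Read the whole file into one string, dropping surrounding whitespace."""
    with open(input_file, "r") as handle:  # the file is closed when the block exits
        return handle.read().strip()

# Demo with a throwaway file that ends in a newline, as downloaded datasets often do.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AGCT\n")
    demo_path = tmp.name

data = read_dataset(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(repr(data))  # 'AGCT'
```

Stripping the trailing newline does not change the counts for this problem, since a newline is never one of the four bases, but it keeps the variable holding only the DNA string.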

The third line initializes four counters, one for each of the four base letters in DNA: “A”, “G”, “T” and “C”. These will be used in the following for loop.

Processing the DNA string