For Loop Part II of II

Now it is time to talk about my favourite loop in python programming.

This is a bit advanced topic for the for loop. Understanding all the possible ways to use the for loop is very helpful for python programming skills.

We will take a look at a bunch of different scanarios with for loops in this video like:

Before we start we have to define a function

We will be using our previous knowledge to define a function in two different ways. First we will define it using the basic for loops and then we will define it using the list comprehentions.

## List comprehentions
def read_FASTA(filename):
    with open(filename) as file:
        return [(part[0].split('|'),
                 part[2].replace('\n', ''))
                    for part in 
                         [entry.partition('\n') for entry in file.read().split('>')[1:]]]
print(read_FASTA('sequence.fasta')[1:10])

Combine

Well in our last video we ended up talking about the collection. Here we are going to start with the combine. There is very minor difference between collect and combine. Rather than making a collection this time you will keep on adding or combinig to the value and at the end you will just return that value.

result = identity-value  
for item in collection:
    result = result . item      # One of these will be used
    result = .= item            # The top one does the same 
    result = fn(result, item)
return result                      

Let’s try to use this in a bioinformatickey function to get the longest sequence form a fasta file with multiple sequences.

def longest_sequence(filename):
    longest_seq = ''
    for info, seq in read_FASTA(filename):
        longest_seq = max(longest_seq, seq, key=len)
    return longest_seq


print(longest_sequence('longestSeq.fasta'))

Before we go to the next topic. You can try to think and comment down below that which form out of three forms described in the sudo code above is used the in the function longest_sequence.

Count

Well again it is very similar to the previous one. This time we will increment by just one because we are gonna be counting.

count = 0
for item in iterable:
    count += 1
return count
kmers = ['tag', 'atgcgt', 'tagat', 'aaaaaaaaaaaaaa']

count = 0
for kmer in kmers:
    count += 1


print(count)

Collection Combine

An ordinary Combine operation “reduces” a collection to a value; a Collection Combine reduces a collection of collections to a single collection. (In the template presented here the reduction is done step by step, but it could also be done by first assembling the entire collection of collections and then reducing them to a single collection.)

result = []
for item in collection:
    result += fn(item)
    # merge result with previous result
def extract_gi_id(description):
    '''Given a FASTA file description line, return its GenInfo ID if it has one'''
    if description[0] != '>':
        return None
    fields = description[1:].split('|')
    if 'gi' not in fields:
        return None
    return fields[1 + fields.index('gi')]

def get_gi_ids(filename):
    """Return a list of GenInfo IDs of all sequences in the file names filename"""
    with open(filename) as file:
        return [extract_gi_id(line) for line in file if line[0] == '>']

def get_gi_ids_from_files(filenames):
    """Return a list of the GenInfo ID's of all sequences formed in the
    files whose names are contained in the collection filenames."""
    idlst = []
    for filename in filenames:
        idlst += get_gi_ids(filename)
    return idlst


get_gi_ids_from_files(['longestSeq.fasta', 's3.fasta'])

Search

With in a for loop you simply keep on testing for a specific thing and then return as soon as you find it. Here is the sudo code. Suppose you have a huge fasta file and you want to see if that file contains a sequence with specific id then it will not be a good idea to store all the detials about that file. So, we will just read one sequence at a time and then check if this matches with our id and returns it otherwise it go to get the next sequence from the file.

for item in colleciton:
    if test item:
        return item

Let’t take a look at the bioinformatickey example of the code. Like I always recomend we will be defining 4 functions to achieve this task. The four functions are as follow in the order in which we will define them:

search_FASTA_file_by_gi_id(id, filename):   
    FASTA_search_by_gi_id(id, fil)          
        extract_gi_id(line)   
        read_FASTA_sequence(fil)

search_FASTA_file_by_gi_id(id, filename)

This is the top level function which is usually simplest function and just the entry point for complicated tasks. The main purpose of this function is to provide the next function data in correct form. This function will open the file to start the real search operation.

FASTA_search_by_gi_id(id, fil):

It will keep on searching for lines starting with ‘>’. As soon as it finds such line it will call the function get_gi_i(description) to get the GenInfo ID from the line if it exixts. Then it will compare. If there is a match we will call the read_FASTA_sequence and return. Otherwise we will keep on loooking for next sequence.

So, let’s strart coding by difining the top level function.

def search_FASTA_file_by_gi_id(id, filename):
    """Return the sequence with the GenInfo ID ID from the FASTA file
    named filename, reading one entry at a time until it is found"""
    id = str(id) # user might call with a number
    with open(filename) as file:
        return FASTA_search_by_gi_id(id, file) 


## About the functions that we will define next
## there will be a task for you after we finish 
## defining them here.


def FASTA_search_by_gi_id(id, file):
    for line in file:
        if (line[0] == '>' and
            str(id) == get_gi_id(line)):
            return read_FASTA_sequence(file)

def read_FASTA_sequence(file):
    seq = '' 
    for line in file:
        if not line or line[0] == '>':
            return seq
        seq += line[:-1]
   

def get_gi_id(description):
    fields = description[1:].split('|')
    if fields and 'gi' in fields:
        return fields[1 + fields.index('gi')]


print(search_FASTA_file_by_gi_id('1443065911', 'sequence.fasta'))

Now here is the task for you. You have to define the following functions by the help of a while loop:

Special case for search Iteration

We will keep on searching and rather than returning the item we found we will return True or False. Let’s take a look at two code examples to make things clear.

def rna_sequence_is_valid(seq):
    for base in seq:
        if base not in 'UCAGucag':
            return False
    return True


def dna_sequence_contains_N(seq):
    for base in seq:
        if base == 'N':
            return True
Can you point out the difference between these two functions defined above?

Filter

This time we will filter stuff based on the specific condition. We can have quite a few different filter iterations. Let’s talk about them one by one:

Filter Do

In filter do we will do a an action on the item if it is according to a specific condition. Here is the sudo code.

for item in collection:
                                    # we could have initializatios here before the 
                                    # conditional
    if test item:                   #Test condition shown here could be complicated
        statements using item 

Let’s take a look at the function in which we will print all the headers of the fasta file.

def print_FASTA_headers(filename): 
    with open(filename) as file:
        for line in file:
            if line[0] == '>':
                print(line[1:-1])

Filtered Combine

It is just like normal combine but this time we will only combine if it passes a given test.

result = identity-value
    for item in collection:
        if test item:
            result = result · item          # One of these
            result ·= item                  # three forms
            result = fn(result, item)       # is used.
return result

Filtered Count

This is like count but this time we will count only for those who pass the test.

count = 0
for item in iterable:
    if test item:
        count += 1
return count

Your task is to think of a scenario where we can implement the Filtered Combine and Filtered Count approach in a Bioinformatickey or just ordinary function.

Nested iterations

Lets talk about the last topic of this video. This is when we want to do (a same set of things) for each item in the collection. Sudo code will look like this and it will get clear once we take a look at the code examples.

for outer in outer_collection:
    for inner in inner_collection:
        do something with inner and outer

Time to code a nested iteration function where we will be printing the codon table.

from pyforbinf import translate_RNA_codon
def print_codon_table():
    """Print the DNA codon table in a nice, but simple, arrangement"""
    DNA_bases = ['A', 'U', 'G', 'C']
    
    for base1 in DNA_bases: # horizontal section (or "group")
        for base3 in DNA_bases: # line (or "row")
            for base2 in DNA_bases: # vertical section (or "column")    
                # the base2 loop is inside the base3 loop!
                print(base1+base2+base3,
                        translate_RNA_codon(base1+base2+base3),
                        end='     ')
            print()
        print()

print_codon_table()