I’m new at Python so this is like drinking out of a fire hose. I’ll explain the context of what the problem is here to give a better idea of what to discuss.
This program gives a pairwise DNA alignment using “dynamic programming”. A score is assigned based on how well the pairs line up; preferably the optimal alignment should be reached, but that can be discussed later.
The “substitution matrix” here considers the possible pair matchings and assigns a score value to each. Every time there is the same letter, it is 20 points. This table shows the values.

In the case of dynamic programming, an optimal alignment is accomplished by considering three options for each location in the DNA sequence, and selecting the best option before continuing on.
(Conceptually, the second sequence would be the columns of the matrix, and the first sequence is the rows of the matrix. This is just an example)

Don’t worry about affine gap penalties (consecutive gaps), there’s just only a gap penalty variable required here and it’s set to -5 points. A gap (indel) is when we can’t distinguish whether a letter was inserted into a string or removed from the other string. For alignment purposes we can’t tell so gaps are used when necessary.
So now onto the problems!
In order to get the aligned results of the two DNA sequences, it looks like one would have to:
1.) Make the scoring matrix. <-- Scoring matrix maintains the current alignment score for the particular alignment.
2.) Make the arrow matrix. <-- determines the optimal score path. It is created at the same time the scoring matrix is.
3.) Get path from the arrow matrix through a backtracing algorithm and reverse the string after.
4.) Pass the two strings to a scoring function and calculate the score.
Here the DNA sequences are being converted to integers to represent the index on the original substitution matrix (0 for T, 1 for C, etc.) and this new substitution matrix also aligns with the scoring matrix.
In this code we’re making the Scoring and Arrow Matrices. The first row and column is filled entirely with gap penalties.
Making the arrow matrix from the previous block of code allows us to backtrace the path to return the aligned DNA sequences to get their scores calculated.
Now what’s confusing here is why when I did mat[x,y] in this stringScore function it wasn’t converted into an integer, I got an error about a tuple:
>>> t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
File "C:\bioinformatics\hw2.py", line 126, in stringScore
score -= mat[x,y] # Search substitution matrix for value to subtract according to pair.
TypeError: list indices must be integers, not tuple
>>>
This program gives a pairwise DNA alignment using “dynamic programming”. A score is assigned based on how well the pairs line up; preferably the optimal alignment should be reached, but that can be discussed later.
The “substitution matrix” here considers the possible pair matchings and assigns a score value to each. Every time there is the same letter, it is 20 points. This table shows the values.
Code:
alphabet = "TCAG" # 0 for T, 1 for C, 2 for A, 3 for G.

In the case of dynamic programming, an optimal alignment is accomplished by considering three options for each location in the DNA sequence, and selecting the best option before continuing on.
(Conceptually, the second sequence would be the columns of the matrix, and the first sequence is the rows of the matrix. This is just an example)

Don’t worry about affine gap penalties (consecutive gaps), there’s just only a gap penalty variable required here and it’s set to -5 points. A gap (indel) is when we can’t distinguish whether a letter was inserted into a string or removed from the other string. For alignment purposes we can’t tell so gaps are used when necessary.
So now onto the problems!
In order to get the aligned results of the two DNA sequences, it looks like one would have to:
1.) Make the scoring matrix. <-- Scoring matrix maintains the current alignment score for the particular alignment.
2.) Make the arrow matrix. <-- determines the optimal score path. It is created at the same time the scoring matrix is.
3.) Get path from the arrow matrix through a backtracing algorithm and reverse the string after.
4.) Pass the two strings to a scoring function and calculate the score.
Code:
# Our two DNA sequences.
seq1 = 'ATGTTAT'
seq2 = 'ATCGTAG'
alphabet = "TCAG" # 0 for T, 1 for C, 2 for A, 3 for G.
mat = [
[20, 10, 5, 5],
[10, 20, 5, 5],
[5, 5, 20, 10],
[5, 5, 10, 20],
]
In this code we’re making the Scoring and Arrow Matrices. The first row and column is filled entirely with gap penalties.
Making the arrow matrix from the previous block of code allows us to backtrace the path to return the aligned DNA sequences to get their scores calculated.
Now what’s confusing here is why when I did mat[x,y] in this stringScore function it wasn’t converted into an integer, I got an error about a tuple:
>>> t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
File "C:\bioinformatics\hw2.py", line 126, in stringScore
score -= mat[x,y] # Search substitution matrix for value to subtract according to pair.
TypeError: list indices must be integers, not tuple
>>>
Code:
"""
Calculate score of the two DNA sequences.
They are stored as strings in two arrays.
"""
def stringScore( mat, alphabet, seq1 , seq2 , g = -5 ):
length = len( seq1 )
score = 0
for i in range (length):
if seq1[i] != seq2[i]:
x = alphabet.index( seq1[i] )
y = alphabet.index( seq2[i] )
score -= mat[x,y] # Search substitution matrix for value to subtract according to pair.
elif seq1[i] == '-' or seq2[i] == '-':
score += g # Gap penalty
else:
score += 20 # This means the two letters match.
return score
Comment