Python pattern matching algorithm tuple error

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • DossarLX ODI
    Batch Manager
    Game Manager
    FFR Simfile Author
    • Mar 2008
    • 14989

    #1

    Python pattern matching algorithm tuple error

    I’m new at Python so this is like drinking out of a fire hose. I’ll explain the context of what the problem is here to give a better idea of what to discuss.

    This program gives a pairwise DNA alignment using “dynamic programming”. A score is assigned based on how well the pairs line up; preferably the optimal alignment should be reached, but that can be discussed later.

    The “substitution matrix” here considers the possible pair matchings and assigns a score value to each. Every time there is the same letter, it is 20 points. This table shows the values.
    Code:
    alphabet = "TCAG" # 0 for T, 1 for C, 2 for A, 3 for G.


    In the case of dynamic programming, an optimal alignment is accomplished by considering three options for each location in the DNA sequence, and selecting the best option before continuing on.
    (Conceptually, the second sequence would be the columns of the matrix, and the first sequence is the rows of the matrix. This is just an example)


    Don’t worry about affine gap penalties (consecutive gaps), there’s just only a gap penalty variable required here and it’s set to -5 points. A gap (indel) is when we can’t distinguish whether a letter was inserted into a string or removed from the other string. For alignment purposes we can’t tell so gaps are used when necessary.

    So now onto the problems!

    In order to get the aligned results of the two DNA sequences, it looks like one would have to:
    1.) Make the scoring matrix. <-- Scoring matrix maintains the current alignment score for the particular alignment.
    2.) Make the arrow matrix. <-- determines the optimal score path. It is created at the same time the scoring matrix is.
    3.) Get path from the arrow matrix through a backtracing algorithm and reverse the string after.
    4.) Pass the two strings to a scoring function and calculate the score.

    Code:
    # Our two DNA sequences.
    seq1 = 'ATGTTAT'
    seq2 = 'ATCGTAG'
    alphabet = "TCAG" # 0 for T, 1 for C, 2 for A, 3 for G.
    
    mat =       [
                     [20, 10, 5, 5],
                     [10, 20, 5, 5],
                     [5, 5, 20, 10],
                     [5, 5, 10, 20],
                ]
    Here the DNA sequences are being converted to integers to represent the index on the original substitution matrix (0 for T, 1 for C, etc.) and this new substitution matrix also aligns with the scoring matrix.

    Code:
    """
    mat is the substitution matrix being passed.
    abet is the alphabet associated with that matrix.
    sq1 is the first DNA sequence.
    sq2 is the second DNA sequence.
    Here we are making a substitution matrix that aligns with the scoring matrix.
    """
    def alignSubMatrix( mat, alphabet, seq1, seq2 ):
        l1, l2 = len( seq1 ), len( seq2 ) # Get the lengths of our two DNA strings
        subvals = zeros( ( l1+1,l2+1 ) , int ) # Matrix of integers, add 1 to the lengths
        # Convert the DNA sequences to numbers. si for substitution index.
        si1 = zeros( l1,int ) # Array of integers, length of horizontal dimension
        si2 = zeros( l2,int ) # Array of integers, length of vertical dimension
        # In this for loop, first DNA sequence gets converted to a set of integers.
        for i in range( l1 ):
            si1[i] = alphabet.index( seq1[i] )
        # In this for loop, second DNA sequence gets converted to a set of integers.
        for i in range( l2 ):
            si2[i] = alphabet.index( seq2[i] )
        # Fill in one row of the substitution matrix at a time.
        for i in range( 1, l1+1 ):
            # Subtracting 1 from index to line up sub matrix with scoring matrix.
            subvals[i,1:] = mat[ si1[i-1] , si2[i-1] ] # Take ith row of subvals, go from column 1 to the last column for each row
        return subvals


    In this code we’re making the Scoring and Arrow Matrices. The first row and column is filled entirely with gap penalties.

    Code:
    """
    This function computes the scoring matrix and the arrow matrix.
    subvals are the substitution matrix values from alignSubMatrix.
    sq1 and sq2 are the DNA sequences.
    g is the gap penalty.
    """
    def SandAmatrix( subvals, alphabet, seq1, seq2, g = -5 ):
        l1, l2 = len( seq1 ), len( seq2)
        sMatrix = zeros( ( l1+1,l2+1 ) , int )
        aMatrix = zeros( ( l1+1,l2+1 ) , int )
        # Create first row and first column. Remember they're all gap penalties
        sMatrix[0] = arange( l2+1 )*g # make first row all gap penalties
        sMatrix[:,0] = arange( l1+1 )*g # make first column all gap penalties
        aMatrix[0] = ones( l2+1 ) # make first row all ones
        for i in range( 1, l1+1 ):
            for j in range( 1, l2+1 ):
                pathcheck = zeros( 3 ) # 3-element array of zeros.
                pathcheck[0] = sMatrix[i-1,j] + g # If we go right
                pathcheck[1] = sMatrix[i,j-1] + g # If we go down
                x = alphabet.index( seq1[i-1] )
                y = alphabet.index( seq2[j-1] )
                pathcheck[2] = sMatrix[i-1,j-1] + mat[x,y] # If we go diagonally
                sMatrix[i,j] = pathcheck.max()
                aMatrix[i,j] = pathcheck.argmax()
        return sMatrix, aMatrix


    Making the arrow matrix from the previous block of code allows us to backtrace the path to return the aligned DNA sequences to get their scores calculated.

    Code:
     """
    Extract the aligned sequences from the arrow matrix.
    Start from the bottom right, work to the top left.
    """
    def Backtrace( aMatrix, seq1, seq2 ):
        st1, st2 = ","
        y,x = aMatrix.shape
        ok = 1
        # x for horizontal position, y for vertical position in the arrow matrix
        y -= 1
        x -= 1
        # Conceptually here, second sequence is at the top, first sequence is along the side.
        while ok:
            if aMatrix[y,x] == 0: # indicates backtrace moves up
                st1 += '-'
                y -= 1
            elif aMatrix[y,x] == 1: # indicates backtrace moves left
                st1 += '-'
                st2 += seq2[x-1]
            elif aMatrix[y,h] == 2: # indicates backtrace moves diagonally up-left
                st1 += seq1[y-1]
                st2 += seq2[x-1]
                y -= 1
                x -= 1
            if y==0 and x==0: # This means we are at the top left
                ok = 0
        # Now we need to reverse the strings, since we went bottom right to top left
        st1 = st1[::-1] # Reverse the first string.
        st2 = st2[::-1] # Reverse the second string.
        return st1, st2


    Now what’s confusing here is why when I did mat[x,y] in this stringScore function it wasn’t converted into an integer, I got an error about a tuple:

    >>> t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )

    Traceback (most recent call last):
    File "<pyshell#31>", line 1, in <module>
    t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
    File "C:\bioinformatics\hw2.py", line 126, in stringScore
    score -= mat[x,y] # Search substitution matrix for value to subtract according to pair.
    TypeError: list indices must be integers, not tuple
    >>>

    Code:
    """
    Calculate score of the two DNA sequences.
    They are stored as strings in two arrays.
    """
    def stringScore( mat, alphabet, seq1 , seq2 , g = -5 ):
        length = len( seq1 )
        score = 0
        for i in range (length):
            if seq1[i] != seq2[i]:
                x = alphabet.index( seq1[i] )
                y = alphabet.index( seq2[i] )
                score -= mat[x,y] # Search substitution matrix for value to subtract according to pair.                  
            elif seq1[i] == '-' or seq2[i] == '-':
                score += g # Gap penalty
            else:
                score += 20 # This means the two letters match.
        return score
    Originally posted by hi19hi19
    oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file
  • beary605
    FFR Veteran
    • Oct 2008
    • 448

    #2
    Re: Python pattern matching algorithm tuple error

    A really simple error: you need to use map[x][y] instead of map[x,y] ("x,y" gets interpreted as (x, y), so you access map[ (x, y) ] which Python doesn't like)

    I'll look over the rest of the code to see if there's any other errors

    edit: wow I did not know you could put commas like that in splices... see if the above works if not then... hm
    Last edited by beary605; 02-9-2014, 02:48 AM.

    Comment

    • DossarLX ODI
      Batch Manager
      Game Manager
      FFR Simfile Author
      • Mar 2008
      • 14989

      #3
      Re: Python pattern matching algorithm tuple error

      Originally posted by beary605
      A really simple error: you need to use map[x][y] instead of map[x,y] ("x,y" gets interpreted as (x, y), so you access map[ (x, y) ] which Python doesn't like)
      I changed it to what you said and got this, thanks

      >>> t1 = stringScore( mat, alphabet, seq1 , seq2 , g = -5 )
      >>> t1
      65
      >>>

      Which looks correct, because the two strings would result in:
      20 + 20 + (-5) + (-5) + 20 + 20 + (-5) = 80 - 15 = 65
      Those negatives being the mismatches.

      Now I'll see what goes wrong with the other functions. I know that the scoring function works when the strings are aligned, so that's a good start.

      Edit: Ok, another problem

      Code:
      >>> from numpy import *
      
      Traceback (most recent call last):
        File "<pyshell#2>", line 1, in <module>
          from numpy import *
        File "C:\Python27\lib\site-packages\numpy\__init__.py", line 153, in <module>
          from . import add_newdocs
        File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
          from numpy.lib import add_newdoc
        File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
          from .type_check import *
        File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
          import numpy.core.numeric as _nx
        File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 6, in <module>
          from . import multiarray
      ImportError: DLL load failed: %1 is not a valid Win32 application.
      I'm trying to use numpy but there's that ImportError message. I downloaded a 64-bit version of opencv but that still didn't solve it. I am running a 64-bit machine (Windows 7) and I know numpy is 32-bit but there doesn't seem to be much on handling this online, maybe someone else can help?
      Last edited by DossarLX ODI; 02-9-2014, 08:52 AM.
      Originally posted by hi19hi19
      oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

      Comment

      • Reincarnate
        x'); DROP TABLE FFR;--
        • Nov 2010
        • 6332

        #4
        Re: Python pattern matching algorithm tuple error

        I am way too lazy to read through all this, can you just explain what you need to do in like 2 sentences -- I'll whip up the solution in Python real quick and you can look at it or something

        you have two sequences
        seq1 = 'ATGTTAT'
        seq2 = 'ATCGTAG'

        what's the score and why

        (or are you trying to do something else?)

        Comment

        • Reincarnate
          x'); DROP TABLE FFR;--
          • Nov 2010
          • 6332

          #5
          Re: Python pattern matching algorithm tuple error

          something like this maybe

          Code:
          def stringScore( mat, baseMap, seq1, seq2, g = -5 ):
          
              if len(seq1) != len(seq2):
                  raise Exception("Fix your shit, both strings must be of equal length")
          
              score = 0
              
              for base1,base2 in zip(seq1,seq2):
                  if base1 != base2:
                      score -= mat[baseMap[base1]][baseMap[base2]]
                  elif base1 == "-" or base2 == "-":
                      score += g
                  else:
                      score += mat[baseMap[base1]][baseMap[base2]]
                  
              return score
          
          
          
          mat = [[20, 10, 5, 5],
                 [10, 20, 5, 5],
                 [5, 5, 20, 10],
                 [5, 5, 10, 20]]
          
          alphabet = "TCAG"
          baseMap = {base:index for index,base in enumerate(alphabet)}
          
          seq1 = 'ATGTTAT'
          seq2 = 'ATCGTAG'
          
          print stringScore(mat, baseMap, seq1, seq2)
          Last edited by Reincarnate; 02-9-2014, 09:02 AM.

          Comment

          • DossarLX ODI
            Batch Manager
            Game Manager
            FFR Simfile Author
            • Mar 2008
            • 14989

            #6
            Re: Python pattern matching algorithm tuple error

            Originally posted by Reincarnate
            I am way too lazy to read through all this, can you just explain what you need to do in like 2 sentences -- I'll whip up the solution in Python real quick and you can look at it or something

            you have two sequences
            seq1 = 'ATGTTAT'
            seq2 = 'ATCGTAG'

            what's the score and why

            (or are you trying to do something else?)
            Yeah I can make a tl;dr version, my post above also shows an error when I try importing numpy. However I still have to give lots of details here because this assignment involves creating a path

            The program is supposed to work for any two sequences e.g.
            ATGAATGCGATTTCGGGTGGCC
            TTGGCAGGACATGAAGTTCGATACGGAA

            Obviously the first string is too short so that's when we have to insert gaps and there are gap penalties to make it the same length. The gap penalty is just some integer, say -5.

            In a nutshell we're doing a 5-way comparison:
            - Gaps.
            - A, G, C, or T (those are the four letters).

            This is the score table (substitution matrix).


            In dynamic programming you're considering an optimal sequence alignment by considering three options for each location in the sequence: down, right, or down-right. Visual example:


            So basically, we have our substitution matrix (4x4) which has the values for a pair of letters found, listed in the score table. Then the scoring matrix maintains the current alignment score for the particular alignment. The arrow matrix determines the optimal score path (It is created at the same time the scoring matrix is). Then we backtrace that path to get the strings aligned so they're the same length, and finally use the score function.
            Last edited by DossarLX ODI; 02-9-2014, 09:00 AM.
            Originally posted by hi19hi19
            oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

            Comment

            • Reincarnate
              x'); DROP TABLE FFR;--
              • Nov 2010
              • 6332

              #7
              Re: Python pattern matching algorithm tuple error

              so like this?

              Code:
              def stringScore( mat, baseMap, seq1, seq2, g = -5 ):
                  score = g*abs(len(seq1)-len(seq2)) #if one string is longer than the other, gap penalty = g*(difference in lengths)
                  
                  for base1,base2 in zip(seq1,seq2):
                      if base1 != base2:
                          score -= mat[baseMap[base1]][baseMap[base2]]
                      else:
                          score += mat[baseMap[base1]][baseMap[base2]]
                  
                  return score
              sorry still in tl;dr mode

              are you trying to basically say "we don't know where the real gaps are, so let's try all possible ways of inserting gaps to determine the maximal possible score?"


              ATGAATGCGATTTCGGGTGGCC-------
              TTGGCAGGACATGAAGTTCGATACGGAA

              probably a shitty score due to all mismatches, but if we had

              -ATG-AATG-CGATT-TC-GG--GTGGCC
              TTGGCAGGACATGAAGTTCGATACGGAA

              or something (just making up a shitty example), then the gaps could possibly be matches and we'd have a better score
              Last edited by Reincarnate; 02-9-2014, 09:06 AM.

              Comment

              • DossarLX ODI
                Batch Manager
                Game Manager
                FFR Simfile Author
                • Mar 2008
                • 14989

                #8
                Re: Python pattern matching algorithm tuple error

                Originally posted by Reincarnate
                are you trying to basically say "we don't know where the real gaps are, so let's try all possible ways of inserting gaps to determine the maximal possible score?"


                ATGAATGCGATTTCGGGTGGCC-------
                TTGGCAGGACATGAAGTTCGATACGGAA

                probably a shitty score due to all mismatches, but if we had

                -ATG-AATG-CGATT-TC-GG--GTGGCC
                TTGGCAGGACATGAAGTTCGATACGGAA

                or something (just making up a shitty example), then the gaps could possibly be matches and we'd have a better score
                Yes, the gaps are supposed to align the sequences better. From a practical standpoint we also don't readily know where the gaps are so they have to be inserted based on how much the score will improve.

                The score is calculated at the end. The scoring matrix is just made so the arrow matrix can be made to determine the best path for alignment. Then after those paths are backtraced (and then the strings reversed), the alignments are sent to the score function (essentially we're making a new string with dashes in it to indicate gaps since it was too short).

                Gaps here are a pain in the ass to work with. The gaps are considered by the arrow matrix which considers the best possible path by checking how the score changes, because like I mentioned some strings aren't of equal length and that's when the gaps come in.

                I guess another way of tl;dr this is that it's getting two strings with four possible letters, and if they are not the same length then gaps have to be inserted to make the letters line up better for an optimal score.
                Last edited by DossarLX ODI; 02-9-2014, 09:15 AM.
                Originally posted by hi19hi19
                oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

                Comment

                • Reincarnate
                  x'); DROP TABLE FFR;--
                  • Nov 2010
                  • 6332

                  #9
                  Re: Python pattern matching algorithm tuple error

                  So if two bases match, they basically add the corresponding matrix value (typically 20), but if they mismatch, they subtract the matrix value?

                  And using a gap incurs a -5 penalty?

                  Comment

                  • DossarLX ODI
                    Batch Manager
                    Game Manager
                    FFR Simfile Author
                    • Mar 2008
                    • 14989

                    #10
                    Re: Python pattern matching algorithm tuple error

                    Originally posted by Reincarnate
                    So if two bases match, they basically add the corresponding matrix value (typically 20), but if they mismatch, they subtract the matrix value?

                    And using a gap incurs a -5 penalty?
                    From my understanding yes. It makes no sense that the score would increase if there is a mismatch, and it should subtract the matrix value. Subtracting 1 from the score for a mismatch sounds incredibly stupid and I'll be mad if that's actually the case here, but just go by the matrix values in that table.

                    Edit: Back. That was faster than I planned
                    Last edited by DossarLX ODI; 02-9-2014, 09:49 AM.
                    Originally posted by hi19hi19
                    oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

                    Comment

                    • Reincarnate
                      x'); DROP TABLE FFR;--
                      • Nov 2010
                      • 6332

                      #11
                      Re: Python pattern matching algorithm tuple error

                      Untested code, let me know if this is returning the right values for you -- if so then I'll clean up the code and make it a little more readable / with better coding practices / whatever -- just want to ensure it's returning the right vals first

                      Code:
                      class memoize:
                          def __init__(self, fn):
                              self.fn = fn
                              self.memo = {}
                          def __call__(self, *args, **kwds):
                              import cPickle
                              str = cPickle.dumps(args, 1)+cPickle.dumps(kwds, 1)
                              if not self.memo.has_key(str):
                                  self.memo[str] = self.fn(*args, **kwds)
                              return self.memo[str]
                      
                      
                      
                      @memoize
                      def dpScore( mat, baseMap, seq1, seq2, gapsLeft, g ):
                          if (len(seq1)==0 and gapsLeft>0) or gapsLeft<0: return -9999999999999999999999 #lazy right now, using a sentinel
                          if len(seq1)==0 or len(seq2)==0: return 0
                      
                          possibleScore1 = g + mat[baseMap[seq1[0]]][baseMap[seq1[0]]] + dpScore( mat, baseMap, seq1[1:], seq2, gapsLeft-1, g )
                      
                          if seq1[0] == seq2[0]: matchScore = mat[baseMap[seq1[0]]][baseMap[seq2[0]]]
                          else: matchScore = -mat[baseMap[seq1[0]]][baseMap[seq2[0]]]
                      
                          possibleScore2 = matchScore + dpScore( mat, baseMap, seq1[1:], seq2[1:], gapsLeft, g )
                          
                          return max(possibleScore1, possibleScore2)
                      
                      
                      
                      def bestStringScore( mat, baseMap, seq1, seq2, g = -5 ):
                          gapsLeft = abs(len(seq1)-len(seq2))
                          if len(seq1) > len(seq2):
                              return dpScore(mat, baseMap, seq1, seq2, gapsLeft, g)
                          else:
                              return dpScore(mat, baseMap, seq2, seq1, gapsLeft, g)
                      
                      
                      
                      mat = [[20, 10, 5, 5],
                             [10, 20, 5, 5],
                             [5, 5, 20, 10],
                             [5, 5, 10, 20]]
                      
                      alphabet = "TCAG"
                      baseMap = {base:index for index,base in enumerate(alphabet)}
                      
                      seq1 = 'ATGTTAT'
                      seq2 = 'ATCGTAG'
                      
                      print bestStringScore(mat, baseMap, seq1, seq2)

                      Comment

                      • DossarLX ODI
                        Batch Manager
                        Game Manager
                        FFR Simfile Author
                        • Mar 2008
                        • 14989

                        #12
                        Re: Python pattern matching algorithm tuple error

                        Yes, copied that code snippet and got 65 for those sequences (same with what I got).
                        Originally posted by hi19hi19
                        oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

                        Comment

                        • Reincarnate
                          x'); DROP TABLE FFR;--
                          • Nov 2010
                          • 6332

                          #13
                          Re: Python pattern matching algorithm tuple error

                          I mean ideally you should also test with sequences that are of different lengths and then check to see if the resulting maximum score is what you get by hand (inserting gaps manually etc)

                          Comment

                          • DossarLX ODI
                            Batch Manager
                            Game Manager
                            FFR Simfile Author
                            • Mar 2008
                            • 14989

                            #14
                            Re: Python pattern matching algorithm tuple error

                            I'm testing it with

                            seq1 = 'ATGTTAT'
                            seq2 = 'ATCGTAGTA'

                            seq1 = 'ATGTTA-T-'
                            seq2 = 'ATCGTAGTA'

                            20 + 20 - 5 - 5 + 20 + 20 - 5 + 20 - 5 = 80

                            seq1 = 'AT-GTTAT-'
                            seq2 = 'ATCGTAGTA'

                            20 + 20 - 5 + 20 + 20 - 5 - 10 + 20 -5 = 75

                            ATGTTA-T- would be the optimal sequence here. The second solution was slightly lower since AG is worth 10, not 5.

                            When I input into the program it said 105.

                            Edit: Ok so what I consider a possibility is that it did this:

                            AT-GTTAT-
                            ATCGTAGTA

                            20 + 20 - 5 + 20 + 20 + 5 + 10 + 20 - 5 = 105

                            Edit 2: Optimal score looks like 100 actually.

                            seq1 = 'AT-GTTA-T-'
                            seq2 = 'ATCGT-AGTA'

                            20 + 20 - 5 + 20 + 20 - 5 + 20 - 5 + 20 - 5 = 100
                            Last edited by DossarLX ODI; 02-9-2014, 10:39 AM.
                            Originally posted by hi19hi19
                            oh boy, it's STIFF, I'll stretch before I sit down at the computer so not I'm not as STIFF next time I step a file

                            Comment

                            • Reincarnate
                              x'); DROP TABLE FFR;--
                              • Nov 2010
                              • 6332

                              #15
                              Re: Python pattern matching algorithm tuple error

                              I think maybe I am not clear how you are supposed to be scoring stuff

                              is this right?

                              Code:
                              seq1: --ATGTTAT
                              seq2: ATCGTAGTA
                              score: -45
                              
                              seq1: -A-TGTTAT
                              seq2: ATCGTAGTA
                              score: -45
                              
                              seq1: -AT-GTTAT
                              seq2: ATCGTAGTA
                              score: -50
                              
                              seq1: -ATG-TTAT
                              seq2: ATCGTAGTA
                              score: -25
                              
                              seq1: -ATGT-TAT
                              seq2: ATCGTAGTA
                              score: 0
                              
                              seq1: -ATGTT-AT
                              seq2: ATCGTAGTA
                              score: 0
                              
                              seq1: -ATGTTA-T
                              seq2: ATCGTAGTA
                              score: -5
                              
                              seq1: -ATGTTAT-
                              seq2: ATCGTAGTA
                              score: 20
                              
                              seq1: A--TGTTAT
                              seq2: ATCGTAGTA
                              score: -20
                              
                              seq1: A-T-GTTAT
                              seq2: ATCGTAGTA
                              score: -25
                              
                              seq1: A-TG-TTAT
                              seq2: ATCGTAGTA
                              score: 0
                              
                              seq1: A-TGT-TAT
                              seq2: ATCGTAGTA
                              score: 25
                              
                              seq1: A-TGTT-AT
                              seq2: ATCGTAGTA
                              score: 25
                              
                              seq1: A-TGTTA-T
                              seq2: ATCGTAGTA
                              score: 20
                              
                              seq1: A-TGTTAT-
                              seq2: ATCGTAGTA
                              score: 45
                              
                              seq1: AT--GTTAT
                              seq2: ATCGTAGTA
                              score: 5
                              
                              seq1: AT-G-TTAT
                              seq2: ATCGTAGTA
                              score: 30
                              
                              seq1: AT-GT-TAT
                              seq2: ATCGTAGTA
                              score: 55
                              
                              seq1: AT-GTT-AT
                              seq2: ATCGTAGTA
                              score: 55
                              
                              seq1: AT-GTTA-T
                              seq2: ATCGTAGTA
                              score: 50
                              
                              seq1: AT-GTTAT-
                              seq2: ATCGTAGTA
                              score: 75
                              
                              seq1: ATG--TTAT
                              seq2: ATCGTAGTA
                              score: 5
                              
                              seq1: ATG-T-TAT
                              seq2: ATCGTAGTA
                              score: 30
                              
                              seq1: ATG-TT-AT
                              seq2: ATCGTAGTA
                              score: 30
                              
                              seq1: ATG-TTA-T
                              seq2: ATCGTAGTA
                              score: 25
                              
                              seq1: ATG-TTAT-
                              seq2: ATCGTAGTA
                              score: 50
                              
                              seq1: ATGT--TAT
                              seq2: ATCGTAGTA
                              score: 5
                              
                              seq1: ATGT-T-AT
                              seq2: ATCGTAGTA
                              score: 5
                              
                              seq1: ATGT-TA-T
                              seq2: ATCGTAGTA
                              score: 0
                              
                              seq1: ATGT-TAT-
                              seq2: ATCGTAGTA
                              score: 25
                              
                              seq1: ATGTT--AT
                              seq2: ATCGTAGTA
                              score: 30
                              
                              seq1: ATGTT-A-T
                              seq2: ATCGTAGTA
                              score: 25
                              
                              seq1: ATGTT-AT-
                              seq2: ATCGTAGTA
                              score: 50
                              
                              seq1: ATGTTA--T
                              seq2: ATCGTAGTA
                              score: 55
                              
                              seq1: ATGTTA-T-
                              seq2: ATCGTAGTA
                              score: 80
                              
                              seq1: ATGTTAT--
                              seq2: ATCGTAGTA
                              score: 55
                              max score 80?

                              score: +matrix[base1][base2] if base1 and base2 match (e.g. "AA"), -matrix[base1][base2] if they don't match (e.g. "GC"), and -5 if we have a gap (e.g. "T-")

                              Wait you can insert gaps in both sequences, or just the shorter of the two?

                              (I also don't want to waste your time if I am going off-track with all this -- feel free to ignore if you're on a tight deadline)
                              Last edited by Reincarnate; 02-9-2014, 10:45 AM.

                              Comment

                              Working...