Modifying your characters with transform

  1. Your input characters can be modified in many ways, for example to apply a particular cost or weighting scheme, or to change the type of character being analyzed. To begin this series of exercises, let's start with our typical data set:
                read ("course.fasta")
  2. Now we will report the cost matrix being used in the loaded characters:
                report (data)
  3. You can see that by default POY will give cost 2 to each indel, and 1 to every substitution (that's what the  tcm:(1,2) means). We can modify all the characters with the command  transform, as follows:
                transform (tcm:(1,1))
    This will change all the characters for which an alignment transformation cost matrix is applicable.  tcm:(1,1) will assign cost 1 to every substitution and 1 to every indel. Let's verify its effect:
                report (data)
  4. We can also assign a particular cost to opening a gap block. Not surprisingly the argument is  gap_opening:
                transform (tcm:(3,1), gap_opening:3)            
                report (data)
    Can you see the effect in the  data report? It is time now to see the effect of the different parameters in the implied alignment.
  5. First read again your input data and build a tree:
                wipe ()            
                read ("course.fasta")            
                build (1)
  6. Now we report the cost of the tree, and write the implied alignment to a file:
                report (treestats, "1_2_ia.txt", implied_alignments)
  7. Next we modify the cost regime to substitutions 1, indels 1, and report the new cost as well as the implied alignment:
                transform (tcm:(1,1))            
                report (treestats, "1_1_ia.txt", implied_alignments)
  8. Finally we will do the same operations using a cost of 3 for substitutions, 1 for an individual gap, and 3 for gap opening:
                transform (tcm:(3,1), gap_opening:3)            
                report (treestats, "3_1_3_ia.txt", implied_alignments)            
                wipe ()
  9. Compare the costs and the implied alignments. What do you expect? What do you observe? Are the transformation cost matrices metric? Are your characters metric?
  10. You can fix a particular scheme of indels using the command  transform (static_approx), which stands for "static approximation". A static approximation fixes a particular implied alignment for the best tree in memory, and creates a set of characters that match that particular alignment and resemble as closely as possible the cost regime of choice. Here is an example of this:
                read ("course.fasta")            
                build (1)            
                transform (tcm:(1,1))            
                report (data)
  11. We see that there are 8 molecular characters currently in memory. Before we continue, as we will play around with this initial set of characters and tree, we should store this initial state of the program:
                store ("initial")
  12. We can now check the implied alignment:
                report (ia)
    Yes,  ia and  implied_alignments are equivalent.
  13. This alignment can now be fixed to use the resulting matrix as the characters:
                transform (static_approx)            
                report (data)
  14. Observe that after the transform there are no molecular characters left. Instead, there are a number of non-additive characters.
  15. What happens if we have the default cost regime? Let's roll back to the characters stored in "initial" and give this a try:
                use ("initial")            
                transform (tcm:(1,2))            
                transform (static_approx)            
                report (data)
    What can you observe?
  16. Finally, let's check how the static approximation behaves if you have a gap opening parameter:
                use ("initial")            
                transform (tcm:(3,1), gap_opening:3)            
                transform (static_approx)            
                report (data)
    What is the main difference that you observe? How are indel blocks being treated?
  17. Now we will learn how to  transform specific characters. Suppose that we would like to assign  tcm:(2,1) to the first fragment in course.fasta. We first check the name of the fragment:
                use ("initial")            
                report (data)
    You can see that the name of the first fragment is  course.fasta:0 (the precise name may vary slightly on your computer). We can specify in the transform command which characters should be transformed in which way:
                transform ((names:("course.fasta:0"), tcm:(2,1)))
  18. Try to visually match the parentheses and understand their effect. Here is another example, aimed at up-weighting static homology characters only:
                transform ((static, weight:2))
    In this case, instead of specifying characters by name, we do it by type. This command probably makes the syntax easier to understand. If you had trouble with the first one, try to understand the  weight example and then go back to the  tcm:(2,1) case.
  19. To finish this section, we leave you a task: fix the alignment of the third and fourth fragments of the file course.fasta using cost 1 for substitutions and cost 1 for indels. Every other character should have the default cost regime of substitutions 1 and indels 2.
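
Step 9 asks whether the transformation cost matrices are metric. One way to explore that question outside POY is the minimal Python sketch below. It assumes that each tcm:(substitution, indel) pair can be read as a plain symbol-to-symbol cost table over A, C, G, T and the gap, it ignores gap_opening and ambiguity codes, and it makes no claim about how POY stores or uses the matrix internally. The names cost, triangle_violations and is_metric are hypothetical helpers written for this illustration, not POY commands.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
GAP = "-"


def cost(x, y, substitution, indel):
    """Cost of changing symbol x into symbol y under a simple (substitution, indel) table.

    Assumption: identical symbols cost 0, any nucleotide/gap pair costs `indel`,
    and any pair of distinct nucleotides costs `substitution`.
    """
    if x == y:
        return 0
    if x == GAP or y == GAP:
        return indel
    return substitution


def triangle_violations(substitution, indel):
    """Return every triple (x, y, z) where going x -> y -> z is cheaper than x -> z."""
    symbols = NUCLEOTIDES + GAP
    violations = []
    for x, y, z in product(symbols, repeat=3):
        direct = cost(x, z, substitution, indel)
        detour = cost(x, y, substitution, indel) + cost(y, z, substitution, indel)
        if direct > detour:
            violations.append((x, y, z))
    return violations


def is_metric(substitution, indel):
    """True if the table satisfies the triangle inequality.

    Symmetry and a zero diagonal hold by construction for positive costs,
    so the triangle inequality is the only property left to check.
    """
    return not triangle_violations(substitution, indel)


if __name__ == "__main__":
    # The three cost regimes used in this section (gap_opening is not modelled).
    for s, g in [(1, 2), (1, 1), (3, 1)]:
        print(f"tcm:({s},{g}) satisfies the triangle inequality: {is_metric(s, g)}")
```

Running the script prints one line per cost regime used in this section. A triple (x, y, z) returned by triangle_violations means that transforming x into z through the intermediate symbol y is cheaper than the direct transformation, which is a useful hint when you compare the tree costs and implied alignments from steps 6 to 8.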
