Loading and using Sankoff characters

In this tutorial we will load some Sankoff (matrix) type characters and introduce the commands cd, pwd, read, report, search, select, and exit.

  1. Before we read data, we will make sure that POY is working in the directory containing the data files. The working directory tells the application where to look for files. That way, whenever we tell POY to read a file, we don't need to specify where it is located in the file system; we can simply use its name. To Change Directory we use the command cd, as in
                cd ("/Users/andres/Desktop/SampleData")
    You will have to modify the path to match your particular computer organization.
  2. Every command in POY is composed of a command name in lower case (in this case cd), followed by its arguments in parentheses. In this example, cd takes only one argument, which is a string enclosed in quotes: in POY, every string enclosed in quotes represents a file or a directory. In the example above, it is the directory SampleData.
  3. To verify that POY is in the correct directory, we can Print the Working Directory with the command
                pwd()
    The program should now print (in the output frame) the path to the SampleData directory. If it does not, use the cd command again and make sure that the program does not report an error. Remember that you can always use <TAB> to autocomplete file and directory names. This will help you avoid many mistakes!
  4. Once we are working in the desired directory, we can read input data using the command read. POY supports ASN.1, Clustal, FASTA, GBSeq, Genbank, Hennig86, Newick, NewSeq, NEXUS, PHYLIP, POY3, TinySeq, and XML formats directly, and recognizes the file format automatically. In this tutorial we need to read the sample files 35.san and 1.fasta. Type:
                read ("35.san", "1.fasta")
  5. read also accepts wildcards. For example, to read all the files with the extension .fasta, it would be enough to use the command read ("*.fasta") (do not run it now!).
  6. Another feature of this application is that each file read adds to the data already in memory. For example, we could have used two separate read commands, with the same effect:
                read ("35.san")
                read ("1.fasta")
  7. After the data has been read, the output frame shows what types of files were loaded and what their contents are. It is advisable to verify that those files were properly parsed by checking the characters and terminals POY has in memory. To do this, we use the command
                report (data)
  8. Using the arrow, PageUp, and PageDown keys, navigate the contents of the output frame. You will see that two kinds of characters are currently in memory: Sankoff and Molecular. The Sankoff characters were loaded from the 35.san file, and the one gene contained in the 1.fasta file is the molecular character.
  9. We will now run a small (and weak) analysis for just four minutes. The max_time argument is given as days:hours:minutes, so type the command:
                search (max_time:0:0:4)
    Now we have to wait four minutes while the program runs a search that includes building trees, swapping them with TBR, using a ratchet procedure to escape local optima, and tree fusing.
  10. Once the search has finished, take a moment to see what the interactive console displays: the best cost found, how many times it was found, and how many trees are currently held in memory. We are only interested in the best (shortest) trees found, so we can get rid of duplicated and suboptimal trees with:
                select ()
  11. We now look at the trees using the command:
                report (asciitrees)
    Notice that POY colors the branches so that you can follow them easily when scrolling up and down in the output frame.
  12. Seeing the trees on screen is somewhat useful, but it would be better if we could produce them in parenthetical notation to use in other programs like TreeView. We can do this using the command:
                report (trees)
    This will generate trees in Newick format. To store them in a file, we simply give the name of the output file as the first argument:
                report ("trees.txt", trees)
  13. How about publication-quality trees directly from POY? The following command will produce a PostScript file that can be opened in Adobe Illustrator or any other vector graphics editor:
                report ("graphic_tree.ps", graphtrees)
  14. Excellent! We have finished, so it is time to close the application:
                exit ()
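
The steps above can also be collected into a single script. What follows is only a minimal sketch: it assumes that POY accepts a plain-text file of commands (for example, passed on the command line as poy session.poy) and that comments are written between (* and *). The file name session.poy is hypothetical, and the path in the cd command must be adjusted as in step 1.
                (* session.poy: hypothetical file name; commands taken from steps 1 to 14 *)
                (* move to the directory holding the sample files and confirm it *)
                cd ("/Users/andres/Desktop/SampleData")
                pwd ()
                (* load the Sankoff and molecular characters, then check what was parsed *)
                read ("35.san", "1.fasta")
                report (data)
                (* four-minute search, then keep only the unique best trees *)
                search (max_time:0:0:4)
                select ()
                (* write the trees in Newick format and as a PostScript figure *)
                report ("trees.txt", trees)
                report ("graphic_tree.ps", graphtrees)
                exit ()
The on-screen report (asciitrees) step is left out here because a script run unattended has no one to scroll through the output frame; the two file reports keep the same results on disk.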

American Museum of Natural History
