
Loading a Direct Optimization Sequence

  1. Again change to the exercises directory (Section 2), and read the files 3.fasta and 5.fasta, both of which contain sequences in FASTA format:
                read ("3.fasta", "5.fasta")
  2. The cross_references argument of the report command is useful for checking the completeness of the data files relative to one another. It produces a presence/absence table of terminals versus files, which gives us an idea of how complete our data are (to a level we may not want to know!):
                report (cross_references)
  3. We will now construct 10 Wagner trees with the command build (10 trees is the default), then select the best unique trees resulting from the Wagner builds, and report the trees in parenthetical notation:
                build ()
                select ()
                report (trees:(total))
  4. Another useful way to view the data is to report the implied alignment of the molecular data currently loaded. Implied alignments can reveal problems and unexpected results in your data before you run the complete analysis:
                report (ia)
  5. Implied alignments also reveal variability in sequence length, as is the case with t16 (5.fasta); note, however, that sequence length is not problematic for t18 (5.fasta). In POY, the characters N and X represent any nucleotide base (as the IUPAC code specifies), while a question mark '?' represents any base or a gap. Missing sequences, however, are always shown in the implied alignment as gap-only sequences. This way your files will remain readable by other programs.
  6. We are done for now with this tutorial. Close the interactive console:
                exit ()
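POY parses the files itself when you run read in step 1. The following Python sketch only illustrates what the FASTA format encodes: each record is a '>' header line naming a terminal, followed by one or more sequence lines. The function name parse_fasta and the rule of taking the first word of the header as the terminal name are assumptions made for illustration; this is not POY's own parser.

```python
from typing import Dict, List, Optional

def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {terminal name: sequence} dictionary.

    Assumptions for this sketch: the terminal name is the first word after
    '>', sequence lines are joined without whitespace, blank lines are
    ignored, and any sequence text before the first header is discarded.
    """
    sequences: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)  # close the previous record
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)  # close the last record
    return sequences

example = """>t1
ACGTAC
GT
>t2 an optional description
AC-T
"""
print(parse_fasta(example))  # {'t1': 'ACGTACGT', 't2': 'AC-T'}
```

The terminal names t1 and t2 are toy data, not records from 3.fasta or 5.fasta.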
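The idea behind the cross_references table of step 2 can be sketched in Python as a presence/absence matrix of terminals (rows) against files (columns). The helper names presence_absence and format_table, the '+'/'-' symbols, and the toy terminal sets are hypothetical; POY's own report layout may differ.

```python
from typing import Dict, List, Set, Tuple

def presence_absence(files: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[bool]]]:
    """Return the column order (file names) and, per terminal, its presence in each file."""
    file_names = sorted(files)
    terminals = sorted(set().union(*files.values()))  # every terminal seen in any file
    table = {t: [t in files[f] for f in file_names] for t in terminals}
    return file_names, table

def format_table(file_names: List[str], table: Dict[str, List[bool]]) -> str:
    """Render the matrix as text, '+' for present and '-' for absent."""
    width = max([len("terminal")] + [len(t) for t in table])
    lines = ["terminal".ljust(width) + "  " + "  ".join(file_names)]
    for t, flags in table.items():
        cells = [("+" if present else "-").center(len(f)) for present, f in zip(flags, file_names)]
        lines.append(t.ljust(width) + "  " + "  ".join(cells))
    return "\n".join(lines)

# Toy data: which terminals each file contains (not the real tutorial files).
files = {"3.fasta": {"t1", "t2", "t3"}, "5.fasta": {"t1", "t3", "t4"}}
names, table = presence_absence(files)
print(format_table(names, table))
```

Any row containing a '-' flags a terminal that lacks data in that file, which is exactly the kind of incompleteness step 2 is meant to expose.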
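For intuition about step 3, here is a simplified Python sketch of Wagner (stepwise-addition) tree building repeated with random addition orders, followed by selection of the unique minimum-cost trees. It rests on loudly stated assumptions. The sequences are prealigned, of equal length, and scored with Fitch parsimony, whereas POY scores unaligned sequences by direct optimization. Random addition order is likewise assumed. The names fitch, wagner, build_and_select, n_trees and seed are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or (left, right)

def fitch(tree: Tree, seqs: Dict[str, str]) -> Tuple[List[Set[str]], int]:
    """Fitch parsimony: per-site state sets at this node and the cost below it."""
    if isinstance(tree, str):
        return [{c} for c in seqs[tree]], 0
    (ls, lc), (rs, rc) = fitch(tree[0], seqs), fitch(tree[1], seqs)
    sets, cost = [], lc + rc
    for a, b in zip(ls, rs):
        common = a & b
        if common:
            sets.append(common)
        else:
            sets.append(a | b)  # empty intersection costs one change
            cost += 1
    return sets, cost

def insertions(tree: Tree, leaf: str) -> List[Tree]:
    """All trees obtained by attaching `leaf` to every branch of `tree`."""
    result: List[Tree] = [(tree, leaf)]  # attach on the branch above this subtree
    if not isinstance(tree, str):
        left, right = tree
        result += [(l, right) for l in insertions(left, leaf)]
        result += [(left, r) for r in insertions(right, leaf)]
    return result

def wagner(seqs: Dict[str, str], rng: random.Random) -> Tree:
    """One Wagner build: add terminals in random order, each at its cheapest position."""
    order = list(seqs)
    rng.shuffle(order)
    tree: Tree = (order[0], order[1])
    for leaf in order[2:]:
        tree = min(insertions(tree, leaf), key=lambda t: fitch(t, seqs)[1])
    return tree

def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node; the last element is this node's own leaf set."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = clades(tree[0]), clades(tree[1])
    return left + right + [left[-1] | right[-1]]

def topology(tree: Tree, taxa: FrozenSet[str], ref: str) -> FrozenSet[FrozenSet[str]]:
    """Unrooted topology as its non-trivial splits, each stored as the side without `ref`."""
    result = set()
    for side in clades(tree):
        if ref in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return frozenset(result)

def build_and_select(seqs: Dict[str, str], n_trees: int = 10, seed: int = 0):
    """Build `n_trees` Wagner trees, then keep the unique trees of minimum cost."""
    rng = random.Random(seed)
    taxa = frozenset(seqs)
    ref = min(taxa)
    best_cost, best = None, {}
    for _ in range(n_trees):
        tree = wagner(seqs, rng)
        cost = fitch(tree, seqs)[1]
        key = topology(tree, taxa, ref)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, {key: tree}
        elif cost == best_cost:
            best.setdefault(key, tree)  # duplicates of a kept topology are dropped
    return best_cost, list(best.values())

seqs = {"a": "AAAA", "b": "AAAC", "c": "CCCA", "d": "CCCC"}  # toy aligned data
print(build_and_select(seqs))
```

Uniqueness is judged on unrooted splits rather than on the nested tuples, because the same unrooted tree can appear under several rootings.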
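The symbol conventions of step 5 can be summarized in a short Python sketch. states maps an aligned symbol to the set of states it may stand for, and pad_missing writes a gap-only row for a terminal absent from an alignment, mirroring how the implied alignment displays missing sequences. Both are hypothetical helpers, not part of POY. Only N, X and '?' are treated as ambiguous here; other IUPAC ambiguity codes are deliberately left out.

```python
from typing import Dict, FrozenSet, List

BASES = frozenset("ACGT")

def states(symbol: str) -> FrozenSet[str]:
    """Possible states for one aligned symbol: N and X mean any base,
    '?' means any base or a gap ('-'); anything else stands for itself."""
    s = symbol.upper()
    if s in ("N", "X"):
        return BASES
    if s == "?":
        return BASES | {"-"}
    return frozenset(s)  # other IUPAC ambiguity codes are not expanded in this sketch

def pad_missing(alignment: Dict[str, str], terminals: List[str]) -> Dict[str, str]:
    """Return rows for all `terminals`, using a gap-only row for any that are missing."""
    length = len(next(iter(alignment.values()))) if alignment else 0
    return {t: alignment.get(t, "-" * length) for t in terminals}

aligned = {"t1": "AC-T", "t2": "ACGN"}  # toy alignment; t3 has no sequence
print(pad_missing(aligned, ["t1", "t2", "t3"]))  # {'t1': 'AC-T', 't2': 'ACGN', 't3': '----'}
```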