University of Arizona, Department of Computer Science

CSc 120: Phylogenetic Trees: Examples

Input file information

The examples below are based on input files of various sizes. These data files were obtained from the following sources:

I. Small

These files contain a small number of "organisms", each with a short "genome sequence" (around 5–10 characters long).

II. Medium

Each of these files contains around 8–12 organisms, each with a small portion of its genome (length ∼2000).

III. Large

These files are larger, either because they contain many organisms with partial genomes or because they contain the complete genome sequences of the organisms.
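
As an illustration of how files like these might be read, here is a minimal sketch of a FASTA reader. It relies on the general FASTA convention that a line beginning with ">" starts a new record and that the lines after it hold the sequence. The function name read_fasta, the choice of the first whitespace-delimited token as the organism id, and the sample data are all assumptions made for this sketch; they are not taken from the assignment specification or from the actual data files.

```python
import io


def read_fasta(lines):
    """Return a list of (organism_id, sequence) pairs.

    `lines` is any iterable of strings, such as an open file.
    Hypothetical helper: not part of the assignment specification.
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; save the previous one.
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the id is the first whitespace-delimited token
            # after ">"; the real files may use a different convention.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            # A sequence may be split across several lines.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Made-up sample data in the style of the "Small" files.
sample = io.StringIO(">name1 made-up description\nACGTT\nGCA\n\n>name2\nTTGACA\n")
records = read_fasta(sample)
```

With a real file, the same function could be called as read_fasta(open("tiny01.fasta")), assuming the file follows the conventions described above.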

Examples

Note that the pretty-printed outputs shown below are not part of the assignment specification: they are shown simply to illustrate the structure of the trees being computed. The code you submit should not print out the pretty-printed trees.

  1. Input File = tiny01.fasta; N = 3

    Output:

    ((name1, name2), name3)
    Pretty-printed tree:
              +--- name1
         +---|
    +---|     +--- name2
        |
         +--- name3

  2. Input File = tiny02.fasta; N = 3

    Output:

    (((name2, name5), name3), (name1, name4))
    Pretty-printed tree:
                   +--- name2
              +---|
         +---|     +--- name5
        |    |
    +---|     +--- name3
        |
        |     +--- name1
         +---|
              +--- name4

  3. Input File = Influenza-C-small01.fasta; N = 6

    Output:

    ((((((gb:AB126191:22-2346, gb:AB126193:22-2151), gb:AB126192:18-2282), gb:AB126194:22-1989), gb:AB126195:30-1727), (gb:AB126196:26-982, gb:AB126196:731-1150)), gb:AB283001:27-888)
    Pretty-printed tree:
                                  +--- gb:AB126191:22-2346
                             +---|
                        +---|     +--- gb:AB126193:22-2151
                   +---|    |
              +---|    |     +--- gb:AB126192:18-2282
             |    |    |
         +---|    |     +--- gb:AB126194:22-1989
    +---|    |    |
        |    |     +--- gb:AB126195:30-1727
        |    |
        |    |     +--- gb:AB126196:26-982
        |     +---|
        |          +--- gb:AB126196:731-1150
        |
         +--- gb:AB283001:27-888

  4. Input File = Influenza-C-small04.fasta; N = 6

    Output:

    (((((gb:KM504280:22-1989, gb:M11638:1-2015), gb:KM504279:22-2151), gb:KM504281:1-1807), gb:KM504282:1-1180), (((gb:LC124979:1-862, gb:M15090:1-894), gb:LC124979:1-741), gb:KM504283:1-935))
    Pretty-printed tree:
                             +--- gb:KM504280:22-1989
                        +---|
                   +---|     +--- gb:M11638:1-2015
              +---|    |
         +---|    |     +--- gb:KM504279:22-2151
        |    |    |
        |    |     +--- gb:KM504281:1-1807
        |    |
    +---|     +--- gb:KM504282:1-1180
        |
        |               +--- gb:LC124979:1-862
        |          +---|
        |     +---|     +--- gb:M15090:1-894
         +---|    |
             |     +--- gb:LC124979:1-741
             |
              +--- gb:KM504283:1-935

  5. Input File = Ebola.fasta; N = 6

    Output:

    (((((FJ621583.1, FJ621585.1), FJ621584.1), FJ968794.1), ((KC545395.1, KC545396.1), NC_014373.1)), ((KP271018.1, KP271020.1), (KP342330.1, KR824525.1)))
    Pretty-printed tree:
                             +--- FJ621583.1
                        +---|
                   +---|     +--- FJ621585.1
              +---|    |
             |    |     +--- FJ621584.1
             |    |
         +---|     +--- FJ968794.1
        |    |
        |    |          +--- KC545395.1
        |    |     +---|
    +---|     +---|     +--- KC545396.1
        |         |
        |          +--- NC_014373.1
        |
        |          +--- KP271018.1
        |     +---|
        |    |     +--- KP271020.1
         +---|
             |     +--- KP342330.1
              +---|
                   +--- KR824525.1
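
The parenthesized output format used in the examples above can be sketched as a short recursive function: a leaf is written as its name, and an internal node as its two subtrees separated by a comma and a space, enclosed in parentheses. The Tree class and the function tree_to_str below are hypothetical names chosen for this sketch; they are assumptions, not the classes or methods required by the assignment.

```python
class Tree:
    """A binary tree node; a leaf carries an organism name.

    Hypothetical class for illustration only.
    """

    def __init__(self, name=None, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def tree_to_str(tree):
    """Render a tree in the parenthesized format shown in the examples."""
    if tree.is_leaf():
        return tree.name
    # Assumption: every internal node has exactly two children.
    return "({}, {})".format(tree_to_str(tree.left), tree_to_str(tree.right))


# Rebuild the tree from example 1 by hand and render it.
example1 = Tree(
    left=Tree(left=Tree("name1"), right=Tree("name2")),
    right=Tree("name3"),
)
result = tree_to_str(example1)  # "((name1, name2), name3)"
```

The sketch only covers the string form of the output. It does not address how the tree itself is computed, and it does not produce the pretty-printed layout, which is shown above for illustration only.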