The following exercises will help familiarize you with the basic tasks Notung can perform on a gene tree. The tree files used in these exercises are included in the Notung distribution, in the sampleTrees folder. If the program window becomes too cluttered, you may close trees that are no longer being used by selecting the tree and clicking on “File → Close.”
In this exercise, you will reconcile the gene tree genetree_NOTCH with the species tree speciestree_mega. You will also generate a pruned species tree, and use Notung to determine the upper and lower bounds on the time when a duplication occurred.
Open the tree files
The gene tree is located in the sampleTrees folder, which is included in the downloaded zip file. Once loaded, the gene tree is displayed in the tree panel.
The species tree is located in the sampleTrees folder. Once loaded, the species tree appears in the tree panel. Because it is the most recent tree opened, it is now selected.
Note that the options that Notung offers differ depending on whether a species tree or a gene tree is selected. For example, because speciestree_mega is now selected, the box showing parameter values in the lower right corner has disappeared, and the task panel includes only two task modes, History and Annotation.
Reconcile the gene tree with the species tree
The Reconciliation task panel opens below. From here you can reconcile a gene tree with a species tree, display a pruned species tree, show duplication bounds, and hide duplication marks and loss nodes.
The Reconciliation dialog appears. In this dialog box, Notung asks you to specify which species tree to use for the reconciliation and what naming convention is used in the gene tree to specify the species associated with each gene.
Currently, the only selection available is speciestree_mega. However, if you have more than one species tree open in Notung, you must specify here which species tree to use.
This section in the dialog box asks you to specify the naming convention used in the gene tree to indicate from which species the genes originated. Notung tries to guess the naming convention, but it does not always guess correctly. Notung should have guessed correctly in this case. In general, remember to check the leaf node names in your gene tree during this step to make sure that they agree with the naming convention you choose.
For more details about the species label naming conventions, see Appendix A.4 - Specifying the Species Associated with Each Gene.
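The idea of a naming convention can be illustrated with a short Python sketch. This is not Notung's parser: the function name `species_from_gene_name`, the `delimiter` parameter, and the delimiters used below are assumptions made for illustration. Appendix A.4 describes the conventions Notung actually supports.

```python
def species_from_gene_name(gene_name, convention="prefix", delimiter="_"):
    """Return the species label embedded in a gene name.

    Hypothetical helper: 'prefix' takes the text before the first
    delimiter, 'postfix' takes the text after the last delimiter.
    """
    if delimiter not in gene_name:
        raise ValueError(f"no {delimiter!r} in {gene_name!r}")
    if convention == "prefix":
        return gene_name.split(delimiter, 1)[0]
    if convention == "postfix":
        return gene_name.rsplit(delimiter, 1)[1]
    raise ValueError(f"unknown convention: {convention}")


# Leaf names that appear in later exercises; the delimiters are assumed.
print(species_from_gene_name("caeel*unc-44", "prefix", delimiter="*"))  # caeel
print(species_from_gene_name("gA_human", "postfix"))  # human
```

Checking a few leaf names this way before reconciling is a quick way to confirm that the convention you choose matches the names in your gene tree.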
The reconciled gene tree now appears in the tree panel. The D/L Score of the reconciled tree, displayed in the bottom-left corner of the program window, is 20.5, with five duplications and thirteen losses. Five red D’s in the tree mark the inferred duplications. At the right end of the tree (at the leaves), thirteen loss nodes appear in light gray type.
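The reported score can be checked by hand. The sketch below assumes the D/L Score is the weighted sum of duplications and losses, using the default costs quoted later in these exercises (1.5 per duplication, 1.0 per loss). The function name `dl_score` is illustrative and not part of Notung.

```python
def dl_score(duplications, losses, dup_cost=1.5, loss_cost=1.0):
    """Weighted sum of duplications and losses (assumed scoring rule)."""
    return dup_cost * duplications + loss_cost * losses


# Five duplications and thirteen losses, as reported for genetree_NOTCH.
print(dl_score(5, 13))  # 20.5
```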
Display the pruned species tree
The leaves of speciestree_mega include more species than are relevant to genetree_NOTCH. After reconciliation, you can view the species tree pruned of all species that are not represented by genes in the gene tree.
A dialog box appears asking you to give a title for the pruned species tree. The default title is “Pruned Species Tree.”
The pruned species tree appears in the tree panel. It contains only seven leaf nodes, all of which are species represented in the reconciled gene tree. The pruned species tree has a tab above the tree panel, labeled “Mega_Pruned.” You can now select and use this tree as you would any other species tree.
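Pruning can be pictured as dropping every leaf that is not represented in the gene tree and then collapsing internal nodes left with a single child. The following is a minimal sketch of that idea on nested tuples, not Notung's implementation. The species names and tree shape are hypothetical.

```python
def prune(tree, keep):
    """Return `tree` restricted to leaves in `keep`, or None if none remain.

    Leaves are strings; internal nodes are tuples of children.
    Internal nodes left with a single child are collapsed.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


# Hypothetical species tree and set of represented species.
species_tree = ((("human", "mouse"), "chicken"), ("fly", "worm"))
represented = {"human", "chicken", "worm"}
print(prune(species_tree, represented))  # (('human', 'chicken'), 'worm')
```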
Check the duplication bounds
The duplication bounds provide information regarding when gene duplications occurred in the course of species evolution.
Node name labels appear in red type next to each internal node. You can now identify each duplication by name. If internal node names are not provided in the gene tree file, Notung will assign the node an alphanumeric name (e.g. n132).
A new window appears. Inferred duplications are listed in the left column, expressed as node names in the gene tree. The lower and upper bounds are listed in the middle and right columns, respectively, and are expressed as internal node names in the species tree. Information on losses is displayed below duplication bounds. The left column lists the species nodes in the species tree. The right column provides the number of losses that occurred in each species.
The node name may vary, depending on how many internal nodes Notung has counted in your current session.
With Mega_Pruned selected, you can see internal nodes representing euteleostomi and coelom. The duplication occurred somewhere on the edge between those nodes.
The gene tree genetree_ANK is unrooted. In this exercise, you will select a root based on duplication-loss parsimony.
Open the tree files
The gene tree is located in the sampleTrees folder.
Since this tree is unrooted, it has a trifurcation (a node with 3 children) at the top of the tree, but is otherwise binary.
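The shape described here, a trifurcation at the top of an otherwise binary tree, is easy to test for. The sketch below uses nested tuples with made-up leaf names. The function name `looks_unrooted` is an assumption and does not correspond to anything in Notung.

```python
def looks_unrooted(tree):
    """True if the root has three children and every other internal node has two."""
    def is_binary(node):
        if isinstance(node, str):
            return True
        return len(node) == 2 and all(is_binary(child) for child in node)

    return (not isinstance(tree, str)
            and len(tree) == 3
            and all(is_binary(child) for child in tree))


# Hypothetical leaf names, for illustration only.
unrooted = ("g1", ("g2", "g3"), ("g4", "g5"))
rooted = (("g1", ("g2", "g3")), ("g4", "g5"))
print(looks_unrooted(unrooted))  # True
print(looks_unrooted(rooted))    # False
```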
Run the Rooting Analysis
The Rooting task panel is displayed. Notung is now in Rooting mode.
A diagnostic message appears warning you that this tree contains a trifurcation at its root and may be unrooted. Click “OK.”
You will be asked to reconcile the tree. Select speciestree_mega and “Prefix,” then click “Reconcile.” The edge at the top of the tree panel, leading to caeel*unc-44, is colored red. This means it has the minimum root score.
Each edge is labeled with its root score. Notice that the red edge leading to caeel*unc-44 has a root score of 4.0. The next lowest score is 8.5.
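Choosing a root by parsimony amounts to picking the edge, or edges, with the lowest root score. A minimal sketch follows. Only the caeel*unc-44 edge and the scores 4.0 and 8.5 come from this exercise; the other edge names and the 9.5 score are made up.

```python
def best_root_edges(root_scores):
    """Return the minimum root score and every edge that attains it."""
    best = min(root_scores.values())
    return best, sorted(edge for edge, score in root_scores.items() if score == best)


# "edge_a", "edge_b", and the score 9.5 are hypothetical.
root_scores = {"caeel*unc-44": 4.0, "edge_a": 8.5, "edge_b": 9.5}
print(best_root_edges(root_scores))  # (4.0, ['caeel*unc-44'])
```

Returning every edge that ties for the minimum matters in later exercises, where several edges can share the lowest root score.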
Select a root
The tree is now rooted on the edge leading to the caeel*unc-44 gene. The D/L Score of the tree is now 4.0, with two duplications and one loss.
In this exercise, you will reconcile the gene tree genetree_SMALL with the species tree speciestree_small and use Notung’s rearrangement tasks to investigate alternate gene trees with minimum D/L Score. Both input trees are located in the sampleTrees folder.
Reconcile the gene tree with the species tree
This is an artificial tree made up for this exercise. The edge weights in this tree represent bootstrap values. Note that two internal edges have a bootstrap value of 100, one has a bootstrap value of 73, and several have not been assigned a weight. (Note that edges adjacent to leaves are usually not assigned bootstrap values since those edges are present in all trees.) Notung sets the default edge weight threshold to 90% of the maximum edge weight in the tree. Since the maximum edge weight in this tree is 100, the edge weight threshold is set to 90.0.
The reconciled tree appears in the tree panel. Note that it has a D/L Score of 10.0, with four duplications and four losses.
Rearrange the reconciled tree
The Rearrange task panel is now displayed.
Several edges in the reconciled tree are highlighted in yellow. These are edges with weights below the Edge Weight Threshold and are considered “weak.” Weak edges may be rearranged to reduce the number of duplications and losses in the tree. Edges with weights above the threshold will not be rearranged.
Note that in addition to the edge with weight 73.0, the internal edges with no edge weight are also highlighted in yellow. Notung treats any internal edge that is not explicitly assigned a weight as weak.
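The default threshold and the weak-edge rule can be sketched in a few lines of Python. The edge names and the number of unweighted edges are hypothetical; the weights 100, 100, and 73 and the 90% rule come from this exercise. `None` stands for an edge with no assigned weight.

```python
def edge_weight_threshold(weights, fraction=0.9):
    """Default threshold: a fraction of the largest assigned edge weight."""
    assigned = [w for w in weights.values() if w is not None]
    return fraction * max(assigned)


def weak_edges(weights, threshold):
    """Edges below the threshold, plus edges with no assigned weight."""
    return sorted(e for e, w in weights.items() if w is None or w < threshold)


# Hypothetical edge names; None marks an internal edge without a weight.
weights = {"e1": 100.0, "e2": 100.0, "e3": 73.0, "e4": None, "e5": None}
threshold = edge_weight_threshold(weights)
print(threshold)                       # 90.0
print(weak_edges(weights, threshold))  # ['e3', 'e4', 'e5']
```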
The rearranged tree appears in the tree panel. It now has a D/L Score of 4.0, with two duplications and only one loss.
Change the parameter values and rearrange again
In the previous steps, we rearranged the tree using the default parameter values (cD=1.5 and cL=1.0). For the default values, there is only one minimum cost tree. We now explore what happens when we rearrange the tree when duplications and losses are weighted equally.
A message appears to warn us that although we have changed the parameter values, this has had no effect on the tree. We must rearrange the tree again to see the effect of rearrangement with this choice of parameter values. Click “OK.”
Duplications and losses are now weighted equally in Notung’s reconciliation algorithm.
View a different alternate event history
With the new parameter values, there is more than one alternate gene tree with minimal D/L Score. You are currently viewing history 0.
This opens a list of available alternate event histories. You should see history 0 and history 1.
A different tree appears. This tree also has a D/L Score of 3.0, but has two duplications and one loss instead of three duplications and no losses.
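The two histories tie only because duplications and losses now cost the same. The sketch below compares them under both settings, assuming the score is a weighted sum and that "weighted equally" means both costs are 1.0, which matches the reported score of 3.0. The names `HISTORIES` and `scores_for` are illustrative.

```python
# (duplications, losses) for the two histories described in this step.
HISTORIES = {"two dups, one loss": (2, 1), "three dups, no losses": (3, 0)}


def scores_for(dup_cost, loss_cost):
    """Score each history as dup_cost * duplications + loss_cost * losses."""
    return {name: dup_cost * d + loss_cost * l for name, (d, l) in HISTORIES.items()}


print(scores_for(1.5, 1.0))
# {'two dups, one loss': 4.0, 'three dups, no losses': 4.5}
print(scores_for(1.0, 1.0))
# {'two dups, one loss': 3.0, 'three dups, no losses': 3.0}
```

Under the default costs the history with two duplications and one loss is strictly cheaper, which is consistent with there being a single minimum cost tree before the parameters were changed.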
Swap nodes in the rearranged tree
Note that this tree groups gB_human with gA_mouse and gA_human with gB_mouse. However, the tree that groups gA_human with gA_mouse and gB_human with gB_mouse has the same score.
Nodes that can be interchanged without changing the D/L Score are marked with enlarged light blue boxes.
To select the node, you must click on the enlarged blue box. Once selected, the node is marked with a light blue triangle, and each node it can be swapped with is marked with a pink triangle. In this case, there is just one: gA_human.
The nodes gB_human and gA_human are swapped. Once they have been swapped, they are temporarily highlighted with yellow triangles, so that you can see the results of the most recent action. Note that the gA genes are now grouped together, and the gB genes are together in the same subtree, along with the g_gorilla gene.
Try performing additional swaps to see how many alternate, minimum cost trees you can find.
In this exercise, you will perform Notung’s main tasks on the gene tree exercise4_genetree with the non-binary species tree exercise4_speciestree. You will reconcile and root the gene tree, and use Notung to determine the upper and lower bounds on the time when a duplication occurred.
Open the tree files
This is an artificial tree made up for this exercise.
As you will notice, this is a non-binary species tree with a polytomy representing the common ancestor of the marsupials.
Reconcile the gene tree with the species tree
The reconciled tree appears in the tree panel. Note that it has a D/L Score of 8.0, with two duplications, one conditional duplication, and five losses. Two red D’s in the tree mark the required duplications, while the one pink cD marks the conditional duplication. At the leaves of the tree, five loss nodes appear in light gray type.
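The score reported here is consistent with conditional duplications adding nothing to the D/L Score: 2 × 1.5 + 5 × 1.0 = 8.0. The sketch below tallies events by type under that assumption. The zero cost for "cD" is inferred from the numbers in this exercise rather than stated in the text, and the names `EVENT_COSTS` and `score_events` are illustrative.

```python
from collections import Counter

# Assumed per-event costs. The 0.0 for conditional duplications ("cD")
# is an inference from the reported score, not a documented value.
EVENT_COSTS = {"D": 1.5, "cD": 0.0, "L": 1.0}


def score_events(events):
    """Sum the assumed cost of each event label in `events`."""
    counts = Counter(events)
    return sum(EVENT_COSTS[kind] * n for kind, n in counts.items())


# Two duplications, one conditional duplication, five losses.
events = ["D"] * 2 + ["cD"] * 1 + ["L"] * 5
print(score_events(events))  # 8.0
```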
Check the duplication bounds
The duplication bounds provide information regarding when gene duplications occurred in the course of species evolution.
In the new window, required duplications are described first. Conditional duplications are described below the required duplications. For both types of duplications, the duplication nodes are listed in the left column, expressed as node names in the gene tree. The lower and upper bounds are listed in the middle and right columns, respectively, and are expressed as internal node names in the species tree. Information on losses is provided below the conditional duplication bounds.
Run the Rooting Analysis
The edge leading to genes from placental mammals (cow, mouse, and human) is colored red. This means it has the lowest root score.
Notice that the red edge has a root score of 7.0. The next lowest root score is 8.0.
The name of the species to which the node is mapped appears in italics next to each internal node.
The tree is rooted on the edge which splits the tree between placental mammals (Eutheria) and marsupials (Metatheria). The D/L Score of the tree is now 7.0, with two duplications, one conditional duplication, and four losses.
Do not close these trees yet - they will be used in upcoming steps.
Reconcile the Tree using the Combined Polytomy Losses algorithm
This step uses the command line interface and can be skipped, if desired. You will use the command line interface to reconcile the gene tree exercise4_genetree with the species tree exercise4_speciestree using the combined losses algorithm.
For instructions on using Notung from the command line, see Chapter 12.2 - Running Notung from the command line.
java -jar Notung-2.6.jar sampleTrees/exercise4_genetree -s sampleTrees/exercise4_speciestree --reconcile --exact-losses --outputdir sampleTrees --report-heuristic-losses
Notung will print information to the screen as it reconciles the tree for both combined and explicit losses. Notice that the first unrooted gene tree has a D/L Score of 8.0, with two duplications, one conditional duplication and five heuristic losses as compared to the second unrooted gene tree, which has a D/L Score of 7.0, with two duplications, one conditional duplication, and four exact losses. The tree, reconciled and with exact losses, will be saved to the sampleTrees folder (as specified by --outputdir) as exercise4_genetree.reconciled.
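If you prefer to launch the run from a script, the command can be assembled as an argument list. The sketch below uses only the jar name, paths, and options shown in the command for this step. The function name `build_notung_command` is hypothetical, and the actual launch is left commented out because it requires Java and the Notung jar in the working directory.

```python
import shlex


def build_notung_command(gene_tree, species_tree, output_dir, jar="Notung-2.6.jar"):
    """Assemble the argument list for the reconciliation run in this step."""
    return [
        "java", "-jar", jar,
        gene_tree,
        "-s", species_tree,
        "--reconcile",
        "--exact-losses",
        "--outputdir", output_dir,
        "--report-heuristic-losses",
    ]


cmd = build_notung_command(
    "sampleTrees/exercise4_genetree",
    "sampleTrees/exercise4_speciestree",
    "sampleTrees",
)
print(shlex.join(cmd))

# To run it for real:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if your tree files live in a folder whose name contains spaces.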
Root the tree reconciled with the Combined Polytomy Losses algorithm
In the previous step, you reconciled the gene tree using the Combined Polytomy Losses algorithm. In this step, you will find the optimal root for this gene tree. If you skipped the previous step, use the gene tree exercise4_genetree-exactLosses.ntg instead of exercise4_genetree.reconciled.
A warning will appear stating that the tree was reconciled using --exact-losses. Click the “OK” button.
The tree is rooted on the edge leading to placental mammals. The D/L Score of the tree is now 6.0, with two duplications, one conditional duplication, and three losses.
Compare this tree with the previously rooted gene tree (exercise4_genetree). Can you find the difference between the trees? In exercise4_genetree, the loss node tasmanian_devil*LOST sits above the subtree containing gene3 and gene2. In exercise4_genetree.reconciled, this loss has been moved below the duplication node and combined with opossum*LOST in the gene3 subtree and with bandicoot*LOST in the gene2 subtree. This reduces the total number of losses.
View polytomy losses without species names included
There are two display options for polytomy losses. In this step, you will see the other way to display these losses.
In this exercise, you will perform Notung’s main tasks on the non-binary gene tree exercise5_genetree with the species tree exercise5_speciestree. You will reconcile, root, resolve, and rearrange the gene tree, and use Notung to determine some general statistics about the trees.
Open the tree files
This is an artificial tree made up for this exercise. Notice that this gene tree is non-binary and contains multiple polytomies.
The polytomies in the gene tree are circled and highlighted in cyan.
Reconcile the gene tree with the species tree
The reconciled tree appears in the tree panel. Note that it has a D/L Score of 20.0, with ten duplications and five losses. Also note that some of the polytomies have more than one duplication associated with the node (for example, the polytomy with eight children has two duplications).
Get general tree statistics for the gene tree
In this step you will gather some general statistics about the reconciled gene tree and the species tree.
The General Tree Statistics window appears. This window contains information on the gene tree, the reconciled gene tree, and the species tree. You may have to scroll down to view all the information.
The General Tree Statistics Window should look like this.
For more information on the data in the General Tree Statistics window, see Chapter 3.4 - General Tree Statistics.
Resolve the polytomies in the gene tree
In this step, you will resolve all the polytomies in the gene tree, thus creating a binary gene tree.
The Resolve task panel opens below.
The polytomies in the gene tree are circled and highlighted in cyan.
The resolved tree appears in the tree panel. Edges associated with the resolved polytomies are now colored cyan. This is the same tree as before, only now the polytomies have been resolved. The number of duplications and losses are identical to the reconciled tree, and even the duplication bounds are the same.
Change the parameter values and view alternate event histories
In the previous steps, we reconciled and resolved the tree using the default parameter values (cD=1.5 and cL=1.0). For the default values, there is only one minimum cost tree. We now explore what happens when we reconcile the tree with duplications and losses weighted equally.
We must go back in the history before we change parameter values, because the tree has already been resolved and the change in values might affect the current resolution of the tree.
The tree panel shows the state of the tree before the polytomies were resolved.
Duplications and losses are now weighted equally, and the gene tree is automatically rereconciled with the new parameter values.
The reconciled tree appears in the tree panel. There is now more than one alternate gene tree with the minimal D/L Score. You are currently viewing history 0.
A different tree appears. This tree has a D/L Score of 15.0, with ten duplications and five losses. This tree has the same duplications and losses as the tree reconciled with a duplication cost of 1.5 and a loss cost of 1.0 (see Figure E.14).
A different tree appears. This tree also has a D/L Score of 15.0, but has eleven duplications and four losses rather than the ten duplications and five losses in history 1. The large polytomy with seven children now has three duplications and one loss, whereas in history 1 it had two duplications and two losses.
Run the Rooting Analysis
Many edges and one polytomy are colored red, which indicates that all of these components of the tree have the lowest root score.
Notice that the large polytomy is circled in red. Placing a root at a polytomy indicates that at least one edge in the binary resolution of the polytomy has the lowest root score.
Each edge and polytomy is labeled with its root score.
The tree is rooted on the polytomy and the D/L Score of the tree is still 15.0, with eleven duplications and four losses.
Resolve the polytomies in the gene tree
In this step, you will resolve all the polytomies in the gene tree, thus creating a binary gene tree.
The Resolve task panel opens below.
The polytomies in the gene tree are circled and highlighted in cyan.
The resolved tree appears in the tree panel. Edges associated with the resolved polytomies are now colored cyan.
View a different alternate event history
With these parameter values, there is more than one alternate gene tree with minimal D/L Score. You are currently viewing history 0.
This displays a list of available alternate event histories. You should see history 0 and history 1.
A different tree appears. This tree also has a D/L Score of 15.0, but has ten duplications and five losses instead of eleven duplications and four losses.
Note that these alternate histories correspond to the same alternate histories that were presented after reconciliation.
Swap nodes in the resolved tree
Note that this tree groups human-gene-BB1 with mac-gene-BB2 and human-gene-BB2 with mac-gene-BB1. However, the tree that groups human-gene-BB1 with mac-gene-BB1 and human-gene-BB2 with mac-gene-BB2 has the same score.
Nodes that can be interchanged without changing the D/L Score or history implied by the polytomies are marked with enlarged light blue boxes.
The node is now marked with a light blue triangle. Each node it can be swapped with is marked with a pink triangle. In this case, there is just one: the node leading to mac-gene-BB2.
The nodes mac-gene-BB1 and mac-gene-BB2 are swapped. Once they have been swapped, they are temporarily highlighted with yellow triangles, so that you can see the results of the most recent action. Note that the BB1 genes are now grouped together, and the BB2 genes are together in the same subtree.
Annotate the Gene Tree
This step will introduce you to Notung’s annotation capabilities.
The Annotations task panel is displayed.
A box will appear to edit the new annotation.
This will automatically annotate all the leaves that contain the string “-A” with the color you selected.
This will automatically annotate all the leaves that contain the string “-BA” with the color you selected.
This will automatically annotate all the leaves that contain the string “BB1” with the color you selected.
This will automatically annotate all the leaves that contain the string “BB2” with the color you selected.
This option lets you select the nodes to add to the annotation without searching for a substring.
Notice that these leaves were previously in the color selected in step 3. The leaves are a new color now because the newer annotation takes precedence.
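The precedence rule can be pictured as applying annotations in order, with each later annotation overwriting the color of any leaf it matches. The sketch below is an illustration only, not Notung's implementation. The leaf names are modeled on those in this exercise, the colors are made up, and the final "mac-" rule stands in for a later annotation applied to leaves that already have a color.

```python
def annotate(leaves, annotations):
    """Map each leaf to a color; later annotations override earlier ones."""
    colors = {}
    for substring, color in annotations:
        for leaf in leaves:
            if substring in leaf:
                colors[leaf] = color
    return colors


# Hypothetical leaves and colors, for illustration only.
leaves = ["human-gene-A1", "human-gene-BA1", "human-gene-BB1", "mac-gene-BB2"]
annotations = [
    ("-A", "red"),
    ("-BA", "blue"),
    ("BB1", "green"),
    ("BB2", "orange"),
    ("mac-", "purple"),  # newest annotation, overrides "orange" on mac-gene-BB2
]
print(annotate(leaves, annotations))
# {'human-gene-A1': 'red', 'human-gene-BA1': 'blue', 'human-gene-BB1': 'green', 'mac-gene-BB2': 'purple'}
```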
Rearrange the resolved tree
In this step, you will rearrange the gene tree to obtain the minimal D/L Score. In this exercise, you have resolved the polytomies in the gene tree before rearranging the weak areas of the tree. However, it is possible to do both tasks at the same time in the Rearrangement task mode. Both Resolve and Rearrangement are available because the two functions have different purposes. If you want to obtain a hypothesis of the binary gene tree but wish to retain all the information in the gene tree, use the Resolve task mode. However, if you wish to treat edges with an edge weight below a certain value as uninformative, use the Rearrangement task mode.
Several edges in the reconciled tree are highlighted in yellow. These are edges with weights below the Edge Weight Threshold and are considered “weak.” Weak edges may be rearranged to reduce the number of duplications and losses in the tree. Edges with weights above the threshold will not be rearranged.
The rearranged tree appears in the tree panel. It has a D/L Score of 15.0, with twelve duplications and only three losses. Note that the score did not change; the rearranged tree is not necessarily “better” than the original tree.
View a different alternate event history
You are currently viewing history 0.
This opens a list of available alternate event histories. You should see history 0, history 1, and history 2.
Nodes that can be interchanged without changing the D/L Score are marked with enlarged light blue boxes. Try performing additional swaps to see how many alternate, minimum cost trees you can find.
HINT 1: Select the history with ten duplications and five losses.
HINT 2: Swap the subtree of BA1 and BA2 genes in “pan” with the LOST “pan” gene in the BA subtree.
HINT 3: Swap the subtree of BA4, BA5, and BA6 in human with the node for BA3 in human.