Notung is a tool for comparing gene and species trees. Notung takes tree files as input and allows users to refine and manipulate them. The modified trees can be saved as output. The following subsections introduce basic input and output in Notung, general tree statistics, the graphical user interface, and the parameter values used in Notung’s tree refinement tasks.
To perform its functions, Notung requires a gene tree and a species tree. The species tree must contain all the species from which genes in the gene tree were sampled. The species tree may contain additional species as well; these will be ignored. A correspondence between the leaves of the species and gene trees is determined by comparing the leaf labels in the gene and species trees: each leaf label in the gene tree must include a substring that specifies the species from which the gene was sampled. Trees may be provided in Newick, NHX, or Notung format. See Appendix A - File Formats for further information.
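The matching rule above can be illustrated with a small sketch. It assumes, for illustration only, that the species of a gene is the longest species-tree leaf name occurring as a substring of the gene label; the function name match_species is hypothetical, and Notung's actual rules are given in Appendix A.4. Gene labels with no match show that the species tree is missing a required species, while unused species are simply ignored.

```python
def match_species(gene_labels, species_names):
    """Map each gene-tree leaf label to the species named inside it.

    Hypothetical sketch: the species is taken to be the longest species
    name that occurs as a substring of the gene label. Notung's real
    labeling rules (Appendix A.4) may differ.
    """
    mapping = {}
    unmatched = []
    # Try longer names first so that, e.g., "Mus_musculus" beats "Mus".
    ordered = sorted(species_names, key=len, reverse=True)
    for label in gene_labels:
        hit = next((sp for sp in ordered if sp in label), None)
        if hit is None:
            unmatched.append(label)  # species tree lacks this gene's species
        else:
            mapping[label] = hit
    return mapping, unmatched


species = ["human", "mouse", "chicken", "zebrafish"]  # zebrafish is unused
genes = ["HBA_human", "HBA_mouse", "HBB_chicken", "HBA_frog"]
mapping, missing = match_species(genes, species)
print(mapping)  # {'HBA_human': 'human', 'HBA_mouse': 'mouse', 'HBB_chicken': 'chicken'}
print(missing)  # ['HBA_frog']
```

Plain substring matching can be ambiguous when one species name is contained in an unrelated label, which is one reason to use consistent, delimited species tags in gene labels.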
Notung can operate on a non-binary gene tree or a non-binary species tree. However, its functions cannot be performed when both the gene tree and corresponding species tree are non-binary. For a complete summary of functions that Notung can perform, see Table 1.1.
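The rule that at most one of the two trees may be non-binary can be expressed as a simple check. This is a sketch over a hypothetical Node class (a name plus a list of children), not Notung's internal representation; a node with one child or more than two children counts as non-binary here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def is_binary(node: Node) -> bool:
    """True if every internal node in the subtree has exactly two children."""
    if not node.children:
        return True
    return len(node.children) == 2 and all(is_binary(c) for c in node.children)


def can_reconcile(gene_root: Node, species_root: Node) -> bool:
    # At most one of the two trees may be non-binary.
    return is_binary(gene_root) or is_binary(species_root)


binary = Node("r", [Node("a"), Node("x", [Node("b"), Node("c")])])
polytomy = Node("r", [Node("a"), Node("b"), Node("c")])
print(can_reconcile(binary, polytomy))    # True
print(can_reconcile(polytomy, polytomy))  # False
```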
NOTE: If you are interested in using Notung to analyze non-binary trees, see Chapter 4 - Non-Binary Trees for a more detailed and theoretical discussion of non-binary trees.
The species tree must be rooted, with leaf nodes labeled with species names. Internal nodes may be given taxonomic labels (e.g., “tetrapoda”), but this is not required. If the internal nodes are not labeled, Notung will assign alphanumeric labels (such as n1, n2, etc.). If the species tree has edge weights or branch lengths, this information will be ignored. For more information on species names, see Appendix A.4 - Specifying the Species Associated with Each Gene.
The tasks that Notung performs are based on the assumption that the user has selected a species tree that is a reliable representation of the true species relationships. Using Notung with an incorrect species tree will give incorrect results. For more information on selecting an appropriate species tree, see Appendix B - Building a Species Tree.
In order to perform its reconcile, rearrange and resolve functions, Notung requires a rooted gene tree. If the gene tree is not rooted, Notung can be used to root the gene tree. See Chapter 6 - Rooting Mode. The leaf nodes in the gene tree must be labeled with a unique identifier specifying the gene, as well as the species from which the gene was sampled. See Appendix A.4 - Specifying the Species Associated with Each Gene for more information. The internal nodes may be labeled. If the internal nodes are not labeled, Notung will assign alphanumeric labels (e.g. n5, n6, etc.).
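Automatic labeling of unlabeled internal nodes can be sketched as follows. The traversal order (preorder) and the rule of skipping any generated name that is already used in the tree are assumptions made for this illustration; the actual numbering Notung produces may differ. Node and label_internal_nodes are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def label_internal_nodes(root: Node, prefix: str = "n") -> Node:
    """Give every unlabeled internal node a name such as n1, n2, ...

    Assumption: nodes are numbered in preorder, and generated names skip
    labels already present in the tree. Leaves are left untouched.
    """
    used = set()
    stack = [root]
    while stack:  # collect existing labels so generated names do not clash
        node = stack.pop()
        if node.name:
            used.add(node.name)
        stack.extend(node.children)

    counter = count(1)

    def visit(node: Node) -> None:
        if node.children and not node.name:
            name = f"{prefix}{next(counter)}"
            while name in used:
                name = f"{prefix}{next(counter)}"
            node.name = name
            used.add(name)
        for child in node.children:
            visit(child)

    visit(root)
    return root


tree = Node("", [Node("", [Node("human"), Node("mouse")]), Node("chicken")])
label_internal_nodes(tree)
print(tree.name, tree.children[0].name)  # n1 n2
```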
In Rearrange mode, Notung requires that the gene tree have edge weights. These are used to identify edges that are weakly supported and may be rearranged. These weights may be bootstrap values, posterior probabilities, edge lengths, or any other weighting scheme selected by the user. Several different fields in the Newick and NHX formats may be used to store edge weights. See Appendix A - File Formats for a detailed explanation of these formats and how to indicate to Notung which field is being used for edge weights in a particular input tree.
Many tree reconstruction programs represent an unrooted binary tree as a mostly binary tree, with a single trifurcation at the root. Unless a root is selected for these trees (in Notung or another program), Notung will incorrectly treat them as rooted non-binary trees. If such a tree is actually an unrooted binary tree, failing to root it will affect Notung’s diagnostics. See Chapter 6 - Rooting Mode for more information on rooting gene trees.
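The pattern described above (a trifurcation at the root, with every other internal node binary) can be detected with a short heuristic. This is an illustrative sketch using a hypothetical Node class; it cannot distinguish an unrooted binary tree from a genuine three-way split at the root, which is exactly why the user must decide whether to root the tree (for example, in Rooting mode).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def looks_unrooted(root: Node) -> bool:
    """Heuristic: root has three children and all other internal nodes are binary.

    Such a tree is probably an unrooted binary tree written in rooted form.
    """
    if len(root.children) != 3:
        return False
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.children and len(node.children) != 2:
            return False  # a polytomy below the root: genuinely non-binary
        stack.extend(node.children)
    return True


tri = Node("", [Node("a"), Node("b"), Node("", [Node("c"), Node("d")])])
print(looks_unrooted(tri))  # True
```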
Notung’s graphical interface facilitates tree visualization and manipulation, enabling the user to inspect duplicated and lost nodes in a tree, view orthologs and paralogs, visualize alternate optimal trees, and color annotate genes for visual differentiation or presentation.
To run Notung:
Using the graphical user interface on Windows or Mac OS X:
Using the graphical user interface on Linux:
java -jar Notung-2.6.jar
In addition, Notung can perform many of its operations from the command line without launching the GUI. See Chapter 12 - Command Line Options and Batch Processing for a description of the command line interface.
When Notung is first launched, the program window will be blank. Figure 3.1a and Figure 3.1b show Notung’s graphical interface once a gene tree and species tree have been opened. Notung’s graphical user interface has the following components:
Tree panel: The tree that is currently selected appears in the tree panel. Trees are rendered with the root at left and leaf nodes at right. Nodes are denoted by small blue squares in the tree. Edge weights and leaf node names appear in the tree by default. Notung fits the whole tree in the tree panel by default. The size of the tree and tree labels can be modified using the Zoom and Fonts menus, respectively. See Chapter 11 - Changing the Appearance of the Tree Panel.
Although multiple trees can be open in Notung at once, Notung operates on only one tree at a time. To facilitate working with many trees, Notung marks each open tree with a tab at the top of the tree panel. Clicking on a tab selects the corresponding tree. Tabs are labeled with the file name and a special icon that identifies the tree as a gene or species tree: a DNA helix for gene trees, and a cartoon of the evolution of humankind for species trees (see Figure 3.2).
Task panel: Operations on the tree are performed in the task panel (highlighted in blue in Figure 3.1). Tabs at the top of the task panel correspond to the various tasks that Notung can perform. Clicking on a tab puts Notung in the corresponding task mode, revealing the buttons that control tasks specific to that mode. If a gene tree is selected, six modes are available: History, Reconciliation, Rooting, Rearrange, Resolve, and Annotations. Only the History and Annotations modes can be used when a species tree is selected.
Parameter values: When a gene tree is selected, a box displaying the Edge Weight Threshold and Costs/Weights for Duplications, Conditional Duplications, and Losses appears in the bottom-right corner of the program window. These values can be changed by clicking the “Edit Values” button directly below them. Note that when a species tree is selected, the program window will not display the parameter values.
Notung can read and save tree files in Newick, NHX, and Notung file formats. NHX and Notung file formats are extensions of Newick; see Appendix A - File Formats for details. Notung can also save the image in the tree panel as a Portable Network Graphic (PNG) file.
To open trees:
NOTE: Notung cannot distinguish gene trees from species trees automatically. If a gene tree is opened as a species tree, or a species tree is opened as a gene tree, reconciliation will produce incorrect results.
To save trees:
NOTE: The default format for saving trees is the Notung File Format. If you have modified the tree in Notung and wish to reopen this tree in Notung, it may be best to save the tree in Notung format. If you wish to reopen the modified tree in another tree program, Newick format may be a better option.
To view text formatted trees in a dialog box:
To copy this information, click the “Copy to clipboard” button. This text can then be pasted in any text editor.
NOTE: Selecting “About Tree Formats” from the drop-down menu will open a dialog box containing a summary of the different tree formats. See Appendix A - File Formats for more information.
To save the current view of a tree as a PNG file:
NOTE: This option saves only the image currently visible in the tree panel. If you have zoomed in on a tree, the PNG will save only the section in view.
To save an image of the whole tree as a PNG file:
NOTE: This option saves a “pretty print” version of the entire tree. Currently, display options set in Notung will not affect the output of this tree. More options for saving tree images are available via the command line, and are discussed in Section 12.4.
To print an image of a tree:
NOTE: For most printers the default page layout will be portrait; however, the landscape layout is usually preferred for printing trees from Notung. You may wish to change your printer settings before printing.
NOTE: Printing a view of the tree that shows exactly what you want may be difficult, as it may be necessary to change both the printer’s settings (e.g., page layout, margins) and the appearance of the tree so that the desired print area fits within the red rectangle. See Chapter 11.2 - Zoom for more information on zooming in and out of the tree. It may be easier to obtain the desired view by first saving the tree as a PNG image, and then editing and printing that image using another program.
To reload a tree:
NOTE: If the tree has been modified, a dialog box will be displayed offering three options: “Save tree”, “Reload tree without saving”, and “Cancel reload”.
To export color annotations to a file:
NOTE: Exported annotations can be imported into other trees, or loaded on the command line using the option --annotationfile. For more information about color annotations, see Chapter 10 - Annotations.
To import color annotations from a file:
NOTE: Annotations can be imported from previously exported annotations files. Additionally, selecting a Notung format tree which contains annotations will import annotations from that tree. Annotations can also be loaded via the command line using the option --annotationfile. For more information about color annotations, see Chapter 10 - Annotations.
To close trees:
To quit Notung:
Notung compiles information on tree characteristics, such as height, number of leaves, and number of nodes, and reports it in the general tree statistics box under the “About This Tree” menu. The properties examined depend on whether the tree is a gene tree or a species tree and, for a gene tree, whether it has been reconciled. The possible information displayed is described below.
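The basic topology statistics can be computed with a single traversal, as in this sketch. The Node class and tree_stats function are hypothetical, and height is taken here to be the number of edges on the longest root-to-leaf path; Notung's exact definitions may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def tree_stats(root: Node) -> dict:
    """Count leaves and internal nodes and measure height (in edges)."""
    leaves = internal = 0
    height = 0
    stack = [(root, 0)]  # (node, depth in edges from the root)
    while stack:
        node, depth = stack.pop()
        if node.children:
            internal += 1
            stack.extend((c, depth + 1) for c in node.children)
        else:
            leaves += 1
            height = max(height, depth)
    return {"leaves": leaves, "internal": internal,
            "nodes": leaves + internal, "height": height}


t = Node("", [Node("", [Node("human"), Node("mouse")]), Node("chicken")])
print(tree_stats(t))  # {'leaves': 3, 'internal': 2, 'nodes': 5, 'height': 2}
```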
Figure 3.3 shows an example of the tree statistics provided for a species tree.
Under the heading Reconciliation Information:
Statistics about the topology of the tree (number of leaf nodes, number of internal nodes, etc.) are reported twice: once for the gene tree without losses, and once for the tree with losses.
In addition, the species tree used for reconciliation will be reported, as well as simple statistics for the pruned species tree. Figure 3.4 shows an example of the tree statistics displayed for a reconciled gene tree.
To get general statistics for a tree:
A window will appear containing information on the tree’s characteristics, as described above. To copy this information into your favorite text editor, click the “Copy to Clipboard” button, and paste in the text editor.
NOTE: Information on duplication bounds and losses can also be obtained by selecting Duplication Bounds and Loss Counts from the About This Tree menu. For more information on duplication bounds, see Chapter 12.2 - Duplication Bounds and Loss Information.
The parameter values used in Notung (the Edge Weight Threshold, Duplication Cost, Conditional Duplication Cost, and Loss Cost) can be specified by the user. These values influence the results produced by Notung’s tasks.
Notung uses a Duplication/Loss Score (D/L Score) to score reconciled trees and evaluate alternate hypotheses. The D/L Score is defined to be $c_L L + c_D D + c_C C$, where $L$ is the number of losses, $D$ is the number of duplications, and $C$ is the number of conditional duplications implied by the current reconciliation. The loss cost $c_L$, duplication cost $c_D$, and conditional duplication cost $c_C$ reflect the relative importance of losses, duplications, and conditional duplications in scoring the tree. The cost of conditional duplications is only relevant when reconciling a gene tree with a non-binary species tree (see Chapter 4 - Non-Binary Trees). The default values are 1.0 for losses, 1.5 for duplications, and no cost for conditional duplications, but these values can be changed by the user. Notung displays the D/L Score of a reconciled tree, as well as the number of losses, duplications, and conditional duplications, in the bottom-left corner of the program window (see Figure 3.5).
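As a worked example of the formula above, the following sketch computes the D/L Score with the default costs. The function name dl_score and the event counts are illustrative, not output from Notung.

```python
def dl_score(losses, duplications, conditional_duplications=0,
             loss_cost=1.0, dup_cost=1.5, cond_dup_cost=0.0):
    """D/L Score = c_L*L + c_D*D + c_C*C, using the documented default costs."""
    return (loss_cost * losses
            + dup_cost * duplications
            + cond_dup_cost * conditional_duplications)


# Two hypothetical reconciliations of the same gene family:
a = dl_score(losses=4, duplications=2)  # 4*1.0 + 2*1.5 = 7.0
b = dl_score(losses=1, duplications=3)  # 1*1.0 + 3*1.5 = 5.5
print(a, b)  # 7.0 5.5
```

Lower scores correspond to more parsimonious histories, so with the default costs the second hypothesis is preferred even though it implies an extra duplication. Raising the duplication cost relative to the loss cost shifts that balance.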
The Edge Weight Threshold is a parameter used to define the set of strong edges in the gene tree. In Rearrange mode, edges weighted below the Edge Weight Threshold are considered weak and may be rearranged (for more information about rearrangement, see Chapter 7 - Rearrange Mode). Edges with no weight specified are assigned an edge weight of zero, and are considered to be weak. The default threshold is 90% of the highest edge weight in the gene tree file. If no edge weights are found, the threshold is set to one. The user may change this cutoff if a different threshold is desired for the current data set.
NOTE: For some sources of edge weights, such as bootstrap values, setting the threshold to a percentage of the highest edge weight works well. For other sources, such as branch lengths, a single very large value could cause all other edges in the tree to be weak; in such cases it may be better to set the threshold to a fixed value.
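The default threshold rule and the weak-edge test described above can be sketched as follows. Edges are represented by hypothetical identifiers mapped to weights, with None for an edge that has no weight in the input file; the function names are illustrative only.

```python
def default_threshold(weights):
    """90% of the highest edge weight, or 1.0 if no edge weights are found."""
    present = [w for w in weights if w is not None]
    return 0.9 * max(present) if present else 1.0


def weak_edges(edge_weights, threshold=None):
    """Return the edges weighted below the threshold (missing weights count as 0)."""
    if threshold is None:
        threshold = default_threshold(edge_weights.values())
    return sorted(e for e, w in edge_weights.items()
                  if (w if w is not None else 0.0) < threshold)


bootstrap = {"a": 100, "b": 95, "c": 70, "d": None}
print(weak_edges(bootstrap))  # ['c', 'd']  (default threshold is 90)

lengths = {"a": 0.02, "b": 0.05, "c": 3.0}
print(weak_edges(lengths))                  # ['a', 'b']  one long branch dominates
print(weak_edges(lengths, threshold=0.03))  # ['a']  fixed threshold instead
```

The branch-length case shows the problem described in the note: one unusually long edge pushes the percentage-based threshold high enough that almost every other edge is classified as weak.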
To change the parameter values:
NOTE: This will change the value settings only for the gene tree that is currently selected. Also, each history state saves the parameter values used at that state; when moving through the history, parameter values may change depending on the state and tree viewed. For more information on history states, see Chapter 9 - History.