In Reconciliation mode, Notung compares a gene tree with a species tree to infer gene duplications and losses. Notung will display a reconciled tree in the tree panel with the inferred duplications and losses indicated on the tree. The D/L Score of a reconciled tree will be displayed in the lower left corner of the screen (see Figure 5.1(b)).
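As a rough illustration of what reconciliation computes, the following Python sketch applies the standard least-common-ancestor (LCA) mapping to binary trees. It is not Notung's implementation. The nested-tuple tree encoding, the `species_of` helper, the weighted-sum form of `dl_score`, and its default weights are all assumptions made for this example, and the species tree is assumed to contain only species that occur in the gene tree.

```python
def index_tree(root):
    """Record the parent and depth of every node in a nested-tuple tree."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if not isinstance(node, str):          # internal node: tuple of children
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species-tree nodes."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def reconcile(gene_tree, species_tree, species_of):
    """Count duplications and losses for a binary gene tree (LCA mapping)."""
    parent, depth = index_tree(species_tree)
    dups = losses = 0

    def visit(g):
        nonlocal dups, losses
        if isinstance(g, str):                 # leaf: map the gene to its species
            return species_of(g)
        left, right = g
        mapped = [visit(left), visit(right)]
        m = lca(mapped[0], mapped[1], parent, depth)
        is_dup = m in mapped                   # a child maps to the same species node
        if is_dup:
            dups += 1
        for mc in mapped:
            gap = depth[mc] - depth[m]         # species-tree edges skipped on this branch
            losses += gap if is_dup else gap - 1
        return m

    visit(gene_tree)
    return dups, losses


def dl_score(dups, losses, dup_cost=1.5, loss_cost=1.0):
    """Weighted event count; the weights here are illustrative placeholders."""
    return dup_cost * dups + loss_cost * losses


# Hypothetical trees: leaves are strings, internal nodes are 2-tuples.
species_tree = (("human", "mouse"), "cow")
gene_tree = ((("gA_human", "gA_mouse"), "gB_human"), "g_cow")
dups, losses = reconcile(gene_tree, species_tree, lambda g: g.rsplit("_", 1)[-1])
print(dups, losses, dl_score(dups, losses))
```

In this hypothetical gene tree, the node joining the A clade with `gB_human` maps to the same species node as the A clade, so it is counted as a duplication, and the missing mouse copy of gene B is counted as one loss.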
Notung requires that gene and species trees have compatible labels, so that the species from which each gene originated can be identified. An error message will appear if one or more gene labels cannot be matched to a label in the species tree. See Appendix A.4 - Specifying the Species Associated with Each Gene for further information on gene labels.
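The matching of gene labels to species can be pictured with a small Python sketch. It assumes three ways of carrying the species: as a prefix of the gene name, as a postfix of the gene name, or as an `S=` field in an NHX comment. The function name `species_from_label`, the underscore separator, and the convention keywords are hypothetical; Appendix A.4 remains the authority on the formats Notung actually accepts.

```python
def species_from_label(label, convention, sep="_"):
    """Extract a species name from a gene label (illustrative only)."""
    if convention == "prefix":                 # e.g. "human_gB"
        return label.split(sep, 1)[0]
    if convention == "postfix":                # e.g. "gB_human"
        return label.rsplit(sep, 1)[-1]
    if convention == "nhx":                    # e.g. "gB[&&NHX:S=human]"
        marker = "[&&NHX:"
        start = label.find(marker)
        end = label.find("]", start)
        if start == -1 or end == -1:
            raise ValueError(f"no NHX comment in {label!r}")
        for field in label[start + len(marker):end].split(":"):
            if field.startswith("S="):
                return field[2:]
        raise ValueError(f"no species tag in {label!r}")
    raise ValueError(f"unknown convention {convention!r}")


def unmatched_genes(gene_labels, species_names, convention):
    """Return the gene labels whose species is absent from the species tree."""
    return [g for g in gene_labels
            if species_from_label(g, convention) not in species_names]


print(unmatched_genes(["gB_human", "gA_mouse", "gX_yeti"],
                      {"human", "mouse", "cow"}, "postfix"))
```

A non-empty result corresponds to the situation in which Notung reports that one or more gene labels cannot be matched.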
All species represented in the gene tree must be present in the species tree, but the species tree may include additional species. During reconciliation, Notung automatically identifies the species in the species tree that are not present in the gene tree, and generates a pruned species tree with those species removed. The pruned species tree is stored in Notung’s internal data structures. It is not displayed or saved unless the user explicitly requests it.
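Pruning can be sketched in a few lines of Python. This is a minimal illustration rather than Notung's code. It assumes trees are encoded as nested tuples with string leaves, and `prune` is a hypothetical helper name.

```python
def prune(tree, keep):
    """Restrict a nested-tuple tree to the leaves in `keep`.

    Returns None when no leaf of the subtree survives.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]        # collapse a node left with a single child
    return tuple(kids)


# Hypothetical species tree with two species absent from the gene tree.
species_tree = ((("human", "gorilla"), "mouse"), ("cow", "opossum"))
species_in_gene_tree = {"human", "mouse", "cow"}
print(prune(species_tree, species_in_gene_tree))
```

Internal nodes left with a single child are collapsed, so the pruned tree contains only branching points that are still informative for the species in the gene tree.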
Once a gene tree has been reconciled, Notung can infer orthologous and paralogous relationships, described in Section 5.3. Notung can also determine lower and upper bounds on the time of each duplication and conditional duplication, where bounds are represented in terms of internal nodes in the species tree; i.e., relative to speciation events. The upper bound on the time of duplication is the most recent species in which the duplication was not present. The lower bound is the oldest species in which the duplication must have been present. This information, along with statistics on losses, can be viewed in a pop-up window by selecting “Duplication Bounds and Loss Counts” from the “About This Tree” menu. Duplications and bounds in this window are identified by internal node names. For losses, each node in the species tree is listed, followed by the number of losses associated with that taxon.
Notung can reconcile binary gene trees with non-binary species trees, as well as non-binary gene trees with binary species trees. The differences between these functions and traditional reconciliation of binary gene trees with binary species trees are summarized briefly here. For a more detailed discussion of reconciliation with non-binary trees, see Chapter 4 - Non-Binary Trees. Note that orthologs and paralogs can only be inferred on binary gene trees reconciled with binary species trees.
Reconciling a binary gene tree with a non-binary species tree results in a binary gene tree with duplications and losses added. Notung distinguishes between cases in which disagreement can only be explained by a gene duplication (required duplications) and cases in which it is not possible to determine whether the disagreement is due to deep coalescence or gene duplication (conditional duplications). When reconciling a gene tree with a non-binary species tree, duplications appear in the tree as small red squares with red D’s, while conditional duplications are small pink squares with pink cD’s (see Figure 5.2).
If two or more orthologous genes are missing from species that are children of the same polytomy, then it is more parsimonious to infer a loss of the common ancestor of those genes. We refer to such losses as polytomy losses. For example, in Figure 5.2, members of the hypothetical Y gene family are missing from two species, bandicoot and opossum. These species are children of the same polytomy in the species tree in Figure 4.1. Notung infers a single loss, labeled with the names of species from which the gene is absent, as well as the label of the corresponding polytomy in the species tree. By default, polytomy losses are labeled with the species that lack the gene. However, if a polytomy loss is associated with many sibling species, the default display can produce very long labels. Users can instead opt to label polytomy losses with the number of species in which the loss occurred, as well as the label and the total number of children of the polytomy, illustrated in Figure 5.2(b).
Reconciling a non-binary gene tree with a binary species tree results in a non-binary, reconciled gene tree. A reconciled, binary gene tree can be obtained by using the Resolve function (see Chapter 8 - Resolve Mode).
Reconciliation of a non-binary gene tree with a binary species tree differs from binary reconciliation in two important ways. First, a polytomy in a non-binary gene tree may be annotated with more than one duplication. For example, the reconciled non-binary gene tree in Figure 5.3(a) has a polytomy annotated with two duplications and a loss.
Recall that a gene tree polytomy is an indication that although its children evolved by successive binary divergences, the order in which the taxa diverged is unknown. Since this binary branching pattern is unknown, the relative order of duplications and losses with respect to those divergences cannot be determined, either. The polytomy in Figure 5.3(a) communicates that at least two duplications and one loss occurred in the subtree descending from the polytomy, but the exact timing of those events is unknown. See Chapter 4 - Non-Binary Trees for a detailed explanation of duplications and losses in reconciled non-binary gene trees.
Second, there may be several alternate hypotheses for the reconciliation of a non-binary gene tree. Since the true binary branching pattern of a polytomy is unknown, Notung infers duplications and losses for all binary resolutions with minimal D/L Score. If there is more than one optimal binary resolution, multiple reconciliations will result. Notung addresses this issue by presenting all alternate event histories to the user. Each event history represents a different combination of duplications and losses that could result in the same minimal D/L Score. Initially, Notung arbitrarily selects one event history to present in the tree panel. The other optimal histories may be viewed using the drop-down menu labeled “Select an optimal event history,” as shown in Figure 5.3. This menu gives a list of up to 50 optimal event histories. If there are more than 50 optimal event histories, they can be generated using the Command Line Interface (see Chapter 12 - Command Line Options and Batch Processing). For a more detailed discussion of alternate event histories, see Chapter 7 - Rearrange Mode.
To reconcile a gene tree with a species tree:
If the convention selected by Notung is not the naming convention used in the gene tree, change it by selecting the appropriate radio button. See Appendix A.4 - Specifying the Species Associated with Each Gene for details about species tag specifications.
NOTE: The Prefix and Postfix formats require species names to be embedded in the gene names. NHX Species Tag format embeds the species information in a Newick comment field. When this format is used, the information will not appear on the screen unless the “Display Leaf Node Species Names” option in the Display Options menu is selected (see Chapter 11.1 - Display Options).
The reconciled tree appears in the tree panel (see Figure 5.1(b)). Duplication nodes are indicated by a square and the letter “D”, shown in red. In non-binary gene trees, the number of duplications associated with a polytomy will also be shown with a red D (e.g., Figure 5.3(a)). Loss nodes appear in light gray type and state in which species the loss occurred. A message at the bottom of the program window reminds you which species tree was used in reconciliation (e.g., “Reconciled with: <speciestreeName>”; see Figure 5.2).
To hide loss nodes/duplications:
The duplication marks or loss nodes can be hidden to avoid a cluttered image.
NOTE: When you uncheck “Display loss nodes,” Notung will reset the image so that the whole tree fits in the tree panel.
Options that are not currently available are displayed in gray type to indicate that they are disabled. In particular, the above options will be grayed out if no reconciliation has been performed. The “Display Conditional Duplications” option will also be displayed in gray if the gene tree was reconciled with a binary species tree.
To view alternate optimal event histories:
If the gene tree is non-binary, there may be more than one reconciliation. If more than one optimal event history exists for a reconciled tree, the drop-down menu, “Select an optimal event history,” will be enabled.
The tree panel will now show a new tree corresponding to the selected alternate history.
If there is only one optimal history or if the tree has not been reconciled, the drop-down menu will be grayed out. Recall that in Reconciliation mode multiple optimal histories are only possible when the gene tree is non-binary.
To undo the reconciliation:
To display a pruned species tree:
This option is grayed out if the gene tree has not been reconciled.
To show time bounds and information on losses:
The D/L Score of the reconciled tree appears at the top of the window, followed by duplication bounds described in three columns. The left column gives the internal node in the gene tree where the duplication occurred. The center column and right column give lower and upper bounds, respectively, on the time of duplication, expressed as node names in the pruned species tree. The total number of duplications appears below this table.
If the species tree is non-binary, conditional duplication bounds, if any, are described in the three columns below duplication bounds. The left column gives the internal node in the gene tree where the conditional duplication occurred. The center and right columns provide the lower and upper bounds, respectively, on the species tree node in which the event (duplication or allelic divergence) may have occurred. The total number of conditional duplications is listed below this table.
Information on losses will appear in the two columns below the (conditional) duplication bounds. The left column lists all the nodes in the species tree. The right column gives the number of inferred losses that occurred in that species. Polytomy losses are assigned to the corresponding polytomy, rather than the individual species which lack the gene. For example, the polytomy loss in Figure 5.2 is reported as a single loss in Metatheria.
To display internal node names in the tree panel, “Display Internal Node Names” and “Display Internal Node Species Names” must be turned on in the “Display Options” menu (see Chapter 11.1 - Display Options) for both the gene and species trees.
This option is grayed out if the gene tree has not been reconciled.
To display the number of species in polytomy losses:
By default, polytomy losses are labeled with the names of the species from which they are absent.
This causes polytomy losses to be labeled with the number of children of the polytomy from which the gene was lost, the total number of children of the polytomy, and the name of the polytomy in which these losses occurred.
Notung can infer orthologous and paralogous relationships between genes in binary gene trees reconciled with binary species trees. Recall that two genes are orthologous if they diverged from a common ancestor via speciation. If they diverged by duplication, they are paralogous [7, 6]. Notung infers orthology by finding the least common ancestor of two genes in a gene tree. If that least common ancestor is a duplication node, then the two genes are paralogous. Otherwise, the two genes are orthologous.
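The rule above can be sketched in Python as follows. The example assumes a binary gene tree that has already been reconciled, encoded as `(is_duplication, left, right)` tuples with string leaves. That encoding and the function names are hypothetical and chosen only for this illustration.

```python
def leaf_names(node):
    """List the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaf_names(left) + leaf_names(right)


def classify_pairs(node, relation=None):
    """Map each unordered gene pair to 'P' or 'O' according to its LCA.

    Two genes have `node` as their least common ancestor exactly when one
    lies in the left subtree and the other in the right subtree.
    """
    if relation is None:
        relation = {}
    if isinstance(node, str):
        return relation
    is_dup, left, right = node
    for a in leaf_names(left):
        for b in leaf_names(right):
            relation[frozenset((a, b))] = "P" if is_dup else "O"
    classify_pairs(left, relation)
    classify_pairs(right, relation)
    return relation


# Hypothetical reconciled tree: a duplication followed by two speciations.
dup_node = (True,
            (False, "gA_human", "gA_mouse"),
            (False, "gB_human", "gB_mouse"))
reconciled = (False, dup_node, "g_cow")

relation = classify_pairs(reconciled)
print(relation[frozenset(("gA_human", "gA_mouse"))])
print(relation[frozenset(("gA_human", "gB_mouse"))])
```

Visiting each internal node once and pairing its left leaves with its right leaves classifies every pair without computing a separate LCA query per pair.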
Notung can output a matrix of pairwise orthologous and paralogous relationships in several table formats. In addition, the Notung GUI includes an interactive Ortholog/Paralog feature in the Reconciliation task panel that allows the user to investigate these relationships through a point-and-click interface.
Orthologs and paralogs can be reported in comma-separated (CSV), tab-separated, or HTML-formatted tables. For each of these options, genes in the gene tree are listed in both column and row headers. Orthologous genes are indicated by an “O” in the table, while paralogous genes are indicated by a “P.” An example table, showing orthologs and paralogs from genetree_SMALL, is shown in Table 5.1. In HTML tables, CSS is used to color cells representing orthologs with a blue background, and cells representing paralogs with a pink background.
Homolog Table for: genetree_SMALL
P == Paralogous
O == Orthologous
. == Genes on X and Y axis are the same.
           gB_human  gA_human  gA_mouse  g_gorilla  gB_mouse  gY_cow  gX_cow
gB_human   .         P         P         P          P         O       O
gA_human   P         .         P         P          P         O       O
gA_mouse   P         P         .         O          P         O       O
g_gorilla  P         P         O         .          P         O       O
gB_mouse   P         P         P         P          .         O       O
gY_cow     O         O         O         O          O         .       P
gX_cow     O         O         O         O          O         P       .
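For readers who want to post-process such a table, the following Python sketch writes a homolog matrix in comma-separated or tab-separated form. It only approximates the layout described above and is not guaranteed to match Notung's output byte for byte. The function name `homolog_table` and the dictionary encoding of relationships are assumptions; the three relationships used in the example are taken from Table 5.1.

```python
import csv
import io


def homolog_table(genes, relation, delimiter=","):
    """Render pairwise relationships as a delimited text table.

    `relation` maps frozenset({gene_a, gene_b}) to "O" or "P".
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([""] + genes)                      # column headers
    for row_gene in genes:
        row = [row_gene]                               # row header
        for col_gene in genes:
            if row_gene == col_gene:
                row.append(".")                        # same gene on both axes
            else:
                row.append(relation[frozenset((row_gene, col_gene))])
        writer.writerow(row)
    return buf.getvalue()


# Three genes and their relationships as listed in Table 5.1.
genes = ["gB_human", "gA_human", "gY_cow"]
relation = {
    frozenset(("gB_human", "gA_human")): "P",
    frozenset(("gB_human", "gY_cow")): "O",
    frozenset(("gA_human", "gY_cow")): "O",
}
print(homolog_table(genes, relation))                  # CSV
print(homolog_table(genes, relation, delimiter="\t"))  # tab-separated
```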
To view an Ortholog/Paralog table:
NOTE: The selected table will be displayed in a popup dialog box. To copy the table, click “Copy to clipboard”. Tab-delimited tables can usually be pasted directly into spreadsheet applications like Excel. CSV-formatted tables can be opened by most spreadsheet programs via the File menu. HTML-formatted tables can be pasted directly into web pages.
To enter the interactive Ortholog/Paralog mode, click on the “Orthologs/Paralogs” button in the Reconciliation task panel. A legend will appear in the tree panel. Mousing over or clicking on a gene will highlight it in light blue. Orthologs of this gene are highlighted in darker blue, and paralogs are highlighted in pink. The legend can be minimized by clicking “hide” in the legend. Click on the minimized legend to show the full legend again. The legend can be dismissed entirely by clicking “close”. The next time you enter Ortholog/Paralog mode, the legend will be visible again.
NOTE: If you use “File → Save Current View as Image (PNG)”, the image will contain the Ortholog/Paralog legend, and if a gene is currently selected, orthologs and paralogs of that gene. Currently, “File → Save Whole Tree as Image (PNG)” will not show orthologs and paralogs.