Notung can save trees in three different file formats: Newick file format, NHX file format, and Notung file format.
Newick file format specifies tree topology and node labels, but cannot be used to save reconciliation information or information about the species tree with which the gene tree was reconciled.
NHX and Notung file formats use the Newick comment field to store additional information not captured in the standard Newick specification. A reconciliation involves a gene tree, a species tree, the mapping from gene tree to species tree, and the inferred duplications and losses. Newick format stores only the gene tree. NHX format can store a gene tree, with additional information to indicate which nodes are duplications. Notung file format can store a gene tree, the species tree with which it was reconciled, and duplication and loss nodes. If you save a reconciled tree in Notung format, it will still be reconciled when you next open it in Notung.
The Notung file format holds more information, but may not be compatible with other software packages that use Newick format. The formal specification of Newick file format allows bracket-delimited comments. Programs that follow the formal specification and ignore information stored in comments will be able to read NHX or Notung format trees. However, not all programs allow comments. If you plan to use a program that does not allow Newick comments to further analyze trees saved by Notung, save your trees in standard Newick format.
Newick is widely used by phylogeny programs. PHYLIP [5], PAUP* [13], and many other programs will output trees in Newick.
The general Newick syntax looks like this:
treefile → subtree;
subtree → descendant_list [internal_node_label] [:branch_length]
descendant_list → (subtree, subtree [, subtree]) | leaf_node_name
where descendant_list is a string that specifies the organization of the subtree and internal_node_label is the label of the root of a subtree. The branch_length field refers to the length of the edge from the root of the subtree to its parent. The internal_node_label and branch_length fields are optional. Some programs use these fields to store other information. For example, Notung allows the user to use either of these fields to store edge weight values.
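The grammar above maps naturally onto a small recursive-descent parser. The following Python sketch is illustrative only and is not Notung's own parser: the Node class and the parse_newick function are hypothetical names, and the sketch assumes a single tree with no comments, no quoted labels, and no significant whitespace. It is also more lenient than the grammar, since it accepts a descendant list with a single subtree. The label and branch length are kept as raw text because, as noted above, some programs store other values in these fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str = ""                 # leaf_node_name or internal_node_label
    length: Optional[str] = None    # raw branch_length text, if present
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one comment-free, unquoted Newick tree that ends with ';'."""
    s = "".join(text.split())       # assumption: whitespace is not significant
    pos = 0

    def peek() -> str:
        # Current character, or "" once the input is exhausted.
        return s[pos] if pos < len(s) else ""

    def read_token() -> str:
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1                              # consume "("
            node.children.append(subtree())
            while peek() == ",":
                pos += 1                          # consume ","
                node.children.append(subtree())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                              # consume ")"
        node.label = read_token()                 # may be empty
        if peek() == ":":
            pos += 1                              # consume ":"
            node.length = read_token()
        return node

    root = subtree()
    if peek() != ";":
        raise ValueError(f"expected ';' at position {pos}")
    return root
```

For example, parse_newick("(cow_gene1, (mouse_gene2, cow_gene2)100:0.5);") returns a root with two children; the second child carries the internal node label "100" and the branch length text "0.5".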
Comments in Newick format are enclosed in square brackets and may appear anywhere newlines are permitted. Some programs use the comment field to store additional information that is not included in the Newick specification. By convention, this information is formatted as follows:
[&&ApplicationID:Application_specific_comments]
where ApplicationID indicates a specific program or format.
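As an illustration of this convention, the Python sketch below lists the application-specific comments in a tree string and can also remove every comment, which yields plain Newick for programs that do not accept comments. The function names are hypothetical, and the sketch assumes that comments are not nested and that no quoted label contains square brackets.

```python
import re

# Any bracket-delimited comment (assumption: comments are not nested).
COMMENT = re.compile(r"\[[^\]]*\]")
# A comment of the form [&&ApplicationID:Application_specific_comments].
APP_COMMENT = re.compile(r"\[&&([^:\]]+):([^\]]*)\]")


def application_comments(newick: str) -> list:
    """Return (ApplicationID, comment text) pairs in order of appearance."""
    return APP_COMMENT.findall(newick)


def strip_comments(newick: str) -> str:
    """Return the tree string with every bracketed comment removed."""
    return COMMENT.sub("", newick)
```

Comments that do not start with && are ignored by application_comments but are still removed by strip_comments.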
For more information about Newick file format, go to:
http://evolution.genetics.washington.edu/phylip/newicktree.html.
or
http://geta.life.uiuc.edu/~gary/Newicks_845_Tree_Std.html.
NHX File Format is based on the Newick file format, but embeds additional information about each node in the tree in the comment fields, as follows:
[&&NHX:TagID1=value1:TagID2=value2]
where TagID1 and TagID2 can specify bootstrap values, species labels, or duplication information. This example has two tags, but NHX comments can have one or more tags. Trees saved in NHX file format include information produced by a reconciliation, including duplications and species labels, but do not record any visual annotations made in Notung. Nor do they record the species tree with which the gene tree was reconciled.
NOTE: The NHX format is case-sensitive.
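A minimal Python sketch of how such a comment could be split into its tags is shown below. The function name parse_nhx_comment is hypothetical, and the sketch assumes that tag values contain neither colons nor closing brackets. Because the format is case-sensitive, tag names are kept exactly as written and the prefix must be the uppercase NHX.

```python
def parse_nhx_comment(comment: str) -> dict:
    """Split '[&&NHX:Tag1=value1:Tag2=value2]' into a {tag: value} dict."""
    prefix = "[&&NHX:"
    if not (comment.startswith(prefix) and comment.endswith("]")):
        raise ValueError(f"not an NHX comment: {comment!r}")
    tags = {}
    for item in comment[len(prefix):-1].split(":"):
        if not item:
            continue                      # tolerate an empty field
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"tag without '=': {item!r}")
        tags[key] = value                 # case is preserved: 'S' is not 's'
    return tags
```

For example, the comment [&&NHX:S=Hu_Homo_sapiens:B=100] yields the species tag S with the value Hu_Homo_sapiens and the bootstrap tag B with the value 100, both as text.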
More information about NHX format, including a complete list of tags used in comment fields, can be obtained at:
http://www.genetics.wustl.edu/eddy/forester/NHX.html.
Notung File Format further extends the NHX format. Notung file format can record duplication marks, edge weights, and color annotations. A reconciled gene tree file saved in Notung format will also have a pruned species tree embedded in it. When the reconciled gene tree is reopened in Notung, the pruned species tree can be extracted and used in the same way as any other species tree. A reconciled gene tree saved in Notung file format also stores additional information on parameter values, including edge weight threshold, loss cost, duplication cost, and conditional duplication cost. In addition, a non-binary gene tree reconciled with a binary species tree with more than one optimal history stores information regarding which history was displayed when saved. When the gene tree is reopened in Notung, the tree for that optimal history will be displayed.
To open an embedded species tree in a Notung format gene tree file:
NOTE: None of the three file formats used in Notung embed alternate histories for gene trees discovered through rearrangement. When saving after rearrangement, Notung saves only the history that currently appears in the tree panel. To access the other alternate histories when opening such a file, the tree must be rearranged again in Notung.
In order to perform reconciliation, Notung must determine the species from which each leaf taxon in the gene tree was derived. This is achieved by embedding the species name in the gene leaf label or by using information embedded in the NHX comment field.
Notung offers three different conventions for specifying the gene to species mapping, described below. Notung will attempt to guess the naming convention used; you can also specify this in the reconciliation dialog (see Chapter 5 - Reconciliation Mode).
NOTE: When using this format, no species label should be a prefix of another species label, such as with carp and carpinusBetulus. In this situation, Notung may incorrectly identify the gene carpinusBetulus_gene1 as a carp gene, rather than a hornbeam gene.
NOTE: Postfix mode cannot be used if species names include underscores (_); for example, Carpinus_betulus cannot be used in Postfix mode.
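The two notes above can be checked before reconciliation with a short script. The Python sketch below is not part of Notung; the function names are hypothetical, and it assumes that species labels are compared case-sensitively.

```python
from itertools import permutations


def prefix_conflicts(species_labels):
    """Return (shorter, longer) pairs where one label is a prefix of another."""
    labels = sorted(set(species_labels))
    return [(a, b) for a, b in permutations(labels, 2) if b.startswith(a)]


def labels_with_underscores(species_labels):
    """Return the labels that would rule out Postfix mode."""
    return sorted(s for s in set(species_labels) if "_" in s)
```

With the labels carp, carpinusBetulus, and mouse, prefix_conflicts reports the pair (carp, carpinusBetulus); labels_with_underscores reports Carpinus_betulus if it is present.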
In previous versions of Notung, punctuation (-, /, _, ., \) in species names was used to indicate that Notung should look for a shorter species tag in gene names, rather than looking for the entire species name. For example, given the species name Hu.Homo_sapiens, Notung would look for the species label “Hu” in gene names.
Because many users found this confusing, this functionality has been removed in Notung 2.6. Notung now looks for entire species names during reconciliation, which also allows users to use species names like Pan_troglodytes and Pan_paniscus in the same tree without creating a conflict. Unfortunately, this means that some trees that were used in previous versions of Notung will not work in the current version. This section explains how to change these trees so that they can be used with Notung 2.6.
Any species tree with punctuation in the species names, where the full species names are not present in either the gene tree names or in NHX style species tags, will need to be converted. If your species names contain punctuation and you used them with older versions of Notung, then your trees probably fit this description. If Notung 2.6 is used to open an older Notung format tree that needs to be converted, a warning dialog will be shown.
There are three ways to convert trees with punctuation in species names. The correct method to use depends on your desired outcome.
If the gene names use the shorter species label, as in “Hu-gene01”, change “Hu.Homo_sapiens” to “Hu” in the species tree. These shorter species names should now match the species labels in the gene names.
Alternatively, change “Hu-gene01” to “Hu.Homo_sapiens-gene01” in the gene tree. This solution will not work in Postfix mode if your species names contain underscores (_).
As a third option, use NHX species tags. If the gene tree is already in NHX or Notung format, modify the NHX comment after each gene name.
To modify an existing NHX comment, find the species tag and replace the shorter species label with the full species name. For example, “[&&NHX:S=Hu]” becomes “[&&NHX:S=Hu_Homo_sapiens]”.
If there are no comments in the file (i.e., the tree is in Newick format), add the following after each gene name: “[&&NHX:S=<speciesname>]”, where <speciesname> is the corresponding full species name from the species tree. For example, the gene tree:
(gene1_Hu, (gene2_Hu, gene2_Mu));
would become:
(gene1_Hu[&&NHX:S=Hu_Homo_sapiens], (gene2_Hu[&&NHX:S=Hu_Homo_sapiens], gene2_Mu[&&NHX:S=Mu_Mus_musculus]));
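For large trees, this edit can be scripted. The Python sketch below reproduces the conversion shown above under several assumptions: the tree contains no comments yet, gene names are unquoted, and each gene name ends with an underscore followed by the short species label, as in gene1_Hu. The function name add_species_tags and the mapping passed to it are hypothetical; supply your own mapping from short labels to the full names in your species tree.

```python
import re

# A leaf name directly follows "(" or "," (optionally after whitespace);
# internal node labels follow ")" and are therefore left untouched.
LEAF = re.compile(r"(?<=[(,])(\s*)([^\s(),:;\[\]]+)")


def add_species_tags(newick: str, full_names: dict) -> str:
    """Append [&&NHX:S=<speciesname>] after every gene (leaf) name."""
    def tag(match):
        space, gene = match.group(1), match.group(2)
        short = gene.rsplit("_", 1)[-1]        # text after the last "_"
        # Raises KeyError if the short label is missing from the mapping.
        return f"{space}{gene}[&&NHX:S={full_names[short]}]"
    return LEAF.sub(tag, newick)


names = {"Hu": "Hu_Homo_sapiens", "Mu": "Mu_Mus_musculus"}
converted = add_species_tags("(gene1_Hu, (gene2_Hu, gene2_Mu));", names)
```

Here converted holds the tagged tree shown above. Check the result in Notung before discarding the original file.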
Notung uses edge weights to determine which edges are weakly supported and may be rearranged. These edge weights may correspond to bootstrap values, probabilities, branch lengths, or any other numerical indication of support.
Edge weight values can be located in one of three places in a tree file, depending on how the file was created. In Newick format, either the branch length field or the internal node name may be used to specify edge weights. Many programs store bootstrap values in the Newick node name field. In an NHX or Notung format file, edge weights can also be specified using the NHX bootstrap tag in the comment field.
The example below shows the same tree with a single edge weight of 100 stored in each of the three locations: the branch length field, the node name field, and the NHX bootstrap tag, in that order.
(cow_gene1, (mouse_gene2, cow_gene2):100);
(cow_gene1, (mouse_gene2, cow_gene2)100);
(cow_gene1, (mouse_gene2, cow_gene2)[&&NHX:B=100]);
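To see which of these locations a given file uses, a rough scan such as the following Python sketch may help. It is not how Notung detects edge weights; the names are hypothetical, only plain decimal numbers on internal nodes are recognized, and bootstrap tags are collected wherever they appear.

```python
import re

LABEL_CHAR = r"[^\s(),:;\[\]]"
# A plain decimal number that is not followed by further label characters.
NUM = r"(\d+(?:\.\d+)?)(?!" + LABEL_CHAR + ")"

PATTERNS = {
    # ")" + optional node label + ":" + number
    "branch length": re.compile(r"\)" + LABEL_CHAR + r"*:" + NUM),
    # ")" + number used as the internal node name
    "node name": re.compile(r"\)" + NUM),
    # B=number inside an NHX comment, possibly after other tags
    "NHX bootstrap tag": re.compile(r"\[&&NHX:(?:[^\]]*:)?B=" + NUM),
}


def edge_weight_locations(newick: str) -> dict:
    """Map each location that holds numeric values to the values found there."""
    found = {}
    for name, pattern in PATTERNS.items():
        values = [float(v) for v in pattern.findall(newick)]
        if values:
            found[name] = values
    return found
```

A tree such as ((a,b)95:0.1,c); is reported under both "branch length" and "node name", which is the case where the location should be chosen explicitly.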
Confusion can arise if an input tree has edge weights in more than one type of field. This could occur, for example, in a tree that has both branch lengths and bootstrap values. Notung tries to guess the type of edge weight specification in the file, but it is not always possible for Notung to determine this unequivocally. You can specify the location explicitly using command line options (see Chapter 12 - Command Line Options and Batch Processing) or using the “Select Location of Edge Weights” dialog in the Display Options menu (see Figure A.1).
To set the location of edge weights in Notung:
The gene tree will immediately reflect the change, so you can check the tree panel to verify that the choice you selected gives the desired values.