Notung offers a command line interface (CLI) that can perform most operations without launching the graphical user interface (GUI). The CLI allows the use of batch processing to apply Notung to many trees in a large-scale analysis without human intervention. It can also be used to analyze a small number of trees without launching the GUI, for example, by a user executing Notung on a remote computer over the network. The GUI can also be launched from the command line, rather than by clicking on an icon, allowing the user to start the GUI with parameter settings other than the default settings. Finally, when used as an applet, Notung is launched from a web page using CLI syntax.
We use the following stylistic conventions in this chapter.
- Angle brackets indicate a user-supplied value. For example, -g <genetree> indicates that Notung expects a file name after -g.
- Output file names are formed from the input file name and the task performed. For example, reconciling mygenetreefile will produce the file mygenetreefile.reconciled. In this chapter, we use <function> to describe such file names, e.g., <genetree>.<function>.
- When more than one output file is produced, the files are numbered: mygenetreefile.reconciled.0, mygenetreefile.reconciled.1, etc. We use a '#' sign to represent the number in such file names; e.g., <genetree>.<function>.#.
Prior to running Notung’s command line interface, you will need to open a command or “terminal” window.
Opening a command window
Click on the Start button, and select the “Run...” item. A dialog box will pop up. Enter “cmd.exe” into the box, and click “OK.”
Navigating to the Notung directory
In the command window, type the following
cd <pathname>
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder:
C:\Documents and Settings\User\Desktop\Notung-2.6
then you should use quotes so that it looks like this in the command window:
cd "C:\Documents and Settings\User\Desktop\Notung-2.6"
Hit Enter, and you will now be in the Notung folder.
NOTE: To find the path of the Notung directory, select the Notung folder in Explorer, and right click on it. This will pop up a menu - select the Properties item. This will pop up a dialog listing the properties of the Notung folder, including its location.
Opening a command window in the Notung directory
Select the Notung folder in Explorer, and right click on it. This will pop up a menu - select “Start command window here.”
Opening a terminal
The Terminal application is located in the Applications folder in the Utilities subfolder.
Navigating to the Notung directory
In the terminal window, type the following
cd <pathname>
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder:
/Users/user/Desktop/New Folder/Notung-2.6
then it should look like this in the terminal window:
cd "/Users/user/Desktop/New Folder/Notung-2.6"
Hit Enter, and you will now be in the Notung folder.
NOTE: To find the path of the Notung directory, select the Notung folder in the Finder, and select “Get Info” from the File menu. This will pop up a dialog listing the properties of the Notung folder, including its location. You could also drag and drop the Notung folder into the Terminal window to paste the folder’s path into the window.
Navigating to the Notung directory
In the terminal window, type the following
cd <pathname>
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder:
/Users/user/Desktop/New Folder/Notung-2.6
then it should look like this in the terminal window:
cd "/Users/user/Desktop/New Folder/Notung-2.6"
Hit Enter, and you will now be in the Notung folder.
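When the cd command is generated by a script rather than typed by hand, the quoting rule above can be applied automatically. The following is a minimal sketch for POSIX shells (macOS or Linux terminals), not for the Windows command window; the helper name cd_command is hypothetical. Python's shlex.quote uses single quotes, which such shells treat like the double quotes shown above for a path containing spaces.

```python
import shlex


def cd_command(notung_dir: str) -> str:
    """Return a POSIX-shell cd command, quoting the path only if needed.

    Hypothetical helper for illustration; it is not part of Notung.
    """
    return "cd " + shlex.quote(notung_dir)


if __name__ == "__main__":
    # Path taken from the example above; the space forces quoting.
    print(cd_command("/Users/user/Desktop/New Folder/Notung-2.6"))
```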
Notung can carry out its four main tasks, reconcile, rearrange, rooting and resolve, from the command line. In each case, Notung reads in gene and species trees (the input trees) and executes the specified task, resulting in one or more modified trees (the output trees). Each output tree is written to a file. Notung can also generate images in PNG format from the command line. This function can be carried out in conjunction with any of the four main tasks, or independently to generate an image of an existing tree without performing any analysis. The I/O requirements differ somewhat in the latter case: only one tree is required as input, and an image rather than a tree file is generated as output. In this section, we discuss executing the four main tasks from the command line, postponing image generation to a later section. In Section 12.3 (Running Notung from a Batch File), automated execution of Notung is described. Commands and options specific to image generation are described in Section 12.4 (Saving PNG Images of Trees). Commands and options specific to reconciliation with non-binary species trees are described in Section 12.5 (Options for Reconciling with Non-Binary Trees).
For the four major tasks, Notung is executed from the command line using the following format:
java -jar Notung-2.6.jar [input tree(s)] [task] [options]
The four main tasks require both a gene tree and a species tree. These are usually supplied as two separate input files. A single file containing a previously reconciled tree in Notung format is also acceptable, since such files contain both a gene tree and a species tree. If a gene tree file containing a reconciled tree in Notung format and a species tree in a separate file are both given, the latter is used; the species tree in the gene tree file is ignored. The task parameter must be one of --reconcile, --rearrange, --root, or --resolve (the fifth task, --savepng, is discussed in Section 12.4). Options are described below.
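For scripted runs, the command format above can be assembled programmatically. The sketch below is one possible way to do this in Python, assuming the script is run from the Notung directory; the helper notung_command and the file names in the usage example are hypothetical, while the jar name, -g, -s, and the task options are those given in this chapter.

```python
from typing import List, Optional

NOTUNG_JAR = "Notung-2.6.jar"  # jar name used throughout this chapter
MAIN_TASKS = ("--reconcile", "--rearrange", "--root", "--resolve")


def notung_command(gene_tree: str, species_tree: str, task: str,
                   options: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for one Notung run (hypothetical helper)."""
    if task not in MAIN_TASKS:
        raise ValueError(f"task must be one of {MAIN_TASKS}")
    cmd = ["java", "-jar", NOTUNG_JAR, "-g", gene_tree, "-s", species_tree, task]
    cmd.extend(options or [])
    return cmd


if __name__ == "__main__":
    # Only print the command here. To execute it, pass the list to
    # subprocess.run(cmd, check=True) from the Notung directory.
    cmd = notung_command("mygenetreefile", "myspeciestreefile",
                         "--reconcile", ["--log"])
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.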
NOTE:
- The input trees, tasks, and options may be given in any order.
- To launch the graphical interface from the command line, run Notung with no task option.
- Running Notung with the --help option causes it to print information regarding input, output, and other options.
- The commands given in this chapter will only work if you are currently in the same directory as the Notung jar file. In order to run Notung from any directory, add the Notung directory to your CLASSPATH. For example, if you run bash in Linux, you can do this by adding the following command to your .bashrc file:
export CLASSPATH=$CLASSPATH:<pathname>
In csh or tcsh, the equivalent line in your .cshrc file is:
setenv CLASSPATH ${CLASSPATH}:<pathname>
See a Java manual for more information about CLASSPATH settings.
The following list describes Notung’s command line options. For more details on tree formats, including information on edge weights, species tags and output files, see Appendix A - File Formats.
If one of the four main functions is given, the output gene tree will be saved to a file called <genetree>.<function> (where <function> is one of the four major tasks: reconcile, rearrange, resolve, or rooting). If the analysis results in more than one optimal history, then the output files are numbered (e.g., <genetree>.rearrange.0, <genetree>.rearrange.1, etc.). By default only one tree is saved. To save more than one tree, use --maxtrees.
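After a run, a script may need to pick out the output tree files from a directory that also holds log, statistics, and image files. The sketch below is a hedged illustration of the naming scheme just described: it accepts both the plain form <genetree>.<function> and the numbered form <genetree>.<function>.#, and returns the matches in numeric order. The function name select_output_trees and the sample file names are hypothetical.

```python
import re
from typing import Iterable, List


def select_output_trees(filenames: Iterable[str], gene_tree: str,
                        function: str) -> List[str]:
    """Return output tree files named <genetree>.<function>[.#].

    Files such as <genetree>.<function>.ntglog or .stats do not match,
    because only an optional numeric suffix is accepted.
    """
    pattern = re.compile(re.escape(f"{gene_tree}.{function}") + r"(?:\.(\d+))?")
    hits = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            # Unnumbered files sort first; numbered files sort by number.
            index = int(match.group(1)) if match.group(1) is not None else -1
            hits.append((index, name))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    listing = ["g.rearrange.10", "g.rearrange.2", "g.rearrange.ntglog",
               "g.rearrange.0"]
    print(select_output_trees(listing, "g", "rearrange"))
```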
If the --savepng option is given, an image of the tree is saved in PNG format. For more information on saving PNG images with --savepng, see Section 12.4 - Saving PNG Images of Trees.
If the species tree contains species that do not appear in the gene tree, during reconciliation Notung constructs a pruned species tree that only contains those species required to reconcile the gene tree. If the --stpruned option is given, this pruned species tree is saved in the file <genetree>.<function>.species.
When run on the command line, Notung outputs status information to the terminal window. This information can be saved in the log file <genetree>.<function>.ntglog by using the --log option. For a batch run, a log file is not saved for each tree; rather, a single log file for the entire batch run is saved to the file <batchfile>.<function>.ntglog.
General tree statistics can be saved in the file <genetree>.<function>.stats by giving the option --treestats. This file includes information on both the gene tree and the pruned species tree. For more information on tree statistics, see Section 3.4 - General Tree Statistics.
Information on the timing of each duplication and loss is saved in the file <genetree>.<function>.info when the --info option is used. For each duplication, an upper and lower bound (represented as nodes from the species tree) are given. For losses, each node in the species tree is listed with the number of losses associated with that taxon. For more information on duplications and losses, see Chapter 5 - Reconciliation Mode.
Notung can output tables of orthologs and paralogs for all pairs of leaf nodes in the reconciled tree. This table can be generated in several formats: comma-separated values (CSV), tab-delimited values, or an html-formatted table. Use options --homologtablecsv, --homologtabletabs or --homologtablehtml, respectively. For more information on orthologs and paralogs, see Section 5.3 - Inferring Orthologs and Paralogs.
-g <genetree>
Load the file <genetree> as a gene tree.
NOTE: The -g is optional.
-s <speciestree>
Load the file <speciestree> as a species tree. The -s is required.
-b <batchfile>
Load the trees listed in <batchfile>. Requires that the --speciestag option be set. If rearranging, requires the --edgeweights and --threshold options. With this option, -g <genetree> and -s <speciestree> should not be specified. See Section 12.3 - Running Notung from a Batch File for more information.
--absfilenames
Files listed in <batchfile> use absolute paths. See Section 12.3 - Running Notung from a Batch File for more information.
-gu <gene tree URL location>
Load gene tree from a URL. This option is only used when running Notung as an applet.
-su <species tree URL location>
Load species tree from a URL. This option is only used when running Notung as an applet.
--reconcile
Reconcile a gene tree with a species tree. In batch mode, --speciestag is required. For more information on reconciliation, see Chapter 5 - Reconciliation Mode.
--rearrange
Rearrange the gene tree. The option --threshold must be set. In batch mode, --speciestag and --edgeweights are also required. For more information on rearranging gene trees, see Chapter 7 - Rearrange Mode.
--resolve
This task, which removes polytomies from a non-binary tree, can only be carried out if the gene tree is non-binary. In batch mode, --speciestag is required. For more information on resolving non-binary nodes in a gene tree, see Chapter 8 - Resolve Mode.
--root
Root the gene tree. The top <maxtrees> best scoring rooted trees are saved in files named <genetree>.rooting.#. By default, <maxtrees> is set to 1. In batch mode, --speciestag is required. For more information on rooting gene trees, see Chapter 6 - Rooting Mode.
<duplication cost>
Sets the cost of gene duplications. If not set, the cost is set to 1.5, by default.
<conditional duplication cost>
Sets the cost of conditional gene duplications. These only occur when reconciling a binary gene tree with a non-binary species tree. If not set, the cost is set to zero, by default. See Chapter 5 - Reconciliation Mode for more information.
<lost gene cost>
Sets the cost of gene losses. If not set, the default cost of 1.0 is used.
--speciestag
Indicates the format of species tags in the gene tree. If not set, Notung tries to guess the correct format. See Appendix A.4 - Specifying the Species Associated with Each Gene.
--threshold <threshold>|<percentage>%
Edges with weight higher than <threshold> are preserved during rearrangement. This can be given as an absolute value or as a percentage of the maximum value, using <percentage>%; e.g., “--threshold 90%” sets the threshold at 90 percent of the highest edge weight in the tree. See Section 3.5 - Parameter Values for more information.
--edgeweights
Indicates where in the tree file the edge weights, if any, are specified. If this option is not set, and the gene tree has values in more than one location, Notung will guess the location of edge weights when using --rearrange. See Appendix A.6 - Location of Edge Weight Values for more information.
Same setting as --edgeweights. Kept for backwards compatibility.
--annotationfile <filename>
Attach the given annotation file to each input tree.
--imagemapfile <filename>
Used with --savepng. Notung uses the contents of <filename> to create an image map file, which is saved in <outputtreename>.png.html. For more information, see Section 12.4 - Saving PNG Images of Trees.
Specify output tree file format. See Appendix A - File Formats for more information.
--nolosses
Remove loss nodes from gene trees before they are saved. Useful when outputting trees in Newick or NHX formats, which do not recognize loss nodes, or with --savepng to output a tree image without loss nodes.
--maxtrees <maxtrees>
Maximum number of optimal trees to output during reconciliation, rearrangement, rooting, and resolving. Default is one.
--outputdir <outputDir>
Save output files in the directory <outputDir>. Default is the current working directory.
Save output trees in the directory in which <genetree> is located.
--log
Writes diagnostic output to the file <genetree>.<function>.ntglog, where <function> is one of the four modes. For batch runs, the log file is saved in <batchfile>.<function>.ntglog.
--info
Save information on duplications and losses in the file <genetree>.<function>.info.
--treestats
Save general statistics for a tree. Saved in <genetree>.<function>.stats. Statistics on the pruned species tree will be included in this file. See Section 3.4 - General Tree Statistics for more information.
--stpruned
Save a version of the species tree that contains only the species found in the gene tree. Saved in the file <genetree>.<function>.species.
Report a list of ordered root scores to standard output (only used with --root). This option is useful for statistical examination of root scores for the gene tree. These scores can be saved in a file with the --log option.
--silent
Suppresses reporting of diagnostic information to the terminal.
--progressbar
In batch mode, print a simple progress bar to stderr for each tree analyzed. Useful with --silent.
--savepng
Save the tree as a PNG image. Unlike Notung’s other main functions, this function does not require a species tree. For more information about --savepng, see Section 12.4 - Saving PNG Images of Trees.
For more information on orthologs and paralogs, see Section 5.3 - Inferring Orthologs and Paralogs.
--homologtablecsv
Save a comma-separated table of orthologs and paralogs to the file <genetreename>.<function>.homologs.csv.
--homologtabletabs
Save a tab-delimited table of orthologs and paralogs to the file <genetreename>.<function>.homologs.tabs.
--homologtablehtml
Save a table of orthologs and paralogs in HTML format to the file <genetreename>.<function>.homologs.html. This format can be included in a web page.
GUI only: if an input gene tree is reconciled, open the attached species tree in a separate tab. Useful for displaying Notung format trees in the Notung applet.
GUI only: if an input gene tree is reconciled, start Notung in the Reconciliation tab with the Orthologs/Paralogs button selected. Useful for ortholog / paralog analysis in the Notung applet.
--help
Print information about these options.
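The two forms of the --threshold value described in the list above can be illustrated with a short calculation. The following sketch is not Notung's implementation; it only mirrors the stated rule that a percentage is taken relative to the highest edge weight in the tree and that edges with weight higher than the threshold are preserved. The function names and the sample edge weights are hypothetical.

```python
from typing import List, Sequence


def resolve_threshold(spec: str, edge_weights: Sequence[float]) -> float:
    """Turn a --threshold value ("0.9" or "90%") into an absolute weight."""
    if spec.endswith("%"):
        # Percentage of the highest edge weight in the tree.
        return float(spec[:-1]) * max(edge_weights) / 100.0
    return float(spec)


def preserved_edges(spec: str, edge_weights: Sequence[float]) -> List[float]:
    """Return the weights strictly higher than the resolved threshold."""
    threshold = resolve_threshold(spec, edge_weights)
    return [w for w in edge_weights if w > threshold]


if __name__ == "__main__":
    weights = [50, 100, 80, 95]  # hypothetical bootstrap-style weights
    print(resolve_threshold("90%", weights))
    print(preserved_edges("90%", weights))
```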
Batch processing allows the user to apply Notung to many trees in a large-scale, automated analysis. The input trees are given in a batch file, which consists of a list of tree file names, one per line. Blank lines and lines which start with # are ignored.
NOTE: By default, Notung expects the tree file locations to be given relative to the location of the batch file. For example, if the batch file is in /username/batchRun, Notung expects the gene trees to be in the batchRun folder or in some subfolder of batchRun. Use the --absfilenames option to indicate that file names are absolute path names.
NOTE: When using the --savepng option without any of the four main functions (--reconcile, --root, --rearrange and --resolve), each tree listed in the batch file is saved as an image. For more information, see Section 12.4 - Saving PNG Images of Trees.
A sample batch file is provided with the Notung 2.6 distribution in the sampleTrees/batch directory. This batch file includes all combinations of binary and non-binary gene and species trees. Because not all of Notung’s task modes work for each of these combinations, you will receive one or more warnings and errors when running this batch file. In addition, the batch file lists a gene tree which does not exist, to give an example of the appropriate warning.
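The batch file rules described above (one file name per line, blank lines and # lines ignored, paths relative to the batch file unless --absfilenames is given) can be sketched as follows. This is an illustration of those rules, not Notung's own parser; the function name batch_tree_paths and the sample file names are hypothetical, and stripping leading whitespace before checking for # is an assumption.

```python
from pathlib import PurePosixPath
from typing import List


def batch_tree_paths(batch_text: str, batch_dir: str,
                     absfilenames: bool = False) -> List[str]:
    """List the tree files named in a batch file, as full paths."""
    paths = []
    for raw in batch_text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if absfilenames:
            path = PurePosixPath(line)
        else:
            # Default: entries are relative to the batch file's directory.
            path = PurePosixPath(batch_dir) / line
        paths.append(str(path))
    return paths


if __name__ == "__main__":
    sample = "# hypothetical batch file\n\nspecies.ntg\ngenes/a.ntg\n"
    for path in batch_tree_paths(sample, "/username/batchRun"):
        print(path)
```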
Use the -b <batchfile> option.
For example, from the Notung directory, enter the following on the command line:
java -jar Notung-2.6.jar -b sampleTrees/batch/batch.run --reconcile --speciestag prefix
The --reconcile option tells Notung to reconcile all the gene trees listed in batch.run with the species tree listed in batch.run. The --speciestag prefix option tells Notung how species labels are specified in the gene tree files, and is required in batch mode. See Appendix A.4 - Specifying the Species Associated with Each Gene for more information on species labels.
NOTE: All gene trees in the same batch file must use the same species tag format, which is specified using the --speciestag option.
In batch mode, the --speciestag option is always required. In addition, when using --rearrange, --edgeweights and --threshold must be used to set the edge weight locations and threshold, respectively.
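A wrapper script can check these batch mode requirements before launching a long run. The sketch below encodes only the two rules stated above; the function name missing_batch_options is hypothetical, and the value "name" passed to --edgeweights in the usage example is a placeholder rather than a documented setting.

```python
from typing import List, Sequence


def missing_batch_options(args: Sequence[str]) -> List[str]:
    """Return the options a batch command line still needs."""
    missing = []
    if "--speciestag" not in args:
        missing.append("--speciestag")  # always required in batch mode
    if "--rearrange" in args:
        # Rearranging additionally needs edge weight location and threshold.
        for option in ("--edgeweights", "--threshold"):
            if option not in args:
                missing.append(option)
    return missing


if __name__ == "__main__":
    args = ["-b", "sampleTrees/batch/batch.run", "--rearrange",
            "--speciestag", "prefix", "--edgeweights", "name"]
    print(missing_batch_options(args))
```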
As Notung reads and processes each gene tree in the batch file, it prints diagnostic information to the terminal. Notung will also print this information to a log file when the --log option is given. Any errors that occur in the processing of a batch file are reported to the terminal as they occur. The total number of errors is reported at the end of the batch run.
To print status information to a file:
Use the --log option from the command line. The information will then be written to the file <batch_file_name>.ntglog.
To save trees to a different directory:
By default, Notung saves each reconciled tree to the directory from which the program was run. To change this, use the --outputdir <outputDir> option from the command line. The trees will then be written to the directory <outputDir>.
Progress Bar
For long runs, it may be convenient to use the options --silent and --progressbar together. This will suppress all output to the terminal with the exception of a simple progress bar to stderr. The option --log can still be used to save the (now suppressed) output to a file.
The option --savepng saves a simple image representation of a tree in PNG format. The option --savepng can be used with one of the four main tasks (--reconcile, --root, --rearrange and --resolve), in which case an image of the final output tree is saved, in addition to the output tree file. This behavior is similar to other output options such as --treestats and --homologtablecsv. Alternatively, --savepng can be used alone to save an image of a tree without performing any other tasks.
When --savepng is used without one of the main four tasks, Notung reads in a tree and generates and saves an image of that tree in PNG format. Unless a batch file is used, only a single tree can be processed at a time (i.e., a gene tree and a species tree cannot both be given). If the input tree is a previously reconciled tree in Notung format, the image will show the appropriate duplications and losses (to save an image without losses, use --nolosses). If the tree has not been reconciled, the tree image will show only the structure of the tree and the names of the leaves of the tree.
When using a batch file, each tree specified in the file is saved as an image. When generating images without performing a major task, the batch file format differs slightly: species trees and gene trees can be listed in any order.
When --savepng is used alone, an image of the input tree is saved in the file <treename>.png. When used with --reconcile, --root, --rearrange or --resolve, an image of the output tree is saved in the file <genetreename>.<function>.png. For analyses with more than one optimal history, an image file is saved for each history. The number of files is limited by the parameter --maxtrees.
If a tree in Notung format contains color annotations, the leaves in images of that tree will be colored as specified by those annotations. Additionally, an annotation file can be specified with the option --annotationfile. For more information on color annotations, see Chapter 10 - Annotations.
Notung provides the option to produce an HTML imagemap for a tree image. If an imagemap and image file are both included in a web page, each gene in the image will provide a link to a specified web page. The format of these links is determined by the imagemap specification file given with --imagemapfile <imagemapfilename>, described below.
The resulting imagemap is saved in the file <outputtreename>.png.html, where <outputtreename> is either <genetree>.<function> or <treename>.
To include the image and imagemap in a web page, insert the entire contents of the saved imagemap file into the HTML of the web page. The saved image must be in the same directory as the web page, unless you specify a different location for the image by changing <imagefile> in the line:
<img border=0 src='<imagefile>' ...
The specification file given by --imagemapfile <imagemapfilename> consists of a list of gene/link pairs. Blank lines and lines that start with # are ignored. An example specification file:
# Danio rerio links:
gene: Danio_rerio|(id)
link: http://zfin.org/cgi-bin/ZFIN_jump?record=(id)
# generic imagemap - everything else links to google
gene: (id)
link: http://www.google.com/search?q=(id)
Lines starting with ‘gene:’ match genes in the gene tree; lines starting with ‘link:’ specify the format of links for those genes. For each gene in the gene tree, the first gene/link pair that matches will be used. If a gene does not match any of the ‘gene:’ lines, a warning will be printed.
The identifier ‘(id)’ will match any text string, and that text string is used in the link. Any other text present in the ‘gene:’ line must match gene names exactly. In the example above, the gene Danio_rerio|ZDB-GENE-031007-1 would match the first ‘gene:’ line. The identifier (id) would be ZDB-GENE-031007-1, and the link would be http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031007-1. The gene Homo_sapiens|gene1 would match the second pair, because ‘(id)’ will match any text string. The resulting link would be http://www.google.com/search?q=Homo_sapiens|gene1.
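The matching rules above can be made concrete with a small sketch. This is an illustration of the described behavior (first matching gene/link pair wins, ‘(id)’ matches any text, all other text must match exactly), not Notung's own code; the function names are hypothetical, and the handling of a ‘gene:’ line that contains no ‘(id)’ or more than one is an assumption.

```python
import re
from typing import List, Optional, Tuple

ID = "(id)"

SPEC = """\
# Danio rerio links:
gene: Danio_rerio|(id)
link: http://zfin.org/cgi-bin/ZFIN_jump?record=(id)
# generic imagemap - everything else links to google
gene: (id)
link: http://www.google.com/search?q=(id)
"""


def parse_imagemap_spec(text: str) -> List[Tuple[str, str]]:
    """Read gene/link pairs, skipping blank lines and # comments."""
    pairs, gene = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("gene:"):
            gene = line[len("gene:"):].strip()
        elif line.startswith("link:") and gene is not None:
            pairs.append((gene, line[len("link:"):].strip()))
            gene = None
    return pairs


def link_for_gene(name: str, pairs: List[Tuple[str, str]]) -> Optional[str]:
    """Return the link from the first pair whose gene pattern matches."""
    for gene_pattern, link_pattern in pairs:
        # Literal text is escaped; each (id) becomes a capture group.
        regex = "(.*)".join(re.escape(p) for p in gene_pattern.split(ID))
        match = re.fullmatch(regex, name)
        if match:
            gene_id = match.group(1) if match.groups() else ""
            return link_pattern.replace(ID, gene_id)
    return None  # no pair matched; Notung would print a warning here


if __name__ == "__main__":
    pairs = parse_imagemap_spec(SPEC)
    print(link_for_gene("Danio_rerio|ZDB-GENE-031007-1", pairs))
    print(link_for_gene("Homo_sapiens|gene1", pairs))
```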
An example gene tree and imagemap specification from the Princeton Protein Orthology Database (http://ortholog.princeton.edu/) are included in the Notung distribution.
When inferring losses during reconciliation with a non-binary species tree, it is not possible to determine unambiguously the edge in the gene tree to which a loss should be assigned. Notung uses two different methods to deal with this problem. An exact algorithm finds all possible assignments that minimize the total number of losses but has exponential time complexity. A heuristic, which runs in polynomial time, is not guaranteed to find the optimal assignment, but usually does in practice. These issues and algorithms are discussed in detail in Section 4 (Non-Binary Trees).
Only the heuristic is implemented in the GUI. Either method may be used when executing Notung from the command line. The CLI runs the heuristic by default. To use the exact algorithm, include the --exact-losses option when running Notung from the command line with the --reconcile or --root tasks.
The running time of the exact algorithm is exponential in the size of the largest polytomy. Even when --exact-losses is used, Notung does not apply the exact algorithm to polytomies with more than 12 children. Instead, the heuristic is applied to these polytomies. To change the maximum polytomy size for which Notung uses the exact algorithm, use the --polytomy-cutoff <maxPolytomySize> option when including the --exact-losses option in the command line.
NOTE: Changing the polytomy cut-off to a larger value and using the exact algorithm on a species tree with a polytomy with more than 12 children may greatly increase running time.
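The choice between the two methods, as described above, depends on whether --exact-losses is given and on how the polytomy size compares with the cut-off. The sketch below restates that decision rule only; it does not implement either algorithm, and the function name loss_method is hypothetical.

```python
DEFAULT_POLYTOMY_CUTOFF = 12  # default stated in this section


def loss_method(polytomy_size: int, exact_losses: bool = False,
                cutoff: int = DEFAULT_POLYTOMY_CUTOFF) -> str:
    """Say which method applies to a polytomy with the given number of children."""
    if exact_losses and polytomy_size <= cutoff:
        return "exact"
    # Used by default, and for polytomies larger than the cut-off.
    return "heuristic"


if __name__ == "__main__":
    for size in (5, 12, 13):
        print(size, loss_method(size, exact_losses=True))
```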
--exact-losses
Computes the minimum number of losses when reconciling a binary gene tree with a non-binary species tree. If this option is not included on the command line, the heuristic is used. NOTE: In Notung 2.5, this option was named --combine-losses.
--polytomy-cutoff <maxPolytomySize>
Using this option with --exact-losses will change the default value for the polytomy cut-off. The exact algorithm is used only for losses associated with polytomies of size less than or equal to <maxPolytomySize>. The default value is 12. If a polytomy larger than <maxPolytomySize> is encountered, a warning will be printed to the terminal window and/or log file.
When run with --exact-losses, this option will report both the number of losses obtained with the heuristic and with the exact algorithm. This is useful for determining whether the heuristic is overestimating the number of losses and by how much. NOTE: In Notung 2.5, this option was named --report-explicit-losses.