Introduction
Downloads
Source Code
Options
Contact
Introduction

NPT is a fast implementation of n-point spatial statistics. It uses kd trees in a new way that allows computations of n-point statistics at far greater speed than a naive nontree-based implementation.

For general information about NPT, please refer to this paper.
For detailed information about NPT, please refer to this paper.
Downloads
NPT Linux tar package (compiled on gcc egcs-2.91.66 and libc.so.6)
NPT SunOS tar package (compiled on SunOS_4.1.4_sun4m with gcc 2.7.2)
NPT R Library (compiled on Linux)
Source Code
To obtain NPT source code, please fill out this form.
Options

The standard command line syntax for NPT includes an input file and the following list of options.

        ./npt npt in datafilename [options]

where datafilename is the name of an ascii file in which each datapoint is given on a separated line, and each line is a series of numbers specified by spaces or commas.

Optionally the datafile may have the first row be a set of non-numeric strings correspnding to attribute names.

So, for example, .csv (comma separated values) files are fine.

Options are....



Option Type Default Description
num_rows int 999999999 If num_rows is less than #points in dataset, will use a random sample of dataset containing "num_rows" points. Useful if you're doing experiments vs dataset size.
all_equal_metric bool TRUE Do we use the plain obvious distance metric? (TRUE means "don't autoscale the dimensions") (FALSE means "put everything inside a unit cube")
matcher matchspec 0.05 Specify the n-point predicate to use. The syntax is described below.
thresh_ntuples double 0 Skip any n-tuples of kdnodes in which the maximum possible number of tuples from them are less than thresh_ntuples. This argument is IGNORED if autofind is set to TRUE.
n int 2 The "n" in "n-point"
autofind bool FALSE If set to TRUE do repeated attempts at solving with successively smaller thresh_ntuples values, until you find one in which the maximum possible error is within fraction "errfrac" of the true count.
errfrac double 0.05 Only relevant if autofind is TRUE. This is "epsilon" in the paper.
verbose double 1 Set to 0 unless you want an animation (which'll slow it down)
rmin int 20 The leaf-list size in the kd-tree. Unclear what its general effect is, but 'smaller the better subject to fitting in main memory' is NOT the way to go. I suggest not playing with this. 20 usually is fine.
min_rel_width double 0.0001 kdnodes smaller than this are not split no matter how many points. I suggest not playing with this.
rdraw bool FALSE If set to true, pops up a window and does a very fast flickery animation of progress in the search
winsize int 512 Pixels in ag window edge (size of popup window)
binfile filename NULL (The following courtesy of Nick Konidaris)
A binfile is a /single line/ that looks like this:
<field1> <field2> <field3> ... <fieldN>
Where <fieldi> is a parameter that you would pass to the switch matcher. An example binfile looks like:

1,2 2,3 3,4 5,6 6,10 10,100

And it will iteratively go through each of these fields to find matches.
rdata string random.csv If you want to do an n-point count between two datasets (e.g. data and random) then this is the argument with which you can specify the random dataset.
use_permute bool TRUE You should not need to use this. By setting it to FALSE you can get the same behavior that used to appear in the "compound" predicate cases. i.e. a set of points is counted multiple times if it matches the template in multiple orderings and individual points can appear in the same tuple multiple times.
In its default (TRUE) setting, it now counts each set of points only once and a set must consist of unique points.


HORRIBLE WARNING FOR SETTING use_permute TO FALSE
Okay there's something ugly going on. For efficiency, if the the code ever sees that the same dataset is used in all components of the format, it uses symmetry to avoid wasting time doing redundant counting (e.g. in 2-point it doesn't count the same pair twice).
Of course, there's no such kind of symmetry with, say, DR, type searching.
The warning is: MAKE SURE YOU TAKE THIS INTO ACCOUNT IF YOU EVER TRY TO DO COMPARISONS OF A DD vs DR vs RR COUNT.
format string (n d's, i.e. "ddd...d" ) If you are doing 2-point, use format dd for data vs data
format dr for data vs random
format rr for random vs random

If you are doing 3-point, use (for example) format ddr
for data vs data vs random etc...

Note: the d's must be at the beginning and the r's must be at the end of the string.
nweights int 0 The last nweights columns in the data set are assumed to be weightings on the data. If this argument is non-zero, all npoint operations will return both the counts of matching n-tuples, and the weighted counts. Each tuple matching the template will be counted with a weight equal to the product of the weights of the points in the tuple.
Note: the approximation methods still use the plain counts to guide the coarseness of the approximation.

Caveat:
The weighted npoint computation currently only works for n <= 3. It can be extended for higher n if necessary.

The following is a sample NPT command line:

        ./npt npt in vsimple.csv rdraw TRUE
Contact
Webmaster
Auton Project webpage