This page links to a preliminary version of the HapMotif
package, which contains code for locating conserved sequences of
genetic polymorphisms in aligned haploid sequence data and applying
them to various problems in the design and application of
association studies. The definitions and methods for motif detection are
described in the paper "Haplotype Motifs: An Algorithmic Approach to
Locating Evolutionarily Conserved Patterns in Haploid Sequences" by
Russell Schwartz, which appeared in the proceedings of the 2003
IEEE Computer Society Bioinformatics Conference. Applications to
downstream problems --- including missing site prediction, informative
SNP selection, and case-control association testing --- are described
in a forthcoming paper. For more information, see the following
documentation:
- README.1st: general description of the package contents and
  installation instructions
- README.hapmotif: a description of usage of the motif detection program
- README.htsnp: a description of usage of a program using motifs for
  htSNP selection
- README.predict: a description of usage of two programs for inferring
  missing sites using haplotype motifs
- README.case-control: a description of usage of a simple motif-based
  case-control association testing program
- README.files: documentation on file formats used in the code
Changes from version 0.0.0.1:
- statistical model has changed to more accurately estimate
motif frequencies
- correcting for sequencing errors/recent mutations is now an option in
the missing data inference programs
- htSNP methods are much faster and have lower memory overhead
The source code is in C++. Due to limited computational and labor
resources, it has been tested only on one Linux computer (RedHat 7.3,
gcc 2.96) and one Mac OS X computer (10.2, Apple Computer Inc. GCC
11161). The binaries for both platforms run in command-line mode only.
I am unaware of any reason the code would not build with any
ANSI-compliant C++ compiler, but some tweaking of the code and
Makefiles will likely be required. The source code and
precompiled Linux and Mac OS X binaries are available here:
- source code in tarred gzip format
- Mac OS X binaries:
  - hapmotif
  - predictg
  - predictb
  - htsnp
  - case-control
- Linux binaries:
  - hapmotif
  - predictg
  - predictb
  - htsnp
  - case-control
Questions, comments, and bug reports may be sent to the author at russells@andrew.cmu.edu. Please note, however, that this code comes from a research project aimed at creating theoretical methods for computational genomics, not at producing production-quality code. It is being released to allow others to review, experiment with, and improve upon these methods. The code is not suitable for mission-critical work and should not be used as if it were. The code and all associated materials are provided as is, with no warranty of any kind, express or implied, and no express or implied promise of support.
The HapMotif code and any associated files are released under the
terms of the MIT License, reproduced below. The author would,
however, appreciate hearing about any interesting applications or
results derived with this tool and requests that the code be
appropriately cited in any publications making use of it.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.