Murasaki
What's Murasaki?
Murasaki is an anchor alignment program that is
- extremely fast (17 CPU hours for the whole Human x Mouse genome alignment, or 35 wall-clock minutes on 40 nodes; 8 mammals in 21 CPU hours, or 42 wall-clock minutes)
- scalable (arbitrarily parallelizable across multiple nodes using MPI)
- memory efficient (even a single node with 16GB of RAM can handle over 1Gbp of sequence)
- not limited in pattern length or pattern selection
- repeat tolerant
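As a rough sanity check on the figures above, the implied parallel efficiency and per-base memory budget can be worked out directly. This is a back-of-the-envelope sketch, assuming one process per node, that "CPU hours" is summed over all processes, and that "16GB" means 16 GiB. The node count for the 8-mammal run isn't given, so only the average number of concurrently busy processes is derived for it.

```python
def parallel_efficiency(cpu_hours: float, nodes: int, wall_minutes: float) -> float:
    """Fraction of ideal linear speedup: total CPU time / (nodes * wall time).

    Assumes one process per node and CPU hours summed over all processes.
    """
    return (cpu_hours * 60) / (nodes * wall_minutes)


def average_concurrency(cpu_hours: float, wall_minutes: float) -> float:
    """Average number of processes busy at once, from total CPU time and wall time."""
    return (cpu_hours * 60) / wall_minutes


def bytes_per_base(ram_gib: float, bases: float) -> float:
    """RAM available per input base pair (assumes "GB" in the text means GiB)."""
    return ram_gib * 2**30 / bases


if __name__ == "__main__":
    # Human x Mouse: 17 CPU hours on 40 nodes in 35 wall minutes -> 1020 / 1400
    print(f"Human x Mouse efficiency: {parallel_efficiency(17, 40, 35):.0%}")  # 73%
    # 8 mammals: 21 CPU hours in 42 wall minutes -> 1260 / 42
    print(f"8 mammals, average busy processes: {average_concurrency(21, 42):.1f}")  # 30.0
    # 16 GiB of RAM for 1 Gbp of sequence
    print(f"Bytes per base pair: {bytes_per_base(16, 1e9):.1f}")  # 17.2
```

Under these assumptions the Human x Mouse run reaches roughly 73% of ideal linear speedup, and handling "over 1Gbp" in 16GB corresponds to a budget of at most about 17 bytes per base.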
Compatibility
Targeted at 32- and 64-bit Linux and other POSIX-compatible operating systems.
Tested on:
- Debian (lenny, squeeze)
- FreeBSD 6, 7
- MacOS X 10.4, 10.5, 10.6
- Ubuntu 9.10
- Fedora 12
With some luck, it sometimes works on Win32 with MinGW, but there are no guarantees for Windows.
License
Murasaki is distributed under the GNU General Public License.
Download
Murasaki download packages are available in the Murasaki download area.
Or, keep up with the latest release using Mercurial:
hg clone http://murasaki.hg.sourceforge.net:8000/hgroot/murasaki/murasaki
Subversion support is deprecated but technically still exists. There will be no further releases to the Subversion tree, so we advise migrating to Mercurial when convenient.
Requirements
- Boost to build/run the core Murasaki algorithm
Optional requirements
- Perl to assist in the build process and to interface with some of the optional packages described below.
- CryptoPP (optional, but enabled by default) provides CPU-specific enhancements. If you don't want to use CryptoPP, you can disable it in any of the following ways:
  - compiling via a command like "make WITH_LIBCRYPTOPP=NO"
  - setting WITH_LIBCRYPTOPP=NO as an environment variable before running make
  - setting WITH_LIBCRYPTOPP=NO somewhere in the Makefile
- MPI: to run Murasaki on a cluster, you'll need an MPI implementation.
Murasaki interfaces with a lot of other free software to generate graphs and statistical information. To use all the features of Murasaki, you should also have:
- BioPerl is required by the annotation-reading parts of the Perl scripts.
Build instructions
Building on a Debian-based system (the intended platform) is very easy. Make sure you have the appropriate packages installed:
- On Debian lenny: aptitude install libboost-dev libcrypto++-dev g++ make perl
- On Debian squeeze: aptitude install libboost-all-dev libcrypto++-dev g++ make perl
- On Ubuntu [karmic,lucid]: aptitude install libboost-dev libboost-regex-dev libboost-filesystem-dev libcrypto++-dev g++ make perl
- On Mac OS X: port install gcc42 boost libcryptopp
  - We recommend using MacPorts to get and build these libraries/packages.
If your system is already set up correctly, then once you've downloaded one of the Murasaki packages above, the following should work:
- cd murasaki
- make
In general, the included Makefile should find everything it needs automatically (including detecting whether or not you have Crypto++ and an MPI compiler available), but if something fails, or something isn't detected automatically, feel free to edit the Makefile accordingly. If all else fails, email us and we'll see what we can do to help.
Getting started
Most of the documentation for Murasaki currently exists inside the various programs. You can find out what any command does by running it with the "--help" option. For example "./murasaki --help" lists how to run Murasaki. It's long, so you might want to use "./murasaki --help | less".
There's also a manpage in doc/murasaki.1 (or the .html or .txt formats) and installation notes in doc/INSTALL.
An example Murasaki run might go like this:
./murasaki seq/MtC.gbk seq/Mle.gbk -p[28:36] -b24 --name myalignment | Runs the core alignment program. "seq/MtC.gbk seq/Mle.gbk" specifies the input sequences. "-p[28:36]" uses a random string consisting of 28 1's and 8 0's. "-b24" uses only 24-bit hash keys (instead of the default 26), which is desirable (possibly necessary) on machines with limited RAM. "--name myalignment" sets the output file prefix. |
./simplegraph.pl output/myalignment.anchors | Generates graphs of the anchors produced (in this case, just one). For multiple alignments, it outputs a graph for every pairing of the component sequences. |
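To make the -p and -b options in the sample run more concrete, here is a minimal sketch of how a spaced pattern like [28:36] picks out bases from a sequence and how the resulting key can be cut down to 24 bits. It is illustrative only: random_pattern, pattern_key and hash_key are hypothetical helpers, the 2-bits-per-base packing with a bit mask is a stand-in for Murasaki's actual hash functions (which this page doesn't describe), and the pattern Murasaki generates for a real run will differ.

```python
import random
from typing import Optional

# Hypothetical 2-bit nucleotide encoding (not taken from Murasaki's source).
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def random_pattern(weight: int, length: int, seed: Optional[int] = None) -> str:
    """Random string of `length` characters containing exactly `weight` 1's."""
    rng = random.Random(seed)
    ones = set(rng.sample(range(length), weight))
    return "".join("1" if i in ones else "0" for i in range(length))


def pattern_key(seq: str, start: int, pattern: str) -> str:
    """Bases of `seq` under the pattern's 1's, with the pattern placed at `start`."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def hash_key(bases: str, bits: int) -> int:
    """Pack 2 bits per base, then keep only the low `bits` bits (stand-in hash)."""
    value = 0
    for b in bases:
        value = (value << 2) | BASE_BITS[b]
    return value & ((1 << bits) - 1)


pattern = random_pattern(28, 36, seed=1)  # analogous to -p[28:36]
seq = "ACGTTGCAAC" * 8                    # toy 80 bp sequence
key = pattern_key(seq, 0, pattern)        # the 28 bases under the 1's
print(pattern, len(key), hash_key(key, 24))  # 24-bit key, analogous to -b24
print(2**26 // 2**24)  # 4
```

Each extra key bit doubles the number of possible keys, so dropping from the default 26 bits to 24 shrinks the key space (and, roughly, any table indexed by it) by a factor of 2^2 = 4, which is consistent with smaller -b values helping on machines with limited RAM.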
Obviously this is just a sample run. You're strongly encouraged to read the documentation (run with "--help") for each command. Murasaki includes a great deal of functionality without the need to write any custom scripts.
Sample alignments
As an example of the huge alignments Murasaki is capable of, you can download the complete set of our whole-genome mammalian alignments here. Be aware, however, that these alignments can be very large (for example, murasaki-mammals.tar.gz contains the Human-Mouse-Rat, Human-Chimp-Rhesus, and Human-Mouse alignments; it is a 340MB download that decompresses into about 1GB of files). You may also have to edit the .seq files to point to the correct data files, and download those files from Ensembl or the UCSC Genome Browser.
Documentation
Documentation is still a work in progress. For now, please email questions to the author. I've started building an FAQ.
Citation
If you use Murasaki in your work, please cite our PLoS ONE publication:
- Popendorf K, Tsuyoshi H, Osana Y, Sakakibara Y (2010) Murasaki: A Fast, Parallelizable Algorithm to Find Anchors from Multiple Genomes. PLoS ONE 5(9): e12651. doi:10.1371/journal.pone.0012651