FAQ

I get an error like "Encountered exception: Murasaki: Error creating System V IPC shared memory segment (size: 36.93 mb) for dna/human/chrX.fa.gz: Invalid argument"

The default Linux kernel configuration only permits shared memory segments of up to 32 MB (the kernel.shmmax setting). Using System V IPC shared memory with Murasaki and large sequences, you'll quickly run into this limit. You can set a higher limit (say 6 GB) by running "sysctl -w kernel.shmmax=6442450944" as root, or make the change permanent by adding "kernel.shmmax=6442450944" to /etc/sysctl.conf.
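
The value given to sysctl is a byte count. As a quick check of the arithmetic, here is a small Python sketch that converts a size in gigabytes (taken as 1024^3 bytes, which is what the 6 GB figure above uses) into the matching command line. The function names are made up for this illustration and are not part of Murasaki.

```python
def shmmax_bytes(gigabytes):
    """Convert a size in GB (1 GB = 1024**3 bytes) to a byte count."""
    return int(gigabytes * 1024 ** 3)


def sysctl_command(gigabytes):
    """Build the sysctl command line that sets kernel.shmmax to that size."""
    return "sysctl -w kernel.shmmax=%d" % shmmax_bytes(gigabytes)


# Prints: sysctl -w kernel.shmmax=6442450944
print(sysctl_command(6))
```

Pick a size comfortably larger than the segment size reported in the error message.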

What's the -p (--pattern) argument?

The -p parameter specifies the pattern used when creating seeds. There are a number of posters for Murasaki, but the journal paper is still in preparation, so there is no full write-up to point you to yet. Spaced seeds are now a common feature of homology search programs, though, and if you want more information on them, you should check out the PatternHunter paper, which introduced the idea. Basically, a pattern is a sequence of 1s and 0s that specifies which bases in a seed must match (1) and which may be mismatched (0). For example, for the pattern 101, "ATA" matches "AAA" but not "AAT".

The -p argument can take a specific pattern like -p101, but we find that random patterns are generally acceptable, so we usually specify a random pattern to Murasaki in the form -p[weight:length] (the [ ] characters can be omitted), where "weight" is the number of 1s in the pattern and "length" is its total length. The longer the pattern and the more 1s in it, the more specific but less sensitive it becomes.
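
To make the matching rule concrete, here is a minimal Python sketch of spaced-seed matching and of building a random pattern from a weight and a length. It only illustrates the idea and is not Murasaki's code: the function names are hypothetical, and placing the 1s uniformly at random is an assumption (Murasaki may apply additional rules when it generates patterns).

```python
import random


def seed_matches(pattern, a, b):
    """Return True if a and b agree at every position where pattern has a '1'."""
    if not (len(pattern) == len(a) == len(b)):
        raise ValueError("pattern and both seeds must have the same length")
    return all(x == y for p, x, y in zip(pattern, a, b) if p == "1")


def random_pattern(weight, length, rng=random):
    """Build a pattern of the given length containing exactly `weight` 1s.

    Assumption: the 1s are placed uniformly at random; the real program
    may constrain their placement further.
    """
    if not 0 <= weight <= length:
        raise ValueError("weight must be between 0 and length")
    ones = set(rng.sample(range(length), weight))
    return "".join("1" if i in ones else "0" for i in range(length))


print(seed_matches("101", "ATA", "AAA"))  # True: only the ignored middle base differs
print(seed_matches("101", "ATA", "AAT"))  # False: the last base must match

# A random pattern with weight 24 and length 36 (values chosen arbitrarily).
pattern = random_pattern(24, 36)
print(pattern, pattern.count("1"), len(pattern))
```

Positions marked 0 are simply skipped during comparison, which is why a pattern with more 0s tolerates more mismatches.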

I'm comparing some complex genomes and it gets to "Extracting anchors from hash-space", shows a percentage like 70 or 80, and then stops moving. What's up with that?

Without knowing the input sequences or the other options you're supplying to Murasaki, it's hard to guess, but I suspect that you're running into a lot of repeats, which can cause Murasaki to take exponential time (depending on the number of input sequences). You can use "--mergefilter X", where X is the maximum number of anchors to generate from a given seed. Any seeds that would generate more than X anchors are classed as "repeats" and stored in <name>.repeats. A value of around 100 is usually safe here.

An alternative hypothesis is that your pattern is too short, so you're generating matches that look like repeats but (with a little more context) are not in fact repeats. Using a longer pattern might also help.
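
The repeat explanation above can be made concrete with a simplified model. The sketch below assumes (purely for illustration, this is not a description of Murasaki's internals) that a seed yields one candidate anchor for every way of picking one of its occurrences from each input sequence, so the count is the product of the per-sequence occurrence counts. The function names and the example numbers are hypothetical.

```python
def candidate_anchors(occurrences_per_sequence):
    """Count candidate anchors under the assumed one-occurrence-per-sequence model."""
    total = 1
    for count in occurrences_per_sequence:
        total *= count
    return total


def is_repeat(occurrences_per_sequence, mergefilter):
    """Mimic the --mergefilter idea: too many anchors means the seed is a repeat."""
    return candidate_anchors(occurrences_per_sequence) > mergefilter


# A seed seen 3 times in one sequence and twice in another: 6 candidates.
print(candidate_anchors([3, 2]), is_repeat([3, 2], 100))

# A seed seen 50 times in each of three sequences: 125000 candidates.
print(candidate_anchors([50, 50, 50]), is_repeat([50, 50, 50], 100))
```

Under this model, each additional input sequence multiplies the work for a repetitive seed, which is why a modest cap such as 100 cuts the run time so sharply.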