EMBASSY: MIRA: emira

emira

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

MIRA fragment assembly program

Description

MIRA is a "Swiss army knife" of sequence assembly, designed for efficient and accurate assembly jobs. It is particularly well suited to the assembly of extremely 'unfriendly' projects containing many repetitive sequences.

It performs true hybrid de novo assemblies using reads gathered through Sanger, 454 or Solexa sequencing technologies. That is, it assembles the reads themselves rather than a mix of (possibly shredded) consensus sequences and reads. It works for Sanger/454, Sanger/Solexa, 454/Solexa and Sanger/454/Solexa combinations. The length of the Solexa reads is not restricted: they can range from 36mers to 150mers or more.

MIRA contains integrated editors for Sanger and 454 sequences, which iteratively remove many sequencing errors from the assembly project and improve the overall alignment quality.

It can also be used for mapping assemblies and for automatic tagging of difference sites (SNPs, insertions or deletions) of mutant strains against a reference sequence.
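As a rough illustration of what tagging difference sites involves, the Python sketch below classifies the differing columns of a padded pairwise alignment between a reference and a mutant sequence into SNPs, insertions and deletions. It assumes the two sequences are already aligned and that `*` marks a pad (the character MIRA counts as `*` in its consensus statistics). The `Variant` type and `call_differences` function are hypothetical names, and this is not MIRA's own tagging algorithm, which works on the reads of an assembled contig.

```python
from dataclasses import dataclass
from typing import List

GAP = "*"  # assumed pad character in the aligned sequences


@dataclass
class Variant:
    kind: str     # "SNP", "insertion" or "deletion"
    ref_pos: int  # 1-based position in the unpadded reference
    ref: str      # reference bases ("" for an insertion)
    alt: str      # mutant bases ("" for a deletion)


def call_differences(ref_aln: str, mut_aln: str) -> List[Variant]:
    """Hypothetical helper: classify every differing column of a padded alignment."""
    if len(ref_aln) != len(mut_aln):
        raise ValueError("aligned sequences must have equal length")
    variants: List[Variant] = []
    ref_pos = 0  # number of reference bases consumed so far
    i = 0
    while i < len(ref_aln):
        r, m = ref_aln[i], mut_aln[i]
        if r == GAP and m == GAP:
            i += 1  # pad in both sequences: no information
            continue
        if r != GAP and m != GAP:
            ref_pos += 1
            if r.upper() != m.upper():
                variants.append(Variant("SNP", ref_pos, r, m))
            i += 1
            continue
        # Exactly one sequence has a pad: merge the whole run into one indel.
        kind = "insertion" if r == GAP else "deletion"
        start = ref_pos
        ref_bases: List[str] = []
        alt_bases: List[str] = []
        while i < len(ref_aln) and (
            (kind == "insertion" and ref_aln[i] == GAP and mut_aln[i] != GAP)
            or (kind == "deletion" and mut_aln[i] == GAP and ref_aln[i] != GAP)
        ):
            if kind == "insertion":
                alt_bases.append(mut_aln[i])
            else:
                ref_bases.append(ref_aln[i])
                ref_pos += 1
            i += 1
        # Insertions are reported after the preceding reference base,
        # deletions at their first deleted reference base.
        pos = start if kind == "insertion" else start + 1
        variants.append(Variant(kind, pos, "".join(ref_bases), "".join(alt_bases)))
    return variants
```

For example, `call_differences("ACGT*ACGT", "ACCTGAC*T")` reports a G->C SNP at reference position 3, an inserted G after position 4 and a deleted G at position 7.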

For organisms without exon/intron gene structure (bacteria, viruses, etc.) and where annotated files in GenBank format are available, MIRA can generate tables that are ready for biologists to use: they show exactly which genes are hit and give a first estimate of whether the function of the protein is affected by the change.
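The following Python sketch shows, under simplifying assumptions, how a single-base substitution could be mapped to the gene it hits and given a first functional estimate (silent, missense or nonsense). It assumes forward-strand protein-coding genes with 1-based inclusive coordinates, an unambiguous genome sequence (A, C, G, T only) and the standard genetic code; GenBank parsing, reverse-strand genes and indels are left out. The `Gene` type and `snp_effect` function are hypothetical and do not reproduce MIRA's table format.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple

# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive; forward-strand CDS only (assumption)
    end: int


def snp_effect(genome: str, genes: List[Gene], pos: int, alt: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (gene name or None, effect) for a SNP at 1-based pos."""
    for gene in genes:
        if gene.start <= pos <= gene.end:
            offset = pos - gene.start                         # 0-based offset inside the CDS
            codon_start = gene.start - 1 + (offset // 3) * 3  # 0-based index into genome
            codon = genome[codon_start:codon_start + 3].upper()
            i = offset % 3                                    # position within the codon
            mutant = codon[:i] + alt.upper() + codon[i + 1:]
            old, new = CODON_TABLE[codon], CODON_TABLE[mutant]
            if old == new:
                return gene.name, "silent"
            if new == "*":
                return gene.name, f"nonsense ({old}->stop)"
            return gene.name, f"missense ({old}->{new})"
    return None, "intergenic"
```

For example, with the toy genome `ATGAAATTTGGGTAA` and a single gene spanning positions 1 to 15, an A->T change at position 4 turns the AAA (Lys) codon into the stop codon TAA and is reported as `nonsense (K->stop)`.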

Algorithm


Usage

Here is a sample session with emira

% emira -setparam fasta -project cjejuni_demo -genome accurate -mxti -rns tigr -orh
MIRA fragment assembly program
This is MIRA V2.8.3 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.

Mail questions, bug reports, ideas or suggestions to: bach@chevreux.org

Compiled in boundtracking mode.
Compiled in bugtracking mode.

Parsing parameters: -genomeaccurate -fasta -GE:project=cjejuni_demo -GE:mxti=yes -OUT:orh=yes -GE:rns=tigr

Using quickmode switch -genomeaccurate :
-GE:uti=yes
-AS:mrl=40:nop=4:sep=yes:rbl=4:sd=yes:sdlpo=yes:ugpf=yes
-DP:ure=yes:rewl=30:rewme=2:feip=0;leip=0:tpae=no
-CL:pvc=yes:pvcmla=18:qc=no:mbc=no:emlc=yes:mlcr=25:smlc=30
-SK:bph=16:hss=4:pr=45:mhpr=200
-AL:bip=20:bmin=25:bmax=130:mo=15:ms=30:mrs=65:egp=yes:egpl=low
-CO:rodirs=25:mr=yes:asir=no:mrpg=2:emea=25 amgb=yes:amgbemc=yes:amgbnbs=yes
-ED:ace=no
Using quickmode switch fasta :
-GE:lj=fasta

Parameters parsed without error, perfect.

Used parameter settings:
  General (-GE):
    Project name (pro) : cjejuni_demo
    Load job (lj) : FASTA file (fasta)
    Filecheck only (fo) : No
    External quality (eq) : from SCF (scf)
    Ext. qual. override (eqo) : No
    Discard reads on e.q. error (droeqe): No
    Read naming scheme (rns) : TIGR (tigr)
    Merge with XML trace info (mxti) : Yes
    Use template information (uti) : Yes
    EST-assembly start step (ess) : 1

  Assembly options (-AS):
    Minimum read length (mrl) : 40
    Number of passes (nop) : 4
    Skim each pass (sep) : Yes
    Maximum number of RMB break loops (rbl) : 4
    Spoiler detection (sd) : Yes
    Last pass only (sdlpo) : Yes
    Base default quality (bdq) : Yes
    Use genomic pathfinder (ugpf) : Yes
    Use emergency search stop (uess) : Yes
    ESS partner depth (esspd) : 500
    Use emergency blacklist (uebl) : Yes
    Use max. contig build time (umcbt) : No
    Build time in seconds (bts) : 10000

  Strain and backbone options (-SB):
    Load straindata (lsd) : No
    Load backbone (lb) : No
    Start backbone usage in pass (sbuip): 3
    Backbone strain name (bsn) : (none)
    Backbone file type (bft) : FASTA file (fasta)
    Backbone rail length (brl) : 2500
    Backbone base quality (bbq) : 0
    Also build new contigs (abnc) : Yes

  Dataprocessing options (-DP):
    Use read extensions (ure) : Yes
    Read extension window length (rewl) : 30
    Read extension w. maxerrors (rewme) : 2
    First extension in pass (feip) : 0
    Last extension in pass (leip) : 0
    Tag poly A/T at ends (tpae) : No
    Polybase window length (pbwl) : 7
    Polybase window maxerrors (pbwme) : 2
    Polyb. window grace distance (pbwgc): 9

  Clipping options (-CL):
    Possible vector leftover clip (pvc) : Yes
    maximum len allowed (pvcmla) : 18
    Quality clip (qc) : No
    Minimum quality (qcmq) : 20
    Window length (qcwl) : 30
    Masked bases clip (mbc) : No
    Gap size (mbcgs) : 20
    Max front gap (mbcmfg) : 40
    Max end gap (mbcmeg) : 60
    Ensure minimum left clip (emlc) : Yes
    Minimum left clip req. (mlcr) : 25
    Set minimum left clip to (smlc) : 30

  Parameters for SKIM algorithm (-SK):
    Bases per hash (bph) : 16
    Hash save stepping (hss) : 4
    Percent required (pr) : 45
    Maximum hashes in memory (mhim) : 15000000
    Max hits per read (mhpr) : 200

  Align parameters for Smith-Waterman align (-AL):
    Bandwidth in percent (bip) : 20
    Bandwidth max (bmax) : 130
    Bandwidth min (bmin) : 25
    Minimum score (ms) : 30
    Minimum overlap (mo) : 15
    Minimum relative score in % (mrs) : 65
    Extra gap penalty (egp) : Yes
    extra gap penalty level (egpl) : low
    Max. egp in percent (megpp) : 100

  Contig parameters (-CO):
    Name prefix (np) : cjejuni_demo
    Error analysis (an) : SCF signal (signal)
    Reject on drop in relative alignment score (%) : 25
    Max. error rate in dangerous zones in % (dmer) : 1
    Mark repeats (mr) : Yes
    Assume SNP instead of repeats (asir) : No
    Minimum reads per group needed for tagging (mrpg) : 2
    Minimum neighbour quality needed for tagging (mnq) : 20
    Minimum Group Quality needed for RMB Tagging (mgqrt) : 30
    End-read Marking Exclusion Area in bases (emea) : 25
    Also mark gap bases (amgb) : Yes
    Also mark gap bases - even multicolumn (amgbemc) : Yes
    Also mark gap bases - need both strands (amgbnbs): Yes
    Default template insert size minimum (dismin) : 500
    Default template insert size maximum (dismax) : 5000

  Edit options (-ED):
    Automatic contig editing (ace) : No
    Strict editing mode (sem) : No
    Confirmation threshold in percent (ct): 50

  Directories (-DI):
    When loading EXP files:
    When loading SCF files:
    For writing log files : cjejuni_demo_log
    For writing gap4 DA res.: cjejuni_demo_out

  Input files (-FI):
    When loading EXP fofn : cjejuni_demo_in.fofn
    When loading project from PHD : cjejuni_demo_in.phd.1
    When loading project from CAF : cjejuni_demo_in.caf
    When loading sequences from FASTA : cjejuni_demo_in.fasta
    When loading qualities from FASTA quality: cjejuni_demo_in.fasta.qual
    When loading straindata : cjejuni_demo_straindata_in.txt
    When loading XML trace info files : cjejuni_demo_traceinfo_in.xml
    When loading backbone from CAF : cjejuni_demo_backbone_in.caf
    When loading backbone from GenBank : cjejuni_demo_backbone_in.gbf
    When loading backbone from FASTA : cjejuni_demo_backbone_in.fasta

  Output files (-OUTPUT/-OUT):
    Result files:
      Saved as CAF (orc): Yes
      Saved as FASTA (orf): Yes
      Saved as GAP4 (directed assembly) (org): Yes
      Saved as phrap ACE (ora): Yes
      Saved as HTML (orh): Yes
      Saved as Transposed Contig Summary (ors): Yes
      Saved as simple text format (ort): Yes
    Temporary result files:
      Saved as CAF (otc): No
      Saved as FASTA (otf): No
      Saved as GAP4 (directed assembly) (otg): No
      Saved as phrap ACE (ota): No
      Saved as HTML (oth): No
      Saved as Transposed Contig Summary(ots): No
      Saved as simple text format (ott): No
    Extended temporary result files:
      Saved as CAF (oetc): No
      Saved as FASTA (oetf): No
      Saved as GAP4 (directed assembly) (oetg): No
      Saved as phrap ACE (oeta): No
      Saved as HTML (oeth): No
      Save also singlets (oetas): No
    Alignment output customisation:
      TEXT characters per line (tcpl): 60
      HTML characters per line (hcpl): 60
      TEXT characters per line (tegfc): ' '
      HTML characters per line (hegfc): ' '
    File / directory names:
      CAF : cjejuni_demo_out.caf
      FASTA : cjejuni_demo_out.unpadded.fasta
      FASTA quality : cjejuni_demo_out.unpadded.fasta.qual
      FASTA (padded) : cjejuni_demo_out.padded.fasta
      FASTA qual.(pad): cjejuni_demo_out.padded.fasta.qual
      GAP4 (directory): cjejuni_demo_out.gap4da
      ACE : cjejuni_demo_out.ace
      HTML : cjejuni_demo_out.html
      Simple text : cjejuni_demo_out.txt
      TCS overview : cjejuni_demo_out.tcs

Creating directory cjejuni_demo_log ... done.
Creating directory cjejuni_demo_results ... done.
Creating directory cjejuni_demo_info ... done.
Localtime: Thu Jul 15 12:00:00 2010
Loading data normal (probably Sanger type) from FASTA file cjejuni_demo_in.fasta
Counting sequences in FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading sequence data from FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading quality data from FASTA quality file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
There haven been 544 reads given, 544 of which have quality accounted for.
Localtime: Thu Jul 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).
Localtime: Thu Jul 15 12:00:00 2010
Merging data from XML trace info file cjejuni_demo_traceinfo_in.xml ...Num reads: 496
Building hash table ... done.
Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Done merging XML data, matched 496 reads.
Localtime: Thu Jul 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
0 SCF files loaded ok.
544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names).
Starting minimum left vector clip ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
 - 0 reads have not enough good bases for assembly.
 - 544 reads used for assembly.
 - 0 reads have no real quality (see miralog.noqualities).
 - mean length of good parts of used reads: 626
Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Thu Jul 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Thu Jul 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Thu Jul 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4243
possible: 4607
permbans: 0
Hits chosen: 4243
Localtime: Thu Jul 15 12:00:00 2010
Pre-assembly alignment search for read extension and / or vector clipping:
Making alignments.
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.2 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Pre-assembly read extension:
Localtime: Thu Jul 15 12:00:00 2010
Searching possible read extensions:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Changed length of 258 sequences.
Mean length gained in these sequences: 73.2713 bases.
Pre-assembly vector clipping
Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
 - 0 reads have not enough good bases for assembly.
 - 544 reads used for assembly.
 - 0 reads have no real quality (see miralog.noqualities).
 - mean length of good parts of used reads: 660
Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Thu Jul 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Thu Jul 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Thu Jul 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4512
possible: 4913
permbans: 0
Hits chosen: 4512
Localtime: Thu Jul 15 12:00:00 2010

Pass: 1

Making alignments.
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 1
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 2
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++
[120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++
[178] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[238] ++++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++
[296] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[356] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[416] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[476] +++++a++++++++a+a++++++++++++++++a++++++++++++++++++++ RL1
[526] aaa
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40028
Avg. contig coverage: 8.66
Consensus contains: A: 13590 C: 5845 G: 6941 T: 13404 N: 0 IUPAC: 24 Funny: 0 *: 224
Num reads: 526
Avg. read length: 659
Reads contain 343983 bases, 0 Ns and 2661 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Marking possibly misassembled repeats ...done.
Found
 - 1 Strong RMB
 - 3 Weak RMB
 - 0 SNP positions tagged.
Transfering contig RMB permanent pair bans.
Transfering tags to readpool.
The previously assembled contig had grave misassemblies, rebuilding contig 2 now.
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++
[120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++
[178] +++++++++++++++++++++++++++++++++++p+++p++++++++++++++++++++
[236] +++++++++a+++++a++++++++++++++++++++++++++++++++++++++++++++
[294] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[354] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[414] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[474] +++++++++a++++p+a+p+++++++++a+++++a+++++++++++++++++++++ RL1
[524] aaap
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342555 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Marking possibly misassembled repeats ...done.
Found
 - 0 Strong RMB
 - 3 Weak RMB
 - 0 SNP positions tagged.
Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 3
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 13
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 4
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 12
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 5
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 11
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 6
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 10
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 7
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 9
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 8
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 8
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 9
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 7
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 10
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 6
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 11
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 5
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 12
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 4
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 13
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 3
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 14
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 2
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 15
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 1
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.1.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.1.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.1.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.1.txt

Pass: 2

Performing vector clipping ... done.
Pool has 544 reads .
Checking reads for trace data:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
No SCF data present in any read, automatic contig editing is now switched off.
544 reads with valid data for assembly.
For the reads that are neither backbones nor rails:
 - 0 reads have not enough good bases for assembly.
 - 544 reads used for assembly.
 - 0 reads have no real quality (see miralog.noqualities).
 - mean length of good parts of used reads: 660
Localtime: Thu Jul 15 12:00:00 2010
Generated 288 unique template ids for 544 valid reads.
Localtime: Thu Jul 15 12:00:00 2010
Generated 0 unique strain ids for 544 reads.
Localtime: Thu Jul 15 12:00:00 2010
Searching for possible overlaps:
Localtime: Thu Jul 15 12:00:00 2010
We will get 1 partitions.
Progressend: 1088
Now running partitioned skimmer with 1 partitions:
Working on partition 1/1
Will contain read IDs 0 to 543
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Total megahubs: 0
Skim summary:
accepted: 4512
possible: 4913
permbans: 0
Hits chosen: 4512
Localtime: Thu Jul 15 12:00:00 2010
Making alignments.
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible forward matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Aligning possible complement matches:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Calculating possible vector leftovers ... done.
Loading confirmed overlaps from disk (will need approximately 1.3 M.):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Sorting confirmed overlaps (this may take a while) ... done.
Generating clusters:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 1
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 544
+[1] t+t++++a+aaaaar RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 1
Contig length: 2467
Avg. contig coverage: 2.36
Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20
Num reads: 7
Avg. read length: 833
Reads contain 5780 bases, 0 Ns and 55 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 2
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 537
+[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++
[120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++
[176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++
[234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++
[292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1
[524] aapa
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 2
Contig length: 40021
Avg. contig coverage: 8.62
Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217
Num reads: 524
Avg. read length: 658
Reads contain 342548 bases, 0 Ns and 2577 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Marking possibly misassembled repeats ...done.
Found
 - 0 Strong RMB
 - 3 Weak RMB
 - 0 SNP positions tagged.
Transfering contig RMB permanent pair bans.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 3
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 13
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 3
Contig length: 805
Avg. contig coverage: 1
Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 805
Reads contain 805 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 4
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 12
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 4
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 5
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 11
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 6
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 10
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 7
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 9
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 8
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 8
+ RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. 
contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.2.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.2.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.2.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.2.txt Pass: 3 Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. 
For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... 
[80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. 
Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. 
contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.3.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.3.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.3.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.3.txt Localtime: Thu Jul 15 12:00:00 2010 Hunting contig join spoiler ... done. Localtime: Thu Jul 15 12:00:00 2010 Pass: 4 Performing vector clipping ... done. Pool has 544 reads . 
Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... 
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. 
contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. 
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 5
Contig length: 788
Avg. contig coverage: 1
Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 788
Reads contain 788 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 6
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 10 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 6
Contig length: 786
Avg. contig coverage: 1
Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 786
Reads contain 786 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 7
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 9 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 7
Contig length: 865
Avg. contig coverage: 1
Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 865
Reads contain 865 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 8
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 8 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 8
Contig length: 963
Avg. contig coverage: 1
Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 963
Reads contain 963 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 9
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 7 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 9
Contig length: 1052
Avg. contig coverage: 1
Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 1052
Reads contain 1052 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 10
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 6 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 10
Contig length: 563
Avg. contig coverage: 1
Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 563
Reads contain 563 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 11
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 5 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 11
Contig length: 893
Avg. contig coverage: 1
Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 893
Reads contain 893 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 12
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 4 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 12
Contig length: 478
Avg. contig coverage: 1
Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 478
Reads contain 478 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 13
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 3 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 13
Contig length: 869
Avg. contig coverage: 1
Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 869
Reads contain 869 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 14
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 2 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 14
Contig length: 973
Avg. contig coverage: 1
Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 973
Reads contain 973 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Building new contig 15
Localtime: Thu Jul 15 12:00:00 2010
Unused reads: 1 + RL1
That's it for this contig.
Finished building the contig.
Localtime: Thu Jul 15 12:00:00 2010
-------------- Contig statistics ----------------
Contig id: 15
Contig length: 972
Avg. contig coverage: 1
Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0
Num reads: 1
Avg. read length: 972
Reads contain 972 bases, 0 Ns and 0 gaps.
-------------------------------------------------
Localtime: Thu Jul 15 12:00:00 2010
Saving of extra temporary singlets disabled.
Marking possibly misassembled repeats ...done. Found none.
Transfering reads to readpool.
Localtime: Thu Jul 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.4.txt
Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.4.txt
Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.4.txt
Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.4.txt
Assembly finished, saving final results.
Localtime: Thu Jul 15 12:00:00 2010
Saving project statistics to file: cjejuni_demo_info/cjejuni_demo_info_contigstats.txt
Localtime: Thu Jul 15 12:00:00 2010
Saving read tag list to file: cjejuni_demo_info/cjejuni_demo_info_readtaglist.txt
Localtime: Thu Jul 15 12:00:00 2010
Saving contig tag list to file: cjejuni_demo_info/cjejuni_demo_info_consensustaglist.txt
Localtime: Thu Jul 15 12:00:00 2010
Saving project contig<->read list to file: cjejuni_demo_info/cjejuni_demo_info_contigreadlist.txt
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.caf
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs to directory: cjejuni_demo_results/cjejuni_demo_out.gap4da
(first deleting old directory) (now creating new directory) (saving contigs) Done.
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta
Saving padded contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta
Saving contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta.qual
Saving padded contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta.qual
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs TCS to file: cjejuni_demo_results/cjejuni_demo_out.tcs
Localtime: Thu Jul 15 12:00:00 2010
Saving SNP analysis to file: cjejuni_demo_info/cjejuni_demo_info_snpanalysis.txt
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.txt
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.ace
Localtime: Thu Jul 15 12:00:00 2010
Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.html
Localtime: Thu Jul 15 12:00:00 2010
End of assembly process, thank you for using MIRA.
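
Each contig in the session log above is summarised by the same set of "Contig statistics" fields. The Python sketch below pulls those fields out with a regular expression and checks that the consensus base counts add up to the reported contig length. It is not part of EMBOSS or MIRA; the field labels and their order are assumed from this MIRA V2.8.3 run and may differ in other MIRA versions.

```python
import re
from typing import Dict, List

# Matches one "Contig statistics" block as printed by MIRA V2.8.3 in the
# session above. Field labels and order are an assumption taken from that
# log; other MIRA versions may print them differently. Using \s+ between
# fields lets the pattern match whether or not the log keeps line breaks.
STATS_RE = re.compile(
    r"Contig id:\s*(?P<id>\d+)\s+"
    r"Contig length:\s*(?P<length>\d+)\s+"
    r"Avg\. contig coverage:\s*[\d.]+\s+"
    r"Consensus contains:\s*"
    r"A:\s*(?P<A>\d+)\s+C:\s*(?P<C>\d+)\s+G:\s*(?P<G>\d+)\s+T:\s*(?P<T>\d+)\s+"
    r"N:\s*(?P<N>\d+)\s+IUPAC:\s*(?P<IUPAC>\d+)\s+"
    r"Funny:\s*(?P<Funny>\d+)\s+\*:\s*(?P<gaps>\d+)\s+"
    r"Num reads:\s*(?P<reads>\d+)"
)

# Every consensus character class reported in the block.
COUNT_FIELDS = ("A", "C", "G", "T", "N", "IUPAC", "Funny", "gaps")


def parse_contig_stats(log_text: str) -> List[Dict[str, int]]:
    """Return one dict of integer fields per contig statistics block."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in STATS_RE.finditer(log_text)
    ]


def composition_matches_length(stats: Dict[str, int]) -> bool:
    """Check that the consensus character counts add up to the contig length."""
    return sum(stats[name] for name in COUNT_FIELDS) == stats["length"]


if __name__ == "__main__":
    # In practice log_text would be read from a saved copy of the screen
    # output; a block copied from the session above is used here instead.
    sample = (
        "Contig id: 5 Contig length: 788 Avg. contig coverage: 1 "
        "Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 "
        "Funny: 0 *: 0 Num reads: 1 Avg. read length: 788"
    )
    for contig in parse_contig_stats(sample):
        print(contig["id"], contig["length"], contig["reads"],
              composition_matches_length(contig))
```

Run as a script, this prints `5 788 1 True`: for contig 5 the A, C, G and T counts (254 + 118 + 133 + 283) add up to the reported length of 788. Passing the text of a whole saved log to `parse_contig_stats` would give one entry per contig.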

Go to the output files for this example

Command line arguments

MIRA fragment assembly program
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
   -technology         menu       [sanger] Which sequencing technologies have
                                  created your reads (Values: sanger
                                  (Dideoxy); 454 (Roche); solexa (Illumina);
                                  solid (ABI SOLiD))
   -jobtype            menu       [genome] Whether the data you are
                                  assembling form a larger contiguous
                                  sequence (choose: genome) or are small
                                  fragments, as in EST or mRNA libraries
                                  (choose: est) (Values: genome (Whole
                                  genome); est (Short fragments))
   -method             menu       [denovo] Whether you are building an
                                  assembly from scratch (choose: denovo) or
                                  mapping reads to an existing backbone
                                  sequence (choose: mapping) (Values: denovo
                                  (de novo assembly); mapping (align to a
                                  reference sequence))
   -grade              menu       [normal] Quality grades of de-novo assembly
                                  or mapping. Draft is quick-and-dirty, suited
                                  to getting a first look at the approximate
                                  coverage of a running project, and should
                                  not be used for anything else. Normal is
                                  the default parameter set of mira and is
                                  able to tackle most genomes. It is a bit
                                  slower than draft, but includes such
                                  options as read extension and vector
                                  remnant clipping. Accurate is slower still
                                  than normal, but should be used for genomes
                                  that pose a problem to the normal mode.
                                  (Values: draft (Draft); normal (Normal);
                                  accurate (Accurate))

   Additional (Optional) qualifiers:
   -setparams          menu       [unspecified] Sets parameters suited for
                                  loading sequences from FASTA, PHD or CAF
                                  files. The default is not to specify the
                                  type of input file. (Values: unspecified
                                  (Unspecified); fasta (Fasta); phd (PHD); caf
                                  (CAF))
   -highlyrepetitive   boolean    [N] A modifier switch for genome data that
                                  is deemed to be highly repetitive. The
                                  assemblies will run slower due to more
                                  iterative cycles that give mira a chance to
                                  resolve nasty repeats.
   -noclipping         menu       [$(technology)] Switches off clipping
                                  options for given sequencing technologies.
                                  (Values: sanger (Dideoxy); 454 (Roche);
                                  solexa (Illumina); solid (ABI SOLiD))

   Advanced (Unprompted) qualifiers:
   -parameterfile      infile     Loads parameters from the filename given.
                                  Allows a maximum of 10 levels of recursion,
                                  i.e. a parameter file may itself contain a
                                  -params option that loads further
                                  parameter files, up to 10 levels deep.
   -project            string     [mira] Default is mira. Defines the project
                                  name for this assembly. The project name
                                  automatically influences the name of input
                                  and output files or directories. E.g. in the
                                  default setting, the file names for the
                                  output of the assembly in FASTA format would
                                  be mira_out.fasta and mira_out.fasta.qual.
                                  Setting the project name to 'MyProject'
                                  would generate MyProject_out.fasta and
                                  MyProject_out.fasta.qual. (Any string)
   -inproject          string     [$(project)] Defaults to the project name
                                  given with -project (mira unless changed).
                                  Defines the input project name for this
                                  assembly. The input project name
                                  influences only the names of input files
                                  or directories. (Any string)
   -bft                menu       [fasta] Defines the filetype of the backbone
                                  file given. Currently (2.8.3) only FASTA,
                                  CAF and GBF files are supported. When GBF
                                  (GenBank files, also named .gbk) files are
                                  loaded, the features within these files are
                                  automatically transformed into
                                  Staden-compatible tags and get passed
                                  through the assembly. (Values: fasta
                                  (FASTA); caf (CAF); gbf (Genbank))
   -expdir             directory  [.] Defines the directory where mira should
                                  search for experiment files (EXP).
   -scfdir             directory  [.] Defines the directory where mira should
                                  search for SCF files.
   -feifile            infile     [$(inproject)_in.fofn] Defines the file of
                                  filenames where the names of the EXP files
                                  of a project are located.
   -fpifile            infile     [$(inproject)_in.fofn] Defines the file of
                                  filenames where the names of the PHD files
                                  of a project are located.
   -pifile             infile     [$(inproject)_in.phd] Defines the PHD file
                                  to load sequences of a project from.
   -faifile            infile     [$(inproject)_in.fasta] Defines the FASTA
                                  file to load sequences of a project from.
   -fquifile           infile     [$(inproject)_in.fasta.qual] Defines the
                                  FASTA file to load base qualities of a
                                  project from. The order of reads in the
                                  quality file does not need to be the same
                                  as in the FASTA or fofn project, although
                                  it saves a bit of time if it is.
   -fqifile            infile     [$(inproject)_in.fastq] Defines the FASTQ
                                  file to load sequences of a project from.
   -cifile             infile     [$(inproject)_in.caf] Defines the file to
                                  load a CAF project from. Filename must end
                                  with '.caf'.
   -sdifile            infile     [$(inproject)_straindata_in.txt] Defines the
                                  file to load straindata from. Only used in
                                  EST projects (miraEST).
   -xtiifile           infile     [$(inproject)_xmltraceinfo_in.xml] Defines
                                  the trace info file in XML format to load.
                                  This can be used both when merging XML
                                  data into loaded files and when loading a
                                  project from an XML trace info file.
   -svsifile           infile     [$(inproject)_ssaha2vectorscreen_in.txt]
                                  Defines the file from which to load
                                  information about possible vector
                                  sequence stretches.
   -bbifile            infile     [$(inproject)_in.$(technology).$(bft)]
                                  Defines the file to load the backbone
                                  sequence or assembly. Note that you still
                                  must define the file type with [-bft].
   -[no]traceinfo      toggle     [Y] Load ancillary traceinfo data from XML
                                  files.
   -lsd                boolean    [N] Straindata is a key value file, one read
                                  per line. First the name of the read, then
                                  the strain name of the organism the read
                                  comes from. It is used by the program to
                                  differentiate different types of SNPs
                                  appearing in organisms and classifying them.
   -brl                integer    [2500] Parameter for the internal sectioning
                                  size of the backbone. Extremely repetitive
                                  sequences may require reducing the default
                                  value, but the default value should work
                                  well in 99.9% of all cases. (Integer from
                                  1000 to 3000)
   -mrl                integer    [40] Minimum length that reads must have to
                                  be considered for the assembly. Shorter
                                  sequences will be filtered out at the
                                  beginning of the process and won't be
                                  present in the final project. (Integer 20 or
                                  more)
   -nop                integer    [3] Defines how many iterations of the whole
                                  assembly process are done. Rule of thumb -
                                  for quick and dirty assembly use 1 (not
                                  recommended). For assembly using read
                                  extensions and / or automatic contig editing
                                  (-ure and -ace) use at least 2. The
                                  recommended setting is 3 or higher, as some
                                  knowledge generated by the assembler can be
                                  used only from the third iteration on. More
                                  than 3 passes might be useful for projects
                                  containing many repetitive elements. See
                                  also -rbl and -mr for parameters that affect
                                  the assembly and disentanglement of
                                  possible repeats. (Integer 1 or more)
   -[no]sep            boolean    [Y] Defines whether the skim algorithm (and
                                  with it also the recalculation of
                                  Smith-Waterman alignments) is called in
                                  between each main pass. If set to 'N',
                                  skimming is done only when needed by the
                                  workflow, either when read extensions are
                                  searched for (-ure) or when possible vector
                                  leftovers are to be clipped (-pvc). Setting
                                  this option to 'Y' is highly recommended,
                                  setting it to 'N' is only for quick and
                                  dirty assemblies.
   -rbl                integer    [2] Defines the maximum number of times a
                                  contig can be rebuilt during main assembly
                                  passes (-nop) if misassemblies, due to
                                  possible repeats, are found. (Integer 1 or
                                  more)
   -not                integer    [2] Number of threads to use (see also -snot
                                  for SKIM algorithm) (Integer from 1 to 256)
   -[no]amm            boolean    [Y] Whether mira tries to optimise the run
                                  time of certain algorithms through a
                                  space/time trade-off in memory usage,
                                  increasing or reducing some internal
                                  tables as memory permits.
   -mps                integer    [0] Maximum memory in GB (Integer 0 or more)
   -kpmf               integer    [15] Keep percentage of memory free (Integer
                                  from 0 to 100)
   -kcim               boolean    [N] Keep contigs in memory
   -esps               integer    [0] EST-SNP pipeline steps (Integer from 0
                                  to 4)
   -[no]uti            boolean    [Y] Two reads sequenced from the same clone
                                  template form a read pair with a known
                                  minimum and maximum distance. This feature
                                  definitely helps with contigs containing
                                  lots of repeats. Set this to 'Y' if your
                                  data contains information on insert
                                  sizes. Information on insert sizes can be
                                  given via the SI tag in EXP files (for each
                                  read pair individually), or for the whole
                                  project using -tismin and -tismax.
   -tismin             integer    [-1] Template insert minimum size (Integer
                                  -1 or more)
   -tismax             integer    [-1] Template insert maximum size (Integer
                                  -1 or more)
   -[no]crhf           boolean    [Y] Colour reads by hash frequency
   -[no]pd             boolean    [Y] Controls whether date and time are
                                  printed out during the assembly. Suppressing
                                  it isn't useful in normal operation, only
                                  when debugging or benchmarking.
   -ft                 menu       [fasta] Defines whether to load and assemble
                                  EXP files from a file of filenames
                                  ('mira_in.fofn'), load and assemble FASTA
                                  sequences ('mira_in.fasta') and their
                                  qualities ('mira_in.fasta.qual'), load and
                                  assemble FASTQ sequences and qualities
                                  ('mira_in.fastq'), load and assemble
                                  sequences or qualities from a phd file
                                  ('mira_in.phd') or to load a project from a
                                  CAF file ('mira_in.caf') and assemble or
                                  possibly reassemble it. N.B. fofnphd is
                                  not currently available. (Values: fofnexp
                                  (file of EXP filenames); fasta (FASTA and
                                  quality files); fastq (FASTQ file); caf (CAF
                                  file); phd (PHD file); fofnphd (file of PHD
                                  filenames))
   -eq                 menu       [scf] Defines the source format for reading
                                  qualities from external sources. Normally
                                  takes effect only when these are not present
                                  in the format of the load_job project (EXP
                                  and FASTA can have them, CAF and PHD must
                                  have them). (Values: none (Use qualities
                                  from input files); scf (SCF quality scores))
   -eqo                boolean    [N] Only takes effect when 'lj' is fofnexp.
                                  Defines whether or not the qualities from
                                  the external source override the possibly
                                  loaded qualities from the load job project.
                                  This might be of use in case some
                                  post-processing software fiddles around with
                                  the quality values of the input file but
                                  one wants to have the original ones.
   -droeqe             boolean    [N] Should there be a major mismatch between
                                  the external quality source and the
                                  sequence (e.g. the base sequence read from a
                                  SCF file does not match the originally read
                                  base sequence), should the read be excluded
                                  from assembly or not. If not, it will use
                                  the qualities it had before trying to load
                                  the external qualities (either default
                                  qualities or the ones loaded from the
                                  original source).
   -ssiqf              boolean    [N] Solexa scores in quality file
   -fqqo               integer    [0] FASTQ quality offset (Integer from 0 to
                                  64)
   -[no]wqf            boolean    [Y] Wants quality file
   -rns                menu       [$(technology)] Defines the centre naming
                                  scheme for read suffixes. Currently, only
                                  Sanger Institute and TIGR naming schemes are
                                  supported out of the box. How to choose?
                                  Please read the documentation available at
                                  the different centres or ask your sequence
                                  provider. In a nutshell, the Sanger scheme
                                  is
                                  'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...'
                                  (e.g. U13a08f10.p1ca), TIGR scheme is
                                  'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or
                                  GCPDL68TABRPT103A58B). (Values: sanger
                                  (Sanger centre); tigr (TIGR); fr (454 simple
                                  forward/reverse); stlouis (WashU); solexa
                                  (Illumina))
   -mxti               boolean    [N] Some file formats above (FASTA, PHD or
                                  even CAF and EXP) possibly don't contain all
                                  the info necessary or useful for each read
                                  of an assembly. Should additional
                                  information, such as clipping positions
                                  etc., be available in an XML trace info
                                  file in NCBI format (see File formats),
                                  then set this option to 'Y' and it will be
                                  merged with the loaded data. Please note,
                                  quality clippings given here will override
                                  quality clippings loaded earlier or
                                  performed by mira. Minimum clippings will
                                  still be made by the program, though.
   -fo                 boolean    [N] If set to 'Y', the project will not be
                                  assembled and no assembly output files will
                                  be produced. Instead, the project files will
                                  only be loaded. This switch is useful for
                                  checking consistency of input files.
   -bdq                integer    [10] Defines the default base quality of
                                  reads that have no quality read from a file.
                                  (Integer 0 or more)
   -[no]epoq           boolean    [Y] Stops MIRA if a read has no quality
                                  values
   -[no]ard            boolean    [Y] Automatic repeat detection
   -ardct              float      [2.0] Automatic repeat detection coverage
                                  threshold (Number 1.000 or more)
   -ardml              integer    [400] Default is 200 for 454 technology
                                  (Integer 2 or more)
   -ardgl              integer    [40] Default depends on technology (Integer
                                  2 or more)
   -[no]urd            boolean    [Y] Default true for most genome assembly,
                                  false for EST assembly or Solexa data
   -urdsip             integer    [3] Default depends on technology and
                                  assembly quality level (Integer 1 or more)
   -urdcm              float      [1.5] Default depends on technology and
                                  assembly quality level (Number 1.000 or
                                  more)
   -klrs               boolean    [N] Default depends on assembly quality
                                  level and EST/genome assembly
   -[no]sd             boolean    [Y] Default is 'Y' for mira and 'N' for
                                  miraEST. A spoiler is either a chimeric
                                  read or a read with long stretches of
                                  unclipped vector sequence still included
                                  (too long for the -pvc vector leftover
                                  clipping routines). A spoiler typically
                                  prevents contigs from being joined; MIRA
                                  will cut spoilers back so that they do no
                                  further harm to the assembly. Recommended
                                  for mid-to-high coverage genomic
                                  assemblies; not recommended for
                                  assemblies of ESTs, as one might lose
                                  splice variants with it. A minimum of two
                                  assembly passes (-nop) must be run for
                                  this option to take effect.
   -[no]ugpf           boolean    [Y] MIRA has two different pathfinder
                                  algorithms it chooses from to find its way
                                  through the (more or less) complete set of
                                  possible sequence overlaps; a genomic and an
                                  EST pathfinder. The genomic looks a bit
                                  into the future of the assembly and tries to
                                  stay on safe ground using a maximum of
                                  information already present in the contig
                                  that is being built. The EST version, on the
                                  contrary, will directly jump at the complex
                                  cases posed by very similar repetitive
                                  sequences and try to solve those first; it
                                  is willing to fall back to brute force when
                                  really bad cases (such as coverage with
                                  thousands of sequences) are encountered.
                                  Generally, the genomic pathfinder will also
                                  work quite well with EST sequences (but
                                  might get slowed down a lot in pathological
                                  cases), while the EST algorithm does not
                                  work so well on genomes. If in doubt,
                                  leave as 'Y' for genome projects and set to
                                  'N' for EST projects.
   -[no]uess           boolean    [Y] Another important switch if you plan to
                                  assemble non-normalised EST libraries, where
                                  some ESTs may reach coverages of several
                                  hundreds or thousands of reads. This switch
                                  lets MIRA save a lot of computational time
                                  when aligning those extremely high coverage
                                  areas (but only there), at the expense of
                                  some accuracy.
   -esspd              integer    [500] Defines the number of potential
                                  partners a read must have for MIRA to switch
                                  into emergency search stop mode for that
                                  read. (Integer 1 or more)
   -[no]uebl           boolean    [Y] Use emergency blacklist
   -umcbt              boolean    [N] Defines whether there is an upper limit
                                  of time to be used to build one contig. Set
                                  this to 'Y' in EST assemblies where you
                                  think that extremely high coverages occur.
                                  Less useful for assembly of genomic
                                  sequences.
   -bts                integer    [10000] Depending on -umcbt above, this
                                  number defines the time in seconds allotted
                                  to building one contig. (Integer 1 or more)
   -lsbd               boolean    [N] Straindata is a key-value file, one read
                                  per line: first the name of the read, then
                                  the strain name of the organism the read
                                  comes from. It is used by the program to
                                  differentiate between types of SNPs
                                  appearing in organisms and to classify them.
   -lb                 boolean    [N] A backbone is a sequence (or a previous
                                  assembly) that is used as a template for the
                                  current assembly. The current assembly
                                  process will first assemble reads to loaded
                                  backbone contigs before creating new
                                  contigs. This feature is helpful for
                                  assembling against previous (and already
                                  possibly edited) assembly iterations, or to
                                  make a comparative assembly of two very
                                  closely related organisms. Please read 'very
                                  closely related' as in 'only SNP mutations
                                  or short indels present'.
   -sbuip              integer    [3] When assembling against backbones, this
                                  parameter defines the pass iteration (see
                                  -nop) from which onwards the backbones are
                                  actually used. In the passes preceding this
                                  number, the non-backbone reads are
                                  assembled together as if no backbones
                                  existed. This allows mira to correctly spot
                                  repetitive stretches that differ by single
                                  bases and tag them accordingly. Rule of
                                  thumb: if backbones belong to the same
                                  strain as the reads to assemble, set to 1.
                                  If backbones are a different strain, then
                                  set -sbuip to 1 lower than -nop (example:
                                  -nop 4 and -sbuip 3). (Integer 0 or more)
   -bbq                integer    [30] Defines the default quality that the
                                  backbone sequences have if they came without
                                  quality values in their files (like in GBF
                                  format or when FASTA is used without .qual
                                  files). A value of -1 causes mira to use the
                                  same default quality for backbones as for
                                  reads. (Integer from -1 to 100)
   -bsn                string     Defines the name of the strain that the
                                  backbone sequences have. (Any string)
   -bsnffa             boolean    [N] Backbone strain name force for all
   -brfs               string     Backbone rail from strain (Any string)
   -bro                integer    [0] Backbone rail overlap (Integer from 0 to
                                  2000)
   -[no]abnc           boolean    [Y] The standard mode of the assembler is to
                                  assemble available reads to a backbone and
                                  make new contigs with the remaining reads.
                                  If this option is set to 'N', the reads that
                                  cannot be assembled into existing contigs
                                  are put as singlets into the assembly, not
                                  forming new contigs.
   -[no]ure            boolean    [Y] Use read extension. Defines whether
                                  MIRA runs the read extension routines
                                  controlled by -rewl, -rewme, -feip and
                                  -leip. Default depends on technology
   -rewl               integer    [30] Only takes effect when -ure is set to
                                  'Y'. The read extension routines use a
                                  sliding window approach on Smith-Waterman
                                  alignments. This parameter defines the
                                  window length. Default depends on technology
                                  (Integer 0 or more)
   -rewme              integer    [2] Only takes effect when -ure is set to
                                  'Y'. The read extension routines use a
                                  sliding window approach on Smith-Waterman
                                  alignments. This parameter defines the
                                  maximum number of errors (disagreements)
                                  between the two aligned sequences within
                                  the given window. Default depends on
                                  technology (Integer 0 or more)
   -feip               integer    [0] Only takes effect when -ure is set to
                                  'Y'. The read extension routines can be
                                  called before assembly and/or after each
                                  assembly pass (see -nop). This parameter
                                  defines the first pass in which the read
                                  extension routines are called. The default
                                  of 0 tells mira to extend the reads the
                                  first time before the first assembly pass.
                                  (Integer 0 or more)
   -leip               integer    [0] Only takes effect when -ure is set to
                                  'Y'. The read extension routines can be
                                  called before assembly and/or after each
                                  assembly pass (see -nop). This parameter
                                  defines the last pass in which the read
                                  extension routines are called. The default
                                  of 0 tells mira to extend the reads the last
                                  time before the first assembly pass.
                                  (Integer 0 or more)
   -msvs               boolean    [N] Merge with SSAHA vector screen
   -msvsgs             integer    [10] Default depends on the sequencing
                                  technology (Integer 0 or more)
   -msvsmfg            integer    [60] Default depends on the sequencing
                                  technology (Integer 0 or more)
   -msvsmeg            integer    [120] Default depends on the sequencing
                                  technology (Integer 0 or more)
   -msvssfc            integer    [0] Default depends on the sequencing
                                  technology (Integer 0 or more)
   -msvssec            integer    [0] Default depends on the sequencing
                                  technology (Integer 0 or more)
   -[no]pvlc           boolean    [Y] Possible vector leftover clip
   -qcmq               integer    [20] This is the minimum quality required of
                                  bases in a window in order to be accepted.
                                  Please be cautious and don't use extreme
                                  values here, because then the clipping will
                                  be too lax or too harsh. Values below 15 and
                                  higher than 35 are disallowed. (Integer
                                  from 15 to 35)
   -qcwl               integer    [30] This is the length of a window in bases
                                  for the quality clip. Default depends on
                                  sequencing technology (Integer 10 or more)
   -[no]bsqc           boolean    [Y] Bad stretch quality clip
   -bsqcmq             integer    [20] Default depends on sequencing
                                  technology (Integer 0 or more)
   -bsqcwl             integer    [30] Default depends on sequencing
                                  technology (Integer 0 or more)
   -[no]mbc            boolean    [Y] This will let mira perform a 'clipping'
                                  of bases that were masked out (replaced with
                                  the character X). It is generally not a
                                  good idea to use masked bases to remove
                                  unwanted portions of a sequence; the EXP
                                  file format and the NCBI traceinfo format
                                  offer excellent ways to avoid this. But
                                  because a lot of pre-processing software
                                  is built around cross_match, scylla- and
                                  phrap-style base masking, the need arose
                                  for mira to be able to handle this too.
                                  mira will look at the start and end of
                                  each sequence to see whether there are
                                  masked bases that should be 'clipped'.
   -mbcgs              integer    [20] While performing the clip of masked
                                  bases, mira will check whether it can merge
                                  larger chunks of masked bases that are at
                                  most -mbcgs bases apart. (Integer 0 or more)
   -mbcmfg             integer    [40] While performing the clip of masked
                                  bases at the start of a sequence, mira will
                                  allow up to this number of unmasked bases in
                                  front of a masked stretch. Default depends
                                  on sequencing technology. (Integer 0 or
                                  more)
   -mbcmeg             integer    [60] While performing the clip of masked
                                  bases at the end of a sequence, mira will
                                  allow up to this number of unmasked bases
                                  behind a masked stretch. Default depends on
                                  sequencing technology (Integer 0 or more)
   -lcc                boolean    [N] Default depends on sequencing technology
   -cpat               boolean    [N] Used in EST assembly
   -cpkps              boolean    [N] Clip polyA tail keep polyA signal
   -cpmsl              integer    [12] Clip polyA tail max signal length
                                  (Integer 0 or more)
   -cpmea              integer    [1] Clip polyA tail max errors allowed
                                  (Integer 1 or more)
   -cpmgfe             integer    [9] Clip polyA tail max gap from end
                                  (Integer 1 or more)
   -[no]emlc           boolean    [Y] If on, ensures a minimum left clip on
                                  each read according to the parameters in
                                  -mlcr & -smlc. Default depends on sequencing
                                  technology
   -mlcr               integer    [25] If -emlc is 'Y', checks whether there
                                  is a left clip whose length is at least the
                                  size specified here. Default depends on
                                  sequencing technology (Integer 0 or more)
   -smlc               integer    [30] If -emlc is 'Y' and the actual left
                                  clip is < -mlcr, then set the left clip of
                                  the read to the value given here. Default
                                  depends on sequencing technology (Integer 0
                                  or more)
   -emrc               boolean    [N] If on, ensures a minimum right clip on
                                  each read according to the parameters in
                                  -mrcr & -smrc. Default depends on sequencing
                                  technology
   -mrcr               integer    [10] If -emrc is 'Y', checks whether there
                                  is a right clip whose length is at least the
                                  size specified here. Default depends on
                                  sequencing technology (Integer 0 or more)
   -smrc               integer    [20] If -emrc is 'Y' and the actual right
                                  clip is < -mrcr, then set the right clip of
                                  the read to the value given here. Default
                                  depends on sequencing technology (Integer 0
                                  or more)
   -[no]pec            boolean    [Y] Default depends on other choices
   -pecbph             integer    [17] Default is 14 on 32 bit systems and 16
                                  on 64 bit systems. Controls the number of
                                  consecutive bases n which are used as a word
                                  hash. The higher the value the faster the
                                  search. The lower the value the more weak
                                  matches are found. Values below 10 are not
                                  recommended. Default depends on sequencing
                                  technology (Integer 10 or more)
   -snot               integer    [2] Number of threads to use in SKIM
                                  algorithm (Integer from 1 to 256)
   -bph                integer    [17] Default depends on system. Controls the
                                  number of consecutive bases n which are
                                  used as a word hash. The higher the value
                                  the faster the search. The lower the value
                                  the more weak matches are found. Values
                                  below 10 are not recommended. (Integer 1 or
                                  more)
   -hss                integer    [4] This is a parameter controlling the
                                  stepping increments with which hashes are
                                  generated. This allows for a more
                                  fine-grained search as matches are now found
                                  with at least n+s (see -bph) equal bases
                                  instead of the SSAHA 2n. The higher the
                                  value the faster the search. The lower the
                                  value the more weak matches are found.
                                  (Integer 1 or more)
   -pr                 integer    [70] Controls the relative percentage of
                                  exact word matches in an approximate overlap
                                  that has to be reached to accept the
                                  overlap as a possible match. Increasing this
                                  number will decrease the number of possible
                                  alignments that have to be checked by
                                  Smith-Waterman later on in the assembly, but
                                  it might also lead to the rejection of
                                  weaker overlaps (i.e. overlaps that contain
                                  a higher number of mismatches). (Integer 1
                                  or more)
   -mhpr               integer    [2000] Controls the maximum number of
                                  possible hits one read can pass on to the
                                  Smith-Waterman alignment phase. If more
                                  potential hits are found, only the best
                                  ones are taken. This is an important option
                                  for tackling projects that contain extreme
                                  assembly conditions. For example, 5000
                                  reads that are all very similar would
                                  generate around 40 to 50 million possible
                                  alignments (forward and reverse
                                  complement). Setting this parameter to 200
                                  reduces the number of alignments to check
                                  to around 1.5-2 million. Over successive
                                  passes (-nop), different combinations of
                                  possible hits will be checked, always
                                  starting with the most promising ones. So
                                  the accuracy of the assembly should only
                                  suffer when this number is lowered too
                                  much. (Integer 1 or more)
   -mmhr               integer    [0] If the number of reads identified as
                                  megahubs exceeds the allowed ratio, mira
                                  will abort. This is a fail-safe parameter
                                  to avoid assemblies where things look
                                  fishy. If you see this, you might want to
                                  ask for advice on the mira_talk mailing
                                  list. In short: bacteria should never have
                                  megahubs (90% of all cases reported were
                                  contamination of some sort and the
                                  remaining 10% were due to incredibly high
                                  coverage numbers). Eukaryotes are likely
                                  to contain megahubs if filtering [-mnr] is
                                  not on. (Integer 0 or more)
   -fenn               float      [0.4] Freq. est. min normal (Number 0.000 or
                                  more)
   -fexn               float      [1.6] Freq. est. max normal (Number 0.000 or
                                  more)
   -fer                float      [1.9] Freq. est. repeat (Number 0.000 or
                                  more)
   -fehr               float      [8.0] Freq. est. heavy repeat (Number 0.000
                                  or more)
   -fecr               float      [20.0] Freq. est. crazy repeat (Number 0.000
                                  or more)
   -[no]mnr            boolean    [Y] Default depends on the job type: 'yes'
                                  for de-novo, 'no' for mapping. Tells mira
                                  to mask, during the SKIM phase,
                                  subsequences of [-bph] nucleotides that
                                  appear more often than the median
                                  occurrence of subsequences would otherwise
                                  suggest. The threshold from which
                                  subsequences are considered nasty is set
                                  by -nrr.
   -nrr                integer    [100] Sets the ratio above which
                                  subsequences are considered nasty and hidden
                                  from the SKIM overlapper. For example, a
                                  value of 10 means 'mask all k-mers of
                                  [-bph] length which occur more than 10
                                  times more often than the average of the
                                  project.' (Integer 2 or more)
   -mhim               integer    [15000000] Has no influence on the quality
                                  of the assembly, only on the maximum memory
                                  size needed during the skimming. The default
                                  value is equivalent to approximately 500MB.
                                  (Integer 100000 or more)
   -mchr               integer    [2048] Default depends on sequencing
                                  technology. Maximum memory used (in MB)
                                  during the reduction of skim hits. (Integer
                                  10 or more)
   -[no]uqr            boolean    [Y] Use quick rule
   -qrmla              integer    [200] Quick rule min len 1 (Any integer
                                  value)
   -qrmsa              integer    [90] Quick rule min sim 1 (Any integer
                                  value)
   -qrmlb              integer    [100] Quick rule min len 2 (Any integer
                                  value)
   -qrmsb              integer    [95] Quick rule min sim 2 (Any integer
                                  value)
   -bqoml              integer    [150] Backbone quick overlap min len (Any
                                  integer value)
   -bip                integer    [15] The banded Smith-Waterman alignment
                                  uses this percentage number to compute the
                                  bandwidth it has to use when computing the
                                  alignment matrix. E.g. if the expected
                                  overlap is 150 bases and bip=10, the banded
                                  SW will compute a band of 15 bases to each
                                  side of the expected alignment diagonal,
                                  thus allowing up to 15 unbalanced inserts /
                                  deletes in the alignment. Increasing this
                                  number will find more non-optimal
                                  alignments but will also increase SW
                                  runtime (between linear and quadratic);
                                  decreasing it works the other way round
                                  (it might miss a few bad alignments but
                                  gains speed). (Integer from 1 to 100)
   -bmin               integer    [25] Minimum bandwidth in bases to each
                                  side. (Integer 1 or more)
   -bmax               integer    [100] Maximum bandwidth in bases to each
                                  side. (Integer 1 or more)
   -mo                 integer    [15] Minimum number of overlapping bases
                                  needed in an alignment of two sequences to
                                  be accepted. (Integer 1 or more)
   -ms                 integer    [30] Describes the minimum score of an
                                  overlap to be taken into account for
                                  assembly. mira uses a default scoring scheme
                                  for SW alignments. Each match counts 1, a match
                                  with an N counts 0, each mismatch with a
                                  non-N base -1 and each gap -2. Use a bigger
                                  score to weed out a number of chance
                                  matches, a lower score to perhaps find the
                                  single (short) alignment that might join two
                                  contigs together (at the expense of
                                  computing time and memory). (Integer 1 or
                                  more)
   -mrs                integer    [65] Describes the min percentage of
                                  matching between two reads to be considered
                                  for assembly. Increasing this number will
                                  save memory but one might lose possible
                                  alignments. A maximum of 80 is probably
                                  sensible here. Decreasing below 55 will
                                  probably make memory and time consumption
                                  explode. (Integer from 1 to 100)
   -egp                boolean    [N] Defines whether or not to increase
                                  penalties applied to alignments containing
                                  long gaps. Setting this to 'Y' might help in
                                  projects with frequent repeats. On the
                                  other hand, it is definitely disturbing
                                  when assembling very long reads containing
                                  multiple long indels in the called base
                                  sequence ... although this should not happen
                                  in the first place and is a sure sign of
                                  problems lying ahead. When in doubt, set it
                                  to 'Y' for EST projects and de-novo genome
                                  assembly, set it to 'N' for assembly of
                                  closely related strains (assembly against a
                                  backbone). When set to 'N', it is
                                  recommended to have -amgb and -amgbemc both
                                  set to 'Y'.
   -egpl               menu       [low] Has no effect if -egp is off.
                                  Defines an extra penalty applied to
                                  'long' gaps. There are these predefined
                                  levels - 1. low - use this if you expect
                                  your base caller frequently misses two or
                                  more bases. 2. medium - use this if your
                                  base caller is expected to frequently miss
                                  one to two bases. 3. high - use this if your
                                  base caller does not frequently miss more
                                  than one base. For some stages of the EST
                                  assembly process, a special value 'est' is
                                  used. (Values: low (Low); medium (Medium);
                                  high (High); est (EST split splices))
   -megpp              integer    [100] Has no effect if -egp is off.
                                  Defines the maximum extra penalty in
                                  percent applied to 'long' gaps. (Integer
                                  from 1 to 100)
   -np                 string     [$(inproject)] Contigs will have this string
                                  prepended to their names. (Any string)
   -rodirs             integer    [20] When adding reads to a contig, reject
                                  the reads if the drop in the quality of the
                                  consensus is > the given value in %. Lower
                                  values mean stricter checking. This value is
                                  doubled should a read be entered that has a
                                  template partner (a read pair) at the right
                                  distance. (Integer from 1 to 100)
   -[no]mr             boolean    [Y] One of the most important switches in
                                  MIRA. If set to 'Y', MIRA will try to
                                  resolve misassemblies due to repeats by
                                  identifying single base stretch differences
                                  and tag those critical bases as RMB (Repeat
                                  Marker Base, weak or strong). This switch is
                                  also needed when MIRA is run in EST mode to
                                  identify possible inter-, intra- and
                                  intra-and-interorganism SNPs.
   -mroir              boolean    [N] Only takes effect when [-mr] is set to
                                  yes. If set to yes, MIRA will not use the
                                  repeat resolving algorithm during build time
                                  (and therefore will not be able to take
                                  advantage of this), but only before saving
                                  results to disk.
   -asir               boolean    [N] Only takes effect when -mr is set to
                                  'Y'; its effect also depends on whether
                                  strain data (see -lsd) is present or not.
                                  Usually, mira will mark bases that
                                  differentiate between repeats when a
                                  conflict occurs between reads that belong to
                                  one strain. If the conflict occurs between
                                  reads belonging to different strains, they
                                  are marked as SNP. However, if this switch
                                  is set to 'Y', conflicts within a strain
                                  are also marked as SNP. This switch is
                                  mainly used in assemblies of ESTs; it
                                  should not be set for genomic assembly.
   -mrpg               integer    [2] Only takes effect when -mr is set to
                                  'Y'. This defines the minimum number of
                                  reads in a group that are needed for the RMB
                                  (Repeat Marker Bases) or SNP detection
                                  routines to be triggered. A group is defined
                                  by the reads carrying the same nucleotide
                                  for a given position, i.e., an assembly with
                                  mrpg=2 will need at least two times two
                                  reads with the same nucleotide (having at
                                  least a quality as defined in -mgqrt) to be
                                  recognised as repeat marker or a SNP.
                                  Setting this to a low number increases
                                  sensitivity, but might produce a few false
                                  positives, resulting in reads being thrown
                                  out of contigs because of falsely identified
                                  possible repeat markers (or wrongly
                                  recognised as SNP). (Integer 2 or more)
   -mnq                integer    [20] Default depends on the sequencing
                                  technology used. Only takes effect when
                                  [-mr] is set to yes. This defines the
                                  minimum quality of neighbouring bases that a
                                  base must have for being taken into
                                  consideration during the decision whether
                                  column base mismatches are relevant or not.
                                  (Integer 10 or more)
   -mgqrt              integer    [30] Only takes effect when -mr is set to
                                  'Y'. This defines the minimum quality of a
                                  group of bases to be taken into account as
                                  potential repeat marker. The lower the
                                  number, the more sensitive you get, but
                                  lowering below 25 is not recommended as a
                                  lot of wrongly called bases can have a
                                  quality approaching this value and you'd end
                                  up with a lot of false positives. The
                                  higher the overall coverage of your project
                                  the better, and the higher you can set this
                                  number. A value of 35 will probably remove
                                  all false positives, a value of 40 will
                                  probably never show false positives.
                                  (Integer 25 or more)
   -emea               integer    [25] Only takes effect when -mr is set to
                                  'Y'. Using the ends of Sanger-type shotgun
                                  reads is always a bit risky, as wrongly
                                  called bases tend to crowd there or some
                                  sequencing vector relicts
                                  hang around. It is even more risky to use
                                  these stretches for detecting possible
                                  repeats, so one can define an exclusion area
                                  where the bases are not used when
                                  determining whether a mismatch is due to
                                  repeats or not. (Integer 0 or more)
   -[no]amgb           boolean    [Y] Determines whether columns containing
                                  gap bases (indels) are also tagged.
   -[no]amgbemc        boolean    [Y] Only takes effect when -amgb is set to
                                  'Y'. Determines whether multiple columns
                                  containing gap bases (indels) are also
                                  tagged.
   -[no]amgbnbs        boolean    [Y] Only takes effect when -amgb is set to
                                  'Y'. Determines whether, for tagging
                                  columns containing gap bases, both strands
                                  need to have a gap. Setting this to 'N' is
                                  not recommended except when working in
                                  desperately low coverage situations.
   -fnicpst            boolean    [N] If set to yes, mira will be forced to
                                  make a choice for a consensus base (A,C,G,T
                                  or gap) even in unclear cases where it would
                                  normally put an IUPAC base. All other things
                                  being equal (like quality of the possible
                                  consensus base and other things), mira will
                                  choose a base by either looking for a
                                  majority vote or, if that also is not clear,
                                  by preferring gaps over T over G over C
                                  over finally A.
   -msr                boolean    [N] Can only be used in mapping assemblies.
                                  If set to yes, mira will merge all perfectly
                                  mapping Solexa reads into longer reads
                                  while keeping quality and coverage
                                  information intact. This feature hugely
                                  reduces the number of Solexa reads and makes
                                  assembly results with Solexa data small
                                  enough to be handled by current finishing
                                  programs (gap4, consed, others) on normal
                                  workstations.
   -gor                integer    [66] Gap override ratio (Integer 0 or more)
   -ace                boolean    [N] Once contigs have been built, mira can
                                  call a built-in version of the automatic
                                  contig editor EdIt. EdIt will try to resolve
                                  discrepancies in the contig by performing
                                  trace analysis and correcting even
                                  hard-to-resolve errors. This option is always
                                  useful, but especially in conjunction with
                                  -nop and -ure. Notice: the current
                                  development version has a memory leak in the
                                  editor, therefore the option is not
                                  automatically turned on.
   -[no]sem            boolean    [Y] If set to 'Y' the automatic editor will
                                  not take error hypotheses with a low
                                  probability into account, even if all the
                                  requirements to make an edit are fulfilled.
   -ct                 integer    [50] The higher this value, the more strictly
                                  the automatic editor will apply its internal
                                  rule set. Going below 40 is not
                                  recommended. (Integer from 1 to 100)
   -outproject         string     [$(project)] Default is mira. Defines the
                                  output project name for this assembly. The
                                  output project name automatically influences
                                  the names of output files or directories
                                  only. (Any string)
   -sssip              boolean    [N] Controls whether 'unimportant' singlets
                                  are written to the result files.
   -[no]stsip          boolean    [Y] Controls whether singlets which have
                                  certain tags (SRMr, CRMr, WRMr, SROr, SAOr,
                                  SIOr) are written to the result files, even
                                  if [-sssip] is set.
   -[no]rrol           boolean    [Y] Removes log files once they are no
                                  longer needed during the assembly
                                  process.
   -rld                boolean    [N] Removes the complete log directory at
                                  the end of the assembly process. Some logs
                                  contain useful information that you may want
                                  to analyse though.
   -[no]orc            boolean    [Y] Output CAF results
   -[no]orf            boolean    [Y] Output FASTA results
   -org                boolean    [N] Output GAP4DA results
   -[no]ora            boolean    [Y] Output phrap ACE results
   -orh                boolean    [N] Output HTML results
   -[no]ors            boolean    [Y] Output transposed contig summary results
   -ort                boolean    [N] Output simple text results
   -[no]orw            boolean    [Y] Output wiggle results
   -[no]otc            boolean    [Y] Output temporary CAF results
   -otm                boolean    [N] Output temporary MAF results
   -otf                boolean    [N] Output temporary FASTA results
   -otg                boolean    [N] Output temporary GAP4 results
   -ota                boolean    [N] Output temporary phrap ACE results
   -oth                boolean    [N] Output temporary HTML results
   -ots                boolean    [N] Output temporary transposed contig
                                  summary results
   -ott                boolean    [N] Output temporary text results
   -oetc               boolean    [N] Output extra temporary CAF results
   -oetf               boolean    [N] Output extra temporary FASTA results
   -oetg               boolean    [N] Output extra temporary GAP4DA results
   -oeta               boolean    [N] Output extra temporary phrap ACE results
   -oeth               boolean    [N] Output extra temporary HTML results
   -oetas              boolean    [N] Also output singlets in extra temporary
                                  results
   -tcpl               integer    [60] When producing an output in text format
                                  (-ort|ott|oett), this parameter defines how
                                  many bases each line of an alignment should
                                  contain. (Integer 1 or more)
   -hcpl               integer    [60] When producing an output in HTML format
                                  (-orh|oth|oeth), this parameter defines how
                                  many bases each line of an alignment should
                                  contain. (Integer 1 or more)
   -tegfc              string     When producing an output in text format
                                  (-ort|ott|oett), endgaps are filled up with
                                  this character. (Any string)
   -hegfc              string     When producing an output in HTML format
                                  (-orh|oth|oeth), endgaps are filled up with
                                  this character. (Any string)
   -[no]sdlpo          boolean    [Y] Defines whether the spoiler detection
                                  algorithms are run only for the last pass or
                                  for all passes (-nop). Takes effect only if
                                  spoiler detection (-sd) is on.
   -tpae               boolean    [N] This option is useful in EST assembly.
                                  Poly-AT stretches at the end of reads that
                                  were not correctly masked or clipped in
                                  pre-processing steps from external programs
                                  get tagged here. The assembler will not use
                                  these stretches for critical operations.
                                  Additionally, the tags do provide a good
                                  visual anchor when looking at the assembly
                                  with different programs.
   -pbwl               integer    [7] Only takes effect when -tpae is set to
                                  'Y'. Defines the window length within which
                                  all bases (except the maximum number of
                                  errors allowed) must be either A or T to be
                                  considered a polybase stretch. (Integer 1 or
                                  more)
   -pbwme              integer    [2] Only takes effect when -tpae is set to
                                  'Y'. Defines the maximum number of errors
                                  allowed in a given window length such that a
                                  stretch is considered to be a polybase
                                  stretch. The distribution of these errors is
                                  not important. (Integer 1 or more)
   -pbwgd              integer    [9] Only takes effect when -tpae is set to
                                  'Y'. Defines the number of bases from the
                                  end of a sequence (if masked, from the end
                                  of the masked area) within which a polybase
                                  stretch is looked for without finding one.
                                  (Integer 1 or more)
   -[no]pvc            boolean    [Y] Mira will try to identify possible
                                  sequencing vector relicts present at the
                                  start of a sequence and clip them away.
                                  These relicts are usually a few bases long
                                  and were not correctly removed from the
                                  sequence in data pre-processing steps of
                                  external programs. You might want to turn
                                  off this option if you know (or think) that
                                  your data contains a lot of repeats and the
                                  -pvcmla option, which fine-tunes the clipping
                                  behaviour, does not give the expected
                                  results.
   -pvcmla             integer    [18] The clipping of possible vector relicts
                                  option works quite well. Unfortunately the
                                  bounds of repeats or differences in EST
                                  splice variants sometimes show the same
                                  alignment behaviour as possible sequencing
                                  vector relicts and could therefore also be
                                  clipped. To stop the vector clipping from
                                  mistakenly clipping repetitive regions or
                                  EST splice variants, this option puts an
                                  upper bound to the number of bases a
                                  potential clip is allowed to have. If the
                                  number of bases is below or equal to this
                                  threshold then the bases are clipped. If the
                                  number of bases exceeds the threshold then
                                  the clip is NOT performed. Setting the value
                                  to 0 turns off the threshold i.e. clips are
                                  then always performed if a potential vector
                                  is found. (Integer 0 or more)
   -qc                 boolean    [N] Default is 'N', but is automatically set
                                  to 'Y' when using the setparam options
                                  'fasta' or 'phd' (can be turned off again by
                                  subsequent options afterwards). This will
                                  let mira perform its own quality clipping
                                  before sequences are entered into the
                                  assembly. The clip function performed is a
                                  sequence end window quality clip with back
                                  iteration to get a maximum number of bases
                                  as useful sequence. Note that the bases
                                  clipped away here can still be used
                                  afterwards if there is enough evidence
                                  supporting their correctness when the option
                                  -ure is turned on.
   -an                 menu       [signal] When adding reads to a contig,
                                  dangerous regions can get an extra integrity
                                  check. none = no extra check. text = check
                                  is only text-based. signal = check is signal
                                  based; if the SCF trace is not available, it
                                  falls back to 'text'. For the time being, only
                                  regions tagged as ALUS or REPT in the
                                  experiment file are considered dangerous.
                                  (Values: none (None); text (Text); signal
                                  (Signal))
   -dmer               integer    [1] When adding reads to a contig, reject
                                  the reads if the error in zones known as
                                  dangerous exceeds the given value in %.
                                  Lower values mean stricter checking in these
                                  danger zones. For the time being, only
                                  regions tagged as ALUS or REPT in the
                                  experiment file are considered dangerous.
                                  (Integer from 1 to 100)
   -dismin             integer    [500] The minimum distance that read pairs
                                  may be apart. There is an additional error
                                  margin of 10% subtracted from this value
                                  during internal computations. (Integer 0 or
                                  more)
   -dismax             integer    [5000] The maximum distance that read pairs
                                  may be apart. There is an additional error
                                  margin of 10% added to this value during
                                  internal computations. (Integer 0 or more)
   -oett               boolean    [N] Output extra temporary TXT results
   -gapfda             string     [gap4da] Defines the extension of the
                                  directory where mira will write the result
                                  of an assembly ready to import into the
                                  Staden package (GAP4) in Direct Assembly
                                  format. The name of the directory will then
                                  be _. (Any string)
   -log                string     [miralog] Defines the directory where mira
                                  will write some log files to. Note that the
                                  name of the actual project will be
                                  prepended. (Any string)
   -co                 string     [mira_out.caf] Defines the file in CAF
                                  format to save an assembled project to.
                                  Filename must end with '.caf'. (Any string)

   Associated qualifiers:

   "-expdir" associated qualifiers
   -extension          string     Default file extension

   "-scfdir" associated qualifiers
   -extension          string     Default file extension

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
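
The -fnicpst description above gives a tie-breaking rule for forced consensus calls. The Python sketch below illustrates that rule for a single alignment column. It is not MIRA's code: the function name is hypothetical, '*' is assumed as the gap character, and base qualities (which MIRA also weighs) are ignored, so only the majority vote and the fixed preference order (gap, T, G, C, A) are modelled.

```python
from collections import Counter

# Preference order described for -fnicpst: gap over T over G over C over A.
# '*' as the gap character is an assumption of this sketch.
TIE_BREAK_ORDER = ["*", "T", "G", "C", "A"]


def forced_consensus_base(column):
    """Hypothetical helper: pick one consensus base (A, C, G, T or gap)
    for an alignment column given as a string of bases.

    Majority vote first; if the vote is tied, the earliest entry in
    TIE_BREAK_ORDER wins. Quality values are deliberately ignored.
    """
    counts = Counter(b for b in column.upper() if b in TIE_BREAK_ORDER)
    if not counts:
        raise ValueError("column contains no A, C, G, T or gap characters")
    best = max(counts.values())
    tied = [b for b, n in counts.items() if n == best]
    # With a clear majority, `tied` has exactly one element.
    return min(tied, key=TIE_BREAK_ORDER.index)


print(forced_consensus_base("AAAC"))  # A  (clear majority)
print(forced_consensus_base("AC"))    # C  (tie: C is preferred over A)
print(forced_consensus_base("GT*"))   # *  (tie: gap is preferred over all)
```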
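
The poly-A/T tagging parameters of -tpae (-pbwl, -pbwme, -pbwgd) can be read as a sliding-window test. The sketch below illustrates one such reading and is not MIRA's implementation. It assumes that a window of -pbwl bases counts as a polybase stretch when at most -pbwme of its bases differ from the target base, and it interprets -pbwgd as the distance from the 3' end within which a qualifying window must end. Masked regions and stretches at the 5' end are not handled, and the function name find_polybase_window is hypothetical.

```python
def find_polybase_window(seq, base, pbwl=7, pbwme=2, pbwgd=9):
    """Hypothetical helper: return (start, end) of the poly-`base` window
    closest to the 3' end of `seq`, or None if there is none.

    Assumptions: a window of `pbwl` bases qualifies when at most `pbwme`
    of its bases differ from `base`; only windows ending within `pbwgd`
    bases of the sequence end are examined (one reading of -pbwgd).
    """
    seq = seq.upper()
    base = base.upper()
    n = len(seq)
    if n < pbwl:
        return None
    # Try window end positions from the 3' end inwards, at most pbwgd bases in.
    lowest_end = max(pbwl, n - pbwgd)
    for end in range(n, lowest_end - 1, -1):
        start = end - pbwl
        errors = sum(1 for b in seq[start:end] if b != base)
        if errors <= pbwme:
            return (start, end)
    return None


read = "ACGTGCATGC" + "A" * 10
print(find_polybase_window(read, "A"))  # (13, 20)
```

Scanning from the 3' end inwards returns the window nearest the read end first, which is the region the -tpae description is concerned with.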
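
The threshold rule described for -pvcmla reduces to a small decision function. The helper below is hypothetical and written only from that description, not taken from MIRA; it shows how the default of 18 and the special value 0 interact.

```python
def should_clip_vector_relict(clip_length, pvcmla=18):
    """Hypothetical helper: decide whether a potential sequencing vector
    relict of `clip_length` bases at the start of a read is clipped.

    Restates the -pvcmla rule: clip if the length is at or below the
    threshold, do not clip if it exceeds it, and always clip when the
    threshold is 0.
    """
    if clip_length < 0:
        raise ValueError("clip_length must be non-negative")
    if pvcmla == 0:
        return True  # threshold turned off: every potential relict is clipped
    return clip_length <= pvcmla


for length in (12, 18, 25):
    print(length, should_clip_vector_relict(length))
# 12 True
# 18 True
# 25 False
```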
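
As a worked example of the 10% error margin described for -dismin and -dismax: with the defaults of 500 and 5000, read pairs would be accepted between 450 and 5500 bases apart. The helper functions below are hypothetical and only restate that arithmetic. They assume the margin is a simple percentage of each bound and that both bounds are inclusive, which the descriptions do not state explicitly.

```python
def effective_pair_bounds(dismin=500, dismax=5000, margin_percent=10):
    """Hypothetical helper: return the (lower, upper) read-pair distance
    window after subtracting margin_percent from dismin and adding it
    to dismax."""
    lower = dismin * (100 - margin_percent) / 100
    upper = dismax * (100 + margin_percent) / 100
    return lower, upper


def pair_distance_ok(distance, dismin=500, dismax=5000):
    """True if `distance` lies inside the margin-adjusted window
    (bounds assumed inclusive)."""
    lower, upper = effective_pair_bounds(dismin, dismax)
    return lower <= distance <= upper


print(effective_pair_bounds())  # (450.0, 5500.0)
print(pair_distance_ok(5400))   # True
print(pair_distance_ok(5600))   # False
```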

Qualifier Type Description Allowed values Default

Standard (Mandatory) qualifiers

-technology list Which sequencing technologies have created your reads
sanger (Dideoxy)
454 (Roche)
solexa (Illumina)
solid (ABI SOLiD)
sanger

-jobtype list Are the data you are assembling forming a larger contiguous sequence (choose: genome) or are you assembling small fragments like in EST or mRNA libraries (choose: est)
genome (Whole genome)
est (Short fragments)
genome

-method list Are you building an assembly from scratch (choose: denovo) or are you mapping reads to an existing backbone sequence (choose: mapping)
denovo (de novo assembly)
mapping (align to a reference sequence)
denovo

-grade list Quality grades of de-novo assembly or mapping. Draft is quick-and-dirty, suited to getting a first look at the approximate coverage of a running project; it should not be used for anything else. Normal is the default parameter set of mira and is able to tackle most genomes. It is a bit slower than draft, but includes such options as read extension and vector remnant clipping. Accurate is slower still than normal, but should be used for genomes that pose a problem to the normal mode.
draft (Draft)
normal (Normal)
accurate (Accurate)
normal

Additional (Optional) qualifiers

-setparams list Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file.
unspecified (Unspecified)
fasta (Fasta)
phd (PHD)
caf (CAF)
unspecified

-highlyrepetitive boolean A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats. Boolean value Yes/No No

-noclipping list Switches off clipping options for given sequencing technologies.
sanger (Dideoxy)
454 (Roche)
solexa (Illumina)
solid (ABI SOLiD)
$(technology)

Advanced (Unprompted) qualifiers

-parameterfile infile Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a parameter file may itself contain a -params option that loads other parameter files. Input file Required

-project string Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. Any string mira

-inproject string Default is mira. Defines the input project name for this assembly. The input project name automatically influences the name of input files or directories only Any string $(project)

-bft list Defines the filetype of the backbone file given. Currently (2.8.3) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly.
fasta (FASTA)
caf (CAF)
gbf (Genbank)
fasta

-expdir directory Defines the directory where mira should search for experiment files (EXP). Directory .

-scfdir directory Defines the directory where mira should search for SCF files Directory .

-feifile infile Defines the file of filenames where the names of the EXP files of a project are located. Input file $(inproject)_in.fofn

-fpifile infile Defines the file of filenames where the names of the PHD files of a project are located. Input file $(inproject)_in.fofn

-pifile infile Defines the PHD file to load sequences of a project from. Input file $(inproject)_in.phd

-faifile infile Defines the FASTA file to load sequences of a project from. Input file $(inproject)_in.fasta

-fquifile infile Defines the FASTA file to load base qualities of a project from. The order of reads in the quality file does not need to be the same as in the FASTA or fofn project files, although it saves a bit of time if it is. Input file $(inproject)_in.fasta.qual

-fqifile infile Defines the FASTQ file to load sequences of a project from. Input file $(inproject)_in.fastq

-cifile infile Defines the file to load a CAF project from. Filename must end with '.caf'. Input file $(inproject)_in.caf

-sdifile infile Defines the file to load straindata from. Only used in EST projects (miraEST). Input file $(inproject)_straindata_in.txt

-xtiifile infile Defines the file to load a trace info file in XML format from. This can be used both when merging XML data to loaded files or when loading a project from an XML trace info file. Input file $(inproject)_xmltraceinfo_in.xml

-svsifile infile Defines the file from which to load information about possible vector sequence stretches. Input file $(inproject)_ssaha2vectorscreen_in.txt

-bbifile infile Defines the file to load the backbone sequence or assembly. Note that you still must define the file type with [-bft]. Input file $(inproject)_in.$(technology).$(bft)

-[no]traceinfo toggle Load traceinfo ancillary data in XML files Toggle value Yes/No Yes

-lsd boolean Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them. Boolean value Yes/No No

-brl integer Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. Integer from 1000 to 3000 2500

-mrl integer Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project. Integer 20 or more 40

-nop integer Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. Integer 1 or more 3

-[no]sep boolean Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies. Boolean value Yes/No Yes

-rbl integer Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. Integer 1 or more 2

-not integer Number of threads to use (see also -snot for SKIM algorithm) Integer from 1 to 256 2

-[no]amm boolean Whether mira tries to optimise the run time of certain algorithms through a space/time trade-off, increasing or reducing some internal tables as memory permits Boolean value Yes/No Yes

-mps integer Maximum memory in GB Integer 0 or more 0

-kpmf integer Keep percentage of memory free Integer from 0 to 100 15

-kcim boolean Keep contigs in memory Boolean value Yes/No No

-esps integer EST-SNP pipeline steps Integer from 0 to 4 0

-[no]uti boolean Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature definitely helps with contigs containing lots of repeats. Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using dismin and dismax Boolean value Yes/No Yes

-tismin integer Template insert minimum size Integer -1 or more -1

-tismax integer Template insert maximum size Integer -1 or more -1

-[no]crhf boolean Colour reads by hash frequency Boolean value Yes/No Yes

-[no]pd boolean Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking. Boolean value Yes/No Yes

-ft list Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble FASTQ sequences and qualities ('mira_in.fastq'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or possibly reassemble it. N.B. fofnphd is not currently available.
fofnexp (file of EXP filenames)
fasta (FASTA and quality files)
fastq (FASTQ file)
caf (CAF file)
phd (PHD file)
fofnphd (file of PHD filenames)
fasta

-eq list Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them).
none (Use qualities from input files)
scf (SCF quality scores)
scf

-eqo boolean Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones. Boolean value Yes/No No

-droeqe boolean Defines whether a read is excluded from the assembly when there is a major mismatch between the external quality source and the sequence (e.g. the base sequence read from an SCF file does not match the originally read base sequence). If the read is not excluded, it keeps the qualities it had before MIRA tried to load the external qualities (either default qualities or the ones loaded from the original source). Boolean value Yes/No No

-ssiqf boolean Solexa scores in quality file Boolean value Yes/No No

-fqqo integer FASTQ quality offset Integer from 0 to 64 0

-[no]wqf boolean Wants quality file Boolean value Yes/No Yes

-rns list Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B).
sanger (Sanger centre)
tigr (TIGR)
fr (454 simple forward/reverse)
stlouis (WashU)
solexa (Illumina)
$(technology)

-mxti boolean Some input file formats (FASTA, PHD or even CAF and EXP) may not contain all the information necessary or useful for each read of an assembly. Should additional information, such as clipping positions, be available in an XML trace info file in NCBI format (see File formats), set this option to 'Y' and it will be merged into the loaded data. Please note that quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though. Boolean value Yes/No No

-fo boolean If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files. Boolean value Yes/No No

-bdq integer Defines the default base quality of reads that have no quality read from a file. Integer 0 or more 10

-[no]epoq boolean Stops MIRA if a read has no quality values Boolean value Yes/No Yes

-[no]ard boolean Automatic repeat detection Boolean value Yes/No Yes

-ardct float Automatic repeat detection coverage threshold Number 1.000 or more 2.0

-ardml integer Default is 200 for 454 technology Integer 2 or more 400

-ardgl integer Default depends on technology Integer 2 or more 40

-[no]urd boolean Default true for most genome assembly, false for EST assembly or Solexa data Boolean value Yes/No Yes

-urdsip integer Default depends on technology and assembly quality level Integer 1 or more 3

-urdcm float Default depends on technology and assembly quality level Number 1.000 or more 1.5

-klrs boolean Default depends on assembly quality level and EST/genome assembly Boolean value Yes/No No

-[no]sd boolean Default is 'Y' for mira and 'N' for miraEST. A spoiler is either a chimeric read or a read that still contains long stretches of unclipped vector sequence (too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs from being joined; MIRA will cut spoilers back so that they no longer harm the assembly. Recommended for mid-to-high coverage genomic assemblies; not recommended for EST assemblies, as splice variants might be lost. A minimum of two assembly passes (-nop) must be run for this option to take effect. Boolean value Yes/No Yes

-[no]ugpf boolean MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps: a genomic and an EST pathfinder. The genomic pathfinder looks a bit into the future of the assembly and tries to stay on safe ground, using a maximum of information already present in the contig that is being built. The EST version, by contrast, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall back to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as 'Y' for genome projects and set to 'N' for EST projects. Boolean value Yes/No Yes

-[no]uess boolean Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy. Boolean value Yes/No Yes

-esspd integer Defines the number of potential partners a read must have before MIRA switches into emergency search stop mode for that read. Integer 1 or more 500

-[no]uebl boolean Use emergency blacklist Boolean value Yes/No Yes

-umcbt boolean Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. Boolean value Yes/No No

-bts integer If -umcbt is set to 'Y', this number defines the time in seconds allotted to building one contig. Integer 1 or more 10000

-lsbd boolean Strain data is a key-value file with one read per line: first the name of the read, then the strain name of the organism the read comes from. The program uses it to distinguish and classify the different types of SNPs that appear between organisms. Boolean value Yes/No No

-lb boolean A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The assembly first adds reads to the loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and possibly already edited) assembly iterations, or for a comparative assembly of two very closely related organisms. Please read 'very closely related' as 'only SNP mutations or short indels present'. Boolean value Yes/No No

-sbuip integer When assembling against backbones, this parameter defines the pass (see -nop) from which on the backbones are actually used. In the preceding passes, the non-backbone reads are assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb: if the backbones belong to the same strain as the reads to assemble, set this to 1. If the backbones are from a different strain, set -sbuip to one less than -nop (for example, -nop 4 and -sbuip 3). Integer 0 or more 3

-bbq integer Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads. Integer from -1 to 100 30

-bsn string Defines the name of the strain that the backbone sequences have. Any string

-bsnffa boolean Backbone strain name force for all Boolean value Yes/No No

-brfs string Backbone rail from strain Any string

-bro integer Backbone rail overlap Integer from 0 to 2000 0

-[no]abnc boolean The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs. Boolean value Yes/No Yes

-[no]ure boolean Use read extension. If set to 'Y', MIRA can use bases that were clipped away (for example by quality clipping) when alignments with other reads provide enough evidence supporting their correctness. Default depends on technology Boolean value Yes/No Yes

-rewl integer Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. Default depends on technology Integer 0 or more 30

-rewme integer Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between the two aligned sequences within the given window. Default depends on technology Integer 0 or more 2

-feip integer Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. Integer 0 or more 0

-leip integer Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass. Integer 0 or more 0

-msvs boolean Merge with SSAHA vector screen Boolean value Yes/No No

-msvsgs integer Default depends on the sequencing technology Integer 0 or more 10

-msvsmfg integer Default depends on the sequencing technology Integer 0 or more 60

-msvsmeg integer Default depends on the sequencing technology Integer 0 or more 120

-msvssfc integer Default depends on the sequencing technology Integer 0 or more 0

-msvssec integer Default depends on the sequencing technology Integer 0 or more 0

-[no]pvlc boolean Possible vector leftover clip Boolean value Yes/No Yes

-qcmq integer This is the minimum quality required of the bases in a window for the window to be accepted. Be cautious and avoid extreme values here: low values make the clipping too lax and high values make it too harsh. Values below 15 and above 35 are disallowed. Integer from 15 to 35 20

-qcwl integer This is the length of a window in bases for the quality clip. Default depends on sequencing technology Integer 10 or more 30
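
A rough sketch of how a window-based end clip driven by -qcmq and -qcwl could work. It assumes that a window is acceptable when the mean quality of its bases reaches -qcmq, and it leaves out the back iteration mentioned under -qc; it illustrates the idea and is not MIRA's implementation.

```python
def quality_clip(quals: list[int], qcmq: int = 20, qcwl: int = 30) -> tuple[int, int]:
    """Return (start, end) of the region kept after clipping both ends (end exclusive).

    Assumption: a window of qcwl bases is "good" when its mean quality is at
    least qcmq. Generic sliding-window end clip, not MIRA's exact code.
    """
    n = len(quals)

    def good(i: int) -> bool:
        # Compare sums instead of means to stay in integer arithmetic.
        return sum(quals[i:i + qcwl]) >= qcmq * qcwl

    starts = range(n - qcwl + 1)  # empty when the read is shorter than one window
    first = next((i for i in starts if good(i)), None)
    if first is None:
        return (0, 0)  # no acceptable window: the whole read is clipped
    last = next(i for i in reversed(starts) if good(i))
    return (first, last + qcwl)


# 10 poor bases, 50 good bases, 10 poor bases
quals = [5] * 10 + [30] * 50 + [5] * 10
print(quality_clip(quals, qcmq=25, qcwl=30))  # (4, 66)
```

Because this rule uses the window mean, the kept region in the example still contains six low-quality bases at each end.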

-[no]bsqc boolean Bad stretch quality clip Boolean value Yes/No Yes

-bsqcmq integer Default depends on sequencing technology Integer 0 or more 20

-bsqcwl integer Default depends on sequencing technology Integer 0 or more 30

-[no]mbc boolean This lets mira 'clip' bases that were masked out (replaced with the character X). It is generally not a good idea to use masked bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format offer excellent ways to avoid this. But because a lot of pre-processing software is built around cross_match-, scylla- and phrap-style base masking, mira needs to handle this too. mira looks at the start and end of each sequence to see whether there are masked bases that should be 'clipped'. Boolean value Yes/No Yes

-mbcgs integer While clipping masked bases, mira checks whether it can merge larger chunks of masked bases that are at most -mbcgs bases apart. Integer 0 or more 20

-mbcmfg integer While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. Default depends on sequencing technology. Integer 0 or more 40

-mbcmeg integer While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. Default depends on sequencing technology Integer 0 or more 60

-lcc boolean Default depends on sequencing technology Boolean value Yes/No No

-cpat boolean Used in EST assembly Boolean value Yes/No No

-cpkps boolean Clip polyA tail keep polyA signal Boolean value Yes/No No

-cpmsl integer Clip polyA tail max signal length Integer 0 or more 12

-cpmea integer Clip polyA tail max errors allowed Integer 1 or more 1

-cpmgfe integer Clip polyA tail max gap from end Integer 1 or more 9

-[no]emlc boolean If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc. Default depends on sequencing technology Boolean value Yes/No Yes

-mlcr integer If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. Default depends on sequencing technology Integer 0 or more 25

-smlc integer If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. Default depends on sequencing technology Integer 0 or more 30

-emrc boolean If on, ensures a minimum right clip on each read according to the parameters in -mrcr & -smrc. Default depends on sequencing technology Boolean value Yes/No No

-mrcr integer If -emrc is 'Y', checks whether there is a right clip whose length is at least the size specified here. Default depends on sequencing technology Integer 0 or more 10

-smrc integer If -emrc is 'Y' and the actual right clip is < -mrcr, then set the right clip of read to the value given here. Default depends on sequencing technology Integer 0 or more 20
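
The -emlc/-mlcr/-smlc rule and its right-hand counterpart -emrc/-mrcr/-smrc can be sketched as one helper. It assumes that a clip is expressed as the number of bases removed from that end of the read; the function name is illustrative only.

```python
def ensure_min_clip(clip: int, enabled: bool, required: int, set_to: int) -> int:
    """Apply the 'ensure minimum clip' rule to one end of a read.

    Assumption: clip is the number of bases already clipped from that end.
    If the rule is enabled and clip < required, the clip becomes set_to.
    """
    if enabled and clip < required:
        return set_to
    return clip


# Left end with the defaults listed above: -emlc Y, -mlcr 25, -smlc 30
print(ensure_min_clip(10, True, 25, 30))  # 30
print(ensure_min_clip(40, True, 25, 30))  # 40, already clipped enough
# Right end with the defaults listed above: -emrc N, so nothing changes
print(ensure_min_clip(5, False, 10, 20))  # 5
```

With the defaults, -smlc (30) is larger than -mlcr (25), so a read whose left clip is too short ends up with 30 bases clipped, not 25.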

-[no]pec boolean Default depends on other choices Boolean value Yes/No Yes

-pecbph integer Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. Default depends on sequencing technology Integer 10 or more 17

-snot integer Number of threads to use in SKIM algorithm Integer from 1 to 256 2

-bph integer Default depends on system. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. Integer 1 or more 17

-hss integer This parameter controls the stepping increment with which hashes are generated. This allows a more fine-grained search, as matches are found with at least n+s equal bases (n being -bph and s this value) instead of the 2n needed by SSAHA. The higher the value, the faster the search; the lower the value, the more weak matches are found. Integer 1 or more 4
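
The interplay of -bph and -hss can be illustrated with a simplified, SSAHA-like scheme in which one sequence contributes a hashed word only every -hss positions while the other contributes a word at every position. This scheme is an assumption made for illustration and is not taken from the MIRA source; under it, a shared stretch of -bph + -hss bases is detected whatever its offset.

```python
def words(seq: str, bph: int, step: int = 1) -> set[str]:
    """Words of length bph, taken every `step` positions."""
    return {seq[i:i + bph] for i in range(0, len(seq) - bph + 1, step)}


def shares_word(a: str, b: str, bph: int, hss: int) -> bool:
    """True if a word sampled every hss positions in a occurs anywhere in b."""
    return not words(a, bph, hss).isdisjoint(words(b, bph))


shared = "ACGTTGCAAGCTTACGGATC"  # 20 bases = bph (16) + hss (4)
for offset in range(8):
    a = "T" * offset + shared + "TTT"
    b = "GGGGG" + shared + "GGGGG"
    assert shares_word(a, b, bph=16, hss=4)  # found at every offset
```

A larger step stores fewer words (faster, less memory) at the price of needing a longer exact stretch before a hit is guaranteed.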

-pr integer Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). Integer 1 or more 70

-mhpr integer Controls the maximum number of possible hits one read can transport to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects with extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly progresses through its passes (-nop), different combinations of possible hits are checked, always the probably best ones first. So the accuracy of the assembly should only suffer when this number is lowered too much. Integer 1 or more 2000
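
The figures quoted for -mhpr can be reproduced with a back-of-envelope count. The counting model below (every read is a potential partner of every other read, hits are counted per read, and each pair is tried in both orientations) is an assumption chosen because it matches the quoted numbers; it does not describe MIRA's internals.

```python
from typing import Optional


def candidate_alignments(n_reads: int, mhpr: Optional[int] = None) -> int:
    """Rough number of alignments to check for n mutually similar reads.

    Assumptions: each read lists every other read as a potential hit, and each
    hit is checked forward and reverse complement. mhpr caps hits per read.
    """
    partners = n_reads - 1
    if mhpr is not None:
        partners = min(partners, mhpr)
    return n_reads * partners * 2  # x2 for the two orientations


print(candidate_alignments(5000))       # 49990000, "around 40 to 50 million"
print(candidate_alignments(5000, 200))  # 2000000, "around 1.5-2 million"
```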

-mmhr integer If the number of reads identified as megahubs exceeds the allowed ratio, mira will abort. This is a fail-safe parameter to avoid assemblies where things look fishy. If you see this, you might want to ask for advice on the mira_talk mailing list. In short: bacteria should never have megahubs (90% of all cases reported were contamination of some sort, and the remaining 10% were due to incredibly high coverage numbers). Eukaryotes are likely to contain megahubs if the masking of nasty repeats (-mnr) is not on. Integer 0 or more 0

-fenn float Freq. est. min normal Number 0.000 or more 0.4

-fexn float Freq. est. max normal Number 0.000 or more 1.6

-fer float Freq. est. repeat Number 0.000 or more 1.9

-fehr float Freq. est. heavy repeat Number 0.000 or more 8.0

-fecr float Freq. est. crazy repeat Number 0.000 or more 20.0

-[no]mnr boolean The default depends on the job type: 'yes' for de-novo, 'no' for mapping assemblies. Tells mira to mask, during the SKIM phase, subsequences of -bph nucleotides that appear more often than the median occurrence of subsequences would otherwise suggest. The threshold from which on subsequences are considered nasty is set by -nrr. Boolean value Yes/No Yes

-nrr integer Sets the ratio from which on subsequences are considered nasty and hidden from the SKIM overlapper. For example, a value of 10 means 'mask all k-mers of -bph length which occur more than 10 times as often as the average of the project'. Integer 2 or more 100
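
A minimal sketch of the -mnr/-nrr idea: count every k-mer of length -bph, then mask the ones occurring more than -nrr times as often as the average k-mer. It follows the 'average' wording of -nrr (the -mnr text mentions the median instead), counts only the forward strand, and is not MIRA's implementation.

```python
from collections import Counter


def nasty_kmers(reads: list[str], bph: int, nrr: float) -> set[str]:
    """Return k-mers occurring more than nrr times the average k-mer count.

    Simplifications (assumptions): forward strand only, and the plain
    average over distinct k-mers is used as the baseline.
    """
    counts = Counter(
        read[i:i + bph]
        for read in reads
        for i in range(len(read) - bph + 1)
    )
    if not counts:
        return set()
    average = sum(counts.values()) / len(counts)
    return {kmer for kmer, n in counts.items() if n > nrr * average}


reads = ["A" * 50] + ["CCCC", "GGGG", "TTTT", "CGCG", "GCGC", "TGTG", "GTGT", "CACA", "ACAC"]
print(nasty_kmers(reads, bph=4, nrr=2))   # {'AAAA'}
print(nasty_kmers(reads, bph=4, nrr=10))  # set()
```

In the example, "AAAA" occurs 47 times against an average of 5.6, so it is masked at a ratio of 2 but not at a ratio of 10.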

-mhim integer Has no influence on the quality of the assembly, only on the maximum memory size needed during the skimming. The default value is equivalent to approximately 500MB. Integer 100000 or more 15000000

-mchr integer Default depends on sequencing technology. Maximum memory used (in MB) during the reduction of skim hits. Integer 10 or more 2048

-[no]uqr boolean Use quick rule Boolean value Yes/No Yes

-qrmla integer Quick rule min len 1 Any integer value 200

-qrmsa integer Quick rule min sim 1 Any integer value 90

-qrmlb integer Quick rule min len 2 Any integer value 100

-qrmsb integer Quick rule min sim 2 Any integer value 95

-bqoml integer Backbone quick overlap min len Any integer value 150

-bip integer The banded Smith-Waterman alignment uses this percentage to compute the bandwidth it has to use when computing the alignment matrix. For example, with an expected overlap of 150 bases and bip=10, the banded SW computes a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts/deletes in the alignment. Increasing this number finds more non-optimal alignments but also increases SW runtime (somewhere between linearly and quadratically); decreasing it works the other way round (it might miss a few bad alignments but gains speed). Integer from 1 to 100 15

-bmin integer Minimum bandwidth in bases to each side. Integer 1 or more 25

-bmax integer Maximum bandwidth in bases to each side. Integer 1 or more 100
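
The bandwidth arithmetic from the -bip example can be written out as below. Clamping the result between -bmin and -bmax is an assumption based on how those two qualifiers are described; under that assumption the default -bmin of 25 would widen the 15-base band of the 150-base example to 25 bases.

```python
def band_half_width(expected_overlap: int, bip: int = 15, bmin: int = 25, bmax: int = 100) -> int:
    """Bases computed on each side of the expected alignment diagonal.

    Assumption: bip percent of the expected overlap, clamped to [bmin, bmax].
    """
    width = expected_overlap * bip // 100
    return max(bmin, min(width, bmax))


print(band_half_width(150, bip=10, bmin=1))  # 15, the example in the -bip text
print(band_half_width(150, bip=10))          # 25, raised to the default -bmin
print(band_half_width(1000))                 # 100, capped at the default -bmax
```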

-mo integer Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. Integer 1 or more 15

-ms integer Describes the minimum score of an overlap for it to be taken into account for assembly. mira uses a default scoring scheme for Smith-Waterman alignments: each match counts 1, a match with an N counts 0, each mismatch with a non-N base -1 and each gap -2. Use a higher score to weed out chance matches, or a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory). Integer 1 or more 30
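
The scoring scheme quoted for -ms can be checked by hand on a finished alignment. The sketch below only scores an alignment that is already given as two gapped strings of equal length; it does not perform the Smith-Waterman search. Treating '*' or '-' as the gap character and charging -2 per gap column are assumptions.

```python
def alignment_score(a: str, b: str) -> int:
    """Score two aligned, equal-length strings with the scheme quoted for -ms.

    Assumptions: '*' or '-' marks a gap and each gap column costs -2.
    A column involving N scores 0, a match +1, any other mismatch -1.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "*-" or y in "*-":
            score -= 2
        elif x == "N" or y == "N":
            continue  # a match with an N counts 0
        elif x == y:
            score += 1
        else:
            score -= 1
    return score


print(alignment_score("ACGTACGTAC", "ACGTACGTAC"))  # 10
print(alignment_score("AC-GT", "ACAGT"))            # 2
```

With the default -ms of 30, a gap-free overlap of 35 bases could contain at most two mismatches: 33 - 2 = 31 passes, while 32 - 3 = 29 fails.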

-mrs integer Describes the minimum percentage of matching bases between two reads for them to be considered for assembly. Increasing this number saves memory, but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing it below 55 will probably make memory and time consumption explode. Integer from 1 to 100 65

-egp boolean Defines whether or not to increase the penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely harmful when assembling very long reads containing multiple long indels in the called base sequence ... although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, and to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to have -amgb and -amgbemc both set to 'Y'. Boolean value Yes/No No

-egpl list Has no effect if -egp is off. Defines an extra penalty applied to 'long' gaps. There are these predefined levels: 1. low - use this if you expect your base caller to frequently miss two or more bases. 2. medium - use this if your base caller is expected to frequently miss one to two bases. 3. high - use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used.
low (Low)
medium (Medium)
high (High)
est (EST split splices)
low

-megpp integer Has no effect if -egp is off. Defines the maximum extra penalty in percent applied to 'long' gaps. Integer from 1 to 100 100

-np string Contigs will have this string prepended to their names. Any string $(inproject)

-rodirs integer When adding reads to a contig, reject the reads if the drop in the quality of the consensus is > the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance. Integer from 1 to 100 20

-[no]mr boolean One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs. Boolean value Yes/No Yes

-mroir boolean Only takes effect when [-mr] is set to yes. If set to yes, MIRA will not use the repeat resolving algorithm during build time (and therefore will not be able to take advantage of this), but only before saving results to disk. Boolean value Yes/No No

-asir boolean Only takes effect when -mr is set to 'Y'; its effect also depends on whether strain data (see -lsbd) is present. Usually, mira marks bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains, the bases are marked as SNP. However, if this switch is set to 'Y', conflicts within a strain are also marked as SNP. This switch is mainly used in EST assemblies; it should not be set for genomic assembly. Boolean value Yes/No No

-mrpg integer Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP). Integer 2 or more 2
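
The group rule behind -mrpg can be sketched on a single alignment column. This version ignores base qualities (which MIRA also checks, see -mgqrt) and gap characters, so it only shows the counting: a column becomes a candidate repeat marker or SNP when at least two different bases are each carried by -mrpg or more reads.

```python
from collections import Counter


def is_candidate_marker(column: str, mrpg: int = 2) -> bool:
    """True if at least two different bases each occur in >= mrpg reads.

    Simplification (assumption): qualities and gap characters are ignored.
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    groups = [base for base, n in counts.items() if n >= mrpg]
    return len(groups) >= 2


print(is_candidate_marker("AAAGG"))           # True: groups of 3 A and 2 G
print(is_candidate_marker("AAAAG"))           # False: the lone G forms no group
print(is_candidate_marker("AAAGGT", mrpg=3))  # False with a stricter -mrpg
```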

-mnq integer The default depends on the sequencing technology used. Only takes effect when -mr is set to 'Y'. This defines the minimum quality that the neighbouring bases of a base must have for that base to be taken into consideration when deciding whether column base mismatches are relevant or not. Integer 10 or more 20

-mgqrt integer Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. Integer 25 or more 30

-emea integer Only takes effect when -mr is set to 'Y'. Using the end of sequences of Sanger type shotgun sequencing is always a bit risky, as wrongly called bases tend to crowd there or some sequencing vector relicts hang around. It is even more risky to use these stretches for detecting possible repeats, so one can define an exclusion area where the bases are not used when determining whether a mismatch is due to repeats or not. Integer 0 or more 25

-[no]amgb boolean Determines whether columns containing gap bases (indels) are also tagged. Boolean value Yes/No Yes

-[no]amgbemc boolean Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. Boolean value Yes/No Yes

-[no]amgbnbs boolean Only takes effect when -amgb is set to 'Y'. Determines whether both strands need to show a gap for a column containing gap bases to be tagged. Setting this to 'N' is not recommended except in desperately low coverage situations. Boolean value Yes/No Yes

-fnicpst boolean If set to yes, mira is forced to choose a consensus base (A, C, G, T or gap) even in unclear cases where it would normally put an IUPAC base. All other things (such as the quality of the possible consensus bases) being equal, mira chooses a base by majority vote or, if that is not clear either, by preferring gaps over T over G over C over, finally, A. Boolean value Yes/No No
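
The tie-breaking order given for -fnicpst can be sketched as follows. Every read counts equally here (base qualities are ignored) and '*' is assumed to be the gap symbol; the sketch only shows the majority vote and the preference order.

```python
from collections import Counter

# Preference order from the -fnicpst description: gap > T > G > C > A.
# Assumption: '*' stands for a gap.
PREFERENCE = "*TGCA"


def forced_consensus(column: str) -> str:
    """Choose A, C, G, T or gap for a column, never an IUPAC code.

    Simplification: base qualities are ignored; every read counts once.
    """
    counts = Counter(c for c in column.upper() if c in PREFERENCE)
    if not counts:
        raise ValueError("column holds no A, C, G, T or gap")
    best = max(counts.values())
    # Majority vote first; among tied characters take the most preferred one.
    return next(c for c in PREFERENCE if counts[c] == best)


print(forced_consensus("AAC"))   # A, clear majority
print(forced_consensus("CCAA"))  # C, tie broken in favour of C over A
print(forced_consensus("A*"))    # *, gaps are preferred over any base
```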

-msr boolean Can only be used in mapping assemblies. If set to yes, mira will merge all perfectly mapping Solexa reads into longer reads while keeping quality and coverage information intact. This feature hugely reduces the number of Solexa reads and makes assembly results with Solexa data small enough to be handled by current finishing programs (gap4, consed, others) on normal workstations. Boolean value Yes/No No

-gor integer Gap override ratio Integer 0 or more 66

-ace boolean Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt tries to resolve discrepancies in the contig by performing trace analysis, and corrects even hard-to-resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Note: the current development version has a memory leak in the editor, so the option is not turned on automatically. Boolean value Yes/No No

-[no]sem boolean If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. Boolean value Yes/No Yes

-ct integer The higher this value, the more strict the automatic editor will apply its internal rule set. Going below 40 is not recommended. Integer from 1 to 100 50

-outproject string Default is mira. Defines the output project name for this assembly. The output project name only influences the names of output files and directories. Any string $(project)

-sssip boolean Controls whether 'unimportant' singlets are written to the result files. Boolean value Yes/No No

-[no]stsip boolean Controls whether singlets which have certain tags (SRMr, CRMr, WRMr, SROr, SAOr, SIOr) are written to the result files, even if [-sssip] is set. Boolean value Yes/No Yes

-[no]rrol boolean Removes log files during the assembly process once they are no longer needed. Boolean value Yes/No Yes

-rld boolean Removes the complete log directory at the end of the assembly process. Some logs contain useful information that you may want to analyse though. Boolean value Yes/No No

-[no]orc boolean Output CAF results Boolean value Yes/No Yes

-[no]orf boolean Output FASTA results Boolean value Yes/No Yes

-org boolean Output GAP4DA results Boolean value Yes/No No

-[no]ora boolean Output phrap ACE results Boolean value Yes/No Yes

-orh boolean Output HTML results Boolean value Yes/No No

-[no]ors boolean Output transposed contig summary results Boolean value Yes/No Yes

-ort boolean Output simple text results Boolean value Yes/No No

-[no]orw boolean Output wiggle results Boolean value Yes/No Yes

-[no]otc boolean Output temporary CAF results Boolean value Yes/No Yes

-otm boolean Output temporary MAF results Boolean value Yes/No No

-otf boolean Output temporary FASTA results Boolean value Yes/No No

-otg boolean Output temporary GAP4 results Boolean value Yes/No No

-ota boolean Output temporary phrap ACE results Boolean value Yes/No No

-oth boolean Output temporary HTML results Boolean value Yes/No No

-ots boolean Output temporary transposed contig summary results Boolean value Yes/No No

-ott boolean Output temporary text results Boolean value Yes/No No

-oetc boolean Output extra temporary CAF results Boolean value Yes/No No

-oetf boolean Output extra temporary FASTA results Boolean value Yes/No No

-oetg boolean Output extra temporary GAP4DA results Boolean value Yes/No No

-oeta boolean Output extra temporary phrap ACE results Boolean value Yes/No No

-oeth boolean Output extra temporary HTML results Boolean value Yes/No No

-oetas boolean Output extra temporary also singlets results Boolean value Yes/No No

-tcpl integer When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. Integer 1 or more 60

-hcpl integer When producing an output in HTML format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. Integer 1 or more 60

-tegfc string When producing an output in text format (-ort|ott|oett), endgaps are filled up with this character. Any string

-hegfc string When producing an output in HTML format (-orh|oth|oeth), endgaps are filled up with this character. Any string

-[no]sdlpo boolean Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on. Boolean value Yes/No Yes

-tpae boolean This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs. Boolean value Yes/No No

-pbwl integer Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. Integer 1 or more 7

-pbwme integer Only takes effect when -tpae is set to 'Y'. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. Integer 1 or more 2

-pbwgd integer Only takes effect when -tpae is set to 'Y'. Defines how many bases from the end of a sequence (or, if the sequence is masked, from the end of the masked area) a polybase stretch is searched for before the search gives up without finding one. Integer 1 or more 9
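
The three polybase parameters (-pbwl, -pbwme, -pbwgd) can be combined into one check, sketched below. It assumes that a window qualifies when at most -pbwme of its bases differ from a single base (all A or all T), and that the window may end at most -pbwgd bases before the end of the sequence; masking is not handled. This is an interpretation of the descriptions above, not MIRA's code.

```python
def has_polybase_end(seq: str, pbwl: int = 7, pbwme: int = 2, pbwgd: int = 9) -> bool:
    """Search the end of a read for a poly-A or poly-T window.

    Assumptions: a window of pbwl bases qualifies if at most pbwme of its
    bases differ from one single base (A or T); the window may end at most
    pbwgd bases before the sequence end. Masked areas are not considered.
    """
    s = seq.upper()
    n = len(s)
    for gap in range(pbwgd + 1):
        start, end = n - gap - pbwl, n - gap
        if start < 0:
            break  # sequence too short for further windows
        window = s[start:end]
        for base in "AT":
            if sum(1 for b in window if b != base) <= pbwme:
                return True
    return False


print(has_polybase_end("ACGTACGTAAAAAAA"))  # True, poly-A right at the end
print(has_polybase_end("ACGTACGTACGTACG"))  # False
```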

-[no]pvc boolean Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results. Boolean value Yes/No Yes

-pvcmla integer The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. Integer 0 or more 18
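
The -pvcmla threshold reduces to a simple decision, sketched below; how MIRA detects a potential vector relict in the first place is not covered, and the function name is illustrative only.

```python
def accept_vector_clip(potential_clip_len: int, pvcmla: int = 18) -> bool:
    """True if a potential vector-relict clip of this length should be applied."""
    if pvcmla == 0:
        return True  # threshold switched off: always clip a potential vector
    return potential_clip_len <= pvcmla  # longer clips may be repeats or splice variants


print(accept_vector_clip(12))             # True
print(accept_vector_clip(25))             # False, above the default of 18
print(accept_vector_clip(25, pvcmla=0))   # True, threshold disabled
```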

-qc boolean Default is 'N', but it is automatically set to 'Y' when the setparam options 'fasta' or 'phd' are used (a later option can turn it off again). This lets mira perform its own quality clipping before sequences enter the assembly. The clip function is a sequence-end window quality clip with back iteration, to keep the maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness and the option -ure is turned on. Boolean value Yes/No No

-an list When adding reads to a contig, dangerous regions can get an extra integrity check. none = no extra check. text = check is only text-based. signal = check is signal based, if the SCF trace is not available, fallback is 'text'. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous.
none (None)
text (Text)
signal (Signal)
signal

-dmer integer When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. Integer from 1 to 100 1

-dismin integer The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. Integer 0 or more 500

-dismax integer The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. Integer 0 or more 5000
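
The effect of the 10% error margins on -dismin and -dismax can be sketched as below, assuming each margin is 10% of the limit itself. With the defaults, read pairs between 450 and 5500 bases apart would be accepted.

```python
def pair_distance_ok(distance: float, dismin: int = 500, dismax: int = 5000) -> bool:
    """Check a read-pair distance against -dismin/-dismax including the margins.

    Assumption: each margin is 10% of the respective limit.
    """
    lower = dismin * 9 / 10   # 10% subtracted from -dismin
    upper = dismax * 11 / 10  # 10% added to -dismax
    return lower <= distance <= upper


print(pair_distance_ok(450))   # True, inside the relaxed lower bound
print(pair_distance_ok(5501))  # False, beyond 5000 + 10%
```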

-oett boolean Output extra temporary TXT results Boolean value Yes/No No

-gapfda string Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be <projectname>_.<extension> Any string gap4da

-log string Defines the directory where mira will write some log files to. Note that the name of the actual project will be prepended. Any string miralog

-co string Defines the file in CAF format to save an assembled project to. Filename must end with '.caf'. Any string mira_out.caf

Associated qualifiers

"-expdir" associated directory qualifiers

-extension string Default file extension Any string

"-scfdir" associated directory qualifiers

-extension string Default file extension Any string

General qualifiers

-auto boolean Turn off prompts Boolean value Yes/No N

-stdout boolean Write first file to standard output Boolean value Yes/No N

-filter boolean Read first file from standard input, write first file to standard output Boolean value Yes/No N

-options boolean Prompt for standard and additional values Boolean value Yes/No N

-debug boolean Write debug output to program.dbg Boolean value Yes/No N

-verbose boolean Report some/full command line options Boolean value Yes/No Y

-help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Boolean value Yes/No N

-warning boolean Report warnings Boolean value Yes/No Y

-error boolean Report errors Boolean value Yes/No Y

-fatal boolean Report fatal errors Boolean value Yes/No Y

-die boolean Report dying program messages Boolean value Yes/No Y

-version boolean Report version number and exit Boolean value Yes/No N

Input file format

emira reads any normal sequence USAs.

Output file format

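emira writes its assembly results into directories named after the project (see the output files of the usage example below). The result formats are selected with the -or* qualifiers: CAF (-orc), FASTA (-orf), GAP4 direct assembly (-org), phrap ACE (-ora), HTML (-orh), transposed contig summary (-ors), simple text (-ort) and wiggle (-orw). By default, CAF, FASTA, phrap ACE, transposed contig summary and wiggle results are written.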

Output files for usage example

File: EdIt.log

Directory: cjejuni_demo_info

This directory contains output files.

Directory: cjejuni_demo_log

This directory contains output files.

Directory: cjejuni_demo_results

This directory contains output files.

Data files


Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name	Description
emiraest	MIRAest fragment assembly program

Author(s)

This program is an EMBOSS wrapper for a program written by Bastien Chevreux as part of the MIRA package.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None