emira
MIRA is the Swiss army knife of sequence assembly, built for efficient and accurate assembly jobs. It is particularly well suited to extremely 'unfriendly' projects that contain many repetitive sequences.
It performs true hybrid de novo assemblies of reads gathered through Sanger, 454 or Solexa sequencing technologies. That is, it assembles the reads themselves rather than a mix of (possibly shredded) consensus sequence and reads. Supported combinations are Sanger/454, Sanger/Solexa, 454/Solexa and Sanger/454/Solexa. The length of the Solexa sequences is not restricted: they can be 36mers, 150mers or longer.
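To make the distinction concrete: a consensus that is "shredded" has been cut into overlapping pseudo-reads before being passed on to a further assembly step. The minimal Python sketch below shows what that amounts to. It only illustrates the approach that a true hybrid assembly does not need; the function name `shred` and its parameters `length` and `step` are hypothetical and are not part of MIRA or emira.

```python
def shred(consensus, length=500, step=250):
    """Cut a consensus sequence into overlapping pseudo-reads.

    Hypothetical helper for illustration only; not part of MIRA or emira.
    Consecutive fragments start `step` bases apart and are at most
    `length` bases long, so they overlap when step < length.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if not consensus:
        return []
    fragments = []
    start = 0
    while True:
        fragments.append(consensus[start:start + length])
        # Stop once the current fragment reaches the end of the consensus.
        if start + length >= len(consensus):
            break
        start += step
    return fragments


if __name__ == "__main__":
    for fragment in shred("ACGTACGTAC", length=4, step=2):
        print(fragment)
```

Pseudo-reads produced this way are slices of an already computed consensus, so they no longer represent the individual reads behind it.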
MIRA contains integrated editors for Sanger and 454 sequences, which iteratively remove many sequencing errors from the assembly project and improve the overall alignment quality.
It can also be used for mapping assemblies and for automatic tagging of difference sites (SNPs, insertions or deletions) of mutant strains against a reference sequence.
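As a rough illustration of what tagging difference sites means, the Python sketch below walks over a gapped pairwise alignment of a reference and a mutant sequence and reports SNPs, insertions and deletions. It is a simplified stand-in and not MIRA's algorithm: the function name `difference_sites`, the layout of the returned tuples and the use of `*` as the gap character are assumptions made for this example.

```python
def difference_sites(reference, mutant, gap="*"):
    """List differences between two equally long, gapped aligned sequences.

    Hypothetical helper for illustration only; not part of MIRA or emira.
    Returns tuples (reference_position, kind, reference_base, mutant_base).
    reference_position is 1-based and counts only reference bases; an
    insertion is reported at the position of the preceding reference base.
    """
    if len(reference) != len(mutant):
        raise ValueError("aligned sequences must have the same length")
    sites = []
    ref_pos = 0
    for ref_base, mut_base in zip(reference.upper(), mutant.upper()):
        if ref_base != gap:
            ref_pos += 1
        if ref_base == mut_base:
            continue  # identical column (or gap in both): nothing to tag
        if ref_base == gap:
            kind = "insertion"  # extra base in the mutant
        elif mut_base == gap:
            kind = "deletion"   # reference base missing from the mutant
        else:
            kind = "SNP"
        sites.append((ref_pos, kind, ref_base, mut_base))
    return sites


if __name__ == "__main__":
    for site in difference_sites("ACG*TAC", "ATGCT*C"):
        print(site)
```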
For organisms without exon/intron gene structure (bacteria, viruses, etc.) for which annotated files in GenBank format are available, MIRA can generate tables that biologists can use directly: they show exactly which genes are hit and give a first estimate of whether the function of the protein is affected by the change.
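One simple way to picture such a first estimate is to translate the affected codon before and after the change. The Python sketch below does this with the standard genetic code, assuming the change falls inside an annotated coding sequence whose reading frame is known. It is an illustration under these assumptions and not MIRA's actual procedure; the names `CODON_TABLE` and `classify_codon_change` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(codon, offset, new_base):
    """Classify a single-base change at a 0-based offset within a codon.

    Hypothetical helper for illustration only; not part of MIRA or emira.
    """
    codon = codon.upper()
    new_base = new_base.upper()
    if len(codon) != 3 or offset not in (0, 1, 2) or len(new_base) != 1:
        raise ValueError("expected a 3-base codon, offset 0..2 and one new base")
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    if before == after:
        return "silent"      # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"    # premature stop codon
    if before == "*":
        return "stop lost"   # stop codon replaced by an amino acid
    return "missense"        # different amino acid


if __name__ == "__main__":
    print(classify_codon_change("GAA", 1, "T"))
```

A silent change leaves the protein sequence untouched, whereas a nonsense change truncates it, so even this coarse classification separates harmless-looking changes from ones worth a closer look.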
% emira -setparam fasta -project cjejuni_demo -genome accurate -mxti -rns tigr -orh
MIRA fragment assembly program
This is MIRA V2.8.3 (production version).

Please cite: Chevreux, B., Wetter, T. and Suhai, S. (1999), Genome Sequence Assembly Using Trace Signals and Additional Sequence Information. Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.

Mail questions, bug reports, ideas or suggestions to: bach@chevreux.org

Compiled in boundtracking mode.
Compiled in bugtracking mode.

Parsing parameters: -genomeaccurate -fasta -GE:project=cjejuni_demo -GE:mxti=yes -OUT:orh=yes -GE:rns=tigr

Using quickmode switch -genomeaccurate :
  -GE:uti=yes
  -AS:mrl=40:nop=4:sep=yes:rbl=4:sd=yes:sdlpo=yes:ugpf=yes
  -DP:ure=yes:rewl=30:rewme=2:feip=0;leip=0:tpae=no
  -CL:pvc=yes:pvcmla=18:qc=no:mbc=no:emlc=yes:mlcr=25:smlc=30
  -SK:bph=16:hss=4:pr=45:mhpr=200
  -AL:bip=20:bmin=25:bmax=130:mo=15:ms=30:mrs=65:egp=yes:egpl=low
  -CO:rodirs=25:mr=yes:asir=no:mrpg=2:emea=25 amgb=yes:amgbemc=yes:amgbnbs=yes
  -ED:ace=no

Using quickmode switch fasta : -GE:lj=fasta

Parameters parsed without error, perfect.

Used parameter settings:
General (-GE):
  Project name (pro) : cjejuni_demo
  Load job (lj) : FASTA file (fasta)
  Filecheck only (fo) : No
  External quality (eq) : from SCF (scf)
  Ext. qual. override (eqo) : No
  Discard reads on e.q. error (droeqe): No
  Read naming scheme (rns) : TIGR (tigr)
  Merge with XML trace info (mxti) : Yes
  Use template information (uti) : Yes
  EST-assembly start step (ess) : 1
Assembly options (-AS):
  Minimum read length (mrl) : 40
  Number of passes (nop) : 4
  Skim each pass (sep) : Yes
  Maximum number of RMB break loops (rbl) : 4
  Spoiler detection (sd) : Yes
  Last pass only (sdlpo) : Yes
  Base default quality (bdq) : Yes
  Use genomic pathfinder (ugpf) : Yes
  Use emergency search stop (uess) : Yes
  ESS partner depth (esspd) : 500
  Use emergency blacklist (uebl) : Yes
  Use max. contig build time (umcbt) : No
  Build time in seconds (bts) : 10000
Strain and backbone options (-SB):
  Load straindata (lsd) : No
  Load backbone (lb) : No
  Start backbone usage in pass (sbuip): 3
  Backbone strain name (bsn) : (none)
  Backbone file type (bft) : FASTA file (fasta)
  Backbone rail length (brl) : 2500
  Backbone base quality (bbq) : 0
  Also build new contigs (abnc) : Yes
Dataprocessing options (-DP):
  Use read extensions (ure) : Yes
  Read extension window length (rewl) : 30
  Read extension w. maxerrors (rewme) : 2
  First extension in pass (feip) : 0
  Last extension in pass (leip) : 0
  Tag poly A/T at ends (tpae) : No
  Polybase window length (pbwl) : 7
  Polybase window maxerrors (pbwme) : 2
  Polyb. window grace distance (pbwgc): 9
Clipping options (-CL):
  Possible vector leftover clip (pvc) : Yes
  maximum len allowed (pvcmla) : 18
  Quality clip (qc) : No
  Minimum quality (qcmq) : 20
  Window length (qcwl) : 30
  Masked bases clip (mbc) : No
  Gap size (mbcgs) : 20
  Max front gap (mbcmfg) : 40
  Max end gap (mbcmeg) : 60
  Ensure minimum left clip (emlc) : Yes
  Minimum left clip req. (mlcr) : 25
  Set minimum left clip to (smlc) : 30
Parameters for SKIM algorithm (-SK):
  Bases per hash (bph) : 16
  Hash save stepping (hss) : 4
  Percent required (pr) : 45
  Maximum hashes in memory (mhim) : 15000000
  Max hits per read (mhpr) : 200
Align parameters for Smith-Waterman align (-AL):
  Bandwidth in percent (bip) : 20
  Bandwidth max (bmax) : 130
  Bandwidth min (bmin) : 25
  Minimum score (ms) : 30
  Minimum overlap (mo) : 15
  Minimum relative score in % (mrs) : 65
  Extra gap penalty (egp) : Yes
  extra gap penalty level (egpl) : low
  Max. egp in percent (megpp) : 100
Contig parameters (-CO):
  Name prefix (np) : cjejuni_demo
  Error analysis (an) : SCF signal (signal)
  Reject on drop in relative alignment score (%) : 25
  Max. error rate in dangerous zones in % (dmer) : 1
  Mark repeats (mr) : Yes
  Assume SNP instead of repeats (asir) : No
  Minimum reads per group needed for tagging (mrpg) : 2
  Minimum neighbour quality needed for tagging (mnq) : 20
  Minimum Group Quality needed for RMB Tagging (mgqrt) : 30
  End-read Marking Exclusion Area in bases (emea) : 25
  Also mark gap bases (amgb) : Yes
  Also mark gap bases - even multicolumn (amgbemc) : Yes
  Also mark gap bases - need both strands (amgbnbs): Yes
  Default template insert size minimum (dismin) : 500
  Default template insert size maximum (dismax) : 5000
Edit options (-ED):
  Automatic contig editing (ace) : No
  Strict editing mode (sem) : No
  Confirmation threshold in percent (ct): 50
Directories (-DI):
  When loading EXP files:
  When loading SCF files:
  For writing log files : cjejuni_demo_log
  For writing gap4 DA res.: cjejuni_demo_out
Input files (-FI):
  When loading EXP fofn : cjejuni_demo_in.fofn
  When loading project from PHD : cjejuni_demo_in.phd.1
  When loading project from CAF : cjejuni_demo_in.caf
  When loading sequences from FASTA : cjejuni_demo_in.fasta
  When loading qualities from FASTA quality: cjejuni_demo_in.fasta.qual
  When loading straindata : cjejuni_demo_straindata_in.txt
  When loading XML trace info files : cjejuni_demo_traceinfo_in.xml
  When loading backbone from CAF : cjejuni_demo_backbone_in.caf
  When loading backbone from GenBank : cjejuni_demo_backbone_in.gbf
  When loading backbone from FASTA : cjejuni_demo_backbone_in.fasta
Output files (-OUTPUT/-OUT):
  Result files:
    Saved as CAF (orc): Yes
    Saved as FASTA (orf): Yes
    Saved as GAP4 (directed assembly) (org): Yes
    Saved as phrap ACE (ora): Yes
    Saved as HTML (orh): Yes
    Saved as Transposed Contig Summary (ors): Yes
    Saved as simple text format (ort): Yes
  Temporary result files:
    Saved as CAF (otc): No
    Saved as FASTA (otf): No
    Saved as GAP4 (directed assembly) (otg): No
    Saved as phrap ACE (ota): No
    Saved as HTML (oth): No
    Saved as Transposed Contig Summary(ots): No
    Saved as simple text format (ott): No
  Extended temporary result files:
    Saved as CAF (oetc): No
    Saved as FASTA (oetf): No
    Saved as GAP4 (directed assembly) (oetg): No
    Saved as phrap ACE (oeta): No
    Saved as HTML (oeth): No
    Save also singlets (oetas): No
  Alignment output customisation:
    TEXT characters per line (tcpl): 60
    HTML characters per line (hcpl): 60
    TEXT characters per line (tegfc): ' '
    HTML characters per line (hegfc): ' '
  File / directory names:
    CAF : cjejuni_demo_out.caf
    FASTA : cjejuni_demo_out.unpadded.fasta
    FASTA quality : cjejuni_demo_out.unpadded.fasta.qual
    FASTA (padded) : cjejuni_demo_out.padded.fasta
    FASTA qual.(pad): cjejuni_demo_out.padded.fasta.qual
    GAP4 (directory): cjejuni_demo_out.gap4da
    ACE : cjejuni_demo_out.ace
    HTML : cjejuni_demo_out.html
    Simple text : cjejuni_demo_out.txt
    TCS overview : cjejuni_demo_out.tcs

Creating directory cjejuni_demo_log ... done.
Creating directory cjejuni_demo_results ... done.
Creating directory cjejuni_demo_info ... done.

Localtime: Thu Jul 15 12:00:00 2010
Loading data normal (probably Sanger type) from FASTA file cjejuni_demo_in.fasta
Counting sequences in FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading sequence data from FASTA file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Loading quality data from FASTA quality file:
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done.
There haven been 544 reads given, 544 of which have quality accounted for.
Localtime: Thu Jul 15 12:00:00 2010
Checking SCF files (loading qualities only if needed):
[0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|....
[80%] ....|.... [90%] ....|.... [100%] Done. 0 SCF files loaded ok. 544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names). Localtime: Thu Jul 15 12:00:00 2010 Merging data from XML trace info file cjejuni_demo_traceinfo_in.xml ...Num reads: 496 Building hash table ... done. Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Done merging XML data, matched 496 reads. Localtime: Thu Jul 15 12:00:00 2010 Checking SCF files (loading qualities only if needed): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Done. 0 SCF files loaded ok. 544 SCF files were not found (see 'cjejuni_demo_log/cjejuni_demo_info_scfreadfail.0' for a list of names). Starting minimum left vector clip ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 626 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... 
[60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4243 possible: 4607 permbans: 0 Hits chosen: 4243 Localtime: Thu Jul 15 12:00:00 2010 Pre-assembly alignment search for read extension and / or vector clipping: Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.2 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Pre-assembly read extension: Localtime: Thu Jul 15 12:00:00 2010 Searching possible read extensions: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Changed length of 258 sequences. Mean length gained in these sequences: 73.2713 bases. Pre-assembly vector clipping Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... 
[100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4512 possible: 4913 permbans: 0 Hits chosen: 4512 Localtime: Thu Jul 15 12:00:00 2010 Pass: 1 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. 
Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++ [120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++ [178] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [238] ++++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++ [296] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [356] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [416] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [476] +++++a++++++++a+a++++++++++++++++a++++++++++++++++++++ RL1 [526] aaaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40028 Avg. contig coverage: 8.66 Consensus contains: A: 13590 C: 5845 G: 6941 T: 13404 N: 0 IUPAC: 24 Funny: 0 *: 224 Num reads: 526 Avg. read length: 659 Reads contain 343983 bases, 0 Ns and 2661 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 1 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering tags to readpool. The previously assembled contig had grave misassemblies, rebuilding contig 2 now. Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] ++++++++++++++++++++++++++++++++++++++++++++++++++++++t+++++ [120] ++++++++++++++++++++++++++++++++++++++++++a+++a+++++++++++++ [178] +++++++++++++++++++++++++++++++++++p+++p++++++++++++++++++++ [236] +++++++++a+++++a++++++++++++++++++++++++++++++++++++++++++++ [294] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [354] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [414] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [474] +++++++++a++++p+a+p+++++++++a+++++a+++++++++++++++++++++ RL1 [524] aaapThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342555 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. 
contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.1.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.1.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.1.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.1.txt Pass: 2 Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). 
- mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4512 possible: 4913 permbans: 0 Hits chosen: 4512 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. 
Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. 
contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.2.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.2.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.2.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.2.txt Pass: 3 Performing vector clipping ... done. Pool has 544 reads . Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. 
For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... 
[80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. 
Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. 
contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. 
Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.3.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.3.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.3.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.3.txt Localtime: Thu Jul 15 12:00:00 2010 Hunting contig join spoiler ... done. Localtime: Thu Jul 15 12:00:00 2010 Pass: 4 Performing vector clipping ... done. Pool has 544 reads . 
Checking reads for trace data: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] No SCF data present in any read, automatic contig editing is now switched off. 544 reads with valid data for assembly. For the reads that are neither backbones nor rails: - 0 reads have not enough good bases for assembly. - 544 reads used for assembly. - 0 reads have no real quality (see miralog.noqualities). - mean length of good parts of used reads: 660 Localtime: Thu Jul 15 12:00:00 2010 Generated 288 unique template ids for 544 valid reads. Localtime: Thu Jul 15 12:00:00 2010 Generated 0 unique strain ids for 544 reads. Localtime: Thu Jul 15 12:00:00 2010 Searching for possible overlaps: Localtime: Thu Jul 15 12:00:00 2010 We will get 1 partitions. Progressend: 1088 Now running partitioned skimmer with 1 partitions: Working on partition 1/1 Will contain read IDs 0 to 543 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Total megahubs: 0 Skim summary: accepted: 4498 possible: 4913 permbans: 14 Hits chosen: 4498 Localtime: Thu Jul 15 12:00:00 2010 Making alignments. Localtime: Thu Jul 15 12:00:00 2010 Aligning possible forward matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Aligning possible complement matches: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Calculating possible vector leftovers ... done. Loading confirmed overlaps from disk (will need approximately 1.3 M.): [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... 
[40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Sorting confirmed overlaps (this may take a while) ... done. Generating clusters: [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%] Localtime: Thu Jul 15 12:00:00 2010 Building new contig 1 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 544 +[1] t+t++++a+aaaaar RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 1 Contig length: 2467 Avg. contig coverage: 2.36 Consensus contains: A: 701 C: 457 G: 592 T: 690 N: 0 IUPAC: 7 Funny: 0 *: 20 Num reads: 7 Avg. read length: 833 Reads contain 5780 bases, 0 Ns and 55 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 2 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 537 +[1] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [61] +++++++++++++++++++++++++++++++++++++++++++++++++++t++++++++ [120] ++++++++++++++++++++++++++++++++++++++a+a++++++aa+++++++++++ [176] +++++++++++++++++++++++++++++++++++++p++++++++++p+++++++++++ [234] ++++++++++a+++++a+++++++++++++++++++++++++++++++++++++++++++ [292] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [352] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [412] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [472] +++++++++++++++++p+++++++a+++a+++++++++++++++++++++++++ RL1 [524] aapaThat's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 2 Contig length: 40021 Avg. 
contig coverage: 8.62 Consensus contains: A: 13590 C: 5845 G: 6951 T: 13404 N: 0 IUPAC: 14 Funny: 0 *: 217 Num reads: 524 Avg. read length: 658 Reads contain 342548 bases, 0 Ns and 2577 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Marking possibly misassembled repeats ...done. Found - 0 Strong RMB - 3 Weak RMB - 0 SNP positions tagged.Transfering contig RMB permanent pair bans. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 3 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 13 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 3 Contig length: 805 Avg. contig coverage: 1 Consensus contains: A: 303 C: 146 G: 115 T: 241 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 805 Reads contain 805 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 4 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 12 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 4 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 285 C: 152 G: 124 T: 227 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 5 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 11 + RL1 That's it for this contig. 
Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 5 Contig length: 788 Avg. contig coverage: 1 Consensus contains: A: 254 C: 118 G: 133 T: 283 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 788 Reads contain 788 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 6 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 10 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 6 Contig length: 786 Avg. contig coverage: 1 Consensus contains: A: 281 C: 129 G: 138 T: 238 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 786 Reads contain 786 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 7 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 9 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 7 Contig length: 865 Avg. contig coverage: 1 Consensus contains: A: 314 C: 149 G: 103 T: 299 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 865 Reads contain 865 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Building new contig 8 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 8 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 8 Contig length: 963 Avg. contig coverage: 1 Consensus contains: A: 215 C: 286 G: 205 T: 257 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 963 Reads contain 963 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 9 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 7 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 9 Contig length: 1052 Avg. contig coverage: 1 Consensus contains: A: 308 C: 286 G: 166 T: 292 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 1052 Reads contain 1052 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 10 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 6 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 10 Contig length: 563 Avg. contig coverage: 1 Consensus contains: A: 195 C: 71 G: 110 T: 187 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 563 Reads contain 563 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. 
Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 11 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 5 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 11 Contig length: 893 Avg. contig coverage: 1 Consensus contains: A: 251 C: 177 G: 136 T: 329 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 893 Reads contain 893 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 12 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 4 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 12 Contig length: 478 Avg. contig coverage: 1 Consensus contains: A: 116 C: 160 G: 101 T: 101 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 478 Reads contain 478 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 13 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 3 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 13 Contig length: 869 Avg. contig coverage: 1 Consensus contains: A: 286 C: 245 G: 93 T: 245 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 869 Reads contain 869 bases, 0 Ns and 0 gaps. 
------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 14 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 2 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 14 Contig length: 973 Avg. contig coverage: 1 Consensus contains: A: 266 C: 228 G: 254 T: 225 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 973 Reads contain 973 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. Localtime: Thu Jul 15 12:00:00 2010 Building new contig 15 Localtime: Thu Jul 15 12:00:00 2010 Unused reads: 1 + RL1 That's it for this contig. Finished building the contig. Localtime: Thu Jul 15 12:00:00 2010 -------------- Contig statistics ---------------- Contig id: 15 Contig length: 972 Avg. contig coverage: 1 Consensus contains: A: 284 C: 230 G: 123 T: 335 N: 0 IUPAC: 0 Funny: 0 *: 0 Num reads: 1 Avg. read length: 972 Reads contain 972 bases, 0 Ns and 0 gaps. ------------------------------------------------- Localtime: Thu Jul 15 12:00:00 2010 Saving of extra temporary singlets disabled. Marking possibly misassembled repeats ...done. Found none. Transfering reads to readpool. 
Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_log/cjejuni_demo_info_contigstats_pass.4.txt Saving read tag list to file: cjejuni_demo_log/cjejuni_demo_info_readtaglist.4.txt Saving contig tag list to file: cjejuni_demo_log/cjejuni_demo_info_consensustaglist.4.txt Saving project contig<->read list to file: cjejuni_demo_log/cjejuni_demo_info_contigreadlist_pass.4.txt Assembly finished, saving final results. Localtime: Thu Jul 15 12:00:00 2010 Saving project statistics to file: cjejuni_demo_info/cjejuni_demo_info_contigstats.txt Localtime: Thu Jul 15 12:00:00 2010 Saving read tag list to file: cjejuni_demo_info/cjejuni_demo_info_readtaglist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contig tag list to file: cjejuni_demo_info/cjejuni_demo_info_consensustaglist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving project contig<->read list to file: cjejuni_demo_info/cjejuni_demo_info_contigreadlist.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.caf Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to directory: cjejuni_demo_results/cjejuni_demo_out.gap4da (first deleting old directory) (now creating new directory) (saving contigs) Done. 
Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta Saving padded contigs to FASTA file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta Saving contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.unpadded.fasta.qual Saving padded contig qualities to FASTA quality file: cjejuni_demo_results/cjejuni_demo_out.padded.fasta.qual Localtime: Thu Jul 15 12:00:00 2010 Saving contigs TCS to file: cjejuni_demo_results/cjejuni_demo_out.tcs Localtime: Thu Jul 15 12:00:00 2010 Saving SNP analysis to file: cjejuni_demo_info/cjejuni_demo_info_snpanalysis.txt Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.txt Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.ace Localtime: Thu Jul 15 12:00:00 2010 Saving contigs to file: cjejuni_demo_results/cjejuni_demo_out.html Localtime: Thu Jul 15 12:00:00 2010 End of assembly process, thank you for using MIRA. |
Go to the output files for this example
MIRA fragment assembly program Version: EMBOSS:6.4.0.0 Standard (Mandatory) qualifiers: -technology menu [sanger] Which sequencing technologies have created your reads (Values: sanger (Dideoxy); 454 (Roche); solexa (Illumina); solid (ABI SOLiD)) -jobtype menu [genome] Are the data you are assembling forming a larger contiguous sequence (choose: genome) or are you assembling small fragments like in EST or mRNA libraries (choose: est) (Values: genome (Whole genome); est (Short fragments)) -method menu [denovo] Are you building an assembly from scratch (choose: denovo) or are you mapping reads to an existing backbone sequence (choose: mapping) (Values: denovo (de novo assembly); mapping (align to a reference sequence)) -grade menu [normal] Quality grades of de-novo assembly or mapping. Draft is quick-and-dirty, suited to get a first look on approximate coverage of a running project. Should not be used for anything else. Normal is the default parameter set of mira that is able to tackle most genomes. A bit slower than the draft version, but includes such options as read extension and vector remnant clipping. Accurate is still slower than the normal mode but should be used for genomes that pose a problem to the normal mode. (Values: draft (Draft); normal (Normal); accurate (Accurate)) Additional (Optional) qualifiers: -setparams menu [unspecified] Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file. (Values: unspecified (Unspecified); fasta (Fasta); phd (PHD); caf (CAF)) -highlyrepetitive boolean [N] A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats. -noclipping menu [$(technology)] Switches off clipping options for given sequencing technologies. 
(Values: sanger (Dideoxy); 454 (Roche); solexa (Illumina); solid (ABI SOLiD)) Advanced (Unprompted) qualifiers: -parameterfile infile Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a -params option appearing within a file that loads other parameter files -project string [mira] Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. (Any string) -inproject string [$(project)] Default is mira. Defines the input project name for this assembly. The input project name automatically influences the name of input files or directories only (Any string) -bft menu [fasta] Defines the filetype of the backbone file given. Currently (2.8.3) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly. (Values: fasta (FASTA); caf (CAF); gbf (Genbank)) -expdir directory [.] Defines the directory where mira should search for experiment files (EXP). -scfdir directory [.] Defines the directory where mira should search for SCF files -feifile infile [$(inproject)_in.fofn] Defines the file of filenames where the names of the EXP files of a project are located. -fpifile infile [$(inproject)_in.fofn] Defines the file of filenames where the names of the PHD files of a project are located. -pifile infile [$(inproject)_in.phd] Defines the PHD file to load sequences of a project from. -faifile infile [$(inproject)_in.fasta] Defines the FASTA file to load sequences of a project from. 
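The file naming convention described for -project and -inproject can be sketched in a few lines of Python. This is only an illustration of the defaults listed above, not code from MIRA or EMBOSS; the function name default_file_names and the dictionary keys are hypothetical.

```python
def default_file_names(project="mira", inproject=None):
    """Derive default file names from the project names, as listed above.

    Hypothetical helper: it only mirrors the documented defaults
    ($(inproject)_in.* for input, <project>_out.* for FASTA output).
    """
    # -inproject defaults to the value of -project
    inproject = inproject or project
    return {
        "feifile": f"{inproject}_in.fofn",    # file of EXP filenames
        "pifile": f"{inproject}_in.phd",      # PHD input
        "faifile": f"{inproject}_in.fasta",   # FASTA input
        "fasta_out": f"{project}_out.fasta",  # FASTA output
        "qual_out": f"{project}_out.fasta.qual",
    }


names = default_file_names("MyProject")
print(names["fasta_out"])  # MyProject_out.fasta
print(names["faifile"])    # MyProject_in.fasta
```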
-fquifile infile [$(inproject)_in.fasta.qual] Defines the FASTA quality file to load base qualities of a project from. The order of reads in the quality file does not need to be the same as in the fasta or fofn projects, although it saves a bit of time if it is. -fqifile infile [$(inproject)_in.fastq] Defines the FASTQ file to load sequences of a project from. -cifile infile [$(inproject)_in.caf] Defines the file to load a CAF project from. Filename must end with '.caf'. -sdifile infile [$(inproject)_straindata_in.txt] Defines the file to load straindata from. Only used in EST projects (miraEST). -xtiifile infile [$(inproject)_xmltraceinfo_in.xml] Defines the file to load a trace info file in XML format from. This can be used both when merging XML data to loaded files and when loading a project from an XML trace info file. -svsifile infile [$(inproject)_ssaha2vectorscreen_in.txt] Defines the file to load the info about possible vector sequence stretches. -bbifile infile [$(inproject)_in.$(technology).$(bft)] Defines the file to load the backbone sequence or assembly. Note that you still must define the file type with [-bft]. -[no]traceinfo toggle [Y] Load traceinfo ancillary data in XML files -lsd boolean [N] Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to distinguish the types of SNPs appearing in organisms and to classify them. -brl integer [2500] Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. (Integer from 1000 to 3000) -mrl integer [40] Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project.
(Integer 20 or more) -nop integer [3] Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. (Integer 1 or more) -[no]sep boolean [Y] Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies. -rbl integer [2] Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. (Integer 1 or more) -not integer [2] Number of threads to use (see also -snot for SKIM algorithm) (Integer from 1 to 256) -[no]amm boolean [Y] Whether mira tries to optimise run time of certain algorithms in a space/time trade-off memory usage, increasing or reducing some internal tables as memory permits -mps integer [0] Maximum memory in GB (Integer 0 or more) -kpmf integer [15] Keep percentage of memory free (Integer from 0 to 100) -kcim boolean [N] Keep contigs in memory -esps integer [0] EST-SNP pipeline steps (Integer from 0 to 4) -[no]uti boolean [Y] Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitively help for contigs containing lots of repeats. 
Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using dismin and dismax -tismin integer [-1] Template insert minimum size (Integer -1 or more) -tismax integer [-1] Template insert maximum size (Integer -1 or more) -[no]crhf boolean [Y] Colour reads by hash frequency -[no]pd boolean [Y] Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking. -ft menu [fasta] Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble FASTQ sequences and qualities ('mira_in.fastq'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or eventually reassemble it. N.B. fofnphd is not currently available. (Values: fofnexp (file of EXP filenames); fasta (FASTA and quality files); fastq (FASTQ file); caf (CAF file); phd (PHD file); fofnphd (file of PHD filenames)) -eq menu [scf] Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them). (Values: none (Use qualities from input files); scf (SCF quality scores)) -eqo boolean [N] Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones. -droeqe boolean [N] Should there be a major mismatch between the external quality source and the sequence (e.g. 
the base sequence read from a SCF file does not match the originally read base sequence), should the read be excluded from assembly or not. If not, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source). -ssiqf boolean [N] Solexa scores in quality file -fqqo integer [0] FASTQ quality offset (Integer from 0 to 64) -[no]wqf boolean [Y] Wants quality file -rns menu [$(technology)] Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B). (Values: sanger (Sanger centre); tigr (TIGR); fr (454 simple forward/reverse); stlouis (WashU); solexa (Illumina)) -mxti boolean [N] Some file formats above (FASTA, PHD or even CAF and EXP) possibly don't contain all the info necessary or useful for each read of an assembly. Should additional information, such as clipping positions, be available in an XML trace info file in NCBI format (see File formats), then set this option to 'Y' and it will be merged with the data loaded. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though. -fo boolean [N] If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files. -bdq integer [10] Defines the default base quality of reads that have no quality read from a file.
(Integer 0 or more) -[no]epoq boolean [Y] Stops MIRA if a read has no quality values -[no]ard boolean [Y] Automatic repeat detection -ardct float [2.0] Automatic repeat detection coverage threshold (Number 1.000 or more) -ardml integer [400] Default is 200 for 454 technology (Integer 2 or more) -ardgl integer [40] Default depends on technology (Integer 2 or more) -[no]urd boolean [Y] Default true for most genome assembly, false for EST assembly or Solexa data -urdsip integer [3] Default depends on technology and assembly quality level (Integer 1 or more) -urdcm float [1.5] Default depends on technology and assembly quality level (Number 1.000 or more) -klrs boolean [N] Default depends on assembly quality level and EST/genome assembly -[no]sd boolean [Y] Default is 'Y' for mira and 'N' for miraEST. A spoiler can be either a chimeric read or a read with long parts of unclipped vector sequence still included (that was too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs being joined; MIRA will cut them back so that they pose no further harm to the assembly. Recommended for mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect. -[no]ugpf boolean [Y] MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps; a genomic and an EST pathfinder. The genomic looks a bit into the future of the assembly and tries to stay on safe grounds using a maximum of information already present in the contig that is being built.
The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall down to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as 'Y' for genome projects and set to 'N' for EST projects. -[no]uess boolean [Y] Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy. -esspd integer [500] Defines the number of potential partners a read must have for MIRA to switch into emergency search stop mode for that read. (Integer 1 or more) -[no]uebl boolean [Y] Use emergency blacklist -umcbt boolean [N] Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. -bts integer [10000] Depending on -umcbt above, this number defines the time in seconds allotted to building one contig. (Integer 1 or more) -lsbd boolean [N] Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them. -lb boolean [N] A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs.
This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read 'very closely related' as in 'only SNP mutations or short indels present'. -sbuip integer [3] When assembling against backbones, this parameter defines the pass iteration (see -nop) from which the backbones are actually used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb - if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set sbuip to 1 lower than nop (example - nop 4 and sbuip 3). (Integer 0 or more) -bbq integer [30] Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads. (Integer from -1 to 100) -bsn string Defines the name of the strain that the backbone sequences have. (Any string) -bsnffa boolean [N] Backbone strain name force for all -brfs string Backbone rail from strain (Any string) -bro integer [0] Backbone rail overlap (Integer from 0 to 2000) -[no]abnc boolean [Y] The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs. -[no]ure boolean [Y] Defines whether the read extension routines are used (see -rewl, -rewme, -feip and -leip).
Default depends on technology -rewl integer [30] Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. Default depends on technology (Integer 0 or more) -rewme integer [2] Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between two alignments in the given window. Default depends on technology (Integer 0 or more) -feip integer [0] Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. (Integer 0 or more) -leip integer [0] Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass.
(Integer 0 or more) -msvs boolean [N] Merge with SSAHA vector screen -msvsgs integer [10] Default depends on the sequencing technology (Integer 0 or more) -msvsmfg integer [60] Default depends on the sequencing technology (Integer 0 or more) -msvsmeg integer [120] Default depends on the sequencing technology (Integer 0 or more) -msvssfc integer [0] Default depends on the sequencing technology (Integer 0 or more) -msvssec integer [0] Default depends on the sequencing technology (Integer 0 or more) -[no]pvlc boolean [Y] Possible vector leftover clip -qcmq integer [20] This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don't use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and above 35 are disallowed. (Integer from 15 to 35) -qcwl integer [30] This is the length of a window in bases for the quality clip. Default depends on sequencing technology (Integer 10 or more) -[no]bsqc boolean [Y] Bad stretch quality clip -bsqcmq integer [20] Default depends on sequencing technology (Integer 0 or more) -bsqcwl integer [30] Default depends on sequencing technology (Integer 0 or more) -[no]mbc boolean [Y] This will let mira perform a 'clipping' of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this. But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arose for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be 'clipped'. -mbcgs integer [20] While performing the clip of masked bases, mira will check whether it can merge larger chunks of masked bases that are a maximum of -mbcgs apart.
(Integer 0 or more) -mbcmfg integer [40] While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. Default depends on sequencing technology. (Integer 0 or more) -mbcmeg integer [60] While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. Default depends on sequencing technology (Integer 0 or more) -lcc boolean [N] Default depends on sequencing technology -cpat boolean [N] Used in EST assembly -cpkps boolean [N] Clip polyA tail keep polyA signal -cpmsl integer [12] Clip polyA tail max signal length (Integer 0 or more) -cpmea integer [1] Clip polyA tail max errors allowed (Integer 1 or more) -cpmgfe integer [9] Clip polyA tail max gap from end (Integer 1 or more) -[no]emlc boolean [Y] If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc. Default depends on sequencing technology -mlcr integer [25] If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. Default depends on sequencing technology (Integer 0 or more) -smlc integer [30] If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. Default depends on sequencing technology (Integer 0 or more) -emrc boolean [N] If on, ensures a minimum right clip on each read according to the parameters in -mrcr & -smrc. Default depends on sequencing technology -mrcr integer [10] If -emrc is 'Y', checks whether there is a right clip whose length is at least the size specified here. Default depends on sequencing technology (Integer 0 or more) -smrc integer [20] If -emrc is 'Y' and the actual right clip is < -mrcr, then set the right clip of read to the value given here. 
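The minimum left clip rule described for -emlc, -mlcr and -smlc can be written down as a small Python sketch. It follows the wording above only; the function name ensure_min_left_clip is hypothetical, and the way MIRA represents clip positions internally is not covered by this text.

```python
def ensure_min_left_clip(left_clip, emlc=True, mlcr=25, smlc=30):
    """Apply the documented minimum left clip rule to one read.

    left_clip: current left clip length of the read, in bases.
    emlc/mlcr/smlc: values of the qualifiers of the same name
    (defaults taken from the listing above).
    """
    # Rule only applies when -emlc is 'Y' and the clip is shorter than -mlcr
    if emlc and left_clip < mlcr:
        return smlc
    return left_clip


print(ensure_min_left_clip(10))              # 30
print(ensure_min_left_clip(40))              # 40
print(ensure_min_left_clip(10, emlc=False))  # 10
```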
Default depends on sequencing technology (Integer 0 or more) -[no]pec boolean [Y] Default depends on other choices -pecbph integer [17] Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. Default depends on sequencing technology (Integer 10 or more) -snot integer [2] Number of threads to use in SKIM algorithm (Integer from 1 to 256) -bph integer [17] Default depends on system. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. (Integer 1 or more) -hss integer [4] This is a parameter controlling the stepping increments with which hashes are generated. This allows for a more fine-grained search as matches are now found with at least n+s (see -bph) equal bases instead of the SSAHA 2n. The higher the value the faster the search. The lower the value the more weak matches are found. (Integer 1 or more) -pr integer [70] Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). (Integer 1 or more) -mhpr integer [2000] Controls the maximum number of possible hits one read can maximally transport to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects that contain extreme assembly conditions. 
For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly increases in passes (-nop), different combinations of possible hits will be checked, always the probably best ones first. So the accuracy of the assembly should only suffer when lowering this number too much. (Integer 1 or more) -mmhr integer [0] If the number of reads identified as megahubs exceeds the allowed ratio, mira will abort. This is a fail-safe parameter to avoid assemblies where things look fishy. In case you see this, you might want to ask for advice on the mira_talk mailing list. In short: bacteria should never have megahubs (90% of all cases reported were contamination of some sort and the 10% were due to incredibly high coverage numbers). Eukaryotes are likely to contain megahubs if filtering [-mnr] is not on. (Integer 0 or more) -fenn float [0.4] Freq. est. min normal (Number 0.000 or more) -fexn float [1.6] Freq. est. max normal (Number 0.000 or more) -fer float [1.9] Freq. est. repeat (Number 0.000 or more) -fehr float [8.0] Freq. est. heavy repeat (Number 0.000 or more) -fecr float [20.0] Freq. est. crazy repeat (Number 0.000 or more) -[no]mnr boolean [Y] Default depends on the job type: 'yes' for de-novo, 'no' for mapping. Tells mira to mask during the SKIM phase subsequences of size [-nph] nucleotides that appear more often than the median occurrence of subsequences would otherwise suggest. The threshold from which subsequences are considered nasty is set by -nrr -nrr integer [100] Sets the ratio from which on subsequences are considered nasty and hidden from the SKIM overlapper. The default of 10 means 'mask all k-mers of [-bph] length which are occurring more than 10 times more often than the average of the project.'
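The idea behind -mnr and -nrr can be illustrated with a short Python sketch: count all k-mers, then flag those that occur more often than a given ratio times the average count. This is a simplified illustration under stated assumptions, not MIRA's SKIM implementation; the function name nasty_kmers is hypothetical, and the use of the arithmetic mean over distinct k-mers is an assumption based on the word 'average' in the description.

```python
from collections import Counter


def nasty_kmers(reads, k, ratio):
    """Return k-mers occurring more than `ratio` times the average count.

    reads: iterable of read sequences (strings).
    k: k-mer length (plays the role of -bph).
    ratio: plays the role of -nrr.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    if not counts:
        return set()
    # Assumption: 'average' is the mean count over distinct k-mers
    average = sum(counts.values()) / len(counts)
    return {kmer for kmer, n in counts.items() if n > ratio * average}


reads = ["AAAAAAAAAA", "ACGT", "CGTA", "GTAC"]
# 'AAA' occurs 8 times, the mean count over the 5 distinct 3-mers is 2.8
print(nasty_kmers(reads, k=3, ratio=2))  # {'AAA'}
```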
(Integer 2 or more) -mhim integer [15000000] Has no influence on the quality of the assembly, only on the maximum memory size needed during the skimming. The default value is equivalent to approximately 500MB. (Integer 100000 or more) -mchr integer [2048] Default depends on sequencing technology. Maximum memory used (in MB) during the reduction of skim hits. (Integer 10 or more) -[no]uqr boolean [Y] Use quick rule -qrmla integer [200] Quick rule min len 1 (Any integer value) -qrmsa integer [90] Quick rule min sim 1 (Any integer value) -qrmlb integer [100] Quick rule min len 2 (Any integer value) -qrmsb integer [95] Quick rule min sim 2 (Any integer value) -bqoml integer [150] Backbone quick overlap min len (Any integer value) -bip integer [15] The banded Smith-Waterman alignment uses this percentage number to compute the bandwidth it has to use when computing the alignment matrix. E.g. expected overlap is 150 bases, bip=10 -> the banded SW will compute a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts / deletes in the alignment. Increasing this number will find more non-optimal alignments but will also increase SW runtime somewhere between linearly and quadratically; decreasing it works the other way round (it might miss a few bad alignments but gain speed). (Integer from 1 to 100) -bmin integer [25] Minimum bandwidth in bases to each side. (Integer 1 or more) -bmax integer [100] Maximum bandwidth in bases to each side. (Integer 1 or more) -mo integer [15] Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. (Integer 1 or more) -ms integer [30] Describes the minimum score of an overlap to be taken into account for assembly. mira uses a default scoring scheme for SW align. Each match counts 1, a match with an N counts 0, each mismatch with a non-N base -1 and each gap -2. 
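As a worked illustration of the -bip example and the default scoring scheme just described, here is a minimal sketch. The function names are hypothetical and this is not MIRA code. The clamping of the band to -bmin and -bmax is an assumption about how those limits are applied, and so is treating every column that contains a gap as -2.

```python
def band_width(expected_overlap, bip, bmin=None, bmax=None):
    """Bases to each side of the diagonal: bip percent of the expected overlap.

    Assumption: if given, bmin/bmax simply clamp the result.
    """
    band = expected_overlap * bip // 100
    if bmin is not None:
        band = max(band, bmin)
    if bmax is not None:
        band = min(band, bmax)
    return band


def alignment_score(a, b):
    """Score two aligned, equal-length strings ('-' marks a gap).

    Match = +1, any column with an N = 0, mismatch = -1, gap = -2.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            score -= 2
        elif x == "N" or y == "N":
            pass  # a match with an N counts 0
        elif x == y:
            score += 1
        else:
            score -= 1
    return score


print(band_width(150, 10))              # the example from the text: 15
print(alignment_score("ACGT", "AC-T"))  # three matches and one gap: 1
```

Under this scheme a short overlap needs at least 30 more matching bases than penalties to reach the default -ms of 30.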
Use a bigger score to weed out a number of chance matches, a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory). (Integer 1 or more) -mrs integer [65] Describes the minimum percentage of matching between two reads to be considered for assembly. Increasing this number will save memory but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing below 55 will probably make memory and time consumption explode. (Integer from 1 to 100) -egp boolean [N] Defines whether or not to increase penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely disturbing when assembling very long reads containing multiple long indels in the called base sequence ... although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, set it to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to have -amgb and -amgbemc both set to 'Y'. -egpl menu [low] Has no effect if extra_gap_penalty is off. Defines an extra penalty applied to 'long' gaps. There are these predefined levels - 1. low - use this if you expect your base caller to frequently miss two or more bases. 2. medium - use this if your base caller is expected to frequently miss one to two bases. 3. high - use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used. (Values: low (Low); medium (Medium); high (High); est (EST split splices)) -megpp integer [100] Has no effect if extra_gap_penalty is off. Defines the maximum extra penalty in percent applied to 'long' gaps. (Integer from 1 to 100) -np string [$(inproject)] Contigs will have this string prepended to their names. 
(Any string) -rodirs integer [20] When adding reads to a contig, reject the reads if the drop in the quality of the consensus is greater than the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance. (Integer from 1 to 100) -[no]mr boolean [Y] One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs. -mroir boolean [N] Only takes effect when [-mr] is set to yes. If set to yes, MIRA will not use the repeat resolving algorithm during build time (and therefore will not be able to take advantage of this), but only before saving results to disk. -asir boolean [N] Only takes effect when -mr is set to 'Y'; its effect also depends on whether strain data (see -lsd) is present or not. Usually, mira will mark bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains they are marked as SNP. However, if this switch is set to 'Y', then conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly. -mrpg integer [2] Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. 
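The group rule described for -mrpg can be sketched for a single alignment column. This is a simplification under stated assumptions, not MIRA's algorithm: the function name is hypothetical, the -mgqrt threshold is applied to each base individually rather than to a combined group quality, and -mnq and -emea are ignored.

```python
from collections import defaultdict


def is_marker_candidate(column, mrpg=2, mgqrt=30):
    """Decide whether a column could trigger repeat marker / SNP detection.

    `column` is an iterable of (base, quality) pairs, one per read covering
    the position. Reads are grouped by nucleotide; the column qualifies when
    at least two different groups each hold `mrpg` or more reads whose base
    quality is at least `mgqrt` (per-base threshold is an assumption).
    """
    groups = defaultdict(int)
    for base, quality in column:
        if quality >= mgqrt:
            groups[base.upper()] += 1
    qualifying = [base for base, n in groups.items() if n >= mrpg]
    return len(qualifying) >= 2


# Two reads with A and two reads with G, all of good quality.
column = [("A", 40), ("A", 35), ("G", 40), ("G", 31)]
print(is_marker_candidate(column))
```

With the default mrpg=2 the column above qualifies, while removing one of the G reads, or lowering its quality below the threshold, leaves only a single group and the column is ignored.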
Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP). (Integer 2 or more) -mnq integer [20] Default is dependent of the sequencing technology used. Takes only effect when [-mr] is set to yes. This defines the minimum quality of neighbouring bases that a base must have for being taken into consideration during the decision whether column base mismatches are relevant or not. (Integer 10 or more) -mgqrt integer [30] Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. (Integer 25 or more) -emea integer [25] Only takes effect when -mr is set to 'Y'. Using the end of sequences of Sanger type shotgun sequencing is always a bit risky, as wrongly called bases tend to crowd there or some sequencing vector relicts hang around. It is even more risky to use these stretches for detecting possible repeats, so one can define an exclusion area where the bases are not used when determining whether a mismatch is due to repeats or not. (Integer 0 or more) -[no]amgb boolean [Y] Determines whether columns containing gap bases (indels) are also tagged. -[no]amgbemc boolean [Y] Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. -[no]amgbnbs boolean [Y] Only takes effect when -amgb is set to 'Y'. 
Determines whether both strands need to have a gap for a column containing gap bases to be tagged. Setting this to 'N' is not recommended except when working in desperately low coverage situations. -fnicpst boolean [N] If set to yes, mira will be forced to make a choice for a consensus base (A,C,G,T or gap) even in unclear cases where it would normally put an IUPAC base. All other things being equal (like quality of the possible consensus base and other things), mira will choose a base by either looking for a majority vote or, if that also is not clear, by preferring gaps over T over G over C over finally A. -msr boolean [N] Can only be used in mapping assemblies. If set to yes, mira will merge all perfectly mapping Solexa reads into longer reads while keeping quality and coverage information intact. This feature hugely reduces the number of Solexa reads and makes assembly results with Solexa data small enough to be handled by current finishing programs (gap4, consed, others) on normal workstations. -gor integer [66] Gap override ratio (Integer 0 or more) -ace boolean [N] Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt will try to resolve discrepancies in the contig by performing trace analysis and correct even hard-to-resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Notice: the current development version has a memory leak in the editor, therefore the option is not automatically turned on. -[no]sem boolean [Y] If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. -ct integer [50] The higher this value, the more strictly the automatic editor will apply its internal rule set. Going below 40 is not recommended. (Integer from 1 to 100) -outproject string [$(project)] Default is mira. Defines the output project name for this assembly. 
The output project name automatically influences the name of output files or directories only (Any string) -sssip boolean [N] Controls whether ’unimportant’ singlets are written to the result files. -[no]stsip boolean [Y] Controls whether singlets which have certain tags (SRMr, CRMr, WRMr, SROr, SAOr, SIOr) are written to the result files, even if [-sssip] is set. -[no]rrol boolean [Y] Removes log files once they should not be needed anymore during the assembly process. -rld boolean [N] Removes the complete log directory at the end of the assembly process. Some logs contain useful information that you may want to analyse though. -[no]orc boolean [Y] Output CAF results -[no]orf boolean [Y] Output FASTA results -org boolean [N] Output GAP4DA results -[no]ora boolean [Y] Output phrap ACE results -orh boolean [N] Output HTML results -[no]ors boolean [Y] Output transposed contig summary results -ort boolean [N] Output simple text results -[no]orw boolean [Y] Output wiggle results -[no]otc boolean [Y] Output temporary CAF results -otm boolean [N] Output temporary MAF results -otf boolean [N] Output temporary FASTA results -otg boolean [N] Output temporary GAP4 results -ota boolean [N] Output temporary phrap ACE results -oth boolean [N] Output temporary HTML results -ots boolean [N] Output temporary transposed contig summary results -ott boolean [N] Output temporary text results -oetc boolean [N] Output extra temporary CAF results -oetf boolean [N] Output extra temporary FASTA results -oetg boolean [N] Output extra temporary GAP4DA results -oeta boolean [N] Output extra temporary phrap ACE results -oeth boolean [N] Output extra temporary HTML results -oetas boolean [N] Output extra temporary also singlets results -tcpl integer [60] When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. 
(Integer 1 or more) -hcpl integer [60] When producing an output in HTML format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. (Integer 1 or more) -tegfc string When producing an output in text format (-ort|ott|oett), endgaps are filled up with this character. (Any string) -hegfc string When producing an output in HTML format (-orh|oth|oeth), endgaps are filled up with this character. (Any string) -[no]sdlpo boolean [Y] Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on. -tpae boolean [N] This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs. -pbwl integer [7] Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. (Integer 1 or more) -pbwme integer [2] Only takes effect when -tpae is set to 'Y'. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. (Integer 1 or more) -pbwgd integer [9] Only takes effect when -tpae is set to 'Y'. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one. (Integer 1 or more) -[no]pvc boolean [Y] Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. 
These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results. -pvcmla integer [18] The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. (Integer 0 or more) -qc boolean [N] Default is 'N', but is automatically set to 'Y' when using the setparam options 'fasta' or 'phd' (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on. -an menu [signal] When adding reads to a contig, dangerous regions can get an extra integrity check. none = no extra check. text = check is only text-based. signal = check is signal based, if the SCF trace is not available, fallback is 'text'. 
For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. (Values: none (None); text (Text); signal (Signal)) -dmer integer [1] When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. (Integer from 1 to 100) -dismin integer [500] The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. (Integer 0 or more) -dismax integer [5000] The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. (Integer 0 or more) -oett boolean [N] Output extra temporary TXT results -gapfda string [gap4da] Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be |
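The error margins described above for -dismin and -dismax can be made concrete with a short sketch. The function name is hypothetical, and two points are assumptions: that the 10% margin is taken relative to each limit itself, and that the resulting bounds are inclusive.

```python
def pair_distance_ok(distance, dismin=500, dismax=5000, margin_pct=10):
    """Check a read-pair distance against dismin/dismax with error margins.

    The lower bound is dismin minus margin_pct percent of dismin, the upper
    bound is dismax plus margin_pct percent of dismax (both inclusive here,
    which is an assumption).
    """
    low = dismin * (100 - margin_pct) / 100
    high = dismax * (100 + margin_pct) / 100
    return low <= distance <= high


# With the defaults, the effective window is 450 to 5500 bases.
for d in (449, 450, 5500, 5501):
    print(d, pair_distance_ok(d))
```

With the default values of 500 and 5000 this gives an effective window of 450 to 5500 bases.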
Qualifier | Type | Description | Allowed values | Default | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||||||||||||||
-technology | list | Which sequencing technologies have created your reads |
|
sanger | ||||||||||||
-jobtype | list | Are the data you are assembling forming a larger contiguous sequence (choose: genome) or are you assembling small fragments like in EST or mRNA libraries (choose: est) |
|
genome | ||||||||||||
-method | list | Are you building an assembly from scratch (choose: denovo) or are you mapping reads to an existing backbone sequence (choose: mapping) |
|
denovo | ||||||||||||
-grade | list | Quality grades of de-novo assembly or mapping. Draft is quick-and-dirty, suited to get a first look on approximate coverage of a running project. Should not be used for anything else. Normal is the default parameter set of mira that is able to tackle most genomes. A bit slower than the draft version, but includes such options as read extension and vector remnant clipping. Accurate is still slower than the normal mode but should be used for genomes that pose a problem to the normal mode. |
|
normal | ||||||||||||
Additional (Optional) qualifiers | ||||||||||||||||
-setparams | list | Sets parameters suited for loading sequences from FASTA, PHD or CAF files. The default is not to specify the type of input file. |
|
unspecified | ||||||||||||
-highlyrepetitive | boolean | A modifier switch for genome data that is deemed to be highly repetitive. The assemblies will run slower due to more iterative cycles that give mira a chance to resolve nasty repeats. | Boolean value Yes/No | No | ||||||||||||
-noclipping | list | Switches off clipping options for given sequencing technologies. |
|
$(technology) | ||||||||||||
Advanced (Unprompted) qualifiers | ||||||||||||||||
-parameterfile | infile | Loads parameters from the filename given. Allows a maximum of 10 levels of recursion, i.e. a -params option appearing within a file that loads other parameter files | Input file | Required | ||||||||||||
-project | string | Default is mira. Defines the project name for this assembly. The project name automatically influences the name of input and output files or directories. E.g. in the default setting, the file names for the output of the assembly in FASTA format would be mira_out.fasta and mira_out.fasta.qual. Setting the project name to 'MyProject' would generate MyProject_out.fasta and MyProject_out.fasta.qual. | Any string | mira | ||||||||||||
-inproject | string | Default is mira. Defines the input project name for this assembly. The input project name automatically influences the name of input files or directories only | Any string | $(project) | ||||||||||||
-bft | list | Defines the filetype of the backbone file given. Currently (2.8.3) only FASTA, CAF and GBF files are supported. When GBF (GenBank files, also named .gbk) files are loaded, the features within these files are automatically transformed into Staden-compatible tags and get passed through the assembly. |
|
fasta | ||||||||||||
-expdir | directory | Defines the directory where mira should search for experiment files (EXP). | Directory | . | ||||||||||||
-scfdir | directory | Defines the directory where mira should search for SCF files | Directory | . | ||||||||||||
-feifile | infile | Defines the file of filenames where the names of the EXP files of a project are located. | Input file | $(inproject)_in.fofn | ||||||||||||
-fpifile | infile | Defines the file of filenames where the names of the PHD files of a project are located. | Input file | $(inproject)_in.fofn | ||||||||||||
-pifile | infile | Defines the PHD file to load sequences of a project from. | Input file | $(inproject)_in.phd | ||||||||||||
-faifile | infile | Defines the FASTA file to load sequences of a project from. | Input file | $(inproject)_in.fasta | ||||||||||||
-fquifile | infile | Defines the FASTA quality file to load base qualities of a project from. The order of reads in the quality file does not need to be the same as in the FASTA or fofn projects, although it saves a bit of time if it is. | Input file | $(inproject)_in.fasta.qual | ||||||||||||
-fqifile | infile | Defines the FASTQ file to load sequences of a project from. | Input file | $(inproject)_in.fastq | ||||||||||||
-cifile | infile | Defines the file to load a CAF project from. Filename must end with '.caf'. | Input file | $(inproject)_in.caf | ||||||||||||
-sdifile | infile | Defines the file to load straindata from. Only used in EST projects (miraEST). | Input file | $(inproject)_straindata_in.txt | ||||||||||||
-xtiifile | infile | Defines the file to load a trace info file in XML format from. This can be used both when merging XML data to loaded files or when loading a project from an XML trace info file. | Input file | $(inproject)_xmltraceinfo_in.xml | ||||||||||||
-svsifile | infile | Defines the file to load the info about possible vector sequence stretches. | Input file | $(inproject)_ssaha2vectorscreen_in.txt | ||||||||||||
-bbifile | infile | Defines the file to load the backbone sequence or assembly. Note that you still must define the file type with [-bft]. | Input file | $(inproject)_in.$(technology).$(bft) | ||||||||||||
-[no]traceinfo | toggle | Load traceinfo ancillary data in XML files | Toggle value Yes/No | Yes | ||||||||||||
-lsd | boolean | Straindata is a key value file, one read per line. First the name of the read, then the strain name of the organism the read comes from. It is used by the program to differentiate different types of SNPs appearing in organisms and classifying them. | Boolean value Yes/No | No | ||||||||||||
-brl | integer | Parameter for the internal sectioning size of the backbone. Extremely repetitive sequences may require reducing the default value, but the default value should work well in 99.9% of all cases. | Integer from 1000 to 3000 | 2500 | ||||||||||||
-mrl | integer | Minimum length that reads must have to be considered for the assembly. Shorter sequences will be filtered out at the beginning of the process and won't be present in the final project. | Integer 20 or more | 40 | ||||||||||||
-nop | integer | Defines how many iterations of the whole assembly process are done. Rule of thumb - for quick and dirty assembly use 1 (not recommended). For assembly using read extensions and / or automatic contig editing (-ure and -ace) use at least 2. The recommended setting is 3 or higher, as some knowledge generated by the assembler can be used only from the third iteration on. More than 3 passes might be useful for projects containing many repetitive elements. See also -rbl and -mr for parameters that affect the assembly and disentanglement of possible repeats. | Integer 1 or more | 3 | ||||||||||||
-[no]sep | boolean | Defines whether the skim algorithm (and with it also the recalculation of Smith-Waterman alignments) is called in between each main pass. If set to 'N', skimming is done only when needed by the workflow, either when read extensions are searched for (-ure) or when possible vector leftovers are to be clipped (-pvc). Setting this option to 'Y' is highly recommended, setting it to 'N' is only for quick and dirty assemblies. | Boolean value Yes/No | Yes | ||||||||||||
-rbl | integer | Defines the maximum number of times a contig can be rebuilt during main assembly passes (-nop) if misassemblies, due to possible repeats, are found. | Integer 1 or more | 2 | ||||||||||||
-not | integer | Number of threads to use (see also -snot for SKIM algorithm) | Integer from 1 to 256 | 2 | ||||||||||||
-[no]amm | boolean | Whether mira tries to optimise the run time of certain algorithms through a space/time trade-off, enlarging or shrinking some internal tables as memory permits | Boolean value Yes/No | Yes | ||||||||||||
-mps | integer | Maximum memory in GB | Integer 0 or more | 0 | ||||||||||||
-kpmf | integer | Keep percentage of memory free | Integer from 0 to 100 | 15 | ||||||||||||
-kcim | boolean | Keep contigs in memory | Boolean value Yes/No | No | ||||||||||||
-esps | integer | EST-SNP pipeline steps | Integer from 0 to 4 | 0 | ||||||||||||
-[no]uti | boolean | Two reads sequenced from the same clone template form a read pair with a known minimum and maximum distance. This feature will definitely help for contigs containing lots of repeats. Set this to 'Y' if your data contains information on insert sizes. Information on insert sizes can be given via the SI tag in EXP files (for each read pair individually), or for the whole project using dismin and dismax. | Boolean value Yes/No | Yes | ||||||||||||
-tismin | integer | Template insert minimum size | Integer -1 or more | -1 | ||||||||||||
-tismax | integer | Template insert maximum size | Integer -1 or more | -1 | ||||||||||||
-[no]crhf | boolean | Colour reads by hash frequency | Boolean value Yes/No | Yes | ||||||||||||
-[no]pd | boolean | Controls whether date and time are printed out during the assembly. Suppressing it isn't useful in normal operation, only when debugging or benchmarking. | Boolean value Yes/No | Yes | ||||||||||||
-ft | list | Defines whether to load and assemble EXP files from a file of filenames ('mira_in.fofn'), load and assemble FASTA sequences ('mira_in.fasta') and their qualities ('mira_in.fasta.qual'), load and assemble FASTQ sequences and qualities ('mira_in.fastq'), load and assemble sequences or qualities from a phd file ('mira_in.phd') or to load a project from a CAF file ('mira_in.caf') and assemble or eventually reassemble it. N.B. fofnphd is not currently available. |
|
fasta | ||||||||||||
-eq | list | Defines the source format for reading qualities from external sources. Normally takes effect only when these are not present in the format of the load_job project (EXP and FASTA can have them, CAF and PHD must have them). |
|
scf | ||||||||||||
-eqo | boolean | Only takes effect when 'lj' is fofnexp. Defines whether or not the qualities from the external source override the possibly loaded qualities from the load job project. This might be of use in case some post-processing software fiddles around with the quality values of the input file but one wants to have the original ones. | Boolean value Yes/No | No | ||||||||||||
-droeqe | boolean | Should there be a major mismatch between the external quality source and the sequence (e.g. the base sequence read from a SCF file does not match the originally read base sequence), should the read be excluded from assembly or not. If not, it will use the qualities it had before trying to load the external qualities (either default qualities or the ones loaded from the original source). | Boolean value Yes/No | No | ||||||||||||
-ssiqf | boolean | Solexa scores in quality file | Boolean value Yes/No | No | ||||||||||||
-fqqo | integer | FASTQ quality offset | Integer from 0 to 64 | 0 | ||||||||||||
-[no]wqf | boolean | Wants quality file | Boolean value Yes/No | Yes | ||||||||||||
-rns | list | Defines the centre naming scheme for read suffixes. Currently, only Sanger Institute and TIGR naming schemes are supported out of the box. How to choose? Please read the documentation available at the different centres or ask your sequence provider. In a nutshell, the Sanger scheme is 'somename.[pqsfrw][12][bckdeflmnpt][a|b|c|...' (e.g. U13a08f10.p1ca), TIGR scheme is 'somenameTF*|TR*|TA*' (e.g. GCPBN02TF or GCPDL68TABRPT103A58B). |
|
$(technology) | ||||||||||||
-mxti | boolean | Some of the file formats above (FASTA, PHD or even CAF and EXP) may not contain all the information necessary or useful for each read of an assembly. Should additional information, such as clipping positions, be available in an XML trace info file in NCBI format (see File formats), then set this option to 'Y' and it will be merged into the data loaded. Please note, quality clippings given here will override quality clippings loaded earlier or performed by mira. Minimum clippings will still be made by the program, though. | Boolean value Yes/No | No | ||||||||||||
-fo | boolean | If set to 'Y', the project will not be assembled and no assembly output files will be produced. Instead, the project files will only be loaded. This switch is useful for checking consistency of input files. | Boolean value Yes/No | No | ||||||||||||
-bdq | integer | Defines the default base quality of reads that have no quality read from a file. | Integer 0 or more | 10 | ||||||||||||
-[no]epoq | boolean | Stops MIRA if a read has no quality values | Boolean value Yes/No | Yes | ||||||||||||
-[no]ard | boolean | Automatic repeat detection | Boolean value Yes/No | Yes | ||||||||||||
-ardct | float | Automatic repeat detection coverage threshold | Number 1.000 or more | 2.0 | ||||||||||||
-ardml | integer | Default is 200 for 454 technology | Integer 2 or more | 400 | ||||||||||||
-ardgl | integer | Default depends on technology | Integer 2 or more | 40 | ||||||||||||
-[no]urd | boolean | Default true for most genome assembly, false for EST assembly or Solexa data | Boolean value Yes/No | Yes | ||||||||||||
-urdsip | integer | Default depends on technology and assembly quality level | Integer 1 or more | 3 | ||||||||||||
-urdcm | float | Default depends on technology and assembly quality level | Number 1.000 or more | 1.5 | ||||||||||||
-klrs | boolean | Default depends on assembly quality level and EST/genome assembly | Boolean value Yes/No | No | ||||||||||||
-[no]sd | boolean | Default is 'Y' for mira and 'N' for miraEST. A spoiler can be either a chimeric read or it is a read with long parts of unclipped vector sequence still included (that was too long for the -pvc vector leftover clipping routines). A spoiler typically prevents contigs being joined; MIRA will cut them back so that they present no more harm to the assembly. Recommended for assemblies of mid-to-high coverage genomic assemblies; not recommended for assemblies of ESTs as one might lose splice variants with that. A minimum number of two assembly passes (-nop) must be run for this option to take effect. | Boolean value Yes/No | Yes | ||||||||||||
-[no]ugpf | boolean | MIRA has two different pathfinder algorithms it chooses from to find its way through the (more or less) complete set of possible sequence overlaps: a genomic and an EST pathfinder. The genomic one looks a bit into the future of the assembly and tries to stay on safe ground, using a maximum of information already present in the contig that is being built. The EST version, on the contrary, will directly jump at the complex cases posed by very similar repetitive sequences and try to solve those first; it is willing to fall back to brute force when really bad cases (such as coverage with thousands of sequences) are encountered. Generally, the genomic pathfinder will also work quite well with EST sequences (but might get slowed down a lot in pathological cases), while the EST algorithm does not work so well on genomes. If in doubt, leave as 'Y' for genome projects and set to 'N' for EST projects. | Boolean value Yes/No | Yes | ||||||||||||
-[no]uess | boolean | Another important switch if you plan to assemble non-normalised EST libraries, where some ESTs may reach coverages of several hundreds or thousands of reads. This switch lets MIRA save a lot of computational time when aligning those extremely high coverage areas (but only there), at the expense of some accuracy. | Boolean value Yes/No | Yes | ||||||||||||
-esspd | integer | Defines the number of potential partners a read must have for MIRA to switch into emergency search stop mode for that read. | Integer 1 or more | 500 | ||||||||||||
-[no]uebl | boolean | Use emergency blacklist | Boolean value Yes/No | Yes | ||||||||||||
-umcbt | boolean | Defines whether there is an upper limit of time to be used to build one contig. Set this to 'Y' in EST assemblies where you think that extremely high coverages occur. Less useful for assembly of genomic sequences. | Boolean value Yes/No | No | ||||||||||||
-bts | integer | Depending on -umcbt above, this number defines the time in seconds allotted to building one contig. | Integer 1 or more | 10000 | ||||||||||||
-lsbd | boolean | Straindata is a key value file, one read per line: first the name of the read, then the strain name of the organism the read comes from. It is used by the program to distinguish and classify the different types of SNPs appearing in organisms. | Boolean value Yes/No | No | ||||||||||||
-lb | boolean | A backbone is a sequence (or a previous assembly) that is used as a template for the current assembly. The current assembly process will first assemble reads to loaded backbone contigs before creating new contigs. This feature is helpful for assembling against previous (and already possibly edited) assembly iterations, or to make a comparative assembly of two very closely related organisms. Please read 'very closely related' as in 'only SNP mutations or short indels present'. | Boolean value Yes/No | No | ||||||||||||
-sbuip | integer | When assembling against backbones, this parameter defines the pass iteration (see -nop) from which on the backbones will really be used. In the passes preceding this number, the non-backbone reads will be assembled together as if no backbones existed. This allows mira to correctly spot repetitive stretches that differ by single bases and tag them accordingly. Rule of thumb: if backbones belong to the same strain as the reads to assemble, set to 1. If backbones are a different strain, then set -sbuip to 1 lower than -nop (example: nop 4 and sbuip 3). | Integer 0 or more | 3 | ||||||||||||
-bbq | integer | Defines the default quality that the backbone sequences have if they came without quality values in their files (like in GBF format or when FASTA is used without .qual files). A value of -1 causes mira to use the same default quality for backbones as for reads. | Integer from -1 to 100 | 30 | ||||||||||||
-bsn | string | Defines the name of the strain that the backbone sequences have. | Any string | |||||||||||||
-bsnffa | boolean | Backbone strain name force for all | Boolean value Yes/No | No | ||||||||||||
-brfs | string | Backbone rail from strain | Any string | |||||||||||||
-bro | integer | Backbone rail overlap | Integer from 0 to 2000 | 0 | ||||||||||||
-[no]abnc | boolean | The standard mode of the assembler is to assemble available reads to a backbone and make new contigs with the remaining reads. If this option is set to 'N', the reads that cannot be assembled into existing contigs are put as singlets into the assembly, not forming new contigs. | Boolean value Yes/No | Yes | ||||||||||||
-[no]ure | boolean | Defines whether mira uses its read extension routines, which try to extend reads into previously clipped regions when Smith-Waterman alignments provide enough evidence that the clipped bases are correct (see -rewl, -rewme, -feip and -leip). Default depends on technology | Boolean value Yes/No | Yes | ||||||||||||
-rewl | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the window length. Default depends on technology | Integer 0 or more | 30 | ||||||||||||
-rewme | integer | Only takes effect when -ure is set to 'Y'. The read extension routines use a sliding window approach on Smith-Waterman alignments. This parameter defines the maximum number of errors (disagreements) between two alignments in the given window. Default depends on technology | Integer 0 or more | 2 | ||||||||||||
-feip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the first pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the first time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||||
-leip | integer | Only takes effect when -ure is set to 'Y'. The read extension routines can be called before assembly and/or after each assembly pass (see -nop). This parameter defines the last pass in which the read extension routines are called. The default of 0 tells mira to extend the reads the last time before the first assembly pass. | Integer 0 or more | 0 | ||||||||||||
-msvs | boolean | Merge with SSAHA vector screen | Boolean value Yes/No | No | ||||||||||||
-msvsgs | integer | Default depends on the sequencing technology | Integer 0 or more | 10 | ||||||||||||
-msvsmfg | integer | Default depends on the sequencing technology | Integer 0 or more | 60 | ||||||||||||
-msvsmeg | integer | Default depends on the sequencing technology | Integer 0 or more | 120 | ||||||||||||
-msvssfc | integer | Default depends on the sequencing technology | Integer 0 or more | 0 | ||||||||||||
-msvssec | integer | Default depends on the sequencing technology | Integer 0 or more | 0 | ||||||||||||
-[no]pvlc | boolean | Possible vector leftover clip | Boolean value Yes/No | Yes | ||||||||||||
-qcmq | integer | This is the minimum quality required of bases in a window in order to be accepted. Please be cautious and don't use extreme values here, because then the clipping will be too lax or too harsh. Values below 15 and higher than 35 are disallowed. | Integer from 15 to 35 | 20 | ||||||||||||
-qcwl | integer | This is the length of a window in bases for the quality clip. Default depends on sequencing technology | Integer 10 or more | 30 | ||||||||||||
-[no]bsqc | boolean | Bad stretch quality clip | Boolean value Yes/No | Yes | ||||||||||||
-bsqcmq | integer | Default depends on sequencing technology | Integer 0 or more | 20 | ||||||||||||
-bsqcwl | integer | Default depends on sequencing technology | Integer 0 or more | 30 | ||||||||||||
-[no]mbc | boolean | This will let mira perform a 'clipping' of bases that were masked out (replaced with the character X). It is generally not a good idea to use mask bases to remove unwanted portions of a sequence; the EXP file format and the NCBI traceinfo format have excellent possibilities to circumvent this. But because a lot of pre-processing software is built around cross_match, scylla- and phrap-style base masking, the need arose for mira to be able to handle this too. mira will look at the start and end of each sequence to see whether there are masked bases that should be 'clipped'. | Boolean value Yes/No | Yes | ||||||||||||
-mbcgs | integer | While performing the clip of masked bases, mira will look if it can merge larger chunks of masked bases that are a maximum of -mbcgs apart. | Integer 0 or more | 20 | ||||||||||||
-mbcmfg | integer | While performing the clip of masked bases at the start of a sequence, mira will allow up to this number of unmasked bases in front of a masked stretch. Default depends on sequencing technology. | Integer 0 or more | 40 | ||||||||||||
-mbcmeg | integer | While performing the clip of masked bases at the end of a sequence, mira will allow up to this number of unmasked bases behind a masked stretch. Default depends on sequencing technology | Integer 0 or more | 60 | ||||||||||||
-lcc | boolean | Default depends on sequencing technology | Boolean value Yes/No | No | ||||||||||||
-cpat | boolean | Used in EST assembly | Boolean value Yes/No | No | ||||||||||||
-cpkps | boolean | Clip polyA tail keep polyA signal | Boolean value Yes/No | No | ||||||||||||
-cpmsl | integer | Clip polyA tail max signal length | Integer 0 or more | 12 | ||||||||||||
-cpmea | integer | Clip polyA tail max errors allowed | Integer 1 or more | 1 | ||||||||||||
-cpmgfe | integer | Clip polyA tail max gap from end | Integer 1 or more | 9 | ||||||||||||
-[no]emlc | boolean | If on, ensures a minimum left clip on each read according to the parameters in -mlcr & -smlc. Default depends on sequencing technology | Boolean value Yes/No | Yes | ||||||||||||
-mlcr | integer | If -emlc is 'Y', checks whether there is a left clip whose length is at least the size specified here. Default depends on sequencing technology | Integer 0 or more | 25 | ||||||||||||
-smlc | integer | If -emlc is 'Y' and the actual left clip is < -mlcr, then set the left clip of read to the value given here. Default depends on sequencing technology | Integer 0 or more | 30 | ||||||||||||
-emrc | boolean | If on, ensures a minimum right clip on each read according to the parameters in -mrcr & -smrc. Default depends on sequencing technology | Boolean value Yes/No | No | ||||||||||||
-mrcr | integer | If -emrc is 'Y', checks whether there is a right clip whose length is at least the size specified here. Default depends on sequencing technology | Integer 0 or more | 10 | ||||||||||||
-smrc | integer | If -emrc is 'Y' and the actual right clip is < -mrcr, then set the right clip of read to the value given here. Default depends on sequencing technology | Integer 0 or more | 20 | ||||||||||||
-[no]pec | boolean | Default depends on other choices | Boolean value Yes/No | Yes | ||||||||||||
-pecbph | integer | Default is 14 on 32 bit systems and 16 on 64 bit systems. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. Default depends on sequencing technology | Integer 10 or more | 17 | ||||||||||||
-snot | integer | Number of threads to use in SKIM algorithm | Integer from 1 to 256 | 2 | ||||||||||||
-bph | integer | Default depends on system. Controls the number of consecutive bases n which are used as a word hash. The higher the value the faster the search. The lower the value the more weak matches are found. Values below 10 are not recommended. | Integer 1 or more | 17 | ||||||||||||
-hss | integer | This is a parameter controlling the stepping increments with which hashes are generated. This allows for a more fine-grained search as matches are now found with at least n+s (see -bph) equal bases instead of the SSAHA 2n. The higher the value the faster the search. The lower the value the more weak matches are found. | Integer 1 or more | 4 | ||||||||||||
-pr | integer | Controls the relative percentage of exact word matches in an approximate overlap that has to be reached to accept the overlap as a possible match. Increasing this number will decrease the number of possible alignments that have to be checked by Smith-Waterman later on in the assembly, but it might also lead to the rejection of weaker overlaps (i.e. overlaps that contain a higher number of mismatches). | Integer 1 or more | 70 | ||||||||||||
-mhpr | integer | Controls the maximum number of possible hits one read can transport to the Smith-Waterman alignment phase. If more potential hits are found, only the best ones are taken. This is an important option for tackling projects that contain extreme assembly conditions. For example, 5000 reads that are all very similar would generate around 40 to 50 million possible alignments (forward and reverse complement). Setting this parameter to 200 reduces the number of alignments to check to around 1.5-2 million. As the assembly increases in passes (-nop), different combinations of possible hits will be checked, always the probably best ones first. So the accuracy of the assembly should only suffer when this number is lowered too much. | Integer 1 or more | 2000 | ||||||||||||
-mmhr | integer | If the number of reads identified as megahubs exceeds the allowed ratio, mira will abort. This is a fail-safe parameter to avoid assemblies where things look fishy. In case you see this, you might want to ask for advice on the mira_talk mailing list. In short: bacteria should never have megahubs (90% of all cases reported were contamination of some sort and the remaining 10% were due to incredibly high coverage numbers). Eukaryotes are likely to contain megahubs if filtering [-mnr] is not on. | Integer 0 or more | 0 | ||||||||||||
-fenn | float | Freq. est. min normal | Number 0.000 or more | 0.4 | ||||||||||||
-fexn | float | Freq. est. max normal | Number 0.000 or more | 1.6 | ||||||||||||
-fer | float | Freq. est. repeat | Number 0.000 or more | 1.9 | ||||||||||||
-fehr | float | Freq. est. heavy repeat | Number 0.000 or more | 8.0 | ||||||||||||
-fecr | float | Freq. est. crazy repeat | Number 0.000 or more | 20.0 | ||||||||||||
-[no]mnr | boolean | Default depends on the --job type: 'yes' for de-novo, 'no' for mapping. Tells mira to mask, during the SKIM phase, subsequences of size [-bph] nucleotides that appear more often than the median occurrence of subsequences would otherwise suggest. The threshold from which subsequences are considered nasty is set by -nrr. | Boolean value Yes/No | Yes | ||||||||||||
-nrr | integer | Sets the ratio from which on subsequences are considered nasty and hidden from the SKIM overlapper. For example, a value of 10 means 'mask all k-mers of [-bph] length which occur more than 10 times more often than the average of the project.' | Integer 2 or more | 100 | ||||||||||||
-mhim | integer | Has no influence on the quality of the assembly, only on the maximum memory size needed during the skimming. The default value is equivalent to approximately 500MB. | Integer 100000 or more | 15000000 | ||||||||||||
-mchr | integer | Default depends on sequencing technology. Maximum memory used (in MB) during the reduction of skim hits. | Integer 10 or more | 2048 | ||||||||||||
-[no]uqr | boolean | Use quick rule | Boolean value Yes/No | Yes | ||||||||||||
-qrmla | integer | Quick rule min len 1 | Any integer value | 200 | ||||||||||||
-qrmsa | integer | Quick rule min sim 1 | Any integer value | 90 | ||||||||||||
-qrmlb | integer | Quick rule min len 2 | Any integer value | 100 | ||||||||||||
-qrmsb | integer | Quick rule min sim 2 | Any integer value | 95 | ||||||||||||
-bqoml | integer | Backbone quick overlap min len | Any integer value | 150 | ||||||||||||
-bip | integer | The banded Smith-Waterman alignment uses this percentage to compute the bandwidth it has to use when computing the alignment matrix. E.g. expected overlap is 150 bases, bip=10 -> the banded SW will compute a band of 15 bases to each side of the expected alignment diagonal, thus allowing up to 15 unbalanced inserts / deletes in the alignment. Increasing this number will find more non-optimal alignments but will also increase SW runtime by a factor between linear and quadratic; decreasing it works the other way round (it might miss a few bad alignments but gains speed). | Integer from 1 to 100 | 15 | ||||||||||||
-bmin | integer | Minimum bandwidth in bases to each side. | Integer 1 or more | 25 | ||||||||||||
-bmax | integer | Maximum bandwidth in bases to each side. | Integer 1 or more | 100 | ||||||||||||
-mo | integer | Minimum number of overlapping bases needed in an alignment of two sequences to be accepted. | Integer 1 or more | 15 | ||||||||||||
-ms | integer | Describes the minimum score of an overlap to be taken into account for assembly. mira uses a default scoring scheme for SW align. Each match counts 1, a match with an N counts 0, each mismatch with a non-N base -1 and each gap -2. Use a bigger score to weed out a number of chance matches, a lower score to perhaps find the single (short) alignment that might join two contigs together (at the expense of computing time and memory). | Integer 1 or more | 30 | ||||||||||||
-mrs | integer | Describes the min percentage of matching between two reads to be considered for assembly. Increasing this number will save memory but one might lose possible alignments. A maximum of 80 is probably sensible here. Decreasing below 55 will probably make memory and time consumption explode. | Integer from 1 to 100 | 65 | ||||||||||||
-egp | boolean | Defines whether or not to increase penalties applied to alignments containing long gaps. Setting this to 'Y' might help in projects with frequent repeats. On the other hand, it is definitely disturbing when assembling very long reads containing multiple long indels in the called base sequence, although this should not happen in the first place and is a sure sign of problems lying ahead. When in doubt, set it to 'Y' for EST projects and de-novo genome assembly, and set it to 'N' for assembly of closely related strains (assembly against a backbone). When set to 'N', it is recommended to have -amgb and -amgbemc both set to 'Y'. | Boolean value Yes/No | No | ||||||||||||
-egpl | list | Has no effect if extra_gap_penalty (-egp) is off. Defines an extra penalty applied to 'long' gaps. There are these predefined levels: 1. low - use this if you expect your base caller to frequently miss two or more bases. 2. medium - use this if your base caller is expected to frequently miss one to two bases. 3. high - use this if your base caller does not frequently miss more than one base. For some stages of the EST assembly process, a special value 'est' is used. | | low | ||||||||||||
-megpp | integer | Has no effect if extra_gap_penalty (-egp) is off. Defines the maximum extra penalty in percent applied to 'long' gaps. | Integer from 1 to 100 | 100 | ||||||||||||
-np | string | Contigs will have this string prepended to their names. | Any string | $(inproject) | ||||||||||||
-rodirs | integer | When adding reads to a contig, reject the reads if the drop in the quality of the consensus is > the given value in %. Lower values mean stricter checking. This value is doubled should a read be entered that has a template partner (a read pair) at the right distance. | Integer from 1 to 100 | 20 | ||||||||||||
-[no]mr | boolean | One of the most important switches in MIRA. If set to 'Y', MIRA will try to resolve misassemblies due to repeats by identifying single base stretch differences and tag those critical bases as RMB (Repeat Marker Base, weak or strong). This switch is also needed when MIRA is run in EST mode to identify possible inter-, intra- and intra-and-interorganism SNPs. | Boolean value Yes/No | Yes | ||||||||||||
-mroir | boolean | Only takes effect when [-mr] is set to yes. If set to yes, MIRA will not use the repeat resolving algorithm during build time (and therefore will not be able to take advantage of this), but only before saving results to disk. | Boolean value Yes/No | No | ||||||||||||
-asir | boolean | Only takes effect when -mr is set to 'Y'; the effect also depends on whether strain data (see -lsd) is present or not. Usually, mira will mark bases that differentiate between repeats when a conflict occurs between reads that belong to one strain. If the conflict occurs between reads belonging to different strains, they are marked as SNP. However, if this switch is set to 'Y', then conflicts within a strain are also marked as SNP. This switch is mainly used in assemblies of ESTs; it should not be set for genomic assembly. | Boolean value Yes/No | No | ||||||||||||
-mrpg | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum number of reads in a group that are needed for the RMB (Repeat Marker Bases) or SNP detection routines to be triggered. A group is defined by the reads carrying the same nucleotide for a given position, i.e., an assembly with mrpg=2 will need at least two times two reads with the same nucleotide (having at least a quality as defined in -mgqrt) to be recognised as repeat marker or a SNP. Setting this to a low number increases sensitivity, but might produce a few false positives, resulting in reads being thrown out of contigs because of falsely identified possible repeat markers (or wrongly recognised as SNP). | Integer 2 or more | 2 | ||||||||||||
-mnq | integer | Default depends on the sequencing technology used. Only takes effect when [-mr] is set to yes. This defines the minimum quality that the neighbouring bases of a base must have for that base to be taken into consideration when deciding whether column base mismatches are relevant or not. | Integer 10 or more | 20 | ||||||||||||
-mgqrt | integer | Only takes effect when -mr is set to 'Y'. This defines the minimum quality of a group of bases to be taken into account as potential repeat marker. The lower the number, the more sensitive you get, but lowering below 25 is not recommended as a lot of wrongly called bases can have a quality approaching this value and you'd end up with a lot of false positives. The higher the overall coverage of your project the better, and the higher you can set this number. A value of 35 will probably remove all false positives, a value of 40 will probably never show false positives. | Integer 25 or more | 30 | ||||||||||||
-emea | integer | Only takes effect when -mr is set to 'Y'. Using the end of sequences of Sanger type shotgun sequencing is always a bit risky, as wrongly called bases tend to crowd there or some sequencing vector relicts hang around. It is even more risky to use these stretches for detecting possible repeats, so one can define an exclusion area where the bases are not used when determining whether a mismatch is due to repeats or not. | Integer 0 or more | 25 | ||||||||||||
-[no]amgb | boolean | Determines whether columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||||
-[no]amgbemc | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether multiple columns containing gap bases (indels) are also tagged. | Boolean value Yes/No | Yes | ||||||||||||
-[no]amgbnbs | boolean | Only takes effect when -amgb is set to 'Y'. Determines whether both strands need to have a gap for a column containing gap bases to be tagged. Setting this to 'N' is not recommended except when working in desperately low coverage situations. | Boolean value Yes/No | Yes | ||||||||||||
-fnicpst | boolean | If set to yes, mira will be forced to make a choice for a consensus base (A, C, G, T or gap) even in unclear cases where it would normally put an IUPAC base. All other things being equal (like the quality of the possible consensus base), mira will choose a base by either looking for a majority vote or, if that also is not clear, by preferring gaps over T over G over C over finally A. | Boolean value Yes/No | No | ||||||||||||
-msr | boolean | Can only be used in mapping assemblies. If set to yes, mira will merge all perfectly mapping Solexa reads into longer reads while keeping quality and coverage information intact. This feature hugely reduces the number of Solexa reads and makes assembly results with Solexa data small enough to be handled by current finishing programs (gap4, consed, others) on normal workstations. | Boolean value Yes/No | No | ||||||||||||
-gor | integer | Gap override ratio | Integer 0 or more | 66 | ||||||||||||
-ace | boolean | Once contigs have been built, mira can call a built-in version of the automatic contig editor EdIt. EdIt will try to resolve discrepancies in the contig by performing trace analysis and correct even hard to resolve errors. This option is always useful, but especially in conjunction with -nop and -ure. Notice: the current development version has a memory leak in the editor, therefore the option is not automatically turned on. | Boolean value Yes/No | No | ||||||||||||
-[no]sem | boolean | If set to 'Y' the automatic editor will not take error hypotheses with a low probability into account, even if all the requirements to make an edit are fulfilled. | Boolean value Yes/No | Yes | ||||||||||||
-ct | integer | The higher this value, the more strict the automatic editor will apply its internal rule set. Going below 40 is not recommended. | Integer from 1 to 100 | 50 | ||||||||||||
-outproject | string | Default is mira. Defines the output project name for this assembly. The output project name automatically influences the name of output files or directories only | Any string | $(project) | ||||||||||||
-sssip | boolean | Controls whether ’unimportant’ singlets are written to the result files. | Boolean value Yes/No | No | ||||||||||||
-[no]stsip | boolean | Controls whether singlets which have certain tags (SRMr, CRMr, WRMr, SROr, SAOr, SIOr) are written to the result files, even if [-sssip] is set. | Boolean value Yes/No | Yes | ||||||||||||
-[no]rrol | boolean | Removes log files once they should not be needed anymore during the assembly process. | Boolean value Yes/No | Yes | ||||||||||||
-rld | boolean | Removes the complete log directory at the end of the assembly process. Some logs contain useful information that you may want to analyse though. | Boolean value Yes/No | No | ||||||||||||
-[no]orc | boolean | Output CAF results | Boolean value Yes/No | Yes | ||||||||||||
-[no]orf | boolean | Output FASTA results | Boolean value Yes/No | Yes | ||||||||||||
-org | boolean | Output GAP4DA results | Boolean value Yes/No | No | ||||||||||||
-[no]ora | boolean | Output phrap ACE results | Boolean value Yes/No | Yes | ||||||||||||
-orh | boolean | Output HTML results | Boolean value Yes/No | No | ||||||||||||
-[no]ors | boolean | Output transposed contig summary results | Boolean value Yes/No | Yes | ||||||||||||
-ort | boolean | Output simple text results | Boolean value Yes/No | No | ||||||||||||
-[no]orw | boolean | Output wiggle results | Boolean value Yes/No | Yes | ||||||||||||
-[no]otc | boolean | Output temporary CAF results | Boolean value Yes/No | Yes | ||||||||||||
-otm | boolean | Output temporary MAF results | Boolean value Yes/No | No | ||||||||||||
-otf | boolean | Output temporary FASTA results | Boolean value Yes/No | No | ||||||||||||
-otg | boolean | Output temporary GAP4 results | Boolean value Yes/No | No | ||||||||||||
-ota | boolean | Output temporary phrap ACE results | Boolean value Yes/No | No | ||||||||||||
-oth | boolean | Output temporary HTML results | Boolean value Yes/No | No | ||||||||||||
-ots | boolean | Output temporary transposed contig summary results | Boolean value Yes/No | No | ||||||||||||
-ott | boolean | Output temporary text results | Boolean value Yes/No | No | ||||||||||||
-oetc | boolean | Output extra temporary CAF results | Boolean value Yes/No | No | ||||||||||||
-oetf | boolean | Output extra temporary FASTA results | Boolean value Yes/No | No | ||||||||||||
-oetg | boolean | Output extra temporary GAP4DA results | Boolean value Yes/No | No | ||||||||||||
-oeta | boolean | Output extra temporary phrap ACE results | Boolean value Yes/No | No | ||||||||||||
-oeth | boolean | Output extra temporary HTML results | Boolean value Yes/No | No | ||||||||||||
-oetas | boolean | Output extra temporary also singlets results | Boolean value Yes/No | No | ||||||||||||
-tcpl | integer | When producing an output in text format (-ort|ott|oett), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||||
-hcpl | integer | When producing an output in HTML format (-orh|oth|oeth), this parameter defines how many bases each line of an alignment should contain. | Integer 1 or more | 60 | ||||||||||||
-tegfc | string | When producing an output in text format (-ort|ott|oett), endgaps are filled up with this character. | Any string | |||||||||||||
-hegfc | string | When producing an output in HTML format (-orh|oth|oeth), endgaps are filled up with this character. | Any string | |||||||||||||
-[no]sdlpo | boolean | Defines whether the spoiler detection algorithms are run only for the last pass or for all passes (-nop). Takes effect only if spoiler detection (-sd) is on. | Boolean value Yes/No | Yes | ||||||||||||
-tpae | boolean | This option is useful in EST assembly. Poly-AT stretches at the end of reads that were not correctly masked or clipped in pre-processing steps from external programs get tagged here. The assembler will not use these stretches for critical operations. Additionally, the tags do provide a good visual anchor when looking at the assembly with different programs. | Boolean value Yes/No | No | ||||||||||||
-pbwl | integer | Only takes effect when -tpae is set to 'Y'. Defines the window length within which all bases (except the maximum number of errors allowed) must be either A or T to be considered a polybase stretch. | Integer 1 or more | 7 | ||||||||||||
-pbwme | integer | Only takes effect when -tpae is set to 'Y'. Defines the maximum number of errors allowed in a given window length such that a stretch is considered to be a polybase stretch. The distribution of these errors is not important. | Integer 1 or more | 2 | ||||||||||||
-pbwgd | integer | Only takes effect when -tpae is set to 'Y'. Defines the number of bases from the end of a sequence (if masked, from the end of the masked area) within which a polybase stretch is looked for without finding one. | Integer 1 or more | 9 | ||||||||||||
-[no]pvc | boolean | Mira will try to identify possible sequencing vector relicts present at the start of a sequence and clip them away. These relicts are usually a few bases long and were not correctly removed from the sequence in data pre-processing steps of external programs. You might want to turn off this option if you know (or think) that your data contains a lot of repeats and the option below to fine tune the clipping behaviour does not give the expected results. | Boolean value Yes/No | Yes | ||||||||||||
-pvcmla | integer | The clipping of possible vector relicts option works quite well. Unfortunately the bounds of repeats or differences in EST splice variants sometimes show the same alignment behaviour as possible sequencing vector relicts and could therefore also be clipped. To stop the vector clipping from mistakenly clipping repetitive regions or EST splice variants, this option puts an upper bound to the number of bases a potential clip is allowed to have. If the number of bases is below or equal to this threshold then the bases are clipped. If the number of bases exceeds the threshold then the clip is NOT performed. Setting the value to 0 turns off the threshold i.e. clips are then always performed if a potential vector is found. | Integer 0 or more | 18 | ||||||||||||
-qc | boolean | Default is 'N', but is automatically set to 'Y' when using the setparam options 'fasta' or 'phd' (can be turned off again by subsequent options afterwards). This will let mira perform its own quality clipping before sequences are entered into the assembly. The clip function performed is a sequence end window quality clip with back iteration to get a maximum number of bases as useful sequence. Note that the bases clipped away here can still be used afterwards if there is enough evidence supporting their correctness when the option -ure is turned on. | Boolean value Yes/No | No | ||||||||||||
-an | list | When adding reads to a contig, dangerous regions can get an extra integrity check. none = no extra check. text = check is only text-based. signal = check is signal-based; if the SCF trace is not available, the fallback is 'text'. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | | signal | ||||||||||||
-dmer | integer | When adding reads to a contig, reject the reads if the error in zones known as dangerous exceeds the given value in %. Lower values mean stricter checking in these danger zones. For the time being, only regions tagged as ALUS or REPT in the experiment file are considered dangerous. | Integer from 1 to 100 | 1 | ||||||||||||
-dismin | integer | The minimum distance that read pairs may be apart. There is an additional error margin of 10% subtracted from this value during internal computations. | Integer 0 or more | 500 | ||||||||||||
-dismax | integer | The maximum distance that read pairs may be apart. There is an additional error margin of 10% added to this value during internal computations. | Integer 0 or more | 5000 | ||||||||||||
-oett | boolean | Output extra temporary TXT results | Boolean value Yes/No | No | ||||||||||||
-gapfda | string | Defines the extension of the directory where mira will write the result of an assembly ready to import into the Staden package (GAP4) in Direct Assembly format. The name of the directory will then be <projectname>_.<extension> | Any string | gap4da | ||||||||||||
-log | string | Defines the directory where mira will write some log files to. Note that the name of the actual project will be prepended. | Any string | miralog | ||||||||||||
-co | string | Defines the file in CAF format to save an assembled project to. Filename must end with '.caf'. | Any string | mira_out.caf | ||||||||||||
Associated qualifiers | ||||||||||||||||
"-expdir" associated directory qualifiers | ||||||||||||||||
-extension | string | Default file extension | Any string | |||||||||||||
"-scfdir" associated directory qualifiers | ||||||||||||||||
-extension | string | Default file extension | Any string | |||||||||||||
General qualifiers | ||||||||||||||||
-auto | boolean | Turn off prompts | Boolean value Yes/No | N | ||||||||||||
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N | ||||||||||||
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N | ||||||||||||
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N | ||||||||||||
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N | ||||||||||||
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y | ||||||||||||
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N | ||||||||||||
-warning | boolean | Report warnings | Boolean value Yes/No | Y | ||||||||||||
-error | boolean | Report errors | Boolean value Yes/No | Y | ||||||||||||
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y | ||||||||||||
-die | boolean | Report dying program messages | Boolean value Yes/No | Y | ||||||||||||
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
|
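The threshold rule described for -pvcmla and the 10% error margin described for -dismin and -dismax can be illustrated with a minimal Python sketch. It is not taken from the MIRA source: the function names `should_clip` and `pair_distance_window` are hypothetical, and the sketch assumes that the 10% margin is computed from the option value itself.

```python
def should_clip(candidate_len: int, pvcmla: int = 18) -> bool:
    """Return True if a potential vector relict of candidate_len bases is clipped.

    Hypothetical helper mirroring the -pvcmla description:
    0 switches the threshold off, otherwise clip only up to the threshold.
    """
    if pvcmla == 0:
        # Threshold disabled: a potential vector is always clipped.
        return True
    # Clip when the number of bases is below or equal to the threshold.
    return candidate_len <= pvcmla


def pair_distance_window(dismin: int = 500, dismax: int = 5000,
                         margin_percent: int = 10) -> tuple[int, int]:
    """Return the (lower, upper) read pair distances after applying the margin.

    Assumption: the margin is a percentage of each option value,
    subtracted from -dismin and added to -dismax.
    """
    lower = dismin - dismin * margin_percent // 100
    upper = dismax + dismax * margin_percent // 100
    return lower, upper


if __name__ == "__main__":
    print(should_clip(18), should_clip(19), should_clip(200, pvcmla=0))
    print(pair_distance_window())
```

Under these assumptions, the default values of 500 and 5000 would give an effective window of 450 to 5500 bases.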
This directory contains output files.
Program name | Description |
---|---|
emiraest | MIRAest fragment assembly program |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.