From ame at esbs.u-strasbg.fr Fri Feb 2 05:47:49 2007
From: ame at esbs.u-strasbg.fr (Jean-Christophe AME)
Date: Fri, 2 Feb 2007 11:47:49 +0100
Subject: [EMBOSS] Restriction fragment sequences
Message-ID: <291324FA-79E7-437D-98C7-95FC49D92F2D@esbs.u-strasbg.fr>

Hello,

I have a question about DNA restriction fragment analysis: is there a way to generate the actual sequences of the restriction fragments produced by restrict or remap? This would make the in silico construction of recombinant plasmids a simple cut and paste. Perhaps there is some way to do this automatically (there was CloneIt, but it doesn't work).

Thanks

JC
________________________
Jean-Christophe Amé, PhD
Département Intégrité du Génome, UMR 7175-LC1 du CNRS
École Supérieure de Biotechnologie de Strasbourg
Pôle API Parc d'innovation, Boulevard Sébastien Brant
BP 10413
67412 ILLKIRCH CEDEX
France
tel.: +33 (0)3 90 24 47 05
Fax.: +33 (0)3 90 24 46 86
http://parplink.u-strasbg.fr
http://idg.u-strasbg.fr/
« Science without conscience is but the ruin of the soul ... » (François Rabelais, 1483-1553)

From pmr at ebi.ac.uk Fri Feb 2 06:28:55 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 02 Feb 2007 11:28:55 +0000
Subject: [EMBOSS] Restriction fragment sequences
In-Reply-To: <291324FA-79E7-437D-98C7-95FC49D92F2D@esbs.u-strasbg.fr>
References: <291324FA-79E7-437D-98C7-95FC49D92F2D@esbs.u-strasbg.fr>
Message-ID: <45C32077.2060907@ebi.ac.uk>

Jean-Christophe AME wrote:
> Hello,
>
> I have a question concerning DNA restriction fragment analysis : Is
> there a way to generate the actual sequence of the restriction
> fragment generated by restrict or remap, this is to facilitate the in
> silico construction of recombinant plasmid just with a cut and paste.
> May there are some ways do this automatically (there was CloneIt but
> it doesn't work).

Interesting suggestion. You really need a nucleotide version of digest (or restrict with the fragment start/end and sizes reported instead of the cut sites).
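Once the cut positions are known, the fragment sequences themselves are straightforward to extract. The Python below is a minimal, hypothetical sketch and not EMBOSS code. It assumes each cut is given as the 1-based position of the last top-strand base before the cut (restrict's real output layout is not reproduced here). It ignores sticky-end overhangs and handles linear or circular molecules, reporting each fragment's start, end and sequence, which is the information a nucleotide version of digest would need to provide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fragment:
    start: int  # 1-based first base
    end: int    # 1-based last base; end < start means the fragment spans the origin
    seq: str


def digest(seq: str, cuts: list[int], circular: bool = False) -> list[Fragment]:
    """Split seq at top-strand cut positions and return the fragments.

    Hypothetical convention for this sketch: each cut c is the 1-based
    position of the last base before the cut. Overhangs are ignored.
    """
    n = len(seq)
    # A cut after the last base only separates anything in a circular molecule.
    limit = n if circular else n - 1
    pos = sorted({c for c in cuts if 1 <= c <= limit})
    if not pos:
        return [Fragment(1, n, seq)]  # uncut

    if not circular:
        bounds = [0] + pos + [n]
        return [Fragment(a + 1, b, seq[a:b]) for a, b in zip(bounds, bounds[1:])]

    frags = []
    for i, a in enumerate(pos):
        b = pos[(i + 1) % len(pos)]
        if b > a:
            frags.append(Fragment(a + 1, b, seq[a:b]))
        else:
            # The last fragment (or the only one) runs through the origin.
            frags.append(Fragment(a % n + 1, b, seq[a:] + seq[:b]))
    return frags
```

For example, with the made-up sequence AAAGAATTCCCC and a single cut after the G at position 4 (where EcoRI, G^AATTC, cuts the top strand), digest returns fragments 1-4 (AAAG) and 5-12 (AATTCCCC). If the same sequence is treated as circular, the single cut gives one linearised fragment, 5-4 (AATTCCCCAAAG).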
With the command-line option -rformat listfile you could then use seqret, with @filename as input, to return the sequences. Unfortunately, if you do that with restrict you only get the restriction sites.

We will add a new application to the next release.

regards,

Peter Rice

From David.Bauer at SCHERING.DE Fri Feb 2 07:40:08 2007
From: David.Bauer at SCHERING.DE (David.Bauer at SCHERING.DE)
Date: Fri, 2 Feb 2007 13:40:08 +0100
Subject: [EMBOSS] Reply: Re: Restriction fragment sequences
In-Reply-To: <45C32077.2060907@ebi.ac.uk>
Message-ID:

Hello Peter,

emboss-bounces at lists.open-bio.org wrote on 02/02/2007 12:28:55:

> Jean-Christophe AME wrote:
> > Hello,
> >
> > I have a question concerning DNA restriction fragment analysis : Is
> > there a way to generate the actual sequence of the restriction
> > fragment generated by restrict or remap, this is to facilitate the in
> > silico construction of recombinant plasmid just with a cut and paste.
> > May there are some ways do this automatically (there was CloneIt but
> > it doesn't work).
>
> Interesting suggestion. You really need a nucleotide version of digest (or
> restrict with the fragment start/end and sizes reported instead of
> the cut sites).

It would be really good to have this functionality in EMBOSS. But I guess we need more than just the fragments here. If we want to ligate a fragment into a vector, then we also need information about the restriction-site ends (blunt, 5' overhang, 3' overhang).

So I could imagine two applications. The first would create the fragments and a description file with information about the fragments and their ends (or perhaps this information could go in the FASTA description line of each fragment). The second would allow you to select particular fragments (from the description file) and then ligate them in silico. Optionally it should allow end modifications like Klenow fill-in.
So this application would not just concatenate the fragments but simulate the ligation reaction.

Cheers,

David.

From kehayden at gmail.com Thu Feb 8 17:17:52 2007
From: kehayden at gmail.com (Karen Hayden)
Date: Thu, 8 Feb 2007 17:17:52 -0500
Subject: [EMBOSS] needle question!
Message-ID: <54f810780702081417v4f3e7d84v16c40182d06628aa@mail.gmail.com>

Hello,

I am currently using needle to generate an alignment between two sequences that contain non-informative bases (i.e., bases identified as low quality from their phred scores, which have been changed to "N"). Presently, these bases are penalized like any other non-matching character. Is there any way to make needle "overlook" these bases when generating the best-scoring alignment (or do I need to write my own version of needle)?

Thank you in advance for any advice you can offer!

Best regards,
Karen

From pmr at ebi.ac.uk Thu Feb 8 18:15:32 2007
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 8 Feb 2007 23:15:32 -0000 (GMT)
Subject: [EMBOSS] needle question!
In-Reply-To: <54f810780702081417v4f3e7d84v16c40182d06628aa@mail.gmail.com>
References: <54f810780702081417v4f3e7d84v16c40182d06628aa@mail.gmail.com>
Message-ID: <37423.86.134.64.112.1170976532.squirrel@webmail.ebi.ac.uk>

Dear Karen,

> I am currently using needle to generate an alignment between two
> sequences which contain non-informative bases (ie, identified low
> quality bases (phred scores) and have been changed to "N").
> Presently, these bases are penalized as any other non-matching
> character. Is there any way to change needle to "overlook" these
> bases when generating the best scoring alignment (or, do I need to
> write my own version of needle?)

There are two matrix files for nucleotide comparisons. The default is EDNAFULL, which scores N as the average of all possible scores (1 match against 3 possible mismatches).

The alternative is EDNAMAT, which only scores exact matches, like blastn (use -data EDNAMAT on the command line to see the difference).
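The "average of all possible scores" rule comes down to simple arithmetic. This Python sketch assumes that EDNAFULL's unambiguous bases score +5 for a match and -4 for a mismatch, so compare these values with the copy you fetch with embossdata. Matrix files hold whole-number scores, so the N entries in the file should be rounded versions of these averages. For comparison, the sketch also computes the per-code N scores proposed for a local EDNAPHRED matrix, where N earns one point for each base the other code can represent.

```python
from itertools import product

# IUPAC nucleotide codes and the concrete bases each stands for (U read as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Assumed EDNAFULL scores for unambiguous bases; check your fetched copy.
MATCH, MISMATCH = 5, -4


def average_score(x: str, y: str) -> float:
    """Mean match/mismatch score over every base pair the two codes allow."""
    pairs = list(product(IUPAC[x], IUPAC[y]))
    return sum(MATCH if a == b else MISMATCH for a, b in pairs) / len(pairs)


def phred_n_score(code: str) -> int:
    """Proposed EDNAPHRED-style score for N against code: one point per base it covers."""
    return len(IUPAC[code])
```

Here average_score("N", "A") is (5 - 4 - 4 - 4) / 4 = -1.75. N against N also averages -1.75, since 4 of its 16 base pairs match and 12 mismatch. By contrast, phred_n_score gives 1, 2, 3 and 4 for A, R, B and N respectively.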
But you can also copy EDNAFULL to your local directory with

embossdata EDNAFULL -fetch
mv EDNAFULL EDNAPHRED

(it is best to do this rename, or you will accidentally be using this file by default for other needle runs in the same directory)

Edit EDNAPHRED to have the scores you want for N (perhaps +1 for a small match to ACGTU, +2 for a match to a 2-base code RYSWKM, +3 for a match to a 3-base code BDHV and +4 for a match to another N).

Then run with:

needle -data EDNAPHRED

If enough users think this is a meaningful scoring system, we could add such a matrix to the distribution. Let us know if it really gives you more useful scores. My natural prejudice is to trust EDNAFULL. I guess you are expecting the base in the other sequence to often be the one phred started with, which will indeed bias the scoring.

Hope this helps,

Peter

From prateek.vit at gmail.com Fri Feb 9 04:37:02 2007
From: prateek.vit at gmail.com (prateek singh yadav)
Date: Fri, 9 Feb 2007 15:07:02 +0530
Subject: [EMBOSS] problem in JEMBOSS installation
Message-ID:

Hi all,

For two days I have been trying to install JEMBOSS on an EL system. I have configured apache-tomcat and Axis (SOAP).
When I run the script ./install-jemboss-server.sh everything runs fine, but at the end it shows errors, which I am pasting below:

File /usr/local/emboss/share/EMBOSS/jemboss/org/emboss/jemboss/server/JembossFileServer.java is missing
------------------------
mv: cannot stat `/usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties': No such file or directory
touch: cannot touch `/usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties': No such file or directory
./install-jemboss-server.sh: line 401: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 403: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 404: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 407: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 414: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 415: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 418: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 419: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 420: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 421: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 425: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
./install-jemboss-server.sh: line 426: /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties: No such file or directory
cp: cannot stat `/usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties': No such file or directory
Changed /usr/local/emboss/share/EMBOSS/jemboss/resources/jemboss.properties to reflect this installation (original in jemboss.properties.orig)
sed: can't read /usr/local/emboss/share/EMBOSS/jemboss/runJemboss.sh: No such file or directory
mv: cannot stat `/usr/local/emboss/share/EMBOSS/jemboss/runJemboss.sh': No such file or directory
mv: cannot stat `/usr/local/emboss/share/EMBOSS/jemboss/resources': No such file or directory
adding: META-INF/ (in=0) (out=0) (stored 0%)
adding: META-INF/MANIFEST.MF (in=56) (out=56) (stored 0%)
org/emboss/jemboss/parser/Ajax.*: No such file or directory
Error adding org/emboss/jemboss/parser/Ajax.* to jar archive!

Tomcat XML deployment descriptors have been created for the Jemboss Server.
Would you like an automatic deployment of the Jemboss web services to be tried (y/n) [y]?

I am totally confused about what to do. Can anyone help me with this?

regards,

Prateek
--
Prateek Singh
3rd year Bioinformatics (BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com

From maoj at helix.nih.gov Mon Feb 12 12:56:11 2007
From: maoj at helix.nih.gov (Jean Mao)
Date: Mon, 12 Feb 2007 12:56:11 -0500
Subject: [EMBOSS] question about 'fuzznuc'and 'urzpro'
Message-ID: <000001c74ecf$15374910$be4de780@CIT.NIH.GOV>

Hi,

I know I can give a pattern like 'ACCGGT' and search against a file which contains multiple sequences. Is there a way I can specify a 'pattern file' which contains multiple patterns that I want to search for, instead of just one pattern each time? For example, I have a fileA which contains multiple DNA sequences. I want to create a fileB which contains 20 patterns and search for each of them in the sequences in fileA. We are in the transition from GCG to EMBOSS.
The GCG program 'findpatterns' can do this, but I couldn't find a corresponding EMBOSS program that does the same thing.

Thank you in advance.

Jean Mao
Helix Staff
CIT, NIH

From pmr at ebi.ac.uk Mon Feb 12 13:54:27 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 12 Feb 2007 18:54:27 +0000
Subject: [EMBOSS] question about 'fuzznuc'and 'urzpro'
In-Reply-To: <000001c74ecf$15374910$be4de780@CIT.NIH.GOV>
References: <000001c74ecf$15374910$be4de780@CIT.NIH.GOV>
Message-ID: <45D0B7E3.4060507@ebi.ac.uk>

Hi Jean,

> I know I can give a pattern like 'ACCGGT' and search against a file which
> contains multiple sequences. Is there a way I can specify a 'pattern file'
> which contains multiple patterns that I want to search for instead of just
> one pattern each time? For example, I have a fileA which contains multiple
> DNA sequences. I want to create a fileB which contains 20 patterns that I
> want to seach each of them against the sequences in the fileA. We are in the
> transition from GCG to EMBOSS. And the program 'findpatterns' in GCG can do
> this. But I couldn't find corresponding emboss program that does the same
> thing.

This is new in EMBOSS 4.0.0, contributed by Henrikki Almusa of Medicel in Helsinki. fuzznuc (and fuzzpro and fuzztran) can now read in a file of patterns with the command-line syntax:

fuzznuc @patternfile

You can also use @patternfile in response to the prompt for a pattern.

Here is an example pattern file with FASTA-style IDs and mismatch counts for each pattern:

>pat1
cggccctaaccctagcccta
>pat2
cg(2)c(3)taac
cctagc(3)ta
>pat3
cggc{2,4}taac{2,5}

Here is a file with just the second pattern and no name (it will default to pattern1):

cg(2)c(3)taac
cctagc(3)ta

You can set a default name with -pname and a default mismatch with -pmismatch.

I note we could document this better in the fuzz* program manual entries. We will do so for the 4.1 release.
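To illustrate the repeat notation in these patterns, here is a small Python sketch that rewrites the PROSITE-style x(n) and x(n,m) counts as a Python regular expression. It is only an illustration, not fuzznuc's parser. It accepts only literal bases with an optional count, and it rejects the {n,m} form used in pat3 rather than guessing its meaning. It also does not implement fuzznuc's mismatch-tolerant matching.

```python
import re

# One element: a base letter, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_repeats_to_regex(pattern: str) -> str:
    """Convert e.g. 'cg(2)c(3)taac' into the regex 'cg{2}c{3}taac'.

    Only literal bases with optional (n) or (n,m) counts are supported;
    anything else raises ValueError instead of being guessed at.
    """
    out = []
    pos = 0
    while pos < len(pattern):
        m = ELEMENT.match(pattern, pos)
        if not m:
            raise ValueError(f"unsupported syntax at {pos}: {pattern[pos:]!r}")
        base, lo, hi = m.groups()
        out.append(re.escape(base.lower()))
        if hi is not None:
            out.append(f"{{{lo},{hi}}}")   # x(n,m) -> x{n,m}
        elif lo is not None:
            out.append(f"{{{lo}}}")        # x(n)   -> x{n}
        pos = m.end()
    return "".join(out)
```

Joining the two lines of pat2 and converting them gives cg{2}c{3}taaccctagc{3}ta. This matches exactly the 20 bases written out literally as pat1, so those two example patterns describe the same site. The output is lower case, so search lower-case sequence or pass re.IGNORECASE.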
Hope that helps,

Peter

From kehayden at gmail.com Mon Feb 12 14:13:35 2007
From: kehayden at gmail.com (Karen Hayden)
Date: Mon, 12 Feb 2007 14:13:35 -0500
Subject: [EMBOSS] needle question!
In-Reply-To: <54f810780702081533h48a44630q4d1e1b0d44b67205@mail.gmail.com>
References: <54f810780702081417v4f3e7d84v16c40182d06628aa@mail.gmail.com> <37423.86.134.64.112.1170976532.squirrel@webmail.ebi.ac.uk> <54f810780702081533h48a44630q4d1e1b0d44b67205@mail.gmail.com>
Message-ID: <54f810780702121113n17dc6527x750731f27827d0a1@mail.gmail.com>

I have an additional question about needle, as I would like to actually remove non-informative bases from the final alignment score. For example, if the sequences are:

-CATTCNNNCA-
-CATTCAAACA-

With the suggested matrix weight changes I would expect to see 100% similarity over 10/10 bases. However, it is more informative to me to see 100% similarity over 7/7 bases (with N no longer contributing to my alignment score). One could imagine an artificial inflation of the similarity score if the entire length is used to generate it. For example, if 100 bases were aligned to a 100 bp sequence containing 10 "N"s, and 5 of the bases were informative mismatches:

Needle would currently provide: 95/100 (or simply 95% similarity)
But the answer needed would be: 85/90 (or 94.4% similarity)

Does this make sense?

Thank you in advance for any help you can offer!

Karen

On 2/8/07, Karen Hayden wrote:
> Hey Peter,
> That was absolutely perfect. Thank you!
>
> Best wishes,
> Karen

--
Karen E. Hayden
Starving Graduate Student
Duke University
Durham, NC 27708

From jison at ebi.ac.uk Mon Feb 12 17:48:00 2007
From: jison at ebi.ac.uk (Jon Ison)
Date: Mon, 12 Feb 2007 22:48:00 -0000 (GMT)
Subject: [EMBOSS] needle question!
In-Reply-To: <54f810780702121113n17dc6527x750731f27827d0a1@mail.gmail.com>
References: <54f810780702081417v4f3e7d84v16c40182d06628aa@mail.gmail.com> <37423.86.134.64.112.1170976532.squirrel@webmail.ebi.ac.uk> <54f810780702081533h48a44630q4d1e1b0d44b67205@mail.gmail.com> <54f810780702121113n17dc6527x750731f27827d0a1@mail.gmail.com>
Message-ID: <49212.84.92.187.247.1171320480.squirrel@webmail.ebi.ac.uk>

Hi Karen

If I understand you correctly, you want 'N' bases to be totally "invisible" during the generation and scoring of the alignment.

Scoring the alignment in the way you describe would, I think, require some (probably trivial) reprogramming of needle, via a new "advanced" option. This could be done for the next release or sooner; how urgently do you need it?

Scoring in the way you describe is a reasonable thing to do, but if N is not to contribute to the score, it should not contribute to the alignment either. So you'd need to adjust the scoring matrix so that all substitutions involving N are neutral, I guess by specifying a value of zero for them.

Just my two penneth :)

Cheers

Jon
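Karen's proposed measure can be stated precisely in a few lines. This Python sketch is not needle's code and does not change how the alignment is built (that still depends on the matrix, as Jon notes). It takes the two rows of a finished alignment, skips every column in which either row has an N, and counts identities over the remaining columns. Treating gap columns as compared but not identical is an assumption of the sketch.

```python
from __future__ import annotations


def identity_excluding_n(a: str, b: str, ignore: str = "N") -> tuple[int, int]:
    """Return (identical, compared) over alignment columns that contain no N.

    a and b are the two rows of one pairwise alignment, gaps written as '-'.
    Columns where either row has a character from `ignore` are skipped.
    Assumption: gap columns are still compared and never count as identical.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in ignore or y in ignore:
            continue  # non-informative column: neither numerator nor denominator
        compared += 1
        if x == y and x != "-":
            identical += 1
    return identical, compared
```

For CATTCNNNCA against CATTCAAACA, the function returns (7, 7). For the 100-base case with 10 Ns and 5 real mismatches, it returns (85, 90), or 94.4%. Counting the N positions as matches would instead give 95/100.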
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From pmr at ebi.ac.uk Tue Feb 13 05:03:21 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 13 Feb 2007 10:03:21 +0000
Subject: [EMBOSS] question about 'fuzznuc'and 'urzpro'
In-Reply-To: <416B9A34D7CA1C4C9ED58354E75101BB02622856@NIHCESMLBX3.nih.gov>
References: <000001c74ecf$15374910$be4de780@CIT.NIH.GOV> <45D0B7E3.4060507@ebi.ac.uk> <416B9A34D7CA1C4C9ED58354E75101BB02622856@NIHCESMLBX3.nih.gov>
Message-ID: <45D18CE9.1070804@ebi.ac.uk>

Hi Jean,

I copied this reply to the list, as it includes poorly documented features and some suggestions for the future.

> It's great to know it can be done! I do have further questions. So in the
> pattern file that has no name and contains two lines, you said it's going to
> default to pattern 1. Does that means that without the '>', everything will
> be concatenated and treated as one pattern?

Yes. We did include a -pformat qualifier to set the format of the pattern file, so in future we can extend it to allow one pattern per line.

> Actually I should ask what's the difference between
>
> >pat2
> cg(2)c(3)taac
> cctagc(3)ta
>
> and
>
> >pat2
> cg(2)c(3)taaccctagc(3)ta

They are the same: pattern lines are simply joined together until the next pattern header (>pat3) is found.

> also what's the difference between a file containing
> >pat2
> cg(2)c(3)taac
> cctagc(3)ta
> with a file containing
> cg(2)c(3)taac
> cctagc(3)ta

The first allows one mismatch in matching the pattern.
These patterns work with the HHTETRA entry we use for the example in the program manual (accession number L46634):

>HHTETRA L46634.1 Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
aagcttaaactgaggtcacacacgactttaattacggcaacgcaacagctgtaagctgca
ggaaagatacgatcgtaagcaaatgtagtcctacaatcaagcgaggttgtagacgttacc
tacaatgaactacacctctaagcataacctgtcgggcacagtgagacacgcagccgtaaa
ttcaaaactcaacccaaaccgaagtctaagtctcaccctaatcgtaacagtaaccctaca
actctaatcctagtccgtaaccgtaaccccaatcctagcccttagccctaaccctagccc
taaccctagctctaaccttagctctaactctgaccctaggcctaaccctaagcctaaccc
taaccgtagctctaagtttaaccctaaccctaaccctaaccatgaccctgaccctaaccc
tagggctgcggccctaaccctagccctaaccctaaccctaatcctaatcctagccctaac
cctagggctgcggccctaaccctagccctaaccctaaccctaaccctagggctgcggccc
taaccctaaccctagggctgcggcccgaaccctaaccctaaccctaaccctaaccctagg
gctgcggccctaaccctaaccctagggctgcggccctaaccctaaccctagggctgcggc
ccgaaccctaaccctaaccctaaccctagggctgcggccctaaccctaaccctagggctg
cggccctaaccctaaccctaactctagggctgcggccctaaccctaaccctaaccctaac
cctagggctgcggcccgaaccctagccctaaccctaaccctgaccctgaccctaacccta
accctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacccta
accctaaccctaaccctaaccctaaccccgcccccactggcagccaatgtcttgtaatgc
cttcaaggcactttttctgcgagccgcgcgcagcactcagtgaaaaacaagtttgtgcac
gagaaagacgctgccaaaccgcagctgcagcatgaaggctgagtgcacaattttggcttt
agtcccataaaggcgcggcttcccgtagagtagaaaaccgcagcgcggcgcacagagcga
aggcagcggctttcagactgtttgccaagcgcagtctgcatcttaccaatgatgatcgca
agcaagaaaaatgttctttcttagcatatgcgtggttaatcctgttgtggtcatcactaa
gttttcaagctt

> Also could you explain how to use -pname and -pmismatch?
> I don't understand this part at all :-P Thank you very much!

Ah ... they are associated qualifiers (like -sformat, -sbegin and -send for sequences, -osformat for sequence output, -aformat for alignments and -rformat for reports). They only show up if you use -help -verbose to see the help. This caused some problems for fuzznuc users with release 4.0.0, which replaced the previous version that had a -mismatch option and only read one pattern.
-pmismatch sets a default number of mismatches for all patterns (which you can override within the pattern file).

-pname sets a pattern name for the output (something that was missing before). Oops, we have a bug ... the name is being ignored in fuzznuc. This will be fixed in 4.1.0.

-pformat sets the pattern file format. So far this is ignored, so we have not documented pattern file format names. I think a file with one line for each pattern, with numbering 1, 2, 3 added to the pattern name, would be useful. We could call the formats "simple" (one line per pattern) and "fasta" (the current format with names).

Oops, another bug: a bad pattern file name is not being caught. This is fixed in 4.1.0.

We also added files of regular expressions used by dreg and preg, so you can also use them for pattern searches (it depends on whether you prefer prosite-style patterns or regular expressions; I find the prosite style for fuzznuc much easier). We can use the same file formats for them.

I have to check the original pattern file code from Henrikki Almusa to see whether we lost anything in the naming and formats.
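The pattern-file rules described in this thread can be summarised in a short reader. A FASTA-style header starts a pattern, and following lines are joined until the next header. A file without a header yields one unnamed pattern, and -pname and -pmismatch supply defaults. This Python sketch is a hypothetical reconstruction of that behaviour, not the EMBOSS parser. It takes the first word after '>' as the name and numbers unnamed patterns as default_name plus an index, matching the "pattern1" default mentioned above. Every pattern gets the default mismatch, because the syntax for per-pattern mismatch counts is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    pattern: str
    mismatch: int


def read_patterns(text: str, default_name: str = "pattern",
                  default_mismatch: int = 0) -> list[Pattern]:
    """Parse a FASTA-style pattern file as described in the thread.

    - '>name ...' starts a new pattern; the first word after '>' is the name.
    - Following non-empty lines are concatenated (surrounding whitespace stripped).
    - Unnamed patterns are called default_name + 1-based index, e.g. 'pattern1'.
    - Every pattern gets default_mismatch; per-pattern overrides are not parsed.
    """
    patterns: list[Pattern] = []
    name = None
    parts: list[str] = []

    def flush() -> None:
        # Store the pattern collected so far, if it has any pattern lines.
        if parts:
            label = name or f"{default_name}{len(patterns) + 1}"
            patterns.append(Pattern(label, "".join(parts), default_mismatch))

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            fields = line[1:].split()
            name = fields[0] if fields else None
            parts = []
        else:
            parts.append(line)
    flush()
    return patterns
```

Read this way, the two-line >pat2 entry and the one-line version produce the same pattern string, cg(2)c(3)taaccctagc(3)ta. The headerless file produces a single pattern named pattern1.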
Hope that helps,

Peter

From maoj at helix.nih.gov Wed Feb 14 10:53:09 2007
From: maoj at helix.nih.gov (Jean Mao)
Date: Wed, 14 Feb 2007 10:53:09 -0500
Subject: [EMBOSS] problem compiling EMBOSS-4.0.0
Message-ID: <000501c75050$39ef0540$be4de780@CIT.NIH.GOV>

Based on my conversation with Peter below, I grabbed that patch, put it into $EMBOSS/ajax, ran make clean and make install, and got the following this time. Thank you for your help:

gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"4.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_UNISTD_H=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I. -DAJAX_FIXED_ROOT=\"/usr/local/EMBOSS-4.0.0.fix1-23/emboss\" -DPREFIX=\"/usr/local/EMBOSS-4.0.0.fix1-23\" -I../plplot -DPOSIX_MALLOC_THRESHOLD=10 -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT ajreport.lo -MD -MP -MF .deps/ajreport.Tpo -c ajreport.c -fPIC -DPIC -o .libs/ajreport.o
ajreport.c:2375: error: conflicting types for 'ajReportWrite'
ajreport.h:122: error: previous declaration of 'ajReportWrite' was here
ajreport.c:2375: error: conflicting types for 'ajReportWrite'
ajreport.h:122: error: previous declaration of 'ajReportWrite' was here
ajreport.c: In function `ajReportWrite':
ajreport.c:2390: error: structure has no member named `MaxHitAll'
ajreport.c:2392: error: structure has no member named `MaxHitAll'
ajreport.c:2399: error: structure has no member named `MaxHitSeq'
ajreport.c:2401: error: structure has no member named `MaxHitAll'
ajreport.c:2402: error: structure has no member named `MaxHitSeq'
ajreport.c:2402: error: structure has no member named `MaxHitSeq'
ajreport.c:2404: error: structure has no member named `MaxHitSeq'
ajreport.c:2409: error: structure has no member named `MaxHitSeq'
ajreport.c:2409: error: structure has no member named `MaxHitAll'
ajreport.c:2423: warning: assignment makes pointer from integer without a cast
ajreport.c: In function `ajReportWriteHeader':
ajreport.c:2569: error: structure has no member named `MaxHitAll'
ajreport.c:2571: error: structure has no member named `MaxHitAll'
ajreport.c:2572: error: structure has no member named `MaxHitSeq'
ajreport.c:2574: error: structure has no member named `MaxHitSeq'
ajreport.c: In function `ajReportWriteTail':
ajreport.c:2723: error: structure has no member named `MaxHitAll'
ajreport.c:2724: error: structure has no member named `MaxHitAll'
ajreport.c: At top level:
ajreport.c:3061: error: conflicting types for 'ajReportAppendSubTail'
ajreport.c:2420: error: previous implicit declaration of 'ajReportAppendSubTail' was here
make[1]: *** [ajreport.lo] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/ajax'
make: *** [install-recursive] Error 1

-----Original Message-----
From: Peter Rice [mailto:pmr at ebi.ac.uk]
Sent: 14 February 2007 10:07
To: Mao, Jean (NIH/CIT) [E]
Cc: emboss-bug at emboss.open-bio.org
Subject: 4.0 patch problem

Hi Jean

> I am working on patching EMBOSS 4.0 right now and run into some problems. On my Linux machine:
>
> downloaded the patch1-23.gz file from your ftp site. then copy my
> current emboss tree to a different name. cd to that dir, then
>
> gunzip -c /somewhere/patch-1-X.gz | patch -p1
>
> I answered 'y' to all questions for those patches I previously applied.
>
> Then I modify the install directory in the configure file,
>
> ./configure
> make clean
> make install
>
> later, the following error appears. I attached the three files in case you need them. Thank you very much!
>
> if /bin/sh ../libtool --tag=CC --mode=compile gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"4.0.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_UNISTD_H=1 -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1 -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1 -DPLD_png=1 -I. -I. -DAJAX_FIXED_ROOT=\"/usr/local/EMBOSS-4.0.0.fix1-23/emboss\" -DPREFIX=\"/usr/local/EMBOSS-4.0.0.fix1-23\" -I../plplot -DPOSIX_MALLOC_THRESHOLD=10 -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2 -D__amd64__ -MT ajreport.lo -MD -MP -MF ".deps/ajreport.Tpo" -c -o ajreport.lo ajreport.c; \
> then mv -f ".deps/ajreport.Tpo" ".deps/ajreport.Plo"; else rm -f
> ".deps/ajreport.Tpo"; exit 1; fi gcc -DPACKAGE_NAME=\"\"
> -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" -DPACKAGE_STRING=\"\"
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE=\"EMBOSS\" -DVERSION=\"4.0.0\"
> -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_DIRENT_H=1 -DSTDC_HEADERS=1
> -DHAVE_UNISTD_H=1 -DGETPGRP_VOID=1 -DHAVE_STRFTIME=1 -DHAVE_UNISTD_H=1
> -DHAVE_FORK=1 -DHAVE_VFORK=1 -DHAVE_WORKING_VFORK=1
> -DHAVE_WORKING_FORK=1 -DHAVE_VPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_LIBM=1
> -DPLD_png=1 -I. -I.
> -DAJAX_FIXED_ROOT=\"/usr/local/EMBOSS-4.0.0.fix1-23/emboss\"
> -DPREFIX=\"/usr/local/EMBOSS-4.0.0.fix1-23\" -I../plplot
> -DPOSIX_MALLOC_THRESHOLD=10 -DAJ_LinuxLF -D_FILE_OFFSET_BITS=64
> -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DLENDIAN -DNO_AUTH -O2
> -D__amd64__ -MT ajreport.lo -MD -MP -MF .deps/ajreport.Tpo -c
> ajreport.c -fPIC -DPIC -o .libs/ajreport.o
> ajreport.c:137: warning: excess elements in struct initializer
> ajreport.c:137: warning: (near initialization for `reportFormat[0]')
> ajreport.c:139: warning: excess elements in struct initializer
> ajreport.c:139: warning: (near initialization for `reportFormat[1]')
> ajreport.c:141: warning: excess elements in struct initializer
> ajreport.c:141: warning: (near initialization for `reportFormat[2]')
> ajreport.c:143: warning: excess elements in struct initializer
> ajreport.c:143: warning: (near initialization for `reportFormat[3]')
> ajreport.c:145: warning: excess elements in struct initializer
> ajreport.c:145: warning: (near initialization for `reportFormat[4]')
> ajreport.c:148: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:148: warning: excess elements in struct initializer
> ajreport.c:148: warning: (near initialization for `reportFormat[5]')
> ajreport.c:151: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:151: warning: excess elements in struct initializer
> ajreport.c:151: warning: (near initialization for `reportFormat[6]')
> ajreport.c:154: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:154: warning: excess elements in struct initializer
> ajreport.c:154: warning: (near initialization for `reportFormat[7]')
> ajreport.c:156: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:156: warning: excess elements in struct initializer
> ajreport.c:156: warning: (near initialization for `reportFormat[8]')
> ajreport.c:161: warning: excess elements in struct initializer
> ajreport.c:161: warning: (near initialization for `reportFormat[9]')
> ajreport.c:163: warning: excess elements in struct initializer
> ajreport.c:163: warning: (near initialization for `reportFormat[10]')
> ajreport.c:165: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:165: warning: excess elements in struct initializer
> ajreport.c:165: warning: (near initialization for `reportFormat[11]')
> ajreport.c:167: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:167: warning: excess elements in struct initializer
> ajreport.c:167: warning: (near initialization for `reportFormat[12]')
> ajreport.c:169: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:169: warning: excess elements in struct initializer
> ajreport.c:169: warning: (near initialization for `reportFormat[13]')
> ajreport.c:171: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:171: warning: excess elements in struct initializer
> ajreport.c:171: warning: (near initialization for `reportFormat[14]')
> ajreport.c:173: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:173: warning: excess elements in struct initializer
> ajreport.c:173: warning: (near initialization for `reportFormat[15]')
> ajreport.c:175: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:175: warning: excess elements in struct initializer
> ajreport.c:175: warning: (near initialization for `reportFormat[16]')
> ajreport.c:177: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:177: warning: excess elements in struct initializer
> ajreport.c:177: warning: (near initialization for `reportFormat[17]')
> ajreport.c:179: warning: initialization makes pointer from integer
> without a cast
> ajreport.c:179: warning: excess elements in struct initializer
> ajreport.c:179: warning: (near initialization for `reportFormat[18]')
> ajreport.c:180: warning: excess elements in struct initializer
> ajreport.c:180: warning: (near initialization for `reportFormat[19]')
> ajreport.c: In function `ajReportNew':
> ajreport.c:2342: error: structure has no member named `Count'
> ajreport.c: In function `ajReportWriteHeader':
> ajreport.c:2488: error: structure has no member named `Count'
> make[1]: *** [ajreport.lo] Error 1
> make[1]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/ajax'
> make: *** [install-recursive] Error 1

ajax/ajreport.h and ajax/ajreport.c do not match. Perhaps one patch did not get applied. I think ajreport.c is out of date. The structure should have CountSeq and CountHit; in the original 4.0.0 it only had Count.

You will find a new ajreport.c on the ftp server, or you can try to reapply the patches.

Hope that helps. I am away for the next few days, so please mail emboss-bug in case I have problems getting to my email.

regards,

Peter

From maoj at helix.nih.gov Wed Feb 14 12:56:31 2007
From: maoj at helix.nih.gov (Jean Mao)
Date: Wed, 14 Feb 2007 12:56:31 -0500
Subject: [EMBOSS] Emboss compile problem
Message-ID: <000a01c75061$756b8920$be4de780@CIT.NIH.GOV>

Please ignore my previous email.
I manually grepped all the patches and applied them, then ran configure, make clean and make install. Now I get the following message. Thank you.

make[3]: Entering directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/programs'
make[4]: Entering directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/programs'
make[4]: Nothing to be done for `install-exec-am'.
make[4]: Nothing to be done for `install-data-am'.
make[4]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/programs'
make[3]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/programs'
make[2]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/programs'
Making install in tutorials
make[2]: Entering directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/tutorials'
make[2]: *** No rule to make target `emboss_tut.tar.gz', needed by `all-am'. Stop.
make[2]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc/tutorials'
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory `/usr/local/EMBOSS-4.0.0.fix1-23/doc'
make: *** [install-recursive] Error 1

From andrespinzon at gmail.com Tue Feb 20 11:20:48 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 20 Feb 2007 11:20:48 -0500
Subject: [EMBOSS] dreg and reverse strand
Message-ID: <8968fc7e0702200820r7a455733x41d1c1a7c32760d5@mail.gmail.com>

Hi,
I'm using dreg to find some patterns on the reverse strand of a Xanthomonas* genome.
This is the command I'm using:

dreg -sequence ./campestrisVesicatoria.gb -pattern 'TTC(G|T|C){14,17}TTC(G|A|T)' -outfile campestrisVes-rev.dreg.gb -rformat3 genbank -sask1

But there are 2 problems:

1) The output file is, in fact, a GenBank output file, for instance:
-------------------------------------
misc_feature    450..471
                /note="*pat regex1"
-------------------------------------
But it lacks the "complement" keyword, so Artemis, the annotation program I'm using, cannot read it as a feature on the reverse strand.
So it should look like this:
----------------------------------------------
misc_feature    complement(450..471)
                /note="*pat regex1"
-----------------------------------------------

2) It seems that dreg is identifying a wrong pattern. I edited the output file and put the complement keyword in one of the hits (the example above), but there's no pattern at that position. This is a screenshot:
http://bioinf.ibun.unal.edu.co/~problem/art-dreg.png

Am I doing something wrong?

Thanks in advance,

-- 
Andrés Pinzón cPh.D.
http://groups.google.com/group/Bioinformatica_es
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961
Fax +571 3165415
Mycology and Phytopathology Laboratory - Los Andes University.
Tel +571 3394949 ext. 2768

From pmr at ebi.ac.uk Tue Feb 20 11:56:28 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 20 Feb 2007 16:56:28 +0000
Subject: [EMBOSS] dreg and reverse strand
In-Reply-To: <8968fc7e0702200820r7a455733x41d1c1a7c32760d5@mail.gmail.com>
References: <8968fc7e0702200820r7a455733x41d1c1a7c32760d5@mail.gmail.com>
Message-ID: <45DB283C.5020902@ebi.ac.uk>

Andres Pinzon wrote:
> Hi,
> I'm using dreg to find some patterns on the reverse strand of a Xanthomonas* genome.
> This is the command I'm using:
>
> dreg -sequence ./campestrisVesicatoria.gb -pattern
> 'TTC(G|T|C){14,17}TTC(G|A|T)' -outfile campestrisVes-rev.dreg.gb
> -rformat3 genbank -sask1

Oops. Can you send me the input sequence, please? We will fix it for the next release (soon).

regards,

Peter

From andrespinzon at gmail.com Tue Feb 20 15:13:58 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Tue, 20 Feb 2007 15:13:58 -0500
Subject: [EMBOSS] Fuzznuc question: how to search complementary strand?
Message-ID: <8968fc7e0702201213oa8e295fyd336463f8051c75@mail.gmail.com>

Hi,
I'm trying fuzznuc to search for some patterns in a genome.
When I search the forward strand:

fuzznuc -sequence mysqeq.gb -pattern 'TTC[GTC]-N(14,17)-TTC[GAT]' -outfile myseq.fuzznuc -rformat2 genbank

everything is OK.

...But when I search the complementary strand:

fuzznuc -sequence mysqeq.gb -pattern 'TTC[GTC]-N(14,17)-TTC[GAT]' -outfile myseq-rev.fuzznuc -rformat2 genbank -complement

it appears to work well, I mean that it searches for the pattern on both strands, but when I take a closer look at the output file, this is what I get:

===================
misc_feature    676..700
                /note="*pat pattern1"
misc_feature    12325..12349
                /note="*pat pattern1"
.
.
.
.
misc_feature    complement(676..700)
                /note="*pat pattern1"
misc_feature    complement(12325..12349)
                /note="*pat pattern1"
======================

It reports a match on the complementary strand that does exist, but on the forward strand, not on the complement.

Am I doing something wrong?
What options do I have to use in order to make fuzznuc report the occurrences of "pattern" on both the reverse and the complementary strand?

Regards,

-- 
Andrés Pinzón cPh.D.
http://groups.google.com/group/Bioinformatica_es
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961
Fax +571 3165415
Mycology and Phytopathology Laboratory - Los Andes University.
Tel +571 3394949 ext. 2768

From charles-listes-emboss at plessy.org Wed Feb 21 03:04:38 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 21 Feb 2007 17:04:38 +0900
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ?
Message-ID: <20070221080438.GB23932@kunpuu.plessy.org>

Dear list,

I am using vectorstrip to find PCR primers in cloned PCR products. Strangely, in some cases it misses a primer, because it overestimates the number of mismatches.

In the following example, vectorstrip identifies the first primer with six mismatches, although it has only two. It means that if I run vectorstrip with a -mismatch value lower than 29, I do miss the primer.
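A quick way to see why the two counts differ is to compare the primer region that vectorstrip reports below (read positions 138 to 158) with LINKERA in two ways. The minimal Python sketch that follows is only an illustration, not the code of vectorstrip or needle. It counts ungapped, position-by-position mismatches (the approach vectorstrip takes, according to the replies later in this thread), and it computes a unit-cost edit distance that allows insertions and deletions, as a simple stand-in for the gapped alignment that needle reports. The function names are hypothetical, and reading -mismatch as a percentage threshold is an assumption, although it fits the value of 29 mentioned above.

```python
# Illustrative sketch only: not the actual vectorstrip or needle code.
# Sequences copied from the example below (read positions 138-158 and LINKERA).
READ_REGION = "aaTGAggTAACCgGTTcccAG"
LINKER_A = "AATGAGGTAACGGTTCCCAGC"


def ungapped_mismatches(a: str, b: str) -> int:
    """Count position-by-position mismatches (case-insensitive, no gaps)."""
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def mismatch_percent(a: str, b: str) -> float:
    """Ungapped mismatches as a percentage of the primer length.

    Assumption: vectorstrip's -mismatch threshold is a percentage.
    """
    return 100.0 * ungapped_mismatches(a, b) / len(b)


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (substitutions, insertions, deletions).

    A simple stand-in for a gapped alignment; needle itself uses
    substitution scores and gap penalties, not unit costs.
    """
    a, b = a.upper(), b.upper()
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print("ungapped mismatches:", ungapped_mismatches(READ_REGION, LINKER_A))
    print("mismatch percent: %.1f" % mismatch_percent(READ_REGION, LINKER_A))
    print("edit distance:", edit_distance(READ_REGION, LINKER_A))
```

With these inputs the sketch reports 6 ungapped mismatches (28.6% of the 21-base primer) and an edit distance of 2. A single extra C in the read shifts the rest of the primer out of register, so a gapless comparison scores several of the following positions as mismatches, while a gapped comparison needs only one inserted base and one end gap.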
The following is a mixture of shell commands and extracts of outputs. The sequence consists of two reads assembled by using trimseq on .ab1 files, and then merger on the resulting fasta files.


export SEQ="ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccCcTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcccGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCgGTTcccAGCaGNttttttttttttttttttttttttttttttttttttttttttttttttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaGaTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGttTTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACAgCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCGTTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnnnnttntttnntnnnnaaaaa"

export LINKERA="AATGAGGTAACGGTTCCCAGC"

export LINKERB="GCTGGGAACCGTTACCTCATT"

vectorstrip asis:$SEQ \
  -linkera=$LINKERA \
  -linkerb=$LINKERB \
  -outfile stdout \
  -outseq /dev/null \
  -novectorfile \
  -nobesthits \
  -mismatch 30


Sequence: asis    Vector: no_name
5' sequence matches:
    From 138 to 158 with 6 mismatches
3' sequence matches:
    From 351 to 371 with 0 mismatches
Sequences output to file:
    from 159 to 350
    CaGNtttttttttttttttttttttttttttttttttttttttttttttt
    ttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaG
    aTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGtt
    TTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACA
sequence trimmed from 5' end:
    ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccC
    cTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcc
    cGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCg
    GTTcccAG
sequence trimmed from 3' end:
    gCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCG
    TTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnn
    nnttntttnntnnnnaaaaa


needle asis:$SEQ[138:158] asis:$LINKERA stdout -auto

asis             138 aaTGAggTAACCgGTTcccAG-    158
                     |||||||||| ||||||||||
asis               1 AATGAGGTAA-CGGTTCCCAGC     21


Interestingly, in the following alignment, the number of mismatches is 6. But I did not find anything saying that gaps were disallowed in vectorstrip?
aaTGAggTAACCgGTTcccAG
||||||||||| | | ||
AATGAGGTAACGGTTCCCAGC


I am using emboss through fink (emboss package 4.0.0-2).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From jison at ebi.ac.uk Mon Feb 26 07:10:41 2007
From: jison at ebi.ac.uk (Jon Ison)
Date: Mon, 26 Feb 2007 12:10:41 -0000 (GMT)
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ?
In-Reply-To: <20070221080438.GB23932@kunpuu.plessy.org>
References: <20070221080438.GB23932@kunpuu.plessy.org>
Message-ID: <58372.172.22.100.168.1172491841.squirrel@webmail.ebi.ac.uk>

Hi Charles,

I wasn't sure you'd already got a reply to this so here goes.

From a very quick look at your email ... does it all make sense considering that needle does an optimal alignment with gaps whereas vectorstrip uses a "word-match" type (ungapped) alignment ... ?

It seems OTT to make vectorstrip do an optimal alignment but I guess it's possible in principle ...

If you've still got a prob. feel free to get back.

Cheers

Jon


> Dear list,
>
> I am using vectorstrip to find PCR primers in cloned PCR products. Strangely,
> in some cases it misses a primer, because it overestimates the number of
> mismatches.
>
> In the following example, vectorstrip identifies the first primer with six
> mismatches, although it has only two. It means that if I run vectorstrip with
> a -mismatch value lower than 29, I do miss the primer.
>
> The following is a mixture of shell commands and extracts of outputs. The
> sequence consists of two reads assembled by using trimseq on .ab1 files, and
> then merger on the resulting fasta files.
>
>
> export SEQ="ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccCcTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcccGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCgGTTcccAGCaGNttttttttttttttttttttttttttttttttttttttttttttttttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaGaTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGttTTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACAgCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCGTTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnnnnttntttnntnnnnaaaaa"
>
> export LINKERA="AATGAGGTAACGGTTCCCAGC"
>
> export LINKERB="GCTGGGAACCGTTACCTCATT"
>
> vectorstrip asis:$SEQ \
>   -linkera=$LINKERA \
>   -linkerb=$LINKERB \
>   -outfile stdout \
>   -outseq /dev/null \
>   -novectorfile \
>   -nobesthits \
>   -mismatch 30
>
>
> Sequence: asis    Vector: no_name
> 5' sequence matches:
>     From 138 to 158 with 6 mismatches
> 3' sequence matches:
>     From 351 to 371 with 0 mismatches
> Sequences output to file:
>     from 159 to 350
>     CaGNtttttttttttttttttttttttttttttttttttttttttttttt
>     ttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaG
>     aTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGtt
>     TTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACA
> sequence trimmed from 5' end:
>     ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccC
>     cTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcc
>     cGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCg
>     GTTcccAG
> sequence trimmed from 3' end:
>     gCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCG
>     TTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnn
>     nnttntttnntnnnnaaaaa
>
> needle asis:$SEQ[138:158] asis:$LINKERA stdout -auto
>
> asis             138 aaTGAggTAACCgGTTcccAG-    158
>                      |||||||||| ||||||||||
> asis               1 AATGAGGTAA-CGGTTCCCAGC     21
>
>
> Interestingly, in the following alignment, the number of mismatches is
> 6. But I did not find anything saying that gaps were disallowed in
> vectorstrip?
>
> aaTGAggTAACCgGTTcccAG
> ||||||||||| | | ||
> AATGAGGTAACGGTTCCCAGC
>
>
> I am using emboss through fink (emboss package 4.0.0-2).
>
> Have a nice day,
>
> -- 
> Charles Plessy
> http://charles.plessy.org
> Wako, Saitama, Japan
>

From pmr at ebi.ac.uk Mon Feb 26 10:15:15 2007
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Mon, 26 Feb 2007 15:15:15 -0000 (GMT)
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ?
In-Reply-To: <20070221080438.GB23932@kunpuu.plessy.org>
References: <20070221080438.GB23932@kunpuu.plessy.org>
Message-ID: <51293.172.22.68.96.1172502915.squirrel@webmail.ebi.ac.uk>

Dear Charles,

> In the following example, vectorstrip identifies the first primer with six
> mismatches, although it has only two. It means that if I run vectorstrip
> with
> a -mismatch value lower than 29, I do miss the primer.

vectorstrip is indeed gapless by design. The algorithm is rather crude and could be updated. I am currently looking into other vectorstrip issues and now is a good time to ask questions about it.

Being gapless, you have to look at the number of mismatches without inserting gaps. I believe it was designed with the assumption that 5' vector matches would be in good quality sequence.

Other change requests I am looking at are:

an option -allsequences to report all sequences in the output report (so that web interfaces can more easily parse the output)

checking some test cases for possible missed 3' matches

better annotation in the fasta format sequence output (does anyone use that?)

hope that helps

Peter

From pmr at ebi.ac.uk Mon Feb 26 11:24:59 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 26 Feb 2007 16:24:59 +0000
Subject: [EMBOSS] Fuzznuc question: how to search complementary strand?
In-Reply-To: <8968fc7e0702201213oa8e295fyd336463f8051c75@mail.gmail.com>
References: <8968fc7e0702201213oa8e295fyd336463f8051c75@mail.gmail.com>
Message-ID: <45E309DB.8010505@ebi.ac.uk>

Andres Pinzon wrote:
> Hi,
> I'm trying fuzznuc to search for some patterns in a genome.
>
> ...But when I search the complementary strand:
>
> It reports a match on the complementary strand that does exist, but on the
> forward strand, not on the complement.
>
> Am I doing something wrong?

I think this is one we patched soon after the 4.0.0 release. There are patches on our FTP server, and a new 4.1.0 release will appear soon with this fix included.

> What options do I have to use in order to make fuzznuc report the
> occurrences of "pattern" on both the reverse and the complementary strand?

-complement is correct. It searched both strands.

To search only the complementary strand, use the general EMBOSS option -sreverse and do not specify -complement.

Hope this helps,

Peter

From charles-listes-emboss at plessy.org Tue Feb 27 05:31:14 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Tue, 27 Feb 2007 19:31:14 +0900
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ?
In-Reply-To: <51293.172.22.68.96.1172502915.squirrel@webmail.ebi.ac.uk>
References: <20070221080438.GB23932@kunpuu.plessy.org> <51293.172.22.68.96.1172502915.squirrel@webmail.ebi.ac.uk>
Message-ID: <20070227103114.GF30296@kunpuu.plessy.org>

On Mon, Feb 26, 2007 at 03:15:15PM -0000, pmr at ebi.ac.uk wrote:
> Dear Charles,
>
> > In the following example, vectorstrip identifies the first primer with six
> > mismatches, although it has only two. It means that if I run vectorstrip
> > with
> > a -mismatch value lower than 29, I do miss the primer.
>
> vectorstrip is indeed gapless by design. The algorithm is rather crude and
> could be updated. I am currently looking into other vectorstrip issues and
> now is a good time to ask questions about it.
>
> Being gapless, you have to look at the number of mismatches without
> inserting gaps. I believe it was designed with the assumption that 5'
> vector matches would be in good quality sequence.

Dear Peter and Jon,

thank you for your answers. The manual page of vectorstrip says that it "is suitable for use with low quality sequence data", so it would definitely be good to update either the algorithm or the documentation.

The way I am using vectorstrip is to look for PCR primers in sequence reads from PCR products cloned in TA vectors. In that case, as there is no directionality, I have to write two entries in the vector files. Also, sometimes there are PCR artifacts amplified with one primer only. So maybe an option to deal with this simply could be useful, if I am not the only one to use vectorstrip for this?

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From gbottu at ben.vub.ac.be Tue Feb 27 07:27:02 2007
From: gbottu at ben.vub.ac.be (Guy Bottu)
Date: Tue, 27 Feb 2007 13:27:02 +0100
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ? - Checke
In-Reply-To: <20070227103114.GF30296@kunpuu.plessy.org>
References: <20070221080438.GB23932@kunpuu.plessy.org> <51293.172.22.68.96.1172502915.squirrel@webmail.ebi.ac.uk> <20070227103114.GF30296@kunpuu.plessy.org>
Message-ID: <20070227122702.GA10732@bigben.ulb.ac.be>

By the way, have you considered using stssearch instead to search for the primers at the ends of the sequence?

Guy Bottu, Belgian EMBnet Node

From charles-listes-emboss at plessy.org Tue Feb 27 19:18:31 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Wed, 28 Feb 2007 09:18:31 +0900
Subject: [EMBOSS] Is vectorstrip gapless by design or is it a bug ? - Checke
In-Reply-To: <20070227122702.GA10732@bigben.ulb.ac.be>
References: <20070221080438.GB23932@kunpuu.plessy.org> <51293.172.22.68.96.1172502915.squirrel@webmail.ebi.ac.uk> <20070227103114.GF30296@kunpuu.plessy.org> <20070227122702.GA10732@bigben.ulb.ac.be>
Message-ID: <20070228001831.GE20100@kunpuu.plessy.org>

On Tue, Feb 27, 2007 at 01:27:02PM +0100, Guy Bottu wrote:
> By the way, have you considered using stssearch instead to search
> for the primers at the ends of the sequence?

Dear Guy,

I have to shamefully admit that I was not aware of stssearch or primersearch. I started to use vectorstrip for finding the arms of my cloning vector, and when using it started to cheat it by substituting primer sequences for the vector arms. I missed the two other tools. I will check if primersearch is gapless, and use it instead of vectorstrip.

Many thanks for all the answers, and have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From andrespinzon at gmail.com Wed Feb 28 12:20:25 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 28 Feb 2007 12:20:25 -0500
Subject: [EMBOSS] how to get jtranslations using extractfeat?
Message-ID: <8968fc7e0702280920md2efee8p2e431bc66c214ef8@mail.gmail.com>

Hi,
I'm trying to get all the "/translation" sequences from a genome EMBL feature file. I mean, each CDS has a translation tag and I need those translations in a fasta file. I've tried all possible combinations of -type and -tag, but I can only get the DNA sequences, not the translated sequences.

This is the (basic) command I run:
=================
extractfeat -sequence xoo-maff.embl -type CDS -outseq myseq -tag translation
================

Is it possible to get these translated sequences from the feature file? Or do I have to get the corresponding CDS DNA sequences and then translate them?

Thanks in advance.

-- 
Andrés Pinzón cPh.D.
http://groups.google.com/group/Bioinformatica_es
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961
Fax +571 3165415
Mycology and Phytopathology Laboratory - Los Andes University.
Tel +571 3394949 ext. 2768

From pmr at ebi.ac.uk Wed Feb 28 12:57:54 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 28 Feb 2007 17:57:54 +0000
Subject: [EMBOSS] how to get jtranslations using extractfeat?
In-Reply-To: <8968fc7e0702280920md2efee8p2e431bc66c214ef8@mail.gmail.com>
References: <8968fc7e0702280920md2efee8p2e431bc66c214ef8@mail.gmail.com>
Message-ID: <45E5C2A2.9060006@ebi.ac.uk>

Andres Pinzon wrote:
> Hi,
> I'm trying to get all the "/translation" sequences from a genome EMBL
> feature file.
> I mean, each CDS has a translation tag and I need those translations
> in a fasta file. I've tried all possible combinations of -type and -tag,
> but I can only get the DNA sequences, not the translated sequences.
>
> Is it possible to get these translated sequences from the feature file?
> Or do I have to get the corresponding CDS DNA sequences and then translate them?

Good suggestion ... we can try to make a new application.

The /translation tag is rather special (because the value is a real sequence) ... also it may have a different name in some databases or feature file formats.

We will need to make up names for each translation (sequence identifiers, and something derived from the feature table) like the names used by extractfeat. Alternative splicing will make it difficult to create reliable unique names. Extractfeat does have the same problem, and nobody has complained. If we keep a table of the names used so far, we can add something to the end of any duplicates.

Extracttrans is a possible name for the program.

regards,

Peter

From andrespinzon at gmail.com Wed Feb 28 14:46:43 2007
From: andrespinzon at gmail.com (Andres Pinzon)
Date: Wed, 28 Feb 2007 14:46:43 -0500
Subject: [EMBOSS] Modify description in coderet output file.
Message-ID: <8968fc7e0702281146m4f3124aeq9a6f92ad011b4fa9@mail.gmail.com>

Hi,
I realized that coderet can extract the translation sequences successfully (in a previous message I was asking for this feature using extractfeat), BUT there's a problem: coderet puts its own description in the fasta header of each sequence, so if there are 1000 translations it puts something like:

>apo0009686_prot1
>apo0009686_prot2
>apo0009686_prot3
.
.
.
>apo0009686_protn

It would be pretty useful to have the protein_id for each sequence instead of those descriptors. Is it possible?

Regards,

-- 
Andrés Pinzón cPh.D.
http://groups.google.com/group/Bioinformatica_es
Bioinformatics Center, Colombia EMBnet node
http://bioinf.ibun.unal.edu.co
Tel +57 3165000 ext 16961
Fax +571 3165415
Mycology and Phytopathology Laboratory - Los Andes University.
Tel +571 3394949 ext. 2768

From fangw at CLEMSON.EDU Wed Feb 28 18:47:45 2007
From: fangw at CLEMSON.EDU (fangw at CLEMSON.EDU)
Date: Wed, 28 Feb 2007 18:47:45 -0500 (EST)
Subject: [EMBOSS] question about translation start stie
In-Reply-To:
References:
Message-ID: <4537.130.127.150.224.1172706465.squirrel@wm.clemson.edu>

Hello everyone,

Does anyone know if EMBOSS could give us the translation start site and translation start site?

Thanks! Looking forward to your reply.

Nice day,

Fang