From charles-listes-emboss at plessy.org Fri Jun 1 01:52:04 2007
From: charles-listes-emboss at plessy.org (charles-listes-emboss at plessy.org)
Date: Fri, 1 Jun 2007 14:52:04 +0900
Subject: [EMBOSS] Indexing the ID field of EMBL-formatted databases.
In-Reply-To: <465D7D17.3020006@ebi.ac.uk>
References: <20070529083710.GH2487@kunpuu.plessy.org>
 <465D7D17.3020006@ebi.ac.uk>
Message-ID: <20070601055204.GA26344@kunpuu.plessy.org>

On Wed, May 30, 2007 at 02:33:11PM +0100, Peter Rice wrote:
>
> Aha ... mirbase is in EMBL format .. except the IDs are in lower case. All other
> EMBL/UniProt databases are in upper case.

Dear Peter,

I tried to index RepBase today, and I ran into another problem: not only
the IDs are lower case, but also it does not provide accession numbers.
I used the following for indexing:

dbxflat -dbname repbase\
        -dbresource embl\
        -idformat EMBL\
        -filenames '*ref'\
        -directory .\
        -fields id,org,key,des\
        -release 12.04

DB repbase [
   type: N
   format: embl
   method: emboss
   directory: /home/charles/databases/RepBase12.04.embl
   file: *.ref
   fields: "id key des org"
   comment: "Repeats"
]

However, seqret still complains that the AC field is not indexed:

gslc12 RepBase12.04.embl $ seqret repbase:RLTR19_MM
Reads and writes (returns) sequences

   EMBOSS An error in ajindex.c at line 3027:
Cannot open param file /home/charles/databases/RepBase12.04.embl/repbase.pxac

I will rebuild emboss with the patched version of dbiflat and see if it
can work instead.

Have a nice day,

-- 
Charles

From pmr at ebi.ac.uk Fri Jun 1 03:13:47 2007
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Fri, 1 Jun 2007 08:13:47 +0100 (BST)
Subject: [EMBOSS] Indexing the ID field of EMBL-formatted databases.
In-Reply-To: <20070601055204.GA26344@kunpuu.plessy.org>
References: <20070529083710.GH2487@kunpuu.plessy.org>
 <465D7D17.3020006@ebi.ac.uk>
 <20070601055204.GA26344@kunpuu.plessy.org>
Message-ID: <41165.86.141.180.197.1180682027.squirrel@webmail.ebi.ac.uk>

Dear Charles,

> I tried to index RepBase today, and I ran into another problem: not only
> the IDs are lower case, but also it does not provide accession numbers.
> I used the following for indexing:
>
> dbxflat -dbname repbase\
>         -dbresource embl\
>         -idformat EMBL\
>         -filenames '*ref'\
>         -directory .\
>         -fields id,org,key,des\
>         -release 12.04
>
> DB repbase [
>    type: N
>    format: embl
>    method: emboss
>    directory: /home/charles/databases/RepBase12.04.embl
>    file: *.ref
>    fields: "id key des org"
>    comment: "Repeats"
> ]
>
> However, seqret still complains that the AC field is not indexed:
>
> gslc12 RepBase12.04.embl $ seqret repbase:RLTR19_MM
> Reads and writes (returns) sequences
>
>    EMBOSS An error in ajindex.c at line 3027:
> Cannot open param file
> /home/charles/databases/RepBase12.04.embl/repbase.pxac

We have part of the solution. Database definitions can have the extra
attribute

hasaccession: "N"

but this is only used when accessing SRS servers at present (the original
problem was with sequences from the PDB database which has no accessions).

The fields attribute defines additional fields, but cannot "turn off" the
accession index. We need to test for this in the other access methods in
ajseqdb.c.

I will do that today. Thanks for the bug (feature) report.

regards,

Peter

From charles-listes-emboss at plessy.org Fri Jun 1 04:37:28 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Fri, 1 Jun 2007 17:37:28 +0900
Subject: [EMBOSS] Indexing the ID field of EMBL-formatted databases.
In-Reply-To: <465D7D17.3020006@ebi.ac.uk>
References: <20070529083710.GH2487@kunpuu.plessy.org>
 <465D7D17.3020006@ebi.ac.uk>
Message-ID: <20070601083728.GA29855@kunpuu.plessy.org>

On Wed, May 30, 2007 at 02:33:11PM +0100, Peter Rice wrote:
> in emboss/dbiflat.c function dbiflat_ParseEmbl, add a conversion to upper case:
>
>           if(lineType == FLATTYPE_ID)
>           {
>               ajRegExec(regEmblId, rline);
>               ajRegSubI(regEmblId, 1, myid);
>               ajStrFmtUpper(&myid);
>               ajDebug("++id '%S'\n", *myid);
>               ajRegSubI(regEmblId, 3, &tmpfd);

Dear Peter,

I tried to apply the following patch:

--- ./emboss/dbiflat.c.old 2007-06-01 15:00:55.000000000 +0900
+++ ./emboss/dbiflat.c 2007-06-01 15:03:59.000000000 +0900
@@ -739,6 +739,7 @@
            {
                ajRegExec(regEmblId, rline);
                ajRegSubI(regEmblId, 1, myid);
+               ajStrFmtUpper(&myid);
                ajDebug("++id '%S'\n", *myid);
                ajRegSubI(regEmblId, 3, &tmpfd);
                if(svnfield >= 0 && ajStrGetLen(tmpfd))

But now dbiflat segfaults... (I am running Debian GNU/Linux 4.0 on an
iMac G5).

regards,

-- 
Charles

From pmr at ebi.ac.uk Fri Jun 1 05:31:41 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 01 Jun 2007 10:31:41 +0100
Subject: [EMBOSS] Indexing the ID field of EMBL-formatted databases.
In-Reply-To: <41165.86.141.180.197.1180682027.squirrel@webmail.ebi.ac.uk>
References: <20070529083710.GH2487@kunpuu.plessy.org>
 <465D7D17.3020006@ebi.ac.uk>
 <20070601055204.GA26344@kunpuu.plessy.org>
 <41165.86.141.180.197.1180682027.squirrel@webmail.ebi.ac.uk>
Message-ID: <465FE77D.4020000@ebi.ac.uk>

Dear Charles,

> I will do that today. Thanks for the bug (feature) report.

Looking good ... with hasaccession: "N" defined I now have code that will
not try to use the accession number index.

If you try to specify "-acc:" in the query you will get an error message
saying that the database does not support accession searches.

To be included in release 5.0.0 in July.

regards,

Peter Rice

From pmr at ebi.ac.uk Fri Jun 1 06:51:22 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 01 Jun 2007 11:51:22 +0100
Subject: [EMBOSS] Indexing the ID field of EMBL-formatted databases.
In-Reply-To: <20070601083728.GA29855@kunpuu.plessy.org>
References: <20070529083710.GH2487@kunpuu.plessy.org>
 <465D7D17.3020006@ebi.ac.uk>
 <20070601083728.GA29855@kunpuu.plessy.org>
Message-ID: <465FFA2A.1060202@ebi.ac.uk>

Dear Charles,

> I tried to apply the following patch:
>                 ajRegExec(regEmblId, rline);
>                 ajRegSubI(regEmblId, 1, myid);
> +               ajStrFmtUpper(&myid);

Oops. Sorry, must have pasted in from the wrong place.

                ajStrFmtUpper(myid);

regards,

Peter

From pmr at ebi.ac.uk Fri Jun 1 12:17:06 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Fri, 01 Jun 2007 17:17:06 +0100
Subject: [EMBOSS] Segmentation fault with multiple similarity matricies in fdnadist
In-Reply-To: <200705291714.25677.hjenkins@uvic.ca>
References: <200705291714.25677.hjenkins@uvic.ca>
Message-ID: <46604682.8020603@ebi.ac.uk>

Dear Hazel,

Hazel Hartman Jenkins wrote:
> If I run the following command;
> fneighbor -datafile tinytest.dat -replicates y -outfile filefrom.fnb
> then everything works.
>
> If, however, my tinytest.phy contains two similarity matrices (or, for
> that matter, the one hundred bootstrap replicates written by fdnadist by
> default), like this;
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
>
> then fdnadist returns;
>
> Phylogenies from distance matrix by N-J or UPGMA method
> Segmentation fault
>

Ah, firstly that is a bug in reading distance matrices. The reading should
stop after the first 3 rows of data. That is easy to fix.

I will look into handling multiple distance matrices in one file ...
there seems to be some incomplete code in neighbor but at least EMBOSS
should gracefully load them. That will take a little more effort because it
involves some changes to our port of neighbor, but is not difficult (he
says confidently!)

We also need to update the phylip code in EMBOSS to the latest release. It
would help to know how many users are working with the EMBOSS phylip.

regards,

Peter Rice

From Nicolas.Roggli at molbio.unige.ch Fri Jun 1 04:28:07 2007
From: Nicolas.Roggli at molbio.unige.ch (nicolas roggli)
Date: Fri, 01 Jun 2007 10:28:07 +0200
Subject: [EMBOSS] Can't get seqmatchall to run, please help me
Message-ID:

hello
I have this list of files that I want to process with seqmatchall.

-rw-r--r-- 1 nicolas user 3394 Jun 1 02:29 rtt109_h_puta.
-rw-r--r-- 1 nicolas user 2692 Jun 1 02:29 rtt109_h_puta_1.
-rw-r--r-- 1 nicolas user 3529 Jun 1 02:29 rtt109_h_puta_10.
-rw-r--r-- 1 nicolas user 3603 Jun 1 02:29 rtt109_h_puta_11.
-rw-r--r-- 1 nicolas user 3687 Jun 1 02:29 rtt109_h_puta_12.
-rw-r--r-- 1 nicolas user 2707 Jun 1 02:29 rtt109_h_puta_2.
-rw-r--r-- 1 nicolas user 2691 Jun 1 02:29 rtt109_h_puta_3.
-rw-r--r-- 1 nicolas user 2681 Jun 1 02:29 rtt109_h_puta_4.
-rw-r--r-- 1 nicolas user 2666 Jun 1 02:29 rtt109_h_puta_5.
-rw-r--r-- 1 nicolas user 2963 Jun 1 02:29 rtt109_h_puta_6.
-rw-r--r-- 1 nicolas user 2669 Jun 1 02:29 rtt109_h_puta_7.
-rw-r--r-- 1 nicolas user 2966 Jun 1 02:29 rtt109_h_puta_8.
-rw-r--r-- 1 nicolas user 3817 Jun 1 02:29 rtt109_h_puta_9.

I do this
% seqmatchall
All-against-all comparison of a set of sequences
Input sequence set: rtt109_h_puta*
Segmentation fault (core dumped)

this is the core
-rw-r--r-- 1 nicolas user 2490368 Jun 1 03:15 core

What am I doing wrong? What is the segmentation fault?

Thanks for any help

nicolas

From david.bauer at bayerhealthcare.com Fri Jun 1 17:14:59 2007
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Fri, 1 Jun 2007 23:14:59 +0200
Subject: [EMBOSS] Antwort: Can't get seqmatchall to run, please help me
In-Reply-To:
Message-ID:

Hi Nicolas,

seqmatchall expects as input a "seqset". This means that all sequences
must be in one file. If your individual sequences are in fasta format just
create one fasta file with all sequences.
The coredump just means that something went wrong during program execution
and the program crashed. You can delete the coredump file.

HTH, David.

emboss-bounces at lists.open-bio.org wrote on 01/06/2007 10:28:07:

> hello
> I have this list of files that I want to process with seqmatchall.
>
> -rw-r--r-- 1 nicolas user 3394 Jun 1 02:29 rtt109_h_puta.
> -rw-r--r-- 1 nicolas user 2692 Jun 1 02:29 rtt109_h_puta_1.
> -rw-r--r-- 1 nicolas user 3529 Jun 1 02:29 rtt109_h_puta_10.
> -rw-r--r-- 1 nicolas user 3603 Jun 1 02:29 rtt109_h_puta_11.
> -rw-r--r-- 1 nicolas user 3687 Jun 1 02:29 rtt109_h_puta_12.
> -rw-r--r-- 1 nicolas user 2707 Jun 1 02:29 rtt109_h_puta_2.
> -rw-r--r-- 1 nicolas user 2691 Jun 1 02:29 rtt109_h_puta_3.
> -rw-r--r-- 1 nicolas user 2681 Jun 1 02:29 rtt109_h_puta_4.
> -rw-r--r-- 1 nicolas user 2666 Jun 1 02:29 rtt109_h_puta_5.
> -rw-r--r-- 1 nicolas user 2963 Jun 1 02:29 rtt109_h_puta_6.
> -rw-r--r-- 1 nicolas user 2669 Jun 1 02:29 rtt109_h_puta_7.
> -rw-r--r-- 1 nicolas user 2966 Jun 1 02:29 rtt109_h_puta_8.
> -rw-r--r-- 1 nicolas user 3817 Jun 1 02:29 rtt109_h_puta_9.
>
> I do this
> % seqmatchall
> All-against-all comparison of a set of sequences
> Input sequence set: rtt109_h_puta*
> Segmentation fault (core dumped)
>
> this is the core
> -rw-r--r-- 1 nicolas user 2490368 Jun 1 03:15 core
>
> What am I doing wrong? What is the segmentation fault?
>
> Thanks for any help
>
> nicolas
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

From wgallin at ualberta.ca Fri Jun 1 17:18:49 2007
From: wgallin at ualberta.ca (Warren Gallin)
Date: Fri, 1 Jun 2007 15:18:49 -0600
Subject: [EMBOSS] Running Problem
Message-ID:

Hi,

I just downloaded and compiled the latest stable EMBOSS package.
The compiling went alright, but when I try to run a program I get the
following message:

bio-c172:~/Desktop/EMBOSS-4.1.0 wgallin$ needle
dyld: Library not loaded: /usr/local/lib/libpng12.0.dylib
  Referenced from: /usr/local/bin/needle
  Reason: Incompatible library version: needle requires version
19.0.0 or later, but libpng12.0.dylib provides version 0.1.2
Trace/BPT trap

At first I thought that I had an outdated libpng, but I downloaded
the latest one from their site, did a clean configuration and make
and got the same message.

Can anyone point me to how to fix this problem?

I am working on a Mac G4 powerbook (867 MHz) running OS X 10.4.9,
with 384 MB RAM.

All help and suggestions are welcome.

Warren Gallin

From kvddrift at earthlink.net Fri Jun 1 18:44:16 2007
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri, 1 Jun 2007 18:44:16 -0400
Subject: [EMBOSS] Running Problem
In-Reply-To:
References:
Message-ID:

Warren,

Have you considered using fink to install emboss on your Mac? It will
take care of installing all the right required libs, including libpng.

- Koen.

On Jun 1, 2007, at 5:18 PM, Warren Gallin wrote:

> Hi,
>
> I just downloaded and compiled the latest stable EMBOSS package.
> The compiling went alright, but when I try to run a program I get the
> following message:
>
> bio-c172:~/Desktop/EMBOSS-4.1.0 wgallin$ needle
> dyld: Library not loaded: /usr/local/lib/libpng12.0.dylib
>   Referenced from: /usr/local/bin/needle
>   Reason: Incompatible library version: needle requires version
> 19.0.0 or later, but libpng12.0.dylib provides version 0.1.2
> Trace/BPT trap
>
> At first I thought that I had an outdated libpng, but I downloaded
> the latest one from their site, did a clean configuration and make
> and got the same message.
>
> Can anyone point me to how to fix this problem?
>
> I am working on a Mac G4 powerbook (867 MHz) running OS X 10.4.9,
> with 384 MB RAM.
>
> All help and suggestions are welcome.
>
> Warren Gallin

From ajb at ebi.ac.uk Fri Jun 1 19:02:43 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Sat, 2 Jun 2007 00:02:43 +0100 (BST)
Subject: [EMBOSS] Running Problem
In-Reply-To:
References:
Message-ID: <37118.81.98.241.17.1180738963.squirrel@webmail.ebi.ac.uk>

Hello Warren,

It is the GD library that accesses libpng so it may be worthwhile for
you to check whether that's up to date. You are, of course, correct
in doing a 'make clean' and configuring again after such changes.

HTH

Alan Bleasby
EBI

> Hi,
>
> I just downloaded and compiled the latest stable EMBOSS package.
> The compiling went alright, but when I try to run a program I get the
> following message:
>
> bio-c172:~/Desktop/EMBOSS-4.1.0 wgallin$ needle
> dyld: Library not loaded: /usr/local/lib/libpng12.0.dylib
>   Referenced from: /usr/local/bin/needle
>   Reason: Incompatible library version: needle requires version
> 19.0.0 or later, but libpng12.0.dylib provides version 0.1.2
> Trace/BPT trap
>
> At first I thought that I had an outdated libpng, but I downloaded
> the latest one from their site, did a clean configuration and make
> and got the same message.
>
> Can anyone point me to how to fix this problem?
>
> I am working on a Mac G4 powerbook (867 MHz) running OS X 10.4.9,
> with 384 MB RAM.
>
> All help and suggestions are welcome.
>
> Warren Gallin

From hjenkins at uvic.ca Fri Jun 1 20:08:40 2007
From: hjenkins at uvic.ca (Hazel Hartman Jenkins)
Date: Fri, 1 Jun 2007 18:08:40 -0600
Subject: [EMBOSS] Segmentation fault with multiple similarity matricies in fneighbor
In-Reply-To: <56623.84.92.187.247.1180707409.squirrel@webmail.ebi.ac.uk>
References: <200705291714.25677.hjenkins@uvic.ca>
 <56623.84.92.187.247.1180707409.squirrel@webmail.ebi.ac.uk>
Message-ID: <200706011808.40798.hjenkins@uvic.ca>

Dear List,

Hazel Hartman Jenkins wrote: [corrected]
> If I run the following command;
> fneighbor -datafile tinytest.dat -replicates y -outfile filefrom.fnb
> then everything works.
>
> If, however, my tinytest.dat contains two similarity matrices (or, for
> that matter, the one hundred bootstrap replicates written by fdnadist by
> default), like this;
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
>     3
> 1187Aquife  0.000000  0.368385  0.404489
> BB213b06    0.368385  0.000000  0.151182
> BB269b06    0.404489  0.151182  0.000000
>
> then fneighbor returns;
>
> Phylogenies from distance matrix by N-J or UPGMA method
> Segmentation fault
>

fneighbor (and ffitch and fkitsch - they also have this bug) should
definitely support multiple input matrices, as the original Phylip routines
do. It is a very desirable trait because it is needed to create bootstrap
values for trees built from distance matrix data.

The desired behaviour is for fneighbor (and ffitch and fkitsch) to accept
input files containing multiple distance matrices and produce multiple trees
from them, in standard nested-parenthesis notation, which can then be read by
fconsense.

The reading should not stop at the end of the first distance matrix, or the
fault will become silent, and the user familiar with Phylip may not notice
that the extra matrices have been dropped until many processing steps later.

I'll describe why it should work that way in a little more detail by
describing the way in which I've used the functionality.

The first step in making a tree with bootstrap values is to create multiple
pseudo-sequences assembled from random samples (with replacement) of the
genetic sequences you want to make into a tree. By default, both Seqboot
(Phylip) and fseqboot (EMBASSY) give one hundred pseudo-sequences.

The next step is to make one hundred slightly different trees.

Some methods build trees directly from the sequence data. The methods
implemented by Neighbor, Fitch, and Kitsch all build trees from distance
matrices. So first you have to make the hundred distance matrices.

The distance matrices are calculated from the sequence data using DNAdist. In
EMBASSY, fdnadist calculates one hundred distance matrices from the hundred
pseudo-sequence datasets faultlessly.

Now comes the problem. In Phylip you can feed the hundred-distance-matrices
output from DNAdist directly into Neighbor (or Fitch or Kitsch), and build
your one hundred trees in one command.
EMBASSY currently will only build one at a time; this is inconvenient.

The last step feeds the file containing 100 trees into Consense. Consense
labels each possible subtree (group all on one branch) with the number
(percentage) of subsamples which include it. You now have bootstrap values
ready to tag onto a tree (which is calculated separately from /all/ of the
sequence data).

I'm afraid I don't know of anyone else using EMBOSS Phylip, but if I can get
it to work I'll pass my script along with my recommendation. I find it easier
to script than Phylip.

Please e-mail me with any questions, or for specific Phylip/EMBASSY scripts.
I have some knowledge of C++, and I'm willing to help with the coding; but I
warn that I'm new to development.

Regards,

Hazel Jenkins

From charles-listes-emboss at plessy.org Mon Jun 4 02:24:17 2007
From: charles-listes-emboss at plessy.org (Charles Plessy)
Date: Mon, 4 Jun 2007 15:24:17 +0900
Subject: [EMBOSS] Display problems with dottup.
Message-ID: <20070604062417.GD19895@kunpuu.plessy.org>

Dear developers,

I think that I found a bug:

dottup tembl:eclac[1:1000] tembl:eclaci -wordsize=6 -gtitle="eclaci vs eclac"

(modified from the example from
http://emboss.sourceforge.net/apps/release/4.1/emboss/apps/dottup.html )

produces the same graph as the original command without [1:1000], except
that the absent sequences have been replaced by white space. I would
have expected the frame to shrink accordingly.

I am using EMBOSS 4.1 with the latest fixes applied
(ftp://emboss.open-bio.org/pub/EMBOSS/fixes).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan

From pmr at ebi.ac.uk Mon Jun 4 10:59:21 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 15:59:21 +0100
Subject: [EMBOSS] Segmentation fault with multiple similarity matricies in fneighbor
In-Reply-To: <200706011808.40798.hjenkins@uvic.ca>
References: <200705291714.25677.hjenkins@uvic.ca>
 <56623.84.92.187.247.1180707409.squirrel@webmail.ebi.ac.uk>
 <200706011808.40798.hjenkins@uvic.ca>
Message-ID: <466428C9.5070002@ebi.ac.uk>

Dear Hazel,

> fneighbor (and ffitch and fkitsch - they also have this bug) should
> definitely support multiple input matrices, as the original Phylip routines
> do. It is a very desirable trait because it is needed to create bootstrap
> values for trees built from distance matrix data.

I have a fix ... but would like more input data to test. Can you send me an
example input file to run through both EMBASSY fneighbor and the original
neighbor? (Better still, an example set of pseudo sequences to run through
the remainder of your script.)

> The desired behaviour is for fneighbor (and ffitch and fkitsch) to accept
> input files containing multiple distance matrices and produce multiple trees
> from them, in standard nested-parenthesis notation, which can then be read by
> fconsense.
>
> The reading should not stop at the end of the first distance matrix, or the
> fault will become silent, and the user familiar with Phylip may not notice
> that the extra matrices have been dropped until many processing steps later.

Yes indeed. The fix reads all distance inputs as multiple sets (we already
read tree data as multiple trees) and processes each in turn.
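
For illustration only, "reading all distance inputs as multiple sets" can be sketched in a few lines of Python. This is not the EMBOSS C code; the function name is hypothetical, and the sketch assumes square matrices, one row per line, and names without embedded spaces.

```python
# Hypothetical sketch: read every square distance matrix from a
# PHYLIP-style file instead of stopping (or crashing) after the first.
# Assumptions: one matrix row per line, names contain no whitespace.

SAMPLE = """\
    3
1187Aquife  0.000000  0.368385  0.404489
BB213b06    0.368385  0.000000  0.151182
BB269b06    0.404489  0.151182  0.000000
    3
1187Aquife  0.000000  0.368385  0.404489
BB213b06    0.368385  0.000000  0.151182
BB269b06    0.404489  0.151182  0.000000
"""


def read_distance_matrices(text):
    """Return a list of (names, rows) tuples, one per matrix in the text."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    matrices = []
    i = 0
    while i < len(lines):
        n = int(lines[i][0])          # header line: number of taxa
        i += 1
        names, rows = [], []
        for fields in lines[i:i + n]:
            names.append(fields[0])
            rows.append([float(x) for x in fields[1:]])
        # Refuse to continue silently if a matrix is short or ragged.
        if len(rows) != n or any(len(r) != n for r in rows):
            raise ValueError("truncated or malformed distance matrix")
        matrices.append((names, rows))
        i += n
    return matrices


for names, rows in read_distance_matrices(SAMPLE):
    print(len(names), "taxa:", ", ".join(names))
```

Each (names, rows) pair would then be handed to the tree-building step in turn, giving one tree per matrix.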

regards,

Peter Rice

From pmr at ebi.ac.uk Mon Jun 4 11:28:04 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 16:28:04 +0100
Subject: [EMBOSS] Can't get seqmatchall to run, please help me
In-Reply-To:
References:
Message-ID: <46642F84.7000507@ebi.ac.uk>

nicolas roggli wrote:
> % seqmatchall
> All-against-all comparison of a set of sequences
> Input sequence set: rtt109_h_puta*
> Segmentation fault (core dumped)

seqmatchall works for me, with a set of FASTA format sequences.

I suspect there is a problem in reading one of the sequence files. Is
"rtt109_h_puta." a real sequence file? (the other files all have numbers)

If you run with -debug on the command line the program will create a file
seqmatchall.dbg. If you send this file to me I can see where the fault occurs.

regards,

Peter Rice

From pmr at ebi.ac.uk Mon Jun 4 11:32:42 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 16:32:42 +0100
Subject: [EMBOSS] Antwort: Can't get seqmatchall to run, please help me
In-Reply-To:
References:
Message-ID: <4664309A.3080403@ebi.ac.uk>

david.bauer at bayerhealthcare.com wrote:
> Hi Nicolas,
>
> seqmatchall expects as input a "seqset". This means that all sequences
> must be in one file.

> emboss-bounces at lists.open-bio.org wrote on 01/06/2007 10:28:07:
>> I do this
>> % seqmatchall
>> All-against-all comparison of a set of sequences
>> Input sequence set: rtt109_h_puta*
>> Segmentation fault (core dumped)

The rtt109_h_puta* syntax is correct - seqmatchall will read all files that
match the wildcard.

"seqset" means that all sequences are loaded into memory. It is used for
all-against-all comparisons and multiple alignment.

The alternative, "seqall", means that sequences are read one at a time. This
is used for analysing each sequence in a dataset, searching databases, and
other algorithms where we do not need all the sequences at the same time.
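
As a rough analogy only (plain Python, not EMBOSS code; all function names here are made up for the example), the difference between the two input styles looks something like this:

```python
# Hypothetical illustration of "seqset" versus "seqall" style input.
from itertools import combinations


def read_fasta(lines):
    """Yield (name, sequence) records one at a time from FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def all_against_all(lines):
    """'seqset' style: every record is held in memory, then paired up."""
    records = list(read_fasta(lines))
    return [(a[0], b[0]) for a, b in combinations(records, 2)]


def longest(lines):
    """'seqall' style: one record at a time, nothing else kept around."""
    best = None
    for name, seq in read_fasta(lines):
        if best is None or len(seq) > best[1]:
            best = (name, len(seq))
    return best


FASTA = [">a", "ACGT", ">b", "AC", ">c", "ACGTAC"]
print(all_against_all(FASTA))
print(longest(FASTA))
```

An all-against-all comparison needs every pair, so it cannot avoid loading the whole set; a per-sequence analysis only ever needs the current record.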

Hope that helps,

Peter Rice

From fernan at iib.unsam.edu.ar Mon Jun 4 10:42:44 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 4 Jun 2007 11:42:44 -0300
Subject: [EMBOSS] antigenic + valid ambiguity (aa residue) codes (BUG?)
Message-ID: <20070604144244.GC8900@iib.unsam.edu.ar>

Hi!

we're running antigenic on a number of sequences
which contain some ambiguous residues.

It seems like antigenic doesn't like the '*', 'B', 'U', 'Z' and
'X' characters in protein sequences.

This is weird because then we're left out of choices to
represent 'unknown' residues. 'X' is pretty standard to
mean 'any amino acid', while 'B' and 'Z' are used as
ambiguity codes by some programs to mean aspartate/asparagine
and glutamate/glutamine respectively.

It's also weird because antigenic silently takes in a
sequence in which we replaced one amino acid within an
antigenic epitope with an 'O' (a non-existent amino acid
code). But it strips it off the sequence, shortening the
length of the sequence and thus shifting all epitope
positions downstream.

It's also weird because when we replace the 'O' with another
non-existing amino acid code ('J') antigenic chokes: 'Sequence
is not a protein'.

Does this happen with other programs that use protein
sequences as input?

I guess this is a bug ... the behaviour should be consistent
and either take all valid amino acid codes or none (and
leaving space for 'X').

Thanks in advance,

Fernan

From pmr at ebi.ac.uk Mon Jun 4 12:00:59 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 17:00:59 +0100
Subject: [EMBOSS] Display problems with dottup.
In-Reply-To: <20070604062417.GD19895@kunpuu.plessy.org>
References: <20070604062417.GD19895@kunpuu.plessy.org>
Message-ID: <4664373B.5060308@ebi.ac.uk>

Charles Plessy wrote:
> Dear developers,
>
> I think that I found a bug:
>
> dottup tembl:eclac[1:1000] tembl:eclaci -wordsize=6 -gtitle="eclaci vs eclac"
>
> (modified from the example from
> http://emboss.sourceforge.net/apps/release/4.1/emboss/apps/dottup.html )
>
> produces the same graph as the original command without [1:1000], except
> that the absent sequences have been replaced by white space. I would
> have expected the frame to shrink accordingly.

I will fix it for the next release.

I would have expected the frame to grow (there is room for a bigger plot with
only 1000 bases to show in both directions :-)

regards,

Peter

From pmr at ebi.ac.uk Mon Jun 4 12:48:45 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 17:48:45 +0100
Subject: [EMBOSS] antigenic + valid ambiguity (aa residue) codes (BUG?)
In-Reply-To: <20070604144244.GC8900@iib.unsam.edu.ar>
References: <20070604144244.GC8900@iib.unsam.edu.ar>
Message-ID: <4664426D.6060108@ebi.ac.uk>

Fernan Aguero wrote:
> we're running antigenic on a number of sequences
> which contain some ambiguous residues.
>
> It seems like antigenic doesn't like the '*', 'B', 'U', 'Z' and
> 'X' characters in protein sequences.

Which version of EMBOSS are you running?

O is now a valid amino acid character.
In earlier releases it was treated as a phylip gap character.

The algorithm in antigenic uses a published table that only has values for the
20 naturally occurring amino acids.

We can add average values for ambiguity codes (weighted). We have no data for U
and O, but we can convert them to X.

In the next release, antigenic will accept any protein sequence.

regards,

Peter Rice

From fernan at iib.unsam.edu.ar Mon Jun 4 12:26:23 2007
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Mon, 4 Jun 2007 13:26:23 -0300
Subject: [EMBOSS] antigenic + valid ambiguity (aa residue) codes (BUG?)
In-Reply-To: <4664426D.6060108@ebi.ac.uk>
References: <20070604144244.GC8900@iib.unsam.edu.ar>
 <4664426D.6060108@ebi.ac.uk>
Message-ID: <20070604162623.GD8900@iib.unsam.edu.ar>

+----[ Peter Rice (04.Jun.2007 13:18):
|
| Fernan Aguero wrote:
| > we're running antigenic on a number of sequences
| > which contain some ambiguous residues.
| >
| > It seems like antigenic doesn't like the '*', 'B', 'U', 'Z' and
| > 'X' characters in protein sequences.
|
| Which version of EMBOSS are you running?

4.0.0 from the Rocks Bio Roll (x86_64).

| O is now a valid amino acid character.
| In earlier releases it was treated as a phylip gap character.
|
| The algorithm in antigenic uses a published table that only has values for the
| 20 naturally occurring amino acids.
|
| We can add average values for ambiguity codes (weighted). We have no data for U
| and O, but we can convert them to X.
|
| In the next release, antigenic will accept any protein sequence.
|
| regards,
|
+----]

Peter,

Thanks for the prompt reply. Are the changes to antigenic
(at least some) already in CVS?

Fernan

From pmr at ebi.ac.uk Mon Jun 4 14:07:16 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 04 Jun 2007 19:07:16 +0100
Subject: [EMBOSS] antigenic + valid ambiguity (aa residue) codes (BUG?)
In-Reply-To: <20070604162623.GD8900@iib.unsam.edu.ar>
References: <20070604144244.GC8900@iib.unsam.edu.ar>
 <4664426D.6060108@ebi.ac.uk>
 <20070604162623.GD8900@iib.unsam.edu.ar>
Message-ID: <466454D4.8030901@ebi.ac.uk>

Fernan Aguero wrote:
> Thanks for the prompt reply. Are the changes to antigenic
> (at least some) already in CVS?

I need more testing before I commit to CVS. Probably some time tomorrow.
I have to complete fixing the dottup bug (dottup is fixed, I need to check
the other dotplot applications) and make sure everything passes QA tests
first.

If you send me a protein sequence I can check what the modified antigenic
produces.

regards,

Peter Rice

From khayeni at bioinf.wits.ac.za Tue Jun 5 05:53:14 2007
From: khayeni at bioinf.wits.ac.za (khayeni)
Date: Tue, 05 Jun 2007 11:53:14 +0200
Subject: [EMBOSS] fparse - wEMBOSS input format issues
Message-ID: <1181037194.11549.2.camel@localhost.localdomain>

Good day

A user at the University of the Witwatersrand is trying to use the fpars
program, version 3.6b.

The input file as follows works fine and produces output.

     5    6
Alpha     110110
Beta      110000
Gamma     100110
Delta     011001
Epsilon   001110

The program manual states that the program can accept as input up to 8
different states, but if the above input is changed to include more
alternative states as follows, the program produces an error.

Modified input - three possible states

     5    6
Alpha     120110
Beta      120000
Gamma     120110
Delta     011021
Epsilon   021110

Error:

Error: Bad discrete states file 'sample.dat': read 10 states for
'Alpha', expected 6
Error: Unable to read discrete states from 'sample.dat'
Died: fpars terminated: Bad value for '-infile' with -auto defined
fpars exited with status 1...

What could be the reason for this?

The user also stated that when using a Windows-based stand-alone
implementation of the pars program the input file worked fine.

Any help would be appreciated.

Kind Regards

From tshtatland at gmail.com Tue Jun 5 12:09:32 2007
From: tshtatland at gmail.com (Timur Shtatland)
Date: Tue, 5 Jun 2007 12:09:32 -0400
Subject: [EMBOSS] sorting and/or filtering EMBOSS water output by score
Message-ID:

Hi,

Can EMBOSS water produce the output in which the hits are reverse
sorted and/or filtered by score? For example, I would like to use
something like this to order alignments from best to worst, and
display only alignments above a certain minimum score:

water [other options] -orderby score -scoremin 30.0

By default, alignments are displayed in the order of sequence
occurrence in the input file, and all alignments are shown.

Thank you.

Timur Shtatland

From sbassi at gmail.com Tue Jun 5 19:30:03 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 5 Jun 2007 19:30:03 -0400
Subject: [EMBOSS] Problem installing Phylip
Message-ID:

I downloaded phylipnew3.6 from emboss website, configure OK, but make
gave me this error:

(...)
then mv -f ".deps/phylip.Tpo" ".deps/phylip.Po"; else rm -f
".deps/phylip.Tpo"; exit 1; fi
make[1]: *** No rule to make target `../../../nucleus/libnucleus.la',
needed by `fclique'. Stop.
make[1]: Leaving directory `/home/dnalinux/bioinfo/PHYLIPNEW-3.6b/src'
make: *** [all-recursive] Error 1

Best,
SB.

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From sbassi at gmail.com Tue Jun 5 19:43:04 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 5 Jun 2007 19:43:04 -0400
Subject: [EMBOSS] Phylip installation problem solved
Message-ID:

Phylip was not in the embassy directory, so I moved it there; configure
and make install then ran without any problem.

Best,
SB

From ajb at ebi.ac.uk Wed Jun 6 03:34:42 2007
From: ajb at ebi.ac.uk (ajb at ebi.ac.uk)
Date: Wed, 6 Jun 2007 08:34:42 +0100 (BST)
Subject: [EMBOSS] Problem installing Phylip
In-Reply-To:
References:
Message-ID: <51971.81.98.241.17.1181115282.squirrel@webmail.ebi.ac.uk>

Hello Sebastian,

The appended is from the FAQ file, see section 'b)'.

> make[1]: *** No rule to make target `../../../nucleus/libnucleus.la',
> needed by `fclique'. Stop.
> make[1]: Leaving directory `/home/dnalinux/bioinfo/PHYLIPNEW-3.6b/src'

It looks like you did not configure using the same --prefix
command you used for configuring the main package.

HTH

Alan

Q) Installing associated software PHYLIP

A) a) from the anonymous cvs code.

   1) Go to the phylip directory
        cd embassy/phylip
   2) make the configuration file
        aclocal
        autoconf
        automake
   3) configure and compile
        ./configure (use same options as you used to compile emboss)
        make
        make install

   b) from PHYLIP-3.6b.tar.gz available from our FTP server
        ftp://emboss.open-bio.org/pub/EMBOSS/ in file PHYLIP-3.6b.tar.gz

   If you have done a full installation of EMBOSS using a 'prefix'
   e.g. you configured with
        ./configure --prefix=/usr/local/emboss
   and followed this with a 'make install' (highly recommended) then:

   1) gunzip and untar the file anywhere
        gunzip PHYLIP-3.6b.tar.gz
        tar xf PHYLIP-3.6b.tar
   2) go into the phylip directory
        cd PHYLIP-3.6b
   3) configure and compile
        ./configure (use same options as you used to compile emboss)
        make
        make install

   N.B.
If you configured without using a prefix but did do a 'make install' (or specified a prefix of /usr/local, which amounts to the same thing) then you must configure using: ./configure --prefix=/usr/local --enable-localforce If, on the other hand, you did not do a 'make install' of EMBOSS then: 1) Go to the emboss directory cd EMBOSS-3.0.0 2) make new directory embassy if it does not exist already. mkdir embassy 3) Go into that directory cd embassy 4) gunzip and untar the file gunzip PHYLIP-3.6b.tar.gz tar xvf PHYLIP-3.6b.tar 5) go into the phylip directory cd PHYLIP-3.6b 6) configure and compile ./configure (use same options as you used to compile emboss) make 7) Set your PATH to include the full path of the 'src' directory > I downloaded phylipnew3.6 from emboss website, configure OK, but make > gave me this error: > > (...) > then mv -f ".deps/phylip.Tpo" ".deps/phylip.Po"; else rm -f > ".deps/phylip.Tpo"; exit 1; fi > make[1]: *** No rule to make target `../../../nucleus/libnucleus.la', > needed by `fclique'. Stop. > make[1]: Leaving directory `/home/dnalinux/bioinfo/PHYLIPNEW-3.6b/src' > make: *** [all-recursive] Error 1 > > Best, > SB. 
> > -- > Bioinformatics news: http://www.bioinformatica.info > Lriser: http://www.linspire.com/lraiser_success.php?serial=318 > _______________________________________________ > EMBOSS mailing list > EMBOSS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/emboss > From pmr at ebi.ac.uk Wed Jun 6 05:18:44 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 06 Jun 2007 10:18:44 +0100 Subject: [EMBOSS] fparse - wEMBOSS input format issues In-Reply-To: <1181037194.11549.2.camel@localhost.localdomain> References: <1181037194.11549.2.camel@localhost.localdomain> Message-ID: <46667BF4.5020302@ebi.ac.uk> khayeni wrote: > Error: Bad discrete states file 'sample.dat': read 10 states for > 'Alpha', > expected 6 > Error: Unable to read discrete states from 'sample.dat' > Died: fpars terminated: Bad value for '-infile' with -auto > defined > fpars exited with status 1... > > What could be the reason for this? > > The user also stated that when using a Windows based stand alone > implementation of the pars program the input file worked fine. In the EMBOSS port, we define for each program the characters that can be accepted. Unfortunately fpars is by default reading 0 and 1 only. It should work the same way as the standalone pars if you edit emboss_acd/fpars.acd and add a line to the definition of the input file: discretestates: infile [ parameter: "Y" characters: "\S+" help: "File containing one or more data sets" ] You will need to reinstall the phylipnew programs (or copy the fpars.acd file to where the installed copy is, in share/EMBOSS/acd where EMBOSS and phylipnew were installed) We intended characters: "" to have the same effect but it is not working in EMBOSS 4.1.0. \S+ is the internal representation for "any non-space character". We will fix this - and fix the error message you saw - in the next release. 
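As a rough illustration of the difference described above, the sketch below compares a pattern that accepts only 0 and 1 with the suggested \S+ pattern. It assumes that the characters attribute behaves like a regular expression applied to each block of states, which the message implies but does not spell out; the accepts helper and the [01]+ pattern are hypothetical stand-ins, not EMBOSS code.

```python
import re

# Hypothetical stand-ins for illustration only; the real checks live inside EMBOSS.
DEFAULT_STATES = re.compile(r"[01]+")  # models "reading 0 and 1 only"
ANY_NON_SPACE = re.compile(r"\S+")     # models the suggested characters: "\S+"


def accepts(pattern, token):
    """Return True if the whole token consists of characters the pattern allows."""
    return pattern.fullmatch(token) is not None
```

Under these assumptions a block of states such as 01?2P is rejected by the 0/1-only pattern but accepted by \S+, while a block containing a space is rejected by both.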
regards, Peter Rice From sbassi at gmail.com Wed Jun 6 10:08:29 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 6 Jun 2007 11:08:29 -0300 Subject: [EMBOSS] Problem installing Phylip In-Reply-To: <51971.81.98.241.17.1181115282.squirrel@webmail.ebi.ac.uk> References: <51971.81.98.241.17.1181115282.squirrel@webmail.ebi.ac.uk> Message-ID: On 6/6/07, ajb at ebi.ac.uk wrote: > Hello Sebastian, > The appended is from the FAQ file, see section 'b)'. .... > It looks like you did not configure using the same --prefix command > you used for configuring the main package. I didn't used prefix when configured EMBOSS, the problem was that I tried to configure it on a different directory from embassy. Now I corrected this and works (also MSE and ESIM4). Thank you very much! Best, -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From xinhong at indiana.edu Fri Jun 8 11:01:07 2007 From: xinhong at indiana.edu (Hong, Xin) Date: Fri, 8 Jun 2007 11:01:07 -0400 Subject: [EMBOSS] eprimer3 and emma References: <20070529083710.GH2487@kunpuu.plessy.org> <465D7D17.3020006@ebi.ac.uk> <20070601083728.GA29855@kunpuu.plessy.org> <465FFA2A.1060202@ebi.ac.uk> Message-ID: <89F32E6A19D3B34283EA3C105E33F11DC1D376@iu-mssg-mbx105.ads.iu.edu> Hello there, I have two questions, when I check all the functions of EMBOSS after installation. Here is the error message I get, when I try eprimer3. [discern:discern_test]\% eprimer3 tembl:hsfau1 hsfau.eprimer3 -explain Picks PCR primers and hybridization oligos Died: The program 'primer3_core' must be on the path. It is part of the 'primer3' package, version 0.9, available from the Whitehead Institute. See: http://www-genome.wi.mit.edu/ I wondering could version higher than 0.9 work. I found the C source code on http://primer3.sourceforge.net/releases.php . In addition, what the path mean in "must be on the path". Another question is about emma. How can I ban emma? 
If we have clustalw run on other place. Or I have to install clustalw. [discern:discern_test]\% emma Multiple alignment program - interface to ClustalW program Input (gapped) sequence(s): globins.fasta output sequence set [hbb_human.aln]: Dendrogram (tree file) from clustalw output file [hbb_human.dnd]: EMBOSS An error in ajsys.c at line 421: cannot find program 'clustalw' Thank, Xin From pmr at ebi.ac.uk Fri Jun 8 12:42:21 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Fri, 08 Jun 2007 17:42:21 +0100 Subject: [EMBOSS] eprimer3 and emma In-Reply-To: <89F32E6A19D3B34283EA3C105E33F11DC1D376@iu-mssg-mbx105.ads.iu.edu> References: <20070529083710.GH2487@kunpuu.plessy.org> <465D7D17.3020006@ebi.ac.uk> <20070601083728.GA29855@kunpuu.plessy.org> <465FFA2A.1060202@ebi.ac.uk> <89F32E6A19D3B34283EA3C105E33F11DC1D376@iu-mssg-mbx105.ads.iu.edu> Message-ID: <466986ED.9020606@ebi.ac.uk> Hong, Xin wrote: > I wondering could [primer3 version higher than 0.9 work. I found the C source code on http://primer3.sourceforge.net/releases.php . In addition, what the path mean in "must be on the path". Yes, I have tested with the latest version. The documentation for the next release will show the latest version. > Another question is about emma. How can I ban emma? If we have clustalw run on other place. Or I have to install clustalw. You can install clustalw ... 
it is possible to tell emma where clustalw is by defining the emboss_clustalw variable to the full path Hope that helps, Peter Rice From shrish at ccmb.res.in Mon Jun 11 08:25:02 2007 From: shrish at ccmb.res.in (Shrish Tiwari) Date: Mon, 11 Jun 2007 17:55:02 +0530 (IST) Subject: [EMBOSS] extracting introns. UTRs Message-ID: <29276167.1181564702083.JavaMail.root@mailserver> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/emboss/attachments/20070611/3d217f4f/attachment.pl From gbottu at ben.vub.ac.be Tue Jun 12 03:50:42 2007 From: gbottu at ben.vub.ac.be (Guy Bottu) Date: Tue, 12 Jun 2007 09:50:42 +0200 Subject: [EMBOSS] suggestion for improving restrict Message-ID: <20070612075042.GA2213@bigben.ulb.ac.be> Dear users and developers of EMBOSS, One of our users has a suggestion for improving restrict. He needs a list of the lengths of the restriction fragments, in the same order as they appear on the plasmid. Do you think this would be an interesting addition? (Note also the workaround using Excel that I suggested to him.) Regards, Guy Bottu, BEN ----- Forwarded message from Xavier Danthinne ----- From: "Xavier Danthinne" To: "BEN administration" Subject: Re: EMBOSS Date: Fri, 8 Jun 2007 10:16:10 -0600 Hello Guy, I read exercise #6 that you mentioned, and the third paragraph ("Now suppose that...") is exactly what I was looking for. I understand that the solution is to save the data from "restrict" as tab-delimited, so we can import them into Excel and calculate in a new column the difference between one site and the next one. It is not such a big deal to do that each time, but if the program restrict could do it for us, that would be great. This is what computers are for, isn't it? Thanks for your help, and have a good weekend. Xavier Xavier Danthinne, Ph.D. O.D.260 Inc PO Box 534 Boise, ID 83701 Ph. 
(208)345-7369 Fax (208)345-7569 Cell (208)484-0104 ----- Original Message ----- From: "BEN administration" To: "Xavier Danthinne" Sent: Friday, June 08, 2007 3:25 AM Subject: Re: EMBOSS >On Wed, Jun 06, 2007 at 11:52:21PM -0600, Xavier Danthinne wrote: >>I still like using EMBOSS. I have a suggestion regarding the program >>"restrict". If the program could list restriction fragments with their >>size in the order by which they constitute a piece of DNA such as a >>plasmid, this would be great. I work with large cosmids, and it is >>sometimes difficult to figure out where a specific fragment is located >>among others. Having this feature (like we had in GCG) would help. > >Well, you can export the output in MS Excel format and then quite easily >compute the list you want. Exercise 6 from >ftp://ftp.be.embnet.org/pub/BEN_Tutorials/unix_perl/ex-UNIX.doc >gives you an example of how to do it. >If you use wEMBOSS rather than the command line the parameters to set are >"Comma separated enzyme list" ... "Allow circular DNA?" y "Sort output >alphabetically?" y (or n, depending on your needs) "Report format" >tab-delimited table format. >Does this help? > >Regards, >Guy Bottu > ----- End forwarded message ----- From liuxq at mail.cbi.pku.edu.cn Tue Jun 12 10:30:08 2007 From: liuxq at mail.cbi.pku.edu.cn (Liu XQ) Date: Tue, 12 Jun 2007 22:30:08 +0800 Subject: [EMBOSS] png output problem Message-ID: <466EADF0.5030600@mail.cbi.pku.edu.cn> Hi, I am an EMBOSS user. When I use an EMBOSS program such as abiview to output a PNG image, many partial images are produced. Is there any way to make the program output a single complete PNG image? Thanks. 
Xiaoqiao From xinhong at indiana.edu Tue Jun 12 17:07:53 2007 From: xinhong at indiana.edu (Hong, Xin) Date: Tue, 12 Jun 2007 17:07:53 -0400 Subject: [EMBOSS] segmentation fault and other error messages References: <20070529083710.GH2487@kunpuu.plessy.org> <465D7D17.3020006@ebi.ac.uk> <20070601083728.GA29855@kunpuu.plessy.org> <465FFA2A.1060202@ebi.ac.uk> <89F32E6A19D3B34283EA3C105E33F11DC1D376@iu-mssg-mbx105.ads.iu.edu> <466989A7.9090500@ebi.ac.uk> Message-ID: <89F32E6A19D3B34283EA3C105E33F11DC1D37D@iu-mssg-mbx105.ads.iu.edu> Dear support group members of EMBOSS, I have several types of errors during my test of installation of EMBOSS. I tried on three different computers: desktop/Ubuntu, AIX5.0 (named as libra) and x86_64/RHEL (called discern). Could anyone give us some hint? 1. segmentation fault:fontml failed on all [libra02:discern_test]\% fcontml Gene frequency and continuous character Maximum Likelihood Input file: contml.dat Uncaught exception: Allocation failed, insufficient memory available, raised at ajstr.c:2083 [discern:discern_test]\% fcontml -printdata Gene frequency and continuous character Maximum Likelihood Input file: contml.dat Segmentation fault [ccc-desktop:test]\% fcontml -printdataGene frequency and continuous character Maximum Likelihood Input file: contml.dat Segmentation fault (core dumped) 2. no error message, create a empty file on discern: fconsense, ftreedist, ftreedistpair Please note, we install EMBOSS in 64 bits on dicern. They work fine on other two computers. 3. a wierd one: ememe. I am sure we installed MEME package. [discern:discern_test]\% ememe crp0.s -mod oops -revcomp ex2.html Motif detection sh: meme: command not found [libra02:discern_test]\% ememe crp0.s -mod oops -revcomp ex2.html Motif detection sh: meme: not found. 
[ccc-desktop:test]\% ememe crp0.s -mod oops -revcomp ex2.html Motif detection sh: meme: not found Best, Xin From charles-listes-emboss at plessy.org Tue Jun 12 21:27:34 2007 From: charles-listes-emboss at plessy.org (Charles Plessy) Date: Wed, 13 Jun 2007 10:27:34 +0900 Subject: [EMBOSS] png output problem In-Reply-To: <466EADF0.5030600@mail.cbi.pku.edu.cn> References: <466EADF0.5030600@mail.cbi.pku.edu.cn> Message-ID: <20070613012734.GB22552@kunpuu.plessy.org> On Tue, Jun 12, 2007 at 10:30:08PM +0800, Liu XQ wrote: > hi, > I am an emboss user. When I use emboss program such as abiview to output > png format image, many partial images are produced. Is there any method > to restrict the program only output one integral png image? Dear Xiaoqiao, I am not completely sure that it would solve your problem, but there are fixes available for abiview on EMBOSS' FTP server: ftp://emboss.open-bio.org/pub/EMBOSS/fixes Good luck, -- Charles Plessy http://charles.plessy.org Wako, Saitama, Japan From david.bauer at bayerhealthcare.com Wed Jun 13 02:47:06 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Wed, 13 Jun 2007 08:47:06 +0200 Subject: [EMBOSS] png output problem In-Reply-To: <20070613012734.GB22552@kunpuu.plessy.org> Message-ID: Hi Xiaoqiao, I think you mean that abiview produces one separate image for each 40 bases. You can change this with the option -window. But be aware that abiview always puts the whole window in one image, so using a window size of more than 100 produces quite compressed plots. Unfortunately there is no option to get the whole tracefile plotted on a page in a more compact style. Sometimes I also need to create a hardcopy of an ABI tracefile for reporting purposes, so I think this would be a nice improvement to abiview. David. emboss-bounces at lists.open-bio.org wrote on 13/06/2007 03:27:34: > On Tue, Jun 12, 2007 at 10:30:08PM +0800, Liu XQ wrote: > > hi, > > I am an emboss user. 
When I use emboss program such as abiview to output > > png format image, many partial images are produced. Is there any method > > to restrict the program only output one integral png image? > > Dear Xiaoqiao, > > I am not completely sure that it would solve your problem, but there are > fixes available for abiview on EMBOSS' FTP server: > > ftp://emboss.open-bio.org/pub/EMBOSS/fixes > > Good luck, > > -- > Charles Plessy > http://charles.plessy.org > Wako, Saitama, Japan From kellerfam at mac.com Wed Jun 13 12:34:52 2007 From: kellerfam at mac.com (Thomas Keller) Date: Wed, 13 Jun 2007 09:34:52 -0700 Subject: [EMBOSS] png output problem In-Reply-To: References: Message-ID: <0EE33265-1B0B-4A00-AB09-965CC53AF2BF@mac.com> What happened to the paperless office! On Jun 12, 2007, at 11:47 PM, david.bauer at bayerhealthcare.com wrote: > Hi Xiaoqiao, > > I think you mean that abiview produces one separate image for each 40 > bases. > You can change this with the option -window. But be aware, that abiview > always puts the whole window in one image so using window size of more > than 100 produces quite compressed plots. > Unfortunately there is no option to get the whole tracefile plotted on a > page in a more compact style. > > Sometime I also need for reporting purposes to create a hardcopy of an abi > tracefile. So I think this would be a nice improvement of abiview. > > David. > > emboss-bounces at lists.open-bio.org wrote on 13/06/2007 03:27:34: > >> On Tue, Jun 12, 2007 at 10:30:08PM +0800, Liu XQ wrote: >>> hi, >>> I am an emboss user. When I use emboss program such as abiview to output >>> png format image, many partial images are produced. Is there any method >>> to restrict the program only output one integral png image? 
>> >> Dear Xiaoqiao, >> >> I am not completely sure that it would solve your problem, but >> there are >> fixes available for abiview on EMBOSS' FTP server: >> >> ftp://emboss.open-bio.org/pub/EMBOSS/fixes >> >> Good luck, >> >> -- >> Charles Plessy >> http://charles.plessy.org >> Wako, Saitama, Japan From pmr at ebi.ac.uk Wed Jun 13 18:00:07 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 13 Jun 2007 23:00:07 +0100 Subject: [EMBOSS] segmentation fault and other error messages In-Reply-To: <89F32E6A19D3B34283EA3C105E33F11DC1D37D@iu-mssg-mbx105.ads.iu.edu> References: <20070529083710.GH2487@kunpuu.plessy.org> <465D7D17.3020006@ebi.ac.uk> <20070601083728.GA29855@kunpuu.plessy.org> <465FFA2A.1060202@ebi.ac.uk> <89F32E6A19D3B34283EA3C105E33F11DC1D376@iu-mssg-mbx105.ads.iu.edu> <466989A7.9090500@ebi.ac.uk> <89F32E6A19D3B34283EA3C105E33F11DC1D37D@iu-mssg-mbx105.ads.iu.edu> Message-ID: <467068E7.3050308@ebi.ac.uk> Dear Xin, Hong, Xin wrote: > Dear support group members of EMBOSS, > > I have several types of errors during my test of installation of EMBOSS. I tried on three different computers: desktop/Ubuntu, AIX5.0 (named as libra) and x86_64/RHEL (called discern). Could anyone give us some hint? > > 1. segmentation fault:fcontml failed on all > > [libra02:discern_test]\% fcontml > Gene frequency and continuous character Maximum Likelihood > Input file: contml.dat > Uncaught exception: Allocation failed, insufficient memory available, raised at ajstr.c:2083 Can you please send me the input file contml.dat > 2. no error message, create a empty file on discern: fconsense, ftreedist, ftreedistpair > Please note, we install EMBOSS in 64 bits on dicern. 
They work fine on other two computers. What was the ./configure command line when you installed? > 3. a wierd one: ememe. > > I am sure we installed MEME package. > > [discern:discern_test]\% ememe crp0.s -mod oops -revcomp ex2.html > Motif detection > sh: meme: command not found ememe will search for meme in the /bin/sh path; perhaps you installed it somewhere that is in your path when you log in but not for this shell? You can define the environment variable EMBOSS_MEME (uppercase, or put it into the emboss.defaults file) to point to the full path to meme. regards, Peter Rice From david.bauer at bayerhealthcare.com Thu Jun 14 01:50:34 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Thu, 14 Jun 2007 07:50:34 +0200 Subject: [EMBOSS] png output problem In-Reply-To: <0EE33265-1B0B-4A00-AB09-965CC53AF2BF@mac.com> Message-ID: Thomas Keller wrote on 13/06/2007 18:34:52: > What happened to the paperless office! > In Pharma this is still just a nice dream ;-) From niels at genomics.dk Thu Jun 14 05:09:38 2007 From: niels at genomics.dk (Niels Larsen) Date: Thu, 14 Jun 2007 11:09:38 +0200 Subject: [EMBOSS] Genbank GI fetching? Message-ID: <467105D2.8040005@genomics.dk> Greetings, Can dbxflat index genbank, so seqret can fetch entries by GI number? (apologies if I overlooked it). Niels L From david.bauer at bayerhealthcare.com Thu Jun 14 05:34:14 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Thu, 14 Jun 2007 11:34:14 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: <467105D2.8040005@genomics.dk> Message-ID: Hi, by default it indexes only ID and ACC. If you want the GI indexed, you must use the -fields option and also specify sv (sequence version) to be indexed. David. emboss-bounces at lists.open-bio.org wrote on 14/06/2007 11:09:38: > Greetings, > > Can dbxflat index genbank, so seqret can fetch entries by > GI number? (apologies if I overlooked it). 
> > Niels L From niels at genomics.dk Thu Jun 14 05:53:34 2007 From: niels at genomics.dk (Niels Larsen) Date: Thu, 14 Jun 2007 11:53:34 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: References: Message-ID: <4671101E.9000003@genomics.dk> Hi .. I did run dbxflat like this dbxflat -dbname test -dbresource test -idformat gb -filenames \ 'gbbct7.seq' -directory . -fields=id,acc,sv,des,key,org and got an index where entries can be fetched with seqret test:AM260486 -stdout -auto but not with seqret test:106880293 -stdout -auto where 106880293 is the GI number of AM260486. Maybe I am specifying the USA wrong .. ? Niels L david.bauer at bayerhealthcare.com wrote: > Hi, > > as default it indexes only ID and ACC. If you want GI indexed, you must > use the -fields option and specify also sv (sequence version) to be > indexed. > > David. From david.bauer at bayerhealthcare.com Thu Jun 14 07:17:20 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Thu, 14 Jun 2007 13:17:20 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: <4671101E.9000003@genomics.dk> Message-ID: Hi Niels, when indexing the sv field, dbxflat indexes the words exactly as they appear in the entry. This means that 'GI:' becomes part of the indexed word. So seqret test:GI:106880293 or seqret test:\*106880293 will return the requested entry. You also need the line fields: "id acc sv des key org" in your database definition. HTH, David. Niels Larsen wrote on 14/06/2007 11:53:34: > Hi .. I did run dbxflat like this > > dbxflat -dbname test -dbresource test -idformat gb -filenames \ > 'gbbct7.seq' -directory . 
-fields=id,acc,sv,des,key,org > > and got an index where entries can be fetched with > > seqret test:AM260486 -stdout -auto > > but not with > > seqret test:106880293 -stdout -auto > > where 106880293 is the GI number from the AM260486. Maybe > I am specifying the usa wrong .. ? > > Niels L > > david.bauer at bayerhealthcare.com wrote: > > Hi, > > > > as default it indexes only ID and ACC. If you want GI indexed, you must > > use the -fields option and specify also sv (sequence version) to be > > indexed. > > > > David. From niels at genomics.dk Fri Jun 15 12:32:13 2007 From: niels at genomics.dk (Niels Larsen) Date: Fri, 15 Jun 2007 18:32:13 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: References: Message-ID: <4672BF0D.5080003@genomics.dk> Thanks David Bauer, for pulling me afloat. The dbxflat is now grinding and I am making a small set of Perl accessors. I saw a broken link at http://emboss.sourceforge.net, so tried to see if there are more. The validator at W3C http://validator.w3.org/checklink gives a decent list in response to the EMBOSS link, with the "recursive" option on. Another little thing, but I might be wrong, is whether all applications are listed in groups? for example, I go here (from the home page), http://emboss.sourceforge.net/apps/release/4.1/emboss/apps/groups.html and click about, hunting for seqret, but dont find (I wanted to see which other related programs there is to seqret). The "see also" tables give me that of course, but I didnt discover that at first. Finally question: I will build an accessor (in Perl) that invokes seqret for pulling out a genbank sub-sequence, often just a small piece, plus the features that overlap with this piece. Can EMBOSS do this, or must I pull the whole entry, parse and find the overlapping pieces in Perl? 
I am working on these sites, which are only guaranteed to work now and then, http://biobase.com:8000/UTHCT http://biobase.com:8000/RNAport http://biobase.com:8000/RRNA and I may use EMBOSS tools as part of a later query mechanism. For now I will be using EMBOSS for filling in the non-matches between the matches from a blast and other similarity reports, for zoom-able alignments and better overview. And so far it is going well with EMBOSS. Niels L From david.bauer at bayerhealthcare.com Sun Jun 17 02:53:34 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Sun, 17 Jun 2007 08:53:34 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: <4672BF0D.5080003@genomics.dk> Message-ID: Hi Niels, Niels Larsen wrote on 15/06/2007 18:32:13: > http://emboss.sourceforge.net/apps/release/4.1/emboss/apps/groups.html > > and click about, hunting for seqret, but dont find (I wanted You could find seqret in the edit group, but unfortunately this belongs to the category "broken link". > Finally question: I will build an accessor (in Perl) that > invokes seqret for pulling out a genbank sub-sequence, often > just a small piece, plus the features that overlap with this > piece. Can EMBOSS do this, or must I pull the whole entry, > parse and find the overlapping pieces in Perl? There are (at least) 3 programs in EMBOSS which could be helpful for this task: 1) showfeat Returns a nice parsable overview of the features in the GenBank entry. 2) extractfeat Extracts the sequence of features from the GenBank entry. 3) coderet This is for mRNA and CDS features and takes care of joining the exons and getting the protein sequence of a CDS. Cheers, David. From niels at genomics.dk Sun Jun 17 19:56:40 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 18 Jun 2007 01:56:40 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: References: Message-ID: <4675CA38.2000003@genomics.dk> Hi David, Thanks again, for the hints. Great. 
I found dbxflat behaves well, goes fast and makes small indices when only id,acc are asked for. But Genbank/EMBL have become 500 GB+ monsters uncompressed, and so I made this primitive scheme in addition: split the flatfiles into many smaller compressed files organised in directories that are named after the first 4 digits of the GI number. Then with grep and zcat as "accessors", and 5-10 MB chunks, the average access time is 0.1-0.2 seconds - much worse than dbxflat, but better than fetching posts from NCBI, and then it is 100 GB instead of 500, close to its distributed compressed size. I would have used EMBL if EBI's remote services worked reliably. Btw, the seqret documentation doesn't say, but stdin: works as stdout: zcat 2.gz | seqret -filter stdin:AAIY01677200 -sbegin1 11 -osformat2 embl -firstonly Is adding to indices on the todo-list for dbxflat? Niels L From david.bauer at bayerhealthcare.com Mon Jun 18 03:34:02 2007 From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com) Date: Mon, 18 Jun 2007 09:34:02 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: <4675CA38.2000003@genomics.dk> Message-ID: Hi Niels, Niels Larsen wrote on 18/06/2007 01:56:40: > Btw, the seqret documentation > doesnt say, but stdin: works as stdout: > > zcat 2.gz | seqret -filter stdin:AAIY01677200 -sbegin1 11 -osformat2 > embl -firstonly Ehm, this is a strange construction. The -filter is a general qualifier for all EMBOSS programs and means that input comes from stdin and output goes to stdout. So I'm not sure what happens if you combine -filter with stdin: but I guess that Peter has an answer to this. Cheers, David. From pmr at ebi.ac.uk Mon Jun 18 05:09:39 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 18 Jun 2007 10:09:39 +0100 Subject: [EMBOSS] Antwort: Genbank GI fetching? 
In-Reply-To: References: Message-ID: <46764BD3.7050502@ebi.ac.uk> david.bauer at bayerhealthcare.com wrote: > Hi Niels, > > Niels Larsen wrote on 18/06/2007 01:56:40: > >> Btw, the seqret documentation >> doesnt say, but stdin: works as stdout: >> >> zcat 2.gz | seqret -filter stdin:AAIY01677200 -sbegin1 11 -osformat2 >> embl -firstonly > > Ehm, this is a strange construction. The -filter is a general qualifier > for all EMBOSS programs and means that input comes from stdin and output > goes to stdout. So I'm not sure what happens if you combine -filter with > stdin: but I guess that Peter has an answer to this. No problem. -filter defaults to reading stdin for the first input, and writing to stdout for the first output. If you specify something else for the input or output, it will do what you say. So, as David says, it is the -filter that tells it to write to stdout. regards, Peter From niels at genomics.dk Mon Jun 18 09:33:04 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 18 Jun 2007 15:33:04 +0200 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: References: Message-ID: <46768990.8050108@genomics.dk> david.bauer at bayerhealthcare.com wrote: > Hi Niels, > > Niels Larsen wrote on 18/06/2007 01:56:40: > >> Btw, the seqret documentation >> doesnt say, but stdin: works as stdout: >> >> zcat 2.gz | seqret -filter stdin:AAIY01677200 -sbegin1 11 -osformat2 >> embl -firstonly > > Ehm, this is a strange construction. The -filter is a general qualifier > for all EMBOSS programs and means that input comes from stdin and output > goes to stdout. So I'm not sure what happens if you combine -filter with > stdin: but I guess that Peter has an answer to this. It was my first way to get only a particular entry from the stream, but there could well be a better way .. next I will be looking for a way to get several out of the stream, instead of one, with pure command line .. 
could use a list file, but then my code that makes these command lines would need to manage those on the fly .. I found a little trap with seqret, if one enters by mistake --firstonly (the GNU way) instead of -firstonly, then seqret quietly returns nothing. It should probably complain instead, the other arguments do that. I use 4.1.0. Niels L From pmr at ebi.ac.uk Mon Jun 18 09:33:41 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 18 Jun 2007 14:33:41 +0100 Subject: [EMBOSS] edit_group broken link In-Reply-To: References: Message-ID: <467689B5.9060406@ebi.ac.uk> david.bauer at bayerhealthcare.com wrote: > you could find seqret in the edit group but unfortunately this belongs to > the category "broken link". Oops. Fixed. I will check the website for any other broken links. Peter From pmr at ebi.ac.uk Mon Jun 18 09:38:17 2007 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 18 Jun 2007 14:38:17 +0100 Subject: [EMBOSS] Antwort: Genbank GI fetching? In-Reply-To: <4675CA38.2000003@genomics.dk> References: <4675CA38.2000003@genomics.dk> Message-ID: <46768AC9.9020206@ebi.ac.uk> Niels Larsen wrote: > Is adding to indices on the todo-list for dbxflat? Indeed it is. But it will not be in time for the July release. We hope to be able to add/replace and remove entries from existing indices as a better way to support the nucleotide sequence database updates between releases. We would also be interested in any compression scheme that would allow direct access to individual sequences - we could then index such files with dbxflat (or dbiflat). We would consider it an advantage if other packages could use the same data files (our hope is to avoid creating an additional copy of EMBL or GenBank just for EMBOSS). 
regards,

Peter Rice

From pmr at ebi.ac.uk Mon Jun 18 10:38:57 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 18 Jun 2007 15:38:57 +0100
Subject: [EMBOSS] Command line with --firstonly
In-Reply-To: <46768990.8050108@genomics.dk>
References: <46768990.8050108@genomics.dk>
Message-ID: <46769901.8020008@ebi.ac.uk>

Niels Larsen wrote:
> I found a little trap with seqret, if one enters by mistake --firstonly
> (the GNU way) instead of -firstonly, then seqret quietly returns
> nothing. It should probably complain instead, the other arguments
> do that. I use 4.1.0.

Thanks for the suggestion.

It doesn't silently return nothing ... it writes to a file called --firstonly
(and you need "cat -- --firstonly" to read it and "rm -- --firstonly" to remove
it :-)

In the next release we will allow -- as a qualifier prefix on the command line.

We have no use (as far as I can tell) for "--" as a delimiter for real filenames
(like cat and rm).

regards,

Peter

From niels at genomics.dk Mon Jun 18 16:08:40 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 18 Jun 2007 22:08:40 +0200
Subject: [EMBOSS] Command line with --firstonly
In-Reply-To: <46769901.8020008@ebi.ac.uk>
References: <46768990.8050108@genomics.dk> <46769901.8020008@ebi.ac.uk>
Message-ID: <4676E648.1010805@genomics.dk>

Ah well, maybe I am the only one that could ever fall into that trap,
but thanks.

Have I overlooked a way to switch fasta output to
sequence-as-single-line mode? I think it is allowed by fasta format, and
if I/we were to build a processing pipe with sequence flowing,
then I imagine that would be useful both for speed and I/O
simplicity ..

Niels L

Peter Rice wrote:
> Niels Larsen wrote:
>
>> I found a little trap with seqret, if one enters by mistake --firstonly
>> (the GNU way) instead of -firstonly, then seqret quietly returns
>> nothing. It should probably complain instead, the other arguments
>> do that. I use 4.1.0.
>
> Thanks for the suggestion.
>
> It doesn't silently return nothing ... it writes to a file called --firstonly
> (and you need "cat -- --firstonly" to read it and "rm -- --firstonly" to remove
> it :-)
>
> In the next release we will allow -- as a qualifier prefix on the command line.
>
> We have no use (as far as I can tell) for "--" as a delimiter for real filenames
> (like cat and rm).
>
> regards,
>
> Peter

From yogi.sundaravadanam at agrf.org.au Tue Jun 19 00:29:52 2007
From: yogi.sundaravadanam at agrf.org.au (Yogi Sundaravadanam)
Date: Tue, 19 Jun 2007 14:29:52 +1000
Subject: [EMBOSS] Sixpack/transeq frame translation
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/emboss/attachments/20070619/b6cead2a/attachment.pl

From david.bauer at bayerhealthcare.com Tue Jun 19 02:04:38 2007
From: david.bauer at bayerhealthcare.com (david.bauer at bayerhealthcare.com)
Date: Tue, 19 Jun 2007 08:04:38 +0200
Subject: [EMBOSS] Command line with --firstonly
In-Reply-To: <4676E648.1010805@genomics.dk>
Message-ID:

There is the format "meganon". It is very similar to what you want. The
sequence is just one line without line breaks. Only the header is not
fasta.
The sequence output formats are defined in ajseqwrite.c. You can use the
meganon format as a template to create your own modified fasta output
format.

Cheers, David.

emboss-bounces at lists.open-bio.org wrote on 18/06/2007 22:08:40:

> Have I overlooked a way to switch fasta output to
> sequence-as-single-line mode? I think it is allowed by fasta format, and
> if I/we were to build a processing pipe with sequence flowing,
> then I imagine that would be useful both for speed and I/O
> simplicity ..
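
For readers of the archive: until a single-line FASTA output format exists in EMBOSS itself, the same effect can be had by post-processing ordinary FASTA in the pipe. The following is only a minimal sketch, not part of EMBOSS; the function name unwrap_fasta and the sample records are made up for illustration.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    `lines` is any iterable of text lines (a file object, sys.stdin, a list).
    Header lines (starting with '>') are passed through unchanged.
    """
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record begins: flush the sequence collected so far.
            if seq_parts:
                yield "".join(seq_parts)
                seq_parts = []
            yield line
        elif line:
            # Sequence line: remember it, skipping blank lines.
            seq_parts.append(line)
    # Flush the last record.
    if seq_parts:
        yield "".join(seq_parts)


# Small in-memory demonstration with two hypothetical records.
sample = [">seq1 example\n", "ACGTAC\n", "GGTTAA\n", ">seq2\n", "TTTT\n"]
for out_line in unwrap_fasta(sample):
    print(out_line)
```

To use it at the end of a pipeline, pass sys.stdin instead of the sample list. As noted later in this thread, a whole chromosome becomes one very long line, so this is mainly convenient for downstream tools that read line by line.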
From pmr at ebi.ac.uk Wed Jun 20 03:39:17 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jun 2007 08:39:17 +0100
Subject: [EMBOSS] Command line with --firstonly
In-Reply-To: <4676E648.1010805@genomics.dk>
References: <46768990.8050108@genomics.dk> <46769901.8020008@ebi.ac.uk> <4676E648.1010805@genomics.dk>
Message-ID: <4678D9A5.4070904@ebi.ac.uk>

Niels Larsen wrote:
> Have I overlooked a way to switch fasta output to
> sequence-as-single-line mode? I think it is allowed by fasta format, and
> if I/we were to build a processing pipe with sequence flowing,
> then I imagine that would be useful both for speed and I/O
> simplicity ..

-ossingle

It has the drawback that you have to accept the default file names for the extra
output files. We recommend running in an empty directory so you can find all
the output.

regards,

Peter

From pmr at ebi.ac.uk Wed Jun 20 04:08:50 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jun 2007 09:08:50 +0100
Subject: [EMBOSS] Command line with --firstonly
In-Reply-To: <4678D9A5.4070904@ebi.ac.uk>
References: <46768990.8050108@genomics.dk> <46769901.8020008@ebi.ac.uk> <4676E648.1010805@genomics.dk> <4678D9A5.4070904@ebi.ac.uk>
Message-ID: <4678E092.8050402@ebi.ac.uk>

Peter Rice wrote:
> Niels Larsen wrote:
>> Have I overlooked a way to switch fasta output to
>> sequence-as-single-line mode? I think it is allowed by fasta format, and
>> if I/we were to build a processing pipe with sequence flowing,
>> then I imagine that would be useful both for speed and I/O
>> simplicity ..
>
> -ossingle

Oops. I misread your request as "single file mode".

If you want the sequence all on one line after the header that can be done, but
would need a new format name... or more than one new name if you want to allow
more than one style of ID.

Human chromosome 1 would have a very long line in the file. In C I am not sure
it helps with speed for long sequences ...
EMBL format is (or will be in 5.0.0) much
faster for these as it includes the sequence length and avoids a lot of string
copying.

regards,

Peter

From wli at ebi.ac.uk Wed Jun 20 08:22:00 2007
From: wli at ebi.ac.uk (Weizhong Li)
Date: Wed, 20 Jun 2007 13:22:00 +0100
Subject: [EMBOSS] dbxfasta and seqret
Message-ID: <46791BE8.8020305@ebi.ac.uk>

Hi,

Does anybody know how to use seqret to fetch fasta sequences from the
index database by dbxfasta?

Many thanks
Weizhong

From pmr at ebi.ac.uk Wed Jun 20 08:54:35 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jun 2007 13:54:35 +0100
Subject: [EMBOSS] dbxfasta and seqret
In-Reply-To: <46791BE8.8020305@ebi.ac.uk>
References: <46791BE8.8020305@ebi.ac.uk>
Message-ID: <4679238B.9090908@ebi.ac.uk>

Weizhong Li wrote:
> Does anybody know how to use seqret to fetch fasta sequences from the
> index database by dbxfasta?

You need a database definition. Seqret will use this to find the new index files
and retrieve entries.

Define a database that points to your dbxfasta index and data files (in your
~/.embossrc file or in the global emboss.default file)

DB mydata [
  type: "N"
  format: "fasta"
  method: "emboss"
  dir: "/data/path/in/full"
  indexdir: "/index/path/in/full"
  comment: "DBXFASTA index of embl"
# if you used a different name when you indexed the database:
  dbalias: embl
]

Then

seqret mydata:x12345

will retrieve entry x12345 in FASTA format.

regards,

Peter

From pmr at ebi.ac.uk Wed Jun 20 09:20:59 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 20 Jun 2007 14:20:59 +0100
Subject: [EMBOSS] Sixpack/transeq frame translation
In-Reply-To:
References:
Message-ID: <467929BB.8000509@ebi.ac.uk>

Yogi Sundaravadanam wrote:
> Sixpack/transeq translations don't match BLAST translation. This is
> giving me such grief. After some research, I found out that there's an
> -alternate option in 'transeq' that translates the nucleic sequence to a
> peptide, the BLAST way. Is it possible to do the same for Sixpack?
I am not sure what you mean by "BLAST translation".

Do you mean the reverse frames are numbered differently? We followed the
convention used by the Staden package.

Would a -alternative option be useful in sixpack?

regards,

Peter Rice

From d.gatherer at vir.gla.ac.uk Wed Jun 20 10:18:37 2007
From: d.gatherer at vir.gla.ac.uk (Derek Gatherer)
Date: Wed, 20 Jun 2007 15:18:37 +0100
Subject: [EMBOSS] Sixpack/transeq frame translation
In-Reply-To: <467929BB.8000509@ebi.ac.uk>
References: <467929BB.8000509@ebi.ac.uk>
Message-ID:

At 14:20 20/06/2007, Peter Rice wrote:
>Would a -alternative option be useful in sixpack?

Hi Peter

Sixpack could do with a real sorting out, as could a few of the other
translation programs, as I have mentioned before:

http://emboss.open-bio.org/pipermail/emboss/2006-May/002529.html

I'd be happy to produce a detailed spec. of what sixpack/showseq/transeq
does and doesn't do (as opposed to what the documentation claims they
do), if this would help. I've been trying to write a wrapper script
that handles GenBank files and uses EMBOSS translation programs as the
core of its API, but have been consistently frustrated by the bugs in them.

Best wishes
Derek

From staylor at molbiol.ox.ac.uk Thu Jun 21 11:45:48 2007
From: staylor at molbiol.ox.ac.uk (Steve Taylor)
Date: Thu, 21 Jun 2007 16:45:48 +0100
Subject: [EMBOSS] Searching for repeats in fuzznuc
Message-ID: <467A9D2C.6050108@molbiol.ox.ac.uk>

Hi,

I would like to search for a specific repeat using fuzznuc. It is pretty easy
using a regexp in preg (yes, I know preg should really only be used for protein
sequences:-)) via

preg -pattern '(TG{5,20}){2,10}'

but is there a way to do something similar in fuzznuc, since I would like to
introduce mismatches.
Thanks for any help,

Steve
------------------------------------------------------------------
Medical Sciences Division
Weatherall Institute of Molecular Medicine/Sir William Dunn School
Oxford University

From pmr at ebi.ac.uk Thu Jun 21 12:31:54 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jun 2007 17:31:54 +0100
Subject: [EMBOSS] Searching for repeats in fuzznuc
In-Reply-To: <467A9D2C.6050108@molbiol.ox.ac.uk>
References: <467A9D2C.6050108@molbiol.ox.ac.uk>
Message-ID: <467AA7FA.4060108@ebi.ac.uk>

Hi Steve,

> I would like to search for a specific repeat using fuzznuc. It is pretty easy
> using a regexp in preg (yes, I know preg should really only be used for
> protein sequences:-)) via
>
> preg -pattern '(TG{5,20}){2,10}'
>
> but is there a way to do something similar in fuzznuc, since I would like to
> introduce mismatches.

You could use dreg rather than preg :-)

Fuzznuc now accepts files with a pattern on each line:

% cat tg.pat
TG(5,20)TG(5,20)
TG(5,20)TG(5,20)TG(5,20)
TG(5,20)TG(5,20)TG(5,20)TG(5,20)
TG(5,20)TG(5,20)TG(5,20)TG(5,20)TG(5,20)

or, if you want to name them, ...

% cat tg.pat
>tg2
TG(5,20)TG(5,20)
>tg3
TG(5,20)TG(5,20)TG(5,20)
>tg4
TG(5,20)TG(5,20)TG(5,20)TG(5,20)
>tg5
TG(5,20)TG(5,20)TG(5,20)TG(5,20)TG(5,20)

You can use the file with:

% fuzznuc -pattern @tg.pat

Hope that helps,

Peter

From staylor at molbiol.ox.ac.uk Thu Jun 21 12:42:57 2007
From: staylor at molbiol.ox.ac.uk (Steve Taylor)
Date: Thu, 21 Jun 2007 17:42:57 +0100
Subject: [EMBOSS] Searching for repeats in fuzznuc
In-Reply-To: <467AA7FA.4060108@ebi.ac.uk>
References: <467A9D2C.6050108@molbiol.ox.ac.uk> <467AA7FA.4060108@ebi.ac.uk>
Message-ID: <467AAA91.2090505@molbiol.ox.ac.uk>

Hi Peter,

>> I would like to search for a specific repeat using fuzznuc. It is
>> pretty easy using a regexp in preg (yes, I know preg should really
>> only be used for protein sequences:-)) via
>>
>> preg -pattern '(TG{5,20}){2,10}'
>>
>> but is there a way to do something similar in fuzznuc, since I would
>> like to introduce mismatches.
>
> You could use dreg rather than preg :-)

Excellent first suggestion!:-)

> Fuzznuc now accepts files with a pattern on each line:
>
> % cat tg.pat
> TG(5,20)TG(5,20)
> TG(5,20)TG(5,20)TG(5,20)
> TG(5,20)TG(5,20)TG(5,20)TG(5,20)
> TG(5,20)TG(5,20)TG(5,20)TG(5,20)TG(5,20)
>
> or, if you want to name them, ...
>
> % cat tg.pat
> >tg2
> TG(5,20)TG(5,20)
> >tg3
> TG(5,20)TG(5,20)TG(5,20)
> >tg4
> TG(5,20)TG(5,20)TG(5,20)TG(5,20)
> >tg5
> TG(5,20)TG(5,20)TG(5,20)TG(5,20)TG(5,20)
>
> You can use the file with:
>
> % fuzznuc -pattern @tg.pat

Thanks. That looks like a useful work around. Out of interest any plans for a
mismatch option in dreg?

Steve

From pmr at ebi.ac.uk Thu Jun 21 12:56:16 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jun 2007 17:56:16 +0100
Subject: [EMBOSS] Searching for repeats in fuzznuc
In-Reply-To: <467AAB6F.7040006@purdue.edu>
References: <467A9D2C.6050108@molbiol.ox.ac.uk> <467AA7FA.4060108@ebi.ac.uk> <467AAB6F.7040006@purdue.edu>
Message-ID: <467AADB0.50800@ebi.ac.uk>

Phillip San Miguel wrote:
> Peter Rice wrote:
>> [...]
>>
>> % fuzznuc -pattern @tg.pat
>>
>> Hope that helps,
>>
>> Peter
>>
> Hi Peter,
>     Along these lines, I've noticed that dreg wants the pattern file name
> to be all caps. Eg.

Haha ... a side effect of the line in emboss/acd/dreg.acd

upper: "Y"

It is supposed to make the patterns upper case, not the filename :-)

if you remove that line, and are careful to make patterns in upper case, you
will be OK until the new release.

Thanks for pointing it out.
Peter

From pmr at ebi.ac.uk Thu Jun 21 13:13:25 2007
From: pmr at ebi.ac.uk (Peter Rice)
Date: Thu, 21 Jun 2007 18:13:25 +0100
Subject: [EMBOSS] Searching for repeats in fuzznuc
In-Reply-To: <467AADB0.50800@ebi.ac.uk>
References: <467A9D2C.6050108@molbiol.ox.ac.uk> <467AA7FA.4060108@ebi.ac.uk> <467AAB6F.7040006@purdue.edu> <467AADB0.50800@ebi.ac.uk>
Message-ID: <467AB1B5.7020404@ebi.ac.uk>

Peter Rice wrote:
> Haha ... a side effect of the line in emboss/acd/dreg.acd
>
> upper: "Y"
>
> It is supposed to make the patterns upper case, not the filename :-)
>
> if you remove that line, and are careful to make patterns in upper case, you
> will be OK until the new release.

or, in ajax/ajacd.c function acdSetRegexp, comment out:

/*
    if(upper)
        ajStrFmtUpper(&acdReply);

    if(lower)
        ajStrFmtLower(&acdReply);
*/

regards,

Peter

From pmiguel at purdue.edu Thu Jun 21 12:46:39 2007
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu, 21 Jun 2007 12:46:39 -0400
Subject: [EMBOSS] Searching for repeats in fuzznuc
In-Reply-To: <467AA7FA.4060108@ebi.ac.uk>
References: <467A9D2C.6050108@molbiol.ox.ac.uk> <467AA7FA.4060108@ebi.ac.uk>
Message-ID: <467AAB6F.7040006@purdue.edu>

Peter Rice wrote:
> [...]
>
> % fuzznuc -pattern @tg.pat
>
> Hope that helps,
>
> Peter
>
Hi Peter,
    Along these lines, I've noticed that dreg wants the pattern file name
to be all caps. Eg.

# cat test.pat
>CCHC
TG[CT].{6}TG[CT].{12}CA[CT](?:...){4,5}TG[CT]
# dreg -rformat gff -stdout -auto -sequence HEX0238L06merge.fasta @test.pat
Error: Unable to open regular expression file 'TEST.PAT'
Error: Bad regular expression pattern: '@TEST.PAT'
Died: dreg terminated: Bad value for '-pattern' with -auto defined
# ln -s test.pat TEST.PAT
# dreg -rformat gff -stdout -auto -sequence HEX0238L06merge.fasta @test.pat
##gff-version 2.0
##date 2007-06-21
##Type DNA HEX0238L06Merged_Contigs
HEX0238L06Merged_Contigs dreg misc_feature 25288 25329 0.000 + . Sequence "HEX0238L06Merged_Contigs.1" ; note "*pat CCHC"

Phillip

From yogi.sundaravadanam at agrf.org.au Thu Jun 21 18:21:28 2007
From: yogi.sundaravadanam at agrf.org.au (Yogi Sundaravadanam)
Date: Fri, 22 Jun 2007 08:21:28 +1000
Subject: [EMBOSS] Sixpack/transeq frame translation
In-Reply-To: <467929BB.8000509@ebi.ac.uk>
Message-ID:

>Do you mean the reverse frames are numbered differently? We followed the
>convention used by the Staden package.

YES!

>Would a -alternative option be useful in sixpack?

YES... is there an alternative option for sixpack, though?

Yogi

-----Original Message-----
From: Peter Rice [mailto:pmr at ebi.ac.uk]
Sent: Wednesday, 20 June 2007 11:21 PM
To: Yogi Sundaravadanam
Cc: emboss
Subject: Re: [EMBOSS] Sixpack/transeq frame translation

Yogi Sundaravadanam wrote:
> Sixpack/transeq translations don't match BLAST translation. This is
> giving me such grief. After some research, I found out that there's an
> -alternate option in 'transeq' that translates the nucleic sequence to a
> peptide, the BLAST way. Is it possible to do the same for Sixpack?

I am not sure what you mean by "BLAST translation".

Do you mean the reverse frames are numbered differently? We followed the
convention used by the Staden package.

Would a -alternative option be useful in sixpack?

regards,

Peter Rice
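
For readers of the archive: the thread never spells out how the two reverse-frame numberings differ, so the sketch below rests on an assumption, namely that the Staden-style default defines frame -n as the reverse complement of the codons used by forward frame n, while the alternative (BLAST-like) numbering starts frame -1 at the last base of the sequence. The function name reverse_frame_offsets is made up for illustration and nothing here calls EMBOSS.

```python
def reverse_frame_offsets(seq_len):
    """Map frames -1, -2, -3 to their start offsets under two numbering schemes.

    Offsets are 0-based positions on the reverse-complemented sequence.
    Returns {frame: (codon_aligned_offset, end_anchored_offset)}.
    Both conventions are ASSUMPTIONS made for illustration (see text above).
    """
    offsets = {}
    for n in (1, 2, 3):
        # Assumed Staden-style default: frame -n shares codon boundaries with
        # forward frame n, so its start depends on the sequence length.
        codon_aligned = (seq_len - (n - 1)) % 3
        # Assumed alternative numbering: frame -n starts n-1 bases in from
        # the end of the original sequence, whatever the length.
        end_anchored = n - 1
        offsets[-n] = (codon_aligned, end_anchored)
    return offsets


# Compare the two numberings for three hypothetical sequence lengths.
for length in (9, 10, 11):
    print(length, reverse_frame_offsets(length))
```

Under these assumptions the same three reverse reading frames are produced either way; only the labels differ, and they coincide for frame -1 exactly when the sequence length is a multiple of three.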