[EMBOSS] seqret output sequence format "ncbi"

Peter Rice pmr at ebi.ac.uk
Tue Aug 19 09:52:42 UTC 2008


john walshaw (JIC) wrote:
> Thanks for your help Peter, please see comments below. 
>> From: Peter Rice [mailto:pmr at ebi.ac.uk] 
>> Sent: 19 August 2008 09:30
>> You mentioned UniProt 14 - the latest release also includes 
>> extensions to the Fasta format description to tag species and 
>> other information. We are considering making this the default 
>> version of the FASTA format for EMBOSS so we can preserve 
>> more information - does this sound like a good idea?
> 
> Personally, I think this would be a good idea. I'm assuming that
> EMBOSS progs would themselves be able to parse these fields from the
> FASTA headers?

Yes ... assuming they fit the expected format. We have to hope that no
other "FASTA" format is using something similar. UniProt has a very
limited set of XX= tags - depending on which part of UniProt you look at.
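
For illustration only, here is a minimal Python sketch of how such tags
could be pulled out of a FASTA header. It is not EMBOSS code. It assumes
every tag is two uppercase letters followed by "=" (OS=, GN=, PE= and SV=
are the ones I would expect in a UniProt header), and the accession and
names in the usage note are made up.

```python
import re

# Assumption: a tag is two uppercase letters followed by "=" (e.g. OS=, GN=).
TAG_RE = re.compile(r"\b([A-Z]{2})=")


def parse_fasta_header(header):
    """Split a FASTA header line into (id, description, tags).

    The free-text description is whatever precedes the first tag. A header
    with no tags comes back unchanged with an empty dict, so plain FASTA
    headers still work.
    """
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(" ")
    matches = list(TAG_RE.finditer(rest))
    if not matches:
        return seq_id, rest, {}
    description = rest[:matches[0].start()].strip()
    tags = {}
    for i, m in enumerate(matches):
        # A tag's value runs up to the start of the next tag, or to end of line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        tags[m.group(1)] = rest[m.end():end].strip()
    return seq_id, description, tags
```

Called on a hypothetical header such as
">sp|Q00000|EXMPL_HUMAN Example protein OS=Homo sapiens GN=EXA1 PE=1 SV=2",
this returns the ID, the description "Example protein" and a dict with the
four tags. The risk mentioned above is visible here: any other "FASTA"
flavour whose description happens to contain "XX=" would be misread as
carrying tags.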

>> Also on the subject of UniProt 14 - the .dat flat files have a new 
>> syntax for the DE lines. We had to ignore that as the change appeared 
>> just before EMBOSS 6.0.0. Is anyone interested in having the details 
>> parsed out, or in having the original friendly description generated?
> 
> Having the option to parse them out would be useful :) These multiple
> names can be a bit awkward sometimes, so if UniProt and EMBOSS do some
> of the work for you, that's got to be good.

Thanks. We will try to parse them. If we can generate the equivalent
UniProt 13 descriptions then we will know we have a reasonable parser.

If we can parse them ... my preference would be to use the old-style
descriptions. Only UniProt seems to be using this new split in their 
releases.
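
As a rough sketch of what "generate the old-style description" might
look like (Python, illustration only, not the EMBOSS parser): it assumes
the new DE lines use RecName:/AltName:/SubName: categories with Full=,
Short= and EC= fields, each ending in ";", and that the old style is the
first name followed by the others in parentheses. It ignores Contains:,
Includes: and Flags: sections and the trailing full stop.

```python
import re

CATEGORY_RE = re.compile(r"(RecName|AltName|SubName):\s*(.*)")
FIELD_RE = re.compile(r"(Full|Short|EC)=(.*?);?$")


def de_names(lines):
    """Return the names found in new-style DE lines, in order of appearance."""
    names = []
    for line in lines:
        # Drop the two-letter line code and its padding, if present.
        text = line[5:].strip() if line.startswith("DE") else line.strip()
        category = CATEGORY_RE.match(text)
        if category:
            text = category.group(2)
        field = FIELD_RE.match(text)
        if field:
            key, value = field.groups()
            # Assumed old-style rendering of an EC number: "EC 1.2.3.4".
            names.append("EC " + value if key == "EC" else value)
    return names


def old_style_description(lines):
    """Join the names as 'First name (second) (third) ...'."""
    names = de_names(lines)
    if not names:
        return ""
    return " ".join([names[0]] + ["(%s)" % n for n in names[1:]])
```

Given illustrative lines "DE   RecName: Full=Annexin A5;",
"DE            Short=Annexin-5;" and "DE   AltName: Full=Annexin V;", this
yields "Annexin A5 (Annexin-5) (Annexin V)". Comparing output like this
against the UniProt 13 DE lines for the same entries would be the test
of whether the parser is reasonable.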

regards,

Peter



