[emboss-dev] EMBOSS and its FASTA like alignment output

Peter Rice pmr at ebi.ac.uk
Mon Aug 3 15:31:41 UTC 2009


Peter wrote:
> Hi,
> 
> One of the many things I talked to Peter Rice about in Sweden
> was the Pearson FASTA like output from needle and water (e.g.
> what EMBOSS calls the markx10 output format), and why it
> includes the EMBOSS header and footer lines (which start with
> a # character), which are not present in real FASTA output.
> 
> Biopython can parse the pairwise -m 10 output from Bill
> Pearson's FASTA tools, so in theory we (Biopython) should
> be able to parse the markx10 output from EMBOSS needle
> and water. We could probably cope with the extra header
> and footer, but I think it would be best if EMBOSS could
> produce something more closely matching the real FASTA
> output. Unfortunately, it appears to be more than just the
> headers which upset our parser - even ignoring them,
> EMBOSS markx10 output still looks rather different to
> (current) FASTA -m 10 output. Was the markx10 output
> mimicking a particular (old) version of the FASTA tools?
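
[Illustrative note: a minimal Python sketch of the "cope with the extra
header and footer" step mentioned above, assuming only that the EMBOSS
header and footer lines all start with a # character. The function name
strip_emboss_comments and the sample text are hypothetical. As the
message says, removing these lines is not sufficient on its own to make
the output parse as FASTA -m 10.]

```python
def strip_emboss_comments(text):
    """Return text with every line that starts with '#' removed.

    Assumption: the EMBOSS header and footer lines are exactly the
    lines beginning with '#', as described in the message above.
    """
    kept = [line for line in text.splitlines() if not line.startswith("#")]
    return "\n".join(kept)


# Made-up placeholder input, not real needle or water output.
sample = "# header line\nalignment line 1\nalignment line 2\n# footer line"
cleaned = strip_emboss_comments(sample)
```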

I have checked the latest FASTA3 and FASTA2 tools from Bill Pearson.

What does Biopython expect as "markx10" and the other markx formats?

There are extra lines reporting data equivalent to the EMBOSS alignment
headers, which we could include, but I would like to know that there is
a parser that can accept them as markx* format in each case.

In this case "more closely matching" may not be close enough :-)

regards,

Peter Rice


