[EMBOSS] Compseq DNA/Protein sequence problem

Bernd Web bernd.web at gmail.com
Mon Apr 23 17:28:15 EDT 2007


Hi Anette,

Your Seq1 is being guessed to be a nucleotide sequence, although it is
in fact protein. EMBOSS provides boolean qualifiers for stating whether
a sequence is nucleotide or protein; see the EMBOSS help:

 "-sequence" associated qualifiers
 -snucleotide1       boolean    Sequence is nucleotide
 -sprotein1          boolean    Sequence is protein
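
A possible reason why only some sequences are affected: every letter in
Seq1 (G, S, R, M, W, V) is also a valid IUPAC nucleotide code, whereas
Seq2 contains E, which is not. This thread does not describe how EMBOSS
actually guesses the sequence type, so the Python sketch below only
illustrates that ambiguity and is not EMBOSS's algorithm; the function
name is made up for the example.

```python
# IUPAC nucleotide codes: the four bases, U, and the ambiguity symbols.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")


def non_nucleotide_letters(seq):
    """Return the letters in seq that are not IUPAC nucleotide codes."""
    return set(seq.upper()) - IUPAC_NUCLEOTIDE


# The two example sequences from the original report.
seq1 = "GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG"  # reported as counted like DNA
seq2 = "VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG"  # reported as counted like protein

for name, seq in (("Seq1", seq1), ("Seq2", seq2)):
    extra = non_nucleotide_letters(seq)
    if extra:
        print(f"{name}: protein-only letters: {''.join(sorted(extra))}")
    else:
        print(f"{name}: every letter is also a nucleotide code")
```

Whatever the guessing rule is, stating the type explicitly avoids the
problem. With the qualifier above, the original command should become
something like:

 compseq -seq compseq_bug.in -sprotein1 -word 1 -frame 1 -out compseq_bug.out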

regards,
bernd

On 4/23/07, Becher, Anette <anette.becher at agresearch.co.nz> wrote:
> Hi all,
>
> I believe I *may* have found a bug in compseq.
>
> I have been using compseq to calculate the frequency of amino acids in
> translated DNA sequences. I find that frequently compseq takes the amino
> acid sequence to be DNA (they are sequences with an unusual composition,
> but then I am looking for odd proteins). So instead of the expected
> output for all amino acids with most being zero, I often get output for
> A,C,G,T and 'other'. I cannot see an obvious pattern that would explain
> this behaviour, but maybe you can help.
>
> Command line:
>
> compseq -seq compseq_bug.in -word 1 -frame 1 -out compseq_bug.out
>
> An example input and output file are pasted in below - I can provide
> many more.
>
> It might help if the user could specify whether the input sequence is
> DNA or protein, rather than the program working it out somehow?
>
>
> Best wishes
>
>
> Anette
>
>
>
> Here is an example of the problem:
>
>
> >Seq1
> GSGGGGGSGGRGMGGWGGGRGSGVGGRGWGVG
>
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> #       Seq1
>
>
> Word size       1
> Total count     31
>
> #
> # Word  Obs Count       Obs Frequency   Exp Frequency   Obs/Exp Frequency
> #
> A       0               0.0000000       0.2500000       0.0000000
> C       0               0.0000000       0.2500000       0.0000000
> G       20              0.6451613       0.2500000       2.5806452
> T       0               0.0000000       0.2500000       0.0000000
>
> Other   11              0.3548387       0.0000000       10000000000.0000000
>
>
>
>
> Here is a similar sequence that works fine:
>
>
> >Seq2
> VGSEGGGGGRRGEGGGGGGRGGGGGRWEEGAG
>
>
>
> #
> # Output from 'compseq'
> #
> # Only words in frame 1 will be counted.
> # The Expected frequencies are calculated on the (false) assumption that every
> # word has equal frequency.
> #
> # The input sequences are:
> #       Seq2
>
>
> Word size       1
> Total count     31
>
> #
> # Word  Obs Count       Obs Frequency   Exp Frequency   Obs/Exp Frequency
> #
> A       1               0.0322581       0.0476190       0.6774194
> C       0               0.0000000       0.0476190       0.0000000
> D       0               0.0000000       0.0476190       0.0000000
> E       4               0.1290323       0.0476190       2.7096774
> F       0               0.0000000       0.0476190       0.0000000
> G       20              0.6451613       0.0476190       13.5483871
> H       0               0.0000000       0.0476190       0.0000000
> I       0               0.0000000       0.0476190       0.0000000
> K       0               0.0000000       0.0476190       0.0000000
> L       0               0.0000000       0.0476190       0.0000000
> M       0               0.0000000       0.0476190       0.0000000
> N       0               0.0000000       0.0476190       0.0000000
> P       0               0.0000000       0.0476190       0.0000000
> Q       0               0.0000000       0.0476190       0.0000000
> R       4               0.1290323       0.0476190       2.7096774
> S       1               0.0322581       0.0476190       0.6774194
> T       0               0.0000000       0.0476190       0.0000000
> U       0               0.0000000       0.0476190       0.0000000
> V       0               0.0000000       0.0476190       0.0000000
> W       1               0.0322581       0.0476190       0.6774194
> Y       0               0.0000000       0.0476190       0.0000000

