9.3. Inbuilt Functionality

seqret has rich inbuilt functionality:

It can read and write any sequence format that EMBOSS supports (see the EMBOSS Users Guide), which includes all the common formats.
It uses the powerful USA syntax (see the EMBOSS Users Guide) for specifying the location and format of the sequence data.
seqret can read one or more sequences from a database, a file, a listfile (a file of USAs), the command line or the output of another program. It can then write them to a file, a database or stdout, which can be redirected to another program. It is therefore useful for extracting sequences from databases and displaying them.
seqret supports many command line options that are not listed in the ACD file, including global and datatype-specific qualifiers. These options may be used in combination, providing greater flexibility without any additional programming.
It supports global qualifiers (see the EMBOSS Users Guide) which are available to all EMBOSS applications.
It supports various datatype-specific qualifiers (see the EMBOSS Users Guide). In this case these are the qualifiers for the sequence datatypes, which cover the seqall: sequence input and seqoutall: outseq output definitions from the ACD file.
For example, you can specify the input and output formats by using the -sformat FormatName (input) and -osformat FormatName (output) options. If you don't specify the input format it will try all known formats until one succeeds, and if you don't specify the output format then FASTA will be used by default. seqret is useful for reformatting sequences, perhaps in preparation for input to another program.
The -sbegin and -send options specify the start and end positions of a subsequence in the input sequence. Only this fragment is written to the output file, so seqret is useful for simple extraction of sequence regions. The -sreverse switch will use the reverse complement of a nucleic acid sequence.
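
The input format fallback described above (try each known format until one succeeds, unless a format is named explicitly) can be pictured with a short Python sketch. This is only an illustration of the idea, not the EMBOSS implementation: the two parsers, the PARSERS table and the read_sequence function are hypothetical names invented for this example, and real EMBOSS knows far more formats.

```python
from typing import Optional, Tuple

Record = Tuple[str, str]  # (entry name, sequence)

def parse_fasta(text: str) -> Optional[Record]:
    """Hypothetical FASTA parser: returns None if the text is not FASTA."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        return None
    header = lines[0][1:].split()
    name = header[0] if header else ""
    return name, "".join(line.strip() for line in lines[1:])

def parse_raw(text: str) -> Optional[Record]:
    """Hypothetical raw parser: accepts letters only, with no entry name."""
    seq = "".join(text.split())
    if not seq or not seq.isalpha():
        return None
    return "", seq

# Order matters: the first parser that accepts the input wins.
PARSERS = [("fasta", parse_fasta), ("raw", parse_raw)]

def read_sequence(text: str, sformat: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (format, name, sequence).

    With sformat=None every known parser is tried in turn, which mirrors
    the behaviour described for an unspecified input format. With a named
    format only that parser is tried.
    """
    for fmt, parser in PARSERS:
        if sformat is not None and fmt != sformat:
            continue
        result = parser(text)
        if result is not None:
            return (fmt,) + result
    raise ValueError("input did not match any permitted format")
```

Calling read_sequence on FASTA text without naming a format reports "fasta" as the detected format. Naming the wrong format raises an error instead of silently falling back, which is the trade-off an explicit format option makes.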

As you've seen from seqret.c, no application code is needed to benefit from this inbuilt functionality. What's more, as new sequence input or output formats are added to EMBOSS, seqret will automatically be able to use them; no application code needs to change. This inbuilt functionality therefore saves you, as a software developer, a great deal of time.

Examples illustrating a few of the many use cases of seqret are shown below.

Here seqret is being used to retrieve the entry 5HT1D_FUGRU from the swissprot database and write the entry out in FASTA format:

% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): swissprot:5HT1D_FUGRU
Output sequence [5ht1d_fugru.fasta]:
% more 5ht1d_fugru.fasta
>5HT1D_FUGRU P79748 5-hydroxytryptamine receptor 1D (5-HT-1D)
MELDNNSLDYFSSNFTDIPSNTTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIAT
IFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFC
TASILHLCVIALDRYWAITDALEYSKRRTMRRAAVMVAVVWVISISISMPPLFWRQAKAH
EELKECMVNTDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRIFKTPSYSGKRFTTA
QLIQTSAGSSLCSLNSASNQEAHLHSGAGGEGGGSPLFVNSVKVKLADNVLERKRLCAAR
ERKATKTLGIILGAFIICWLPFFVVTLVWAICKECSFDPLLFDVFTWLGYLNSLINPVIY
TVFNDEFKQAFQKLIKFRR

The same thing could be achieved by using the USA (see the EMBOSS Users Guide) mechanism to specify the same input and output sequences on the command line:

% seqret swissprot:5HT1D_FUGRU fasta:5ht1d_fugru.fasta

This example retrieves all of the sequences from the swissprot database. This is not a very sensible thing to do, but it illustrates that the USA mechanism supports wildcard specification of sequences:

% seqret swissprot:"*"
Reads and writes (returns) sequences
Output sequence [ubr5_rat.fasta]: all.fasta
% more all.fasta
>UBR5_RAT Q62671 E3 ubiquitin-protein ligase UBR5 (6.3.2.-)
MMSARGDFLNYALSLMRSHNDEHSDVLPVLDVCSLKHVAYVFQALIYWIKAMNQQTTLDT
PQLERKRTRELLELGIDNEDSEHENDDDTSQSATLNDKDDESLPAETGQNHPFFRRSDSM
TFLGCIPPNPFEVPLAEAIPLADQPHLLQPNARKEDLFGRPSQGLYSSSAGSGKCLVEVT
MDRNCLEVLPTKMSYAANLKNVMNMQNRQKKAGEDQSMLAEEADSSKPGPSAHDVAAQLK
SSLLAEIGLTESEGPPLTSFRPQCSFMGMVISHDMLLGRWRLSLELFGRVFMEDVGAEPG
SILTELGGFEVKESKFRREMEKLRNQQSRDLSLEVDRDRDLLIQQTMRQLNNHFGRRCAT
TPMAVHRVKVTFKDEPGEGSGVARSFYTAIAQAFLSNEKLPNLDCIQNANKGTHTSLMQR
LRNRGERDREREREREMRRSSGLRAGSRRDRDRDFRRQLSIDTRPFRPASEGNPSDDPDP
LPAHRQALGERLYPRVQAMQPAFASKITGMLLELSPAQLLLLLASEDSLRARVEEAMELI
VAHGRENGADSILDLGLLDSSEKVQENRKRHGSSRSVVDMDLDDTDDGDDNAPLFYQPGK
RGFYTPRPGKNTEARLNCFRNIGRILGLCLLQNELCPITLNRHVIKVLLGRKVNWHDFAF
FDPVMYESLRQLILASQSSDADAVFSAMDLAFAVDLCKEEGGGQVELIPNGVNIPVTPQN
VYEYVRKYAEHRMLVVAEQPLHAMRKGLLDVLPKNSLEDLTAEDFRLLVNGCGEVNVQML
ISFTSFNDESGENAEKLLQFKRWFWSIVERMSMTERQDLVYFWTSSPSLPASEEGFQPMP
SITIRPPDDQHLPTANTCISRLYVPLYSSKQILKQKLLLAIKTKNFGFV
>5HT1D_FUGRU P79748 5-hydroxytryptamine receptor 1D (5-HT-1D) (5HT1D)
MELDNNSLDYFSSNFTDIPSNTTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIAT
IFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFC
TASILHLCVIALDRYWAITDALEYSKRRTMRRAAVMVAVVWVISISISMPPLFWRQAKAH
EELKECMVNTDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRIFKTPSYSGKRFTTA
QLIQTSAGSSLCSLNSASNQEAHLHSGAGGEGGGSPLFVNSVKVKLADNVLERKRLCAAR
ERKATKTLGIILGAFIICWLPFFVVTLVWAICKECSFDPLLFDVFTWLGYLNSLINPVIY
TVFNDEFKQAFQKLIKFRR
>ACTB1_FUGRU P68142 Actin, cytoplasmic 1 (Beta-actin A)
MEDEIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
... data omitted

The following example illustrates the use of the -firstonly option. This is an advanced option, so it is never prompted for; to use it, you must specify it on the command line:

% seqret swissprot:"*" -outseq all.fasta -firstonly
Reads and writes (returns) sequences
% more all.fasta
>UBR5_RAT Q62671 E3 ubiquitin-protein ligase UBR5 (6.3.2.-)
MMSARGDFLNYALSLMRSHNDEHSDVLPVLDVCSLKHVAYVFQALIYWIKAMNQQTTLDT
PQLERKRTRELLELGIDNEDSEHENDDDTSQSATLNDKDDESLPAETGQNHPFFRRSDSM
TFLGCIPPNPFEVPLAEAIPLADQPHLLQPNARKEDLFGRPSQGLYSSSAGSGKCLVEVT
MDRNCLEVLPTKMSYAANLKNVMNMQNRQKKAGEDQSMLAEEADSSKPGPSAHDVAAQLK
SSLLAEIGLTESEGPPLTSFRPQCSFMGMVISHDMLLGRWRLSLELFGRVFMEDVGAEPG
SILTELGGFEVKESKFRREMEKLRNQQSRDLSLEVDRDRDLLIQQTMRQLNNHFGRRCAT
TPMAVHRVKVTFKDEPGEGSGVARSFYTAIAQAFLSNEKLPNLDCIQNANKGTHTSLMQR
LRNRGERDREREREREMRRSSGLRAGSRRDRDRDFRRQLSIDTRPFRPASEGNPSDDPDP
LPAHRQALGERLYPRVQAMQPAFASKITGMLLELSPAQLLLLLASEDSLRARVEEAMELI
VAHGRENGADSILDLGLLDSSEKVQENRKRHGSSRSVVDMDLDDTDDGDDNAPLFYQPGK
RGFYTPRPGKNTEARLNCFRNIGRILGLCLLQNELCPITLNRHVIKVLLGRKVNWHDFAF
FDPVMYESLRQLILASQSSDADAVFSAMDLAFAVDLCKEEGGGQVELIPNGVNIPVTPQN
VYEYVRKYAEHRMLVVAEQPLHAMRKGLLDVLPKNSLEDLTAEDFRLLVNGCGEVNVQML
ISFTSFNDESGENAEKLLQFKRWFWSIVERMSMTERQDLVYFWTSSPSLPASEEGFQPMP
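
The effect of -firstonly ("Read one sequence and stop") can be pictured with a lazy reader that is simply abandoned after the first record. The following Python sketch is an illustration under that assumption, not the way EMBOSS implements the option; iter_fasta and read_records are hypothetical names used only here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs lazily from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def read_records(lines: Iterable[str], first_only: bool = False) -> List[Tuple[str, str]]:
    """Collect all records, or only the first one when first_only is set."""
    records = iter_fasta(lines)
    if first_only:
        # islice stops pulling from the generator after one record,
        # so the rest of the input is never parsed.
        return list(islice(records, 1))
    return list(records)
```

Because the reader is a generator, stopping early costs nothing extra: input after the first record is never turned into a sequence.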

Here, usage information is being generated by using the -help global qualifier:

% seqret -help
   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outseq]            seqoutall  Sequence set(s) filename and optional format
                                  (output USA)

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -feature            boolean    Use feature information
   -firstonly          boolean    Read one sequence and stop

   General qualifiers:
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose

In the example below, the sequence-specific command line options -sbegin and -send are used to specify a sequence region:

% seqret swissprot:5HT1D_FUGRU -sbegin 10 -send 20 fasta:5ht1d_fugru.fasta
Reads and writes (returns) sequences
% more 5ht1d_fugru.fasta
>5HT1D_FUGRU P79748 5-HYDROXYTRYPTAMINE 1D RECEPTOR (5-HT-1D) (SEROTONIN RECEPTOR).
YFSSNFTDIPS
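
The output above shows that the positions are 1-based and inclusive: residues 10 to 20 give an 11-residue fragment. The Python sketch below reproduces that arithmetic and also shows what a reverse complement is, as used by -sreverse. It is a minimal illustration, not EMBOSS code: the function names are hypothetical, only positive positions are handled, and the complement table covers only A, C, G and T.

```python
from typing import Optional

def subsequence(seq: str, sbegin: int = 1, send: Optional[int] = None) -> str:
    """Return the fragment from sbegin to send, 1-based and inclusive."""
    if send is None:
        send = len(seq)
    if not 1 <= sbegin <= send <= len(seq):
        raise ValueError("positions out of range")
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return seq[sbegin - 1:send]

# Assumption: plain DNA alphabet only (no ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# First 30 residues of 5HT1D_FUGRU, copied from the FASTA output shown earlier.
protein_start = "MELDNNSLDYFSSNFTDIPSNTTVAHWTEA"
print(subsequence(protein_start, 10, 20))  # YFSSNFTDIPS
print(reverse_complement("AACG"))          # CGTT
```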
