seqret has rich inbuilt functionality:
It can read and write any sequence format that EMBOSS supports (see the EMBOSS Users Guide), which includes all the common formats.
It uses the powerful USA syntax (see the EMBOSS Users Guide) for specifying the location and format of the sequence data.
seqret can read one or more sequences from a database, file, listfile (file of USAs), the command line or the output of other programs, and can then write them to a file, a database or stdout, which can be redirected to another program. It is therefore useful for extracting sequences from databases and displaying them.
seqret supports many command line options that are not listed in its ACD file. These include global and datatype-specific qualifiers. Many such options are available, and they may be used in combination, giving greater flexibility without the need for any additional programming.
It supports global qualifiers (see the EMBOSS Users Guide) which are available to all EMBOSS applications.
It supports various datatype-specific qualifiers (see the EMBOSS Users Guide); in this case, those for the sequence datatypes, which include the seqall: sequence input and seqoutall: outseq output definitions from the ACD file.
For example, you can specify the input and output formats by using the -sformat FormatName (input) and -osformat FormatName (output) options. If you don't specify the input format, seqret will try all known formats until one succeeds; if you don't specify the output format, FASTA is used by default. seqret is therefore useful for reformatting sequences, perhaps in preparation for input to another program.
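The fallback described above, trying each known input format in turn until one accepts the data, can be sketched in C. This is not EMBOSS code: the probe functions, the format table and detect_format are hypothetical, and each probe only inspects the first line of input, whereas a real reader parses the whole entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format probe: returns nonzero if the first line of the
 * input looks like this format. Real readers check far more than this. */
typedef int (*format_probe)(const char *first_line);

static int probe_fasta(const char *line)   { return line[0] == '>'; }
static int probe_embl(const char *line)    { return strncmp(line, "ID   ", 5) == 0; }
static int probe_genbank(const char *line) { return strncmp(line, "LOCUS", 5) == 0; }

struct format_entry {
    const char *name;
    format_probe probe;
};

/* Table of known formats, tried in order (illustrative subset only). */
static const struct format_entry known_formats[] = {
    { "fasta",   probe_fasta   },
    { "embl",    probe_embl    },
    { "genbank", probe_genbank },
};

/* Return the name of the first format whose probe accepts the input, or
 * NULL if none does. If requested is non-NULL (as with -sformat), only
 * that format is tried. */
const char *detect_format(const char *first_line, const char *requested)
{
    size_t n = sizeof known_formats / sizeof known_formats[0];
    assert(first_line != NULL);
    for (size_t i = 0; i < n; i++) {
        if (requested && strcmp(requested, known_formats[i].name) != 0)
            continue;
        if (known_formats[i].probe(first_line))
            return known_formats[i].name;
    }
    return NULL;
}
```

For example, detect_format(">5HT1D_FUGRU P79748", NULL) returns "fasta", while passing "embl" as the second argument restricts the search to that single format, so the same line is rejected.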
The -sbegin and -send options specify the start and end positions of a subsequence in the input sequence. Only this fragment is written to the output file, so seqret is useful for simple extraction of sequence regions. The -sreverse switch uses the reverse complement of a nucleic acid sequence.
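What -sreverse does to a nucleic acid sequence can be illustrated with a small C function. This is a minimal sketch, not the EMBOSS implementation: reverse_complement and complement_base are hypothetical names, only the common IUPAC codes are complemented, and A is always complemented to T, even in RNA.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Complement one IUPAC nucleotide code, preserving case. Codes that are
 * their own complement (N, S, W) and gap characters are returned unchanged. */
static char complement_base(char b)
{
    char u = (char)toupper((unsigned char)b);
    char c;
    switch (u) {
    case 'A': c = 'T'; break;
    case 'T': c = 'A'; break;
    case 'U': c = 'A'; break;
    case 'G': c = 'C'; break;
    case 'C': c = 'G'; break;
    case 'R': c = 'Y'; break;
    case 'Y': c = 'R'; break;
    case 'K': c = 'M'; break;
    case 'M': c = 'K'; break;
    case 'B': c = 'V'; break;
    case 'V': c = 'B'; break;
    case 'D': c = 'H'; break;
    case 'H': c = 'D'; break;
    default:  c = u;   break;
    }
    return islower((unsigned char)b) ? (char)tolower((unsigned char)c) : c;
}

/* Reverse-complement a NUL-terminated sequence in place by walking two
 * indices towards the middle, swapping and complementing as they go. */
void reverse_complement(char *seq)
{
    assert(seq != NULL);
    size_t i = 0, j = strlen(seq);
    while (i < j) {
        j--;
        char left = complement_base(seq[i]);
        char right = complement_base(seq[j]);
        seq[i] = right;   /* when i == j both are the same middle base */
        seq[j] = left;
        i++;
    }
}
```

Reversing and complementing in a single pass avoids a temporary buffer; for example, "ATGc" becomes "gCAT".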
As you've seen from seqret.c, no application code is needed to benefit from this inbuilt functionality. What's more, as new sequence input or output formats are added to EMBOSS, seqret will automatically be able to use them without any change to its code. This inbuilt functionality therefore saves you, as a software developer, a great deal of time.
Examples illustrating a few of the many use cases of seqret are shown below.
Here seqret is used to retrieve the entry 5HT1D_FUGRU from the swissprot database and write it out in FASTA format:
% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): swissprot:5HT1D_FUGRU
Output sequence [5ht1d_fugru.fasta]:
% more 5ht1d_fugru.fasta
>5HT1D_FUGRU P79748 5-hydroxytryptamine receptor 1D (5-HT-1D)
MELDNNSLDYFSSNFTDIPSNTTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIAT
IFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFC
TASILHLCVIALDRYWAITDALEYSKRRTMRRAAVMVAVVWVISISISMPPLFWRQAKAH
EELKECMVNTDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRIFKTPSYSGKRFTTA
QLIQTSAGSSLCSLNSASNQEAHLHSGAGGEGGGSPLFVNSVKVKLADNVLERKRLCAAR
ERKATKTLGIILGAFIICWLPFFVVTLVWAICKECSFDPLLFDVFTWLGYLNSLINPVIY
TVFNDEFKQAFQKLIKFRR
The same thing could be achieved by using the USA mechanism (see the EMBOSS Users Guide) to specify the input and output sequences on the command line:
% seqret swissprot:5HT1D_FUGRU fasta:5ht1d_fugru.fasta
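In both USAs above, the part before the colon says where the data comes from or how it is written: a database (swissprot) for the input and a format (fasta) for the output. The following C sketch only splits a USA at its first colon; split_usa and struct simple_usa are hypothetical, and the full USA syntax described in the EMBOSS Users Guide has further forms that this sketch does not handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A deliberately simplified view of a USA: "prefix:rest".
 * Other USA forms (for example those using "::" or listfiles)
 * are NOT handled by this sketch. */
struct simple_usa {
    char prefix[64];   /* database or format name, "" if there is no colon */
    char rest[256];    /* entry name, file name or wildcard */
};

/* Split usa at its first ':'. Returns 0 on success, or -1 if either part
 * does not fit in the fixed-size buffers. */
int split_usa(const char *usa, struct simple_usa *out)
{
    assert(usa != NULL && out != NULL);
    const char *colon = strchr(usa, ':');
    size_t plen = colon ? (size_t)(colon - usa) : 0;
    const char *rest = colon ? colon + 1 : usa;

    if (plen >= sizeof out->prefix || strlen(rest) >= sizeof out->rest)
        return -1;
    memcpy(out->prefix, usa, plen);
    out->prefix[plen] = '\0';
    strcpy(out->rest, rest);
    return 0;
}
```

With this split, "swissprot:5HT1D_FUGRU" yields the prefix "swissprot" and the entry "5HT1D_FUGRU", while a bare "all.fasta" yields an empty prefix; deciding whether a prefix names a database or a format would need a further lookup that is left out here.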
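This example retrieves all of the sequences from the swissprot database. Not a very sensible thing to do, but it illustrates that the USA mechanism supports wildcard specification of sequences. The asterisk is quoted so that the shell passes it to seqret instead of expanding it as a filename pattern: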
% seqret swissprot:"*"
Reads and writes (returns) sequences
Output sequence [ubr5_rat.fasta]: all.fasta
% more all.fasta
>UBR5_RAT Q62671 E3 ubiquitin-protein ligase UBR5 (6.3.2.-)
MMSARGDFLNYALSLMRSHNDEHSDVLPVLDVCSLKHVAYVFQALIYWIKAMNQQTTLDT
PQLERKRTRELLELGIDNEDSEHENDDDTSQSATLNDKDDESLPAETGQNHPFFRRSDSM
TFLGCIPPNPFEVPLAEAIPLADQPHLLQPNARKEDLFGRPSQGLYSSSAGSGKCLVEVT
MDRNCLEVLPTKMSYAANLKNVMNMQNRQKKAGEDQSMLAEEADSSKPGPSAHDVAAQLK
SSLLAEIGLTESEGPPLTSFRPQCSFMGMVISHDMLLGRWRLSLELFGRVFMEDVGAEPG
SILTELGGFEVKESKFRREMEKLRNQQSRDLSLEVDRDRDLLIQQTMRQLNNHFGRRCAT
TPMAVHRVKVTFKDEPGEGSGVARSFYTAIAQAFLSNEKLPNLDCIQNANKGTHTSLMQR
LRNRGERDREREREREMRRSSGLRAGSRRDRDRDFRRQLSIDTRPFRPASEGNPSDDPDP
LPAHRQALGERLYPRVQAMQPAFASKITGMLLELSPAQLLLLLASEDSLRARVEEAMELI
VAHGRENGADSILDLGLLDSSEKVQENRKRHGSSRSVVDMDLDDTDDGDDNAPLFYQPGK
RGFYTPRPGKNTEARLNCFRNIGRILGLCLLQNELCPITLNRHVIKVLLGRKVNWHDFAF
FDPVMYESLRQLILASQSSDADAVFSAMDLAFAVDLCKEEGGGQVELIPNGVNIPVTPQN
VYEYVRKYAEHRMLVVAEQPLHAMRKGLLDVLPKNSLEDLTAEDFRLLVNGCGEVNVQML
ISFTSFNDESGENAEKLLQFKRWFWSIVERMSMTERQDLVYFWTSSPSLPASEEGFQPMP
SITIRPPDDQHLPTANTCISRLYVPLYSSKQILKQKLLLAIKTKNFGFV
>5HT1D_FUGRU P79748 5-hydroxytryptamine receptor 1D (5-HT-1D) (5HT1D)
MELDNNSLDYFSSNFTDIPSNTTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIAT
IFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFC
TASILHLCVIALDRYWAITDALEYSKRRTMRRAAVMVAVVWVISISISMPPLFWRQAKAH
EELKECMVNTDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRIFKTPSYSGKRFTTA
QLIQTSAGSSLCSLNSASNQEAHLHSGAGGEGGGSPLFVNSVKVKLADNVLERKRLCAAR
ERKATKTLGIILGAFIICWLPFFVVTLVWAICKECSFDPLLFDVFTWLGYLNSLINPVIY
TVFNDEFKQAFQKLIKFRR
>ACTB1_FUGRU P68142 Actin, cytoplasmic 1 (Beta-actin A)
MEDEIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPLNPKANREKMT
QIMFETFNTPAMYVAIQAVLSLYASGRTTGIVMDSGDGVTHTVPIYEGYALPHAILRLDL
... data omitted
The following example illustrates the use of the -firstonly option. This is an advanced option, so it is never prompted for; if you want to use it, you must specify it on the command line:
% seqret swissprot:"*" -outseq all.fasta -firstonly
Reads and writes (returns) sequences
% more all.fasta
>UBR5_RAT Q62671 E3 ubiquitin-protein ligase UBR5 (6.3.2.-)
MMSARGDFLNYALSLMRSHNDEHSDVLPVLDVCSLKHVAYVFQALIYWIKAMNQQTTLDT
PQLERKRTRELLELGIDNEDSEHENDDDTSQSATLNDKDDESLPAETGQNHPFFRRSDSM
TFLGCIPPNPFEVPLAEAIPLADQPHLLQPNARKEDLFGRPSQGLYSSSAGSGKCLVEVT
MDRNCLEVLPTKMSYAANLKNVMNMQNRQKKAGEDQSMLAEEADSSKPGPSAHDVAAQLK
SSLLAEIGLTESEGPPLTSFRPQCSFMGMVISHDMLLGRWRLSLELFGRVFMEDVGAEPG
SILTELGGFEVKESKFRREMEKLRNQQSRDLSLEVDRDRDLLIQQTMRQLNNHFGRRCAT
TPMAVHRVKVTFKDEPGEGSGVARSFYTAIAQAFLSNEKLPNLDCIQNANKGTHTSLMQR
LRNRGERDREREREREMRRSSGLRAGSRRDRDRDFRRQLSIDTRPFRPASEGNPSDDPDP
LPAHRQALGERLYPRVQAMQPAFASKITGMLLELSPAQLLLLLASEDSLRARVEEAMELI
VAHGRENGADSILDLGLLDSSEKVQENRKRHGSSRSVVDMDLDDTDDGDDNAPLFYQPGK
RGFYTPRPGKNTEARLNCFRNIGRILGLCLLQNELCPITLNRHVIKVLLGRKVNWHDFAF
FDPVMYESLRQLILASQSSDADAVFSAMDLAFAVDLCKEEGGGQVELIPNGVNIPVTPQN
VYEYVRKYAEHRMLVVAEQPLHAMRKGLLDVLPKNSLEDLTAEDFRLLVNGCGEVNVQML
ISFTSFNDESGENAEKLLQFKRWFWSIVERMSMTERQDLVYFWTSSPSLPASEEGFQPMP
SITIRPPDDQHLPTANTCISRLYVPLYSSKQILKQKLLLAIKTKNFGFV
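The difference between the two all.fasta outputs comes down to a read loop that stops after the first entry when -firstonly is set. The sketch below shows such a loop in C over a hypothetical in-memory stream; struct seq_stream, stream_next and copy_entries are illustrative names only, and the real seqret.c uses the EMBOSS library's own sequence input and output calls instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequence stream: an array of entry names. */
struct seq_stream {
    const char *const *names;
    size_t count;
    size_t pos;
};

/* Fetch the next entry; returns false when the stream is exhausted. */
static bool stream_next(struct seq_stream *s, const char **name)
{
    if (s->pos >= s->count)
        return false;
    *name = s->names[s->pos++];
    return true;
}

/* Copy entries into out (capacity cap), stopping after the first one
 * when firstonly is set. Returns the number of entries copied. */
size_t copy_entries(struct seq_stream *in, bool firstonly,
                    const char **out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t written = 0;
    const char *name;
    while (written < cap && stream_next(in, &name)) {
        out[written++] = name;  /* stands in for writing the sequence */
        if (firstonly)
            break;              /* read one sequence and stop */
    }
    return written;
}
```

Because the flag only adds a break to the loop, the same code path serves both invocations; the input is simply no longer read once the first entry has been written.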
Here, usage information is generated by using the -help global qualifier:
% seqret -help
Standard (Mandatory) qualifiers:
  [-sequence]          seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outseq]            seqoutall  Sequence set(s) filename and optional format
                                  (output USA)

Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers:
   -feature            boolean    Use feature information
   -firstonly          boolean    Read one sequence and stop

General qualifiers:
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
In the example below, the sequence-specific command line options -sbegin and -send are used to specify a sequence region:
% seqret swissprot:5HT1D_FUGRU -sbegin 10 -send 20 fasta:5ht1d_fugru.fasta
Reads and writes (returns) sequences
% more 5ht1d_fugru.fasta
>5HT1D_FUGRU P79748 5-HYDROXYTRYPTAMINE 1D RECEPTOR (5-HT-1D) (SEROTONIN RECEPTOR).
YFSSNFTDIPS
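The output confirms that -sbegin and -send are 1-based and inclusive: residues 10 to 20 of MELDNNSLDYFSSNFTDIPS... are the 11 residues YFSSNFTDIPS. A minimal C sketch of that extraction follows; extract_region is a hypothetical helper, and it simply rejects zero, negative or out-of-range positions rather than guessing how EMBOSS interprets them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy residues begin..end (1-based, inclusive, like -sbegin/-send) of seq
 * into out, which must hold at least end - begin + 2 bytes.
 * Returns the number of residues copied, or 0 (with out set to "") if the
 * range is invalid. Zero or negative positions are not supported here. */
size_t extract_region(const char *seq, long begin, long end, char *out)
{
    assert(seq != NULL && out != NULL);
    size_t len = strlen(seq);
    if (begin < 1 || end < begin || (size_t)end > len) {
        out[0] = '\0';
        return 0;
    }
    size_t n = (size_t)(end - begin + 1);
    memcpy(out, seq + (begin - 1), n);  /* shift to 0-based indexing */
    out[n] = '\0';
    return n;
}
```

Calling extract_region on the start of the 5HT1D_FUGRU sequence with begin 10 and end 20 reproduces the fragment shown in the output above.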