EMBOSS applications are invoked by typing their name at a command line prompt. For example, to run the seqret application you would type:
seqret
If you're not certain of the application you need, see the tables of application names and short descriptions (Section 3.1, “Application Documentation”).
The same information is retrieved by running the wossname application. This searches for keywords or parts of words in the application short description (the text that is displayed by a program when it first starts). If no keywords are specified, then details of all the EMBOSS programs are output. Simply type:
wossname
Every application has a set of options allowing you to specify all of the inputs and outputs, including input and data files and values that control how the application operates. Options might be application-specific, available for particular datatypes only (datatype-specific), or available for all datatypes (global). All options are described in the application documentation:
CVS (Developers) Release (http://emboss.open-bio.org/rel/dev/apps/)
Stable Release 6 (http://emboss.open-bio.org/rel/rel6/apps/)
Application-specific options are defined in an Ajax Command Definition (ACD) file, associated with the EMBOSS program. To retrieve this list of options from the command line, run the application with -help (and nothing else):
seqret -help
To get a complete list of options that includes datatype-specific options (inbuilt options associated with the datatypes the application processes) and global options (ones available to all applications), run the application with -help -verbose:
seqret -help -verbose
Some application options must be specified and some are optional. EMBOSS makes a distinction between application parameters and qualifiers. Parameters are always required and are prompted for if necessary, whereas qualifiers may or may not be required and prompted for, depending on how they are specified in the ACD file.
Values for parameters and qualifiers are set either on the command line used to run the program, or as a response to a prompt generated by EMBOSS before the main application code runs. Any required values that you have not already given on the command line will be prompted for automatically.
For example, the seqret application can be run with an input sequence by typing:
seqret input.seq
However, seqret has two parameters: the input and output sequence files. If you type the above command, you will therefore be prompted for the output sequence.
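The prompting rule can be modelled in a few lines of Python. This is only an illustrative sketch, not EMBOSS code: the function name values_to_prompt_for and the list and dictionary representation are assumptions made for the example, while sequence and outseq are the parameter names seqret uses in its ACD file.

```python
def values_to_prompt_for(required, supplied):
    """Return the required option names that still lack a value.

    `required` lists option names in the order they are defined;
    `supplied` maps option names to values given on the command line.
    (Both structures are assumptions of this sketch.)
    """
    return [name for name in required if name not in supplied]


# seqret has two parameters; only the input sequence was given.
required = ["sequence", "outseq"]
supplied = {"sequence": "input.seq"}

print(values_to_prompt_for(required, supplied))  # ['outseq']
```

Once both values are supplied the list is empty, which corresponds to the application running without any prompt.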
Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes, for any application which uses these datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file. The command below calls seqret with the -osformat qualifier to set the output format of the sequence file to embl. -osformat is specific to the sequence output datatypes:
seqret input.seq -osformat embl
Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are available to all EMBOSS applications. They change the behaviour of the program for which they are set. You've already come across the use of -help, which is a global qualifier to retrieve application options:
seqret -help
Application-specific options are defined in the Ajax Command Definition (ACD) file that is associated with the EMBOSS program. The ACD file determines exactly what can appear on the command line and how values are prompted for. If you only intend to use but not write ACD files, then you don't need to know the ACD syntax or even look at the ACD file. All parameters and qualifiers are described in the application documentation and help is available at the command line by using -help.
Every application option has a corresponding definition in the ACD file and is defined as one of:
parameter
standard qualifier
additional qualifier
with the default of:
advanced qualifier
Parameters are usually the primary input and output files whereas qualifiers are used for other options.
You don't need to use a flag to specify a value for a parameter on the command line. Values are typically specified like this:
|
It is, however, necessary to give such unqualified parameter values in the same order as the corresponding data definitions appear in the ACD file (and documentation).
In contrast, you must use a flag to give a value for a qualifier. Values for standard, additional and advanced qualifiers are specified like this:
|
The flag can optionally be given for a parameter too:
|
In either case, where the flag is given, values can be given in any order. The flags (parameter or qualifier names) are listed in the documentation, are shown when running the application with -help, or can be seen in the ACD file itself (they are the text tokens given after the colon (:) on the first line of each data definition).
Example. In seqret.acd two parameters are defined: an input sequence (with the parameter name sequence) and an output sequence (called outseq). The input sequence is defined before the output sequence:
application: seqret [
  documentation: "Reads and writes (returns) sequences"
  groups: "Edit"
]
section: input [
  information: "Input section"
  type: "page"
]
  seqall: sequence [
    parameter: "Y"
  ]
endsection: input
. . .
section: output [
  information: "Output section"
  type: "page"
]
  seqoutall: outseq [
    parameter: "Y"
  ]
endsection: output
Assuming your input sequence is in the file input.seq and you want to write a file called output.seq, the following command is perfectly valid:
seqret input.seq output.seq
Whereas the following command would mess things up:
seqret output.seq input.seq
EMBOSS would try to open a file called output.seq for reading, and would also open a file called input.seq for writing, possibly overwriting a valuable data file in the latter case!
Where the flags are used, values can be given in any order, so either of the following is perfectly valid:
seqret -sequence input.seq -outseq output.seq
seqret -outseq output.seq -sequence input.seq
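To make the ordering rule concrete, the following Python sketch reads parameter names from ACD text and binds unflagged values to them in definition order. It is a rough illustration, not the EMBOSS ACD parser: the regular expression handles only the simple "type: name [ attributes ]" form shown above, the ACD text is abbreviated from the seqret example, and the helper names parameter_names and bind_positional are hypothetical.

```python
import re

# Abbreviated from the seqret ACD example in the text.
ACD_TEXT = '''
application: seqret [
  documentation: "Reads and writes (returns) sequences"
  groups: "Edit"
]
section: input [ information: "Input section" type: "page" ]
  seqall: sequence [ parameter: "Y" ]
endsection: input
section: output [ information: "Output section" type: "page" ]
  seqoutall: outseq [ parameter: "Y" ]
endsection: output
'''

# Matches "datatype: name [ attributes ]"; a simplification of real ACD syntax.
DEFINITION = re.compile(r'(\w+):\s*(\w+)\s*\[([^\]]*)\]')


def parameter_names(acd_text):
    """Return the names of definitions marked as parameters, in file order."""
    names = []
    for _datatype, name, attributes in DEFINITION.findall(acd_text):
        if re.search(r'parameter:\s*"?Y"?', attributes):
            names.append(name)
    return names


def bind_positional(acd_text, values):
    """Pair unflagged command line values with parameters in ACD order."""
    return dict(zip(parameter_names(acd_text), values))


print(parameter_names(ACD_TEXT))
# ['sequence', 'outseq']
print(bind_positional(ACD_TEXT, ["output.seq", "input.seq"]))
# {'sequence': 'output.seq', 'outseq': 'input.seq'}
```

The second call shows the problem with the swapped command: output.seq is bound to the input parameter and input.seq to the output parameter. Giving the flags explicitly removes the dependence on order.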
Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file, or the types of data that are written in an application report.
In cases where an application has two or more options of the same ACD datatype, the command line flags refer to the option that preceded the flag on the command line, but not those appearing afterwards. Flags that are specific to options of different datatypes can be intermixed: the order is not important.
In the example below, the program seqret takes two parameters, an input sequence (file in.seq) and an output sequence (out.seq). The order of the command line flags that follow is irrelevant as the two qualifiers refer to different datatypes:
seqret |
In the following example, the program water takes two parameters, both input sequences (files aap.seq and noot.seq, of datatypes sequence and seqall, each of which can have a -sformat qualifier), and here the order of the qualifiers is important. Assuming aap.seq is in FASTA format and noot.seq is in GCG format, we have:
water |
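The "flag refers to the option that precedes it" rule can be sketched in Python as follows. This is a simplified model rather than the EMBOSS command line parser: it assumes every flag takes exactly one value, the function name attach_qualifiers is hypothetical, and the token list is an illustrative arrangement consistent with the description above, not a quotation of an actual command.

```python
def attach_qualifiers(tokens):
    """Attach each '-flag value' pair to the unflagged value preceding it.

    Returns a list of (value, {flag: flag_value}) tuples.
    Assumes, for this sketch, that every flag takes exactly one value.
    """
    bound = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            if not bound:
                raise ValueError(f"{token} has no preceding option to refer to")
            if i + 1 >= len(tokens):
                raise ValueError(f"{token} is missing its value")
            bound[-1][1][token[1:]] = tokens[i + 1]
            i += 2
        else:
            bound.append((token, {}))
            i += 1
    return bound


# Illustrative tokens: each -sformat follows the file it describes.
tokens = ["aap.seq", "-sformat", "fasta", "noot.seq", "-sformat", "gcg"]
print(attach_qualifiers(tokens))
# [('aap.seq', {'sformat': 'fasta'}), ('noot.seq', {'sformat': 'gcg'})]
```

Moving both -sformat flags to the end of the token list would attach them to noot.seq only, which is why order matters when two options share a datatype-specific qualifier.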
Instead of having to adhere to a rigorous order for command line flags when two or more options of the same (class of) datatype are defined, it is also possible to use numbers with the qualifier/parameter names, to indicate the option to which the flag refers.
This is formalised as follows:
- |
where QualifierPosition is an integer indicating the option to which the flag refers. The number should reflect the order of that option in the ACD file relative to other options of the same type: it is not the absolute position of the data definition! For example, if an ACD file contains two sequence input parameters (at the top of the ACD file) and two align output parameters for alignment output (at the bottom of the file), the align parameters would be numbered 1 and 2 respectively, not 3 and 4, which would be their absolute positions in the file.
In the following example, qualifier numbering indicates that the format of the first parameter is fasta and the second gcg:
someprogram aap.seq noot.seq -sformat2 gcg -sformat1 fasta
As a further example, consider the ACD file below:
application: seqtest
sequence: asequence [ parameter: Y ]
int: wibblefactor [ parameter: Y ]
sequence: bsequence [ parameter: Y ]
The following command line:
seqtest |
defines that the first sequence file (seqtest.in) is in GCG format and the second sequence file (seqtest.out) is in FASTA format. Note that the second -sformat qualifier has been numbered 2 because it is the second sequence parameter, even though it is the third parameter in the file.
Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are command line qualifiers that are available to all EMBOSS applications. They change the behaviour of the program for which they are set. They are used in the same way as any other qualifier, but are usually given on the command line after the application name and other parameters.
EMBOSS supports three different command line styles. In the examples below, the seqret application is used to retrieve the first 100 nucleotides of the sequence P10932 from the EMBL database. The global qualifier -auto is used to turn off any prompting of the user.
Unix style:
% seqret embl:P10932 -send 100 -auto
% seqret -send 100 embl:P10932 -auto
SeqPup style:
% seqret embl:P10932 -end=100 -auto
VMS style:
% seqret /SEQUENCE=EMBL:P10932 /SEND=100 /AUTO
As you can see, the command line syntax is very versatile. To avoid confusion, it is strongly recommended that you use the Unix style.
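When EMBOSS applications are called from a script, the Unix style maps directly onto an argument list. The Python sketch below builds the first Unix style command shown above; the helper names unix_style_command and run_if_installed are hypothetical, and the second helper does nothing unless seqret is found on the PATH. Actually running the command also assumes an EMBOSS installation in which the embl database is configured.

```python
import shutil
import subprocess


def unix_style_command(sequence_ref, end):
    """Build a Unix style seqret argument list using -send and -auto."""
    return ["seqret", sequence_ref, "-send", str(end), "-auto"]


def run_if_installed(command):
    """Run the command only if its program is on the PATH; else return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False)


command = unix_style_command("embl:P10932", 100)
print(" ".join(command))
# seqret embl:P10932 -send 100 -auto
```

Passing the arguments as a list avoids shell quoting issues, and including -auto keeps a scripted run from stopping at a prompt.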
The general behaviour of EMBOSS programs, such as prompting for values, the directory to be searched for data files, default sequence formats and messaging, may be controlled with environment variables. See Section 2.8, “Maintenance” for more information.