EMBOSS applications are invoked by typing their name at a command line prompt. For example, to run the seqret application you would type:
If you're not certain of the application you need, see the tables of application names and short descriptions (Section 3.1, “Application Documentation”).
The same information is retrieved by running the wossname application. This searches for keywords or parts of words in the application short description (the text that is displayed by a program when it first starts). If no keywords are specified, then details of all the EMBOSS programs are output. Simply type:
Every application has a set of options allowing you to specify all of the inputs and outputs, including input and data files and values that control how the application operates. Options may be application-specific, available for particular datatypes only (datatype-specific), or available for all datatypes (global). All options are described in the application documentation:
CVS (Developers) Release (http://emboss.open-bio.org/rel/dev/apps/)
Stable Release 6 (http://emboss.open-bio.org/rel/rel6/apps/)
Application-specific options are defined in an Ajax Command Definition (ACD) file, associated with the EMBOSS program. To retrieve this list of options from the command line, run the application with -help (and nothing else):
To get a complete list of options that includes datatype-specific options (inbuilt options associated with the datatypes the application processes), and global options (ones available to all applications), run the application with
Some application options must be specified and some are optional. EMBOSS distinguishes between application parameters and qualifiers. Parameters are always required and are prompted for if necessary, whereas qualifiers may or may not be required and prompted for, depending on how they are specified in the ACD file.
Values for parameters and qualifiers are set either on the command line used to run the program, or as a response to a prompt generated by EMBOSS before the main application code runs. Any required values that you have not already given on the command line will be prompted for automatically.
For example, the seqret application can be run with an input sequence by typing:
seqret, however, has two parameters: the input and output sequence files. If you type the above command, you will therefore be prompted for the output sequence.
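The prompting rule can be modelled in a few lines of Python. This is only an illustrative sketch, not EMBOSS code: the function name `values_to_prompt_for` is hypothetical, and `sequence` and `outseq` are the names that seqret.acd gives to the two parameters.

```python
def values_to_prompt_for(parameters, given):
    """Return the parameters that still need a value, in ACD order.

    parameters: parameter names in the order they are defined.
    given: dict of values already supplied on the command line.
    """
    return [name for name in parameters if name not in given]


# seqret defines two parameters: the input and the output sequence.
seqret_parameters = ["sequence", "outseq"]

# Only the input sequence was supplied on the command line.
supplied = {"sequence": "input.seq"}

to_prompt = values_to_prompt_for(seqret_parameters, supplied)
print(to_prompt)  # ['outseq']
```

Under these assumptions, only the output sequence is left to be prompted for, which matches the behaviour described above.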
Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes, for any application which uses these datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file. The command below calls seqret with the
-osformat qualifier to set the output format of the sequence file to
-osformat is specific to the sequence output datatypes:
Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are available to all EMBOSS applications. They change the behaviour of the program for which they are set. You've already come across the use of -help, which is a global qualifier to retrieve application options:
Application-specific options are defined in the Ajax Command Definition (ACD) file that is associated with the EMBOSS program. The ACD file determines exactly what can appear on the command line and how values are prompted for. If you intend only to use ACD files, not to write them, you don't need to know the ACD syntax or even look at the ACD file. All parameters and qualifiers are described in the application documentation, and help is available at the command line by using -help.
Every application option has a corresponding definition in the ACD file and is defined as one of:
with the default of:
Parameters are usually the primary input and output files whereas qualifiers are used for other options.
You don't need to use a flag to specify a value for a parameter on the command line. Values are typically specified like this:
It is, however, necessary to give such unqualified parameter values in the same order as the corresponding data definitions appear in the ACD file (and documentation).
In contrast, you must use a flag to give a value for a qualifier. Values for standard, additional and advanced qualifiers are specified like this:
The flag can optionally be given for a parameter too:
In either case, where the flag is given, values can be given in any order. The flags (parameter or qualifier names) are listed in the documentation, are shown when running the application with -help, or can be seen in the ACD file itself: they are the text tokens given after the colon (:) on the first line of each data definition.
In seqret.acd, two parameters are defined: an input sequence (with the parameter name sequence) and an output sequence (called outseq). The input sequence is defined before the output sequence:
    application: seqret [
      documentation: "Reads and writes (returns) sequences"
      groups: "Edit"
    ]

    section: input [
      information: "Input section"
      type: "page"
    ]

      seqall: sequence [
        parameter: "Y"
      ]

    endsection: input

    . . .

    section: output [
      information: "Output section"
      type: "page"
    ]

      seqoutall: outseq [
        parameter: "Y"
      ]

    endsection: output
Assuming your input sequence is in the file input.seq and you want to write a file called output.seq, the following command is perfectly valid:
Whereas the following command would mess things up:
EMBOSS would try to open a file called output.seq for reading, and would also open a file called input.seq for writing, possibly overwriting a valuable data file in the latter case!
Where the flags are used, values can be given in any order, so either of the following is perfectly valid:
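The binding rules described above can be illustrated with a small Python model. It is a simplified sketch, not the EMBOSS parser: `bind_values` is a hypothetical helper, it assumes every flag takes exactly one value, and it uses the parameter names `sequence` and `outseq` from seqret.acd as flags.

```python
def bind_values(parameter_order, args):
    """Map command-line tokens to parameter names.

    A token starting with '-' is treated as a flag naming a parameter and
    consumes the next token as its value. Any other token is an unflagged
    value and fills the next unfilled parameter in ACD order.
    """
    bound = {}
    tokens = iter(args)
    for token in tokens:
        if token.startswith("-"):
            bound[token[1:]] = next(tokens)
        else:
            # First parameter, in ACD order, that has no value yet.
            name = next(p for p in parameter_order if p not in bound)
            bound[name] = token
    return bound


acd_order = ["sequence", "outseq"]  # order of definition in seqret.acd

# Unflagged values in ACD order: input first, output second.
good = bind_values(acd_order, ["input.seq", "output.seq"])

# Unflagged values in the wrong order: the two files are swapped.
swapped = bind_values(acd_order, ["output.seq", "input.seq"])

# Flagged values: the order on the command line no longer matters.
flagged = bind_values(acd_order, ["-outseq", "output.seq", "-sequence", "input.seq"])
```

In this model, `swapped` binds output.seq to the input parameter, which is the mistake warned about above, while `flagged` produces the same binding as `good` despite the reversed order.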
Datatype-specific qualifiers (Section 6.4, “Datatype-specific Command Line Qualifiers”) are available for specific input and output datatypes. They are used to specify a particular input or output in more detail, for instance the format of an output sequence file, or the types of data that are written in an application report.
In cases where an application has two or more options of the same ACD datatype, the command line flags refer to the option that preceded the flag on the command line, but not those appearing afterwards. Flags that are specific to options of different datatypes can be intermixed: the order is not important.
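The "refers to the preceding option" rule can be sketched in Python as follows. This is an illustrative model only: `attach_format_qualifiers` is a hypothetical helper, the file names are those of the water example in this section, and the format names `fasta` and `gcg` are used as example values. The text does not say how EMBOSS treats a qualifier that appears before any input, so the sketch simply refuses that case.

```python
def attach_format_qualifiers(args, flag="-sformat"):
    """Pair each `flag value` with the input that precedes it.

    Returns a list of (input, format) tuples; format is None when no
    qualifier follows that input.
    """
    inputs = []
    tokens = iter(args)
    for token in tokens:
        if token == flag:
            value = next(tokens)
            if not inputs:
                # Simplification of this sketch, not documented behaviour.
                raise ValueError(f"{flag} given before any input")
            name, _ = inputs[-1]
            inputs[-1] = (name, value)
        else:
            inputs.append((token, None))
    return inputs


# Each -sformat applies to the input file immediately before it.
paired = attach_format_qualifiers(
    ["aap.seq", "-sformat", "fasta", "noot.seq", "-sformat", "gcg"]
)
print(paired)
```

Moving a `-sformat` pair so that it follows a different input changes which input it describes, which is why the order matters when two options share a datatype.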
In the example below, the program seqret takes two parameters, an input sequence (file in.seq) and an output sequence (out.seq). The order of the command line flags that follow is irrelevant as the two qualifiers refer to different datatypes:
In the following example, the program water takes two parameters, both input sequences (files aap.seq and noot.seq, of datatypes seqall, each of which can have a -sformat qualifier), and here the order of the qualifiers is important. Assuming aap.seq is in FASTA format and noot.seq is in GCG format we have:
Instead of having to adhere to a rigorous order for command line flags when two or more options of the same (class of) datatype are defined, it is also possible to use numbers with the qualifier/parameter names, to indicate the option to which the flag refers.
This is formalised as follows:
QualifierPosition is an integer indicating the option to which the flag refers. The number reflects the order of that option in the ACD file relative to other options of the same type; it is not the absolute position of the data definition. For example, if an ACD file contains two sequence input parameters (at the top of the ACD file) and two align output parameters for alignment output (at the bottom of the file), the align parameters would be numbered 1 and 2 respectively, not 3 and 4, which would be their absolute positions in the file.
In the following example, qualifier numbering indicates that the format of the first parameter is
fasta and the second
As a further example, consider the ACD file below:
    application: seqtest
    sequence: asequence [ parameter: Y ]
    int: wibblefactor [ parameter: Y ]
    sequence: bsequence [ parameter: Y ]
The following command line:
defines that the first sequence file (seqtest.in) is in GCG format and the second sequence file (seqtest.out) is in FASTA format. Note that the second -sformat qualifier has been numbered 2 because it is the second sequence parameter, even though it is the third parameter in the file.
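The numbering rule can be checked with a short Python sketch that counts only options of the same datatype. The helper name `resolve_numbered_qualifier` is hypothetical, and the list below is a hand-written transcription of the seqtest ACD file, not something read from EMBOSS.

```python
def resolve_numbered_qualifier(definitions, datatype, position):
    """Return the option name that a numbered qualifier refers to.

    definitions: (datatype, name) pairs in ACD file order.
    position: 1-based index counted only among options of `datatype`.
    """
    same_type = [name for dtype, name in definitions if dtype == datatype]
    if not 1 <= position <= len(same_type):
        raise IndexError(f"no {datatype} option number {position}")
    return same_type[position - 1]


# Data definitions of the seqtest ACD file, in file order.
seqtest = [
    ("sequence", "asequence"),
    ("int", "wibblefactor"),
    ("sequence", "bsequence"),
]

first = resolve_numbered_qualifier(seqtest, "sequence", 1)
second = resolve_numbered_qualifier(seqtest, "sequence", 2)
print(first, second)
```

Here position 2 resolves to bsequence, the third definition in the file, because wibblefactor is an int and is not counted among the sequence options.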
Global qualifiers (Section 6.3, “Global Command Line Qualifiers”) are command line qualifiers that are available to all EMBOSS applications. They change the behaviour of the program for which they are set. They are used in the same way as any other qualifier, but are usually given on the command line after the application name and other parameters.
EMBOSS supports three different command line styles. In the examples below, the seqret application is used to retrieve a 100 nucleotide sequence from the input sequence P10932 from the EMBL database. The global qualifier -auto is used to turn off any prompting of the user.
seqret embl:P10932 -send 100 -auto
seqret -send 100 embl:P10932 -auto
seqret embl:P10932 -end=100 -auto
seqret /SEQUENCE=EMBL:P10932 /SEND=100 /AUTO
As you can see, the command line syntax is very versatile. To avoid confusion, it is strongly recommended that you use the UNIX command style.
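When EMBOSS is driven from a script, one way to stay with the UNIX style is to assemble the argument list programmatically. The Python sketch below is an assumption-laden illustration: `unix_style_command` is a hypothetical helper, not part of EMBOSS, and it only builds the command without running it. `shlex.join` requires Python 3.8 or later.

```python
import shlex


def unix_style_command(application, parameters, qualifiers, switches=()):
    """Build a UNIX-style argument list: values first, then flags.

    parameters: unflagged values, in ACD order.
    qualifiers: dict of qualifier name -> value, emitted as `-name value`.
    switches: qualifier names that take no value, such as `auto`.
    """
    args = [application, *parameters]
    for name, value in qualifiers.items():
        args += [f"-{name}", str(value)]
    args += [f"-{name}" for name in switches]
    return args


argv = unix_style_command("seqret", ["embl:P10932"], {"send": 100}, ["auto"])
command_line = shlex.join(argv)
print(command_line)
```

The resulting list reproduces the first UNIX-style example above and could be passed to `subprocess.run(argv)` on a machine where EMBOSS is installed.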
The general behaviour of EMBOSS programs, such as prompting for values, the directory to be searched for data files, default sequence formats and messaging, may be controlled with environment variables. See Section 2.8, “Maintenance” for more information.