A.1. Introduction to ACD Syntax

A.1.1. General Syntax

The AJAX Command Definition (ACD) language was designed for writing ACD files for EMBOSS applications. Every application in EMBOSS or EMBASSY has an ACD file. The ACD syntax allows very flexible descriptions of an application's parameters and its command line interface. It can specify everything that can appear on the command line or in another interface such as a web page.

ACD files are plain ASCII text files and must have the extension .acd. Typically they have the same name as the application, but this is not mandatory.

A.1.1.1. Whitespace

During parsing, the entire contents of an ACD file are effectively treated as a single string, which is split into tokens delimited by whitespace. At least one whitespace character is required between tokens; any additional whitespace is ignored.

A.1.1.2. Comments

Comments begin with "#" and continue to the end of the line.
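
As a minimal sketch of both rules, the two layouts below describe the same data definition. Because line breaks and indentation count only as whitespace between tokens, they should be parsed identically. The parameter name window and its attribute values are illustrative only, and a real file would contain just one of the two layouts.

```text
# Compact layout: the whole definition on one line
integer: window [ default: "10" information: "Window length" ]

# Expanded layout: same tokens, separated by newlines and indentation
integer: window
[
   default: "10"
   information: "Window length"
]
```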

A.1.2. ACD Definitions

An ACD file contains a single application definition and a data definition for each parameter. The application definition is given first, followed by the data definitions. Data definitions are organised into sections (see Section A.1.6, “ACD File Sections”).

Application and data definitions have the following general form: a single text token followed by a colon ':' (or '=') and whitespace, followed by a second token. The definition body follows: one or more attributes enclosed in a mandatory pair of square brackets [ ], which may span multiple lines. Each attribute is a name: value pair, with the attribute value given between double quotes (" "):

Either:

token: token 
[
   Attribute1Name: "Attribute1Value"
   Attribute2Name: "Attribute2Value"
]

Or:

token=token 
[
   Attribute1Name: "Attribute1Value"
   Attribute2Name: "Attribute2Value"
]

The first token is either application (for the application definition) or an AJAX datatype (e.g. sequence) for data definitions. The second token is either the name of the application (e.g. seqret) or the name of the parameter (e.g. asequence).

Application definition:

application: ApplicationName 
[
   ApplicationAttribute1Name: "ApplicationAttribute1Value"
   ApplicationAttribute2Name: "ApplicationAttribute2Value"
]

Data definition:

Datatype: ParameterName 
[
   DataAttribute1Name: "DataAttribute1Value"
   DataAttribute2Name: "DataAttribute2Value"
]

The application token and the tokens for datatype and attribute names can be abbreviated, provided that the abbreviation remains unambiguous. Such abbreviations are not recommended, however, because they tend to make the ACD file more difficult to read.

Attribute values are normally enclosed in double quotes, although this is only mandatory for values (typically strings) which include whitespace.
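
The sketch below contrasts the recommended style with an abbreviated, partly unquoted one. It is illustrative only: the parameter name wordsize and its values are examples, and whether a given abbreviation (int, def, min, info) is actually unambiguous depends on the full set of names known to the parser, so these should be treated as assumptions. A real file would contain only one of the two definitions.

```text
# Recommended: full names, every value quoted
integer: wordsize
[
   default: "4"
   minimum: "2"
   information: "Word size"
]

# Abbreviated names (assumed unambiguous) and unquoted single-token values.
# The information string contains whitespace, so its quotes are mandatory.
int: wordsize
[
   def: 4
   min: 2
   info: "Word size"
]
```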

A.1.2.1. Application Definition

The application definition must be the first definition in the file:

application: ApplicationName 
[
   ApplicationAttribute1Name: "ApplicationAttribute1Value"
   ApplicationAttribute2Name: "ApplicationAttribute2Value"
]

The application name is arbitrary but is typically the same as the ACD file name. It is the ACD file name (not ApplicationName, if the two differ) that is used within the application C source code to associate the application with an ACD file. This allows multiple ACD files (and therefore multiple command line interfaces) to be developed for a single file of application C source code.

For a complete description of the available application attributes see Section A.3, “Application Attributes”.
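
For instance, an application definition for seqret might look like the sketch below. The documentation and groups attributes are used here as typical application attributes; the values are illustrative and are not copied from the distributed ACD file.

```text
# Application definition: must come first in the file
application: seqret
[
   documentation: "Reads and writes (returns) sequences"
   groups: "Edit"
]
```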

A.1.2.2. Data Definition

All application parameters must have a data definition. Data definitions follow the application definition and must be placed in an appropriate file section (see Section A.1.6, “ACD File Sections”):

Datatype: ParameterName 
[
   DataAttribute1Name: "DataAttribute1Value"
   DataAttribute2Name: "DataAttribute2Value"
]

Datatype must be a valid ACD datatype. For a complete description of the available datatypes see Section A.2, “Datatypes”.

ParameterName is the name of the parameter. It is a string that must conform to certain conventions (Section A.1.3, “Parameter Naming Conventions”). This name is used to refer to the data definition from the command line and from within the C source code (see Section 6.3, “Handling ACD Files”).

For a complete description of the available attributes see:

Global attributes (Section A.4, “Global Attributes”)
Datatype-specific attributes (Section A.5, “Datatype-specific Attributes”)
Calculated attributes (Section A.6, “Calculated Attributes”)
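
A concrete data definition could look like the following sketch. It assumes that parameter is a global attribute and that type is a datatype-specific attribute of the sequence datatypes; the value "gapany" is an assumption made for illustration. In a complete file the definition would sit inside the appropriate section.

```text
# Datatype seqall, parameter name sequence
seqall: sequence
[
   parameter: "Y"
   type: "gapany"
]
```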

A.1.3. Parameter Naming Conventions

A.1.3.1. General Conventions

Parameter and qualifier names follow these general conventions:

  • Must not contain whitespace characters

  • Should not normally be single characters

  • Should be meaningful words that indicate the function of the option as far as possible

  • Are not case sensitive

A.1.3.2. Datatype-specific Conventions

The conventions for parameter names that apply for individual datatypes are given in the table below.

Where more than one instance of a datatype is specified in an ACD file, the characters a, b etc. can be prefixed to the standard name: asequence, bsequence etc. This is indicated in the table by an asterisk in the parameter name, for example *sequence (see Table A.1, “Parameter and Qualifier Naming Conventions”).

Table A.1. Parameter and Qualifier Naming Conventions

Datatype | Name | Usage
-------- | ---- | -----
sequence | sequence, *sequence | Primary input sequence, generally required
seqall | sequence, *sequence, seqall | Primary input sequence database, generally required
seqset | sequence, *sequence, sequences | Primary input sequences, generally required
seqsetall | sequence, *sequence, sequences | Primary input sequences, generally required
seqout, seqoutset, seqoutall | outseq, *outseq, *outfile | Primary output sequence, generally required, generally should default to the primary input sequence name, extension defaults to the name of the output sequence format.
outfile | outfile, *file | Primary output non-sequence results file, generally required. The file extension should be allowed to default to the application name. outfile should be used for the first output file. outfile or *file is acceptable for the second and subsequent output files.
report | outfile, *file | Report output file. outfile should be used for the first report file. outfile or *file is acceptable for the second and subsequent report files.
align | outfile, *file | Alignment output file. outfile should be used for the first output alignment. outfile or *file is acceptable for the second and subsequent output alignments.
infile | infile, *file | Primary input non-sequence file
infile | data | Primary auxiliary input data file, generally optional
infile | patterns | File of patterns to search for in sequence
integer | minlen | Minimal length of sequence feature to be found
integer | maxlen | Maximum length of sequence feature to be found
integer | wordsize | Word size for hash tables etc. Generally minimum value = 2 for protein, 4 for DNA
integer | window | Window length for calculating dotplots/features/etc.
integer | shift | Amount by which window is shifted in each iteration
boolean | consensus | Flag for whether consensus sequence should be output
float | gap | Gap penalty
float | gapext | Gap extension penalty
integer | from | Position of start of input sequence to specify for an operation (e.g. deletion), defaults to start of sequence, minimum value = 1, maximum value = <sequence length>
integer | to | Position of end of input sequence to specify for an operation (e.g. deletion), defaults to the from value, minimum value = from value, maximum value = <sequence length>
float or integer | threshold | Threshold for various operations
boolean | left | Operation should be done at the start of the sequence
boolean | right | Operation should be done at the end of the sequence
string | pattern | Pattern to search for in sequence
graph | graph | Graphical output
xygraph | graph | Graphical output
directory | directory, *dir, *path | Directory of files
outdir | outdir, *outdir | Output directory of files
dirlist | directory | Directory of files
filelist | *files | List of files
matrix | matrix | Matrices
datafile | datafile | Datafiles
feature | feature, *feature | Feature input
featout | outfeat, *outfeat | Feature output
regexp | pattern | Regular expressions
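
The naming conventions in Table A.1 can be sketched as follows for an application that reads two sequences. The a and b prefixes distinguish the two inputs, while wordsize and outfile are standard names taken from the table. The attribute values are illustrative assumptions, and the enclosing sections are omitted for brevity.

```text
# Two inputs of the same datatype: prefix a, b to the standard name
sequence: asequence
[
   parameter: "Y"
   type: "any"
]

sequence: bsequence
[
   parameter: "Y"
   type: "any"
]

# Standard name for a word size (integer datatype)
integer: wordsize
[
   standard: "Y"
   default: "4"
   minimum: "2"
   information: "Word size"
]

# Standard name for the first non-sequence output file
outfile: outfile
[
   parameter: "Y"
]
```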

A.1.3.3. Validated Parameter Names

For some datatypes, conventions are more strongly enforced: a warning will be generated during ACD processing if a standard name is not used for the following datatypes:

  • Sequence inputs (any data definition of the type sequence, seqall, seqsetall or seqset) and sequence outputs (seqout, seqoutall and seqoutset datatypes)

  • Feature inputs (any data definition of the type feature) and feature outputs (featout datatype)

  • Alignments (align datatype)

  • File inputs and outputs (any data definition of the type infile, filelist, directory, dirlist or outfile)

  • Report output (report datatype)

A.1.4. Types of Attributes

Application attributes may be defined for an application definition (Section A.3, “Application Attributes”).

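There are three basic types of attributes that may be defined for a data definition:

  • Global attributes (Section A.4, “Global Attributes”)

  • Datatype-specific attributes (Section A.5, “Datatype-specific Attributes”)

  • Calculated attributes (Section A.6, “Calculated Attributes”)

Additionally, there are various "datatype associated" command line qualifiers (or simply "associated qualifiers") that are built in for certain ACD datatypes and may also be defined as attributes in the appropriate data definition. These are listed in the datatype descriptions (Section A.2, “Datatypes”).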

A.1.5. Parameters and Qualifiers

Every data definition in the ACD file can be defined via an appropriate attribute to be one of the following:

  • Parameter

  • Standard Qualifier

  • Additional Qualifier

with the default being:

  • Advanced Qualifier

They differ in terms of how they are prompted for, how they may be specified on the command line and whether help information for them appears.

This behaviour is summarised in the table below (Table A.2, “Behaviour of Command line Parameters and Qualifiers”). "Flag" indicates whether the flag (parameter or qualifier name) must be given on the command line. "Prompt" indicates whether a value will be prompted for if one is not specified on the command line. Additional qualifiers will only be prompted for if -options is specified. "Help" indicates where the information from the built-in -help qualifier is shown. For more information, see Section 4.5, “Controlling the Prompt”.

Table A.2. Behaviour of Command line Parameters and Qualifiers
Type | Attribute | Flag | Prompt | Help
---- | --------- | ---- | ------ | ----
parameter | parameter: "Y" | No | Yes | Required section
standard | standard: "Y" | Yes | Yes | Required section
additional | additional: "Y" | Yes | Yes (with -options) or No (default needed) | Advanced section
advanced (default) | No attribute needed | Yes | No | Advanced section
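
The four cases in Table A.2 can be sketched with one data definition each. The parameter names and default values are illustrative assumptions, and the enclosing sections are omitted for brevity.

```text
# Parameter: the flag may be omitted on the command line; prompted for if missing
sequence: sequence
[
   parameter: "Y"
]

# Standard qualifier: the flag must be given; prompted for if missing
integer: window
[
   standard: "Y"
   default: "10"
   information: "Window length"
]

# Additional qualifier: the flag must be given; prompted for only with -options
float: threshold
[
   additional: "Y"
   default: "1.0"
   information: "Threshold"
]

# Advanced qualifier: no attribute needed; never prompted for
boolean: consensus
[
   default: "N"
   information: "Output consensus sequence"
]
```

With definitions like these, a hypothetical application could be run with the sequence given positionally and the qualifiers given by flag, for example: exampleapp input.seq -window 20 -threshold 2.5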

A.1.6. ACD File Sections

Any data definitions in an ACD file must be contained within an appropriate section and given in the correct order. The sections must appear in this order:

  1. Input

  2. Required

  3. Additional

  4. Advanced

  5. Output

Subsections with arbitrary names can also be defined. They can appear in any order but must be nested in a major section.

Sections and subsections have the following general syntax:

section: SectionName 
[
  information: "SectionName section"
  type: "page"
]
.
. (data definitions go here)
.
   section: NestedSectionName 
   [
   information: "NestedSectionName section"
   type: "page"
   ]
   .
   . (data definitions go here)
   .
   endsection: NestedSectionName
.
endsection: SectionName

For example:

section: input 
[
  information: "Input section"
  type: "page"
]
.
. (input data definitions go here)
.
   section: inputsubsection 
   [
   information: "Input sub-section"
   type: "page"
   ]
   .
   . (input sub-section data definitions go here)
   .
   endsection: inputsubsection

endsection: input

The contents of each section are summarised in Table A.3, “ACD File Sections”.

Table A.3. ACD File Sections
Section name | Description
------------ | -----------
Input | Simple input values and any ACD type that will read input, including infile, sequence, seqset, seqall, matrix, fmatrix and codon. Any other parameters and qualifiers related to input can also be placed in this section. At present datafile is also included.
Required | Parameters and Standard Qualifiers, including any whose standard: attribute can be true but depends on a conditional operation. Any toggle: definitions that are used by the Parameters and Standard Qualifiers. Note that input and output parameters and qualifiers must be in their respective sections.
Additional | Additional Qualifiers, including any whose additional: attribute can be true but depends on a conditional operation. Any toggle: definitions that are used by Additional Qualifiers. Input and output parameters and qualifiers must be in their respective sections.
Advanced | Any qualifiers (except input and output qualifiers) which have no standard: or additional: attribute defined.
Output | Any data type that will write output, including any outfile, outdata, seqout, seqoutall, seqoutset and outtree. Other qualifiers related to output can also be placed in this section. This is the last section to be defined.

A.1.6.1. Validation of Sections

Restrictions on the order of sections and what data definitions can appear in what sections are defined in the EMBOSS system file sections.standard (see Section 4.1, “Introduction to ACD File Development”). The restrictions are enforced during ACD processing and an error will be generated in the following circumstances:

  • If major sections appear in the wrong order

  • If subsections appear in the wrong major sections

  • If a parameter (data definition with a parameter: "Y" attribute) or a standard qualifier (standard: "Y" attribute) occurs in the "Advanced" or "Additional" sections

  • If an additional qualifier (additional: "Y" attribute) occurs in the "Advanced" or "Required" sections

  • If an advanced qualifier (no parameter: "Y", standard: "Y" or additional: "Y" attribute) occurs in the "Additional" or "Required" sections
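
Putting the ordering and placement rules together, a complete ACD file for a hypothetical application might be laid out as in the sketch below. The application name exampleapp, the parameter names and all attribute values are assumptions made for illustration. The point is the structure: the application definition comes first, the major sections follow in the required order, and each data definition sits in a section that matches its parameter or qualifier type.

```text
application: exampleapp
[
   documentation: "Hypothetical example application"
   groups: "Edit"
]

section: input
[
   information: "Input section"
   type: "page"
]

   # Input datatypes belong here, even when they are parameters
   sequence: sequence
   [
      parameter: "Y"
      type: "any"
   ]

endsection: input

section: required
[
   information: "Required section"
   type: "page"
]

   integer: window
   [
      standard: "Y"
      default: "10"
      information: "Window length"
   ]

endsection: required

section: additional
[
   information: "Additional section"
   type: "page"
]

   float: threshold
   [
      additional: "Y"
      default: "1.0"
      information: "Threshold"
   ]

endsection: additional

section: advanced
[
   information: "Advanced section"
   type: "page"
]

   # No parameter, standard or additional attribute: advanced qualifier
   boolean: consensus
   [
      default: "N"
      information: "Output consensus sequence"
   ]

endsection: advanced

section: output
[
   information: "Output section"
   type: "page"
]

   # Output datatypes belong here, even when they are parameters
   outfile: outfile
   [
      parameter: "Y"
   ]

endsection: output
```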