A.2. Datatypes

For convenience, the available ACD datatypes are organised into five groupings reflecting similar properties or modes of usage. The available datatypes are described in detail below.

A.2.1. Description of Simple ACD Datatypes

A.2.1.1. array

A list of either integer or floating point numbers.

Data value. The data value is a list of numbers separated by spaces or commas.

For example:

"1 2 3 4 5"
"1.5, 2.0, 2.5, 3.0"

Default value. A default value is set using the default: global attribute.

Key attributes. The ACD attributes control validation. The size: attribute sets the permissible number of values. If the boolean sum: attribute is set, the numbers must also add up to the total given by sumtest:, within the tolerance given by tolerance:.
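The validation described above can be sketched in Python. This is a minimal illustration, not EMBOSS code: the function names parse_array and validate_array are hypothetical, the default sumtest and tolerance values are chosen only for the example, and size is assumed to mean an exact count of values.

```python
import re


def parse_array(value):
    """Split a data value on spaces and/or commas and convert to floats."""
    tokens = [t for t in re.split(r"[\s,]+", value.strip()) if t]
    return [float(t) for t in tokens]


def validate_array(values, size=None, sum_check=False, sumtest=1.0, tolerance=0.01):
    """Return True if the list passes the size check and the optional sum check.

    size      -- required number of values (assumed exact), or None to skip
    sum_check -- plays the role of the boolean sum: attribute
    sumtest   -- total the values must add up to (example default)
    tolerance -- allowed deviation from sumtest (example default)
    """
    if size is not None and len(values) != size:
        return False
    # The total is only tested when the sum check is switched on.
    if sum_check and abs(sum(values) - sumtest) > tolerance:
        return False
    return True


weights = parse_array("0.25, 0.25 0.5")
print(validate_array(weights, size=3, sum_check=True, sumtest=1.0))
```

Keeping the sum check behind its own switch mirrors the text: sumtest and tolerance have no effect unless the sum check is enabled.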

A.2.1.2. boolean

Simple boolean value.

Data value. The data value has a "true" or "false" value which may be specified as follows:

"Y"
"yes"
"true"
"N"
"no"
"false"

The value will be "Y" if the parameter name is entered on the command line as a flag, for example -BooleanOption. If the qualifier is absent from the command line the default value is used. The flag can also be prefixed by no, for example -noBooleanOption, to force the value to be "N". This is needed if the default value is "Y".
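The flag behaviour described above can be sketched in Python. This is an illustration under stated assumptions, not the EMBOSS command line parser: the function names are hypothetical, matching is assumed to be case-insensitive, and abbreviated qualifier names are not handled.

```python
TRUE_VALUES = {"y", "yes", "true"}
FALSE_VALUES = {"n", "no", "false"}


def parse_boolean(text):
    """Convert one of the listed text values to a Python bool."""
    lowered = text.strip().lower()
    if lowered in TRUE_VALUES:
        return True
    if lowered in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {text!r}")


def resolve_flag(argv, name, default):
    """Return the value of boolean qualifier `name` for the given arguments.

    -name    sets the value to True
    -noname  forces the value to False
    absent   leaves the default value in place
    """
    value = default
    for arg in argv:
        lowered = arg.lower()
        if lowered == "-" + name.lower():
            value = True
        elif lowered == "-no" + name.lower():
            value = False
    return value


print(resolve_flag(["-noBooleanOption"], "BooleanOption", default=True))
```

The -no prefix is the only way to switch off a qualifier whose default is "Y", which is why the sketch treats it as a separate case rather than as a value.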

Default value. A default value is set using the default: global attribute.

Key attributes. None.

A.2.1.3. integer

Simple integer number.

Data value. The data value is any integer value.

For example:

"100"

Default value. A default value is set using the default: global attribute.

Key attributes. Many applications will stipulate a minimum and / or maximum value, e.g. a minimum value of 0 or 1. The permissible value range is controlled by the minimum: and maximum: attributes. trueminimum:, failrange: and rangemessage: are used where the minimum: and maximum: attributes have calculated values (see Section 4.3.6.1, “Attributes for Simple ACD Datatypes”).

A.2.1.4. float

Simple floating point number.

Data value. The data value is any valid floating point number.

For example:

"100.24"

Default value. A default value is set using the default: global attribute.

Key attributes. The value range is controlled by minimum: and maximum: attributes and the maximum precision by precision:. trueminimum:, failrange: and rangemessage: are used where the minimum: and maximum: attributes have calculated values (see Section 4.3.6.1, “Attributes for Simple ACD Datatypes”).

A.2.1.5. range

Range(s) of sequence positions.

Data value. One or more ranges may be defined on the command line or in a Range File.

On the command line, a range is defined by a pair of integer numbers and multiple ranges may be given. The numbers may be delimited by any non-digit, non-alphabetic character. For example:

"24-45, 56-78"
"1:45, 67=99;765..888"
"1,5,8,10,23,45,57,99"
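The pairing rule for command-line ranges can be sketched in Python. This is a simplified illustration, not the EMBOSS parser: the function name parse_ranges is hypothetical, every run of digits is treated as one number, and negative positions are not handled because the minus sign is treated as a delimiter.

```python
import re


def parse_ranges(text):
    """Pair up the numbers in a range string into (start, end) tuples."""
    # Any non-digit character acts as a delimiter in this sketch.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) % 2 != 0:
        raise ValueError("ranges must be given as pairs of numbers")
    return list(zip(numbers[0::2], numbers[1::2]))


for example in ("24-45, 56-78", "1:45, 67=99;765..888", "1,5,8,10,23,45,57,99"):
    print(parse_ranges(example))
```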

A range file contains a list of pairs of numbers with optional text comments. One pair of numbers must be given per line and the file can contain comment lines which are preceded with a # character. For example:

# A set of ranges in a range file.
 12      23      
  4      5       This is an optional comment.
 67      10348   Another comment.
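Reading a range file of this shape can be sketched in Python. The function name parse_range_file is hypothetical and the sketch assumes whitespace-separated columns; it is not the EMBOSS implementation.

```python
def parse_range_file(lines):
    """Return (start, end, comment) tuples from the lines of a range file."""
    ranges = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        # First two fields are the pair of numbers; the rest is optional text.
        fields = stripped.split(None, 2)
        start, end = int(fields[0]), int(fields[1])
        comment = fields[2] if len(fields) > 2 else ""
        ranges.append((start, end, comment))
    return ranges


example = [
    "# A set of ranges in a range file.",
    " 12      23      ",
    "  4      5       This is an optional comment.",
    " 67      10348   Another comment.",
]
for entry in parse_range_file(example):
    print(entry)
```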

Range files are specified on the command line by preceding the filename with an @ character. For example:

@RangeFileName

In cases where the numbers are sequence positions, the upper and lower bounds will in practice depend on the length of the sequence to which they are applied. You should bear in mind that sequence positions can be negative, in which case they count back from the end of the sequence.

Default value. A default value is set using the default: global attribute.

Key attributes. None.

A.2.1.6. regexp

A regular expression pattern.

EMBOSS uses the "Perl-Compatible Regular Expression Library" (PCRE) to process regular expressions.

Data value. Any regular expression that is valid in Perl 5.0 (http://search.cpan.org/~nwclark/perl-5.8.7/pod/perlre.pod) should be valid here.

Default value. A default value is set using the default: global attribute.

Key attributes. Attributes provide validation, for example, to control the length (minlength: and maxlength: attributes) and case of the regular expression, which can be restricted to upper case (upper: "Y") or lower case (lower: "Y") only.

A.2.1.7. pattern

A sequence pattern.

Data value. The standard IUPAC one-letter codes for the amino acids are used. The symbol x is used for a position where any amino acid is accepted. Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets [ ]. For example:

[ALT]

stands for Ala or Leu or Thr. Ambiguities are also indicated by listing between a pair of curly brackets { } the amino acids that are not accepted at a given position. For example:

{AM}

stands for any amino acid except Ala and Met. Each element in a pattern is separated from its neighbour by a dash (-). Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range in parentheses. For example:

x(3) corresponds to x-x-x
x(2,4) corresponds to x-x or x-x-x or x-x-x-x

When a pattern is restricted to the N-terminus or C-terminus of a sequence, the pattern starts with a < symbol or ends with a > symbol, respectively. A period ends the pattern. For example:

[DE](2)HS{P}X(2)PX(2,4)C.
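The pattern syntax above maps closely onto ordinary regular expressions, which can be sketched in Python. This is an illustration only: the function name pattern_to_regex is hypothetical, only the constructs described in this section are handled, and it is not how EMBOSS compiles patterns internally.

```python
import re


def pattern_to_regex(pattern):
    """Translate a sequence pattern into an equivalent Python regex string."""
    p = pattern.strip()
    if p.endswith("."):          # a period ends the pattern
        p = p[:-1]
    out = []
    i = 0
    while i < len(p):
        c = p[i]
        if c == "-":             # element separator, nothing to emit
            pass
        elif c == "<":           # anchored at the N-terminus
            out.append("^")
        elif c == ">":           # anchored at the C-terminus
            out.append("$")
        elif c in "xX":          # any residue
            out.append(".")
        elif c == "[":           # accepted residues
            j = p.index("]", i)
            out.append("[" + p[i + 1:j].upper() + "]")
            i = j
        elif c == "{":           # residues that are not accepted
            j = p.index("}", i)
            out.append("[^" + p[i + 1:j].upper() + "]")
            i = j
        elif c == "(":           # repeat count or range
            j = p.index(")", i)
            out.append("{" + p[i + 1:j] + "}")
            i = j
        elif c.isalpha():        # a single residue code
            out.append(c.upper())
        else:
            raise ValueError(f"unexpected character {c!r} in pattern")
        i += 1
    return "".join(out)


regex = pattern_to_regex("[DE](2)HS{P}X(2)PX(2,4)C.")
print(regex)
print(bool(re.search(regex, "DDHSAGGPAAC")))
```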

Default value. A default value is set using the default: global attribute.

Key attributes. Attributes provide validation, for example, to control the length (minlength: and maxlength: attributes) and case of the pattern, which can be restricted to upper case (upper: "Y") or lower case (lower: "Y") only. The type of pattern (nucleotide or protein) can be set using type:.

A.2.1.8. string

Simple string.

Most string values are free text, although strings can be used by a program for any input that is not covered by the other ACD datatypes.

Data value. The data value is any valid ASCII text string which should be enclosed in quotes. For example:

"This is a valid text string"

Default value. A default value is set using the default: global attribute.

Key attributes. Whenever a string datatype is defined, a type for it should be specified with the knowntype: attribute: a warning message will be generated during ACD processing otherwise.

ACD attributes are available to control the length or to provide a regular expression pattern to validate the string if necessary.

A.2.1.9. toggle

Simple boolean switch for controlling other parameters.

toggle parameters work in the same way as boolean parameters but are intended for use in turning the prompting for other parameters on or off (see Section 4.5, “Controlling the Prompt”). Typically this is done by using a calculation to determine the value of a standard: or additional: attribute of the data definition that is being controlled. In such cases the toggle parameters are used in the calculated values, and can be placed in the "Required" section of an ACD file even if not themselves defined as standard qualifiers (using the standard: attribute).

Data value. The data value has a "true" or "false" value which may be specified as follows:

"Y"
"yes"
"true"
"N"
"no"
"false"

Exactly like boolean parameters, the value will be "Y" if the flag for the parameter is entered on the command line (for example -ToggleOption). If the qualifier is absent from the command line the value will be the default value. The flag can also be prefixed by no (for example -noToggleOption) to force the value to be "N" (False). This is needed if the default value is "Y".

Default value. A default value is set using the default: global attribute.

Key attributes. None.

A.2.2. Description of Input ACD Datatypes

A.2.2.1. codon

Codon usage table file.

Codon usage table files are ASCII text files and can be read in several formats including GCG. Codon usage files are distributed in the EMBOSS data directory.

Data value. The data value is the name of a codon usage table file in the EMBOSS data search path (see the EMBOSS Users Guide).

Default value. EMBOSS uses the human codon usage table Ehum.cut provided in the EMBOSS distribution by default but this would typically be overridden by the user. Codon usage tables are species-specific and in some cases specific to a class of genes within a species, so it is useful to be able to set the codon usage table on an application-specific basis.

A default value is set using the default: global attribute.

Key attributes. None.

A.2.2.2. cpdb

Protein coordinate data in CCF (clean coordinate file) format.

CCF (clean coordinate file) format is a simple "clean" file format for protein and domain coordinate data. See the documentation for pdbparse, part of the EMBASSY domainatrix package, which generates CCF files from PDB file input.

Data value. The data value is the name of a CCF file.

Default value. A default value is set using the default: global attribute.

There is an internally-defined default value ("1azu") although it is not normally appropriate to use it.

Key attributes. None.

A.2.2.3. datafile

A formatted data file read from the standard EMBOSS data search path (see the EMBOSS Users Guide).

Many data files already have their own ACD datatype, for example, matrix, matrixf and codon. Other data files do not have or need their own ACD definition and datafile is used for these.

Data value. The data value is the name of a data file in the EMBOSS data search path (see the EMBOSS Users Guide).

Default value. A default value is set using the default: global attribute.

The default datafile name may also be defined by two ACD attributes, for the file base name (name:) and file extension (extension:).

Key attributes. Datafiles often have a hard-coded filename. The name: attribute can be used to override that name.

A.2.2.4. directory

A directory that can be used for input or output.

Data value. The data value is the name of any valid directory. For example:

"."
"/data"
"/data/sequences"

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide additional validation of user input.

A.2.2.5. dirlist

A list of file names that are read from a directory.

Data value. The data value is the name of any valid directory.

For example:

"."
"/data"
"/data/sequences"

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide additional validation of the user input.

The type of data in the files can be identified by specifying a value for the knowntype: attribute. This allows inputs to be matched to outputs where the knowntype: attribute is set, for example, for an outfile definition.

A.2.2.6. discretestates

Discrete states file.

discretestates was implemented for the phylipnew EMBASSY package. discretestates input is used by the phylip "discrete character" applications.

discretestates could be replaced by a simple input file in GUIs, with the user required to provide the correct data format.

Data value. The data value is the name of a phylip "discrete states" file.

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.

A.2.2.7. distances

Distance matrix.

distances is specific to the phylipnew EMBASSY package. distances input is used by the phylip "distance matrix" applications.

The distances datatype can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.

Data value. The data value is the name of a distance matrix file. The accepted file formats include all the formats read by phylip, with automatic interconversion.

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.

A.2.2.8. features

Sequence feature annotation in any known feature format.

Data value. The data value is the name of a features file. A features file contains sequence feature information. Several feature formats are supported (see the EMBOSS Users Guide).

Default value. A default value is set using the default: global attribute.

Key attributes. The type of features can be restricted by setting the type: attribute, for example, so that the program accepts only DNA features. The feature type must be one of protein or nucleotide. There is a default based on the type of an input sequence (where used), but a value should be specified so that the application can validate that the input is of the specified type. If no type is specified for input features and there is no sequence input from which to take a default type, then an error will be generated during ACD processing.

Features can also be read from an input sequence (sequence, seqall, seqset and seqsetall datatypes) and written alongside an output sequence (seqout, seqoutall and seqoutset datatypes) if their features: attribute is set.

A.2.2.9. filelist

A list of input files.

Data value. The data value is a list of file names separated by commas.

For example:

"../data/file1.dat, file2.dat"

filelist is equivalent to the infile datatype, but allows the user to specify one or more input files.
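Splitting such a data value into individual file names can be sketched in Python. The function name parse_filelist is hypothetical, and the sketch assumes that whitespace around each name is not significant.

```python
def parse_filelist(value):
    """Split a comma-separated data value into a list of file names."""
    return [name.strip() for name in value.split(",") if name.strip()]


print(parse_filelist("../data/file1.dat, file2.dat"))
```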

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide additional validation of the user input.

The type of data can be identified by specifying a value for the knowntype: attribute. This allows inputs to be matched to outputs where the knowntype: attribute is set, for example, for an outfile definition.

A.2.2.10. frequencies

Frequency value(s).

frequencies is specific to the phylipnew EMBASSY package and is used by the phylip "gene frequency and continuous character" applications.

The frequencies datatype can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.

Data value. The data value is the name of a frequencies file. The accepted file formats include all the formats read by phylip, with automatic interconversion.

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.

A.2.2.11. infile

General input file.

infile is used for files of data not catered for by some other ACD datatype. For example, an infile would not normally contain sequence data.

Data value. The data value is the name of an input file.

For example:

"data.in"
"/data/infile.1" 

Default value. A default value is set using the default: global attribute.

Key attributes. The type of data can be identified by specifying a value for the knowntype: attribute. This allows inputs to be matched to outputs where the knowntype: attribute is also set for the outfile definition. A directory containing the file can be specified, via an environment variable, by using directory:.

A.2.2.12. matrix

Comparison matrix file (integer values).

These are typically amino acid or nucleotide substitution matrices. The matrix files distributed with BLAST are distributed with EMBOSS in the EMBOSS data directory.

The matrix datatype defines integer matrices which are usually faster than floating point matrices. Floating point matrices (matrixf datatype) are available if needed, and an integer matrix file can of course also be read as floating point.

Typically where a comparison matrix is specified, gap penalties will also be required. These must be specified separately in one or more other data definitions.

Data value. The data value is the name of an integer comparison matrix file in the EMBOSS data search path (see the EMBOSS Users Guide).

Default value. A default value is set using the default: global attribute.

Key attributes. Attributes of the matrix datatype define characteristics and allow validation of matrices of integer numbers for biological data.

The matrix datatype has a protein: attribute to force selection of a nucleic acid or protein comparison matrix. In ACD files, the type of the input sequence is often used to set the type of matrix.

A.2.2.13. matrixf

Comparison matrix file (floating point values).

The matrixf datatype defines floating point matrices which are usually slower than integer matrices. An integer matrix file can of course also be read as floating point.

These are typically amino acid or nucleotide substitution matrices. The matrix files distributed with BLAST are distributed with EMBOSS in the EMBOSS data directory.

Typically where a comparison matrix is specified, gap penalties will also be required. These must be specified separately in one or more other data definitions.

Data value. The data value is the name of a floating point comparison matrix file in the EMBOSS data search path (see the EMBOSS Users Guide).

Default value. A default value is set using the default: global attribute.

Key attributes. Attributes of the matrixf datatype define characteristics and allow validation of matrices of floating point numbers for biological data. The matrixf datatype has a protein: attribute to force selection of a nucleic acid or protein comparison matrix. In ACD files, the type of the input sequence is often used to set the type of matrix.

A.2.2.14. properties

Property value(s).

properties is specific to the phylipnew EMBASSY package. Properties input is used by the phylip applications to define weights, ancestral states and factors (multi-state characters).

The properties datatype can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.

Data value. The data value is the name of a properties file. The accepted formats include all the formats read by phylip, with automatic interconversion.

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide detailed type checking and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.

A.2.2.15. scop

SCOP and CATH domain classification data in DCF (domain classification file) format.

DCF (domain classification file) format is a simple "clean" file format for domain classification data. See the documentation for domainer, part of the EMBASSY domainatrix package, which generates DCF files from SCOP and CATH file input.

Data value. The data value is the name of a DCF file.

Default value. A default value is set using the default: global attribute.

Domain classification file input has an internally-defined default value ("d3sdha") although it is not normally appropriate to use this default.

Key attributes. None.

A.2.2.16. sequence

A single sequence for reading.

Data value. The data value is the Uniform Sequence Address or USA (see the EMBOSS Users Guide) of a single sequence. For example, the USA might be a database reference or file.

Default value. A default value is set using the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute, for example, so that the program accepts only DNA sequences. The sequence type must be a standard sequence type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be read if the features: ACD attribute is set.

A.2.2.17. seqall

A set of single sequences that are addressed one after another.

Data value. The data value is the USA of a set of single sequences. For example, the USA (see the EMBOSS Users Guide) might specify a sequence database for sequential reading of entries.

Default value. A default value is set using the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute, for example, so that the program accepts only DNA sequences. The sequence type must be a standard sequence type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be read if the features: ACD attribute is set.

A.2.2.18. seqset

A set of single sequences that can be used all at the same time.

Data value. The data value is the USA (see the EMBOSS Users Guide) of a set of single sequences. For example, a set of sequences from a multiple alignment file, or sequences from a database.

Default value. A default value is set using the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute, for example, so that the program accepts only DNA sequences. The sequence type must be a standard sequence type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be read if the features: ACD attribute is set.

The aligned: attribute must be set: an error will be generated during ACD processing otherwise.

A.2.2.19. seqsetall

One or more sets of single sequences that can be used all at the same time.

Data value. The data value is the USA (see the EMBOSS Users Guide) of one or more sets of single sequences. For example, sets of sequences from two databases or two alignment files. The data value would typically be a "list file" (a file containing a list of USAs).

Default value. A default value is set using the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute, for example, so that the program accepts only DNA sequences. The sequence type must be a standard sequence type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be read if the features: ACD attribute is set.

The aligned: attribute must be set: an error will be generated during ACD processing otherwise.

A.2.2.20. tree

Phylogenetic tree.

The tree datatype is specific to the phylipnew package. Tree input is used by the phylip applications to define one or more phylogenetic trees.

The tree datatype can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.

The trees are currently parsed by phylip itself, but in the future native parsing methods might be implemented.

Data value. The data value is the name of a tree file. The formats accepted include all the formats read by phylip, with automatic interconversion.

Default value. A default value is set using the default: global attribute.

Key attributes. The attributes provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.

A.2.3. Description of Output ACD Datatypes

A.2.3.1. align

Output file for sequence alignments.

The data is stored as sequences and all of the common alignment formats are supported (see the EMBOSS Users Guide).

Data value. The data value is any valid file name.

Default value. An alignment filename with the format name.extension is constructed if the datatype-specific qualifiers -aname and -aextension are specified. Values may be hard-coded with the corresponding aname: and aextension: attributes.

A default value is also set by defining the default: global attribute.

Key attributes. An alignment output file is defined in the same way as a plain output file (outfile datatype) but has extra qualifiers to allow a choice of alignment formats and attributes to specify whether the alignment will have 2 or more sequences (which limits the possible formats).

The multiple: boolean attribute should be set to "Y" if the output can contain more than one alignment from the same input.

The output format is normally set at the command line but a default may be hard-coded with aformat:.

A.2.3.2. featout

Output file for sequence feature annotation.

The data is stored as a feature table and most common sequence feature formats are supported (see the EMBOSS Users Guide).

Data value. The data value is any valid file name.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes (which the -ofname and -offormat built-in command line qualifiers override; see the EMBOSS Users Guide). If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the first sequence that is read in. The ACD operation to get this value is $(asequence.name) if the sequence parameter was named asequence. The extension: attribute will default to the output feature format, e.g. .gff.
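The fallback order for the default filename can be sketched in Python. The function and parameter names are hypothetical, and "hsfau" is used only as an example sequence name; this shows the rule as described here rather than the EMBOSS implementation.

```python
def default_featout_filename(name=None, extension=None,
                             first_sequence_name=None, feature_format="gff"):
    """Build name.extension using the fallbacks described in the text.

    name                -- value of the name: attribute, if defined
    extension           -- value of the extension: attribute, if defined
    first_sequence_name -- name of the first sequence read in, if any
    feature_format      -- output feature format (GFF is the stated default)
    """
    base = name if name is not None else first_sequence_name
    if base is None:
        raise ValueError("no name: attribute and no input sequence name available")
    ext = extension if extension is not None else feature_format
    return f"{base}.{ext}"


print(default_featout_filename(first_sequence_name="hsfau"))
```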

A default value is also set by defining the default: global attribute.

Key attributes. Features can also be read from an input sequence (sequence, seqall, seqset and seqsetall datatypes) and written alongside an output sequence (seqout, seqoutall and seqoutset datatypes) if their features: attribute is set.

GFF format is used by default for the output feature(s). The format is normally set at the command line but a default may be hard-coded with offormat:.

The type of features can be restricted by setting the type: attribute, for example, so that the program accepts only DNA features. The feature type must be one of protein or nucleotide. There is a default based on the type of an input sequence (where used), but a value should be specified so that the application can validate that the input is of the specified type. If no type is specified for input features and there is no sequence input from which to take a default type, then an error will be generated during ACD processing.

A.2.3.3. outcodon

Output file for codon usage data.

Data value. The data value is any valid file name.

The data is stored as a codon usage table. Codon usage table files are ASCII text files and can be read in several formats (see the EMBOSS Users Guide).

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. The name: attribute will default to outfile. The extension: attribute will default to the format, with cut defined as the default format to match the usual codon usage file naming convention. This format is also called EMBOSS codon format.

A default value is also set by defining the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.4. outcpdb

Output file for protein coordinate data in CCF (clean coordinate file) format.

CCF (clean coordinate file) format is a simple "clean" file format for protein and domain coordinate data. See the documentation for pdbparse, part of the EMBASSY domainatrix package, which generates CCF files from PDB file input.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.5. outdata

Output file for data formatted cleanly as a table or list.

The output corresponding to multiple outdata definitions in an ACD file is appended to a single file. The individual ACD definitions allow the format of each file section to be defined.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.6. outdir

Output directory for writing of multiple output files.

Data value. The data value is the name of any valid directory.

For example:

"."
"/data"
"/data/sequences"

Default value. A default value is set using the default: global attribute.

Key attributes. The default file extension can be set with the extension: attribute.

A.2.3.7. outdiscrete

Output file for phylogenetics discrete characteristics data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.8. outdistance

Output file for phylogenetics distance matrix data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.9. outfile

General output file.

outfile is used for data not catered for by some other output ACD datatype. It is suitable for general application output in plain text. For example, the output file would not normally contain sequence data.

Data value. The data value is any valid file name.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. The extension: attribute will default to the program name, and is usually left as the default value.

A default value is also set by defining the default: global attribute.

Key attributes. The type of data can be identified by a knowntype: attribute and matched to a standard type of an infile data definition for use as input to another program. The standard EMBOSS known types are described elsewhere (Section 4.3, “Data Definition”).

A.2.3.10. outfileall

Multiple general output files.

outfileall is used for data not catered for by some other output ACD datatype. It is suitable for general application output in plain text. For example, the output files would not normally contain sequence data.

Data value. The data value is the base file name of the output files.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. The extension: attribute will default to the program name, and is usually left as the default value.

A default value is also set by defining the default: global attribute.

Key attributes. The type of data can be identified as a standard type (Section A.4, “Global Attributes”) using the knowntype: attribute. The type can be matched to that for an infile data definition for use as input to another program.

A.2.3.11. outfreq

Output file for phylogenetics character frequency data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.12. outmatrix

Output file for integer comparison matrix data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.13. outmatrixf

Output file for floating point comparison matrix data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.14. outproperties

Output file for phylogenetics property data.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.15. outscop

Output file for SCOP and CATH domain classification information in DCF (domain classification file) format.

DCF is a simple "clean" file format for domain classification data. See the documentation for domainer, part of the EMBASSY domainatrix package, which generates DCF files from SCOP and CATH input files.

Data value. The data value is any valid file name.

Default value. A default value is set using the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.16. outtree

Output file for phylogenetic tree data.

Data value. The data value is any valid file name.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. The extension: attribute will default to the output file format, and is usually left as the default value.

A default value is also set by defining the default: global attribute.

Key attributes. The default data format can be specified by an oformat: attribute which the -oformat associated qualifier can override.

A.2.3.17. report

Output file for sequence annotation.

Report data is stored internally as a feature table, so the supported formats (see the EMBOSS Users Guide) include the most common feature formats.

Data value. The data value is any valid file name.

Default value. A report filename with the format name.extension is constructed if the datatype-specific qualifiers -rname and -rextension are specified. Values may be hard-coded with the corresponding rname: and rextension: attributes.

A default value is also set by defining the default: global attribute.

Key attributes. A report file is defined in the same way as a plain output file (outfile) but has extra qualifiers to allow a choice of report formats.

rformat: specifies the report format to use, which must be one of the supported report formats (see the EMBOSS Users Guide).

multiple: is a boolean attribute which should be set to "Y" if the output can contain more than one report from the same input.

type: is defined as one of "protein" or "nucleotide" where the report format is one of the standard feature table formats (see the EMBOSS Users Guide).

taglist: defines the tag / value pairs from the internal feature table to be reported in the output.
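The following sketch shows how these attributes might be combined in one report definition. The qualifier name outfile is an assumption, and the rformat: value and the taglist: entry (written here as type:tag=label) follow a pattern seen in EMBOSS applications but should be checked against the EMBOSS Users Guide before use.

```acd
# Illustrative report definition; values are assumptions, not a reference
report: outfile [
  parameter: "Y"
  rformat: "motif"
  multiple: "Y"
  taglist: "int:pos=Max_score_pos"
]
```

The user could still choose a different report format at the command line; rformat: only sets the default.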

A.2.3.18. seqout

Output file for a single sequence.

Data value. The data value is a USA (see the EMBOSS Users Guide) of a sequence output stream. For example, the USA might be a database reference or file.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. If the name: attribute is not defined in the ACD file it will default to the calculated attribute name: of the first sequence that is read in. The ACD operation to get this value is $(asequence.name) if the sequence parameter was named asequence.

A default value is also set by defining the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute. The sequence type must be a standard type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be written if the features: ACD attribute is set.

FASTA format is used by default for the output sequence(s). The format is normally set at the command line but a default may be hard-coded with osformat:.
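A minimal sketch of a sequence input paired with a seqout definition is given below. The qualifier names asequence and outseq and the chosen type and format values are assumptions for illustration. Because name: is not set on outseq, the output name would default to $(asequence.name) as described above.

```acd
# Hypothetical input sequence definition
sequence: asequence [
  parameter: "Y"
  type: "nucleotide"
]

# Hypothetical output: restricted type, features written, EMBL as default format
seqout: outseq [
  parameter: "Y"
  type: "nucleotide"
  features: "Y"
  osformat: "embl"
]
```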

A.2.3.19. seqoutall

Output file for multiple sequences.

Data value. The data value is a USA (see the EMBOSS Users Guide) of a sequence output stream. For example, the USA might be a database reference or file.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. If the name: attribute is not defined in the ACD file it will default to the calculated attribute name: of the first sequence that is read in. The ACD operation to get this value is $(asequence.name) if the sequence parameter was named asequence.

A default value is also set by defining the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute. The sequence type must be a standard type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be written if the features: ACD attribute is set.

FASTA format is used by default for the output sequence(s). The format is normally set at the command line but a default may be hard-coded with osformat:.

A.2.3.20. seqoutset

Output file for a set of sequences.

Data value. The data value is a USA (see the EMBOSS Users Guide) of a sequence output stream for a set of sequences that are held in memory together and then written to file. For example, the USA might be a database reference or file.

Default value. The output filename has the format name.extension and is constructed from the name: and extension: attributes. If the name: attribute is not defined in the ACD file it will default to the calculated attribute name: of the first sequence that is read in. The ACD operation to get this value is $(asequence.name) if the sequence parameter was named asequence.

A default value is also set by defining the default: global attribute.

Key attributes. The type of sequence can be restricted by setting the type: attribute. The sequence type must be a standard type (Section A.7, “Sequence Types”).

Sequence features (Section 6.9, “Handling Features”) can be written if the features: ACD attribute is set.

FASTA format is used by default for the output sequence(s). The format is normally set at the command line but a default may be hard-coded with osformat:.

A.2.4. Description of Selection ACD Datatypes

A.2.4.1. list

A list of options (text descriptions) with text labels.

The user is presented with a limited list of options they can choose from. The choices can be labelled by any arbitrary text label. The option descriptions are usually more verbose than for the selection datatype.

Data value. The data value is one (or more) of the valid options.

An option is specified by its label or by an unambiguous part of the descriptive text that follows the label. If multiple selections are allowed, the user must supply a comma-separated list of labels (options).

For example, the following ACD definition:

list: frame  [
  standard: "Y"
  help: "Allows selection from a set of reading frames"
  default: "1"
  minimum: "1"
  maximum: "1"
  header: "Translation frames"
  values: "1:1, 2:2, 3:3, F:Forward three frames, -1:-1, -2:-2, -3:-3, R:Reverse three frames, 6:All six frames"
  delimiter: ","
  codedelimiter: ":"
  information: "Frame(s) to translate"
]

Would present to the user something like:

Translation frames

   1     1
   2     2
   3     3
   F     Forward three frames
  -1    -1
  -2    -2
  -3    -3
   R     Reverse three frames
   6     All six frames

Frame(s) to translate [1]:

To select from the list, the user specifies one (or sometimes more) labels, or partial text descriptions. The program is given a list of text labels as input. In this example, the minimum and maximum number of selections are set to one in the ACD definition, therefore only one selection value is allowed. For example, these are all valid selections:

"1"
"F"
"Forward"
"For"
"R"
"Reverse"
"Rev"

If the maximum count had been set to 3, say, then the following would be valid:

"-1,F,6"

Default value. A default value is set using the default: global attribute.

Key attributes. None.
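The multiple-selection case mentioned above can be sketched by changing only the maximum: value in the same definition. This is an illustrative variant of the frame example, not a definition taken from a real application.

```acd
# Variant of the frame example allowing up to three selections
list: frame [
  standard: "Y"
  default: "1"
  minimum: "1"
  maximum: "3"
  header: "Translation frames"
  values: "1:1, 2:2, 3:3, F:Forward three frames, -1:-1, -2:-2, -3:-3, R:Reverse three frames, 6:All six frames"
  delimiter: ","
  codedelimiter: ":"
  information: "Frame(s) to translate"
]
```

With this definition a reply such as "-1,F,6" would be accepted, since it names three labels.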

A.2.4.2. selection

A list of options (text descriptions) with automatically generated numerical labels.

The user is presented with a limited list of options they can choose from. The choices are numbered automatically from 1 up. The option descriptions are typically less verbose than for list definitions.

Data value. The data value is one (or more) of the valid options.

An option is specified by its number or by an unambiguous part of its descriptive text. If multiple selections are allowed, the user must supply a comma-separated list of numbers (options).

For example, the following ACD definition:

selection: reject [
    default: "3, 5, 6"
    minimum: "1"
    maximum: "6"
    values: "None, AAINDEX, CVS, CODONS, PRINTS, PROSITE, REBASE"
    delimiter: ","
    header: "Directories to ignore"
    information: "Select directories"
    help: "This specifies the names of the sub-directories of the
           EMBOSS data directory that should be ignored when displaying data
           directories."
    button: "Y"
  ]

Would present to the user something like:

Directories to ignore
1        None
2        AAINDEX
3        CVS
4        CODONS
5        PRINTS
6        PROSITE
7        REBASE

Select directories [3, 5, 6]:

To select from the list, the user specifies one (or sometimes more) numbers, or partial text descriptions. The program is given a list of text labels as input. In this example, a minimum of 1 and maximum of 6 selections are set in the ACD definition. Here are some valid selections:

"3,5,6"
"3"
"CVS"
"5"
"PRINTS"
"PRI"

Default value. A default value is set using the default: global attribute.

Key attributes. None.

The list datatype is preferred to the selection datatype.
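As a rough sketch of that preference, the reject example could be expressed as a list by giving each option an explicit label. The numeric labels here simply mirror the automatic numbering and are an assumption; any text label could be used instead.

```acd
# Hypothetical list equivalent of the selection example above
list: reject [
  default: "3,5,6"
  minimum: "1"
  maximum: "6"
  values: "1:None, 2:AAINDEX, 3:CVS, 4:CODONS, 5:PRINTS, 6:PROSITE, 7:REBASE"
  delimiter: ","
  codedelimiter: ":"
  header: "Directories to ignore"
  information: "Select directories"
]
```

Explicit labels stay stable if options are later added or reordered, which automatic numbering does not guarantee.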

A.2.5. Description of Graphics ACD Datatypes

A.2.5.1. graph

Graphical output of any general kind.

Dotplots may be generated with the graph datatype.

Data value. The data value is the graphics device, as specified by the PLplot graphics library used in EMBOSS at present.

The currently supported devices include:

  • ps (PostScript)

  • png (PNG files)

  • X11 (X-Windows)

A value of ? in answer to the prompt will list the available graphics devices on your installation. Some permissible values therefore are:

"ps"
"png"
"X11"
"?"

Default value. A default value is set using the default: global attribute.

Key attributes. gtitle: specifies the graph title (many other graphical elements can be set).
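A minimal sketch of a graph definition that sets a title is shown below. The qualifier name graph and the title text are illustrative assumptions.

```acd
# Hypothetical graph definition with a fixed title
graph: graph [
  standard: "Y"
  gtitle: "Dotplot of two sequences"
]
```

At the prompt the user would then answer with a device name such as "png", or "?" to list the devices available.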

A.2.5.2. graphxy

Graphical output as a simple two dimensional (2D) XY plot with the sequence along the x-axis.

Data value. The data value is the graphics device, as specified by the PLplot graphics library used in EMBOSS at present.

The currently supported devices include ps for PostScript, png for PNG files, and X11 for X-Windows. A value of ? in answer to the prompt will list the available graphics devices on your installation. Some permissible values therefore are:

"ps"
"png"
"X11"
"?"

Default value. A default value is set using the default: global attribute.

Key attributes. multiple: specifies the number of XY graphs to be drawn in a single output.

gtitle: specifies the graph title (many other graphical elements can be set).