This section covers ACD file processing for the "simple" ACD datatypes (Section A.2.1, “Description of Simple ACD Datatypes”):
integer
float
boolean
toggle
string
array
range
Values are retrieved by calls to ajAcdGet* functions, which return an AJAX datatype of the appropriate type: a fundamental type is returned for integer, float, boolean and toggle, and an EMBOSS object for the other types. Functions for handling ranges are also covered and include:
Get and set elements of the range object
Query the properties of a range object
Process a string (AjPStr) and sequence (AjPSeq) according to the specification in a range object
For regular expressions (regexp ACD datatype) and sequence patterns (pattern ACD datatype) see Section 6.6, “Handling Sequence Patterns”. Array handling (Section 6.17, “Handling Arrays”) and string handling (Section 6.5, “Handling Strings”) are described in depth elsewhere.
AJAX library files for handling "Simple" ACD datatypes are listed in the table (Table 6.3, “AJAX Library Files for Handling "Simple" ACD Datatypes”). Library file documentation, including a complete description of datatypes and functions, is available at:
http://emboss.open-bio.org/rel/dev/libs/
Library File Documentation | Description |
---|---|
ajrange | Handling of AJAX range expressions |
ajstr | String handling |
ajarr | Array handling |
ajrange.h/c. Define the range specification object (AjPRange) and contain functions for handling range specifications (see Section A.2.1.5, “range”).
ajstr.h/c. Define the string object (AjPStr) used for handling strings from the ACD file. They contain most of the functions you will ever need for general string handling (Section 6.5, “Handling Strings”).
ajarr.h/c. Most of the functions you will ever need for general array handling (Section 6.17, “Handling Arrays”). They define the AjPFloat object used for handling arrays from an ACD file. They also contain static data structures and functions for handling arrays at a low level.
The "Simple" ACD datatypes are used for application input. Typical ACD definitions for "Simple" ACD datatype inputs are shown below.
For integer number input:
    integer: wordsize [
      default: "4"
      minimum: "2"
      maximum: "20"
      information: "Word size"
    ]
For floating point number input:
    float: minscore [
      default: "0.0"
      minimum: "0.0"
      information: "Minimum score of feature to display"
    ]
For boolean input:
    boolean: feature [
      default: "N"
      information: "Use feature information"
    ]
For toggle input:
    toggle: tolower [
      default: "N"
      information: "Change masked region to lower case"
    ]
For string input:
    string: delimiter [
      default: "|"
      information: "Delimiter of records in text output file"
      knowntype: "output delimiter"
    ]
For array input:
    array: thresholds [
      information: "Values to represent 'identical', 'similar' and 'related'"
      default: "-1.5,0.0,1.5"
      minimum: "0.0"
      size: "3"
      sum: "0"
      sumtest: "Y"
    ]
For range input:
    range: regions [
      information: "Regions to put in uppercase (eg: 4-57,78-94)"
      default: ""
      help: "Regions to put in uppercase. If this is left blank, the sequence case is left alone. A set of regions is specified by a set of pairs of integer positions separated by any non-digit, non-alpha character. For example: \
             24-45, 56-78 \
             1:45, 67=99;765..888 \
             1,5,8,10,23,45,57,99"
    ]
A standard parameter name might be available depending on the specific use-case of the data definition; for example gap penalty for any float input that defines a gap penalty. See Appendix A, ACD Syntax Reference.
Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (Section A.4, “Global Attributes”).
default:
A global attribute that specifies a default value.
minimum:
Specifies the minimum permitted value.
maximum:
Specifies the maximum permitted value.
information:
A global attribute that specifies the user prompt and is also used in the application documentation.
knowntype:
This global attribute should always be specified for string inputs. If the output is not one of the standard EMBOSS known types then ApplicationName output is the recommended value.
size:
Specifies the permissible number of elements in an array data definition.
sum:
Specifies the total of all values in an array data definition and is tested for unless the sumtest: attribute is false.
sumtest:
A boolean attribute which, if set to false, turns off testing for the sum: attribute for an array data definition.
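The interplay of size:, sum: and sumtest: can be illustrated with a minimal plain C sketch. It does not use the AJAX library and does not reproduce the ACD parser; the function name array_check and the tolerance parameter are assumptions introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check, not part of AJAX. Returns 1 if the 'n' values satisfy
 * the constraints and 0 otherwise.
 *   size      : required number of elements
 *   sum       : required total of all values
 *   sumtest   : if 0, the total is not tested
 *   tolerance : allowed absolute deviation from 'sum' (an assumption here) */
int array_check(const float *values, size_t n, size_t size,
                double sum, int sumtest, double tolerance)
{
    double total = 0.0;
    double diff;
    size_t i;

    assert(values != NULL && tolerance >= 0.0);

    if (n != size)
        return 0;          /* wrong number of elements */
    if (!sumtest)
        return 1;          /* sum testing switched off */

    for (i = 0; i < n; i++)
        total += values[i];

    diff = (total > sum) ? total - sum : sum - total;
    return diff <= tolerance;
}
```

With size 3 and sum 0, the values -1.5, 0.0, 1.5 pass, whereas 1, 2, 3 pass only when sumtest is switched off.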
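For handling "Simple" ACD datatypes defined in the ACD file use the primitive types:
ajint
Integer (for integer ACD datatype).
float
Floating point number (for float ACD datatype).
AjBool
Boolean (for boolean and toggle ACD datatypes).
Otherwise use an AJAX object: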
AjPStr
String (for string ACD datatype).
AjPFloat
Array of floating point numbers (for array ACD datatype).
AjPRange
AJAX sequence range specification (for range ACD datatype). See Section A.2.1.5, “range”.
Datatypes and functions for handling "Simple" ACD datatypes via the ACD file are shown below (Table 6.4, “Datatypes and Functions for "Simple" ACD Datatype Input”).
ACD datatype | AJAX datatype | To retrieve from ACD |
---|---|---|
integer | ajint | ajAcdGetInt |
float | float | ajAcdGetFloat |
boolean | AjBool | ajAcdGetBoolean |
toggle | AjBool | ajAcdGetToggle |
string | AjPStr | ajAcdGetString |
array | AjPFloat | ajAcdGetArray |
range | AjPRange | ajAcdGetRange |
Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions which return pointers to appropriate objects.
You wouldn't normally retrieve a toggle from ACD as they're intended for use within the ACD file only, usually to control the prompting for another parameter (see Section 4.5, “Controlling the Prompt”).
To retrieve data from the ACD file a simple variable or object pointer is declared as required, and then initialised using the appropriate ajAcdGet* function.
Functions to set range object properties are:
    /* Set the start and end values of a range element. */
    AjBool ajRangeElementSet (AjPRange thys, ajuint element, ajuint start, ajuint end);

    /* Sets range values offset relative to a sequence -sbegin value */
    AjBool ajRangeSetOffset (AjPRange thys, ajuint begin);

ajRangeSetOffset will set the range values relative to a specified position (begin), usually the start position of a range of positions in a sequence as specified on the command line with -sbegin or in the USA of a sequence (see the EMBOSS Users Guide). If, for example, begin is 11 and the range is 11-12 the new range is changed to 1-2.
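The arithmetic in this example can be sketched in plain C. This is an illustration only and does not call the AJAX library: range_offset is a hypothetical helper, and the formula pos - begin + 1 is inferred from the 11-12 to 1-2 example rather than taken from the ajRangeSetOffset source.

```c
#include <assert.h>

/* Hypothetical helper, not part of AJAX: shift a 1-based position so that
 * 'begin' becomes position 1. Assumes pos >= begin >= 1. */
unsigned int range_offset(unsigned int pos, unsigned int begin)
{
    assert(begin >= 1 && pos >= begin);
    return pos - begin + 1;
}
```

Applied to both ends of the range 11-12 with begin set to 11, the helper gives 1 and 2.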
To use a string, array or range object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default constructor functions are:
    AjPStr   ajStrNew    (void);     /* String object. */
    AjPFloat ajFloatNew  (void);     /* Float array    */
    AjPRange ajRangeNewI (ajuint n); /* Range object   */

ajRangeNewI takes an integer (n) which is the number of ranges the object can hold.
All constructors return the address of a new object. The pointers do not need to be initialised to NULL but it is good practice to do so:

    AjPStr   delimiter  = NULL;
    AjPFloat thresholds = NULL;
    AjPRange regions    = NULL;

    delimiter  = ajStrNew();
    thresholds = ajFloatNew();
    /* Object with a single range */
    regions    = ajRangeNewI(1);

    ...
    /* Do something with objects */

    ajStrDel(&delimiter);
    ajFloatDel(&thresholds);
    ajRangeDel(&regions);
You must free the memory for objects once you are finished with them. The destructor functions are:
    void ajStrDel   (AjPStr *Pstr);     /* String object. */
    void ajFloatDel (AjPFloat* Parr);   /* Float array    */
    void ajRangeDel (AjPRange *Prange); /* Range object   */
They are used as follows:
    AjPStr   delimiter  = NULL;
    AjPFloat thresholds = NULL;
    AjPRange regions    = NULL;

    delimiter  = ajAcdGetString("delimiter");
    thresholds = ajAcdGetArray("thresholds");
    regions    = ajAcdGetRange("regions");

    ...
    /* Do something with objects */

    ajStrDel(&delimiter);
    ajFloatDel(&thresholds);
    ajRangeDel(&regions);
There are a variety of alternative constructor functions for the AjPRange object:

    /* Copy a range object. */
    AjPRange ajRangeNewRange (const AjPRange src);

    /* Construct from a string. */
    AjPRange ajRangeNewString (const AjPStr str);

    /* Construct from a string with explicit specification. */
    AjPRange ajRangeNewStringLimits (const AjPStr str, ajuint imin, ajuint imax,
                                     ajuint minsize, ajuint size);

    /* Construct from a file. */
    AjPRange ajRangeNewFilename (const AjPStr name);

    /* Construct from a file with explicit specification. */
    AjPRange ajRangeNewFilenameLimits (const AjPStr name, ajuint imin, ajuint imax,
                                       ajuint minsize, ajuint size);

ajRangeNewStringLimits and ajRangeNewFilenameLimits both construct a range object with specified limits: minimum value (imin), maximum value (imax), minimum number of ranges (minsize) and the required number of ranges (size). A value of zero for size indicates that there is no required number.
ajRangeNewFilename and ajRangeNewFilenameLimits construct an object from a "range file" (see Section A.2.1.5, “range”).
For alternative constructor functions for the AjPStr object see Section 6.5, “Handling Strings”.
The following functions retrieve elements from a range object:
    /* Get the number of ranges */
    ajuint ajRangeGetSize (const AjPRange thys);

    /* Get text value of a range */
    AjBool ajRangeElementGetText (const AjPRange thys, ajuint element, AjPStr *text);

    /* Get start and end values */
    AjBool ajRangeElementGetValues (const AjPRange thys, ajuint element,
                                    ajuint *start, ajuint *end);

ajRangeElementGetText will retrieve text from the specified range element (element). The text is defined as any non-digit characters after the pair of range numbers. For example, for the pair of ranges 10-20 potential exon 50-60 repeat the text values are: "potential exon" and "repeat". The address of the string object (text) to hold the text is passed.
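How the numbers and text of such a specification relate can be sketched with a small stand-alone parser in plain C. This is not the AJAX parser: the Element type and parse_ranges function are hypothetical, and the rule used to trim the text (drop leading and trailing non-alphabetic characters) is an assumption chosen so that separators such as commas are not reported as text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type: one start/end pair plus its trailing text. */
typedef struct {
    unsigned long start;
    unsigned long end;
    char text[64];
} Element;

static const char *skip_non_digits(const char *p)
{
    while (*p && !isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Parse up to 'max' elements from 'spec'; returns the number parsed. */
size_t parse_ranges(const char *spec, Element *out, size_t max)
{
    const char *p = spec;
    size_t n = 0;

    assert(spec != NULL && out != NULL);

    while (n < max) {
        const char *t, *e;
        char *next;
        size_t len;

        p = skip_non_digits(p);
        if (!*p)
            break;
        out[n].start = strtoul(p, &next, 10);

        p = skip_non_digits(next);          /* skip the separator */
        if (!*p)
            break;                          /* start without an end */
        out[n].end = strtoul(p, &next, 10);

        t = next;                           /* text runs to the next digit */
        p = skip_non_digits(next);
        e = p;
        while (t < e && !isalpha((unsigned char)*t))
            t++;
        while (e > t && !isalpha((unsigned char)e[-1]))
            e--;

        len = (size_t)(e - t);
        if (len > sizeof out[n].text - 1)
            len = sizeof out[n].text - 1;   /* truncate over-long text */
        memcpy(out[n].text, t, len);
        out[n].text[len] = '\0';
        n++;
    }
    return n;
}
```

For the specification in the example, the sketch yields two elements whose text fields are "potential exon" and "repeat".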
Functions for querying the properties of a range object include:
    /* Tests if the set of ranges are in ascending non-overlapping order */
    AjBool ajRangeIsOrdered (const AjPRange thys);

    /* Counts the range elements that overlap with a region (of a sequence). */
    ajuint ajRangeCountOverlaps (const AjPRange thys, ajuint pos, ajuint length);

    /* Tests for a single range from the start to end of a sequence. */
    AjBool ajRangeIsWhole (const AjPRange thys, const AjPSeq seq);

ajRangeCountOverlaps returns the number of ranges in a range object which overlap with a sequence region defined by a start position (pos) and a length (length).
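The overlap test can be sketched in plain C as follows. This is a minimal illustration, not the AJAX implementation: the Region type and count_overlaps function are hypothetical, and the sketch assumes 1-based inclusive positions with the region covering pos to pos + length - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range element: 1-based, inclusive start and end. */
typedef struct {
    unsigned int start;
    unsigned int end;
} Region;

/* Count the regions that share at least one position with the window
 * beginning at 'pos' and spanning 'length' positions. */
size_t count_overlaps(const Region *regions, size_t n,
                      unsigned int pos, unsigned int length)
{
    size_t count = 0;
    size_t i;
    unsigned int last;

    assert(regions != NULL || n == 0);

    if (length == 0)
        return 0;                  /* an empty window overlaps nothing */

    last = pos + length - 1;       /* last position inside the window */
    for (i = 0; i < n; i++) {
        if (regions[i].start <= last && regions[i].end >= pos)
            count++;
    }
    return count;
}
```

For the ranges 4-57 and 78-94, a window starting at 50 with length 30 overlaps both ranges, while a window starting at 60 with length 10 overlaps neither.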
ajRangeIsWhole tests whether the range object contains a single range from the start to end of the given sequence (seq).
These functions process a sequence object (AjPSeq) according to the specification in a range object:

    /* Remove all subsequences not corresponding to ranges */
    AjBool ajRangeSeqExtract (const AjPRange thys, AjPSeq seq);

    /* Store retained text as a list of strings. */
    AjBool ajRangeSeqExtractList (const AjPRange thys, const AjPSeq seq,
                                  AjPList outliststr);

    /* Insert spaces into sequence to pad out to the ranges. */
    AjBool ajRangeSeqStuff (const AjPRange thys, AjPSeq seq);

    /* Mask ranges of positions in a sequence. */
    AjBool ajRangeSeqMask (const AjPRange thys, AjPSeq seq, const AjPStr maskchar);

    /* Convert the ranges of characters in a sequence to lower-case. */
    AjBool ajRangeSeqToLower (const AjPRange thys, AjPSeq seq);

ajRangeSeqExtract retains regions in a sequence corresponding to the ranges: regions not in a range are removed. A sequence processed by ajRangeSeqExtract will comprise regions from the original sequence concatenated in the order specified in the set of ranges. If these are not in ascending order then the resulting sequence won't be in position order either.
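The effect of range order on the extracted sequence can be seen in a plain C sketch that works on an ordinary character string. It does not use AjPSeq or AjPRange; the Region type and extract_regions function are hypothetical, and positions are assumed to be 1-based and inclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical range element: 1-based, inclusive start and end. */
typedef struct {
    unsigned int start;
    unsigned int end;
} Region;

/* Concatenate the parts of 'seq' covered by each region, in the order the
 * regions are given. 'out' must be large enough for the result plus the
 * terminating NUL. Returns the number of characters written. */
size_t extract_regions(const char *seq, const Region *regions, size_t n,
                       char *out)
{
    size_t len = strlen(seq);
    size_t written = 0;
    size_t i, p;

    for (i = 0; i < n; i++) {
        assert(regions[i].start >= 1);
        /* Positions beyond the end of the sequence are ignored. */
        for (p = regions[i].start; p <= regions[i].end && p <= len; p++)
            out[written++] = seq[p - 1];
    }
    out[written] = '\0';
    return written;
}
```

With the sequence "abcdefghij", the ranges 2-3,7-9 give "bcghi", whereas the same ranges listed as 7-9,2-3 give "ghibc".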
ajRangeSeqExtractList is the same as ajRangeSeqExtract except that the retained subsequences are written to a list. The order of the list is the same as that specified in the set of ranges. If these are not in ascending order then the resulting list of strings won't be either.
ajRangeSeqStuff takes a string and an ordered, non-overlapping set of ranges and writes a string padded with whitespace such that a space is given for all positions not within a range. For example, for the string "abcde" and ranges 3-5,7-8 the string generated will be "  abc de".
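The padding behaviour in this example can be reproduced with a short plain C sketch. It is an illustration under stated assumptions, not the AJAX implementation: the Region type and stuff_string function are hypothetical, positions are 1-based and inclusive, and the ranges must be ordered and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical range element: 1-based, inclusive start and end. */
typedef struct {
    unsigned int start;
    unsigned int end;
} Region;

/* Place successive characters of 'in' at the positions covered by the
 * regions and write a space at every other position. 'out' must have room
 * for the last region end plus the terminating NUL. Returns the number of
 * characters written. */
size_t stuff_string(const char *in, const Region *regions, size_t n,
                    char *out)
{
    size_t inlen = strlen(in);
    size_t used = 0;   /* characters consumed from 'in' */
    size_t pos = 0;    /* characters written to 'out' */
    size_t i;

    for (i = 0; i < n; i++) {
        assert(regions[i].start >= 1 && regions[i].start <= regions[i].end);
        assert(regions[i].start > pos);       /* ordered, non-overlapping */

        while (pos + 1 < regions[i].start)    /* pad the gap with spaces */
            out[pos++] = ' ';
        while (pos < regions[i].end && used < inlen)
            out[pos++] = in[used++];          /* fill the region */
    }
    out[pos] = '\0';
    return pos;
}
```

For "abcde" and the ranges 3-5,7-8 the sketch writes two spaces, "abc", one space and "de".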
ajRangeSeqMask will mask the ranges of positions in a sequence, replacing all characters within range with the mask character (maskchar).
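Masking can be sketched in plain C on an ordinary character buffer. The Region type and mask_regions function are hypothetical and not part of AJAX; unlike ajRangeSeqMask, which takes the mask character as an AjPStr, this sketch takes a single char, and it assumes 1-based inclusive positions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical range element: 1-based, inclusive start and end. */
typedef struct {
    unsigned int start;
    unsigned int end;
} Region;

/* Overwrite every position of 'seq' that lies inside a region with
 * 'maskchar'. Parts of a region beyond the end of 'seq' are ignored. */
void mask_regions(char *seq, const Region *regions, size_t n, char maskchar)
{
    size_t len = strlen(seq);
    size_t i, p;

    for (i = 0; i < n; i++) {
        assert(regions[i].start >= 1);
        for (p = regions[i].start; p <= regions[i].end && p <= len; p++)
            seq[p - 1] = maskchar;
    }
}
```

Masking "ACGTACGT" over the ranges 2-3,6-9 with the character N gives "ANNTANNN".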
A set of functions equivalent to the sequence manipulation functions are provided for strings:
    AjBool ajRangeStrExtract     (const AjPRange thys, const AjPStr instr, AjPStr *outstr);
    AjBool ajRangeStrExtractList (const AjPRange thys, const AjPStr instr, AjPList outliststr);
    AjBool ajRangeStrStuff       (const AjPRange thys, const AjPStr instr, AjPStr *outstr);
    AjBool ajRangeStrMask        (const AjPRange thys, AjPStr *str, const AjPStr maskchar);
    AjBool ajRangeStrToLower     (const AjPRange thys, AjPStr *str);

Their functions are identical to their sequence counterparts except that a string (AjPStr) rather than a sequence (AjPSeq) is taken. Also, ajRangeStrStuff has individual arguments for the input and output strings.