6.4. Handling "Simple" ACD Datatypes

6.4.1. Introduction

This section covers ACD file processing for the "simple" ACD datatypes (Section A.2.1, “Description of Simple ACD Datatypes”):

  • integer

  • float

  • boolean

  • toggle

  • string

  • array

  • range

Values are retrieved by calls to ajAcdGet* functions which return an AJAX datatype of the appropriate type: a fundamental type is returned for integer, float, boolean and toggle and an EMBOSS object for the other types. Functions for handling ranges are also covered, including functions to:

  • Get and set elements of the range object

  • Query the properties of a range object

  • Process a string (AjPStr) and sequence (AjPSeq) according to the specification in a range object

For regular expressions (regexp ACD datatype) and sequence patterns (pattern ACD datatype) see Section 6.6, “Handling Sequence Patterns”. Array handling (Section 6.17, “Handling Arrays”) and string handling (Section 6.5, “Handling Strings”) are described in depth elsewhere.

6.4.2. AJAX Library Files

AJAX library files for handling "Simple" ACD datatypes are listed in the table (Table 6.3, “AJAX Library Files for Handling "Simple" ACD Datatypes”). Library file documentation, including a complete description of datatypes and functions, is available at:

http://emboss.open-bio.org/rel/dev/libs/

Table 6.3. AJAX Library Files for Handling "Simple" ACD Datatypes

Library File Documentation    Description
ajrange                       Handling of AJAX range expressions
ajstr                         String handling
ajarr                         Array handling

ajrange.h/c

Define the range specification object (AjPRange) and contain functions for handling range specifications (see Section A.2.1.5, “range”).

ajstr.h/c

Define the string object (AjPStr) used for handling strings from the ACD file, and contain most of the functions you will ever need for general string handling (Section 6.5, “Handling Strings”).

ajarr.h/c

Define the AjPFloat object used for handling arrays from an ACD file, and contain most of the functions you will ever need for general array handling (Section 6.17, “Handling Arrays”). They also contain static data structures and functions for handling arrays at a low level.

6.4.3. ACD Datatypes

The "Simple" ACD datatypes are used for application input:

integer

Simple integer number.

float

Simple floating point number.

boolean

Simple boolean value for boolean ACD datatype.

toggle

Simple boolean value for toggle ACD datatype.

string

Simple string.

array

List of either integer or floating point numbers.

range

Range of sequence positions.

6.4.4. ACD Data Definition

Typical ACD definitions for "Simple" ACD datatype inputs are shown below.

6.4.4.1. integer

For integer number input:

integer: wordsize 
[
    default: "4"
    minimum: "2"
    maximum: "20"
    information: "Word size"
]

6.4.4.2. float

For floating point number input:

float: minscore 
[
    default: "0.0"
    minimum: "0.0"
    information: "Minimum score of feature to display"
]

6.4.4.3. boolean

For boolean input:

boolean: feature 
[
    default: "N"
    information: "Use feature information"
]

6.4.4.4. toggle

For toggle input:

toggle: tolower 
[
    default: "N"
    information: "Change masked region to lower case"
]

6.4.4.5. string

For string input:

string: delimiter 
[
    default: "|"
    information: "Delimiter of records in text output file"
    knowntype: "output delimiter"
]

6.4.4.6. array

For array input:

array: thresholds
[
    information: "Values to represent 'identical',  'similar' and 'related'"
    default: "-1.5,0.0,1.5"
    minimum: "0.0"
    size: "3"
    sum: "0"
    sumtest: "Y"
]

6.4.4.7. range

For range input:

range: regions 
[
    information: "Regions to put in uppercase (eg: 4-57,78-94)"
    default: ""
    help: "Regions to put in uppercase. If this is left blank, the sequence case is left alone. A set of regions is specified by a set of pairs of integer positions separated by any non-digit, non-alpha character. For example: \
           24-45, 56-78 \
           1:45, 67=99;765..888 \
           1,5,8,10,23,45,57,99"
]

6.4.4.8. Parameter Name

A standard parameter name might be available depending on the specific use-case of the data definition; for example gap penalty for any float input that defines a gap penalty. See Appendix A, ACD Syntax Reference.

6.4.4.9. Common Attributes

Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (Section A.4, “Global Attributes”).

default: A global attribute that specifies a default value.

minimum: Specifies the minimum permitted value.

maximum: Specifies the maximum permitted value.

information: A global attribute that specifies the user prompt and is also used in the application documentation.

knowntype: This global attribute should always be specified for string inputs. If the output is not of any of the standard EMBOSS known types then ApplicationName output is the recommended value.

size: Specifies the permissible number of elements in an array data definition.

sum: Specifies the required total of all values in an array data definition. The total is tested unless the sumtest: attribute is false.

sumtest: A boolean attribute which, if set to false, turns off testing for the sum: attribute for an array data definition.

6.4.5. AJAX Datatypes

For handling "Simple" ACD datatypes defined in the ACD file use the primitive types:

ajint

Simple integer number (for integer ACD datatype).

float

Simple floating point number (for float ACD datatype).

AjBool

Simple boolean value (for boolean and toggle ACD datatypes).

Otherwise use an AJAX object:

AjPStr

String (for string ACD datatype).

AjPFloat

Array of floating point numbers (for array ACD datatype).

AjPRange

AJAX sequence range specification (for range ACD datatype). See Section A.2.1.5, “range”.

6.4.6. ACD File Handling

Datatypes and functions for handling "Simple" ACD datatypes via the ACD file are shown below (Table 6.4, “Datatypes and Functions for "Simple" ACD Datatype Input”).

Table 6.4. Datatypes and Functions for "Simple" ACD Datatype Input

ACD datatype    AJAX datatype    To retrieve from ACD
integer         ajint            ajAcdGetInt
float           float            ajAcdGetFloat
boolean         AjBool           ajAcdGetBoolean
toggle          AjBool           ajAcdGetToggle
string          AjPStr           ajAcdGetString
array           AjPFloat         ajAcdGetArray
range           AjPRange         ajAcdGetRange

Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions which return pointers to appropriate objects.

You wouldn't normally retrieve a toggle from ACD, as toggles are intended for use within the ACD file only, usually to control the prompting for another parameter (see Section 4.5, “Controlling the Prompt”).

6.4.6.1. Retrieval of "Simple" ACD Datatypes

To retrieve data from the ACD file a simple variable or object pointer is declared as required, and then initialised using the appropriate ajAcdGet* function.

6.4.6.1.1. integer

    ajint wordsize = 0;

    wordsize = ajAcdGetInt("wordsize");

6.4.6.1.2. float

    float minscore = 0.;

    minscore = ajAcdGetFloat("minscore");

6.4.6.1.3. boolean

    AjBool showall = ajFalse;

    showall = ajAcdGetBoolean("showall");

6.4.6.1.4. toggle

    AjBool tolower = ajFalse;

    tolower = ajAcdGetToggle("tolower");

6.4.6.1.5. string

    AjPStr delimiter = NULL;

    delimiter = ajAcdGetString("delimiter");

6.4.6.1.6. array

    AjPFloat thresholds = NULL;

    thresholds = ajAcdGetArray("thresholds");

6.4.6.1.7. range

    AjPRange regions = NULL;

    regions = ajAcdGetRange("regions");

6.4.6.2. Processing Command line Options and ACD Attribute

6.4.6.2.1. Setting Range Object Properties

Functions to set range object properties are:

/* Set the start and end values of a range element. */
AjBool  ajRangeElementSet (AjPRange thys, ajuint element, ajuint start, ajuint end);       

/* Sets range values offset relative to a sequence -sbegin value */
AjBool  ajRangeSetOffset (AjPRange thys, ajuint begin); 

ajRangeSetOffset will set the range values relative to a specified position (begin), usually the start position of a range of positions in a sequence as specified on the command line with -sbegin or in the USA of a sequence (see the EMBOSS Users Guide). If, for example, begin is 11 and the range is 11-12 the new range is changed to 1-2.
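
The offset arithmetic described above can be sketched in plain C. This is a minimal illustration and not AJAX code: sketch_set_offset and its parallel starts/ends arrays are hypothetical stand-ins for a range object, and the treatment of ranges that lie before begin is an assumption (the sketch simply asserts that there are none).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: shift every range so that position `begin`
 * becomes position 1. Positions are 1-based. */
void sketch_set_offset(unsigned int *starts, unsigned int *ends,
                       size_t n, unsigned int begin)
{
    size_t i;

    assert(begin >= 1);

    for (i = 0; i < n; i++) {
        /* Assumption: no range starts before `begin`. */
        assert(starts[i] >= begin);
        starts[i] -= begin - 1;
        ends[i]   -= begin - 1;
    }
}
```

With begin set to 11, a range of 11-12 becomes 1-2, matching the example in the text.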

6.4.6.3. Memory Management

It is your responsibility to free memory at the end of the program. You must call the default destructor function (see below) on any AjPStr, AjPFloat or AjPRange objects returned by calls to ajAcdGet*. This is not necessary, of course, for the primitive datatypes.

6.4.7. Object Memory Management

6.4.7.1. Default Object Construction

To use a string, array or range object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default constructor functions are:

AjPStr  ajStrNew (void);         /* String object. */
AjPFloat  ajFloatNew (void);       /* Float array */
AjPRange  ajRangeNewI (ajuint n);  /* Range object   */

ajRangeNewI takes an integer (n) which is the number of ranges the object can hold.

All constructors return the address of a new object. The pointers do not need to be initialised to NULL but it is good practice to do so:

    AjPStr   delimiter  = NULL;
    AjPFloat thresholds = NULL;
    AjPRange regions    = NULL;

    delimiter  = ajStrNew();
    thresholds = ajFloatNew();

    /* Object with a single range */
    regions    = ajRangeNewI(1);  

    ... /* Do something with objects */

    ajStrDel(&delimiter);
    ajFloatDel(&thresholds);
    ajRangeDel(&regions);

6.4.7.2. Default Object Destruction

You must free the memory for objects once you are finished with them. The destructor functions are:

void  ajStrDel (AjPStr *Pstr);       /* String object  */
void  ajFloatDel (AjPFloat *Parr);   /* Float array    */
void  ajRangeDel (AjPRange *Prange); /* Range object   */

They are used as follows:

    AjPStr   delimiter  = NULL;
    AjPFloat thresholds = NULL;
    AjPRange regions    = NULL;

    delimiter  = ajAcdGetString("delimiter");
    thresholds = ajAcdGetArray("thresholds");
    regions    = ajAcdGetRange("regions");

    ... /* Do something with objects */

    ajStrDel(&delimiter);
    ajFloatDel(&thresholds);
    ajRangeDel(&regions);

6.4.7.3. Alternative Object Construction and Loading

There are a variety of alternative constructor functions for the AjPRange object:

/* Copy a range object. */ 
AjPRange  ajRangeNewRange (const AjPRange src);     

/* Construct from a string. */ 
AjPRange  ajRangeNewString (const AjPStr str);        

/* Construct from a string with explicit specification. */
AjPRange  ajRangeNewStringLimits (const AjPStr str, ajuint imin, ajuint imax,
                                  ajuint minsize, ajuint size);     

/* Construct from a file. */  
AjPRange  ajRangeNewFilename (const AjPStr name);                                                               

/* Construct from a file with explicit specification. */
AjPRange  ajRangeNewFilenameLimits (const AjPStr name,
                                    ajuint imin, ajuint imax,
                                    ajuint minsize, ajuint size);

ajRangeNewStringLimits and ajRangeNewFilenameLimits both construct a range object with specified limits: minimum value (imin), maximum value (imax), minimum number of ranges (minsize) and the required number of ranges (size). A value of zero for size indicates that there is no required number.
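
The meaning of the four limits can be sketched as a standalone check. This is an illustration under stated assumptions and not the library's implementation: sketch_check_limits and the parallel starts/ends arrays are hypothetical, and the sketch assumes that both the start and end of every range must lie within imin to imax. How the real constructors report a violation is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the limit checks: returns true when the set of
 * n ranges satisfies imin, imax, minsize and size. */
bool sketch_check_limits(const unsigned int *starts, const unsigned int *ends,
                         size_t n, unsigned int imin, unsigned int imax,
                         size_t minsize, size_t size)
{
    size_t i;

    assert(n == 0 || (starts != NULL && ends != NULL));

    if (n < minsize)
        return false;               /* too few ranges */
    if (size != 0 && n != size)
        return false;               /* size of zero means no required number */

    for (i = 0; i < n; i++) {
        if (starts[i] < imin || starts[i] > imax)
            return false;
        if (ends[i] < imin || ends[i] > imax)
            return false;
    }
    return true;
}
```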

ajRangeNewFilename and ajRangeNewFilenameLimits construct an object from a "range file" (see Section A.2.1.5, “range”).

For alternative constructor functions for the AjPStr object see Section 6.5, “Handling Strings”.

6.4.8. Getting Range Object Elements

The following functions retrieve elements from a range object:

/* Get the number of ranges  */
ajuint  ajRangeGetSize (const AjPRange thys);                                              

/* Get text value of a range */
AjBool  ajRangeElementGetText (const AjPRange thys, ajuint element,
                               AjPStr *text);                 

/* Get start and end values  */
AjBool  ajRangeElementGetValues (const AjPRange thys, ajuint element,
                                 ajuint *start, ajuint *end);  

ajRangeElementGetText will retrieve text from the specified range element (element). The text is defined as any non-digit characters after the pair of range numbers. For example, for the range specification "10-20 potential exon 50-60 repeat" the text values are "potential exon" and "repeat". The address of the string object (text) to hold the text is passed.
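
How text becomes attached to a range element can be sketched with a small standalone parser. This is not the AJAX parser: sketch_parse_ranges, SKETCH_TEXT_MAX and the fixed-size buffers are hypothetical, and the sketch is more lenient than the library in that it accepts any run of non-digit characters between the two numbers of a pair.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_TEXT_MAX 64  /* hypothetical buffer size for each text value */

/* Hypothetical model: parse up to `max` "start-end text" elements from
 * spec. Returns the number of elements parsed. */
size_t sketch_parse_ranges(const char *spec,
                           unsigned int *starts, unsigned int *ends,
                           char text[][SKETCH_TEXT_MAX], size_t max)
{
    const char *p = spec;
    size_t n = 0;

    assert(spec != NULL);

    while (n < max) {
        char *after;
        size_t len, keep;

        /* Find and read the start position. */
        while (*p && !isdigit((unsigned char)*p))
            p++;
        if (!*p)
            break;
        starts[n] = (unsigned int)strtoul(p, &after, 10);
        p = after;

        /* Skip the separator and read the end position. */
        while (*p && !isdigit((unsigned char)*p))
            p++;
        if (!*p)
            break;                  /* a start with no end is dropped */
        ends[n] = (unsigned int)strtoul(p, &after, 10);
        p = after;

        /* The text is everything up to the next digit, with leading
         * and trailing non-letters trimmed. */
        while (*p && !isalnum((unsigned char)*p))
            p++;
        len = 0;
        while (p[len] && !isdigit((unsigned char)p[len]))
            len++;
        keep = len;
        while (keep > 0 && !isalpha((unsigned char)p[keep - 1]))
            keep--;
        if (keep >= SKETCH_TEXT_MAX)
            keep = SKETCH_TEXT_MAX - 1;
        memcpy(text[n], p, keep);
        text[n][keep] = '\0';

        p += len;
        n++;
    }
    return n;
}
```

A specification with no text, such as "24-45, 56-78", yields empty text values in this sketch.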

6.4.9. Querying Range Object Properties

Functions for querying the properties of a range object include:

/* Tests if the set of ranges is in ascending non-overlapping order */
AjBool  ajRangeIsOrdered (const AjPRange thys);                                

/* Counts the range elements that overlap a region (of a sequence). */
ajuint  ajRangeCountOverlaps (const AjPRange thys, ajuint pos, ajuint length);    

/* Tests for a single range from the start to end of a sequence.     */
AjBool  ajRangeIsWhole (const AjPRange thys, const AjPSeq seq);

ajRangeCountOverlaps returns the number of ranges in a range object which overlap with a sequence region defined by a start position (pos) and a length (length).
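
The overlap count can be sketched in plain C as follows. This is a hypothetical model and not the library code: sketch_count_overlaps is an invented name, and the sketch assumes that positions are inclusive and that a region of length positions starting at pos ends at pos + length - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: count the ranges that share at least one position
 * with the region [pos, pos + length - 1]. */
size_t sketch_count_overlaps(const unsigned int *starts,
                             const unsigned int *ends, size_t n,
                             unsigned int pos, unsigned int length)
{
    size_t i;
    size_t count = 0;
    unsigned int last;

    assert(n == 0 || (starts != NULL && ends != NULL));

    if (length == 0)
        return 0;                   /* an empty region overlaps nothing */

    last = pos + length - 1;        /* last position of the region */

    for (i = 0; i < n; i++) {
        if (starts[i] <= last && ends[i] >= pos)
            count++;
    }
    return count;
}
```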

ajRangeIsWhole tests whether the range object contains a single range from the start to end of the given sequence (seq).

6.4.10. Sequence Manipulation Functions

These functions process a sequence object (AjPSeq) according to the specification in a range object:

/* Remove all subsequences not corresponding to ranges */
AjBool  ajRangeSeqExtract (const AjPRange thys, AjPSeq seq);     

/* Store retained text as a list of strings. */
AjBool  ajRangeSeqExtractList (const AjPRange thys,const AjPSeq seq, AjPList outliststr);   

/* Insert spaces into sequence to pad out to the ranges.   */ 
AjBool  ajRangeSeqStuff (const AjPRange thys, AjPSeq seq);     

/* Mask ranges of positions in a sequence. */
AjBool  ajRangeSeqMask (const AjPRange thys, AjPSeq seq, const AjPStr maskchar);    

/* Convert the ranges of characters in a sequence to lower-case. */
AjBool  ajRangeSeqToLower (const AjPRange thys, AjPSeq seq);  

ajRangeSeqExtract retains regions in a sequence corresponding to the ranges: regions not in a range are removed. A sequence processed by ajRangeSeqExtract will comprise regions from the original sequence concatenated in the order specified in the set of ranges. If these are not in ascending order then the resulting sequence won't be in position order either.
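
The effect of range order on the extracted sequence can be sketched on a plain C string. This is an illustration only: sketch_extract is a hypothetical function, positions are taken as 1-based and inclusive, and clipping ranges to the sequence length is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: copy the characters covered by each range into
 * out, in the order the ranges are given. Returns the number of
 * characters written (excluding the terminating NUL). */
size_t sketch_extract(const char *seq,
                      const unsigned int *starts, const unsigned int *ends,
                      size_t n, char *out, size_t outsize)
{
    size_t len;
    size_t used = 0;
    size_t i;

    assert(seq != NULL && out != NULL);

    if (outsize == 0)
        return 0;
    len = strlen(seq);

    for (i = 0; i < n; i++) {
        size_t s = starts[i];
        size_t e = ends[i];
        size_t count;

        if (s < 1 || s > e || s > len)
            continue;               /* nothing to copy for this range */
        if (e > len)
            e = len;                /* clip to the end of the sequence */

        count = e - s + 1;
        if (used + count >= outsize)
            count = outsize - 1 - used;   /* truncate to fit the buffer */

        memcpy(out + used, seq + (s - 1), count);
        used += count;
    }
    out[used] = '\0';
    return used;
}
```

For the sequence "ABCDEFGHIJ" and the ranges 7-9,2-3 the sketch produces "GHIBC": the regions are concatenated in range order, not position order.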

ajRangeSeqExtractList is the same as ajRangeSeqExtract except that the retained subsequences are written to a list. The order of the list is the same as that specified in the set of ranges. If these are not in ascending order then the resulting list of strings won't be either.

ajRangeSeqStuff takes a sequence and an ordered, non-overlapping set of ranges and pads the sequence with whitespace so that a space is written for every position not within a range. For example, for the sequence "abcde" and ranges 3-5,7-8 the result is "  abc de" (two leading spaces for positions 1 and 2, and one space for position 6).
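
The padding can be sketched on a plain C string. This is a hypothetical model and not the AJAX implementation: sketch_stuff is an invented name, and the sketch assumes 1-based inclusive positions and ranges that are already ordered and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: write the characters of `in` into the positions
 * covered by the ranges, with a space at every position before or
 * between ranges. Returns the number of characters written. */
size_t sketch_stuff(const char *in,
                    const unsigned int *starts, const unsigned int *ends,
                    size_t n, char *out, size_t outsize)
{
    size_t inlen;
    size_t next = 0;    /* index of the next unread character of in */
    size_t used = 0;    /* characters written; next position is used + 1 */
    size_t i;

    assert(in != NULL && out != NULL);

    if (outsize == 0)
        return 0;
    inlen = strlen(in);

    for (i = 0; i < n; i++) {
        /* Pad with spaces up to the position before the range start. */
        while (used + 1 < starts[i] && used + 1 < outsize)
            out[used++] = ' ';

        /* Fill the range with the next input characters. */
        while (used + 1 <= ends[i] && next < inlen && used + 1 < outsize)
            out[used++] = in[next++];
    }
    out[used] = '\0';
    return used;
}
```

For the input "abcde" and ranges 3-5,7-8 the sketch writes eight characters: two spaces, "abc", one space, then "de".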

ajRangeSeqMask will mask the ranges of positions in a sequence, replacing all characters within range with the mask character (maskchar).
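
Masking can be sketched on a plain C string in the same way. This is an illustration under stated assumptions: sketch_mask is a hypothetical function, positions are 1-based and inclusive, and a single char is used for the mask character where the library function takes an AjPStr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: overwrite every position covered by a range with
 * maskchar. Positions beyond the end of seq are ignored. */
void sketch_mask(char *seq,
                 const unsigned int *starts, const unsigned int *ends,
                 size_t n, char maskchar)
{
    size_t len;
    size_t i;
    size_t p;

    assert(seq != NULL);

    len = strlen(seq);

    for (i = 0; i < n; i++) {
        for (p = starts[i]; p <= ends[i] && p <= len; p++) {
            if (p >= 1)
                seq[p - 1] = maskchar;
        }
    }
}
```

Masking "ACGTACGTAC" with the ranges 2-4,8-9 and the character 'N' gives "ANNNACGNNC".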

6.4.11. String Manipulation Functions

A set of functions equivalent to the sequence manipulation functions is provided for strings:

AjBool  ajRangeStrExtract (const AjPRange thys, const AjPStr instr, AjPStr *outstr);           
AjBool  ajRangeStrExtractList (const AjPRange thys, const AjPStr instr, AjPList outliststr);
AjBool  ajRangeStrStuff (const AjPRange thys, const AjPStr instr, AjPStr *outstr);
AjBool  ajRangeStrMask (const AjPRange thys, AjPStr *str, const AjPStr maskchar);
AjBool  ajRangeStrToLower (const AjPRange thys, AjPStr *str);  

They behave identically to their sequence counterparts except that they take a string (AjPStr) rather than a sequence (AjPSeq). Also, ajRangeStrExtract and ajRangeStrStuff have separate arguments for the input and output strings.