5.4. Memory Management in EMBOSS

5.4.1. Introduction to Memory Management

Memory management in C can be a difficult area to master, especially if you are used to scripting or object-oriented languages where this aspect might be automatically taken care of. It requires a sound knowledge of pointers and discipline in coding but is one of the most powerful aspects of the language, allowing very memory-intensive code to be written in an efficient way that might not be feasible in other languages. Curiously, many C programming books omit detailed coverage of pointers and memory management, yet these areas account for most of the time spent debugging C programs. For this reason great effort has been made to make handling memory in EMBOSS as simple as possible.

Memory management when using the libraries is greatly simplified, at least when programming with the existing datatypes. As far as possible the developer is shielded from low level C calls to allocate and free memory. This is achieved in the following ways:

  • General memory management macros

  • Object memory management macros

  • Object constructor and destructor functions

  • Failsafe object construction

  • Dynamic objects

  • EMBOSS is free of arbitrary limits

5.4.1.1. General memory management macros

General memory management macros are provided to wrap the C malloc, calloc and free functions. malloc allocates memory with uninitialised content, calloc allocates memory and initialises it to zero, and free frees allocated memory.

When programming using the libraries you should use the objects provided (or create new ones) and therefore will seldom need to call these C functions. In some cases, however, it is necessary or desirable to do so and you should use these associated macros.

5.4.1.2. Object memory management macros

Macros are provided to simplify the memory allocation for single objects and arrays of objects of any type, and for freeing that memory. Bear in mind these macros only allocate memory for a basic object (or an array of them) as defined by a single object definition. Where the object itself includes pointers for nested data structures then memory for these nested objects is not allocated. That is what the constructor functions are for (see below).

5.4.1.3. Object constructor and destructor functions

A constructor function (memory allocation) and destructor function (freeing memory) are provided for every type of object. Their function goes beyond that of the object macros, which merely allocate or free a single block of memory referenced by a single pointer. Constructors will allocate the object and all nested objects and initialise the memory, possibly with values passed in by argument. Similarly, destructors ensure that all pointers nested within the structure are freed correctly and that the main object pointer passed is reset to NULL so that it's ready for reuse in the calling code.
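The difference between allocating a single block and constructing a whole object can be sketched in plain C. The example below is illustrative only: DemoPRecord, demoRecordNewC and demoRecordDel are hypothetical names invented for this sketch and are not part of the EMBOSS libraries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with one nested allocation (not an EMBOSS type). */
typedef struct DemoSRecord
{
    char  *name;     /* nested block owned by the record */
    size_t length;   /* length of name, excluding the terminator */
} DemoORecord, *DemoPRecord;

/* Constructor: allocates the record AND its nested name, initialised
   from the txt argument. Returns NULL if memory is unavailable. */
DemoPRecord demoRecordNewC(const char *txt)
{
    DemoPRecord rec = calloc(1, sizeof *rec);

    if (rec == NULL)
        return NULL;

    rec->length = strlen(txt);
    rec->name   = malloc(rec->length + 1);

    if (rec->name == NULL)
    {
        free(rec);
        return NULL;
    }

    memcpy(rec->name, txt, rec->length + 1);   /* copy includes the '\0' */

    return rec;
}

/* Destructor: frees the nested memory first, then the record itself,
   and finally resets the caller's pointer to NULL. */
void demoRecordDel(DemoPRecord *Prec)
{
    if (Prec == NULL || *Prec == NULL)
        return;

    free((*Prec)->name);
    free(*Prec);
    *Prec = NULL;
}
```

A caller would write DemoPRecord rec = demoRecordNewC("ACGT"); and later demoRecordDel(&rec);, after which rec is NULL again. Freeing only the outer block with a plain free(rec) would leave the nested name allocated.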

5.4.1.4. Failsafe object construction

Many (but currently not all) functions that modify an object passed by argument will allocate memory for it if an unallocated (NULL) pointer is passed. This is provided as a safety measure against sloppy programming. You should not rely on it. It's recommended that, where appropriate, all object pointers are explicitly allocated in your code before they are used, and of course freed later once you are done with them.

5.4.1.5. Dynamic objects

Memory for most objects is dynamically reallocated (and freed) as needed by the library functions. This means for example you can append text to a string object without worrying whether there is sufficient space available, or write to an array element without first checking the array is big enough (new elements will be created as necessary). Similarly, memory is freed automatically when it is no longer needed. The most commonly used dynamic objects are the strings and arrays mentioned, but most of the object functions show this behaviour.
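To make the idea of a dynamic object concrete, here is a minimal sketch of an append operation that grows its own buffer. It is not the EMBOSS string implementation: the DemoOText type, the function names and the growth policy (start at 16 bytes, then double) are all assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text object (not an EMBOSS type). */
typedef struct DemoSText
{
    char  *data;      /* NUL-terminated buffer, or NULL when empty */
    size_t len;       /* characters in use */
    size_t reserved;  /* bytes currently allocated */
} DemoOText;

/* Append txt, enlarging the buffer when there is not enough room.
   Returns 1 on success, 0 if memory could not be obtained. */
int demoTextAppendC(DemoOText *text, const char *txt)
{
    size_t add    = strlen(txt);
    size_t needed = text->len + add + 1;       /* +1 for the terminator */

    if (needed > text->reserved)
    {
        size_t newsize = text->reserved ? text->reserved : 16;
        char  *grown;

        while (newsize < needed)
            newsize *= 2;                      /* assumed growth policy */

        grown = realloc(text->data, newsize);  /* acts as malloc if data is NULL */

        if (grown == NULL)
            return 0;                          /* original buffer left intact */

        text->data     = grown;
        text->reserved = newsize;
    }

    memcpy(text->data + text->len, txt, add + 1);
    text->len += add;

    return 1;
}

/* Release the buffer and return the object to its empty state. */
void demoTextClear(DemoOText *text)
{
    free(text->data);
    text->data     = NULL;
    text->len      = 0;
    text->reserved = 0;
}
```

The caller never checks the available space: it starts from an empty DemoOText, calls demoTextAppendC as often as needed, and calls demoTextClear when finished.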

5.4.1.6. EMBOSS is free of arbitrary limits

There are no arbitrary hard-coded limits in the code. For example there is no hard-coded maximum to the length of a sequence or the number of sequences in a sequence alignment, and no upper limits to the size of a matrix you can create. The only restrictions come from the hardware you are using.

5.4.2. General Memory Management

5.4.2.1. General Macros

Instead of calling malloc, calloc or free directly you should use the macros provided:

AJALLOC(nbytes)

Allocates nbytes of uninitialised memory. This is equivalent to using malloc.

AJALLOC0(nbytes)

Allocates nbytes of memory and initialises the memory to zero. Equivalent to using calloc.

AJCALLOC(count,nbytes)

Allocates an array of count elements of nbytes. The array elements are uninitialised. Equivalent to using malloc.

AJCALLOC0(count,nbytes)

Allocates an array of count elements of nbytes and initialises the memory to zero. Equivalent to using calloc.

AJRESIZE(ptr, nbytes)

Resizes previously allocated memory (referenced by ptr) to a new size of nbytes. Initialises new additional reserved memory (if any) to zero. The original memory contents are preserved regardless of whether the block is moved or not. If a NULL pointer is passed then a new block of memory is allocated automatically. Comparable to using realloc, although realloc itself leaves any additional memory uninitialised.

AJFREE(p)

Frees memory using free and sets the pointer to NULL. Ignores NULL pointers.

Most of these macros allocate (or reallocate) some memory and return a pointer to it. In case of failure a NULL pointer is returned and an exception raised. The exception message is printed to the standard error stream and the code exits. The exception message includes the source file name and source line number.

For most applications, you would use AJALLOC0 or AJCALLOC0 as it's safer to initialise the memory to zero by default. If you are certain the memory will be overwritten after it's allocated then AJALLOC or AJCALLOC should be used instead for efficiency.
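One way a macro can report the source file and line number of a failed allocation is to pass the preprocessor symbols __FILE__ and __LINE__ to a helper function. The sketch below shows that general technique in standard C. DEMO_ALLOC0, DEMO_FREE and demo_alloc0 are hypothetical names, and the real EMBOSS macros may be implemented differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: zero-initialised allocation that reports the
   call site and exits if memory cannot be obtained. */
void *demo_alloc0(size_t nbytes, const char *file, int line)
{
    /* Ask for at least one byte so a NULL result always means failure. */
    void *ptr = calloc(1, nbytes ? nbytes : 1);

    if (ptr == NULL)
    {
        fprintf(stderr, "Allocation of %lu bytes failed at %s:%d\n",
                (unsigned long) nbytes, file, line);
        exit(EXIT_FAILURE);
    }

    return ptr;
}

/* The macro supplies the caller's file name and line number. */
#define DEMO_ALLOC0(nbytes) demo_alloc0((nbytes), __FILE__, __LINE__)

/* Free and reset in one step. free(NULL) is a no-op in standard C,
   so NULL pointers are ignored. The argument must be a modifiable
   pointer variable. */
#define DEMO_FREE(p) (free(p), (p) = NULL)
```

Because the macro expands at the point of use, the message names the line in the calling code rather than a line inside the helper.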

5.4.2.2. Object Macros

The macros below are used to create a single object or an array of objects of any type, and for freeing that memory. They allocate memory for a basic object (or array of them) as defined by a single object definition. Where the object itself includes pointers for nested data structures then memory for these nested objects is not allocated (or freed) by these macros. They are normally called from within object constructor and destructor functions.

AJNEW(p)

Allocates memory to an object pointer (p) for a single object of the correct type. The memory is not initialised. This is equivalent to using malloc.

AJNEW0(p)

Allocates memory to an object pointer (p) for a single object of the correct type. The memory is initialised to zero. This is equivalent to using calloc.

AJCNEW(p,c)

Allocates memory to an object pointer (p) for an array of c objects of the correct type. The memory is not initialised. This is equivalent to using malloc.

AJCNEW0(p,c)

Allocates memory to an object pointer (p) for an array of c objects of the correct type. The memory is initialised to zero. This is equivalent to using calloc.

AJCRESIZE0(p, c)

Resizes a previously allocated array of objects (referenced by p) such that it becomes an array of c objects of the correct type. Initialises new additional reserved memory (if any) to zero and preserves the original memory content.

AJFREE(ptr)

Frees a previously allocated object or array (referenced by ptr). Tests that the memory pointer has a non-NULL value to protect against twice freeing, or freeing unallocated memory.
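The phrase "of the correct type" can be understood through a common C idiom: a macro that takes the size from the pointer itself using sizeof *(p), so the caller never writes the type name. The sketch below assumes this idiom for illustration. DEMO_NEW0, DEMO_CNEW0 and the DemoPPoint type are hypothetical, and the definitions in the EMBOSS headers should be consulted for the real macros.

```c
#include <stdlib.h>

/* Hypothetical macros: the object size is derived from the type that
   the pointer p points to, so the same macro works for any object. */
#define DEMO_NEW0(p)      ((p) = calloc(1, sizeof *(p)))
#define DEMO_CNEW0(p, c)  ((p) = calloc((size_t) (c), sizeof *(p)))

/* Hypothetical object used only for this example. */
typedef struct DemoSPoint
{
    double x;
    double y;
} DemoOPoint, *DemoPPoint;

/* Allocate a zeroed array of n points. No sizeof(DemoOPoint) appears:
   the element size comes from the pointer's own type. */
DemoPPoint demoPointArrayNew(size_t n)
{
    DemoPPoint points = NULL;

    DEMO_CNEW0(points, n);

    return points;
}
```

If the type of the pointer changes later, the allocation size follows automatically, which removes one common source of size mismatches.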

5.4.2.3. Arrays of Fundamental C-type Datatypes

Two datatypes are defined in ajdefine.h for handling arrays of C-type integers (int) and floats (float):

/* @datatype AjIntArray *******************************************************
**
** Array of integers
**
** @attr typedef [ajint*] Value
** @@
******************************************************************************/

typedef ajint* AjIntArray;

/* @datatype AjFloatArray *****************************************************
**
** Array of floats
**
** @attr typedef [float*] Value
** @@
******************************************************************************/

typedef float* AjFloatArray;

They may be used with the macros above to allocate memory for such arrays. A typical use is shown below:

AjIntArray    integers = NULL;
AjFloatArray  floats   = NULL;
ajint         dim      = 10;

AJCNEW0(integers, dim);
AJCNEW0(floats, dim);

/* Do something with arrays */

AJFREE(integers);
AJFREE(floats);

5.4.2.4. Memory Leaks

Memory leaks occur when, in your source code, you lose a reference to an allocated block of memory. This usually happens by accidentally making a pointer point somewhere else, without first freeing the memory or copying the pointer and freeing it later. They are one of the most common sources of error in C programming, accounting for much of the time spent debugging code. Leaks are easily avoided if you have a strong grasp of pointers, particularly their implementation in EMBOSS, and take a disciplined approach when coding.

It is vital that you keep track of exactly what objects you have in memory and what references (pointers) you have to this memory. Most memory leaks can be avoided if you explicitly allocate memory for objects before they are used and free this memory later once you are done with the object. Be careful to match calls to destructor functions with calls to constructors. If you rely on a function's failsafe memory allocation mechanism then the allocation is hidden from you and it's no longer obvious from the code that memory has been allocated and needs freeing.

Consider the following code.

int main(int argc, char **argv)
{
    AjPStr  mystring=NULL;

    embInit("noleaks", argc, argv);

    mystring = ajAcdGetString("astring");
    ajStrDel(&mystring);

    embExit();

    return 0;
}

An AJAX string object (AjPStr) is declared and embInit called to invoke ACD file processing. The ACD file is read and the user prompted for input values. The call to embInit also allocates memory for all ACD data items and initialises the objects (see Section 6.3, “Handling ACD Files”).

When you retrieve a data item, for example a string (AjPStr) by calling ajAcdGetString, the function returns a pointer to the string created by embInit. This means that you do not have to allocate memory for the string first, which is why the above code does not explicitly call a string constructor function.

Nonetheless, a string object was created by embInit and must be freed once you are done with it. That is what the call ajStrDel(&mystring); is for. Had you omitted this then you would be relying on the operating system to free the process memory when the application exits. This is not strictly a memory leak but would be bad practice.

In the following code, the string constructor function ajStrNewC is called unnecessarily:

int main(int argc, char **argv)
{
    AjPStr  mystring=NULL;

    embInit("leaks", argc, argv);

    mystring = ajStrNewC("Hello");
    mystring = ajAcdGetString("astring");     /* Memory leak */
    ajStrDel(&mystring);

    embExit();

    return 0;
}

A string object pointer (mystring) is defined as before and is made to point to a new object allocated by ajStrNewC(). Then, by calling ajAcdGetString, the same pointer is made to point to the string allocated by embInit instead. You have lost the handle on the memory allocated by ajStrNewC. In other words you've created a memory leak.

There are other ways to create memory leaks than described here. Details of how to avoid leaks are described for individual datatypes in the programming guides (see Section 6.2, “Programming Guides”).

All code submitted to EMBOSS should be appropriately tested and debugged so that it does not leak memory (see Section 3.3, “Debugging”).

5.4.3. Object Memory Management

5.4.3.1. Introduction

In C++ terminology a class is a definition or template, and an object is an instance of it. The instance is the actual thing that can be manipulated. If you want to do anything you must create an instance in memory, i.e. instantiate it. When programming with EMBOSS objects it's important to make the distinction between the object pointer and the object proper (or instance) residing in memory. An object pointer is merely a variable which holds the memory address of a certain type of object. The object proper is a particular instance of an object residing in memory.

In principle, it would be possible to instantiate (allocate memory for) an object in this way:

#include "emboss.h"
int main(void)
{
    AjOStr  my_structure;

    /* Do something with my_structure */

    AJFREE(my_structure);
}

The declaration of my_structure would create a single string object in memory, which is later freed by calling AJFREE. This approach is not taken in EMBOSS however because, as already explained, object pointers are always used for reasons of efficiency and convenience. The above definition does not give the programmer the freedom to manage the memory of the object. Even if you only need one structure you should never use (for example) AjOStr because it would be inconsistent with the rest of EMBOSS. The above code is almost certain to fail anyway owing to the way the library handles string objects. The AJFREE would also not free any required internal memory allocation in my_structure.

The standard way to instantiate an object is to dynamically allocate memory to the object pointer. It's for this reason and for brevity that, for example, an AjPStr may be referred to as an "object" even though "object pointer" is more accurate. The terms are not important as long as you understand whether you are dealing with a pointer or a structure in memory.

All objects should be allocated dynamically and freed once you're done with them. This is easy because a constructor function (for memory allocation) and destructor function (for freeing memory) are provided for every type of object.

Consider for example the following code:

#include "emboss.h"

int main(int argc, char **argv)
{
    AjPStr  my_string=NULL;
  
    embInit("helloworld", argc, argv);

    my_string = ajStrNew();

    ajStrAssignC(&my_string, "Hello, World!\n");
    ajFmtPrint("%S", my_string);
  
    ajStrDel(&my_string);

    embExit();
    return 0;
}

AjPStr my_string=NULL; declares the object pointer and initialises it to NULL. Pointers should always be set to NULL when they are declared because EMBOSS functions presume that non-NULL pointers have had memory allocated to them. If you do not set the pointer to NULL then it may receive some junk value when the program runs and any function that uses it might mistakenly assume memory had been allocated for it. That might lead to a segmentation fault or bus error!

ajStrNew() is the constructor function. This conceptually allocates a block of memory for the object and returns the memory address of the allocated block. The memory address is held in the variable my_string. Disregard the calls to ajStrAssignC and ajFmtPrint for the time being.

ajStrDel() is the destructor function. This must not only free the memory but also set the pointer back to NULL so that it's ready for reuse. Note that the address of my_string is passed. You may be wondering why, as my_string is a pointer anyway, you need to pass its address. The answer is simple if you remember that in C function arguments are passed "by value": a temporary copy of each argument is created and passed to the function rather than the original. Although a copy of the pointer would be enough to free the memory that is pointed to, you need a handle on (the address of) the original if you want to set the original pointer to NULL. Hence the requirement for passing the address of my_string (&my_string).
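The pass-by-value point can be checked with a small standalone sketch. Both functions below are hypothetical and exist only to contrast the two calling styles. Neither is an EMBOSS function.

```c
#include <stdlib.h>

/* Receives a COPY of the caller's pointer. Setting the copy to NULL
   has no effect on the caller's variable. */
void demo_reset_copy(int *p)
{
    p = NULL;
    (void) p;               /* silence "set but not used" warnings */
}

/* Receives the ADDRESS of the caller's pointer, so it can both free
   the block and set the caller's own pointer to NULL. */
void demo_del(int **pp)
{
    if (pp == NULL || *pp == NULL)
        return;             /* nothing to free, so a second call is harmless */

    free(*pp);
    *pp = NULL;
}
```

After int *v = malloc(sizeof *v);, a call to demo_reset_copy(v) leaves v unchanged, whereas demo_del(&v) frees the block and leaves v equal to NULL.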

5.4.3.2. Object Construction

Constructor functions (constructors) return a pointer to a new object in memory. There are four basic types of constructor to consider:

  • ACD data constructor functions

  • Default constructor functions

  • Alternative constructor functions

  • Functions whose primary purpose is not object construction but which will construct an object if necessary as a failsafe measure i.e. if a NULL pointer is passed for an output parameter of the function.

To manage memory for objects you need to know how the functions you call behave. There are three cases to discern:

  • A function requires a pre-existing object

  • A function can use but does not require a pre-existing object and will allocate one if necessary

  • A function always allocates an object and either returns a pointer to it or allocates an object pointer, the address of which has been passed as an argument

In most but not all cases it is obvious from the function name whether a function is a constructor or merely uses an object.

5.4.3.2.1. ACD Data Construction

The ACD data constructor functions are used to return objects that are defined in the application ACD file. They are all defined in ajacd.h/c and have the general name:

ajAcdGetDatatype

where Datatype is one of the supported ACD datatypes (Section A.2, “Datatypes”).

Strictly speaking they are not constructor functions but instead return a pointer to an appropriate AJAX object that has been allocated by a call to the embInit function, a call which all EMBOSS applications must use (see Section 6.3, “Handling ACD Files”). For example ajAcdGetString returns a pointer to an AJAX string object (AjPStr) produced by parsing an ACD string (string) data definition:

AjPStr  ajAcdGetString (const char *token);

The token parameter is the name of the ACD data definition to read. Attributes in the data definition and/or user input gathered during ACD file processing are used to initialise the object. Memory for any new objects must be freed later on in the main() function. The use of these functions is explained in detail elsewhere (see Section 6.3, “Handling ACD Files”).

5.4.3.2.2. Default Object Construction

The default object constructor functions are the usual way to create new objects in your source code that are not defined in the ACD file. Usually all the default constructors for an object are listed under a single section in the C source (and documentation) for the library file. They normally have the suffix New in their name and have no parameters. For example:

AjPStr  ajStrNew (void);    /* Create a string object.        */

The use of such functions for individual datatypes is described in the library programming guides (see Section 6.2, “Programming Guides”).

5.4.3.2.3. Alternative Object Construction

Alternative constructor functions provide different ways to create new objects and often have parameters used for initialising elements in the object. They are usually listed in the same section in the C source file and documentation as for the default constructor functions. They have New in their name to make their behaviour clear. For example:

AjPStr  ajStrNewC (const char *txt);                                  /* Construct from C-type string */
AjPStr  ajStrNewResC (const char *txt, ajuint size);                  /* Construct from C-type string with reserved size */

The use of such functions for individual datatypes is described in the library programming guides (see Section 6.2, “Programming Guides”).

5.4.3.2.4. Failsafe Construction

Most functions that write to an object passed by argument will allocate memory for the object if necessary. This safety measure prevents failures and errors in cases where an unallocated (NULL) pointer is passed. In practice it's recommended that, where appropriate, all object pointers are explicitly allocated in your code before they are used. Consider the string assignment function ajStrAssignS which copies one string value (str) to another (Pstr):

AjBool  ajStrAssignS(AjPStr* Pstr, const AjPStr str);

It's not at all obvious from the name that this function will allocate a string object for Pstr if NULL is passed, so relying on failsafe construction obscures where memory is allocated in your code. You should code this behaviour into any new functions you write, but not rely on it in the functions that you call.
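When you write a new function that modifies an object passed by address, the failsafe check is typically a test for a NULL object pointer followed by an allocation. The following is a minimal sketch in plain C under that assumption. DemoPName, demoNameAssignC and demoNameDel are hypothetical names and not EMBOSS functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object holding one dynamically allocated string. */
typedef struct DemoSName
{
    char *text;
} DemoOName, *DemoPName;

/* Assign a copy of txt to *Pname. If the caller passed an unallocated
   (NULL) object pointer, the object is constructed first.
   Returns 1 on success, 0 on failure. */
int demoNameAssignC(DemoPName *Pname, const char *txt)
{
    char *copy;

    if (Pname == NULL)
        return 0;

    copy = malloc(strlen(txt) + 1);

    if (copy == NULL)
        return 0;

    strcpy(copy, txt);

    if (*Pname == NULL)                     /* failsafe construction */
    {
        *Pname = calloc(1, sizeof **Pname);

        if (*Pname == NULL)
        {
            free(copy);
            return 0;
        }
    }

    free((*Pname)->text);                   /* release any previous value */
    (*Pname)->text = copy;

    return 1;
}

/* Destructor: frees the nested text, then the object, then resets
   the caller's pointer to NULL. */
void demoNameDel(DemoPName *Pname)
{
    if (Pname == NULL || *Pname == NULL)
        return;

    free((*Pname)->text);
    free(*Pname);
    *Pname = NULL;
}
```

An existing object is reused on later calls, so only the first call on a NULL pointer allocates the object itself. Calling code should still construct the object explicitly where appropriate, as recommended above.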

5.4.3.3. Object Destruction

Destructor functions (destructors) free the memory pointed to by an object pointer and reset it to NULL so that it is ready for reuse. For most objects there is a default destructor function which is the typical method for deleting objects in your source code. These have a single parameter which is the address of the object pointer being freed. In a few cases there are alternative destructors with non-standard behaviour, for example with parameters to provide a handle on some elements of the object which are not freed. Usually all destructor functions for an object are listed under a single section in the C source (and documentation) for the library file and have Del in their name, most often as a suffix. For example:

void    ajStrDel (AjPStr *Pstr);    /* Delete a string object.        */