6.13. Handling Codon Usage Tables

6.13.1. Introduction

EMBOSS provides a codon usage table object that holds arrays describing codon usage in a nucleotide sequence.

Codon usage tables for input and output are usually defined in the ACD file; however, they can be created directly if required. Functions are provided for reading and writing the data, getting and setting elements of the object, calculating properties of the codon usage table, and back translation. There are also some miscellaneous functions, mostly used for converting between triplet base codes and codon indices.

Code that creates a codon usage table must first create an output object, then populate it with codon usage data from coding sequences, calculate statistics on codon frequencies, and write the table to a file in the user-selected format.

6.13.2. AJAX Library Files

AJAX library files for handling codon usage tables are listed in the table (Table 6.22, “AJAX Library Files for Handling Codon Usage Tables”). Library file documentation, including a complete description of datatypes and functions, is available at:

http://emboss.open-bio.org/rel/dev/libs/

Table 6.22. AJAX Library Files for Handling Codon Usage Tables

Library File Documentation    Description
ajcod                         Codon usage table handling

ajcod.h/c

Defines the AjPCod object and functions for handling codon usage tables. It also contains static data structures and functions for handling them at a low level. You are unlikely to need these unless you plan to extend the core functionality of the library.

6.13.3. ACD Datatypes

The ACD datatype for handling codon usage tables is:

codon

Codon usage table file.

6.13.4. ACD Data Definition

A typical ACD definition for codon usage table input:

codon: infile 
[
    parameter: "Y"
]

A typical ACD definition for codon usage table output:

outcodon: outfile 
[
    parameter: "Y"
]

6.13.4.1. Parameter Name

All data definitions for codon usage table input and output should have an intuitive name: no standard names are currently defined.

6.13.4.2. Common Attributes

Attributes that are typically specified are summarised below.

parameter: If a codon usage table is the primary input or output of an EMBOSS application it should be defined as a parameter by using parameter: "Y" (see Section A.4, “Global Attributes”).

6.13.5. AJAX Datatypes

For handling codon usage tables, including input codon usage tables defined in the ACD file, use:

AjPCod

Codon usage table object (for codon ACD datatype).

For handling output codon usage tables defined in the ACD file use:

AjPOutfile

General output file (for outcodon ACD datatype).

6.13.6. ACD File Handling

Datatypes and functions for handling codon usage tables via the ACD file are shown below (Table 6.23, “Datatypes and Functions for Codon Usage Table Input and Output”).

Table 6.23. Datatypes and Functions for Codon Usage Table Input and Output
                        To read a codon usage table    To write a codon usage table
ACD datatype            codon                          outcodon
AJAX datatype           AjPCod                         AjPOutfile
To retrieve from ACD    ajAcdGetCodon                  ajAcdGetOutcodon

Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You access the files and memory through the ajAcdGet* family of functions, which return pointers to the appropriate objects.

6.13.6.1. Input Codon Usage Table Retrieval

To retrieve an input codon usage table an object pointer is declared and then initialised using ajAcdGetCodon:

    AjPCod     cod    = NULL;

    cod    = ajAcdGetCodon("infile");

6.13.6.2. Output Codon Usage Table Retrieval

To retrieve an output codon usage table stream an object pointer is declared and initialised using ajAcdGetOutcodon:

    AjPOutfile codout = NULL;

    codout = ajAcdGetOutcodon("outfile");

6.13.6.3. Processing Command Line Options and ACD Attributes

Currently there are no functions for this.

6.13.6.4. Memory and File Management

It is your responsibility to close any files and free up memory at the end of the program.

6.13.6.4.1. Closing Output Codon Usage Table Files

To close an output codon usage table use ajOutfileClose:

    ajOutfileClose(&codout);

6.13.6.4.2. Freeing Memory

You must release any objects returned by calls to ajAcdGetCodon or ajAcdGetOutcodon: call the default destructor function ajCodDel (see below) on an AjPCod, and ajOutfileClose on an AjPOutfile.

Additionally, you must call ajCodExit to clean up internal memory allocated for housekeeping of codon usage processing:

    ajCodExit();

6.13.7. Codon Usage Table Object Memory Management

6.13.7.1. Default Object Construction

To use a codon usage table object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default construction function is:

AjPCod  ajCodNew (void);

All constructors return the address of a new object. In the following code the pointer does not need to be initialised to NULL but it is good practice to do so:

    AjPCod     cod    = NULL;

    cod = ajCodNew();

    /* The object is instantiated and ready for use */

6.13.7.2. Default Object Destruction

You must free the memory for an object when you are finished with it. The default destructor function is:

void  ajCodDel (AjPCod *pthys);

It is used as follows:

    AjPCod     cod    = NULL;
    AjPOutfile codout = NULL;

    cod    = ajAcdGetCodon("infile");
    codout = ajAcdGetOutcodon("outfile");

    ...

    ajCodDel(&cod);
    ajOutfileClose(&codout);

6.13.7.3. Alternative Object Construction and Loading

There are two alternative constructor functions. ajCodNewCodenum creates a codon usage object with the amino acid assignments taken from a standard genetic code. In contrast, ajCodNewCod will duplicate an existing codon object and return a pointer to the new object:

AjPCod  ajCodNewCodenum (ajint code); 
AjPCod  ajCodNewCod (const AjPCod thys);

6.13.8. Reading and Writing Codon Usage Tables

6.13.8.1. Reading a codon usage table

To use a codon usage table created directly (i.e. not one defined in the ACD file) it's necessary to assign a codon index to it. ajCodRead will read codon usage data from a file (fn). The file format can be specified explicitly, given as a prefix (format::) to the filename, or be given as NULL (in which case all known formats are tried):

AjBool  ajCodRead (AjPCod thys, const AjPStr fn, const AjPStr format); 
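The format:: prefix convention can be illustrated without the AJAX library. The following standalone sketch is not how ajCodRead is implemented; split_format_prefix is a hypothetical helper, and the names passed to it in any call are placeholders. It only shows how a string of the form format::filename separates into its two parts, with an empty format standing in for the "try all known formats" case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of AJAX: split "format::filename" into its
 * two parts. If no "::" is present, format is set to the empty string,
 * which a caller could treat like NULL (try every known format).
 * Returns 0 on success, -1 if either output buffer is too small.
 */
int split_format_prefix(const char *spec,
                        char *format, size_t formatlen,
                        char *filename, size_t filelen)
{
    const char *sep  = strstr(spec, "::");
    const char *name = spec;
    size_t flen = 0;

    if (sep != NULL) {
        flen = (size_t)(sep - spec);   /* characters before "::" */
        name = sep + 2;                /* text after "::" */
    }

    if (flen + 1 > formatlen || strlen(name) + 1 > filelen)
        return -1;

    memcpy(format, spec, flen);
    format[flen] = '\0';
    strcpy(filename, name);
    return 0;
}
```

With this sketch, "someformat::table.txt" yields the format "someformat" and the filename "table.txt", while "table.txt" alone yields an empty format.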

Codon input and output formats are printed by the entrails utility using the ajCodPrintFormat function.

6.13.8.2. Writing a codon usage table

There are two analogous functions for writing codon usage table information to file which differ in the type of file object (AjPFile and AjPOutfile) passed:

void  ajCodWrite (AjPCod thys, AjPFile outf); 
void  ajCodWriteOut (const AjPCod thys, AjPOutfile outf);

Codon usage table output files (AjPOutfile) are usually obtained from ACD file processing (see above). In that case, call ajCodWriteOut with the AjPOutfile that corresponds to the outcodon ACD data definition:

    AjPCod     cod    = NULL;
    AjPOutfile codout = NULL;

    cod    = ajAcdGetCodon("infile");
    codout = ajAcdGetOutcodon("outfile");

    ajCodWriteOut(cod, codout);

Where ajCodWrite is used it's necessary to handle the creation of the output file manually. Use ajFileNewOutNameC (or other functions) for doing so:

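    AjPCod     cod    = NULL;
    AjPFile    codout = NULL;

    /* Obtain a populated table, here from the ACD file */
    cod    = ajAcdGetCodon("infile");

    /* Open a file called "OutputFileName" */
    codout = ajFileNewOutNameC("OutputFileName");

    ajCodWrite(cod, codout);

    /* Close the file when finished */
    ajFileClose(&codout);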

6.13.9. Getting and Setting Elements

The following elements of a codon usage table may be retrieved or set:

  • Name

  • Release

  • Description

  • Division

  • Species

  • Genetic code

  • Number of CDSs

  • Number of codons

The functions to get elements have Get in their name. Variants of these functions for returning C-type (char *) strings instead of AjPStr are available but not shown:

const AjPStr  ajCodGetName (const AjPCod thys);           /* Name */
const AjPStr  ajCodGetRelease (const AjPCod thys);        /* Release */
const AjPStr  ajCodGetDesc (const AjPCod thys);           /* Description */
const AjPStr  ajCodGetDivision (const AjPCod thys);       /* Division */
const AjPStr  ajCodGetSpecies (const AjPCod thys);        /* Species */
ajint         ajCodGetCode (const AjPCod thys);           /* Genetic code */
ajint         ajCodGetNumcds (const AjPCod thys);         /* Number of CDSs */
ajint         ajCodGetNumcodon (const AjPCod thys);       /* Number of codons */

Additionally, ajCodGetCodonlist will write the codon triplets to a list of strings:

void         ajCodGetCodonlist(const AjPCod cod, AjPList list);

The functions to set elements have Set in their name. Again, variants for C-type strings are available:

void  ajCodSetDescS (AjPCod thys, const AjPStr desc);                 /* Description */
void  ajCodSetDivisionS (AjPCod thys, const AjPStr division);         /* Division */
void  ajCodSetNameS (AjPCod thys, const AjPStr name);                 /* Name */
void  ajCodSetReleaseS (AjPCod thys, const AjPStr release);           /* Release */
void  ajCodSetSpeciesS (AjPCod thys, const AjPStr species);           /* Species */
void  ajCodSetCodenum (AjPCod thys, ajint geneticcode);               /* Genetic code */
void  ajCodSetNumcds (AjPCod thys, ajint numcds);                     /* Number of CDSs */
void  ajCodSetNumcodons (AjPCod thys, ajint numcodon);                /* Number of codons */
void  ajCodSetBacktranslate (AjPCod thys);                            /* Amino acid index used for back-translation. */
void  ajCodSetTripletsS (AjPCod thys, const AjPStr s, ajint *c);      /* Load codon counts from sequence s; c returns the number of codons counted */

6.13.9.1. Clearing a codon usage table

In case you want to reuse a codon usage table there are two functions for clearing the data:

/* Zero all entries including the genetic codes (the amino acids for each codon) */
void  ajCodClear(AjPCod thys);

/* Zero the name, number count and fraction codon elements. */
void  ajCodClearData(AjPCod thys);

6.13.10. Calculated Properties

Various properties of codon usage tables may be calculated including:

  • Codon adaptation index (see NAR 15:1281-1295)

  • Codon adaptation index W values (see NAR 15:1281-1295)

  • Gribskov statistic (count per thousand)

  • Effective number of codons (see Gene 87:23-29)

  • Fractional count and codons per thousand

  • Sequence composition
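The W values listed above can be stated concretely: for each codon, W is its count in a reference set divided by the count of the most used codon for the same amino acid. The following standalone sketch illustrates that definition only. It does not use or reproduce the AJAX implementation, and relative_adaptiveness is a hypothetical name.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of AJAX. For n codons with reference counts
 * count[i] and amino acid labels aa[i] (one character per codon), set w[i]
 * to count[i] divided by the largest count among codons for the same amino
 * acid. Amino acids with no observations get w = 0.
 */
void relative_adaptiveness(const int *count, const char *aa, size_t n,
                           double *w)
{
    int aa_max[256] = {0};
    size_t i;

    /* First pass: largest count seen for each amino acid */
    for (i = 0; i < n; i++) {
        unsigned char a = (unsigned char)aa[i];
        if (count[i] > aa_max[a])
            aa_max[a] = count[i];
    }

    /* Second pass: scale each count by its amino acid's maximum */
    for (i = 0; i < n; i++) {
        int best = aa_max[(unsigned char)aa[i]];
        w[i] = best ? (double)count[i] / best : 0.0;
    }
}
```

The most used codon for each amino acid always receives W = 1, and every synonymous alternative receives a value between 0 and 1.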

The functions are as follows:

/* Codon adaptation index from a codon usage table */
double  ajCodCalcCaiCod (const AjPCod thys);

/* Codon adaptation index for a coding sequence */
double  ajCodCalcCaiSeq (const AjPCod cod, const AjPStr str);

/* Gribskov statistic (count per thousand) */
void  ajCodCalcGribskov (AjPCod thys, const AjPStr s);     

/* Effective number of codons */
double  ajCodCalcNc (const AjPCod thys);                     

/* Fractional count and codons per thousand */
void  ajCodCalcUsage (AjPCod thys, ajint c); 
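As an illustration of the last pair of properties, the sketch below computes codons per thousand and the fractional count from raw codon counts. It is standalone C under stated assumptions, not the AJAX code: codon_usage is a hypothetical helper, "fraction" is taken to mean a codon's share among the codons for the same amino acid, and the table is passed as plain arrays rather than an AjPCod.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of AJAX. For n codons with observed counts
 * count[i] and amino acid labels aa[i] (one character per codon), fill:
 *   per_thousand[i] = 1000 * count[i] / total number of codons
 *   fraction[i]     = count[i] / total count of codons with the same aa
 */
void codon_usage(const int *count, const char *aa, size_t n,
                 double *per_thousand, double *fraction)
{
    long total = 0;
    long aa_total[256] = {0};
    size_t i;

    /* Accumulate the overall total and the per-amino-acid totals */
    for (i = 0; i < n; i++) {
        total += count[i];
        aa_total[(unsigned char)aa[i]] += count[i];
    }

    /* Derive both statistics, guarding against division by zero */
    for (i = 0; i < n; i++) {
        long syn = aa_total[(unsigned char)aa[i]];
        per_thousand[i] = total ? 1000.0 * count[i] / total : 0.0;
        fraction[i]     = syn ? (double)count[i] / syn : 0.0;
    }
}
```

For example, with counts of 30 and 10 for two glutamate codons and 60 for a single lysine codon, the first glutamate codon has 300 codons per thousand and a fraction of 0.75.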

6.13.11. Back Translation

There are functions to backtranslate a string representing an amino acid sequence. ajCodBacktranslate will perform a simple back translation whereas ajCodBacktranslateAmbig will backtranslate a string to a fully ambiguous nucleotide sequence:

void  ajCodBacktranslate (AjPStr *b, const AjPStr a, const AjPCod thys); 
void  ajCodBacktranslateAmbig (AjPStr *b, const AjPStr a, const AjPCod thys);

Before you call these functions you should first call ajCodSetBacktranslate. This initialises the codon usage object with the most commonly used triplet index for the amino acids.
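The idea of back-translating with the most commonly used triplet for each amino acid can be sketched in standalone C. This is an illustration only, not the AJAX implementation: struct codon_entry and backtranslate are hypothetical names, the table is a plain array, and emitting "NNN" for a residue with no codon in the table is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, not an AJAX type */
struct codon_entry {
    const char *triplet;  /* three bases, e.g. "GAA" */
    char        aa;       /* one-letter amino acid code */
    int         count;    /* observed count for this codon */
};

/*
 * Back-translate protein into dna by choosing, for each residue, the codon
 * with the highest count in table. Residues with no codon in the table
 * become "NNN". dna must have room for 3 * strlen(protein) + 1 characters.
 */
void backtranslate(const struct codon_entry *table, size_t n,
                   const char *protein, char *dna)
{
    size_t i, j;

    for (i = 0; protein[i] != '\0'; i++) {
        const char *best = "NNN";
        int best_count = -1;

        /* Linear scan for the most used codon of this amino acid */
        for (j = 0; j < n; j++) {
            if (table[j].aa == protein[i] && table[j].count > best_count) {
                best = table[j].triplet;
                best_count = table[j].count;
            }
        }
        memcpy(dna + 3 * i, best, 3);
    }
    dna[3 * i] = '\0';
}
```

A real table would hold all 64 codons. Precomputing the best codon per amino acid once, rather than scanning for every residue, mirrors the role the text gives to the initialisation step.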

6.13.12. Miscellaneous Functions

These include:

/* Return one codon value given a possibly ambiguous base */
ajint  ajCodBase (ajint c);                         

/* Convert triplet index to triplet */
char*  ajCodTriplet (ajint idx);

/* Return a codon index given a three character codon (AjPStr) */
ajint  ajCodIndex (const AjPStr s);

/* Return a codon index given a three character codon (C-type string) */
ajint  ajCodIndexC (const char *codon);

/* Tests the output format for an outcodon ACD type */
ajint  ajCodOutFormat (const AjPStr name);
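To show what converting between a triplet and a codon index involves, the following standalone sketch treats a codon as a three-digit base-4 number. It is not the AJAX code: codon_index and codon_triplet are hypothetical helpers, and the base order A, C, G, T is an assumption made for illustration, since the ordering used by the library is not stated in this section.

```c
#include <stddef.h>
#include <string.h>

/* Assumed base order for this sketch; AJAX may order bases differently */
static const char BASES[] = "ACGT";

/* Return 0..63 for a valid upper-case triplet, or -1 otherwise */
int codon_index(const char *codon)
{
    int idx = 0;
    int i;

    if (codon == NULL || strlen(codon) != 3)
        return -1;

    for (i = 0; i < 3; i++) {
        const char *p = strchr(BASES, codon[i]);
        if (p == NULL)
            return -1;               /* ambiguous or invalid base */
        idx = idx * 4 + (int)(p - BASES);
    }
    return idx;
}

/* Inverse: write the triplet for idx (0..63) into out; return 0 or -1 */
int codon_triplet(int idx, char out[4])
{
    if (idx < 0 || idx > 63)
        return -1;

    out[0] = BASES[idx / 16];
    out[1] = BASES[(idx / 4) % 4];
    out[2] = BASES[idx % 4];
    out[3] = '\0';
    return 0;
}
```

Under the assumed ordering, "AAA" maps to 0 and "TTT" to 63. The sketch rejects lower-case and ambiguous bases rather than resolving them.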