EMBOSS provides a codon usage table object that holds arrays describing codon usage in a nucleotide sequence.
Codon usage tables for input and output are usually defined in the ACD file, but they can be created directly if required. Functions are provided for reading and writing the data, getting and setting elements of the object, calculating properties of the codon usage table, and back translation. There are also some miscellaneous functions, mostly used for converting between triplet base codes and codon indices.
Code that creates a codon usage table must first create an output object, then populate it with codon usage data from coding sequences, calculate statistics on codon frequencies, and write the table to a file in the user-selected format.
AJAX library files for handling codon usage tables are listed in the table (Table 6.22, “AJAX Library Files for Handling Codon Usage Tables”). Library file documentation, including a complete description of datatypes and functions, is available at:
http://emboss.open-bio.org/rel/dev/libs/

Library File Documentation | Description |
---|---|
ajcod | Codon usage table handling |

ajcod.h/c. Defines the AjPCod object and functions for handling codon usage tables. It also contains static data structures and functions for handling them at a low level. You are unlikely to need these unless you plan to extend the core functionality of the library.
The ACD datatype for handling codon usage tables is:
codon
Codon usage table file.
A typical ACD definition for codon usage table input:
codon: infile [ parameter: "Y" ]
A typical ACD definition for codon usage table output:
outcodon: outfile [ parameter: "Y" ]
All data definitions for codon usage table input and output should have an intuitive name: no standard names are currently defined.
Attributes that are typically specified are summarised below.
parameter:
If a codon usage table is the primary input or output of an EMBOSS application it should be defined as a parameter by using parameter: "Y" (see Section A.4, “Global Attributes”).
For handling codon usage tables, including input codon usage tables defined in the ACD file, use:
AjPCod
Codon usage table object (for codon ACD datatype).
For handling output codon usage tables defined in the ACD file use:
AjPOutfile
General output file (for outcodon ACD datatype).
Datatypes and functions for handling codon usage tables via the ACD file are shown below (Table 6.23, “Datatypes and Functions for Codon Usage Table Input and Output”).
 | To read a codon usage table | To write a codon usage table |
---|---|---|
ACD datatype | codon | outcodon |
AJAX datatype | AjPCod | AjPOutfile |
To retrieve from ACD | ajAcdGetCodon | ajAcdGetOutcodon |
Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions, which return pointers to appropriate objects.
To retrieve an input codon usage table an object pointer is declared and then initialised using ajAcdGetCodon:

    AjPCod cod = NULL;

    cod = ajAcdGetCodon("infile");

To retrieve an output codon usage table stream an object pointer is declared and initialised using ajAcdGetOutcodon:

    AjPOutfile codout = NULL;

    codout = ajAcdGetOutcodon("outfile");

Currently there are no functions for this.
It is your responsibility to close any files and free up memory at the end of the program.
To close an output codon usage table use ajOutfileClose:

    ajOutfileClose(&codout);

To use a codon usage table object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default construction function is:
AjPCod ajCodNew (void);
All constructors return the address of a new object. In the following code the pointer does not need to be initialised to NULL, but it is good practice to do so:

    AjPCod cod = NULL;

    cod = ajCodNew();
    /* The object is instantiated and ready for use */

You must free the memory for an object when you are finished with it. The default destructor function is:
void ajCodDel (AjPCod *pthys);
It is used as follows:

    AjPCod     cod    = NULL;
    AjPOutfile codout = NULL;

    cod    = ajAcdGetCodon("infile");
    codout = ajAcdGetOutcodon("outfile");

    ...

    ajCodDel(&cod);
    ajOutfileClose(&codout);

There are two alternative constructor functions. ajCodNewCodenum creates a codon usage object with the amino acid assignments taken from a standard genetic code. In contrast, ajCodNewCod will duplicate an existing codon object and return a pointer to the new object:

    AjPCod ajCodNewCodenum (ajint code);
    AjPCod ajCodNewCod     (const AjPCod thys);

To use a codon usage table created directly (i.e. not one defined in the ACD file) it's necessary to assign a codon index to it. ajCodRead will read codon usage data from a file (fn). The file format can be specified explicitly, given as a prefix (format::) to the filename, or be given as NULL (in which case all known formats are tried):

    AjBool ajCodRead (AjPCod thys, const AjPStr fn, const AjPStr format);

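The format:: prefix convention described above can be illustrated with a small standalone sketch. This is not the AJAX implementation: the function name split_format_prefix and the strings used to exercise it are hypothetical, chosen only to show how a prefix might be separated from a filename.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a "format::filename" specification into its two parts.
 * (Hypothetical helper for illustration; not part of AJAX.)
 *
 * On return *name points at the filename inside spec. If a prefix is
 * present it is copied into fmt (truncated to fmtlen - 1 characters)
 * and 1 is returned. Otherwise fmt is set to "" and 0 is returned. */
int split_format_prefix(const char *spec, char *fmt, size_t fmtlen,
                        const char **name)
{
    const char *sep;
    size_t n;

    assert(spec != NULL && fmt != NULL && name != NULL && fmtlen > 0);

    sep = strstr(spec, "::");
    if (sep == NULL || sep == spec) {
        /* No prefix: the whole string is the filename */
        fmt[0] = '\0';
        *name = spec;
        return 0;
    }

    n = (size_t) (sep - spec);
    if (n >= fmtlen)
        n = fmtlen - 1;
    memcpy(fmt, spec, n);
    fmt[n] = '\0';

    *name = sep + 2;   /* skip over the "::" separator */
    return 1;
}
```

Given a specification such as "gcg::mytable.cut" (a placeholder string), the sketch yields "gcg" as the format and "mytable.cut" as the filename. A string without "::" is treated as a bare filename, which corresponds to the case where every known format would be tried.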
Codon input and output formats are printed by the entrails utility using the ajCodPrintFormat function.
There are two analogous functions for writing codon usage table information to file, which differ in the type of file object (AjPFile and AjPOutfile) passed:

    void ajCodWrite    (AjPCod thys, AjPFile outf);
    void ajCodWriteOut (const AjPCod thys, AjPOutfile outf);

Usually codon usage table output files (AjPOutfile) are obtained from ACD file processing (see above), so ajCodWriteOut is called with the AjPOutfile corresponding to an outcodon: ACD data definition:

    AjPCod     cod    = NULL;
    AjPOutfile codout = NULL;

    cod    = ajAcdGetCodon("infile");
    codout = ajAcdGetOutcodon("outfile");

    ajCodWriteOut(cod, codout);

Where ajCodWrite is used it's necessary to handle the creation of the output file manually. Use ajFileNewOutNameC (or other functions) for doing so:

    AjPCod  cod    = NULL;
    AjPFile codout = NULL;

    /* cod must be instantiated and populated before it is written */
    ...

    /* Open a file called "OutputFileName" */
    codout = ajFileNewOutNameC("OutputFileName");

    ajCodWrite(cod, codout);
    ajFileClose(&codout);

The following elements of a codon usage table may be retrieved or set:
Name
Release
Description
Division
Species
Genetic code
Number of CDSs
Number of codons
The functions to get elements have Get in their name. Variants of these functions for returning C-type (char *) strings instead of AjPStr are available but not shown:

    const AjPStr ajCodGetName     (const AjPCod thys);  /* Name */
    const AjPStr ajCodGetRelease  (const AjPCod thys);  /* Release */
    const AjPStr ajCodGetDesc     (const AjPCod thys);  /* Description */
    const AjPStr ajCodGetDivision (const AjPCod thys);  /* Division */
    const AjPStr ajCodGetSpecies  (const AjPCod thys);  /* Species */
    ajint        ajCodGetCode     (const AjPCod thys);  /* Genetic code */
    ajint        ajCodGetNumcds   (const AjPCod thys);  /* Number of CDSs */
    ajint        ajCodGetNumcodon (const AjPCod thys);  /* Number of codons */

Additionally, ajCodGetCodonlist will write the codon triplets to a list of strings:

    void ajCodGetCodonlist (const AjPCod cod, AjPList list);

The functions to set elements have Set in their name. Again, variants for C-type strings are available:

    void ajCodSetDescS         (AjPCod thys, const AjPStr desc);      /* Description */
    void ajCodSetDivisionS     (AjPCod thys, const AjPStr division);  /* Division */
    void ajCodSetNameS         (AjPCod thys, const AjPStr name);      /* Name */
    void ajCodSetReleaseS      (AjPCod thys, const AjPStr release);   /* Release */
    void ajCodSetSpeciesS      (AjPCod thys, const AjPStr species);   /* Species */
    void ajCodSetCodenum       (AjPCod thys, ajint geneticcode);      /* Genetic code */
    void ajCodSetNumcds        (AjPCod thys, ajint numcds);           /* Number of CDSs */
    void ajCodSetNumcodons     (AjPCod thys, ajint numcodon);         /* Number of codons */
    void ajCodSetBacktranslate (AjPCod thys);                         /* Amino acid index used for back-translation */
    void ajCodSetTripletsS     (AjPCod thys, const AjPStr s, ajint *c); /* Codon counts taken from a sequence */

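Populating a table from a coding sequence amounts to counting triplets. The following standalone sketch shows one way this could be done. It does not use AJAX: the names base_value and count_triplets, the A, C, G, T index ordering, and the decision to skip ambiguous or incomplete triplets are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base to 0..3 (A, C, G, T/U); return -1 for any other
 * character, including ambiguity codes. The ordering is an assumption
 * made for this sketch. */
static int base_value(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': case 'U': case 'u': return 3;
    default: return -1;
    }
}

/* Add the codons of an in-frame coding sequence to counts[64].
 * Triplets containing an unrecognised base and an incomplete trailing
 * triplet are skipped. Returns the number of codons counted. */
int count_triplets(const char *seq, int counts[64])
{
    size_t len;
    size_t i;
    int n = 0;

    assert(seq != NULL && counts != NULL);

    len = strlen(seq);
    for (i = 0; i + 3 <= len; i += 3) {
        int b0 = base_value(seq[i]);
        int b1 = base_value(seq[i + 1]);
        int b2 = base_value(seq[i + 2]);

        if (b0 < 0 || b1 < 0 || b2 < 0)
            continue;               /* ambiguous triplet: not counted */

        counts[b0 * 16 + b1 * 4 + b2]++;
        n++;
    }
    return n;
}
```

The caller is expected to zero the counts array before the first call. Calling the function once per coding sequence accumulates counts across sequences.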
In case you want to reuse a codon usage table there are two functions for clearing the data:

    /* Zero all entries including the genetic codes (the amino acids for each codon) */
    void ajCodClear (AjPCod thys);

    /* Zero the name, number count and fraction codon elements. */
    void ajCodClearData (AjPCod thys);

Various properties of codon usage tables may be calculated including:
Codon adaptation index (see NAR 15:1281-1295)
Codon adaptation index W values (see NAR 15:1281-1295)
Gribskov statistic (count per thousand)
Effective number of codons (see Gene 87:23-29)
Fractional count and codons per thousand
Sequence composition
The functions are as follows:

    /* Codon adaptation index from a codon usage table */
    double ajCodCalcCaiCod (const AjPCod thys);

    /* Codon adaptation index for a coding sequence */
    double ajCodCalcCaiSeq (const AjPCod cod, const AjPStr str);

    /* Gribskov statistic (count per thousand) */
    void ajCodCalcGribskov (AjPCod thys, const AjPStr s);

    /* Effective number of codons */
    double ajCodCalcNc (const AjPCod thys);

    /* Fractional count and codons per thousand */
    void ajCodCalcUsage (AjPCod thys, ajint c);

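To make two of these quantities concrete, the standalone sketch below computes the fractional count, the codons per thousand, and the relative adaptiveness (W) value for each codon from raw counts. It is written against plain arrays rather than AjPCod, and the function names codon_usage and relative_adaptiveness are hypothetical. The definitions used (fraction within a synonymous group, count scaled to 1000 codons, count divided by the largest synonymous count) are the usual textbook ones and are assumed here, not taken from the AJAX source.

```c
#include <assert.h>
#include <stddef.h>

#define NCODONS 64

/* Fill fraction[] with each codon's share among the codons that encode
 * the same amino acid, and per_thousand[] with its count scaled to
 * 1000 codons. aa[i] is the one-letter amino acid for codon i. */
void codon_usage(const int count[NCODONS], const char aa[NCODONS],
                 double fraction[NCODONS], double per_thousand[NCODONS])
{
    long aa_total[256] = {0};
    long total = 0;
    int i;

    assert(count != NULL && aa != NULL);
    assert(fraction != NULL && per_thousand != NULL);

    for (i = 0; i < NCODONS; i++) {
        aa_total[(unsigned char) aa[i]] += count[i];
        total += count[i];
    }

    for (i = 0; i < NCODONS; i++) {
        long t = aa_total[(unsigned char) aa[i]];

        fraction[i]     = t     ? (double) count[i] / (double) t : 0.0;
        per_thousand[i] = total ? 1000.0 * count[i] / (double) total : 0.0;
    }
}

/* Fill w[] with the relative adaptiveness of each codon: its count
 * divided by the largest count among its synonymous codons. */
void relative_adaptiveness(const int count[NCODONS], const char aa[NCODONS],
                           double w[NCODONS])
{
    int aa_max[256] = {0};
    int i;

    assert(count != NULL && aa != NULL && w != NULL);

    for (i = 0; i < NCODONS; i++) {
        unsigned char a = (unsigned char) aa[i];

        if (count[i] > aa_max[a])
            aa_max[a] = count[i];
    }

    for (i = 0; i < NCODONS; i++) {
        int m = aa_max[(unsigned char) aa[i]];

        w[i] = m ? (double) count[i] / (double) m : 0.0;
    }
}
```

Amino acids with no observed codons receive zeros in this sketch, which avoids division by zero. How such cases should be treated in a real analysis is a separate design decision.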
There are functions to backtranslate a string representing an amino acid sequence. ajCodBacktranslate will perform a simple back translation whereas ajCodBacktranslateAmbig will backtranslate a string to a fully ambiguous nucleotide sequence:

    void ajCodBacktranslate      (AjPStr *b, const AjPStr a, const AjPCod thys);
    void ajCodBacktranslateAmbig (AjPStr *b, const AjPStr a, const AjPCod thys);

Before you call these functions you should first call ajCodSetBacktranslate. This initialises the codon usage object with the most commonly used triplet index for the amino acids.
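The idea of picking the most commonly used triplet for each amino acid and then back-translating with it can be sketched in standalone C. This is an illustration only, not the AJAX code: the names best_codons and backtranslate, the A, C, G, T index ordering, and the use of "NNN" for residues with no codon are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NCODONS 64

/* For every amino acid letter, record the index of its most frequent
 * codon in best[], or -1 if no codon encodes that letter. Ties keep
 * the lowest codon index. */
void best_codons(const int count[NCODONS], const char aa[NCODONS],
                 int best[256])
{
    int i;

    assert(count != NULL && aa != NULL && best != NULL);

    for (i = 0; i < 256; i++)
        best[i] = -1;

    for (i = 0; i < NCODONS; i++) {
        unsigned char a = (unsigned char) aa[i];

        if (best[a] < 0 || count[i] > count[best[a]])
            best[a] = i;
    }
}

/* Back-translate protein into out using the table built above.
 * out must have room for 3 * strlen(protein) + 1 characters.
 * Residues without a codon are written as "NNN".
 * Returns the number of residues translated. */
size_t backtranslate(const char *protein, const int best[256], char *out)
{
    static const char bases[] = "ACGT";
    size_t n;

    assert(protein != NULL && best != NULL && out != NULL);

    for (n = 0; protein[n] != '\0'; n++) {
        int idx = best[(unsigned char) protein[n]];
        char *p = out + 3 * n;

        if (idx < 0) {
            p[0] = p[1] = p[2] = 'N';
        } else {
            p[0] = bases[(idx >> 4) & 3];   /* first base  */
            p[1] = bases[(idx >> 2) & 3];   /* second base */
            p[2] = bases[idx & 3];          /* third base  */
        }
    }
    out[3 * n] = '\0';
    return n;
}
```

Building the lookup table once and reusing it for every sequence mirrors the two-step pattern described above, where the index is initialised before the back-translation functions are called.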
The miscellaneous functions include:

    /* Return one codon value given a possibly ambiguous base */
    ajint ajCodBase (ajint c);

    /* Convert triplet index to triplet */
    char* ajCodTriplet (ajint idx);

    /* Return a codon index given a three character codon (AjPStr) */
    ajint ajCodIndex (const AjPStr s);

    /* Return a codon index given a three character codon (C string) */
    ajint ajCodIndexC (const char *codon);

    /* Tests the output format for an outcodon ACD type */
    ajint ajCodOutFormat (const AjPStr name);

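The conversion between a triplet and a codon index can be illustrated with a standalone sketch. It treats a codon as a three-digit base-4 number with the bases ordered A, C, G, T. That ordering, and the names codon_to_index and index_to_codon, are assumptions for this example, and the indices the AJAX functions return may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Map one base to 0..3 (A, C, G, T/U); return -1 for anything else.
 * The ordering is an assumption made for this sketch. */
static int base_value(char b)
{
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': case 'U': case 'u': return 3;
    default: return -1;
    }
}

/* Convert a three-letter codon to an index in 0..63. Returns -1 if any
 * of the first three characters is not an unambiguous base, which also
 * covers strings shorter than three characters. */
int codon_to_index(const char *codon)
{
    int idx = 0;
    int i;

    assert(codon != NULL);

    for (i = 0; i < 3; i++) {
        int v = base_value(codon[i]);

        if (v < 0)
            return -1;
        idx = idx * 4 + v;
    }
    return idx;
}

/* Write the triplet for an index in 0..63 into out, NUL-terminated. */
void index_to_codon(int idx, char out[4])
{
    static const char bases[] = "ACGT";

    assert(idx >= 0 && idx < 64);

    out[0] = bases[(idx >> 4) & 3];
    out[1] = bases[(idx >> 2) & 3];
    out[2] = bases[idx & 3];
    out[3] = '\0';
}
```

Under these assumptions "AAA" maps to 0, "ATG" to 14 and "TTT" to 63, and index_to_codon reverses the mapping.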