Translation of a nucleotide sequence into a protein sequence is a common task, and AJAX provides all the basic functionality you would expect. The nucleotide sequence can be held in an AJAX string (AjPStr), a C-type string (char *) or an AJAX sequence object (AjPSeq) and can be translated in all reading frames. The reverse complement of a sequence can also be translated.
The AJAX library file for handling sequence translation is listed in the table (Table 6.11, “AJAX Library Files for Handling Sequence Translation”). Library file documentation, including a complete description of datatypes and functions, is available at:
http://emboss.open-bio.org/rel/dev/libs/
Library File Documentation | Description |
---|---|
ajtranslate | Sequence translation |
ajtranslate.h/c. Defines a sequence translation object (AjPTrn) and includes functions for handling sequence translation.
There is no dedicated ACD datatype for handling translation. Such operations are performed on a nucleotide sequence and so require a sequence input of the appropriate type. A genetic code is also required; the user usually chooses one from a menu implemented by a list ACD datatype. The ACD datatypes you'll require are therefore sequence (for the nucleotide input) and list (for the genetic code menu).
For general information on menu and sequence handling see:
Handling of ACD menus (Section 6.19, “Handling Menus”)
Handling of sequences (Section 6.7, “Handling Sequences”)
A typical ACD definition for single sequence input:
    sequence: sequence [
      parameter: "Y"
      type: "nucleotide"
    ]
The available genetic codes must be defined in the ACD file, and the list datatype may be used for this. EMBOSS supports a standard set of genetic codes, which are given as follows:
    list: table [
      additional: "Y"
      default: "0"
      minimum: "1"
      maximum: "1"
      header: "Genetic codes"
      values: "0:Standard; 1:Standard (with alternative initiation codons); 2:Vertebrate Mitochondrial; 3:Yeast Mitochondrial; 4:Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma; 5:Invertebrate Mitochondrial; 6:Ciliate Macronuclear and Dasycladacean; 9:Echinoderm Mitochondrial; 10:Euplotid Nuclear; 11:Bacterial; 12:Alternative Yeast Nuclear; 13:Ascidian Mitochondrial; 14:Flatworm Mitochondrial; 15:Blepharisma Macronuclear; 16:Chlorophycean Mitochondrial; 21:Trematode Mitochondrial; 22:Scenedesmus obliquus; 23:Thraustochytrium Mitochondrial"
      delimiter: ";"
      codedelimiter: ":"
      information: "Code to use"
    ]
The order of the codes is currently important; the list must be given in the exact order shown above.
For handling sequence translation, which requires sequence and menu input, use:
Datatypes and functions for handling translation via the ACD file are shown below (Table 6.12, “Datatypes and Function for Sequence Translation”). Here a single selection from the list is retrieved but other types of menu, input sequence or access methods could be used.
 | To read a sequence | To read a single selection from a list |
---|---|---|
ACD datatype | sequence | list |
Object | AjPSeq | AjPStr |
To retrieve from ACD | ajAcdGetSeq | ajAcdGetListSingle |
Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions, which return pointers to appropriate objects.
To retrieve the sequence or menu selection, object pointers are declared and then initialised using the appropriate ajAcdGet* function.
To retrieve an input sequence:
    AjPSeq seq = NULL;

    seq = ajAcdGetSeq("sequence");
The option selected from the list of genetic codes is required as an integer. ajAcdGetListSingle returns the selection as a string, the initial part of which is converted to an integer using ajStrToInt. This integer is passed to ajTrnNewI to create the translation table object, which is why the list order in the ACD file is important. You must also declare a translation object pointer:
    AjPTrn trnTable = NULL;   /* Translation object pointer */
    AjPStr gcode    = NULL;   /* Genetic code (selection from list) */
    ajint  n        = 0;      /* Selection */

    gcode = ajAcdGetListSingle("table");
    ajStrToInt(gcode, &n);
    trnTable = ajTrnNewI(n);
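The conversion step can be illustrated without AJAX. The following is a minimal sketch in plain C, assuming only that the selection string begins with the code number; `selection_to_int` is a hypothetical helper written for this illustration and is not part of the AJAX library, which provides ajStrToInt for this purpose.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of AJAX): parse the leading decimal
 * integer of a menu selection such as "11" or "11:Bacterial".
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int selection_to_int(const char *selection, int *out)
{
    char *end = NULL;
    long value;

    assert(selection != NULL && out != NULL);

    errno = 0;
    value = strtol(selection, &end, 10);

    if (end == selection)            /* no digits at the start */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                    /* does not fit in an int */

    *out = (int) value;
    return 1;
}
```

Checking the return value before using the number avoids silently falling back to code 0 when the selection is not numeric.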
It is your responsibility to free up memory at the end of the program. You must call the default destructor function for the translation, sequence and string objects used for the ACD data definitions:
    /* Deletes a translation table object */
    void ajTrnDel(AjPTrn* pthis);

    /* Delete a string object. */
    void ajStrDel (AjPStr *Pstr);

    /* Delete a sequence object. */
    void ajSeqDel (AjPSeq* Pseq);
The function ajTrnExit is called automatically on exit to clean up internal memory used for housekeeping of translation processing:
void ajTrnExit(void);
To use a translation object you must first instantiate the appropriate object pointer. Default constructor functions are provided. They read a translation data file called EGC.n from the EMBOSS data search directory (see the EMBOSS Users Guide), where n is the number of the genetic code to use. This number can be provided explicitly to ajTrnNewI. Alternatively a file can be opened by name by calling ajTrnNew or ajTrnNewC:
    /* Reads EGC.trnFileNameInt, where trnFileNameInt is supplied as a parameter. */
    AjPTrn ajTrnNewI (ajint trnFileNameInt);

    /* Reads trnFileName. */
    AjPTrn ajTrnNew (const AjPStr trnFileName);

    /* Reads trnFileName. */
    AjPTrn ajTrnNewC (const char *trnFileName);
All constructors return the address of a new object. The pointers do not need to be initialised to NULL but it is good practice to do so:
    AjPStr gcode    = NULL;
    AjPTrn trnTable = NULL;
    ajint  n        = 0;

    gcode = ajAcdGetListSingle("table");
    ajStrToInt(gcode, &n);
    trnTable = ajTrnNewI(n);
    /* The object is instantiated and ready for use */
Alternatively:
    AjPStr name     = NULL;
    AjPStr gcode    = NULL;
    AjPTrn trnTable = NULL;
    ajint  n        = 0;

    gcode = ajAcdGetListSingle("table");
    ajStrToInt(gcode, &n);

    name = ajStrNew();
    ajFmtPrintS(&name, "EGC.%d", n);   /* Create the string EGC.n */
    trnTable = ajTrnNew(name);
    /* The object is instantiated and ready for use */
For the examples above you must free a single string and sequence:
    AjPSeq seq   = NULL;
    AjPStr gcode = NULL;
    ajint  n     = 0;

    seq   = ajAcdGetSeq("sequence");
    gcode = ajAcdGetListSingle("table");
    ajStrToInt(gcode, &n);

    /* Do something */

    ajSeqDel(&seq);
    ajStrDel(&gcode);
You must free the memory for the translation object before the pointer is re-used and also once you are finished with it. A default destructor function is provided:
    /* Deletes a translation table object */
    void ajTrnDel(AjPTrn* pthis);
It is used as follows:
    AjPStr gcode    = NULL;
    AjPTrn trnTable = NULL;
    ajint  n        = 0;

    gcode = ajAcdGetListSingle("table");
    ajStrToInt(gcode, &n);
    trnTable = ajTrnNewI(n);
    /* The object is instantiated and ready for use */

    ajTrnDel(&trnTable);
    /* The memory is freed and the pointer reset to NULL, ready for re-use. */
ajTrnSeqSeqOrig creates a peptide sequence containing the full translation of a nucleotide sequence, including any trailing partial codon (1 or 2 bases), which translates to X unless the first 2 bases can only define one amino acid:
AjPSeq ajTrnSeqSeqOrig (const AjPTrn trnObj, const AjPSeq seq, ajint frame);
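The rule for a trailing partial codon can be sketched in plain C. This is an illustration only, not the AJAX implementation: it assumes the standard genetic code, hard-coded here rather than read from an EGC file, and `base_index` and `translate_partial` are hypothetical names. Two bases translate to a residue only if all four possible third bases give the same amino acid; anything else yields X.

```c
#include <assert.h>
#include <stddef.h>

/* Standard genetic code in TCAG order: index = 16*b1 + 4*b2 + b3. */
static const char STANDARD_CODE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Map a base to 0..3 (T, C, A, G); return -1 for anything else. */
int base_index(char base)
{
    switch (base) {
    case 'T': case 't': return 0;
    case 'C': case 'c': return 1;
    case 'A': case 'a': return 2;
    case 'G': case 'g': return 3;
    default:            return -1;
    }
}

/* Hypothetical helper: translate a trailing partial codon of len bases.
 * One base can never define a residue, so only len == 2 is examined. */
char translate_partial(const char *bases, size_t len)
{
    int b1, b2, k;
    char aa;

    assert(bases != NULL);

    if (len != 2)
        return 'X';

    b1 = base_index(bases[0]);
    b2 = base_index(bases[1]);
    if (b1 < 0 || b2 < 0)
        return 'X';

    /* Compare the residue for third base T against C, A and G. */
    aa = STANDARD_CODE[16 * b1 + 4 * b2];
    for (k = 1; k < 4; k++) {
        if (STANDARD_CODE[16 * b1 + 4 * b2 + k] != aa)
            return 'X';
    }
    return aa;
}
```

For example, GC can only be alanine (GCN), whereas AT may be isoleucine or methionine and so gives X.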
A nucleotide sequence held in an AJAX string (AjPStr), C-type string (char *) or AJAX sequence object (AjPSeq) can be translated into protein using:
    void ajTrnSeqS (const AjPTrn trnObj, const AjPStr str, AjPStr *pep);
    void ajTrnSeqC (const AjPTrn trnObj, const char *str, ajint len, AjPStr *pep);
    void ajTrnSeqSeq (const AjPTrn trnObj, const AjPSeq seq, AjPStr *pep);
These functions translate in frame 1 (from the first base) to the last full triplet codon. If there are 1 or 2 bases extra at the end then they are ignored.
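The frame 1 behaviour can be sketched in plain C as follows. This is not the AJAX code: it assumes the standard genetic code, hard-coded rather than read from an EGC file, and `base_index` and `translate_frame1` are hypothetical names. Unlike the AJAX functions, which append to an AjPStr, this sketch overwrites a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard genetic code in TCAG order: index = 16*b1 + 4*b2 + b3. */
static const char STANDARD_CODE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Map a base to 0..3 (T, C, A, G); return -1 for anything else. */
int base_index(char base)
{
    switch (base) {
    case 'T': case 't': return 0;
    case 'C': case 'c': return 1;
    case 'A': case 'a': return 2;
    case 'G': case 'g': return 3;
    default:            return -1;
    }
}

/* Hypothetical helper: translate seq in frame 1 into pep, which holds
 * pep_size bytes. Returns the number of residues written. */
size_t translate_frame1(const char *seq, char *pep, size_t pep_size)
{
    size_t ncodons, i;

    assert(seq != NULL && pep != NULL && pep_size > 0);

    ncodons = strlen(seq) / 3;          /* 1 or 2 trailing bases are ignored */
    if (ncodons > pep_size - 1)
        ncodons = pep_size - 1;         /* leave room for the terminator */

    for (i = 0; i < ncodons; i++) {
        int b1 = base_index(seq[3 * i]);
        int b2 = base_index(seq[3 * i + 1]);
        int b3 = base_index(seq[3 * i + 2]);

        /* Codons containing a non-TCAG character become X. */
        pep[i] = (b1 < 0 || b2 < 0 || b3 < 0)
                     ? 'X'
                     : STANDARD_CODE[16 * b1 + 4 * b2 + b3];
    }
    pep[ncodons] = '\0';
    return ncodons;
}
```

With the input ATGGCCTAAGG the three full codons ATG, GCC and TAA are translated and the trailing GG is ignored.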
To translate the reverse complement of a sequence call:
    void ajTrnSeqRevC (const AjPTrn trnObj, const char *str, ajint len, AjPStr *pep);
    void ajTrnSeqRevS (const AjPTrn trnObj, const AjPStr str, AjPStr *pep);
    void ajTrnSeqRevSeq (const AjPTrn trnObj, const AjPSeq seq, AjPStr *pep);
These functions translate in frame -1 (from the last base) to the first full triplet codon. If there are 1 or 2 bases extra at the start then they are ignored. All functions will append the translation to the input peptide.
Alternative translation is available for people who define frame '-1' as being the frame starting from the first base of a reverse-complemented sequence. To translate the reverse complement of a sequence call:
    void ajTrnSeqAltRevC (const AjPTrn trnObj, const char *str, ajint len, AjPStr *pep);
    void ajTrnSeqAltRevS (const AjPTrn trnObj, const AjPStr str, AjPStr *pep);
    void ajTrnSeqAltRevSeq (const AjPTrn trnObj, const AjPSeq seq, AjPStr *pep);
These functions translate in frame -4 (from the last base) to the first full triplet codon. If there are 1 or 2 bases extra at the start then they are ignored. All functions will append the translation to the input peptide.
The frame of translation may be specified:
    void ajTrnSeqFrameC (const AjPTrn trnObj, const char *seq, ajint len, ajint frame, AjPStr *pep);
    void ajTrnSeqFrameS (const AjPTrn trnObj, const AjPStr seq, ajint frame, AjPStr *pep);
    void ajTrnSeqFrameSeq (const AjPTrn trnObj, const AjPSeq seq, ajint frame, AjPStr *pep);
    AjPSeq ajTrnSeqFramePep (const AjPTrn trnObj, const AjPSeq seq, ajint frame);
The ajTrnSeqFrameC, ajTrnSeqFrameS and ajTrnSeqFrameSeq functions append the translation to the input peptide. In contrast, ajTrnSeqFramePep returns an AjPSeq object with the new peptide.
These functions translate in the specified frame (which must be one of 1,2,3,-1,-2,-3,4,5,6,-4,-5,-6) to the last full triplet codon, i.e. if there are 1 or 2 bases extra at the end, they are ignored. Frames -6 to -1 give translations in the reverse sense, frames 1 to 3 give normal forward translations. Frames 4 to 6 reverse complement the DNA sequence then reverse the peptide sequence. Frames 4 to 6 are therefore reversed protein sequences useful mainly for displaying beneath the original DNA sequence.
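The grouping of frame numbers described above can be summarised in a small plain C sketch. The names `FrameKind`, `classify_frame` and `forward_offset` are hypothetical and chosen for this illustration only; AJAX does not necessarily validate frames this way.

```c
#include <assert.h>

/* Hypothetical classification of the twelve permitted frame numbers. */
typedef enum {
    FRAME_INVALID,           /* 0, or outside -6..6 */
    FRAME_FORWARD,           /* 1..3: normal forward translation */
    FRAME_REVERSE,           /* -1..-6: translation in the reverse sense */
    FRAME_REVERSED_DISPLAY   /* 4..6: reverse translation, peptide reversed */
} FrameKind;

FrameKind classify_frame(int frame)
{
    if (frame >= 1 && frame <= 3)
        return FRAME_FORWARD;
    if (frame >= 4 && frame <= 6)
        return FRAME_REVERSED_DISPLAY;
    if (frame >= -6 && frame <= -1)
        return FRAME_REVERSE;
    return FRAME_INVALID;
}

/* Zero-based index of the first translated base for a forward frame:
 * frame 1 starts at the first base, frame 2 at the second, and so on. */
int forward_offset(int frame)
{
    assert(classify_frame(frame) == FRAME_FORWARD);
    return frame - 1;
}
```

A caller might use such a check to reject a frame of 0 or 7 before attempting any translation.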
Frame -1 is defined as the translation of the reverse complemented sequence which matches the codons used in frame 1. For example, in the sequence ACGT the first codon of frame 1 is ACG and the last codon of frame -1 is the reverse complement of ACG, i.e. CGT.
Frame -4 is defined as the translation of the reverse complement, starting the translation in the first codon of the reversed sequence. In the sequence ACGT, the last codon is CGT and so frame -4 translates from the reverse complement of CGT (i.e. ACG) - this is for those people who define frame -1 as using the first codon when the sequence is reverse-complemented. This is also known as the 'alternative frame -1'.
Frame -5 starts on the penultimate base. (Alternative frame -2). Frame -6 starts on the ante-penultimate base. (Alternative frame -3). Frame 4 is the same as frame -1, 5 is -2, 6 is -3.
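The difference between frame -1 and frame -4 comes down to which bases are left over. The following plain C sketch, based only on the definitions given above, returns the nucleotide string that would be read codon by codon in each case. The function names are hypothetical and the code does not call AJAX.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of a single upper-case base; unknown characters become N. */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
    }
}

/* Write the reverse complement of src[0..len) into dst (len + 1 bytes). */
void revcomp(const char *src, size_t len, char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++)
        dst[i] = complement(src[len - 1 - i]);
    dst[len] = '\0';
}

/* Frame -1: reverse complement of exactly the codons used by frame 1,
 * so 1 or 2 extra bases at the END of the forward sequence are dropped. */
void frame_minus1_bases(const char *seq, char *dst)
{
    size_t used = (strlen(seq) / 3) * 3;

    revcomp(seq, used, dst);
}

/* Frame -4 (alternative frame -1): codons are read from the first base
 * of the reverse complement, so 1 or 2 extra bases at the START of the
 * forward sequence are dropped. */
void frame_minus4_bases(const char *seq, char *dst)
{
    size_t len  = strlen(seq);
    size_t used = (len / 3) * 3;

    revcomp(seq + (len - used), used, dst);
}
```

For ACGT this gives CGT for frame -1 and ACG for frame -4, matching the example in the text. For a non-palindromic input such as ATGCC, frame -1 reads CAT while frame -4 reads GGC.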
To complete a translation by attempting to translate the last one or two bases of a frame, call:
    ajint ajTrnSeqDangleC (const AjPTrn trnObj, const char *seq, ajint frame, AjPStr *pep);
    ajint ajTrnSeqDangleS (const AjPTrn trnObj, const AjPStr seq, ajint frame, AjPStr *pep);
In both cases, the translation is appended to the input peptide.
There are functions to translate a single codon, or its reverse complement, into a one-letter amino acid code. Versions are provided for a codon held in an AJAX string (AjPStr) and in a C-type string (char *):
    /* Translates a codon held in an AJAX string into a 1-letter code. */
    char ajTrnCodonS (const AjPTrn trnObj, const AjPStr codon);

    /* Translates the reverse complement of a codon held in an AJAX string into a 1-letter code. */
    char ajTrnCodonRevS (const AjPTrn trnObj, const AjPStr codon);

    /* Translates a codon held in a C-type string into a 1-letter code. */
    char ajTrnCodonC (const AjPTrn trnObj, const char *codon);

    /* Translates the reverse complement of a codon held in a C-type string into a 1-letter code. */
    char ajTrnCodonRevC (const AjPTrn trnObj, const char *codon);
There are a couple of functions for retrieving the elements of a translation object:
    AjPStr ajTrnGetTitle (const AjPTrn thys);
    AjPStr ajTrnGetFilename (const AjPTrn thys);
To check whether the input codon is a start codon, a stop codon or something else, call:
ajint ajTrnCodonstrTypeC (const AjPTrn trnObj, const char *codon, char *aa);
ajint ajTrnCodonstrTypeS (const AjPTrn trnObj, const AjPStr codon, char *aa);
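The idea of classifying a codon can be sketched in plain C. This is an illustration under stated assumptions rather than the AJAX implementation: it assumes the standard genetic code with ATG as the only start codon and TAA, TAG and TGA as stop codons, accepts upper-case input only, and uses the hypothetical names `CodonType` and `codon_type`. The numeric values of the enumeration are also chosen for illustration and should not be taken as the documented return values of ajTrnCodonstrTypeC.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
typedef enum {
    CODON_STOP  = -1,
    CODON_OTHER = 0,
    CODON_START = 1
} CodonType;

/* Classify an upper-case three-base codon under the standard code. */
CodonType codon_type(const char *codon)
{
    assert(codon != NULL);

    if (strlen(codon) != 3)
        return CODON_OTHER;          /* not a full codon */

    if (strcmp(codon, "ATG") == 0)
        return CODON_START;

    if (strcmp(codon, "TAA") == 0 ||
        strcmp(codon, "TAG") == 0 ||
        strcmp(codon, "TGA") == 0)
        return CODON_STOP;

    return CODON_OTHER;
}
```

Other genetic codes use different start and stop codons, which is why the AJAX functions take the translation table object as their first argument.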
To return the genetic code description as a string for a given translation table file number, call:
const AjPStr ajTrnName(ajint trnFileNameInt);
To create a suitably named sequence object to hold a peptide translation call:
AjPSeq ajTrnNewPep(const AjPSeq nucleicSeq, ajint frame);
To read a translation data file (used internally when translation is initialised):
void ajTrnReadFile(AjPTrn trnObj, AjPFile trnFile);