The EMBASSY phylipnew package includes various applications for phylogenetic analysis.
A set of phylogenetic data types is available to mirror the input data types used by phylip, with automatic detection of the data formats (for example, distance matrix files). These are shown in the table (Table 6.19, “Phylogenetic datatypes”).
AJAX datatype | ACD datatype (for reading) | ACD datatype (for writing) |
---|---|---|
AjPPhyloDist distance matrix data | distances | outdistance |
AjPPhyloFreq frequency data | frequencies | outfreq |
AjPPhyloProp properties data | properties | outproperties |
AjPPhyloState state data | discretestates | outdiscrete |
AjPPhyloTree phylogenetic tree data | tree | outtree |
AJAX library files for handling phylogenetic data are listed in the table (Table 6.20, “AJAX Library Files for Handling Phylogenetic Data”). Library file documentation, including a complete description of datatypes and functions, is available at:
http://emboss.open-bio.org/rel/dev/libs/
Library File Documentation | Description |
---|---|
ajphylo | Data structures and functions for handling the phylipnew applications. |
ajnexus | Data structures and functions for parsing the NEXUS file format. |
ajphylo.h/c. Defines the objects and functions for handling phylogenetic data. These include:
Phylogeny distance matrix object (AjPPhyloDist)
Phylogeny frequencies object (AjPPhyloFreq)
Phylogeny properties object (AjPPhyloProp)
Phylogeny discrete state data object (AjPPhyloState)
Phylogeny tree object (AjPPhyloTree).
They also include static functions for handling phylogenetic data at a low level. You are unlikely to need these unless you plan to extend the phylogeny handling code.
ajnexus.h/c. Functions and objects (including static data structures and functions) for parsing the NEXUS file format. You are unlikely to need this library file. See the online library documentation for further information.
For handling phylogenetic data input files defined in the ACD file use:
AjPPhyloState*
Phylogeny discrete state data object (for discretestates ACD datatype).
AjPPhyloDist
Phylogeny distance matrix object (for distances ACD datatype).
AjPPhyloFreq
Phylogeny frequencies object (for frequencies ACD datatype).
AjPPhyloProp
Phylogeny properties object (for properties ACD datatype).
AjPPhyloTree*
Phylogeny tree object (for tree ACD datatype).
For handling phylogenetic data output files defined in the ACD file use:
AjPOutfile
Output file (for all phylogenetic output ACD datatypes).
The datatypes for handling phylogenetic data input are:
discretestates
Discrete states file.
distances
Distance matrix.
frequencies
Frequency value(s).
properties
Property value(s).
tree
Phylogenetic tree.
The datatypes for handling phylogenetic data output are:
outdiscrete
Output file for phylogenetics discrete characteristics data.
outdistance
Output file for phylogenetics distance matrix data.
outfreq
Output file for phylogenetics character frequency data.
outproperties
Output file for phylogenetics property data.
outtree
Output file for phylogenetic tree data.
Typical ACD definitions for phylogenetic data input and output are shown below.
Input of discrete states data:

    discretestates: discretestatesfile [
      parameter: "Y"
      characters: "01PB?"
      knowntype: "discrete states"
      information: "Phylip discrete states file"
    ]

Input of distances data:

    distances: distancesfile [
      parameter: "Y"
      knowntype: "distance matrix"
      information: "Phylip distance matrix file"
    ]

Input of properties data:

    properties: propertiesfile [
      characters: "01"
      length: "$(infile.discretelength)"
      knowntype: "ancestral states"
      information: "Phylip ancestral states file"
    ]

Input of tree data:

    tree: treefile [
      parameter: "Y"
      knowntype: "newick"
      information: "Phylip tree file (optional)"
    ]

Output of discrete states data:

    outdiscrete: outdiscretefile [
      parameter: "Y"
    ]

Output of properties data:

    outproperties: outpropertiesfile [
      parameter: "Y"
    ]
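The remaining output datatypes can presumably be defined in the same way. The fragment below is a sketch by analogy with the two output definitions above, not taken from a shipped ACD file. The data definition names outfreqfile and outtreefile are illustrative; outdistancefile is the token passed to ajAcdGetOutdistance in the retrieval code in this section. The three definitions are independent alternatives, and a real application would add attributes such as knowntype: and information: as appropriate.

```
# Output of distances data (sketch)
outdistance: outdistancefile [
  parameter: "Y"
]

# Output of frequencies data (sketch; name is illustrative)
outfreq: outfreqfile [
  parameter: "Y"
]

# Output of tree data (sketch; name is illustrative)
outtree: outtreefile [
  parameter: "Y"
]
```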
All data definitions for phylogenetic data input and output should have intuitive names. There are some general guidelines but currently no specific naming rules are enforced. See Appendix A, ACD Syntax Reference.
Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (Section A.4, “Global Attributes”).
parameter:
If the phylogenetic data is the primary input or output of an EMBOSS application then it should be defined as a parameter by using the global attribute parameter: "Y".
characters:
Specifies the allowed discrete state or property characters for a discretestates or properties object respectively.
knowntype:
This global attribute is typically specified for all the phylogenetic input and output types.
information:
A global attribute used for the user prompt and in the application documentation.
length:
Specifies the number of property values per set (properties datatype) or the number of frequency loci / values per set (frequencies datatype).
size:
Specifies the number of discrete state sets (discretestates datatype), the number of frequency sets (frequencies datatype) or the number of trees (tree datatype).
Various calculated attributes (Section A.6, “Calculated Attributes”) of the datatypes are available at the level of the ACD file.
Datatypes and functions for handling phylogenetic data via the ACD file are shown below (Table 6.21, “Datatypes and Functions for Phylogenetic Data Input and Output”).
ACD datatype | AJAX datatype | To retrieve from ACD |
---|---|---|
Phylogenetic Data Input | | |
discretestates | AjPPhyloState* | ajAcdGetDiscretestates |
distances | AjPPhyloDist | ajAcdGetDistances |
frequencies | AjPPhyloFreq | ajAcdGetFrequencies |
properties | AjPPhyloProp | ajAcdGetProperties |
tree | AjPPhyloTree* | ajAcdGetTree |
Phylogenetic Data Output | | |
outdiscrete | AjPOutfile | ajAcdGetOutdiscrete |
outdistance | AjPOutfile | ajAcdGetOutdistance |
outfreq | AjPOutfile | ajAcdGetOutfreq |
outproperties | AjPOutfile | ajAcdGetOutproperties |
outtree | AjPOutfile | ajAcdGetOuttree |
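The pairings in this table can also be written down as a small lookup, which may be handy for a script that checks that an ACD datatype is retrieved with the matching function. The sketch below is plain C and does not use the AJAX library. The names PhyloAcdEntry, phylo_acd_table and phyloAcdLookup are hypothetical and exist only for this illustration; the strings are copied from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: one row of the ACD datatype table. */
typedef struct {
    const char *acd_type;   /* ACD datatype name            */
    const char *ajax_type;  /* AJAX datatype returned       */
    const char *getter;     /* function to retrieve from ACD */
} PhyloAcdEntry;

/* Rows transcribed from the table; input types first, then output types. */
const PhyloAcdEntry phylo_acd_table[] = {
    {"discretestates", "AjPPhyloState*", "ajAcdGetDiscretestates"},
    {"distances",      "AjPPhyloDist",   "ajAcdGetDistances"},
    {"frequencies",    "AjPPhyloFreq",   "ajAcdGetFrequencies"},
    {"properties",     "AjPPhyloProp",   "ajAcdGetProperties"},
    {"tree",           "AjPPhyloTree*",  "ajAcdGetTree"},
    {"outdiscrete",    "AjPOutfile",     "ajAcdGetOutdiscrete"},
    {"outdistance",    "AjPOutfile",     "ajAcdGetOutdistance"},
    {"outfreq",        "AjPOutfile",     "ajAcdGetOutfreq"},
    {"outproperties",  "AjPOutfile",     "ajAcdGetOutproperties"},
    {"outtree",        "AjPOutfile",     "ajAcdGetOuttree"}
};

/* Return the row for an ACD datatype name, or NULL if it is not in the table. */
const PhyloAcdEntry *phyloAcdLookup(const char *acd_type)
{
    size_t i;
    size_t n = sizeof(phylo_acd_table) / sizeof(phylo_acd_table[0]);

    assert(acd_type != NULL);

    for (i = 0; i < n; i++) {
        if (strcmp(phylo_acd_table[i].acd_type, acd_type) == 0)
            return &phylo_acd_table[i];
    }
    return NULL;
}
```

In every row the function name is ajAcdGet followed by the ACD datatype name with its first letter capitalised, so the lookup is mostly a convenience for tools rather than something application code needs.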
Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions, which return pointers to appropriate objects.
To retrieve input phylogenetic data an object pointer is declared and then initialised using the appropriate ajAcdGet* function.

    AjPPhyloState *data = NULL;
    data = ajAcdGetDiscretestates("discretestatesfile");

To retrieve an output phylogenetic data stream an object pointer is declared and initialised using the appropriate ajAcdGet* function.

    AjPOutfile outfile = NULL;
    outfile = ajAcdGetOutdiscrete("outdiscretefile");

    AjPOutfile outfile = NULL;
    outfile = ajAcdGetOutdistance("outdistancefile");

    AjPOutfile outfile = NULL;
    outfile = ajAcdGetOutproperties("outpropertiesfile");

There are functions to retrieve a single (the first) state or tree object from file:

    AjPPhyloState ajAcdGetDiscretestatesSingle (const char *token);
    AjPPhyloTree ajAcdGetTreeSingle (const char *token);

Where these are used, it is still necessary to call the appropriate destructor function (see below) to ensure that the array of state or tree objects allocated during ACD file processing is freed.
Currently there are no functions for this.
It is your responsibility to close any files and free up memory at the end of the program.
To close an output phylogenetic data stream call ajOutfileClose with the address of the output file:

    ajOutfileClose(&outfile);

To use a phylogenetic data object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default constructor functions are:

    AjPPhyloDist  ajPhyloDistNew (void);
    AjPPhyloFreq  ajPhyloFreqNew (void);
    AjPPhyloProp  ajPhyloPropNew (void);
    AjPPhyloState ajPhyloStateNew (void);
    AjPPhyloTree  ajPhyloTreeNew (void);

You must free the memory for an object once you are finished with it. The default destructor functions are:

    void ajPhyloDistDel (AjPPhyloDist* pthis);
    void ajPhyloFreqDel (AjPPhyloFreq* pthis);
    void ajPhyloPropDel (AjPPhyloProp* pthis);
    void ajPhyloStateDel (AjPPhyloState* pthis);
    void ajPhyloTreeDel (AjPPhyloTree* pthis);

The default constructor and destructor functions are used as follows:

    AjPPhyloDist  dist  = NULL;
    AjPPhyloFreq  freq  = NULL;
    AjPPhyloProp  prop  = NULL;
    AjPPhyloState state = NULL;
    AjPPhyloTree  tree  = NULL;

    /* Call constructor functions */
    dist  = ajPhyloDistNew();
    freq  = ajPhyloFreqNew();
    prop  = ajPhyloPropNew();
    state = ajPhyloStateNew();
    tree  = ajPhyloTreeNew();

    /* Do something with instantiated objects */
    ...

    /* Call destructor functions */
    ajPhyloDistDel (&dist);
    ajPhyloFreqDel (&freq);
    ajPhyloPropDel (&prop);
    ajPhyloStateDel (&state);
    ajPhyloTreeDel (&tree);

There are two alternative destructor functions used to free arrays of state and tree objects:

    void ajPhyloStateDelarray(AjPPhyloState** pthis);
    void ajPhyloTreeDelarray(AjPPhyloTree** pthis);

They are used for state and tree objects instead of the default destructor to free memory from ACD file processing:

    AjPPhyloState* states = NULL;
    AjPPhyloTree*  trees  = NULL;

    /* The argument is the name of the data definition in the ACD file */
    states = ajAcdGetDiscretestates("discretestatesfile");
    trees  = ajAcdGetTree("treefile");

    /* Do something with objects */

    ajPhyloStateDelarray(&states);
    ajPhyloTreeDelarray(&trees);

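The difference between the default and the array destructors can be illustrated without the AJAX library. The sketch below is plain C using a hypothetical stand-in type (DemoState) and hypothetical functions (demoStateNew, demoStateDel, demoStateArrayNew, demoStateDelarray); it is not the AJAX implementation. It assumes the array is terminated by a NULL pointer, which is an assumption about how such arrays are laid out and should be checked against the library documentation. Both destructors take the address of the caller's pointer so that the pointer can be reset to NULL once the memory is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a phylogeny object (NOT the real AjPPhyloState). */
typedef struct DemoStateS {
    int id;
} DemoStateO, *DemoState;

/* Count of stand-in objects currently allocated, used to check for leaks. */
int demo_live = 0;

/* Default constructor: allocate a single object. */
DemoState demoStateNew(int id)
{
    DemoState s = malloc(sizeof(*s));

    assert(s != NULL);
    s->id = id;
    demo_live++;
    return s;
}

/* Default destructor: free one object and reset the caller's pointer. */
void demoStateDel(DemoState *pthis)
{
    if (!pthis || !*pthis)
        return;
    free(*pthis);
    *pthis = NULL;
    demo_live--;
}

/* Build an array of n objects followed by a terminating NULL (assumed layout). */
DemoState *demoStateArrayNew(int n)
{
    DemoState *arr;
    int i;

    assert(n >= 0);
    arr = malloc(((size_t)n + 1) * sizeof(*arr));
    assert(arr != NULL);
    for (i = 0; i < n; i++)
        arr[i] = demoStateNew(i);
    arr[n] = NULL;
    return arr;
}

/* Array destructor: free every element, then the array itself. */
void demoStateDelarray(DemoState **pthis)
{
    int i;

    if (!pthis || !*pthis)
        return;
    for (i = 0; (*pthis)[i]; i++)
        demoStateDel(&(*pthis)[i]);
    free(*pthis);
    *pthis = NULL;
}
```

Calling only the single-object destructor on the first element of such an array would leave the other elements and the array itself allocated, which is why the array destructor is the one to use for objects obtained from ACD processing.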
The functions for reading phylogenetic data directly from a file are:

    AjPPhyloDist*  ajPhyloDistRead (const AjPStr filename, ajint size, AjBool missing);
    AjPPhyloFreq   ajPhyloFreqRead (const AjPStr filename, AjBool contchar, AjBool genedata, AjBool indiv);
    AjPPhyloProp   ajPhyloPropRead (const AjPStr filename, const AjPStr propchars, ajint len, ajint size);
    AjPPhyloState* ajPhyloStateRead (const AjPStr filename, const AjPStr statechars);
    AjPPhyloTree*  ajPhyloTreeRead (const AjPStr filename, ajint size);

They are provided in case phylogenetic data needs to be processed outside the context of ACD file processing. See the online documentation for further information.
Currently there is a single function for retrieving an element of a phylogenetic data object. It returns the size of a properties object:
ajint ajPhyloPropGetSize (const AjPPhyloProp thys);
The trace functions report the elements of each phylip object to the debug file. The functions are:

    void ajPhyloDistTrace (const AjPPhyloDist thys);
    void ajPhyloFreqTrace (const AjPPhyloFreq thys);
    void ajPhyloPropTrace (const AjPPhyloProp thys);
    void ajPhyloStateTrace (const AjPPhyloState thys);
    void ajPhyloTreeTrace (const AjPPhyloTree thys);