A feature is a region of interest in a molecular sequence. Features include things like restriction enzyme cut sites, protein secondary structure prediction states, exon positions, regions of motif matches and so on. EMBOSS supports, for input and output, most of the common sequence feature formats (see the EMBOSS Users Guide) that were developed for the major sequence databases and for input of features into the genome databases. The name of a features file and the format of the features in the file are specified on the command line using a Uniform Feature Object (UFO) (see the EMBOSS Users Guide).
Many applications do not read and write features directly. Features are also used to store the results of sequence analysis, and can be written out as 'reports' where a report format defined through ACD is used to write out a feature table and (for some formats) the original sequence (see Section 6.15, “Handling Application Reports”).
Features are annotations of simple ranges in a sequence (start and end) or of a numbered group of features which have a 'join' (to combine exons in a coding sequence) or some other combination ('group' or 'one-of' in the EMBL/GenBank feature table). These complex features are stored as a parent feature with a set of simpler features for each component. Currently these are stored in the same feature table. In a future release these may become subfeatures to simplify sorting operations.
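The parent-plus-components layout described above can be pictured with a small standalone sketch. The `DemoPart` type, its fields and `demo_join_length` are hypothetical names invented for illustration; they are not AJAX datatypes or functions. The sketch assumes each component carries the group number of its parent and that the parent is flagged so that it is not counted twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one row of a feature table (not an AJAX type). */
typedef struct {
    unsigned int start;     /* 1-based start position */
    unsigned int end;       /* 1-based end position, inclusive */
    unsigned int group;     /* group number shared by parent and components */
    int          is_parent; /* non-zero for the parent (whole join) record */
} DemoPart;

/* Sum the lengths of the component features belonging to one group.
   The parent record is skipped because it spans the whole join. */
unsigned int demo_join_length(const DemoPart *table, size_t n,
                              unsigned int group)
{
    unsigned int total = 0;
    size_t i;

    assert(table != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (table[i].group != group || table[i].is_parent)
            continue;
        assert(table[i].start <= table[i].end);
        total += table[i].end - table[i].start + 1;
    }

    return total;
}
```

Because the parent and its components sit in the same table, a single pass filtered by group number is enough to recover the spliced length of a join.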
The feature types need to be standardised to allow interconversion of formats. EMBOSS uses a set of data files installed in the share/EMBOSS/data/ directory to define types and tag names for each input/output format, and for internal use. The master internal naming files are Efeatures.emboss and Etags.emboss for nucleotide features, and Efeatures.protein and Etags.protein for protein features. These include the files for the major feature format definitions so that most feature types and tags (where there is no clash between formats) will be stored and returned unchanged. For any type or tag that does not appear in these files, the first name defined is used as a default ('misc_feature' for nucleotide type, 'polypeptide region' for protein type, 'note' for a tag name).
A feature object is modelled on the GFF3 feature data format, where a feature is described by:
Start and end position
Name describing the feature
The strand direction (in a nucleic sequence)
A score
A feature object also holds data on:
A second start and end position for features where the start or end is wider than a single base or residue.
Source records the names of the program or database from which the features were derived.
The feature type (feature key) using an internal name derived from the Sequence Ontology (SO) and defined in the Efeatures.emboss (nucleotide) or Efeatures.protein (protein) data files, which include all EMBL/GenBank and UniProt feature types.
List of tag names and values which are defined in the Etags.emboss (nucleotide) or Etags.protein (protein) data files, which include all EMBL/GenBank or UniProt feature qualifiers.
Frame 1..3, -1..-3 for coding nucleotide features or 0 for non-coding or protein features.
Flag bit mask for EMBL location to record features between bases (11^12), types of join/order/one-of, and other attributes.
Group number for the individual exons and the parent of join/order/one-of feature locations in the EMBL/GenBank feature table.
Remote ID where the feature location (e.g. for an exon used by a join) is in another entry in the same database.
A label for the location of the feature in another entry.
Exon number.
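The core attributes listed above (position, type, strand, score and source) map naturally onto the nine tab-separated columns of a GFF3 line. The following is a minimal sketch of that mapping. `DemoFeature` and `demo_format_gff3` are hypothetical names, not part of AJAX, and the phase and attributes columns are simply written as '.' here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the core attributes of a feature.
   This is NOT the AJAX AjPFeature structure. */
typedef struct {
    const char *seqname; /* name of the annotated sequence */
    const char *source;  /* program or database the feature came from */
    const char *type;    /* feature key, e.g. "CDS" */
    int         start;   /* 1-based start position */
    int         end;     /* 1-based end position, inclusive */
    float       score;   /* analysis score */
    char        strand;  /* '+', '-' or '.' */
} DemoFeature;

/* Write one GFF3-style line (no trailing newline) into buf.
   Returns the number of characters snprintf would have written. */
int demo_format_gff3(const DemoFeature *f, char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    assert(f->start >= 1 && f->start <= f->end);

    /* columns: seqid source type start end score strand phase attributes */
    return snprintf(buf, len, "%s\t%s\t%s\t%d\t%d\t%.1f\t%c\t.\t.",
                    f->seqname, f->source, f->type,
                    f->start, f->end, f->score, f->strand);
}
```

In a real application the library writes feature tables for you; the sketch only shows how little information is needed to describe a simple feature.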
A feature table is simply a group of features and is stored in one of three contexts:
As part of a sequence file
As part of a database entry
As a raw feature table (a file that does not contain the sequence the features refer to)
Most feature table definitions have a controlled vocabulary, i.e. there is a specified list of feature key names and feature tag names that can be used. This means that you cannot edit feature tables to add in features with new keys. If you edit a feature table you must stick to the allowed set of feature keys.
'Named note' tags are a way to store feature tag names that are not in the allowed set. The default (note) feature tag is stored with a value that begins with '*name' followed by the value. This preserves the annotation in a readable form when features are written out using a standard format such as EMBL or GFF3.
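The named note convention can be illustrated with a short standalone sketch. It assumes that a single space separates the '*name' prefix from the value; that separator is an assumption made for this example. `demo_note_encode` and `demo_note_value` are hypothetical helpers, not AJAX functions (the library provides the ajFeatGetNote family for this purpose).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a note value of the form "*name value" in buf.
   Returns 0 on success or -1 if buf is too small. */
int demo_note_encode(const char *name, const char *value,
                     char *buf, size_t len)
{
    int n;

    assert(name != NULL && value != NULL && buf != NULL);

    n = snprintf(buf, len, "*%s %s", name, value);

    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* If note is a named note for 'name', return a pointer to its value
   (inside note); otherwise return NULL. */
const char *demo_note_value(const char *note, const char *name)
{
    size_t n;

    assert(note != NULL && name != NULL);

    n = strlen(name);

    if (note[0] != '*')
        return NULL;                  /* an ordinary note */
    if (strncmp(note + 1, name, n) != 0)
        return NULL;                  /* a different name */
    if (note[1 + n] != ' ')
        return NULL;                  /* name is only a prefix, or no value */

    return note + n + 2;
}
```

For example, encoding the name "evidence" with the value "predicted" yields the note text "*evidence predicted", which remains readable in any output format that supports a note tag.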
Features to be read or written by an application are defined in the application ACD file, although it is possible to create feature tables directly if this is required.
A set of command line qualifiers is available for features. These allow you to set such things as file name and format and the region of the sequence containing the features of interest. These qualifiers may be "hard-coded" as attributes in the ACD file (see Section A.5.2.8, “features” and Section A.5.3.2, “featout”).
AJAX provides comprehensive functionality for handling features including:
Features may be read and written directly as an alternative to ACD processing
Elements of the objects for handling features may be retrieved or set directly
Handling of feature tags
Querying the properties of features and feature tables
Processing of features and feature tables
AJAX library files for handling features are listed in the table (Table 6.13, “AJAX Library Files for Handling Features”). Library file documentation, including a complete description of datatypes and functions, is available at:
http://emboss.open-bio.org/rel/dev/libs/
Library File Documentation | Description |
---|---|
ajfeat | General sequence feature handling |
ajfeatdata | Feature datatypes |
ajfeat.h/c. Most of the functions you will ever need for general feature handling. They also contain static functions for handling features at a low level. You are unlikely to need these unless you plan to implement code to extend the core functionality of the library.
ajfeatdata.h/c. Basic feature objects (AjPFeattable, AjPFeature and AjPFeattabOut) for general use, e.g. retrieving features via ACD file processing. It also defines a feature input object (AjPFeattabIn) used for low level feature input handling.
A feature table (not individual features) may be specified for input or output in an ACD file.
The datatype for handling feature table input is:
features
Feature table input.
The datatype for handling feature table output is:
featout
Feature table output.
Features can also be read from an input sequence and written alongside an output sequence if the features: ACD attribute is set. If set then the sequence output will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
ACD datatypes for sequence input include:
ACD datatypes for sequence output include:
A typical ACD definition for feature input:
features: features [
  parameter: "Y"
  type: "protein"
]
A typical ACD definition for feature output:
featout: outfeat [
  parameter: "Y"
  type: "protein"
  multiple: "N"
]
A typical ACD definition for sequence input with features:
# single input sequence
sequence: sequence [
  parameter: "Y"
  type: "gapany"
  features: "Y"
]
A typical ACD definition for sequence output with features:
# single sequence
seqout: outseq [
  parameter: "Y"
  type: "gapany"
  features: "Y"
]
The use of the sequence datatype is for illustrative purposes; the other sequence input and output types could also have been given.
All data definitions for feature input and output should have standard parameter names. These include:
features for any feature inputs
outfeat for any outputs
Alternatives and variations (e.g. afeatures, bfeatures for multiple inputs) are allowed.
For more information see Appendix A, ACD Syntax Reference.
Attributes that are typically specified are summarised below. They are datatype-specific (Section A.5, “Datatype-specific Attributes”) unless they are indicated as being global attributes (Section A.4, “Global Attributes”).
parameter:
Features are typically the primary input or output of an EMBOSS application and, as such, should be defined as parameters by using the global attribute parameter: "Y".
type:
Specifies the type of the sequence (protein or nucleotide) to which the features pertain and is used for validation purposes.
multiple:
A boolean attribute that can be set for a featout data definition to specify the feature annotation is for multiple sequences.
offormat:
GFF format is used by default for the output feature(s). The format is normally set at the command line but a default may be hardcoded with the offormat: attribute. All common feature formats are supported (see the EMBOSS Users Guide).
For handling feature tables, including input feature tables defined in the ACD file, use:
AjPFeattable
Feature table which includes a list of AjPFeature objects (for the features ACD datatype).
For handling feature table outputs defined in the ACD file use:
AjPFeattabOut
Feature table output (for the featout ACD datatype).
There is also a basic object for handling individual features:
AjPFeature
Biological feature.
There is a datatype for low level feature input beyond that provided by the static datatypes in the various library files:
AjPFeattabIn
Low level feature table input.
You are unlikely to need AjPFeattabIn unless you plan to implement code to support new feature formats for EMBOSS. For advice on how to do this ask the EMBOSS developers.
All sequence objects can include a feature table. On input through an ACD datatype, features will be read if the features: attribute is true in the ACD definition.
In developing applications, feature tables are most likely to be used as part of a report output. A sequence is read, analysis results are generated as features, and both are output as a report format (see Section 6.15, “Handling Application Reports”).
Datatypes and functions for handling features via the ACD file are shown below (Table 6.14, “Datatypes and Functions for Feature Input and Output”).
 | To read features | To write features |
---|---|---|
ACD datatype | features | featout |
AJAX datatype | AjPFeattable | AjPFeattabOut |
To retrieve from ACD | ajAcdGetFeatures | ajAcdGetFeatout |
Your application code will call embInit to process the ACD file and command line (see Section 6.3, “Handling ACD Files”). All values from the ACD file are read into memory and files are opened as necessary. You have a handle on the files and memory through the ajAcdGet* family of functions which return pointers to appropriate objects.
To retrieve input features an object pointer is declared and then initialised using ajAcdGetFeatures:
AjPFeattable features=NULL;
features = ajAcdGetFeatures("features");
To retrieve an output feature stream an object pointer is declared and initialised using ajAcdGetFeatout:
AjPFeattabOut outfeat=NULL;
outfeat = ajAcdGetFeatout("outfeat");
The features input datatype has various inbuilt command line qualifiers including -fbegin and -fend which specify a start and end position for the features, and -freverse to reverse the orientation of nucleotide features.
When a feature table is read the feature values are held in the appropriate feature table object. Regardless of the range values you still get the entire table loaded into memory. The functions ajFeattableTrim or ajFeattableTrimOff are used to trim the features to the region defined by -fbegin and -fend:
AjPFeattable ftable=NULL;
ftable = ajAcdGetFeatures("features");
ajFeattableTrim(ftable);
/* ajFeattableTrimOff(ftable,0,ajFeattableGetLen(ftable)); */
When a sequence is read the feature values are held in a feature table in the appropriate sequence object. Regardless of the sequence range values you still get the entire sequence loaded into memory. The function ajSeqTrim (or ajSeqsetTrim for an AjPSeqset object) is used to trim the sequence and features to the region defined by -sbegin and -send:
AjPSeq seq=NULL;
seq = ajAcdGetSeq("sequence");
ajSeqTrim(seq);
It is your responsibility to free up memory at the end of the program. You must call the default destructor function (see below) on any objects returned by calls to ajAcdGet*.
Additionally you must call embExit to clean up internal memory including that allocated for the housekeeping of feature tables:
embExit();
To use a feature table object that is not defined in the ACD file you must first instantiate the appropriate object pointer. The default constructor function ajFeattableNew is provided for this. ajFeattableNew leaves the type of feature table uninitialised. It is set when a feature is added to the table. Type-specific functions ajFeattableNewDna and ajFeattableNewProt create feature tables for nucleotide and protein features respectively. Function ajFeattableNewSeq creates a feature table with the type and length matching a sequence object.
Feature table output objects are typically loaded from ACD file processing (see above). In the unlikely event you need to create one manually you can use the default constructor ajFeattabOutNew or functions ajFeattabOutNewCSF or ajFeattabOutNewSSF which use an existing output file and a specified type and sequence name.
To create a feature object (usually to be stored within a feature table), a similar set of functions is available. ajFeatNew creates a feature with all attributes including the type. ajFeatNewProt creates a feature with all attributes required by protein features (no strand or frame). ajFeatNewII and ajFeatNewIIRev are generic constructors requiring only the start and end values. The feature type will default to a "miscellaneous feature" or "polypeptide region" value. ajFeatNewFeat is a copy constructor making a feature object from an existing feature.
/* Feature Object */
AjPFeature ajFeatNew (AjPFeattable thys, const AjPStr source, const AjPStr type, ajint Start, ajint End, float score, char strand, ajint frame);
AjPFeature ajFeatNewProt (AjPFeattable thys, const AjPStr source, const AjPStr type, ajint Start, ajint End, float score);
AjPFeature ajFeatNewII (AjPFeattable thys, ajint Start, ajint End);
AjPFeature ajFeatNewIIRev (AjPFeattable thys, ajint Start, ajint End);
AjPFeature ajFeatNewFeat (const AjPFeature orig);

/* General Feature Table Object */
AjPFeattable ajFeattableNew (const AjPStr name);
AjPFeattable ajFeattableNewDna (const AjPStr name);
AjPFeattable ajFeattableNewProt (const AjPStr name);
AjPFeattable ajFeattableNewSeq (const AjPSeq seq);

/* Output Feature Table Object */
AjPFeattabOut ajFeattabOutNew (void);
AjPFeattabOut ajFeattabOutNewCSF (const char* fmt, const AjPStr name, const char* type, AjPFile file);
AjPFeattabOut ajFeattabOutNewSSF (const AjPStr fmt, const AjPStr name, const char* type, AjPFile file);
The parameters to ajFeatNew are as follows:
thys
Pointer to the feature table to which the new feature is added
source
Analysis basis for feature
type
Type of feature (e.g. exon)
Start
Start position of the feature
End
End position of the feature
score
Analysis score for the feature
strand
Strand of the feature
frame
Frame of the feature
All constructors return the address of a new object. The pointers do not need to be initialised to NULL but it is good practice to do so.
You must free the memory for an object once you are finished with it. The functions are:
void ajFeatDel (AjPFeature *pthis);
void ajFeattableDel (AjPFeattable *pthis);
void ajFeattabOutDel (AjPFeattabOut *pthis);
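Note that each destructor takes the address of the object pointer rather than the pointer itself. The standalone sketch below illustrates this idiom with a hypothetical `DemoObj` type; it is not AJAX code. It assumes, as a convention for this example, that the destructor resets the caller's pointer to NULL so that a stale pointer cannot be reused or freed twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type standing in for a feature. */
typedef struct {
    int start;
    int end;
} DemoObj;

/* Constructor: returns the address of a new object, or NULL on failure. */
DemoObj *demo_new(int start, int end)
{
    DemoObj *obj;

    assert(start <= end);

    obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;

    obj->start = start;
    obj->end   = end;

    return obj;
}

/* Destructor: frees the object and clears the caller's pointer.
   Calling it again on the same (now NULL) pointer is harmless. */
void demo_del(DemoObj **pthis)
{
    if (pthis == NULL || *pthis == NULL)
        return;

    free(*pthis);
    *pthis = NULL;
}
```

Initialising pointers to NULL and passing their address to the destructor keeps construction and clean-up symmetrical, which is why the examples in this section declare every object pointer with an explicit NULL.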
In the example below, a feature table and individual features are built manually using the default constructor functions. The features are written out to a feature table retrieved from ACD processing:
AjPFeattable feattable = NULL;
AjPStr name = NULL;
AjPStr source = NULL;
AjPStr type = NULL;
char strand = '+';
ajint frame = 0;
AjPFeature feature = NULL;
AjPFeattabOut output = NULL;
ajint i;
float score = 0.0;

/* Process the ACD file and retrieve the output feature table */
embInit("demofeatures", argc, argv);
output = ajAcdGetFeatout("outfeat");

/* Create an empty feature table named "seq1" */
ajStrAssignC(&name,"seq1");
feattable = ajFeattableNew(name);

ajStrAssignC(&source,"demofeature");
score = 1.0;

/* Add ten features, alternating between two feature types */
for(i=1;i<11;i++)
{
    if(i & 1)
        ajStrAssignC(&type,"CDS");
    else
        ajStrAssignC(&type,"misc_feature");

    feature = ajFeatNew(feattable, source, type, i, i+10, score, strand, frame);
}

ajFeattableWrite(output, feattable);

/* Free all objects, then clean up internal memory */
ajStrDel(&source);
ajStrDel(&name);
ajStrDel(&type);
ajFeattableDel(&feattable);
ajFeattabOutDel(&output);

embExit();
There are a variety of alternative ways to create a feature object. The start and end position of the features may be specified:
AjPFeature ajFeatNewII (AjPFeattable thys, ajint Start, ajint End);
AjPFeature ajFeatNewIIRev (AjPFeattable thys, ajint Start, ajint End);
AjPFeature ajFeatNewProt (AjPFeattable thys, const AjPStr source, const AjPStr type, ajint Start, ajint End, float score);
ajFeatNewIIRev sets features to be on the reverse strand whereas ajFeatNewProt is for protein features.
For cases where a copy of a feature is required that can be safely changed and/or deleted you can use ajFeatNewFeat:
AjPFeature ajFeatNewFeat (const AjPFeature orig);
There are a variety of alternative ways to create a feature table object, either by name or from an existing sequence object:
/* DNA feature table */
AjPFeattable ajFeattableNewDna (const AjPStr name);
/* Protein feature table */
AjPFeattable ajFeattableNewProt (const AjPStr name);
/* From existing sequence; type is determined by the sequence type. */
AjPFeattable ajFeattableNewSeq (const AjPSeq seq);
For cases where a copy of a feature table is required that can be safely changed and/or deleted you can use:
/* Copy whole feature table */
AjPFeattable ajFeattableNewFtable (const AjPFeattable orig);
/* Copy limited number of features */
AjPFeattable ajFeattableNewFtableLimit (const AjPFeattable orig, ajint limit);
A feature table may be retrieved from a sequence object using these functions (defined in ajseq.c):
AjPFeattable ajSeqGetFeatCopy (const AjPSeq thys);
const AjPFeattable ajSeqGetFeat (const AjPSeq thys);
To add a new feature (AjPFeature) to a feature table (AjPFeattable) call:
void ajFeattableAdd (AjPFeattable thys, AjPFeature feature);
To clear a feature table of all features call:
void ajFeattableClear (AjPFeattable thys);
To clear an output feature table of all features call:
void ajFeattabOutClear (AjPFeattabOut *thys);
Features may be read directly, using feature table input objects (these are the functions used by ACD processing):
AjPFeattable ajFeattableNewRead (AjPFeattabIn ftin);
AjPFeattable ajFeattableNewReadUfo (AjPFeattabIn tabin, const AjPStr Ufo);
ajFeattableNewReadUfo will parse a UFO, open an input file and read a feature table. ajFeattableNewRead is a generic interface function for reading in features from a feature table input object.
Features may be written directly, i.e. without using ACD processing (which uses either GFF3 format by default or the format defined by the environment variable EMBOSS_OUTFEATFORMAT). Features may be written in any format defined in the data structure FeatOOutFormat defined in ajfeat.c:
AjBool ajFeatOutFormatDefault (AjPStr* pformat);
AjBool ajFeattableWriteUfo (AjPFeattabOut tabout, const AjPFeattable thys, const AjPStr Ufo);
AjBool ajFeattableWrite (const AjPFeattable ft, AjPFeattabOut ftout);
AjBool ajFeattableWrite (AjPFeattable thys, const AjPStr ufo);
ajFeatOutFormatDefault sets the default output format which is "gff" unless the EMBOSS_OUTFEATFORMAT variable is defined or a format is passed in the pformat parameter.
ajFeattableWriteUfo and ajFeattableWrite are the writing equivalents of ajFeattableNewReadUfo and ajFeattableNewRead. ajFeattableWriteUfo will parse a UFO, open an output file and write a feature table to it. ajFeattableWrite is a generic interface function for writing features to a file given the file handle, class of map, data format of output and possibly other associated data.
The following functions write the feature table in the indicated format:
/* DDBJ format */
AjBool ajFeattableWriteDdbj (const AjPFeattable features, AjPFile file);
/* EMBL format */
AjBool ajFeattableWriteEmbl (const AjPFeattable features, AjPFile file);
/* GenBank format */
AjBool ajFeattableWriteGenbank (const AjPFeattable features, AjPFile file);
/* GFF version 2 format */
AjBool ajFeattableWriteGff2 (const AjPFeattable features, AjPFile file);
/* GFF version 3 format */
AjBool ajFeattableWriteGff3 (const AjPFeattable features, AjPFile file);
/* PIR format */
AjBool ajFeattableWritePir (const AjPFeattable features, AjPFile file);
/* SwissProt format */
AjBool ajFeattableWriteSwiss (const AjPFeattable features, AjPFile file);
Feature tables may be written to an application report using the following functions defined in ajreport.h/c:
void ajReportSetType (AjPReport thys, const AjPFeattable ftable, const AjPSeq seq);
AjBool ajReportWrite (AjPReport thys, const AjPFeattable ftable, const AjPSeq seq);
void ajReportWriteHeader (AjPReport thys, const AjPFeattable ftable, const AjPSeq seq);
void ajReportWriteTail (AjPReport thys, const AjPFeattable ftable, const AjPSeq seq);
For more information on reports, see Section 6.15, “Handling Application Reports”.
Functions described here are for manipulating an output feature table object.
To open the output file call:
AjBool ajFeattabOutOpen (AjPFeattabOut thys, const AjPStr ufo);
Elements of the output feature table object may be retrieved and queried using:
AjPFile ajFeattabOutFile (const AjPFeattabOut thys);
AjPStr ajFeattabOutFilename (const AjPFeattabOut thys);
/* These functions are used internally to test whether the output file has been opened and used */
AjBool ajFeattabOutIsLocal (const AjPFeattabOut thys);
AjBool ajFeattabOutIsOpen (const AjPFeattabOut thys);
Elements of an output feature table are set with:
/* sets the UFO (format and filename) for feature output */
AjBool ajFeattabOutSet (AjPFeattabOut thys, const AjPStr ufo);
/* sets the base file name (.format) for feature output */
void ajFeattabOutSetBasename (AjPFeattabOut thys, const AjPStr basename);
/* sets the feature table type 'any', 'N' 'nucleotide' or 'P' 'protein' */
AjBool ajFeattabOutSetType (AjPFeattabOut thys, const AjPStr type);
AjBool ajFeattabOutSetTypeC (AjPFeattabOut thys, const char* type);
The elements of a feature object may be retrieved using the following:
/* End position */
ajuint ajFeatGetEnd (const AjPFeature thys);
/* Direction (ajTrue for a forward direction, ajFalse for reverse) */
AjBool ajFeatGetForward (const AjPFeature thys);
/* Reading frame */
ajint ajFeatGetFrame (const AjPFeature thys);
/* Sequence length */
ajuint ajFeatGetLength (const AjPFeature thys);
/* Finds a named note tag (a general tag value with a *name prefix) */
AjBool ajFeatGetNoteS (const AjPFeature thys, AjPStr* val, const AjPStr name);
AjBool ajFeatGetNoteSI (const AjPFeature thys, AjPStr* val, const AjPStr name, ajint count);
AjBool ajFeatGetNoteC (const AjPFeature thys, AjPStr* val, const char* name);
AjBool ajFeatGetNoteCI (const AjPFeature thys, AjPStr* val, const char* name, ajint count);
/* Returns the nth value of a named feature tag. If not found as a tag, also searches for a named note tag */
AjBool ajFeatGetTagC (const AjPFeature thys, const char* tname, ajint num, AjPStr* Pval);
AjBool ajFeatGetTagS (const AjPFeature thys, const AjPStr name, ajint num, AjPStr* val);
/* Score */
float ajFeatGetScore (const AjPFeature thys);
/* Source name */
const AjPStr ajFeatGetSource (const AjPFeature thys);
/* Start position */
ajuint ajFeatGetStart (const AjPFeature thys);
/* Strand */
char ajFeatGetStrand (const AjPFeature thys);
/* Returns the type (key) of a feature object. */
const AjPStr ajFeatGetType (const AjPFeature thys);
Note that ajFeatGetType returns a copy of the pointer to the type (key) of the specified feature object. The key is still owned by the feature and should not be destroyed.
The elements of a feature table object may be retrieved using the following:
/* Returns the feature table start position, or 1 if no start has been set. */
ajint ajFeattableGetBegin (const AjPFeattable thys);
/* Returns the feature table end position, or the feature table length if no end has been set. */
ajint ajFeattableGetEnd (const AjPFeattable thys);
/* Returns the name of a feature table object. */
const AjPStr ajFeattableGetName (const AjPFeattable thys);
const char* ajFeattableGetTypeC (const AjPFeattable thys);
const AjPStr ajFeattableGetTypeS (const AjPFeattable thys);
/* Returns the sequence length of a feature table */
ajint ajFeattableGetLen (const AjPFeattable thys);
/* Returns the number of features */
ajint ajFeattableGetSize (const AjPFeattable thys);
ajFeattableGetName, ajFeattableGetTypeC and ajFeattableGetTypeS return a copy of the pointer to the name or type (key). This is still owned by the feature table and so should not be destroyed.
The elements (indicated in comments) of a feature object may be set using the following:
/* Description */
void ajFeatSetDesc (AjPFeature thys, const AjPStr desc);
/* Append to description */
void ajFeatSetDescApp (AjPFeature thys, const AjPStr desc);
/* Score */
void ajFeatSetScore (AjPFeature thys, float score);
/* Strand */
void ajFeatSetStrand (AjPFeature thys, AjBool rev);
The elements of a feature table object may be set using the following:
/* Name */
void ajFeattableSetDefname (AjPFeattable thys, const AjPStr setname);
/* Sequence length */
void ajFeattableSetLength (AjPFeattable thys, ajuint len);
/* Type to nucleotide */
void ajFeattableSetNuc (AjPFeattable thys);
/* Type to protein */
void ajFeattableSetProt (AjPFeattable thys);
/* Begin and end range */
void ajFeattableSetRange (AjPFeattable thys, ajint fbegin, ajint fend);
ajFeattableSetDefname provides a unique name for the current program run for a feature table.
Feature tags (names and values) are stored as pairs in a list. Tags can be returned as arrays or iterated over. When values are added (annotating a feature) they are usually simply defined as a name and value pair which replaces any existing value with the same tag name. Some tag names allow multiple values (for example the EMBL/GenBank feature 'note' used for general annotation and for 'named note tags' with a '*name' prefix to the value). These can be added as extra values using the ajFeatTagAdd functions. The tag values are validated against the most recent EMBL/GenBank feature table documentation. Warning messages are generated if variable EMBOSS_FEATWARN is set true, but turned off by default to avoid excessive warnings on data from other sources. Functions for handling feature tags include:
/* Sets a feature tag value, creating a new feature tag even if one already exists. */
AjBool ajFeatTagAdd (AjPFeature thys, const AjPStr tag, const AjPStr value);
AjBool ajFeatTagAddC (AjPFeature thys, const char* tag, const AjPStr value);
AjBool ajFeatTagAddCC (AjPFeature thys, const char* tag, const char* value);
/* Sets a feature tag value */
AjBool ajFeatTagSet (AjPFeature thys, const AjPStr tag, const AjPStr value);
AjBool ajFeatTagSetC (AjPFeature thys, const char* tag, const AjPStr value);
/* Returns an iterator over all feature tag-value pairs */
AjIList ajFeatTagIter (const AjPFeature thys);
/* Returns the tag-value pairs of a feature object */
AjBool ajFeatTagval (AjIList iter, AjPStr* tagnam, AjPStr* tagval);
/* Traces (to the debug file) the tag-value pairs of a feature object */
void ajFeatTagTrace (const AjPFeature thys);
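The difference between setting a tag (replace any existing value) and adding a tag (always append a new pair) can be shown with a small standalone model of a tag list. Everything here is hypothetical and for illustration only: `DemoTagList`, `demo_tag_add` and `demo_tag_set` are not AJAX names, the list is a fixed-size array rather than an AJAX list, and the strings are borrowed rather than copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXTAGS 16

/* Hypothetical tag-value pair and fixed-size tag list. */
typedef struct {
    const char *name;
    const char *value;
} DemoTag;

typedef struct {
    DemoTag tags[DEMO_MAXTAGS];
    size_t  n;                  /* number of pairs in use */
} DemoTagList;

/* Add: always appends a new pair, even if the name already exists.
   Returns 1 on success or 0 if the list is full. */
int demo_tag_add(DemoTagList *list, const char *name, const char *value)
{
    assert(list != NULL && name != NULL && value != NULL);

    if (list->n >= DEMO_MAXTAGS)
        return 0;

    list->tags[list->n].name  = name;
    list->tags[list->n].value = value;
    list->n++;

    return 1;
}

/* Set: replaces the value of the first pair with the same name,
   or appends a new pair if the name is not present. */
int demo_tag_set(DemoTagList *list, const char *name, const char *value)
{
    size_t i;

    assert(list != NULL && name != NULL && value != NULL);

    for (i = 0; i < list->n; i++) {
        if (strcmp(list->tags[i].name, name) == 0) {
            list->tags[i].value = value;
            return 1;
        }
    }

    return demo_tag_add(list, name, value);
}
```

Setting the same name twice leaves a single pair holding the later value, whereas adding a multi-valued name such as 'note' twice leaves two pairs.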
Functions for handling the FeattabIn object are available but not covered here as you will not normally need to use this object.
Functions are available to examine complex feature locations to process joins and their child (exon) features. Functions are provided to test whether a feature has a remote id defined (the feature refers to another sequence) and to test that the base range is within the range required for processing or for output. The properties of features may be queried using the following:
/* Tests whether the feature is a child member of a join */
AjBool ajFeatIsChild (const AjPFeature gf);
/* Tests whether the feature is a member of a complement around a multiple (join, etc.) */
AjBool ajFeatIsCompMult (const AjPFeature gf);
/* Tests whether the feature is a member of a join, group order or one_of */
AjBool ajFeatIsMultiple (const AjPFeature gf);
/* Checks whether the feature is in another (remote id) sequence */
AjBool ajFeatIsLocal (const AjPFeature gf);
/* ... and tests the location is within a specified range */
AjBool ajFeatIsLocalRange (const AjPFeature gf, ajuint start, ajuint end);
The type (nucleotide or protein) of a feature table may be queried using:
/* Returns ajTrue if nucleotide */
AjBool ajFeattableIsNuc (const AjPFeattable thys);
/* Returns ajTrue if protein */
AjBool ajFeattableIsProt (const AjPFeattable thys);
There are a couple of functions for processing features:
void ajFeatReverse (AjPFeature thys, ajint ilen);
AjBool ajFeatTrimOffRange (AjPFeature ft, ajuint ioffset, ajuint begin, ajuint end, AjBool dobegin, AjBool doend);
ajFeatReverse will reverse a feature by reversing all positions and strand data.
ajFeatTrimOffRange trims a feature using the specified begin and end values. ajFeatTrimOffRange is called where a sequence has been trimmed, so it is necessary to specify any missing sequence positions at the start (ioffset).
There are a few functions for processing whole feature tables. All features in a feature table may be reversed or trimmed using:
/* Reverse the features in a feature table by iterating through and reversing all positions and strands. */
void ajFeattableReverse (AjPFeattable thys);
/* Trim a feature table using the Begin and Ends. */
AjBool ajFeattableTrimOff (AjPFeattable thys, ajuint ioffset, ajuint ilen);
There are functions to convert a position (start or end value) in a feature table into a true position in the source sequence, using any offset information from trimming the feature table within a set range:
ajuint ajFeattablePos (const AjPFeattable thys, ajint ipos);
ajuint ajFeattablePosI (const AjPFeattable thys, ajuint imin, ajint ipos);
ajuint ajFeattablePosII (ajuint ilen, ajuint imin, ajint ipos);
If ipos is negative, it is counted from the end of the string rather than the beginning. For strings the result can go off the end to the terminating NULL. For sequences the maximum is the last base.
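The handling of negative positions can be sketched as follows. This is a standalone illustration, not the AJAX implementation: `demo_pos` is a hypothetical function, and the treatment of zero (mapped to position 1) and the clamping to the range imin to ilen are assumptions made for this example.

```c
#include <assert.h>

/* Convert a signed position into a true 1-based position in a
   sequence of length ilen, never returning less than imin.
   Negative values count back from the end: -1 is the last base. */
unsigned int demo_pos(unsigned int ilen, unsigned int imin, int ipos)
{
    long pos;

    assert(ilen >= 1 && imin >= 1 && imin <= ilen);

    if (ipos < 0)
        pos = (long) ilen + ipos + 1; /* count from the end */
    else if (ipos == 0)
        pos = 1;                      /* assumption: 0 means the start */
    else
        pos = ipos;

    if (pos < (long) imin)
        pos = (long) imin;            /* clamp to the minimum */
    if (pos > (long) ilen)
        pos = (long) ilen;            /* the maximum is the last base */

    return (unsigned int) pos;
}
```

For a sequence of length 100, a position of -1 resolves to 100 and -10 resolves to 91, while a position of 200 is clamped to the last base.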
Finally, features in a feature table may be sorted using:
/* End position */
void ajFeatSortByEnd (AjPFeattable Feattab);
/* Start position */
void ajFeatSortByStart (AjPFeattable Feattab);
/* Type */
void ajFeatSortByType (AjPFeattable Feattab);
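As an illustration of what sorting by start position involves, the standalone sketch below orders an array of simple records with the standard C qsort function. `DemoFeat`, `demo_cmp_start` and `demo_sort_by_start` are hypothetical names, not AJAX code, and breaking ties on the end position is an assumption made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical feature record (not an AJAX datatype). */
typedef struct {
    unsigned int start; /* 1-based start position */
    unsigned int end;   /* 1-based end position, inclusive */
    const char  *type;  /* feature key */
} DemoFeat;

/* qsort comparator: ascending start, ties broken by ascending end. */
int demo_cmp_start(const void *a, const void *b)
{
    const DemoFeat *fa = a;
    const DemoFeat *fb = b;

    if (fa->start != fb->start)
        return fa->start < fb->start ? -1 : 1;
    if (fa->end != fb->end)
        return fa->end < fb->end ? -1 : 1;

    return 0;
}

/* Sort an array of n features in place by start position. */
void demo_sort_by_start(DemoFeat *feats, size_t n)
{
    assert(feats != NULL || n == 0);

    if (n > 1)
        qsort(feats, n, sizeof *feats, demo_cmp_start);
}
```

Sorting by end position or by type follows the same pattern with a different comparator.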