A hash table is a data structure holding key:value pairs which allows for very efficient retrieval of a value given a key. For example, a key might be the accession number of a sequence and a value might be the sequence itself. Hash tables work by translating the key into a number called a "hash" using a hashing function. The hash then provides the index of an array element or "bucket" in the data structure holding the value. Hash tables have the great advantage that the search time is independent of the number of items stored, and that new key:value pairs can be added efficiently.
EMBOSS supports a standard hash table using strings for keys and generic hash tables with keys of arbitrary datatype. All the basic functionality you would expect for creating, manipulating and using hash tables is provided. This includes:
Hash functions for the construction of string hash functions
Functions to add and remove key:value pairs or change the value associated with a key
Functions to return the true key or value associated with a given key
Ability to apply a function to each key:value pair
Printing a string hash table
There is no ACD datatype for hash tables, they are always created directly from within the C source code.
AJAX library files for handling hash tables are listed in the table (Table 6.31, “AJAX Library Files for Handling Tables”). Library file documentation, including a complete description of datatypes and functions with usage notes is available at:
http://emboss.open-bio.org/rel/dev/libs/ |
Library File Documentation | Description |
---|---|
ajtable | Hash table functions |
ajtable.h/c
. Defines the hash table object (AjPTable
) and functions for handling hash tables. They also contains static functions for handling same at a low level. You are unlikely to need the static functions unless you plan to extend the hash table code.
For handling tables use:
AjPTable
Hash table - a set of key/value pairs using a simple hash function to provide rapid searching for keys. Tables can hold any data type.
To use a hash table object you must first instantiate the appropriate object pointer. Default constructor functions are provided for generic hash tables and EMBOSS string (AjPStr
) hash tables:
/* Generic hash table */ AjPTable ajTableNewLen(ajuint size) /* Generic hash table with compare and hash functions */ AjPTable ajTableNewFunctionLen (ajint n, ajint cmp(const void* x, const void* y), unsigned hash(const void* key, unsigned hashsize)); /* String hash table */ AjPTable ajTablestrNew (ajint n); AjPTable ajTablestrNewLen (ajint n); AjPTable ajTablestrNewCaseLen (ajint n);
Both functions will create, initialise and return a new empty hash table that can hold an arbitrary number of key:value pairs. For a generic hash table, created with ajTableNewFunctionLen
, an estimate of the number of unique keys (n
) is passed, and functions for comparison (cmp
) and for hashing the keys (hash
) should be provided. If either of the cmp
or hash
arguments are NULL
then it will use a default general-purpose hash function. The ajTableNewLen
uses the general-purpose functions. Note that these cannot inspect the data, and can only use the addresses or values provided.
All constructors return the address of a new object. In the following code comp
is the comparison function and hash
is the hash function. An initial size of 50 is used. The pointer does not need to be initialised to NULL
but it is good practice to do so:
AjPTable table = NULL; table = ajTableNewFunctionLen(50, comp, hash); /* The table is instantiated and ready for use */
The comparison function must return the result of comparing two keys as an integer which is 0
if the comparison is equal, +1
if the x
is higher than y
or -1
otherwise. The hash function must return the hash value for a given key.
Assume that you want a hash table with a case-insensitive C-type character string (char *
) key. The comparison function may then be defined as ajCharCmpCase
(available from ajstr.h
) which returns the sort order of two text strings:
ajint comp(const void* x, const void* y) { const char* sx; const char* sy; sx = (const char*) x; sy = (const char*) y; return (ajint) ajCharCmpCase(sx, sy); }
The following hash function may be used:
ajuint hash(const void* key, ajuint hashsize) { ajuint hash; const char* s; s = (const char*) key; for(hash = 0; *s; s++) hash = (hash * 127 + toupper((ajint)*s)) % hashsize; return hash; }
You must free the memory for an object once your are finished with it. A default destructor function is provided for generic and string hash tables:
void ajTableFree (AjPTable* Ptable); void ajTablestrFree (AjPTable* Ptable);
It is the responsibility of the calling function to destroy any objects once they are finished with:
AjPTable table = NULL; table = ajTableNewFunctionLen(50, comp, hash); /* Do something with the instantiated table */ ajTableFree(&table); /* The memory is freed and the pointer reset to NULL, ready for re-use. */ table = ajTablestrNew(50); /* The pointer variable is reallocated. Do something else with the new table. */ ajTableFree(&table); /* Done with the object so the memory is freed. */
A few table hash functions used during hash table construction (see above) are provided for convenience for string tables only. These provide case-sensitive and case-insensitive hashing for hash tables of EMBOSS strings (AjPStr
) and C-type (char *
) strings:
/* Case-insensitive hashing of EMBOSS strings */ unsigned ajTablestrHash (const void* key, unsigned hashsize); /* Case-sensitive hashing of EMBOSS strings */ unsigned ajTablestrHashCase (const void* key, unsigned hashsize);
A few table comparison functions used during hash table construction (see above) are provided for convenience for string tables only. These provide case-sensitive and case-insensitive comparison for hash tables of EMBOSS strings (AjPStr
):
/* Case-sensitive comparison of EMBOSS strings */ ajint ajTablestrCmp (const void* ptrx, const void* ptry); /* Case-insensitive comparison of EMBOSS strings */ ajint ajTablestrCmpCase (const void* ptrx, const void* ptry);
Key:value pairs may be added to a table, removed from a table or the value associated with a key may be changed.
ajTablePut
is used to set the value associated with a given key in a hash table. It will return the previous value or, if the table does not hold the specified key, it will add the key and value and return NULL
.
void* ajTablePut (AjPTable table, const void* key, void* value);
ajTableRemove
will remove the key:value pair from a table and returns the removed value. If the table does not hold the specified key then the function merely returns NULL
:
void* ajTableRemove (AjPTable table, const void* key);
The true key or value associated with a given key may be returned by using:
/* Return the value */ void* ajTableFetch (const AjPTable table, const void* key); /* Return the key */ void* ajTableFetchKey (const AjPTable table, const void* key);
ajTableFetch
is the standard function that returns a value given a query key.
ajTableFetchKey
is provided to return the true key (with any capitalisation) in cases where a case-insensitive hash table is used.
To return the number of key:value pairs in a hash table use:
ajint ajTableGetLength (const AjPTable table);
It is possible to apply a function to each key:value in a hash table using:
void ajTableMap (AjPTable table, void apply(const void* key, void** Pvalue, void* ptr), void* ptr); void ajTableMapDel (AjPTable table, void apply(const void** Pkey, void** Pvalue, void* ptr), void* ptr);
The function called by ajTableMap
should not delete any keys (although values can be modified) whereas the function called by ajTableMapDel
may delete keys. In both cases a function must be provided.
Assume that you have a hash table with EMBOSS string keys and integer values and that you want to apply one function to print the key:value pair and another to free a key:value-pair.
The print functions will look something like:
static void print(const void* key, void** value, void* cl) { AjPStr keystr; ajint *valptr; keystr = (AjPStr) key; valptr = (ajint *) *value; ajUser("type '%S' found %d times", keystr, *valptr); return; }
whereas the delete function would be:
static void delete(const void** key, void** value, void* cl) { AjPStr keystr; ajint *valptr; keystr = (AjPStr) *key; valptr = (ajint *) *value; ajStrDel(&keystr); AJFREE(valptr); *key = NULL; *value = NULL; return; }
To call the functions you would use something like:
AjPTable table = NULL; /* Allocate table */ ajTableMap(table, print, NULL); ajTableMapDel(table, delete, NULL);