EMBOSS has a wide range of string handling functions. All of them start with the prefix ajStr and are defined in the ajstr.c source file. You'll practice using a few of them here by writing a new application that prompts the user for one or more strings, processes them, and writes the results to a plain text output file. This builds upon the string handling code used when modifying helloworld.
To begin, create an ACD file for a new application, which we'll call stringplay. The steps are as follows:
Create the ACD file myemboss/emboss_acd/stringplay.acd
Add an application: definition with documentation: and groups: attributes
Add one or more string: definitions with appropriate default:, information: and help: attributes
An output file is needed, so add an outfile ACD data definition
Test the file by running acdc stringplay
Fix any warning or error messages generated by acdc
Run make install (depending on your installation) in the emboss_acd directory (or above) to install this file
The ACD file might look something like this:
application: stringplay [
  documentation: "An application for experimenting with basic string handling."
  groups: "Test"
]

string: astring [
  parameter: "Y"
  default: "ParameterString"
  information: "First input string (parameter)"
  help: "This string is a parameter meaning you needn't specify the label (-astring) on the command line when specifying a value for it."
]

string: bstring [
  standard: "Y"
  default: "StandardString"
  information: "Second input string (standard qualifier)"
  help: "This string is a standard qualifier meaning it will be prompted for if not specified on the command line."
]

string: cstring [
  additional: "Y"
  default: "AdditionalString"
  information: "Third input string (additional qualifier)"
  help: "This string is an additional qualifier meaning it will only be prompted for if -options is given on the command line."
]

string: dstring [
  default: "AdvancedString"
  information: "Fourth input string (advanced qualifier)"
  help: "This string is an advanced qualifier meaning it will never be prompted for."
]

outfile: outfile [
  parameter: "Y"
]
When testing the ACD file, acdc will 'run' it exactly as if the application source code existed. It will prompt for any string inputs, assuming you defined them with parameter: or standard: attributes in the ACD file.
If you defined any strings with the additional: attribute then they will be prompted for only if you specify -options on the command line when you run stringplay. If you didn't specify parameter:, standard: or additional: then they will default to being an advanced qualifier and will never be prompted for.
Regardless of how you define the string inputs, they can be set on the command line by using e.g.:
-astring StringValue
where -astring is the label of the data definition and StringValue is the value of the string to set.
If you define the strings with parameter: then the flag (-astring) is optional, although values for parameters must be given in the order they appear in the ACD file.
It is also usually sensible to define a default value with the default: attribute. For additional and advanced qualifiers a default value is essential.
You should experiment by running acdc stringplay using the above ACD file, specifying values on the command line with and without any data labels.
The C source code (myemboss/src/stringplay.c) must declare variables to hold all the strings and the output file from the ACD file. It must also pick up the values from the ACD file. At a minimum, the application should write the strings to the output file. It should then close the output file and exit cleanly.
If you are using the stringplay.acd shown above then the C source code should look something like this:
#include "emboss.h"

int main(int argc, char* argv[])
{
    AjPStr  astring = NULL;
    AjPStr  bstring = NULL;
    AjPStr  cstring = NULL;
    AjPStr  dstring = NULL;
    AjPFile outf    = NULL;

    embInitP("stringplay", argc, argv, "myemboss");

    astring = ajAcdGetString("astring");
    bstring = ajAcdGetString("bstring");
    cstring = ajAcdGetString("cstring");
    dstring = ajAcdGetString("dstring");
    outf    = ajAcdGetOutfile("outfile");

    /* functional part of code would go here */

    ajFmtPrintF(outf, "astring: %S\nbstring: %S\ncstring: %S\ndstring: %S\n",
                astring, bstring, cstring, dstring);

    ajFileClose(&outf);

    ajStrDel(&astring);
    ajStrDel(&bstring);
    ajStrDel(&cstring);
    ajStrDel(&dstring);

    embExit();

    return 0;
}
To compile your program give the following command from the myemboss directory:
make
And, if you are using myemboss as part of a fully installed EMBOSS system, install it using:
make install
To test it you can use acdc stringplay.
The EMBOSS string handling library is very extensive. You can inspect ajstr.h and ajstr.c to get a feel for what's available. The functions are organised into sections for convenience and these include assignment, combination, cut, substitution, query, element retrieval, formatting, comparison, and so on. Just a few functions from these categories are shown below:
/* A string "assignment" function which copies a string to a string. */
AjBool ajStrAssignS(AjPStr* Pstr, const AjPStr str);

/* A string "combination" function which appends a string. */
AjBool ajStrAppendS(AjPStr* Pstr, const AjPStr str);

/* A string "combination" function which inserts a string into a string at a specified position. */
AjBool ajStrInsertS(AjPStr* pthis, ajint pos, const AjPStr str);

/* A string "cut" function which removes a substring from a string. */
AjBool ajStrCutRange(AjPStr* Pstr, ajint pos1, ajint pos2);

/* A string "cut" function which removes the end from a string, reducing it to a defined length. */
AjBool ajStrTruncateLen(AjPStr* Pstr, size_t len);

/* A string "substitution" function which reverses the order of characters in a string. */
AjBool ajStrReverse(AjPStr* Pstr);

/* A string "query" function which counts occurrences of a character in a string. */
ajint ajStrCalcCountK(const AjPStr str, char chr);

/* A string "element retrieval" function which returns a single character at a given position from a string. */
char ajStrGetCharPos(const AjPStr str, ajint pos);

/* A string "formatting" function which converts a string to upper case. */
AjBool ajStrFmtUpper(AjPStr* Pstr);

/* A string "comparison" function which is a simple test for matching two strings. */
AjBool ajStrMatchS(const AjPStr thys, const AjPStr str);

/* A string "comparison" function which tests for matching the start of a string against a given prefix string. */
AjBool ajStrPrefixS(const AjPStr str, const AjPStr str2);

/* A string "comparison" function which finds the first occurrence in a string of a second string. */
ajint ajStrFindS(const AjPStr str, const AjPStr str2);
Many of the functions return AjBool. Depending on the function, this might indicate whether the string was reallocated, whether the operation was successful, whether a test was true, and so on. You should refer to the documentation in ajstr.c for the exact meaning.
AjPStr* Pstr indicates a string argument that is (or might be) modified by the function. const in front of an argument indicates a variable with a constant value. So here, const AjPStr str indicates a string argument that is used but not modified by the function; the pointer cannot be reallocated by the function.
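To see why a function that modifies a string needs the address of the string handle, here is a minimal plain-C sketch. It does not use the AJAX library: DemoStr and demo_assign are hypothetical stand-ins invented for illustration, and the boolean return value is only one of the possible meanings mentioned above (here, "memory was allocated or reallocated").

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object; this is NOT the real AjPStr. */
typedef struct DemoStr {
    size_t cap;   /* bytes reserved in buf */
    size_t len;   /* characters in use, excluding the terminating '\0' */
    char  *buf;   /* NUL-terminated character data */
} DemoStr;

/*
 * Copy src into *pdest. The destination is passed by address because the
 * function may have to create the object that *pdest points to.
 * The source is const: it is read but never changed.
 * Returns true if memory was allocated or reallocated, false if the
 * existing buffer was reused. Allocation failure is not handled, to keep
 * the sketch short.
 */
static bool demo_assign(DemoStr **pdest, const DemoStr *src)
{
    bool reallocated = false;

    if (*pdest == NULL) {
        /* No object yet: create one and update the caller's handle. */
        *pdest = calloc(1, sizeof **pdest);
        reallocated = true;
    }

    if ((*pdest)->cap < src->len + 1) {
        /* Existing buffer is too small: replace it with a larger one. */
        free((*pdest)->buf);
        (*pdest)->cap = src->len + 1;
        (*pdest)->buf = malloc((*pdest)->cap);
        reallocated = true;
    }

    /* memmove rather than memcpy so that self-assignment is harmless. */
    memmove((*pdest)->buf, src->buf, src->len + 1);
    (*pdest)->len = src->len;

    return reallocated;
}
```

Calling demo_assign(&dest, &src) with dest set to NULL creates the destination object and returns true; a second call with the same arguments reuses the buffer and returns false. If the function took the handle by value, the caller's dest would still be NULL after the first call.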
Here are a few examples to illustrate how some of the string functions are called. First, ajStrAssignS is used to assign an existing string to a new string. The call should follow construction of newstr. newstr is modified, which is why the address of newstr is required:
ajStrAssignS(&newstr, astr);
Here ajStrFmtUpper is used to convert a string to uppercase. Again, the address of the string is required:
ajStrFmtUpper(&astring);
Here two strings are compared and some action taken if they are identical:
if(ajStrMatchS(astring, bstring)) {/* Do something */}
It is good practice to explicitly construct (allocate memory for) each string before it is used, and to destroy it (free its memory) once you are finished with it. A default constructor function (ajStrNew) and destructor function (ajStrDel) are provided for this purpose. Their use is shown below:
/* newstr is NULL, no memory is allocated */
AjPStr newstr=NULL;

/* newstr now points to a string object in memory and is ready for use by our program */
newstr = ajStrNew();

/* memory for the string object is freed. The program can terminate without potentially leaking memory. */
ajStrDel(&newstr);
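The destructor is called with the address of the string handle, not the handle itself. One plausible reason, sketched below in plain C, is that the destructor can then reset the caller's handle so that it no longer points at freed memory. The names demo_new and demo_del and the DemoStr type are hypothetical and are not part of the AJAX library; consult ajstr.c for what ajStrNew and ajStrDel actually do.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a string object; this is NOT the real AjPStr. */
typedef struct DemoStr {
    char *buf;   /* character data, starts as an empty string */
} DemoStr;

/* Constructor: returns a new, empty string object (or NULL on failure). */
static DemoStr *demo_new(void)
{
    DemoStr *s = malloc(sizeof *s);

    if (s != NULL) {
        s->buf = calloc(1, 1);   /* one zeroed byte: the empty string "" */
    }
    return s;
}

/*
 * Destructor: frees the object and resets the caller's handle to NULL.
 * Because it receives the address of the handle, a second call on the
 * same handle finds NULL and does nothing.
 */
static void demo_del(DemoStr **ps)
{
    if (ps == NULL || *ps == NULL) {
        return;
    }
    free((*ps)->buf);
    free(*ps);
    *ps = NULL;
}
```

With this pattern, a handle that has been through demo_del compares equal to NULL, which makes accidental reuse easier to detect than a dangling pointer would be.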
You do not need to call the constructor function for ACD variables because memory is allocated for these by the call to embInitP. If you do, you risk a memory leak. In the following rather contrived and buggy code the call to ajAcdGetString will overwrite the pointer to memory allocated and returned by the (unnecessary) call to ajStrNewC:
/* This code is buggy */
AjPStr astring=NULL;

/* allocates memory for all ACD data items */
embInitP("stringplay", argc, argv, "myemboss");

/* astring now points to a string object in memory */
astring = ajStrNewC("Hello");

/* this call will break the handle to the memory allocated by ajStrNewC */
astring = ajAcdGetString("astring");
The correct code is:
AjPStr astring=NULL;

/* allocates memory for all ACD data items */
embInitP("stringplay", argc, argv, "myemboss");

/* Get a string value from ACD */
astring = ajAcdGetString("astring");

/* memory for the string object is freed. The program can terminate without potentially leaking memory. */
ajStrDel(&astring);
A string object was created by embInitP and must be freed once you are done with it. That is what the call ajStrDel(&astring); is for.
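The difference between the buggy and the correct pattern can be made visible by counting live objects. The following is a plain-C sketch only: DemoStr, demo_new, demo_del and the live_objects counter are hypothetical and stand in for the AJAX string type, ajStrNewC / ajAcdGetString and ajStrDel. It assumes that allocation never fails.

```c
#include <stdlib.h>

/* Number of objects allocated by demo_new and not yet freed by demo_del. */
static int live_objects = 0;

/* Hypothetical stand-in for a string object; this is NOT the real AjPStr. */
typedef struct DemoStr {
    int unused;
} DemoStr;

static DemoStr *demo_new(void)
{
    DemoStr *s = malloc(sizeof *s);

    if (s != NULL) {
        live_objects++;
    }
    return s;
}

static void demo_del(DemoStr **ps)
{
    if (ps != NULL && *ps != NULL) {
        free(*ps);
        *ps = NULL;
        live_objects--;
    }
}

/* Buggy pattern: returns how many objects became unreachable (here, 1). */
static int buggy_pattern(void)
{
    int before = live_objects;
    int leaked;
    DemoStr *s = demo_new();   /* the unnecessary constructor call */
    DemoStr *lost = s;         /* kept ONLY so this demo can clean up;
                                  real buggy code has no such copy */

    s = demo_new();            /* overwrites the handle: first object lost */
    demo_del(&s);              /* frees only the second object */

    leaked = live_objects - before;
    demo_del(&lost);           /* tidy up so the demo itself does not leak */
    return leaked;
}

/* Correct pattern: one object obtained, one object freed; returns 0. */
static int correct_pattern(void)
{
    int before = live_objects;
    DemoStr *s = demo_new();

    demo_del(&s);
    return live_objects - before;
}
```

In the buggy pattern one object is still allocated after the only remaining handle has been destroyed, which is exactly the situation created by calling a constructor before ajAcdGetString.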