2.5. String Handling

EMBOSS has a wide range of string handling functions. All of them start with the prefix ajStr and are defined in the ajstr.c source file. You'll practice using a few of them here by writing a new application that prompts the user for one or more strings, processes them, and writes the results to a plain text output file. This builds upon the string handling code used when modifying helloworld.

2.5.1. stringplay.acd

To begin, create an ACD file for a new application, which we'll call stringplay. The steps are as follows:

  • Create the ACD file myemboss/emboss_acd/stringplay.acd

  • Add an application: definition with documentation: and groups: attributes

  • Add one or more string: definitions with appropriate default:, information: and help: attributes

  • An output file is needed, so add an outfile ACD data definition

  • Test the file by running acdc stringplay

  • Fix any warning or error messages generated by acdc

  • Run make install (depending on your installation) in the emboss_acd directory (or above) to install this file

The ACD file might look something like this:

application: stringplay
[
  documentation: "An application for experimenting with basic string handling."
  groups: "Test"
]

string: astring
[  
  parameter: "Y"
  default: "ParameterString"
  information: "First input string (parameter)"
  help: "This string is a parameter meaning you needn't specify the label (-astring) on the command line when specifying a value for it."
]

string: bstring
[  
  standard: "Y"
  default: "StandardString"
  information: "Second input string (standard qualifier)"
  help: "This string is a standard qualifier meaning it will be prompted for if not specified on the command line."
]

string: cstring
[  
  additional: "Y"
  default: "AdditionalString"
  information: "Third input string (additional qualifier)"
  help: "This string is an additional qualifier meaning it will only be prompted for if -options is given on the command line."
]

string: dstring
[  
  default: "AdvancedString"
  information: "Fourth input string (advanced qualifier)"
  help: "This string is an advanced qualifier meaning it will never be prompted for."
]

outfile: outfile
[
  parameter: "Y"
]

When testing the ACD file, acdc will 'run' it exactly as if the application source code existed. It will prompt for any string inputs, assuming you defined them with parameter: or standard: attributes in the ACD file.

If you defined any strings with the additional: attribute then they will be prompted for only if you specify -options on the command line when you run stringplay. Strings defined without parameter:, standard: or additional: default to being advanced qualifiers and are never prompted for.

Regardless of how you define the string inputs, they can be set on the command line by using e.g.:

-astring StringValue

where -astring is the label of the data definition and StringValue is the value of the string to set.

If you define the strings with parameter: then the flag (-astring) is optional, although values for parameters must be given in the order they appear in the ACD file.

It is also usually sensible to define a default value with the default: attribute. For additional and advanced qualifiers a default value is essential.

You should experiment by running acdc stringplay using the above ACD file, specifying values on the command line with and without any data labels.

2.5.2. stringplay.c

The C source code (myemboss/src/stringplay.c) must declare variables to hold all the strings and the output file from the ACD file. It must also pick up the values from the ACD file. At a minimum, the application should write the strings to the output file. It should then close the output file and exit cleanly.

If you are using the stringplay.acd shown above then the C source code should look something like this:

#include "emboss.h"
int main (int argc, char* argv[])
{
  AjPStr  astring = NULL;
  AjPStr  bstring = NULL;
  AjPStr  cstring = NULL;
  AjPStr  dstring = NULL;
  AjPFile outf    = NULL;

  embInitP("stringplay", argc, argv, "myemboss");
  astring = ajAcdGetString("astring");
  bstring = ajAcdGetString("bstring");
  cstring = ajAcdGetString("cstring");
  dstring = ajAcdGetString("dstring");
  outf    = ajAcdGetOutfile("outfile");

  /* functional part of code would go here */
  ajFmtPrintF(outf, "astring: %S\nbstring: %S\ncstring: %S\ndstring: %S\n",
              astring, bstring, cstring, dstring);

  ajFileClose(&outf);
  ajStrDel(&astring);
  ajStrDel(&bstring);
  ajStrDel(&cstring);
  ajStrDel(&dstring);

  embExit();
  return 0;
}

2.5.3. Compilation and Testing

To compile your program give the following command from the myemboss directory:

make

And, if you are using myemboss as part of a fully installed EMBOSS system, install it using:

make install

To test it, run stringplay and inspect the output file. You can still use acdc stringplay to check the ACD file on its own.

2.5.4. Adding Functionality

The EMBOSS string handling library is very extensive. You can inspect ajstr.h and ajstr.c to get a feel for what's available. The functions are organised into sections for convenience and these include assignment, combination, cut, substitution, query, element retrieval, formatting, comparison, and so on. Just a few functions from these categories are shown below:

/* A string "assignment" function to copy a string to a string. */
AjBool ajStrAssignS(AjPStr* Pstr, const AjPStr str);

/* A string "combination" function which appends a string to a string. */
AjBool ajStrAppendS(AjPStr* Pstr, const AjPStr str);

/* A string "combination" function which inserts a string into a string at a specified position. */
AjBool ajStrInsertS (AjPStr* pthis, ajint pos, const AjPStr str);

/* A string "cut" function which removes a substring from a string. */
AjBool ajStrCutRange(AjPStr* Pstr, ajint pos1, ajint pos2);

/* A string "cut" function which removes the end from a string reducing it to a defined length. */
AjBool ajStrTruncateLen(AjPStr* Pstr, size_t len);

/* A string "substitution" function which reverses the order of characters in a string. */
AjBool ajStrReverse(AjPStr* Pstr);

/* A string "query" function which counts occurrences of a character in a string. */
ajint ajStrCalcCountK(const AjPStr str, char chr);

/* A string "element retrieval" function which returns a single character at a given position from a string. */
char ajStrGetCharPos(const AjPStr str, ajint pos);

/* A string "formatting" function which converts a string to upper case. */
AjBool ajStrFmtUpper(AjPStr* Pstr);

/* A string "comparison" function which is a simple test for matching two strings. */
AjBool ajStrMatchS (const AjPStr thys, const AjPStr str);

/* A string "comparison" function to test for matching the start of a string against a given prefix string. */
AjBool ajStrPrefixS(const AjPStr str, const AjPStr str2);

/* A string "comparison" function to find the first occurrence in a string of a second string. */
ajint ajStrFindS (const AjPStr str, const AjPStr str2);

Many of the functions return AjBool. Depending on the function, this might indicate whether the string was reallocated, whether the operation was successful, whether a test was true, and so on. You should refer to the documentation in ajstr.c for the exact meaning.

AjPStr* Pstr indicates a string argument that is (or might be) modified by the function. const in front of an argument indicates a variable with a constant value. So here, const AjPStr str indicates a string argument that is used but not modified by the function; the pointer cannot be reallocated by the function.
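
The difference between the two argument styles can be illustrated outside EMBOSS with a minimal plain-C sketch. The MiniStr type and the mini_length and mini_assign functions below are hypothetical stand-ins invented for this illustration; they are not part of the AJAX library and say nothing about how AjPStr is implemented. The sketch only shows why a function that may allocate or replace a string needs the address of the caller's pointer, while a read-only function can take the string as const.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object (not the real AjPStr). */
typedef struct MiniStr
{
    size_t len;   /* number of characters, excluding the terminator */
    char  *text;  /* NUL-terminated buffer */
} MiniStr;

/* Read-only argument: the function inspects the string, and the const
   qualifier stops it from modifying the object. */
size_t mini_length(const MiniStr *str)
{
    return str ? str->len : 0;
}

/* Modifiable argument: the caller passes the ADDRESS of its pointer so
   that the function can allocate the object and update the caller's
   pointer. Returns true only if a new object had to be allocated. */
bool mini_assign(MiniStr **pstr, const MiniStr *src)
{
    bool  allocated = false;
    char *copy;

    assert(pstr != NULL && src != NULL);

    /* Copy the source text first, so self-assignment stays safe. */
    copy = malloc(src->len + 1);
    if (!copy)
        return false;
    memcpy(copy, src->text, src->len + 1);

    if (*pstr == NULL)
    {
        *pstr = calloc(1, sizeof(MiniStr));
        if (!*pstr)
        {
            free(copy);
            return false;
        }
        allocated = true;
    }

    free((*pstr)->text);      /* free(NULL) is harmless for a new object */
    (*pstr)->text = copy;
    (*pstr)->len  = src->len;

    return allocated;
}
```

A caller would write mini_assign(&dst, src), which has the same shape as the prototypes above that take AjPStr* Pstr as their first argument. Note that in this model the boolean return value reports allocation, not success, which is one reason to check the documentation of each real function before relying on its return value.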

Here are a few examples to illustrate how some of the string functions are called. First, ajStrAssignS is used to copy an existing string (astr) to a new string (newstr). The call should follow construction of newstr. Because newstr is modified, its address is passed:

ajStrAssignS(&newstr, astr);

Here ajStrFmtUpper is used to convert a string to uppercase. Again, the address of the string is required:

ajStrFmtUpper(&astring);

Here two strings are compared and some action taken if they are identical:

if(ajStrMatchS(astring, bstring))
{/* Do something */}

2.5.4.1. String Memory Management

It is good practice to explicitly construct (allocate memory for) each string before it is used, and to destroy it (free its memory) once you are finished with it. A default constructor function (ajStrNew) and destructor function (ajStrDel) are provided for this purpose. Their use is shown below:

/* newstr is NULL, no memory is allocated */
AjPStr newstr=NULL;  

/* newstr now points to a string object in memory and is ready for use by our program */
newstr = ajStrNew(); 

/* memory for the string object is freed.  The program can terminate without potentially leaking memory. */
ajStrDel(&newstr);
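
The constructor and destructor pattern can be sketched in plain C without EMBOSS. The names MiniStr, mini_new and mini_del are hypothetical. The sketch also assumes, purely as a modelling choice and not as a description of ajStrDel internals, that the destructor resets the caller's pointer to NULL; this is one common reason for a destructor to take the address of the pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a string object (not the real AjPStr). */
typedef struct MiniStr
{
    size_t len;   /* number of characters, excluding the terminator */
    char  *text;  /* heap-allocated, NUL-terminated buffer */
} MiniStr;

/* Constructor: returns an empty string object, or NULL if memory
   could not be allocated. */
MiniStr *mini_new(void)
{
    MiniStr *str = calloc(1, sizeof(MiniStr));

    if (!str)
        return NULL;

    str->text = calloc(1, 1);   /* empty, NUL-terminated buffer */
    if (!str->text)
    {
        free(str);
        return NULL;
    }

    return str;
}

/* Destructor: takes the address of the caller's pointer, frees the
   object, and sets the pointer to NULL so it cannot be reused by
   accident. Calling it on an already-NULL pointer does nothing. */
void mini_del(MiniStr **pstr)
{
    if (!pstr || !*pstr)
        return;

    free((*pstr)->text);
    free(*pstr);
    *pstr = NULL;
}
```

With this model, a second call to mini_del(&s) after the first is harmless, because the pointer was already reset.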

You do not need to call the constructor function for ACD variables because memory is allocated for these by the call to embInitP. Doing so risks a memory leak. In the following rather contrived and buggy code the call to ajAcdGetString will overwrite the pointer to memory allocated and returned by the (unnecessary) call to ajStrNewC:

/* This code is buggy */
AjPStr astring=NULL;  

/* allocates memory for all ACD data items */
embInitP("stringplay", argc, argv, "myemboss");

/* astring now points to a string object in memory */
astring = ajStrNewC("Hello"); 

/* this call overwrites the only pointer to the memory allocated by ajStrNewC, which is then leaked */
astring = ajAcdGetString("astring");

The correct code is:

AjPStr astring=NULL;  

/* allocates memory for all ACD data items */
embInitP("stringplay", argc, argv, "myemboss");

/* Get a string value from ACD */
astring = ajAcdGetString("astring");

/* memory for the string object is freed.  The program can terminate without potentially leaking memory. */
ajStrDel(&astring);

A string object was created by embInitP and must be freed once you are done with it. That is what the call ajStrDel(&astring); is for.
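
Why the buggy version leaks can be made concrete with a small plain-C model. Everything here is hypothetical: obj_new and obj_del stand in for a constructor and destructor, live_objects is a counter added purely for the demonstration, and fake_acd_get merely plays the role of a call that hands back an object allocated elsewhere. None of it is EMBOSS code. The orphan argument exists only so that the demonstration can clean up after itself; in the real bug no such pointer survives.

```c
#include <stdlib.h>

/* Demonstration-only counter of objects currently allocated. */
static int live_objects = 0;

typedef struct Obj { int unused; } Obj;

/* Stand-in constructor: allocates an object and counts it. */
Obj *obj_new(void)
{
    Obj *o = malloc(sizeof(Obj));

    if (o)
    {
        o->unused = 0;
        live_objects++;
    }
    return o;
}

/* Stand-in destructor: frees the object and resets the pointer. */
void obj_del(Obj **po)
{
    if (!po || !*po)
        return;

    free(*po);
    *po = NULL;
    live_objects--;
}

/* Plays the role of a getter that returns an object which the
   library has already allocated for the caller. */
Obj *fake_acd_get(void)
{
    return obj_new();
}

/* Buggy sequence: construct, then overwrite the pointer with the
   getter's result. Returns the number of objects still allocated. */
int buggy_sequence(Obj **orphan)
{
    Obj *s = obj_new();      /* unnecessary construction */

    *orphan = s;             /* demo only: remember it for later cleanup */
    s = fake_acd_get();      /* the first object is no longer reachable via s */
    obj_del(&s);             /* frees only the second object */

    return live_objects;
}

/* Correct sequence: take the object from the getter and free it once. */
int correct_sequence(void)
{
    Obj *s = fake_acd_get();

    obj_del(&s);

    return live_objects;
}
```

Starting from an empty counter, buggy_sequence reports one object still allocated, whereas correct_sequence reports none.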