10.10. HMMER port

10.10.1. HMMER Port: ohmmalign.acd

The ACD file is essentially the same as in the wrapper apart from a few changes resulting from differences in options between the two releases. One thing to point out is in the output section. In the port the alignment output is handled by an outfile ACD data item. The align datatype that the wrapper used wasn't available back in 2001.

section: output 
    information: "Output section"
    type: "page"

outfile: outfile  
    parameter: "Y"
    knowntype: "hmmalign output"

endsection: output

10.10.2. HMMER Port: ohmmalign.c Heading Code

This shows the top of the C source code. You can see straight away that there's an amalgamation of the original HMMER code and additions for EMBOSS. For example the EMBOSS interface is imported before the first include directive.

 * HMMER - Biological sequence analysis with profile HMMs
 * Copyright (C) 1992-1998 Washington University School of Medicine
 * All Rights Reserved
 *     This source code is distributed under the terms of the
 *     GNU General Public License. See the files COPYING and LICENSE
 *     for details.

/* hmmalign.c
 * SRE, Thu Dec 18 16:05:29 1997 [St. Louis]
 * main() for aligning a set of sequences to an HMM.
 * RCS $Id: ch10s10.html,v 2010/09/06 12:54:15 ajb Exp $
 * Modified for EMBOSS by Alan Bleasby (ISMB 2001)

#include "emboss.h"
#include <stdio.h>
#include <stdlib.h>

#include "structs.h"            /* data structures, macros, #define's   */
#include "config.h"             /* compile-time configuration constants */
#include "funcs.h"              /* function declarations                */
#include "globals.h"            /* alphabet global variables            */
#include "squid.h"              /* general sequence analysis library    */

#include "dbmalloc.h"

extern void emboss_rseqs(AjPSeqset seqset, char ***ret_rseqs,SQINFO **ret_sqinfo,int *ret_num);

static void include_alignment(char *seqfile, struct plan7_s *hmm, int do_mapped, char ***rseq, char ***dsq, SQINFO **sqinfo, struct p7trace_s ***tr, int *nseq); main() Function Variable Declarations

The main() function and the variable declarations are shown here. So far as possible the original code is left unchanged, but it's necessary to add variables for AJAX-specific types to hold values to be retrieved from the ACD file. These are set to NULL as before.

int main(int argc, char **argv) 
  const char      *hmmfile;     /* file to read HMMs from                  */
  HMMFILE         *hmmfp;       /* opened hmmfile for reading              */
  const char      *seqfile;     /* file to read target sequence from       */ 
  char           **rseq;        /* raw, unaligned sequences                */ 
  SQINFO          *sqinfo;      /* info associated with sequences          */
  char           **dsq;         /* digitized raw sequences                 */
  int              nseq;        /* number of sequences                     */  
  char           **aseq;        /* aligned sequences                       */
  AINFO            ainfo;       /* alignment information                   */
  float           *wgt;         /* per-sequence weights                    */
  int              i;
  struct plan7_s    *hmm;       /* HMM to align to                         */ 
  struct p7trace_s **tr;        /* traces for aligned sequences            */

  int   be_quiet;               /* TRUE to suppress verbose banner          */
  int   matchonly;              /* TRUE to show only match state syms       */
  const char *outfile;          /* optional alignment output file           */
  FILE *ofp;                    /* handle on alignment output file          */
  AjPFile ajwithali;            /* name of additional alignment file to align */
  AjPFile ajmapali;             /* name of additional alignment file to map   */
  AjBool ajmatch    = ajFalse;
  AjPFile outf      = NULL;
  AjPStr  outfname  = NULL;
  AjPFile inf       = NULL;
  AjPStr  infname   = NULL;
  AjPSeqset seqset  = NULL;
  AjPStr  ajseqfile = NULL;
  char*  mapali     = NULL;
  char*  withali    = NULL; ACD File Processing

This shows the code for retrieving the values from ACD. A few housekeeping variables are needed for that. mapali and withali are C type strings (char *) used to hold the names of the EMBOSS input files. The names are retrieved by a call to ajFileGetNameS. The files are then closed, leaving them ready for HMMER. A similar thing is done for the other HMMER input and output files:

   * Parse command line
  matchonly = FALSE;
  outfile   = NULL;
  be_quiet  = FALSE;
  withali   = NULL;
  mapali    = NULL;


  ajmatch = ajAcdGetBoolean("matchonly");


  ajmapali = ajAcdGetInfile("mapalifile");

  if (ajmapali)
      mapali = ajCharNewS(ajFileGetNameS(ajmapali));

  ajwithali = ajAcdGetInfile("withalifile");

  if (ajwithali)
      withali = ajCharNewS(ajFileGetNameS(ajwithali));



  outf = ajAcdGetOutfile("outfile");
  outfname = ajStrNewC((char *)ajFileGetNameC(outf));


  outfile = ajStrGetPtr(outfname);

  inf = ajAcdGetInfile("hmmfile");
  infname = ajStrNewC((char *)ajFileGetNameC(inf));
  hmmfile = ajStrGetPtr(infname);

  seqset = ajAcdGetSeqset("sequences");
  ajseqfile = ajStrNewC(ajStrGetPtr(seqset->Filename));
  seqfile = ajStrGetPtr(ajseqfile); Handling Input Data

The code for managing the data input is here. You can see that the housekeeping variables used for holding the ACD values are passed into the HMMER functions. The only thing to point out here is that exception handling is dealt with by calls to the EMBOSS function ajFatal. Clearly it's necessary to work through the original source code when writing a port:

  * Open HMM file (might be in HMMERDB or current directory).
  * Read a single HMM from it.
  * Currently hmmalign disallows the J state and
  * only allows one domain per sequence. To preserve
  * the S/W entry information, the J state is explicitly
  * disallowed, rather than calling a Plan7*Config() function.
  * this is a workaround in 2.1 for the 2.0.x "yo!" bug.
  if ((hmmfp = HMMFileOpen(hmmfile, "HMMERDB")) == NULL)
    ajFatal("Failed to open HMM file %s\n", hmmfile);

  if (!HMMFileRead(hmmfp, &hmm)) 
    ajFatal("Failed to read any HMMs from %s\n", hmmfile);


  if (hmm == NULL) 
    ajFatal("HMM file %s corrupt or in incorrect format? Parse failed", hmmfile);

  hmm->xt[XTE][MOVE] = 1.;               /* only 1 domain/sequence ("global" alignment) */
  hmm->xt[XTE][LOOP] = 0.;
  P7Logoddsify(hmm, TRUE);

  /* do we have the map we might need? */
  if (mapali != NULL && ! (hmm->flags & PLAN7_MAP))
    ajFatal("HMMER: HMM file %s has no map; you can't use --mapali.", hmmfile);

   * Open sequence file in current directory.
   * Read all seqs from it.
  if (! SeqfileFormat(seqfile, &format, NULL))
    switch (squid_errno) {
    case SQERR_NOFILE: 
      ajFatal("Sequence file %s could not be opened for reading", seqfile);
    case SQERR_FORMAT: 
      ajFatal("Failed to determine format of sequence file %s", seqfile);

  if (! ReadMultipleRseqs(seqfile, format, &rseq, &sqinfo, &nseq))
    ajFatal("Failed to read any sequences from file %s", seqfile);

 emboss_rseqs(seqset,&rseq,&sqinfo,&nseq); Exiting Cleanly

There's then lots of native code which isn't shown. Finally the program must exit with a call to embExit():

if (outfile != NULL && (ofp = fopen(outfile, "w")) != NULL)
      WriteSELEX(ofp, aseq, &ainfo, 50);
      printf("Alignment saved in file %s\n", outfile);
    WriteSELEX(stdout, aseq, &ainfo, 50);

   * Cleanup and exit
  for (i = 0; i < nseq; i++) 
      FreeSequence(rseq[i], &(sqinfo[i]));
  FreeAlignment(aseq, &ainfo);



  current_size = malloc_inuse(&histid2);
  if (current_size != orig_size) malloc_list(2, histid1, histid2);
  else fprintf(stderr, "[No memory leaks.]\n");

  return 0;