8.7. wEMBOSS Tutorial

Here are a few simple examples to help you get started with wEMBOSS.

8.7.1. Exercise : Starting up wEMBOSS, creating a "project", running a program

Start up your Web browser and go to a site where you have access to wEMBOSS. Click on the button Start wEMBOSS and type in your username and password. When you enter wEMBOSS for the first time, you will get a warning message that prompts you to create a "project". Click on the button New project and create a project named exercises.

Before you can run a program on a sequence, you must first write the sequence into a file on the server. Click on the button New file and note how a text box opens at the bottom of the page. Type in a few lines of text consisting only of the letters A, C, G and T. Finally click Save as, type in the name myseq and click OK.

[Figure: Editing]

At the left of the page you can find a menu with program names. Click on ALPHABETIC LIST OF PROGRAMS and on compseq. You will get access to a panel that allows you to configure and run the program compseq. compseq is a simple program that just computes the oligonucleotide composition of a nucleic acid (or the oligopeptide composition of a protein).
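To make "oligonucleotide composition" concrete, here is a minimal Python sketch that counts every overlapping word of a given length in a sequence. It only illustrates the idea and is not the compseq implementation; the function name word_composition and the example sequence are made up for this illustration.

```python
from collections import Counter


def word_composition(sequence, word_size=2):
    """Count every overlapping word of length word_size in sequence.

    Illustration only: a hypothetical helper, not part of EMBOSS.
    """
    # Drop line breaks and spaces, as the sequence was typed over several lines.
    seq = "".join(sequence.split()).upper()
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))


# Made-up example sequence, typed on two lines like myseq.
counts = word_composition("ACGT\nACGT", word_size=2)
for word, n in sorted(counts.items()):
    print(word, n)
```

With a word size of 2 this counts dinucleotides; a word size of 1 gives the plain base composition.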

At the top you can select the sequence to be used by the program. Click from the EMBOSS databases or a current project file and type myseq into the filename or USA box. If you click elsewhere in the page, you will see that the page is redrawn and information about the length of the sequence appears. Click on the button Run compseq (at the top or at the bottom of the page). You will note that, quite quickly, a program output page with the result will appear.

If you click on the PM button at the top you return to the Project Management page. Note at the right a list of PROJECT RESULTS. You can always make your result appear again by clicking on compseq.

8.7.2. Exercise : Accessing "public" databanks, using the sequence selectors, managing graphical output

As an example, we will run the program plotorf, which searches for open reading frames, on a fragment of the E. coli genome containing the ompA gene. This sequence can be found in EMBL/GenBank/DDBJ with the accession number V00307. You can convince yourself of this by searching for "escherichia ompa" with Entrez, an SRS server, an MRS server or whatever databank search tool you have.

First you should find out whether the EMBL databank is available on your wEMBOSS server. Run the program showdb. Then, in the Project Management page select nucList and click Edit. A text area will appear at the bottom of the page. You will note that the text already reads:
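The idea of an open reading frame search can be sketched in a few lines of Python. This is a simplified illustration, not the plotorf algorithm: it assumes ATG as the only start codon and TAA, TAG and TGA as stop codons, and it scans only the three forward frames to stay short. The function name find_orfs and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed stop codons for this sketch


def find_orfs(sequence, min_codons=1):
    """Return (frame, start, end) for each ATG...stop stretch, forward strand only.

    start is the 0-based index of the ATG; end is the index just past the stop
    codon. min_codons counts the codons from ATG up to, but excluding, the stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Made-up sequence with a single short ORF in the third forward frame.
example = "CCATGAAATTTTAGCC"
for frame, start, end in find_orfs(example):
    print(frame, start, end, example[start:end])
```

Raising min_codons filters out the very short frames that occur by chance in any sequence.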

#nucleics of exercises

Edit this so that it becomes:

#nucleics of exercises
embl:V00307
myseq

If EMBL is not available at your site, you might have to type genbank:V00307 or ebi_embl:V00307 instead, or use some other promising database name that the server provides. If you do not have any suitable databank installed, you will have to retrieve the sequence from a public site and then save it into a file as you did with myseq in the previous exercise. Take the opportunity to add myseq to the file as shown above. When you have finished editing, do not forget to click Save as. Incidentally, note that you could have bypassed the nucList editing step by checking add filename to nucList before you saved myseq during exercise 1.

Go to the Program Page for plotorf. You can find it in the alphabetic list and also in NUCLEIC/GENE FINDING. Note that the selector labelled "from the sequence selector (nucList or protList)" now offers the sequence names you typed into nucList. Select embl:V00307 and run the program.

You will obtain a program output page with a graphic. You can now try to save this graphic on your own computer. Click, using the right button of the mouse, on default.1.png to pop up a menu from which you should choose Save Target as. You will get a file browser with the option to save the file as catch.png; you can choose a more sensible name instead. Finally, open MS Word or a similar program and import the file.
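The nucList entries used in this exercise come in two forms: database:entry (such as embl:V00307) and a plain file name (such as myseq). The following Python sketch shows how such a list could be read. It is a deliberate simplification: it assumes that lines starting with # are comments, it handles only these two forms, and the function name parse_sequence_list is hypothetical. It is not how wEMBOSS itself parses the file.

```python
def parse_sequence_list(text):
    """Classify each non-comment line as a database entry or a project file.

    Simplified illustration: only 'database:entry' and plain file names are
    recognised. Forms containing '::' (such as list::PapList) are not handled.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if ":" in line:
            database, entry = line.split(":", 1)
            entries.append(("database", database, entry))
        else:
            entries.append(("file", None, line))
    return entries


nuc_list = """#nucleics of exercises
embl:V00307
myseq
"""
for kind, database, name in parse_sequence_list(nuc_list):
    print(kind, database, name)
```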

8.7.3. Exercise : Running a program on multiple sequences, using the output of one program as input of another, using plug-ins and applets

As an example, we will make a multiple sequence alignment and a phylogenetic tree of four proteases from the latex of the papaya tree. Using your experience from exercise 1, use the New file function to make a file called PapList with the content:

swissprot:papa1_carpa
swissprot:papa2_carpa
swissprot:papa3_carpa
swissprot:papa4_carpa

Of course, first make sure that SwissProt is available on your server. Then run emma on list::PapList (do not forget the list specification! If you like, you can add list::PapList to protList, so that it appears in the selector). Before you start emma, select GCG MSF in the File format for output sequence set selector near the bottom of the page.

You will obtain two output files: papa1_carpa.aln with a multiple sequence alignment in GCG MSF format and papa1_carpa.dnd with a phylogenetic tree (actually the "guide tree" of the CLUSTAL program) in nested parentheses format. If you click with the left button of the mouse on the link "right click to save locally" near papa1_carpa.dnd, you will get a pop-up window with an invitation to save the file locally or open it with some local software. What works will depend very much on what is installed on your own computer. For opening a tree file we can recommend TreeView (http://darwin.zoology.gla.ac.uk/%7Erpage/treeviewx/download.html). Opening the alignment will not work in the same way with the current version of emma, because the file name ends with .aln whereas wEMBOSS expects .msf. It would work if your server had, instead of emma, the clustal program from the wrappers4EMBOSS suite, which is distributed together with wEMBOSS. For viewing alignments we recommend GeneDoc (ftp://ftp.psc.edu/biomed/genedoc/) if you have a PC with Windows; otherwise we recommend SeqPup (http://iubio.bio.indiana.edu/soft/molbio/seqpup/java/).

Go to the Project Management page. In the right hand part of the page click on the Files button next to emma. Click OK at the bottom right of the page. You will note that the names papa1_carpa.aln and papa1_carpa.dnd appear in the list of data files. You could instead copy the files, using a name of your choice, by selecting just one file, typing a name into the renamed box and then repeating the procedure for the second file. For example, you could save papa1_carpa.aln as papaya_proteases.msf, enabling you to use the View button to open it with GeneDoc or SeqPup.
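If you are curious about the nested parentheses format of the .dnd file, the short Python sketch below extracts the leaf (sequence) names from such a tree. The tree string and its branch lengths are invented for illustration and are not real emma output; the function name leaf_names is hypothetical, and the regular expression covers only simple, unquoted labels.

```python
import re


def leaf_names(tree_text):
    """Return the leaf labels of a nested parentheses tree, in order.

    A leaf label directly follows '(' or ',' and runs up to the next
    ':', ',', ')' or whitespace. Labels after ')' (internal nodes) are ignored.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", tree_text)


# Invented example tree; the branch lengths are placeholders.
example_tree = """(
(
papa1_carpa:0.10,
papa2_carpa:0.20)
:0.05,
papa3_carpa:0.30,
papa4_carpa:0.40);
"""
print(leaf_names(example_tree))
```

Each pair of parentheses groups the sequences (or subgroups) that join at one node, and the number after a colon is the length of the branch leading to that node or leaf.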

[Figure: Copying]

Select papa1_carpa.dnd (or whatever it is named) in the list of data files, select ATV in the selector near View with and click on View with. After a while an applet displaying the tree should appear. You can use the menu option Help to get a short list of the things you can do with the applet ATV.

Note that it is not recommended to use papa1_carpa.dnd as a phylogenetic tree, because it is only the guide tree that CLUSTAL builds to order the alignment. In order to obtain a real tree you can give the file papa1_carpa.aln as an input to another program. If the Embassy package PHYLIP is installed on your server, try fproml. You can, if you like, use the Phylip tree file (optional) / from project(s) data selector to force fproml to evaluate the tree in papa1_carpa.dnd rather than to search for the best tree.

For a more realistic data set you could repeat the exercise with a set of breast cancer type 1 susceptibility proteins. Just run emma on swissprot:brca1_* and proceed as before.
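The asterisk in swissprot:brca1_* acts as a wildcard that selects every entry whose name starts with brca1_. The Python sketch below mimics this with shell-style pattern matching. It only illustrates the selection principle and is not how EMBOSS queries a database; the function name matching_entries is hypothetical and the list of entry names is a small illustrative sample, not the content of a real SwissProt release.

```python
from fnmatch import fnmatchcase


def matching_entries(pattern, entry_names):
    """Return the entry names that match a wildcard pattern, ignoring case."""
    lowered = pattern.lower()
    return [name for name in entry_names if fnmatchcase(name.lower(), lowered)]


# Illustrative sample of entry names only.
names = ["BRCA1_HUMAN", "BRCA1_MOUSE", "BRCA2_HUMAN", "PAPA1_CARPA"]
print(matching_entries("brca1_*", names))
```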