Here are a few simple examples to help you get started with wEMBOSS.
Start up your Web browser and go to a site where you have access to wEMBOSS. Click on the button Start wEMBOSS and type in your username and password. When you enter wEMBOSS for the first time, you will get a warning message that prompts you to create a "project". Click on the button New project and create a project named exercises.
Before you can run a program on a sequence, you must first write it into a file on the server. Click on the button New file and note how a text box opens at the bottom of the page. Type in a few lines of text consisting only of the letters A, C, G and T. Finally click Save as, type in the name myseq and click OK.
At the left of the page you will find a menu with program names. Click on ALPHABETIC LIST OF PROGRAMS and then on compseq. This opens a panel that allows you to configure and run the program compseq, a simple program that computes the oligonucleotide composition of a nucleic acid (or the oligopeptide composition of a protein).
At the top you can select the sequence to be used by the program. Click "from the EMBOSS databases or a current project file" and type myseq into the "filename or USA" box. If you click elsewhere on the page, the page is redrawn and information about the length of the sequence appears. Click on the button Run compseq (at the top or at the bottom of the page). A program output page with the result will appear quite quickly.
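To make the compseq result easier to interpret, here is a minimal Python sketch of what counting an oligonucleotide composition means. It is not compseq's implementation: the function name `word_composition`, the choice to count overlapping words, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter


def word_composition(sequence, word_size=2):
    """Count overlapping words of length word_size in a sequence (illustrative only)."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(
        seq[i:i + word_size] for i in range(len(seq) - word_size + 1)
    )
    total = sum(counts.values())
    # Map each observed word to (count, observed frequency).
    return {word: (n, n / total) for word, n in sorted(counts.items())}


if __name__ == "__main__":
    example = "ACGTACGT\nACGG"  # hypothetical content for a file such as myseq
    for word, (n, freq) in word_composition(example).items():
        print(f"{word}\t{n}\t{freq:.3f}")
```

With a word size of 1 the same function gives the plain base composition of the sequence.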
If you click on the PM button at the top, you return to the Project Management page. Note the list of PROJECT RESULTS at the right. You can always make your result appear again by clicking on compseq.
As an example we will run the program plotorf, which searches for open reading frames, on a fragment of the E. coli genome containing the ompA gene. This sequence can be found in EMBL/GenBank/DDBJ under the accession number V00307. You can convince yourself of this by searching for "escherichia ompa" with Entrez, an SRS server, an MRS server or whatever databank searching tool you have.
First you should find out whether the EMBL databank is available on your wEMBOSS server. Run the program showdb. Then, in the Project Management page, select nucList and click Edit. A text area will appear at the bottom of the page. You will note that the text already reads:
#nucleics of exercises
Edit this so that it becomes:
#nucleics of exercises
embl:V00307
myseq
If EMBL is not available at your site, you might have to type genbank:V00307 or ebi_embl:V00307 instead, or use some other promising database name the server provides. If you do not have any suitable databank installed, you will have to retrieve the sequence from a public site and then save it into a file as you did with myseq in the previous exercise. Take the opportunity to add myseq to the file as shown above. When you have finished editing, do not forget to click Save as. Incidentally, note that you could have bypassed the nucList editing step by checking "add filename to nucList" before you saved myseq during exercise 1.
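The structure of a list such as nucList can be illustrated with a short Python sketch. It assumes one sequence reference per line and treats lines starting with # as comments; the function name `parse_list_text` is hypothetical and this is not how wEMBOSS itself reads the file.

```python
def parse_list_text(text):
    """Return the sequence references in list-file text, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line, not a sequence reference
        entries.append(line)
    return entries


if __name__ == "__main__":
    nuclist = "#nucleics of exercises\nembl:V00307\nmyseq\n"
    print(parse_list_text(nuclist))
```

Under these assumptions the comment line only documents the file, and the two remaining lines are the references offered in the sequence selector.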
Go to the Program Page for plotorf. You can find it in the alphabetic list and also in NUCLEIC/GENE FINDING. Note that the "from the sequence selector (nucList or protList)" selector now offers the sequence names you typed into nucList. Select embl:V00307 and run the program.
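As a rough idea of what searching for open reading frames involves, the following Python sketch scans the three forward frames for stretches that begin with ATG and end at a stop codon. It is a simplified illustration rather than plotorf's algorithm: the function name `forward_orfs`, the `min_codons` threshold and the restriction to the forward strand are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(sequence, min_codons=10):
    """Yield (frame, start, end) for ATG...stop stretches on the forward strand."""
    seq = sequence.upper()
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3  # end is exclusive and includes the stop codon
                if (end - start) // 3 >= min_codons:
                    yield frame + 1, start, end
                start = None


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # made-up sequence, not V00307
    for frame, start, end in forward_orfs(toy, min_codons=3):
        print(frame, start, end, toy[start:end])
```

Positions are 0-based and the frame is numbered 1 to 3. A complete search would also scan the reverse complement.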
You will obtain a program output page with a graphic. You can now try to save this graphic to your own computer. Right-click on default.1.png to pop up a menu, from which you should choose Save Target as. You will get a file browser offering to save the file as catch.png; you can choose a more sensible name instead. Finally, open MS Word or a similar program and import the file.
As an example we will make a multiple sequence alignment and a phylogenetic tree of 4 proteases from the latex of the papaya tree. Using your experience from exercise 1, use the New file function to make a file called PapList with the content:
swissprot:papa1_carpa
swissprot:papa2_carpa
swissprot:papa3_carpa
swissprot:papa4_carpa
Of course, first convince yourself that SwissProt is available on your server. Then run emma on list::PapList (do not forget the list specification! If you like, you can add list::PapList to protList, so that it appears in the selector). Before you start emma, select GCG MSF in the "File format for output sequence set" selector near the bottom of the page.
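The three kinds of sequence reference used in these exercises (a plain file name such as myseq, a database entry such as swissprot:papa1_carpa, and a list file such as list::PapList) can be told apart with a small Python sketch. It covers only these three forms and is not a full parser for sequence addresses; the function name `classify_reference` is hypothetical.

```python
def classify_reference(reference):
    """Classify a sequence reference as used in this tutorial (simplified)."""
    if reference.startswith("list::"):
        # A list file: the named file contains further references.
        return ("list file", reference[len("list::"):])
    if ":" in reference:
        # A database entry written as database:entry.
        database, entry = reference.split(":", 1)
        return ("database entry", database, entry)
    # Anything else is treated here as a file in the current project.
    return ("file", reference)


if __name__ == "__main__":
    for ref in ("myseq", "swissprot:papa1_carpa", "list::PapList"):
        print(ref, "->", classify_reference(ref))
```

The sketch shows why the list specification matters: without the list:: prefix, PapList would be classified as an ordinary sequence file.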
You will obtain two output files: papa1_carpa.aln, with a multiple sequence alignment in GCG MSF format, and papa1_carpa.dnd, with a phylogenetic tree (actually the "guide tree" of the CLUSTAL program) in nested parentheses format. If you click with the left mouse button on "right click to save locally" near papa1_carpa.dnd, you will get a pop-up window inviting you to save the file locally or open it with some local software. What works depends very much on what is installed on your own computer. For opening a tree file we can recommend TreeView (http://darwin.zoology.gla.ac.uk/%7Erpage/treeviewx/download.html). Opening the alignment will not work in the same way with the current version of emma, because the file name ends with .aln whereas wEMBOSS expects .msf. It would work if your server had, instead of emma, the clustal program from the wrappers4EMBOSS suite, which is distributed together with wEMBOSS. For viewing alignments we recommend GeneDoc (ftp://ftp.psc.edu/biomed/genedoc/) if you have a PC with Windows; otherwise we recommend SeqPup (http://iubio.bio.indiana.edu/soft/molbio/seqpup/java/).
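To show what the nested parentheses format of a tree file looks like, this Python sketch pulls the sequence names out of such a string. The example tree, including its branch lengths, is made up for illustration and is not real emma output; the function name `leaf_names` is hypothetical, and quoted labels are not handled.

```python
import re


def leaf_names(newick):
    """Return the leaf labels of a nested-parentheses (Newick-style) tree string."""
    # A leaf label follows "(" or "," and runs up to ":", ",", ")" or ";".
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


if __name__ == "__main__":
    # Hypothetical tree with invented branch lengths.
    tree = (
        "((papa1_carpa:0.1,papa2_carpa:0.2):0.05,"
        "papa3_carpa:0.3,papa4_carpa:0.3);"
    )
    print(leaf_names(tree))
```

Each pair of parentheses groups the sequences below one branching point, and the number after a colon is the length of the branch leading to that sequence or group.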
Go to the Project Management page. In the right-hand part of the page click on the Files button next to emma. Click OK at the bottom right of the page. You will note that the names papa1_carpa.aln and papa1_carpa.dnd appear in the list of data files. Alternatively, you could copy the files under names of your choice by selecting just one file, typing a name into the "renamed" box, and then repeating the procedure for the second file. For example, you could save papa1_carpa.aln as papaya_proteases.msf, enabling you to use the View button to open it with GeneDoc or SeqPup.
Select papa1_carpa.dnd (or whatever it is named) in the list of data files, select ATV in the selector next to View with, and click on View with. After a while an applet displaying the tree should appear. You can use the menu option Help to get a short list of the things you can do with the ATV applet.
Note that it is not recommended to use papa1_carpa.dnd as a phylogenetic tree. To obtain a real tree you can give the file papa1_carpa.aln as input to another program. If the Embassy package PHYLIP is installed on your server, try fproml. If you like, you can use the "Phylip tree file (optional)" / "from project(s) data" selector to force fproml to evaluate the tree in papa1_carpa.dnd rather than search for the best tree.
For a more realistic data set you could repeat the exercise with a set of breast cancer type 1 susceptibility proteins. Just run emma on swissprot:brca1_* and proceed as before.
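The effect of the wildcard in swissprot:brca1_* can be sketched in Python with shell-style pattern matching. This only illustrates the idea of selecting every entry whose name starts with brca1_; how the server actually resolves wildcards is not shown here, the function name `matching_entries` is hypothetical, and the short list of entry names is an illustrative sample rather than a database query result.

```python
from fnmatch import fnmatchcase


def matching_entries(pattern, entry_names):
    """Return the entry names that match a shell-style pattern, ignoring case."""
    return [
        name for name in entry_names
        if fnmatchcase(name.lower(), pattern.lower())
    ]


if __name__ == "__main__":
    sample = ["brca1_human", "brca1_mouse", "brca2_human"]  # illustrative sample
    print(matching_entries("brca1_*", sample))
```

Only names beginning with brca1_ are kept, so the brca2 entry in the sample is left out.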