Here are a few simple examples that will help you get started with wEMBOSS.
Start up your Web browser and surf to a site where you have access to wEMBOSS. Click on the button
Start wEMBOSS and type in your username and password. When you enter wEMBOSS for the first time, you will get a warning message that prompts you to create a "project". Click on the button
New project and create a project named
Before you can run a program on a sequence, you must first write it into a file on the server side. Click on the button
New file and note how a text box opens at the bottom of the page. Type in a few lines of text consisting just of the letters
T. Finally click
Save as, type in the name
myseq and click
At the left of the page you can find a menu with program names. Click on
ALPHABETIC LIST OF PROGRAMS and on
compseq. You will get access to a panel that allows you to configure and run the program compseq. compseq is a simple program that just computes the oligonucleotide composition of a nucleic acid (or the oligopeptide composition of a protein).
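To make "oligonucleotide composition" concrete, here is a minimal Python sketch that counts every overlapping word of a given length in a sequence. It only illustrates the idea and is not how compseq itself is implemented; the function name `word_composition` and its `word_size` parameter are hypothetical names chosen for this example.

```python
from collections import Counter


def word_composition(sequence: str, word_size: int = 2) -> Counter:
    """Count every overlapping word of length word_size in the sequence."""
    # Remove whitespace and line breaks, and ignore letter case.
    seq = "".join(sequence.split()).upper()
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))


# A made-up sequence, similar to what you might type into myseq.
counts = word_composition("ACGTACGT\nTTGACA", word_size=2)
for word, n in sorted(counts.items()):
    print(word, n)
```

With `word_size=1` the same function gives the plain base composition of the sequence.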
At the top you can select the sequence to be used by the program. Click
from the EMBOSS databases or a current project file and type
myseq into the
filename or USA box. If you click elsewhere in the page, you will see that the page is redrawn and information about the length of the sequence appears. Click on the button
Run compseq (at the top or at the bottom of the page). A program output page with the result will appear quite quickly.
If you click on the
PM button at the top you return to the Project Management page. Note at the right a list of
PROJECT RESULTS. You can always make your result appear again by clicking on
As an example we will run the program plotorf, which searches for open reading frames, on a fragment of the E. coli genome containing the
ompA gene. This sequence can be found in
EMBL/GenBank/DDBJ with the accession number
V00307. You can convince yourself of this by searching for "escherichia ompa" with Entrez, an SRS server, an MRS server or whichever databank search tool you have available.
First you should find out whether the EMBL databank is available on your wEMBOSS server. Run the program showdb. Then, in the Project Management page select
nucList and click
Edit. A text area will appear at the bottom of the page. You will note that the text already reads:
#nucleics of exercises
Edit this so that it becomes:
#nucleics of exercises
embl:V00307
myseq
If EMBL is not available at your site, you might instead have to type
ebi_embl:V00307 or some other promising database name the server provides. If you do not have any suitable databank installed you will have to retrieve the sequence from a public site and then save it into a file as you did with
myseq in the previous exercise. Take the opportunity to add
myseq to the file as shown above. When you have finished editing, do not forget to
Save as. Incidentally, note that you could have bypassed the nucList editing step by checking
add filename to nucList before you saved myseq during exercise 1.
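The structure of a list file such as nucList can be sketched in a few lines of Python. This is only an illustration under two assumptions that you should check against your own server: each sequence reference sits on its own line, and lines starting with `#` are comments. The function name `read_list_file` is hypothetical.

```python
def read_list_file(text: str) -> list[str]:
    """Return the sequence references in a list file, skipping comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no entry.
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


nuc_list = """#nucleics of exercises
embl:V00307
myseq
"""
print(read_list_file(nuc_list))
```

Each returned entry is either a database reference such as `embl:V00307` or the name of a file in your project such as `myseq`.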
Go to the Program Page for plotorf. You can find it in the alphabetic list and also in
GENE FINDING. Note that now you have a
from the sequence selector (nucList or protList) selector with the sequence names you typed into
embl:V00307 and run the program.
You will obtain a program output page with a graphic. You can now try to save this graphic locally. Click, using the right button of the mouse, on
default.1.png to pop up a menu from which you should choose
Save Target as. You will get a file browser with the option to save the file as
catch.png; you can choose a more sensible name instead. Finally, open MS Word or a similar program and import the file.
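To give an idea of what "searching for open reading frames" means, the following Python sketch scans the three forward reading frames of a sequence for stretches that begin with ATG and end at a stop codon. It is a simplified illustration, not the algorithm plotorf uses: it ignores the reverse strand, and the function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop stretch, forward strand only.

    start and end are 0-based positions; end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop, ATG included.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence one codon at a time in this frame.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
    return orfs


# A made-up sequence with one short open reading frame in frame 2.
print(find_orfs("CCATGAAATTTTAGGG"))
```

A real gene such as ompA would show up as a much longer stretch; raising `min_codons` filters out the many short chance hits.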
As an example we will make a multiple sequence alignment and a phylogenetic tree of four proteases from the latex of the papaya tree. Using your experience from exercise 1, use the
New file function to make a file called
PapList with the content:
swissprot:papa1_carpa
swissprot:papa2_carpa
swissprot:papa3_carpa
swissprot:papa4_carpa
Of course, first convince yourself that SwissProt is available on your server. Then run emma on
list::PapList (do not forget the list specification! You can optionally add
protList, so that it appears in the selector). Before you start emma, select
GCG MSF in the
File format for output sequence set selector near the bottom of the page.
You will obtain two output files:
papa1_carpa.aln with a multiple sequence alignment in GCG MSF format and
papa1_carpa.dnd with a phylogenetic tree (actually the "guide tree" of the CLUSTAL program) in nested parentheses format. If you click with the left button of the mouse on
right click to save locally near
papa1_carpa.dnd you will get a pop-up window with an invitation to save the file locally or open it with some local software. What works will depend very much on what is installed on your own computer. For opening a tree file we can recommend TreeView (http://darwin.zoology.gla.ac.uk/%7Erpage/treeviewx/download.html). Opening the alignment will not work quite like that with the current version of emma because it has a name ending with
.aln whereas wEMBOSS wants
.msf. It would work if you had on your server, instead of emma, the clustal program from the wrappers4EMBOSS suite, which is distributed together with wEMBOSS. For viewing alignments we recommend, if you have a PC with Windows, GeneDoc (
ftp://ftp.psc.edu/biomed/genedoc/), otherwise we recommend SeqPup (http://iubio.bio.indiana.edu/soft/molbio/seqpup/java/).
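If you have no tree viewer at hand, you can still check what the nested parentheses format of the .dnd file contains. The Python sketch below extracts the leaf names from such a tree. The example tree and its branch lengths are made up for illustration and are not the real emma output, the function name `leaf_names` is hypothetical, and quoted labels are not handled.

```python
import re


def leaf_names(newick: str) -> list[str]:
    """Return the leaf labels of a tree in nested parentheses format."""
    # Remove line breaks and spaces so that labels are contiguous.
    compact = "".join(newick.split())
    # A leaf label directly follows '(' or ','; branch lengths follow ':'.
    return re.findall(r"[(,]([^(),:;]+)", compact)


# Hypothetical tree with four leaves and invented branch lengths.
example = "((papa1:0.1,papa2:0.2):0.05,(papa3:0.3,papa4:0.1):0.07);"
print(leaf_names(example))
```

Each pair of parentheses groups the sequences that are joined at one node of the tree, and the number after each colon is the length of the branch leading to that leaf or group.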
Go to the Project Management page. In the right hand part of the page click on the
Files button next to
OK at the bottom right of the page. You will note that the names
papa1_carpa.dnd appear in the list with data files. You could instead copy the files, using a name of your choice, by selecting just one file, typing a name into the
renamed box and then repeating the procedure for the second file. You could e.g. save
papaya_proteases.msf, enabling you to use the
View button to open it with GeneDoc or SeqPup.
papa1_carpa.dnd (or whatever it is named) in the list with data files, select
ATV in the selector near
View with and click on
View with. After a while an applet displaying the tree should appear. You can use the menu option
Help to get a short list of the things you can do with the applet ATV.
Note that it is not recommended to use
papa1_carpa.dnd as a phylogenetic tree. In order to obtain a real tree you can give the file
papa1_carpa.aln as an input to another program. If the Embassy package PHYLIP is installed on your server, try
fproml. You can, if you like, use the
Phylip tree file (optional) /
from project(s) data selector to force fproml to evaluate the tree in
papa1_carpa.dnd rather than to search for the best tree.
For a more realistic data set you could repeat the exercise with a set of breast cancer type 1 susceptibility proteins. Just run emma on
swissprot:brca1_* and proceed as before.