The left hand side of the main Jemboss window (Section 9.2.9.1, “Main Jemboss Window”) gives access to all programs available through the Jemboss interface.
At the top of the pane, the category menus group together programs with similar analysis characteristics.
Click on Alignment
and then highlight global
from the submenu to see all programs that offer a global alignment of sequences. Highlight and click on stretcher
to see the program form appear in the central Jemboss pane.
Located on the Jemboss toolbar, the favourites menu offers a selection of commonly used programs. These can be edited (Section 9.7.2, “Programme Selection”) to customise the list and optimise program access.
Click on the Favourites
menu and select Global Alignments
. This will alter the program in the central pane to Needle
.
Further down the left hand pane all the programs are listed alphabetically. The scroll bar to the right allows access to any one of these programs. However, if the name of the required program is known, access may be quicker using the Go To
box (Section 9.4.5, “Go To Box”).
Directly above the alphabetical program list is an entry field. Any entry accesses the program list and highlights a program name according to the letters in the entry field. This method can be faster than any other selection method as only a few letters of the program name need be typed in.
Type m
in the Go To
box to highlight the first program beginning with m
.
Add at
into the Go To
box so the entry now reads mat
. This will highlight the first entry beginning with mat
, which is the local alignment program matcher. Press the Return key on the keyboard to bring up the matcher program form in the central pane.
The same text entry can be used to reselect the same program in the event of mis-entry (see Section 9.7.3, “Input/Output Options”).
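The behaviour of the Go To box can be pictured as a prefix search over the alphabetical program list. The following is a minimal sketch, not the Jemboss implementation; the function name go_to and the short program list are illustrative assumptions.

```python
from bisect import bisect_left


def go_to(programs, typed):
    """Return the first program name (alphabetically) starting with `typed`.

    Returns None when no program name begins with the typed letters.
    """
    names = sorted(programs)
    # bisect_left finds the position where `typed` would be inserted,
    # which is also the first name that could start with it.
    index = bisect_left(names, typed)
    if index < len(names) and names[index].startswith(typed):
        return names[index]
    return None


# Hypothetical subset of the program list, for illustration only.
programs = ["needle", "matcher", "merger", "seqret", "stretcher", "water"]

print(go_to(programs, "m"))    # matcher
print(go_to(programs, "mat"))  # matcher
print(go_to(programs, "me"))   # merger
```

Each extra letter typed narrows the match, which is why only a few letters are usually needed.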
Should the results of the selected program require sequence features in any format, then the Use Feature Information
box at the top of the input section should be selected. This option is only available for those programs that retrieve sequences: seqret, seqretsplit, skipseq, splitter and union.
This is the default selection and allows entry of stored files (including listfiles (Section 6.6, “The Uniform Sequence Address (USA)”)) via drag and drop from the Local (Section 9.3.1, “Local File Management”) and Remote (Section 9.3.14, “Remote File management”) File Managers, as well as from the Sequence List (Section 9.7.3.1, “Sequence Input”). If the file to be dragged is a listfile (Section 9.3.5.2, “Re-writing a File with New Data”) then the entire entry must be prefixed with an @
sign to indicate to Jemboss the nature of the data.
USAs (Section 6.6, “The Uniform Sequence Address (USA)”) can be entered directly into the field.
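As a rough illustration of how the entry forms mentioned in this section differ, the sketch below sorts the text of an entry field into three groups. It is a simplification and an assumption on our part: the full USA syntax has further forms (and a name before a colon may be a file as well as a database), and describe_entry is a hypothetical helper, not part of Jemboss or EMBOSS.

```python
def describe_entry(entry):
    """Classify the text of a sequence entry field (simplified sketch)."""
    entry = entry.strip()
    if entry.startswith("@"):
        # An @ prefix marks the rest of the entry as a listfile.
        return "listfile", entry[1:]
    name, colon, rest = entry.partition(":")
    looks_like_reference = (
        colon
        and rest
        and not rest.startswith(":")          # skip other "::" USA forms
        and len(name) > 1                     # skip drive letters such as C:
        and not any(c in name for c in "/\\")
    )
    if looks_like_reference:
        return "name:entry reference", f"{name}:{rest}"
    return "file or other USA", entry


print(describe_entry("@mylist.txt"))
print(describe_entry("uniprot:bgal_ecoli"))
print(describe_entry("/home/user/Example/bgal_ecoli.fasta"))
```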
Hit the Browse files
button to the right of the entry field. This immediately accesses the Jemboss home directory (Section 9.3.2, “Home Directory”). Double click on the Example
folder in this directory and select the bgal_ecoli.fasta
file (if this has not been created, see the practical in Section 9.3.5.1, “Saving Analysis Results” and Section 9.3.9, “Rename”) and open the file. The entire path of this file will now be written into the entry field.
Hit the Reset
button to clear the entry field. Open the local file manager (Section 9.3.1, “Local File Management”) and drag the bgal_ecoli.fasta
file into the entry field. Once there is visual indication that the mouse is over the input field, drop the file by releasing the mouse button. The entire file path will be displayed in the field.
Hit the Reset
button to clear the field once more.
Open the remote file manager (Section 9.3.14, “Remote File management”) and drag in the bgal_frag.fasta
file (see Section 9.3.16, “Moving Data between File Managers”). The remote path is displayed in the field.
Hit the Reset
button to remove the remote entry. Click on the Input Sequence Options
button, select the uniprot
option from the Databases available
drop down menu and hit the OK
button. This database is now written in the entry field. Type bgal_ecoli
in the entry field after the colon. The bgal_ecoli
sequence will now be retrieved from the uniprot database.
The database retrieval option using such a USA might only be possible if the desktop computer is connected to the Internet as the sequence may need to be retrieved from a remote database.
Selection of this option allows a sequence or a list of sequences to be pasted into a larger field. Sequences should be pasted in using the desktop shortcut for paste (<CONTROL> + V
for Windows, <Apple> + V
for Macintosh, middle mouse button for Unix).
This option is useful only for those programs requiring a number of input files such as emma
, the multiple sequence alignment tool. It consists of 20 File/Database Entry fields (Section 9.4.7.1, “File/Database Entry”) and accepts files specified in the usual manner.
Very few of these sequence attributes are necessary for a successful analysis run as they can be detected automatically.
Hit the Input Sequence Options
button to see potential sequence attributes.
Lists all databases available for a particular installation of Jemboss. Full names plus any name derivatives are shown, e.g. both uniprot
and uni
are often used to specify the uniprot protein database.
Lists all of the EMBOSS-acceptable formats (Section A.1, “Supported Sequence Formats”). It is not normally necessary to specify the format, as the program can generally detect it; however, if the sequence format is somewhat obscure (e.g. ig
or jackknifer
), it may be required.
Used if only a portion of a larger sequence need be analysed. Thus an entire database entry can be retrieved but only the relevant portion will undergo analysis.
Enter 300 in the begin field and 600 in the end field.
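The begin and end values count positions from 1 and include both ends, so 300 and 600 select 301 positions. A minimal sketch of that arithmetic, using a placeholder sequence rather than real data (the function name subsequence is an illustrative assumption):

```python
def subsequence(sequence, begin, end):
    """Return positions begin..end, counting from 1 and including both ends."""
    if not 1 <= begin <= end <= len(sequence):
        raise ValueError("begin/end lie outside the sequence")
    # Python slices count from 0 and exclude the end, hence begin - 1.
    return sequence[begin - 1:end]


placeholder = "A" * 1024          # stand-in for a retrieved database entry
region = subsequence(placeholder, 300, 600)
print(len(region))                # 301
```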
A selection here will ensure that the analysis run also includes a check of the reverse complement sequence. It can be used, for example, for nucleotide sequence translations and finding open reading frames or stem loops.
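For readers unfamiliar with the term, the reverse complement is the sequence read backwards with each base swapped for its pairing partner. The sketch below handles only the unambiguous bases; ambiguity codes are left unchanged, which is a simplification and not how EMBOSS itself treats them.

```python
# Map each base to its partner; upper and lower case are both preserved.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```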
This specifies the type of sequence file used as input. This is generally obvious to the program, but may be necessary for specific types of sequences, for example a peptide sequence composed of a disproportionate number of alanines, threonines, glycines and cysteines (whose one-letter codes A, T, G and C are also the nucleotide letters), or a nucleotide sequence containing several ambiguity codes. Only one of these options may be selected.
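The reason automatic detection can fail is easiest to see with a toy heuristic. The sketch below guesses the type from the fraction of nucleotide-like letters; the 0.9 threshold and the function name guess_type are assumptions made for illustration, and this is not the detection method EMBOSS actually uses.

```python
def guess_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from letter composition (toy heuristic)."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_type("ATGGCGTTAGC"))    # nucleotide
print(guess_type("MKTAYIAKQR"))     # protein
# A peptide made only of Ala, Gly, Cys and Thr is indistinguishable from DNA:
print(guess_type("AGCTAGGCATCA"))   # nucleotide
```

The last call shows why a sequence type may need to be set by hand for unusual peptides.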
Forces the program to return the sequence text in either upper or lower case. The default is upper case. Only one of these options may be selected.
The UFO (Uniform Feature Object) is the standard way of specifying file formats containing feature information (Section 5.3, “Introduction to Feature Formats”). In order to use this option the Use Feature Information
box (Section 9.4.6.1, “Features”) should be selected.
You use the UFO features
box to optionally load in a features file in association with any sequence you have specified on the main application form. The UFO command line syntax that needs to be used is explained elsewhere (Section 6.7, “The Uniform Feature Object (UFO)”).
This is a large bar running halfway across the central pane with text in red capitals. Its action is to load the sequence in advance of the analysis run. This is only relevant in cases where there are parameter dependencies on the form which are based on the sequence. The most obvious of these cases are alignment programs, which select default matrices and penalties based on whether the sequence is nucleotide or protein.
Hit the LOAD SEQUENCE ATTRIBUTES
bar to update the default gap penalties. Select No
to the confirmation message so the inputted start and end sites are not overwritten.
This bar will load sequence attributes for the entire sequence, and so will offer to override any attributes selected in the Input Sequence Options
(Section 9.4.8, “Input Sequence Options”).
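The effect of loading sequence attributes can be sketched as an update to the form's values. This is a hypothetical model, not Jemboss code: the dictionary keys and the function name are assumptions. EBLOSUM62 and EDNAFULL are the usual EMBOSS default matrices for protein and nucleotide input, although individual programs may define other defaults.

```python
def load_sequence_attributes(form, is_protein, length, overwrite_range):
    """Return a copy of `form` with sequence-dependent defaults applied."""
    updated = dict(form)
    # The default matrix depends on whether the sequence is protein or nucleotide.
    updated["matrix"] = "EBLOSUM62" if is_protein else "EDNAFULL"
    if overwrite_range:
        # Answering "Yes" replaces any begin/end values with the full range.
        updated["begin"], updated["end"] = 1, length
    return updated


form = {"begin": 300, "end": 600, "matrix": "EDNAFULL"}
# Answering "No" keeps the begin and end values entered earlier.
print(load_sequence_attributes(form, is_protein=True, length=1024,
                               overwrite_range=False))
```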
Enter uni:bgal1_entcl
in the second sequence filename entry box and load sequence attributes for that sequence also. Look at the begin
and end
sequence attribute options to ensure the full 1028 residues of the sequence have been loaded.
Any options (Section 6.1, “Introduction to the EMBOSS Command Line”) needed for analysis of the input file are listed after the input section. These parameters are required for the analysis to complete. All mandatory parameters are subject to a default setting, which may or may not be visible to the user. Consult the documentation (Section 9.9, “Documentation”) for each program to ascertain these settings.
Use the drop down menu to alter the matrix selection to EPAM250
. Hit the GO
button to retrieve the local alignment. Minimize the Saved Results (Section 9.6.8.1, “Saved Results Window”) window.
Depending on the program, the output section may contain a single option to alter the output sequence format (such as matcher), or it may contain a more comprehensive list of parameters that may be included in the final output (e.g. remap). All output section parameters are subject to a default setting, which may or may not be visible to the user. Consult the documentation (Section 9.9, “Documentation”) for each program to ascertain these settings.
For all programs returning a sequence an Output Sequence Name
entry field is available, and will name the appropriate results tab (Section 9.6.8.1, “Saved Results Window”) with whatever name is entered. Only the filename is returned, and any filename extensions will be lost.
Select seqret by selecting the Database Sequence Retrieval
option from the Favourites
menu at the top of the Jemboss window. Type uni:bgal_ecoli
into the Sequence Filename
field and bgal_ecoli_1
into the Output Sequence Name
field in the output section. Hit GO
and note the name of the results tab (Section 9.6.8.1, “Saved Results Window”) containing the returned sequence.
Currently the name is not transferred when the results are saved; it is for display purposes only.
Close the Results window.
Available for any program which outputs a sequence, the output sequence options allow the user to customise a returned sequence should there be such a requirement.
The Separate file for each entry
option can be toggled on and off and allows the data to be returned as separate results tabs and not as a single, multiple sequence file. This may be easier to view, but each tab must be saved separately whereas a single multiple data tab can be saved in one go.
The default output for any sequence in EMBOSS is fasta, but any one of the formats currently supported can be selected from the drop down menu.
Adds the specified extension to the filename. Anything entered here, however, is overridden by an entry in the Output Sequence Name
box (Section 9.4.18, “Output Section”).
This option is not available if the Separate file for each entry
option is selected.
This option is for programs which return more than one data file. The base filename chosen will be applied to all data and ascending numbers appended to the name.
This option is not available if the Separate file for each entry
option is selected.
The features format only needs to be specified here and no colon (':
') is required. In order to use this option, the Use Feature Information
box (Section 9.4.6.1, “Features”) should be selected.
Select the Use Feature Information
box at the top of the seqret program form. Enter uni:bgal_ecoli
in the Sequence Filename
field (resetting any other entries if necessary). Open the Output Sequence Options
and delete any entries currently visible. In the Features format
entry field type swiss
. Hit the GO
button.
Two results tabs will be returned. The first will be bgal_ecoli.swiss
and contain the features of this protein in swissprot format and the second, bgal_ecoli.fasta
, will be the sequence. If the swiss
format is not entered then Jemboss will return the features in the default GFF format.
Close the Results window.
The features output filename (only) needs to be specified here. In order to use this option, the Use Feature Information
box (Section 9.4.6.1, “Features”) should be selected.
The Use Feature Information
option should be selected on the seqret program form. Enter uni:bgal_ecoli
in the Sequence Filename
field (deleting anything else if necessary). Open the Output Sequence Options
and in the Feature Format
entry field, type swiss
. In the Features Filename
entry field type features
. Hit OK
to close the options menu and hit the GO
button. The results will be the same as for the previous example except that the output tab for the features is now called features
.
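For comparison, the same choices can be expressed on the EMBOSS command line. The sketch below only assembles an argument list and does not run anything. It assumes the standard EMBOSS qualifiers -feature, -offormat, -ofname, -ossingle and -auto; how Jemboss builds its own command internally is not described here, and the function name seqret_command is hypothetical.

```python
import shlex


def seqret_command(usa, feature_format=None, feature_file=None, single=False):
    """Build a seqret argument list mirroring the output sequence options."""
    argv = ["seqret", "-sequence", usa, "-auto"]
    if feature_format or feature_file:
        argv.append("-feature")                   # Use Feature Information
    if feature_format:
        argv += ["-offormat", feature_format]     # Features format
    if feature_file:
        argv += ["-ofname", feature_file]         # Features filename
    if single:
        argv.append("-ossingle")                  # Separate file for each entry
    return argv


command = seqret_command("uni:bgal_ecoli", feature_format="swiss",
                         feature_file="features")
print(shlex.join(command))
```

Building a list rather than a single string avoids quoting problems with wildcard entries such as uni:bgal*_e*.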
If the Separate file for each entry
option is selected then the individual sequences appear in separate tabs, but the feature information will appear consecutively in the same tab.
Enter uni:bgal*_e*
in the Sequence Filename
field. Select the Separate file for each entry
option in the output sequence options. Leave everything else as in the practical above and hit GO
. Scroll to the end of the features
tab and compare to the end of the features
tab for the last practical. The bgal1_entcl
features, and possibly others, should have been added.
Close the Results window.
The default output format for any single sequence returned by EMBOSS is FASTA. The default for alignment programs may differ between programs and the default is displayed in parentheses. These defaults may be altered using the drop down menus.
There are two options for those programs which offer a graphical output. The default PNG output is a static line drawing of the output image. The alternative is Jemboss Graphics which can be selected from the drop down menu. This offers an interactive graphical display.
Select dotmatcher
by typing do
into the Go To
field and hitting return. Enter uni:bgal_ecoli
into the first Sequence Filename
field and uni:bgal1_entcl
into the second. Hit the GO
button to return results as a static image.
PNG graphics files must be saved with a .png
extension to the filename to allow them to be recognised by the software.
Close the graphics window.
Leave the entries in dotmatcher
and alter the drop down menu in the Output Section
to read Jemboss Graphics
. Hit the GO
button. The plot should now appear in an interactive graphics window.
The font size may be altered using the drop down menu on the graphics toolbar. The view may be altered using the percentage zoom menu, also on the toolbar. Hover the mouse over anywhere on the graphic to see the coordinates of that location.
Open the File
menu on the graph display and select Display data
.
An EMBOSS data file window opens to reveal a text version of the dotmatcher graphic. This information cannot be saved.
Hit the Options
menu on the graphic toolbar to alter the axes and label information. Alterations are confirmed with the OK
button, which also closes the options window. The APPLY
button will effect the changes on the graphic without closing the window. These changes will remain even when the CANCEL
button is then applied.
Delete the text in the Main Title
field and enter bgal_ecoli vs bgal1_entcl
. Hit the APPLY
button.
This field will accept unlimited characters, but the title appears only on one line of the graphic, centred in the middle of the graph. Thus if the title is too long, it will disappear off the end of the graphic.
The number format for both X and Y axis can be altered using the drop down menu.
The number of ticks displayed on each axis can be entered in the appropriate fields. There is no limit to the number of ticks entered; however, too many will result in a thick, black, indistinguishable line under the axis.
The axes Start
and End
sites are labelled by default. The Start
site is always zero and the End
site represents the length of the sequence. This may lead to irregular axis numbering. There is no limit placed on new entries, thus if the End
site is longer than the actual sequence the graph will move to the left (on the X axis) and down (on the Y axis).
The title of each axis may be altered by entering the required text. There is no limit to the text that may be entered, but longer text may disappear off the end of the axis.
The height and width of the graph may be altered. By default the plot is not drawn in proportion, but this can be corrected by adjusting the height and/or width of the plot.
Should it be required, the colour of the graph can be altered by clicking once with the left hand mouse button on the Graph Colour
square. A new colour may be selected from the resulting palette. The colour affects the data only, and not the axes. The width of the graph line may also be made thicker by adjusting the Graph Line Width
. There is currently no limit to the line width which can be selected, but a larger line width may obscure data.
Graphics are saved as they are viewed on the screen, and can be saved in a variety of different formats.
Select the File
menu at the top of the dotmatcher graphic. Select the Print
option to reveal the Save
field and the default PNG format. Alter the format using the right hand Select Format
drop down menu to jpeg
. Enter dotmatcher.jpeg
into the File Name
field and save to the same folder as the other files in this section.
Advanced program parameters are hidden on the initial program form as they are not required for the analysis run. They are revealed by clicking on the Advanced Options
button and scrolling down the program form.
Hit the Advanced Options
button to reveal the additional program parameters. Alter the window size to 5
and hit the GO
button to display small but almost identical matches. Compare the results with those of the analysis run using the larger window size.
The majority of programs do not require a great amount of compute power and the results are ready immediately. These programs are run interactively (Section 9.4.38, “Interactive”). Some analyses, however, take extra memory and time and it makes sense to run them in batch mode (Section 9.4.39, “Batch”). The mode in which any program is run can be altered using the drop down Execution mode
menu at the left of the GO
button (in older versions of Jemboss it appears at the bottom right).
This is the default for the majority of programs. Results appear in the Saved Results window (Section 9.6.8.1, “Saved Results Window”) on screen as soon as the analysis run finishes. During the run the Jemboss screen is locked, so no further analyses can be started until the current one has finished.
This is the mode in which the dotmatcher example above is run.
Any program can be altered to run in batch mode. This may be advantageous if, for example, the desktop computer is slow or there are a number of analyses which need to be carried out before comparing results.
Those analyses that require a greater amount of compute power are by default run in batch mode. The entire analysis is done in the background and Jemboss can continue to be used whilst the analysis is running.
Alter the drop down menu at the bottom left of the central Jemboss pane to read batch and hit the GO
button once again for the dotmatcher analysis. The process is sent to the Job Manager (Section 9.6.3, “Job Manager”) and is noted on screen by the message sending batch process now
. Results can be retrieved once the run is completed.
Any program set to run in batch by default may be altered to run in interactive mode; however, this would freeze the Jemboss window for the duration of the run.
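The difference between the two execution modes is essentially that between waiting for a process and starting it in the background. The following sketch illustrates the idea with Python's subprocess module; it is an analogy under our own assumptions, not how Jemboss or its Job Manager is implemented, and the JOB command is a stand-in for a real analysis.

```python
import subprocess
import sys

# Stand-in for an analysis run: a child process that prints one line.
JOB = [sys.executable, "-c", "print('analysis finished')"]


def run_interactive(command):
    """Block until the job completes, then return its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def run_batch(command):
    """Start the job in the background and return immediately."""
    return subprocess.Popen(command, stdout=subprocess.PIPE, text=True)


# Interactive: nothing else happens until the result is back.
print(run_interactive(JOB).strip())

# Batch: other work can continue; the result is collected later.
job = run_batch(JOB)
output, _ = job.communicate()
print(output.strip())
```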