GCG was a commercial package of bioinformatics tools provided by the Genetics Computer Group and latterly by Accelrys. The table below (Table B.51, “Table of equivalent GCG / EMBOSS programs”) summarises equivalent programs from GCG and EMBOSS and should help you make the switch from GCG to EMBOSS. A few GCG programs have no EMBOSS equivalent; conversely, many EMBOSS programs never had a GCG equivalent. Other comparisons of EMBOSS with GCG are available online:
Table B.51. Table of equivalent GCG / EMBOSS programs

GCG application | EMBOSS application | Notes | Example session |
---|---|---|---|
assemble | merger | Merges two overlapping sequences into one. Produces a merged sequence file and an alignment file. Matrix options are accessible using the -opt flag. | % merger
Merge two overlapping nucleic acid sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Output sequence [cam1.fasta]: cam_both.fasta
Output alignment [cam1.out2]: cam_both.aln
|
backtranslate | backtranseq | Translates a protein sequence back into a nucleotide sequence. The default codon usage table is the standard human one. To alter this, use the -opt flag. | % backtranseq
Back translate a protein sequence
Input sequence: calm_human
Output sequence [calm_human.fasta]:
|
bestfit | water, matcher | Finds the best local alignment(s) between two sequences. matcher (Huang and Miller algorithm) provides a faster match and should be used for longer sequences. water (Smith-Waterman algorithm) is more accurate and should be used for shorter sequences. Matrix options for matcher are available using the -opt flag. | % matcher
Finds the best local alignments between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.matcher]:
% water
Smith-Waterman local alignment.
Input sequence: cam1.fasta
Second sequence(s): cam2.fasta
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [cam1.water]:
|
breakup | splitter | Takes a sequence and splits it into smaller overlapping sequences. Use the -opt flag to select the size of each fragment. | % splitter
Split a sequence into (overlapping) smaller sequences
Input sequence(s): cam1.fasta
Output sequence [cam1.fasta]:
|
chopup | | It is not necessary to have a separate program in EMBOSS for this, as all programs read and write a number of different file formats. |
codonfrequency | chips, cusp, compseq | chips calculates the effective number of codons used (Wright Nc statistic). cusp creates a codon usage table from a coding sequence (CDS). compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification. | % chips
Codon usage statistics
Input sequence(s): cam1.fasta
Output file [cam1_1-429.chips]:
% cusp
Create a codon usage table
Input sequence(s): cam1.fasta
Output file [cam1_1-429.cusp]:
% compseq
Counts the composition of dimer/trimer/etc words in a sequence
Input sequence(s): cam1.fasta
Word size to consider (e.g. 2=dimer) [2]:
Output file [cam1_1-429.composition]:
|
codonpreference | syco, wobble | syco identifies coding sequence from codon frequency bias information (Gribskov statistic). Further options for plot specification can be retrieved using the -opt flag. wobble plots a graph of the third ("wobble") base of each codon in a sequence. Use the -opt flag to alter the window size. | % syco
Synonymous codon usage Gribskov statistic plot
Input sequence: cam1.fasta
Graph type [x11]: ps
Created syco.ps
% wobble
Wobble base plot
Input sequence: cam1.fasta
Graph type [x11]: ps
Output file [cam1_1-429.wobble]:
Created wobble.ps
|
coilscan | pepcoil | Identifies coiled coil regions in a protein sequence (Lupas, van Dyke and Stock algorithm). | % pepcoil
Predicts coiled coil regions
Input sequence(s): calm_human
Window size [28]:
Output file [calm_human.pepcoil]:
|
compare | dottup, dotmatcher | Compares similar regions across two sequences and displays them in graphical format. dottup is designed for identical matches and dotmatcher for regions of similarity. Use the -opt flag to select matrix options. | % dottup
Displays a wordmatch dotplot of two sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Graph type [x11]: ps
Created dottup.ps
% dotmatcher
Displays a thresholded dotplot of two sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Graph type [x11]: ps
Created dotmatcher.ps
|
composition | compseq, pepstats | compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification. pepstats calculates peptide sequence composition. | % compseq
Counts the composition of dimer/trimer/etc words in a sequence
Input sequence(s): cam1.fasta
Word size to consider (e.g. 2=dimer) [2]:
Output file [cam1_1-429.composition]:
% pepstats
Protein statistics
Input sequence(s): calm_human
Output file [calm_human.pepstats]:
|
consensus | prophecy | Creates a matrix or profile from a multiple alignment. | % prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: prot2.fasta
Profile type
F : Frequency
G : Gribskov
H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [prot2.prophecy]:
|
correspond | codcmp | Compares codon frequency matrices. | % codcmp
Codon usage table comparison
Codon usage file [Ehum.cut]:
Second Codon usage file [Ehum.cut]: Eacc.cut
Output file [outfile.codcmp]:
|
corrupt | msbar | Randomly mutates a sequence. Use the -opt flag to mutate in frame. | % msbar
Mutate sequence beyond all recognition
Input sequence(s): cam1.fasta
Number of times to perform the mutation operations [1]:
Point mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of point mutations to perform [0]:
Block mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of block mutations to perform [0]:
Codon mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of codon mutations to perform [0]:
Output sequence [cam1_1-429.fasta]:
|
dataset | dbiblast, dbigcg, dbifasta, dbiflat | Indexes the relevant database for use with EMBOSS. |
distances | No direct equivalent. | See the PHYLIP package. |
diverge | No direct equivalent. | See the PHYLIP package. |
dotplot | dottup, dotmatcher | |
extractpeptide | transeq | Translates a nucleotide sequence into protein. Use the -opt flag to specify information on the region, frame and genetic code. | % transeq
Translate nucleic acid sequences
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.pep]:
|
fetch | seqret, seqretsplit | seqret retrieves sequences from a database using the EMBOSS Uniform Sequence Address (USA). It can also be used with an input file to alter its format. seqretsplit splits a multi-sequence file into individual files, each containing a single sequence. Use the -opt flag to retrieve only the first sequence in a file. | % seqretsplit
Reads and writes (returns) sequences in individual files
Input sequence(s): prot2.fasta
Output sequence [calm_human.fasta]:
|
findpatterns | fuzznuc, fuzzpro | Fuzzy search of a pattern against a sequence or a selection of sequences. The search allows mismatches. fuzznuc searches nucleotide sequences and fuzzpro searches protein sequences. | % fuzznuc
Nucleic acid pattern search
Input sequence(s): cam1.fasta
Search pattern: AGGT
Number of mismatches [0]: 1
Output report [cam1_1-429.fuzznuc]:
% fuzzpro
Protein pattern search
Input sequence(s): prot2.fasta
Search pattern: PATTERN
Number of mismatches [0]: 3
Output report [calm_human.fuzzpro]:
|
frames | plotorf, showorf | Plots or displays open reading frames. plotorf uses ATG as a start and TAA, TAG, TGA as stop codons and displays the results as a graphic. showorf writes out the results of a frame translation as text. Use the -opt flag for more options. | % plotorf
Plot potential open reading frames
Input sequence: cam1.fasta
Graph type [x11]: ps
Created plotorf.ps
% showorf
Pretty output of DNA translations
Input sequence: cam1.fasta
Select Frames To Translate
0 : None
1 : F1
2 : F2
3 : F3
4 : R1
5 : R2
6 : R3
Select one or more values [1,2,3,4,5,6]:
Output file [cam1_1-429.showorf]:
|
fromEMBL, fromFasta, fromGenbank, fromIG, fromStaden, fromtrace. | | All EMBOSS applications read and write a variety of file formats, so an individual conversion program is not necessary. |
gap | stretcher, needle | Finds the best global alignment between two sequences. stretcher (Myers and Miller algorithm) provides a faster match and should be used for longer sequences. needle (Needleman-Wunsch algorithm) is more accurate and should be used for shorter sequences. Matrix options for stretcher are available using the -opt flag. | % stretcher
Finds the best global alignment between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.stretcher]:
% needle
Needleman-Wunsch global alignment.
Input sequence: cam1.fasta
Second sequence(s): cam2.fasta
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [cam1_1-429.needle]:
|
gapshow | plotcon | Plots the quality of alignment conservation across a sliding window. Use the -opt flag to alter the comparison matrix. | % plotcon
Plots the quality of conservation of a sequence alignment
Input sequence set: emma.aln
Window size [4]:
Graph type [x11]: ps
Created plotcon.ps
|
getseq | newseq | Enter a short sequence into the program for use as an input file in other applications. | % newseq
Type in a short new sequence.
Name of the sequence: Test
Description of the sequence: Test Protein Sequence
Type of sequence
N : Nucleic
P : Protein
Type of sequence [N]: P
Output sequence [outfile.fasta]: Test.fasta
Enter the sequence: wearethediddymenthediddymenthediddymen
|
growtree | No direct equivalent. | Use emma as the interface to ClustalW, or see the PHYLIP package. |
helicalwheel | pepwheel | Plots a protein sequence as a helix. Use the -opt flag to specify the output display. | % pepwheel
Shows protein sequences as helices
Input sequence: calm_human
Graph type [x11]: ps
Created pepwheel.ps
|
hmmerAlign, hmmerBuild, hmmerCalibrate, hmmerFetch, hmmerIndex, hmmerPfam, hmmerSearch. | See the HMMERNEW programs. | |
hthscan | helixturnhelix | Searches for 22-residue helix-turn-helix motifs in a protein sequence (Dodd and Egan). Use the -opt flag to search using their 20-residue region and to further specify calculation parameters. | % helixturnhelix
Report nucleic acid binding motifs
Input sequence(s): calm_human
Output report [calm_human.hth]:
|
isoelectric | iep | Calculates the isoelectric point of a protein. | % iep calm_human
Calculates the isoelectric point of a protein
Output file [calm_human.iep]:
|
lookup | whichdb | Does not offer all the parameters that lookup does, but will find identifiers or accession numbers in a database and optionally retrieve the sequence. | % whichdb
Search all databases for an entry
ID or Accession number: p62158
Output file [outfile.whichdb]:
|
map, mapplot, mapsort | restrict, remap, restover | Calculates restriction maps based on the entries in the REBASE restriction enzyme database. remap also displays the peptide translation of open reading frames and is the most flexible of these applications. Use the -opt flag to force specific cutters. | % restrict cam1.fasta
Finds restriction enzyme cleavage sites
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [cam1_1-429.restrict]:
% remap
Display a sequence with restriction cut sites, translation etc..
Input sequence(s): cam1.fasta
Comma separated enzyme list [all]:
Minimum recognition site length [4]:
Output file [cam1_1-429.remap]:
% restover
Finds restriction enzymes that produce a specific overhang
Input sequence(s): cam1.fasta
Overlap sequence: overhang.fasta
Output file [cam1_1-429.restover]:
|
melttemp | dan | Calculates the melting temperature of a DNA or RNA sequence (Breslauer and Baldino statistics). Use the -opt flag to further specify calculations. | % dan
Calculates DNA RNA/DNA melting temperature
Input sequence(s): cam1.fasta
Enter window size [20]:
Enter Shift Increment [1]:
Enter DNA concentration (nM) [50.]:
Enter salt concentration (mM) [50.]:
Output report [cam1_1-429.dan]:
|
MEME | No direct equivalent | See the MEME applications. |
moment | hmoment | Calculates the hydrophobic moment of a protein. Use the -opt flag to specify the angle of rotation. | % hmoment
Hydrophobic moment calculation
Input sequence(s): calm_human
Output file [calm_human.hmoment]:
|
motifs | patmatmotifs, pscan | patmatmotifs searches the PROSITE database for patterns. Use the -opt flag to specify patterns. pscan searches the PRINTS database for fingerprint motifs. | % patmatmotifs
Search a PROSITE motif database with a protein sequence
Input sequence: calm_human
Output report [calm_human.patmatmotifs]:
% pscan
Scans proteins using PRINTS
Input sequence(s): calm_human
Minimum number of elements per fingerprint [2]:
Maximum number of elements per fingerprint [20]:
Output file [calm_human.pscan]:
|
names | infoseq | Describes sequence attributes such as name, length, GC content. | % infoseq
Displays some simple information about sequences
Input sequence(s): calm_human
# USA Name Accession Type Length Description
fasta::calm_human:CALM_HUMAN CALM_HUMAN P62158 P 148 Calmodulin (CaM).
|
nooverlap | diffseq | Finds differences between two sequences. Use the -opt flag to output the information in columns. | % diffseq
Find differences between nearly identical sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Output report [cam1_1-429.diffseq]:
Output features [CaM1_1-429.diffgff]:
Second output features [CaM2.diffgff]:
|
pepdata | getorf, sixpack | Translates all six reading frames. getorf finds and extracts open reading frames. sixpack displays the DNA sequence together with its six-frame translation. Use the -opt flag for either program to specify the genetic code. | % getorf
Finds and extracts open reading frames (ORFs)
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.orf]:
% sixpack
Display a DNA sequence with frame translation and ORFs
Input sequence: cam1.fasta
Output file [cam1_1-429.sixpack]:
Output sequence [cam1_1-429.fasta]:
|
pepplot | pepinfo, garnier | pepinfo displays biophysical properties of the protein sequence and plots hydrophobicity (Kyte and Doolittle, Sweet and Eisenberg, Eisenberg). Use the -opt flag to select parameters for the hydrophobicity plots. garnier predicts protein secondary structure (Garnier, Osguthorpe and Robson). | % pepinfo
Plots simple amino acid properties in parallel
Input sequence: calm_human
Graph type [x11]: ps
Output file [calm_human.pepinfo]:
Created pepinfo.ps
% garnier
Predicts protein secondary structure
Input sequence(s): calm_human
Output report [calm_human.garnier]:
|
peptidemap | digest | Performs a full or partial proteolytic digest of a protein sequence. | % digest
Protein proteolytic enzyme or reagent cleavage digest
Input sequence: calm_human
Enzymes and Reagents
1 : Trypsin
2 : Lys-C
3 : Arg-C
4 : Asp-N
5 : V8-bicarb
6 : V8-phosph
7 : Chymotrypsin
8 : CNBr
Select number [1]:
Output report [calm_human.digest]:
|
peptidestructure, plotstructure | garnier | Predicts protein secondary structure (Garnier, Osguthorpe and Robson). | % garnier
Predicts protein secondary structure
Input sequence(s): calm_human
Output report [calm_human.garnier]:
|
pileup | emma | Wrapper for the ClustalW multiple sequence alignment program. Accepts all EMBOSS input formats. | % emma
Multiple alignment program - interface to ClustalW program
Input sequence(s): prot_all.fasta
Output sequence [cam2.aln]:
Dendogram output filename [cam2.dnd]:
CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: CaM2 148 aa
Sequence 2: CaM3 148 aa
Sequence 3: CaM1 148 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 93
Sequences (1:3) Aligned. Score: 100
Sequences (2:3) Aligned. Score: 97
Guide tree file created: [00002524C]
Start of Multiple Alignment
There are 2 groups
Aligning...
Group 1: Sequences: 2 Score:2070
Group 2: Sequences: 3 Score:1098
Alignment Score 1439
GCG-Alignment file created [00002524B]
|
plasmidmap | lindna, cirdna | Display of linear and circular DNA. |
plotsimilarity | plotcon. See also gapshow. | |
pretty, prettybox | cons, prettyplot, showalign | cons calculates a consensus from a multiple alignment using specified parameters. prettyplot displays an alignment with specified colours in a boxed display. showalign displays the alignment in editable text format. Use the -opt flag for all three programs to set values. |
prime | eprimer3 | Allows selection of a variety of different primers under several conditions. Use the -opt flag to alter parameters. | % eprimer3
Picks PCR primers and hybridization oligos
Input sequence(s): cam3.fasta
Output file [cam3.eprimer3]:
|
profilegap, profilemake | prophet, prophecy | prophecy creates matrices or profiles from multiple alignments. prophet reads in these files to create gapped alignments of proteins. | % prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: emma.aln
Profile type
F : Frequency
G : Gribskov
H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [emma.prophecy]:
% prophet
Gapped alignment for profiles
Input sequence(s): calm_human
Profile or matrix file: emma.prophecy
Gap opening coefficient [1.0]:
Gap extension coefficient [1.0]:
Output file [calm_human.prophet]:
|
profilescan | patmatdb | Uses a motif to search a protein sequence. | % patmatdb
Search a protein sequence with a motif
Input sequence(s): emma.aln
Protein motif to search for: HATS
Output report [cam2.patmatdb]:
|
profilesearch | profit | Scans a sequence or database with a matrix or profile. Uses the matrix file created by prophecy. | % profit
Scan a sequence or database with a matrix or profile
Profile or matrix file: emma.prophecy
Input sequence(s): calm_human
Output file [emma.profit]:
|
reformat | seqret | Reformatting files is unnecessary in EMBOSS, as each application reads and writes a variety of different formats. However, if anything needs converting, seqret will do it. | % seqret
Reads and writes (returns) sequences
Input sequence(s): calm.gcg
Output sequence [calm_human.fasta]:
|
repeat | equicktandem, etandem, einverted, palindrome | Searches a nucleotide sequence for tandem repeats, inverted repeats or palindromes. | % equicktandem
Finds tandem repeats
Input sequence: cam1.fasta
Maximum repeat size [600]:
Threshold score [20]:
Output report [cam1_1-429.qtan]:
% etandem
Looks for tandem repeats in a nucleotide sequence
Input sequence: cam1.fasta
Minimum repeat size [10]:
Maximum repeat size [10]:
Output report [cam1_1-429.tan]:
% einverted
Finds DNA inverted repeats
Input sequence: cam1.fasta
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:
Output file [cam1_1-429.inv]:
% palindrome
Looks for inverted repeats in a nucleotide sequence
Input sequence(s): cam1.fasta
Enter minimum length of palindrome [10]:
Enter maximum length of palindrome [100]:
Enter maximum gap between repeated regions [100]:
Number of mismatches allowed [0]:
Output file [cam1_1-429.pal]:
Report overlapping matches [Y]:
|
replace | biosed, degapseq | biosed replaces or deletes specified sections of a sequence. degapseq is specific to removing gap characters. | % biosed
Replace or delete sequence sections
Input sequence(s): cam1.fasta
Sequence section to match [N]:
Replacement sequence section [A]:
Output sequence [cam1_1-429.fasta]:
% degapseq
Removes gap characters from sequences
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.fasta]:
|
reverse | revseq | Reverses and complements a sequence. Almost any program in the suite can reverse and complement an input sequence using the -sreverse option. Alternatively, the [start:end:reverse] syntax will accomplish the same task. | % revseq
Reverse and complement a sequence
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.rev]:
|
sample | extractseq | Extracts specific regions from a sequence. Use the -opt flag to save them to a separate file. | % extractseq
Extract regions from a sequence
Input sequence: cam1.fasta
Regions to extract (eg: 4-57,78-94) [1-429]: 1-25
Output sequence [cam1_1-429.fasta]:
|
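seg | maskseq, maskfeat | Masks regions (such as low-complexity regions) within a sequence. maskseq masks specified regions and maskfeat masks regions annotated as features. Use the -opt flag to select a region to mask. | % maskfeat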
Mask off features of a sequence.
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.fasta]:
|
shuffle | shuffleseq | Shuffles one or a set of sequences. | % shuffleseq
Shuffles a set of sequences maintaining composition
Input sequence(s): calm_human
Output sequence [calm_human.fasta]:
|
spscan | sigcleave | Searches for signal sequences in proteins. Use the -opt flag to specify a prokaryotic sequence. | % sigcleave
Reports protein signal cleavage sites
Input sequence(s): calm_human
Minimum weight [3.5]:
Output report [calm_human.sig]:
|
stemloop | etandem, palindrome. See also repeat. | |
testcode | wobble | |
toFASTA, toPIR, toIG, toSTADEN | seqret. See also fromEMBL. | |
translate | transeq. See also extractpeptide. | |
window + statplot | freak | Calculates the base or residue frequency of a sequence within a sliding window. Use the -opt flag to select the window type for calculation of the plot. | % freak
Residue/base frequency table or plot
Input sequence(s): cam1.fasta
Residue letters [gc]:
Output file [cam1_1-429.freak]:
|
gcghelp | tfm | Stands for "the fine manual" and contains the individual program documentation. Type tfm followed by the program name. | % tfm stretcher
Displays a program's help documentation manual
|
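The compseq entries in the table describe counting the composition of words (dimers, trimers and so on) in a sequence. As a rough illustration of what such a count involves, the following minimal Python sketch tallies overlapping words of a given size. It is not EMBOSS code: the function name `count_words` is hypothetical, and the sketch ignores compseq's additional options.

```python
from collections import Counter


def count_words(seq: str, word_size: int = 2) -> Counter:
    """Count overlapping words of length word_size in seq.

    Hypothetical illustration of the kind of table compseq reports;
    it is not how EMBOSS implements compseq.
    """
    seq = seq.upper()
    # Every start position that leaves room for a full word.
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))


if __name__ == "__main__":
    for word, n in sorted(count_words("ATGATGCA", 2).items()):
        print(word, n)
```

Because the words overlap, a sequence of length L yields L - word_size + 1 words in total.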
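The revseq entry in the table, and the reverse-complement options mentioned alongside it, produce the reverse complement of a nucleotide sequence. A minimal Python sketch, assuming DNA input written in IUPAC nucleotide codes, is shown below. The function name `reverse_complement` is hypothetical and this is not how EMBOSS implements revseq; characters outside the table (such as gap symbols) are passed through unchanged.

```python
# IUPAC nucleotide codes and their complements, upper and lower case.
_BASES = "ACGTRYKMBVDHSWN"
_COMPLEMENTS = "TGCAYRMKVBHDSWN"
_TABLE = str.maketrans(_BASES + _BASES.lower(), _COMPLEMENTS + _COMPLEMENTS.lower())


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(_TABLE)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCCAAGT"))  # ACTTGGCCAT
```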
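The freak entry in the table calculates the frequency of chosen residues (G and C in the example session) across a sequence. The sketch below computes the same kind of sliding-window fraction in Python. The function name `window_frequency`, the window and step values, and the 1-based start positions are illustrative assumptions, not freak's documented defaults or output format.

```python
def window_frequency(seq: str, letters: str = "GC", window: int = 30, step: int = 1):
    """Return (start, fraction) pairs, one per window.

    start is the 1-based position of the first residue in the window and
    fraction is the proportion of residues in the window found in letters.
    A sequence shorter than the window yields an empty list.
    """
    seq = seq.upper()
    wanted = set(letters.upper())
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(1 for residue in chunk if residue in wanted)
        results.append((start + 1, hits / window))
    return results


if __name__ == "__main__":
    for position, fraction in window_frequency("GGCATATGCC", "gc", window=4, step=2):
        print(position, fraction)
```

Increasing the step reduces the number of windows reported, at the cost of a coarser plot.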