B.7. GCG to EMBOSS Comparison

GCG was a commercial package of bioinformatics tools provided by the Genetics Computer Group and, latterly, by Accelrys. The table below (Table B.51, “Table of equivalent GCG / EMBOSS programs”) summarises equivalent programs from GCG and EMBOSS and should help you make the switch from GCG to EMBOSS. One or two GCG programs are not covered by EMBOSS; conversely, there are many EMBOSS programs for which there was never an equivalent in GCG. Other comparisons of EMBOSS to GCG are available online.

Table B.51. Table of equivalent GCG / EMBOSS programs
GCG application | EMBOSS application | Notes | Example session
assemble | merger | Merges two overlapping sequences into one. Produces a merged file and an alignment file. Matrix options are accessible using the -opt flag.
% merger
Merge two overlapping nucleic acid sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Output sequence [cam1.fasta]: cam_both.fasta
Output alignment [cam1.out2]: cam_both.aln
   
backtranslate | backtranseq | Translates a protein back into a nucleotide sequence. The default codon usage table is the standard human one. To alter this, use the -opt flag.
% backtranseq
Back translate a protein sequence
Input sequence: calm_human
Output sequence [calm_human.fasta]:
   
bestfit | water, matcher | Finds the best local alignment(s) between two sequences. matcher (Huang and Miller algorithm) provides a faster match and should be used for longer sequences. water (Smith-Waterman algorithm) is more accurate and should be used for shorter sequences. Matrix options for matcher are available using the -opt flag.
% matcher
Finds the best local alignments between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.matcher]:
% water
Smith-Waterman local alignment.
Input sequence: cam1.fasta
Second sequence(s): cam2.fasta
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [cam1.water]:
breakup | splitter | Takes a sequence and splits it into smaller overlapping sequences. Use the -opt flag to select the size of each fragment.
% splitter
Split a sequence into (overlapping) smaller sequences
Input sequence(s): cam1.fasta
Output sequence [cam1.fasta]:
   
chopup | | It is not necessary to have a separate program in EMBOSS for this, as all programs read and write a number of different file formats.
codonfrequency | chips, cusp, compseq | chips calculates the effective number of codons used (Wright Nc statistic). cusp creates a codon usage table from coding sequence (CDS). compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification.
% chips
Codon usage statistics
Input sequence(s): cam1.fasta
Output file [cam1_1-429.chips]:
% cusp
Create a codon usage table
Input sequence(s): cam1.fasta
Output file [cam1_1-429.cusp]:
% compseq
Counts the composition of dimer/trimer/etc words in a sequence
Input sequence(s): cam1.fasta
Word size to consider (e.g. 2=dimer) [2]:
Output file [cam1_1-429.composition]:
codonpreference | syco, wobble | syco identifies coding sequence from codon frequency bias information (Gribskov statistic). Further options for plot specification are available using the -opt flag. wobble plots a graph of the third ("wobble") base of each codon in a sequence. Use the -opt flag to alter the window size.
% syco
Synonymous codon usage Gribskov statistic plot
Input sequence: cam1.fasta
Graph type [x11]: ps
Created syco.ps
% wobble
Wobble base plot
Input sequence: cam1.fasta
Graph type [x11]: ps
Output file [cam1_1-429.wobble]:
Created wobble.ps
  
coilscan | pepcoil | Identifies coiled coil regions in a protein sequence (Lupas, van Dyke and Stock algorithm).
% pepcoil
Predicts coiled coil regions
Input sequence(s): calm_human
Window size [28]:
Output file [calm_human.pepcoil]:
  
compare | dottup, dotmatcher | Compares similar regions across two sequences and displays them in graphical format. dottup is designed for identical matches, and dotmatcher for regions of similarity. Use the -opt flag to select matrix options.
% dottup
Displays a wordmatch dotplot of two sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Graph type [x11]: ps
Created dottup.ps
% dotmatcher
Displays a thresholded dotplot of two sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Graph type [x11]: ps
Created dotmatcher.ps
  
composition | compseq, pepstats | compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification. pepstats calculates peptide sequence composition.
% compseq
Counts the composition of dimer/trimer/etc words in a sequence
Input sequence(s): cam1.fasta
Word size to consider (e.g. 2=dimer) [2]:
Output file [cam1_1-429.composition]:
% pepstats
Protein statistics
Input sequence(s): calm_human
Output file [calm_human.pepstats]:
  
consensus | prophecy | Creates a matrix or profile from a multiple alignment.
% prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: prot2.fasta
Profile type
         F : Frequency
         G : Gribskov
         H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [prot2.prophecy]:  
   
correspond | codcmp | Compares codon frequency matrices.
% codcmp
Codon usage table comparison
Codon usage file [Ehum.cut]:
Second Codon usage file [Ehum.cut]: Eacc.cut
Output file [outfile.codcmp]:
   
corrupt | msbar | Randomly mutates a sequence. Use the -opt flag to mutate in frame.
% msbar
Mutate sequence beyond all recognition
Input sequence(s): cam1.fasta
Number of times to perform the mutation operations [1]:
Point mutation operations
         0 : None
         1 : Any of the following
         2 : Insertions
         3 : Deletions
         4 : Changes
         5 : Duplications
         6 : Moves
Types of point mutations to perform [0]:
Block mutation operations
         0 : None
         1 : Any of the following
         2 : Insertions
         3 : Deletions
         4 : Changes
         5 : Duplications
         6 : Moves
Types of block mutations to perform [0]:
Codon mutation operations
         0 : None
         1 : Any of the following
         2 : Insertions
         3 : Deletions
         4 : Changes
         5 : Duplications
         6 : Moves
Types of codon mutations to perform [0]:
Output sequence [cam1_1-429.fasta]:
   
dataset | dbiblast, dbigcg, dbifasta, dbiflat | Indexes the relevant database for use with EMBOSS.
distances | No direct equivalent. | See the PHYLIP package.
diverge | No direct equivalent. | See the PHYLIP package.
dotplot | dottup, dotmatcher
extractpeptide | transeq | Translates a nucleotide sequence into protein. Use the -opt flag to specify information on the region, frame and genetic code.
% transeq
Translate nucleic acid sequences
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.pep]:
    
fetch | seqret, seqretsplit | seqret retrieves sequences from a database using the EMBOSS uniform sequence address. It can also be used with an input file to alter its format. seqretsplit splits a multi-sequence file into individual files, each containing a single sequence. Use the -opt flag to retrieve only the first sequence in a file.
% seqretsplit
Reads and writes (returns) sequences in individual files
Input sequence(s): prot2.fasta
Output sequence [calm_human.fasta]:
    
findpatterns | fuzznuc, fuzzpro | Fuzzy search of a pattern against a sequence or a selection of sequences. The search allows mismatches. fuzznuc searches nucleotide sequences and fuzzpro searches protein sequences.
% fuzznuc
Nucleic acid pattern search
Input sequence(s): cam1.fasta
Search pattern: AGGT
Number of mismatches [0]: 1
Output report [cam1_1-429.fuzznuc]:
% fuzzpro
Protein pattern search
Input sequence(s): prot2.fasta
Search pattern: PATTERN
Number of mismatches [0]: 3
Output report [calm_human.fuzzpro]:
    
frames | plotorf, showorf | Plots or displays open reading frames. plotorf uses ATG as a start and TAA, TAG, TGA as stop codons and displays the results as a graphic. showorf writes out the results of a frame translation as text. Use the -opt flag for more options.
% plotorf
Plot potential open reading frames
Input sequence: cam1.fasta
Graph type [x11]: ps
Created plotorf.ps
% showorf
Pretty output of DNA translations
Input sequence: cam1.fasta
Select Frames To Translate
         0 : None
         1 : F1
         2 : F2
         3 : F3
         4 : R1
         5 : R2
         6 : R3
Select one or more values [1,2,3,4,5,6]:
Output file [cam1_1-429.showorf]:
    
fromEMBL, fromFasta, fromGenbank, fromIG, fromStaden, fromtrace | | All EMBOSS applications read and write a variety of file formats, so an individual conversion program is not necessary.
gap | stretcher, needle | Finds the best global alignment between two sequences. stretcher (Myers and Miller algorithm) provides a faster match and should be used for longer sequences. needle (Needleman-Wunsch algorithm) is more accurate and should be used for shorter sequences. Matrix options for stretcher are available using the -opt flag.
% stretcher
Finds the best global alignment between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.stretcher]:
% needle
Needleman-Wunsch global alignment.
Input sequence: cam1.fasta
Second sequence(s): cam2.fasta
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [cam1_1-429.needle]:
    
gapshow | plotcon | Plots the quality of alignment conservation across a sliding window. Use the -opt flag to alter the comparison matrix.
% plotcon
Plots the quality of conservation of a sequence alignment
Input sequence set: emma.aln
Window size [4]:
Graph type [x11]: ps
Created plotcon.ps
    
getseq | newseq | Enter a short sequence into the program for use as an input file in other applications.
% newseq
Type in a short new sequence.
Name of the sequence: Test
Description of the sequence: Test Protein Sequence
Type of sequence
         N : Nucleic
         P : Protein
Type of sequence [N]: P
Output sequence [outfile.fasta]: Test.fasta
Enter the sequence: wearethediddymenthediddymenthediddymen
  
growtree | No direct equivalent. | Use emma as the interface to ClustalW or the PHYLIP option.
helicalwheel | pepwheel | Plots a protein sequence as a helix. Use the -opt flag to specify the output display.
% pepwheel
Shows protein sequences as helices
Input sequence: calm_human
Graph type [x11]: ps
Created pepwheel.ps
    
hmmerAlign, hmmerBuild, hmmerCalibrate, hmmerFetch, hmmerIndex, hmmerPfam, hmmerSearch | | See the HMMERNEW programs.
hthscan | helixturnhelix | Searches for 22 residue helix turn helix motifs in a protein sequence (Dodd and Egan). Use the -opt flag to search using their 20 residue region and further specify calculation parameters.
% helixturnhelix
Report nucleic acid binding motifs
Input sequence(s): calm_human
Output report [calm_human.hth]:
    
isoelectric | iep | Calculates the isoelectric point of a protein.
% iep calm_human
Calculates the isoelectric point of a protein
Output file [calm_human.iep]:
    
lookup | whichdb | Does not offer all the parameters that lookup does, but will find identifiers or accession numbers in a database, and optionally retrieve the sequence.
% whichdb
Search all databases for an entry
ID or Accession number: p62158
Output file [outfile.whichdb]:
    
map, mapplot, mapsort | restrict, remap, restover | Calculates restriction maps based on the entries in the REBASE restriction enzyme database. Displays the peptide translation of open reading frames. remap is the most flexible of these applications. Use the -opt flag to force specific cutters.
% restrict cam1.fasta
Finds restriction enzyme cleavage sites
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [cam1_1-429.restrict]:
% remap
Display a sequence with restriction cut sites, translation etc..
Input sequence(s): cam1.fasta
Comma separated enzyme list [all]:
Minimum recognition site length [4]:
Output file [cam1_1-429.remap]:
% restover
Finds restriction enzymes that produce a specific overhang
Input sequence(s): cam1.fasta
Overlap sequence: overhang.fasta
Output file [cam1_1-429.restover]:
    
melttemp | dan | Calculates the melting temperature of a DNA or RNA sequence (Breslauer and Baldino statistics). Use the -opt flag to further specify calculations.
% dan
Calculates DNA RNA/DNA melting temperature
Input sequence(s): cam1.fasta
Enter window size [20]:
Enter Shift Increment [1]:
Enter DNA concentration (nM) [50.]:
Enter salt concentration (mM) [50.]:
Output report [cam1_1-429.dan]:
    
MEME | No direct equivalent. | See the MEME applications.
moment | hmoment | Calculates the hydrophobic moment of a protein. Use the -opt flag to specify the angle of rotation.
% hmoment
Hydrophobic moment calculation
Input sequence(s): calm_human
Output file [calm_human.hmoment]:
    
motifs | patmatmotifs, pscan | patmatmotifs searches the PROSITE database for patterns. Use the -opt flag to specify patterns. pscan searches the PRINTS database for fingerprint motifs.
% patmatmotifs
Search a PROSITE motif database with a protein sequence
Input sequence: calm_human
Output report [calm_human.patmatmotifs]:
% pscan
Scans proteins using PRINTS
Input sequence(s): calm_human
Minimum number of elements per fingerprint [2]:
Maximum number of elements per fingerprint [20]:
Output file [calm_human.pscan]:
    
names | infoseq | Describes sequence attributes such as name, length, GC content.
% infoseq
Displays some simple information about sequences
Input sequence(s): calm_human
# USA                         Name        Accession  Type  Length  Description
fasta::calm_human:CALM_HUMAN  CALM_HUMAN  P62158     P     148     Calmodulin (CaM).
    
nooverlap | diffseq | Finds differences between two sequences. Use the -opt flag to output the information in columns.
% diffseq
Find differences between nearly identical sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Output report [cam1_1-429.diffseq]:
Output features [CaM1_1-429.diffgff]:
Second output features [CaM2.diffgff]:
    
pepdata | getorf, sixpack | Translates all six open reading frames. getorf displays selected translations. sixpack displays DNA sequence and peptide translation. Use the -opt flag for either program to specify the codon usage information.
% getorf
Finds and extracts open reading frames (ORFs)
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.orf]:
% sixpack
Display a DNA sequence with frame translation and ORFs
Input sequence: cam1.fasta
Output file [cam1_1-429.sixpack]:
Output sequence [cam1_1-429.fasta]:
    
pepplot | pepinfo, garnier | pepinfo displays biophysical properties of the protein sequence and plots hydrophobicity (Kyte and Doolittle, Sweet and Eisenberg, Eisenberg). Use the -opt flag to select parameters for the hydrophobicity plots. garnier displays a secondary structure plot (Garnier, Osguthorpe and Robson).
% pepinfo
Plots simple amino acid properties in parallel
Input sequence: calm_human
Graph type [x11]: ps
Output file [calm_human.pepinfo]:
Created pepinfo.ps
% garnier
Predicts protein secondary structure
Input sequence(s): calm_human
Output report [calm_human.garnier]:
    
peptidemap | digest | Full or partial peptide digest of a protein sequence.
% digest
Protein proteolytic enzyme or reagent cleavage digest
Input sequence: calm_human
Enzymes and Reagents
         1 : Trypsin
         2 : Lys-C
         3 : Arg-C
         4 : Asp-N
         5 : V8-bicarb
         6 : V8-phosph
         7 : Chymotrypsin
         8 : CNBr
Select number [1]:
Output report [calm_human.digest]:
    
peptidestructure, plotstructure | garnier | Displays a secondary structure plot (Garnier, Osguthorpe and Robson).
% garnier
Predicts protein secondary structure
Input sequence(s): calm_human
Output report [calm_human.garnier]:
    
pileup | emma | Wrapper to the ClustalW multiple sequence alignment program. Accepts all EMBOSS input formats.
% emma
Multiple alignment program - interface to ClustalW program
Input sequence(s): prot_all.fasta
Output sequence [cam2.aln]:
Dendogram output filename [cam2.dnd]:



 CLUSTAL W (1.83) Multiple Sequence Alignments



Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: CaM2            148 aa
Sequence 2: CaM3            148 aa
Sequence 3: CaM1            148 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  93
Sequences (1:3) Aligned. Score:  100
Sequences (2:3) Aligned. Score:  97
Guide tree        file created:   [00002524C]
Start of Multiple Alignment
There are 2 groups
Aligning...
Group 1: Sequences:   2      Score:2070
Group 2: Sequences:   3      Score:1098
Alignment Score 1439
GCG-Alignment file created      [00002524B]
    
plasmidmap | lindna, cirdna | Display of linear and circular DNA.
plotsimilarity | plotcon | See also gapshow.
pretty, prettybox | cons, prettyplot, showalign | cons calculates a consensus from a multiple alignment using specified parameters. prettyplot displays an alignment with specified colours and boxes. showalign displays the alignment in editable text format. Use the -opt flag for all three programs to set values.
prime | eprimer3 | Allows selection of a variety of different primers under several conditions. Use the -opt flag to alter parameters.
% eprimer3
Picks PCR primers and hybridization oligos
Input sequence(s): cam3.fasta
Output file [cam3.eprimer3]:
    
profilegap, profilemake | prophet, prophecy | prophecy creates matrices or profiles from multiple alignments. prophet reads in these files to create gapped alignments of proteins.
% prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: emma.aln
Profile type
         F : Frequency
         G : Gribskov
         H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [emma.prophecy]:
% prophet
Gapped alignment for profiles
Input sequence(s): calm_human
Profile or matrix file: emma.prophecy
Gap opening coefficient [1.0]:
Gap extension coefficient [1.0]:
Output file [calm_human.prophet]:
    
profilescan | patmatdb | Uses a motif to search a protein sequence.
% patmatdb
Search a protein sequence with a motif
Input sequence(s): emma.aln
Protein motif to search for: HATS
Output report [cam2.patmatdb]:
    
profilesearch | profit | Scans a sequence or database with a matrix or profile. Uses the matrix file created by prophecy.
% profit
Scan a sequence or database with a matrix or profile
Profile or matrix file: emma.prophecy
Input sequence(s): calm_human
Output file [emma.profit]:
    
reformat | seqret | Reformatting files is redundant in EMBOSS as each application reads and writes a variety of different formats. However, if anything needs converting, seqret will do it.
% seqret
Reads and writes (returns) sequences
Input sequence(s): calm.gcg
Output sequence [calm_human.fasta]:
    
repeat | equicktandem, etandem, einverted, palindrome | Searches for tandem repeats, inverted or palindromic sequences in a nucleotide input file.
% equicktandem
Finds tandem repeats
Input sequence: cam1.fasta
Maximum repeat size [600]:
Threshold score [20]:
Output report [cam1_1-429.qtan]:
% etandem
Looks for tandem repeats in a nucleotide sequence
Input sequence: cam1.fasta
Minimum repeat size [10]:
Maximum repeat size [10]:
Output report [cam1_1-429.tan]:
% einverted
Finds DNA inverted repeats
Input sequence: cam1.fasta
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:
Output file [cam1_1-429.inv]:
% palindrome
Looks for inverted repeats in a nucleotide sequence
Input sequence(s): cam1.fasta
Enter minimum length of palindrome [10]:
Enter maximum length of palindrome [100]:
Enter maximum gap between repeated regions [100]:
Number of mismatches allowed [0]:
Output file [cam1_1-429.pal]:
Report overlapping matches [Y]:
    
replace | biosed, degapseq | biosed replaces specified characters in a text file. degapseq is specific for removing gaps.
% biosed
Replace or delete sequence sections
Input sequence(s): cam1.fasta
Sequence section to match [N]:
Replacement sequence section [A]:
Output sequence [cam1_1-429.fasta]:   
% degapseq
Removes gap characters from sequences
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.fasta]:
    
reverse | revseq | Reverses and complements a sequence. Almost any program in the suite can reverse and complement a sequence using the -sreverse option. Alternatively the [start:end:reverse] syntax will accomplish the same task.
% revseq
Reverse and complement a sequence
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.rev]:
    
sample | extractseq | Extracts specific regions from a sequence. Use the -opt flag to save them to a separate file.
% extractseq
Extract regions from a sequence
Input sequence: cam1.fasta
Regions to extract (eg: 4-57,78-94) [1-429]: 1-25
Output sequence [cam1_1-429.fasta]:
    
seg | maskseq | Masks low complexity regions within a sequence. Use the -opt flag to select a region to mask.
% maskfeat
Mask off features of a sequence.
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.fasta]:
    
shuffle | shuffleseq | Shuffles one or a set of sequences.
% shuffleseq
Shuffles a set of sequences maintaining composition
Input sequence(s): calm_human
Output sequence [calm_human.fasta]:
    
spscan | sigcleave | Searches for signal sequences in proteins. Use the -opt flag to specify a prokaryotic sequence.
% sigcleave
Reports protein signal cleavage sites
Input sequence(s): calm_human
Minimum weight [3.5]:
Output report [calm_human.sig]:
    
stemloop | etandem, palindrome | See also repeat.
testcode | wobble
toFASTA, toPIR, toIG, toSTADEN | seqret | See also fromEMBL.
translate | transeq | See also extractpeptide.
window + statplot | freak | Calculates the base or residue frequency of a sequence. Use the -opt flag to select the window type for calculation of the plot.
% freak
Residue/base frequency table or plot
Input sequence(s): cam1.fasta
Residue letters [gc]:
Output file [cam1_1-429.freak]:
    
gcghelp | tfm | Stands for "the fine manual" and contains the individual program documentation. Type tfm followed by the program name.
% tfm stretcher
Displays a program's help documentation manual
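
When migrating scripts, it can be handy to keep the table in machine-readable form. The following is a minimal sketch of a lookup helper in Python. The dictionary holds only a handful of rows copied from the table above, and the names GCG_TO_EMBOSS, emboss_equivalents and describe are hypothetical; they are not part of EMBOSS or GCG.

```python
# Hypothetical lookup helper. The dictionary is a partial copy of
# Table B.51 (a few rows only), keyed by lowercase GCG program name.
GCG_TO_EMBOSS = {
    "assemble": ["merger"],
    "backtranslate": ["backtranseq"],
    "bestfit": ["water", "matcher"],
    "gap": ["stretcher", "needle"],
    "pileup": ["emma"],
    "reformat": ["seqret"],
    "reverse": ["revseq"],
    "translate": ["transeq"],
    "distances": [],  # no direct equivalent; the table points to PHYLIP
}


def emboss_equivalents(gcg_name):
    """Return the EMBOSS programs listed for a GCG program name.

    Raises KeyError if the name is not in this partial dictionary.
    """
    return GCG_TO_EMBOSS[gcg_name.lower()]


def describe(gcg_name):
    """Build a one-line hint for a GCG program name."""
    programs = emboss_equivalents(gcg_name)
    if not programs:
        return f"{gcg_name}: no direct EMBOSS equivalent"
    return f"{gcg_name}: try {', '.join(programs)}"


if __name__ == "__main__":
    for name in ("bestfit", "gap", "distances"):
        print(describe(name))
```

A list is used as the value because several GCG programs map to more than one EMBOSS program, and an empty list marks rows where the table gives no direct equivalent.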