B.7. GCG to EMBOSS Comparison


GCG was a commercial package of bioinformatics tools provided by the Genetics Computer Group and, latterly, by Accelrys. The table below (Table B.51, “Table of equivalent GCG / EMBOSS programs”) summarises equivalent programs from GCG and EMBOSS and should help you make the switch from GCG to EMBOSS. One or two GCG programs are not covered by EMBOSS; conversely, there are many EMBOSS programs for which there was never an equivalent in GCG. Other comparisons of EMBOSS to GCG are available on-line:

Helix Systems at NIH
http://helix.nih.gov/apps/bioinfo/emboss-gcg.html
Italian EMBnet node
http://www.ba.itb.cnr.it/BIGHome/ita/EMBnet_a/GCGtoEMBOSS.html
Norwegian EMBnet node
http://www.biotek.uio.no/EMBNET/gcgtoemb.html
Belgian EMBnet node
http://www.be.embnet.org/services/EMBOSSHelp/EMBOSSPrograms

Table B.51. Table of equivalent GCG / EMBOSS programs
GCG application	EMBOSS application	Notes	Example session
assemble	merger	Merges two overlapping sequences into one. Produces a merged file and an alignment file. Matrix options accessible using the `-opt` flag.	`%` `merger` Merge two overlapping nucleic acid sequences Input sequence: cam1.fasta Second sequence: cam2.fasta Output sequence [cam1.fasta]: cam_both.fasta Output alignment [cam1.out2]: cam_both.aln
backtranslate	backtranseq	Translates protein back into a nucleotide sequence. Default codon usage table is the standard human one. To alter this use the `-opt` flag.	`%` `backtranseq` Back translate a protein sequence Input sequence: calm_human Output sequence [calm_human.fasta]:
bestfit	water, matcher	Finds the best local alignment(s) between two sequences. matcher (Huang and Miller algorithm) provides a faster match and should be used for longer sequences. water (Smith-Waterman algorithm) is more accurate and should be used for shorter sequences. Matrix options for matcher are available using the `-opt` flag.	`%` `matcher` Finds the best local alignments between two sequences Input sequence: cam1_long.fasta Second sequence: cam2_long.fasta Output alignment [cam1_1-429.matcher]: `%` `water` Smith-Waterman local alignment. Input sequence: cam1.fasta Second sequence(s): cam2.fasta Gap opening penalty [10.0]: Gap extension penalty [0.5]: Output alignment [cam1.water]:
breakup	splitter	Takes a sequence and splits it into smaller overlapping sequences. Use the `-opt` flag to select the size of each fragment.	`%` `splitter` Split a sequence into (overlapping) smaller sequences Input sequence(s): cam1.fasta Output sequence [cam1.fasta]:
chopup		It is not necessary to have a separate program in EMBOSS for this, as all programs read and write a number of different file formats.
codonfrequency	chips, cusp, compseq	chips calculates the effective number of codons used (Wright Nc statistic). cusp creates a codon usage table from coding sequence (CDS). compseq counts the composition of user-specified words within the sequence. Use the `-opt` flag for further word specification.	`%` `chips` Codon usage statistics Input sequence(s): cam1.fasta Output file [cam1_1-429.chips]: `%` `cusp` Create a codon usage table Input sequence(s): cam1.fasta Output file [cam1_1-429.cusp]: `%` `compseq` Counts the composition of dimer/trimer/etc words in a sequence Input sequence(s): cam1.fasta Word size to consider (e.g. 2=dimer) [2]: Output file [cam1_1-429.composition]:
codonpreference	syco, wobble	syco identifies coding sequence from codon frequency bias information (Gribskov statistic). Further options for plot specification can be retrieved using the `-opt` flag. wobble plots a graph of the third "wobble" codon in a sequence. Use the `-opt` flag to alter the window size.	`%` `syco` Synonymous codon usage Gribskov statistic plot Input sequence: cam1.fasta Graph type [x11]: ps Created syco.ps `%` `wobble` Wobble base plot Input sequence: cam1.fasta Graph type [x11]: ps Output file [cam1_1-429.wobble]: Created wobble.ps
coilscan	pepcoil	Identifies coiled coil regions in a protein sequence (Lupas, van Dyke and Stock algorithm).	`%` `pepcoil` Predicts coiled coil regions Input sequence(s): calm_human Window size [28]: Output file [calm_human.pepcoil]:
compare	dottup, dotmatcher	Comparison of similar regions across two sequences displayed in graphical format. dottup is designed for identical matches, and dotmatcher for regions of similarity. Use the `-opt` flag to select matrix options.	`%` `dottup` Displays a wordmatch dotplot of two sequences Input sequence: cam1.fasta Second sequence: cam2.fasta Word size [10]: Graph type [x11]: ps Created dottup.ps `%` `dotmatcher` Displays a thresholded dotplot of two sequences Input sequence: cam1.fasta Second sequence: cam2.fasta Graph type [x11]: ps Created dotmatcher.ps
composition	compseq, pepstats	compseq counts the composition of user-specified words within the sequence. Use the `-opt` flag for further word specification. pepstats calculates peptide sequence composition.	`%` `compseq` Counts the composition of dimer/trimer/etc words in a sequence Input sequence(s): cam1.fasta Word size to consider (e.g. 2=dimer) [2]: Output file [cam1_1-429.composition]: `%` `pepstats` Protein statistics Input sequence(s): calm_human Output file [calm_human.pepstats]:
consensus	prophecy	Creates a matrix or profile from a multiple alignment.	`%` `prophecy` Creates matrices/profiles from multiple alignments Input sequence set: prot2.fasta Profile type F : Frequency G : Gribskov H : Henikoff Select type [F]: Enter a name for the profile [mymatrix]: Enter threshold reporting percentage [75]: Output file [prot2.prophecy]:
correspond	codcmp	Compares codon frequency matrices.	`%` `codcmp` Codon usage table comparison Codon usage file [Ehum.cut]: Second Codon usage file [Ehum.cut]: Eacc.cut Output file [outfile.codcmp]:
corrupt	msbar	Randomly mutates a sequence. Use the `-opt` flag to mutate in frame.	`%` `msbar` Mutate sequence beyond all recognition Input sequence(s): cam1.fasta Number of times to perform the mutation operations [1]: Point mutation operations 0 : None 1 : Any of the following 2 : Insertions 3 : Deletions 4 : Changes 5 : Duplications 6 : Moves Types of point mutations to perform [0]: Block mutation operations 0 : None 1 : Any of the following 2 : Insertions 3 : Deletions 4 : Changes 5 : Duplications 6 : Moves Types of block mutations to perform [0]: Codon mutation operations 0 : None 1 : Any of the following 2 : Insertions 3 : Deletions 4 : Changes 5 : Duplications 6 : Moves Types of codon mutations to perform [0]: Output sequence [cam1_1-429.fasta]:
dataset	dbiblast, dbigcg, dbifasta, dbiflat	Indexes the relevant database for use with EMBOSS.
distances	No direct equivalent.	See the PHYLIP package.
diverge	No direct equivalent.	See the PHYLIP package.
dotplot	dottup, dotmatcher
extractpeptide	transeq	Translates a nucleotide sequence into protein. Use the `-opt` flag to specify information on the region, frame and genetic code.	`%` `transeq` Translate nucleic acid sequences Input sequence(s): cam1.fasta Output sequence [cam1_1-429.pep]:
fetch	seqret, seqretsplit	seqret retrieves sequences from a database using the EMBOSS uniform sequence address. It can also be used with an input file to alter its format. seqretsplit splits a multi-sequence file into individual files, each containing a single sequence. Use the `-opt` flag to retrieve only the first sequence in a file.	`%` `seqretsplit` Reads and writes (returns) sequences in individual files Input sequence(s): prot2.fasta Output sequence [calm_human.fasta]:
findpatterns	fuzznuc, fuzzpro	Fuzzy search of a pattern against a sequence or selection of sequences. Search allows mismatches. fuzznuc searches nucleotide and fuzzpro protein sequences.	`%` `fuzznuc` Nucleic acid pattern search Input sequence(s): cam1.fasta Search pattern: AGGT Number of mismatches [0]: 1 Output report [cam1_1-429.fuzznuc]: `%` `fuzzpro` Protein pattern search Input sequence(s): prot2.fasta Search pattern: PATTERN Number of mismatches [0]: 3 Output report [calm_human.fuzzpro]:
frames	plotorf, showorf	Plots or displays open reading frames. plotorf uses ATG as a start and TAA, TAG, TGA as stop codons and displays the results as a graphic. showorf writes out the results of a frame translation as text. Use the `-opt` flag for more options.	`%` `plotorf` Plot potential open reading frames Input sequence: cam1.fasta Graph type [x11]: ps Created plotorf.ps `%` `showorf` Pretty output of DNA translations Input sequence: cam1.fasta Select Frames To Translate 0 : None 1 : F1 2 : F2 3 : F3 4 : R1 5 : R2 6 : R3 Select one or more values [1,2,3,4,5,6]: Output file [cam1_1-429.showorf]:
fromEMBL, fromFasta, fromGenbank, fromIG, fromStaden, fromtrace.		All EMBOSS applications read and write a variety of file formats, so an individual conversion program is not necessary.
gap	stretcher, needle	Finds the best global alignment between two sequences. stretcher (Myers and Miller algorithm) provides a faster match and should be used for longer sequences. needle (Needleman-Wunsch algorithm) is more accurate and should be used for shorter sequences. Matrix options for stretcher are available using the `-opt` flag.	`%` `stretcher` Finds the best global alignment between two sequences Input sequence: cam1_long.fasta Second sequence: cam2_long.fasta Output alignment [cam1_1-429.stretcher]: `%` `needle` Needleman-Wunsch global alignment. Input sequence: cam1.fasta Second sequence(s): cam2.fasta Gap opening penalty [10.0]: Gap extension penalty [0.5]: Output alignment [cam1_1-429.needle]:
gapshow	plotcon	Plots the quality of alignment conservation across a sliding window. Use the `-opt` flag to alter the comparison matrix.	`%` `plotcon` Plots the quality of conservation of a sequence alignment Input sequence set: emma.aln Window size [4]: Graph type [x11]: ps Created plotcon.ps
getseq	newseq	Enter a short sequence into the program for use as an input file in other applications.	`%` `newseq` Type in a short new sequence. Name of the sequence: Test Description of the sequence: Test Protein Sequence Type of sequence N : Nucleic P : Protein Type of sequence [N]: P Output sequence [outfile.fasta]: Test.fasta Enter the sequence: wearethediddymenthediddymenthediddymen
growtree	No direct equivalent.	Use emma as the interface to ClustalW or the PHYLIP option.
helicalwheel	pepwheel	Plots a protein sequence as a helix. Use the `-opt` flag to specify the output display.	`%` `pepwheel` Shows protein sequences as helices Input sequence: calm_human Graph type [x11]: ps Created pepwheel.ps
hmmerAlign, hmmerBuild, hmmerCalibrate, hmmerFetch, hmmerIndex, hmmerPfam, hmmerSearch.	See the HMMERNEW programs.
hthscan	helixturnhelix	Searches for 22 residue helix turn helix motifs in a protein sequence (Dodd and Egan). Use the `-opt` flag to search using their 20 residue region and further specify calculation parameters.	`%` `helixturnhelix` Report nucleic acid binding motifs Input sequence(s): calm_human Output report [calm_human.hth]:
isoelectric	iep	Calculates the isoelectric point of a protein.	`%` `iep calm_human` Calculates the isoelectric point of a protein Output file [calm_human.iep]:
lookup	whichdb	Does not offer all the parameters that lookup does, but will find identifiers or accession numbers in a database, and optionally retrieve the sequence.	`%` `whichdb` Search all databases for an entry ID or Accession number: p62158 Output file [outfile.whichdb]:
map, mapplot, mapsort	restrict, remap, restover	Calculates restriction maps based on the entries in the REBASE restriction enzyme database. Displays peptide translation of open reading frame. remap is the most flexible of these applications. Use the `-opt` flag to force specific cutters.	`%` `restrict cam1.fasta` Finds restriction enzyme cleavage sites Minimum recognition site length [4]: Comma separated enzyme list [all]: Output report [cam1_1-429.restrict]: `%` `remap` Display a sequence with restriction cut sites, translation etc.. Input sequence(s): cam1.fasta Comma separated enzyme list [all]: Minimum recognition site length [4]: Output file [cam1_1-429.remap]: `%` `restover` Finds restriction enzymes that produce a specific overhang Input sequence(s): cam1.fasta Overlap sequence: overhang.fasta Output file [cam1_1-429.restover]:
melttemp	dan	Calculates the melting temperature of a DNA or RNA sequence (Breslauer and Baldino statistics). Use the `-opt` flag to further specify calculations.	`%` `dan` Calculates DNA RNA/DNA melting temperature Input sequence(s): cam1.fasta Enter window size [20]: Enter Shift Increment [1]: Enter DNA concentration (nM) [50.]: Enter salt concentration (mM) [50.]: Output report [cam1_1-429.dan]:
MEME	No direct equivalent	See the MEME applications.
moment	hmoment	Calculates the hydrophobic moment of protein. Use the `-opt` flag to specify the angle of rotation.	`%` `hmoment` Hydrophobic moment calculation Input sequence(s): calm_human Output file [calm_human.hmoment]:
motifs	patmatmotifs, pscan	patmatmotifs searches the PROSITE database for patterns. Use the `-opt` flag to specify patterns. pscan searches the PRINTS database for fingerprint motifs.	`%` `patmatmotifs` Search a PROSITE motif database with a protein sequence Input sequence: calm_human Output report [calm_human.patmatmotifs]: `%` `pscan` Scans proteins using PRINTS Input sequence(s): calm_human Minimum number of elements per fingerprint [2]: Maximum number of elements per fingerprint [20]: Output file [calm_human.pscan]:
names	infoseq	Describes sequence attributes such as name, length, GC content.	`%` `infoseq` Displays some simple information about sequences Input sequence(s): calm_human # USA Name Accession Type Length Description fasta::calm_human:CALM_HUMAN CALM_HUMAN P62158 P 148 Calmodulin (CaM).
nooverlap	diffseq	Finds differences between two sequences. Use the `-opt` flag to output the information in columns.	`%` `diffseq` Find differences between nearly identical sequences Input sequence: cam1.fasta Second sequence: cam2.fasta Word size [10]: Output report [cam1_1-429.diffseq]: Output features [CaM1_1-429.diffgff]: Second output features [CaM2.diffgff]:
pepdata	getorf, sixpack	Translates all six open reading frames. getorf displays selected translations. sixpack displays DNA sequence and peptide translation. Use the `-opt` flag for either program to specify the codon usage information.	`%` `getorf` Finds and extracts open reading frames (ORFs) Input sequence(s): cam1.fasta Output sequence [cam1_1-429.orf]: `%` `sixpack` Display a DNA sequence with frame translation and ORFs Input sequence: cam1.fasta Output file [cam1_1-429.sixpack]: Output sequence [cam1_1-429.fasta]:
pepplot	pepinfo, garnier	pepinfo displays biophysical properties of the protein sequence and plots hydrophobicity (Kyte and Doolittle, Sweet and Eisenberg, Eisenberg). Use the `-opt` flag to select parameters for the hydrophobicity plots. garnier displays a secondary structure plot (Garnier, Osguthorpe and Robson).	`%` `pepinfo` Plots simple amino acid properties in parallel Input sequence: calm_human Graph type [x11]: ps Output file [calm_human.pepinfo]: Created pepinfo.ps `%` `garnier` Predicts protein secondary structure Input sequence(s): calm_human Output report [calm_human.garnier]:
peptidemap	digest	Full or partial peptide digest of a protein sequence.	`%` `digest` Protein proteolytic enzyme or reagent cleavage digest Input sequence: calm_human Enzymes and Reagents 1 : Trypsin 2 : Lys-C 3 : Arg-C 4 : Asp-N 5 : V8-bicarb 6 : V8-phosph 7 : Chymotrypsin 8 : CNBr Select number [1]: Output report [calm_human.digest]:
peptidestructure, plotstructure	garnier	Displays secondary structure plot (Garnier, Osguthorpe and Robson).	`%` `garnier` Predicts protein secondary structure Input sequence(s): calm_human Output report [calm_human.garnier]:
pileup	emma	Wrapper to the ClustalW multiple sequence alignment program. Accepts all EMBOSS input formats.	`%` `emma` Multiple alignment program - interface to ClustalW program Input sequence(s): prot_all.fasta Output sequence [cam2.aln]: Dendogram output filename [cam2.dnd]: CLUSTAL W (1.83) Multiple Sequence Alignments Sequence type explicitly set to Protein Sequence format is Pearson Sequence 1: CaM2 148 aa Sequence 2: CaM3 148 aa Sequence 3: CaM1 148 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 93 Sequences (1:3) Aligned. Score: 100 Sequences (2:3) Aligned. Score: 97 Guide tree file created: [00002524C] Start of Multiple Alignment There are 2 groups Aligning... Group 1: Sequences: 2 Score:2070 Group 2: Sequences: 3 Score:1098 Alignment Score 1439 GCG-Alignment file created [00002524B]
plasmidmap	lindna, cirdna	Display of linear and circular DNA.
plotsimilarity	plotcon. See also gapshow.
pretty, prettybox	cons, prettyplot, showalign	cons calculates a consensus from a multiple alignment using specified parameters. prettyplot displays an alignment with specified colours and boxed in display. showalign displays the alignment in editable text format. Use the `-opt` flag for all three programs to set values.
prime	eprimer3	Allows selection of a variety of different primers under several conditions. Use the `-opt` flag to alter parameters.	`%` `eprimer3` Picks PCR primers and hybridization oligos Input sequence(s): cam3.fasta Output file [cam3.eprimer3]:
profilegap, profilemake	prophet, prophecy	prophecy creates matrices or profiles from multiple alignments. prophet reads in these files to create gapped alignment of proteins.	`%` `prophecy` Creates matrices/profiles from multiple alignments Input sequence set: emma.aln Profile type F : Frequency G : Gribskov H : Henikoff Select type [F]: Enter a name for the profile [mymatrix]: Enter threshold reporting percentage [75]: Output file [emma.prophecy]: `%` `prophet` Gapped alignment for profiles Input sequence(s): calm_human Profile or matrix file: emma.prophecy Gap opening coefficient [1.0]: Gap extension coefficient [1.0]: Output file [calm_human.prophet]:
profilescan	patmatdb	Uses a motif to search a protein sequence.	`%` `patmatdb` Search a protein sequence with a motif Input sequence(s): emma.aln Protein motif to search for: HATS Output report [cam2.patmatdb]:
profilesearch	profit	Scans a sequence or database with a matrix or profile. Uses the matrix file created by prophecy.	`%` `profit` Scan a sequence or database with a matrix or profile Profile or matrix file: emma.prophecy Input sequence(s): calm_human Output file [emma.profit]:
reformat	seqret	Reformatting files is redundant in EMBOSS as each application reads and writes a variety of different formats. However, if anything needs converting, seqret will do it.	`%` `seqret` Reads and writes (returns) sequences Input sequence(s): calm.gcg Output sequence [calm_human.fasta]:
repeat	equicktandem, etandem, einverted, palindrome	Searches for tandem repeats, inverted or palindromic sequences in a nucleotide input file.	`%` `equicktandem` Finds tandem repeats Input sequence: cam1.fasta Maximum repeat size [600]: Threshold score [20]: Output report [cam1_1-429.qtan]: `%` `etandem` Looks for tandem repeats in a nucleotide sequence Input sequence: cam1.fasta Minimum repeat size [10]: Maximum repeat size [10]: Output report [cam1_1-429.tan]: `%` `einverted` Finds DNA inverted repeats Input sequence: cam1.fasta Gap penalty [12]: Minimum score threshold [50]: Match score [3]: Mismatch score [-4]: Output file [cam1_1-429.inv]: `%` `palindrome` Looks for inverted repeats in a nucleotide sequence Input sequence(s): cam1.fasta Enter minimum length of palindrome [10]: Enter maximum length of palindrome [100]: Enter maximum gap between repeated regions [100]: Number of mismatches allowed [0]: Output file [cam1_1-429.pal]: Report overlapping matches [Y]:
replace	biosed, degapseq	biosed replaces specified characters in a text file. degapseq is specific for removing gaps.	`%` `biosed` Replace or delete sequence sections Input sequence(s): cam1.fasta Sequence section to match [N]: Replacement sequence section [A]: Output sequence [cam1_1-429.fasta]: `%` `degapseq` Removes gap characters from sequences Input sequence(s): cam1.fasta Output sequence [cam1_1-429.fasta]:
reverse	revseq	Reverses and complements a sequence. Almost any program in the suite can reverse and complement a sequence using the `-reverse` option. Alternatively the `[start:end:reverse]` syntax will accomplish the same task.	`%` `revseq` Reverse and complement a sequence Input sequence(s): cam1.fasta Output sequence [cam1_1-429.rev]:
sample	extractseq	Extracts specific regions from a sequence. Use the `-opt` flag to save them to a separate file.	`%` `extractseq` Extract regions from a sequence Input sequence: cam1.fasta Regions to extract (eg: 4-57,78-94) [1-429]: 1-25 Output sequence [cam1_1-429.fasta]:
seg	maskseq	Masks low complexity regions within a sequence. Use the `-opt` flag to select a region to mask.	`%` `maskfeat` Mask off features of a sequence. Input sequence(s): cam1.fasta Output sequence [cam1_1-429.fasta]:
shuffle	shuffleseq	Shuffles one or a set of sequences.	`%` `shuffleseq` Shuffles a set of sequences maintaining composition Input sequence(s): calm_human Output sequence [calm_human.fasta]:
spscan	sigcleave	Searches for signal sequences in proteins. Use the `-opt` flag to specify a prokaryotic sequence.	`%` `sigcleave` Reports protein signal cleavage sites Input sequence(s): calm_human Minimum weight [3.5]: Output report [calm_human.sig]:
stemloop	etandem, palindrome. See also repeat.
testcode	wobble
toFASTA, toPIR, toIG, toSTADEN	seqret. See also fromEMBL.
translate	transeq. See also extractpeptide.
window + statplot	freak	Calculates the base or residue frequency of a sequence. Use the `-opt` flag to select the window type for calculation of the plot.	`%` `freak` Residue/base frequency table or plot Input sequence(s): cam1.fasta Residue letters [gc]: Output file [cam1_1-429.freak]:
gcghelp	tfm	Stands for "the fine manual" and contains the individual program documentation. Type `tfm` followed by the program name.	`%` `tfm stretcher` Displays a program's help documentation manual
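
When migrating scripts from GCG, it can help to look up a GCG program name and get the EMBOSS program(s) from the table above. The following is a minimal sketch in POSIX shell that covers only a handful of rows from Table B.51. The function name `gcg2emboss` is hypothetical and is not part of EMBOSS or GCG.

```shell
#!/bin/sh
# gcg2emboss: hypothetical helper (NOT part of EMBOSS or GCG).
# Prints the EMBOSS program(s) that Table B.51 lists for a GCG program name.
# Only a few rows are included; extend the case statement as needed.
gcg2emboss() {
    case "$1" in
        assemble)                 echo "merger" ;;
        backtranslate)            echo "backtranseq" ;;
        bestfit)                  echo "water matcher" ;;
        gap)                      echo "stretcher needle" ;;
        pileup)                   echo "emma" ;;
        reformat)                 echo "seqret" ;;
        reverse)                  echo "revseq" ;;
        translate|extractpeptide) echo "transeq" ;;
        distances|diverge)        echo "No direct equivalent; see the PHYLIP package" ;;
        *)
            # Names not in this sketch are reported on stderr.
            echo "gcg2emboss: '$1' is not covered by this sketch" >&2
            return 1
            ;;
    esac
}
```

For example, `gcg2emboss bestfit` prints `water matcher`. A name that is not in the sketch gives a non-zero return status.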
