Appdoc:Geecee
Function
Calculate fractional GC content of nucleic acid sequences
Description
geecee calculates the fraction of G+C bases in the input nucleic acid sequence(s). For each input sequence, it counts the G and C bases and writes the result to the output file as a fraction (in the interval 0.0 to 1.0) of the length of the whole sequence.
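The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not geecee's actual source code: the function name `gc_fraction` is hypothetical, and the handling of ambiguity codes (ignored here) and of empty sequences (reported as 0.0 here) is an assumption.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in `sequence` (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq:
        # Assumption: an empty sequence is reported as 0.0.
        return 0.0
    # Count only unambiguous G and C; ambiguity codes are ignored (assumption).
    gc_count = seq.count("G") + seq.count("C")
    # Divide by the length of the whole sequence, as the description states.
    return gc_count / len(seq)


# First ten bases of the example entry used later on this page.
print(f"{gc_fraction('aagcttaaac'):.2f}")  # prints 0.30
```

Applied to the base composition of the example entry L46634 (455 C and 222 G out of 1272 bases), the same arithmetic gives 677 / 1272, which rounds to 0.53.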
Usage
Here is a sample session with geecee:
% geecee tembl:L46634
Calculate fractional GC content of nucleic acid sequences
Output file [l46634.geecee]:
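For use from a script, the same session can be run without prompts. The sketch below builds the command from the -sequence, -outfile and -auto qualifiers listed in the table of command line arguments; the helper names `build_geecee_command` and `run_geecee` are hypothetical, and the code assumes that geecee is installed and on the PATH.

```python
import subprocess
from typing import List


def build_geecee_command(usa: str, outfile: str) -> List[str]:
    """Build a non-interactive geecee command line for one input USA."""
    # -auto turns off prompts, so both parameters must be given explicitly.
    return ["geecee", "-sequence", usa, "-outfile", outfile, "-auto"]


def run_geecee(usa: str, outfile: str) -> None:
    """Run geecee; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_geecee_command(usa, outfile), check=True)


# Show the command that corresponds to the sample session.
print(" ".join(build_geecee_command("tembl:L46634", "l46634.geecee")))
```

Calling `run_geecee("tembl:L46634", "l46634.geecee")` would then execute the command, provided the example database tembl is configured.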
Command line arguments
| Qualifier | Type | Description | Allowed values | Default |
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||
| [-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| [-outfile] (Parameter 2) | outfile | Output file name | Output file | <*>.geecee |
| Additional (Optional) qualifiers | ||||
| (none) | ||||
| Advanced (Unprompted) qualifiers | ||||
| (none) | ||||
| Associated qualifiers | ||||
| "-sequence" associated seqall qualifiers | ||||
| -sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 |
| -send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 |
| -sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
| -sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
| -snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
| -sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N |
| -slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N |
| -supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N |
| -sformat1 -sformat_sequence | string | Input sequence format | Any string | |
| -sdbname1 -sdbname_sequence | string | Database name | Any string | |
| -sid1 -sid_sequence | string | Entryname | Any string | |
| -ufo1 -ufo_sequence | string | UFO features | Any string | |
| -fformat1 -fformat_sequence | string | Features format | Any string | |
| -fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
| "-outfile" associated outfile qualifiers | ||||
| -odirectory2 -odirectory_outfile | string | Output directory | Any string | |
| General qualifiers | ||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N |
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
| -help | boolean | Report command line options. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
| -warning | boolean | Report warnings | Boolean value Yes/No | Y |
| -error | boolean | Report errors | Boolean value Yes/No | Y |
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y |
Input file format
geecee reads any nucleic acid sequence USA.
Input example
'tembl:L46634' is a sequence entry in the example nucleic acid database 'tembl'
Database entry: tembl:L46634
ID L46634; SV 1; linear; genomic DNA; STD; VRL; 1272 BP.
XX
AC L46634; L46689;
XX
DT 06-NOV-1995 (Rel. 45, Created)
DT 04-MAR-2000 (Rel. 63, Last updated, Version 3)
XX
DE Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
XX
KW telomeric repeat.
XX
OS Human herpesvirus 7
OC Viruses; dsDNA viruses, no RNA stage; Herpesvirales; Herpesviridae;
OC Betaherpesvirinae; Roseolovirus.
XX
RN [1]
RP 1-1272
RX PUBMED; 7494318.
RA Secchiero P., Nicholas J., Deng H., Xiaopeng T., van Loon N., Ruvolo V.R.,
RA Berneman Z.N., Reitz M.S.Jr., Dewhurst S.;
RT "Identification of human telomeric repeat motifs at the genome termini of
RT human herpesvirus 7: structural analysis and heterogeneity";
RL J. Virol. 69(12):8041-8045(1995).
XX
FH Key Location/Qualifiers
FH
FT source 1..1272
FT /organism="Human herpesvirus 7"
FT /strain="JI"
FT /mol_type="genomic DNA"
FT /clone="ED132'1.2"
FT /db_xref="taxon:10372"
FT repeat_region 207..928
FT /note="long and complex repeat region composed of various
FT direct repeats, including TAACCC (TRS), degenerate copies
FT of TRS motifs and a 14-bp repeat, TAGGGCTGCGGCCC"
FT misc_signal 938..998
FT /note="pac2 motif"
FT misc_feature 1009
FT /note="right genome terminus (...ACA)"
XX
SQ Sequence 1272 BP; 346 A; 455 C; 222 G; 249 T; 0 other;
aagcttaaac tgaggtcaca cacgacttta attacggcaa cgcaacagct gtaagctgca 60
ggaaagatac gatcgtaagc aaatgtagtc ctacaatcaa gcgaggttgt agacgttacc 120
tacaatgaac tacacctcta agcataacct gtcgggcaca gtgagacacg cagccgtaaa 180
ttcaaaactc aacccaaacc gaagtctaag tctcacccta atcgtaacag taaccctaca 240
actctaatcc tagtccgtaa ccgtaacccc aatcctagcc cttagcccta accctagccc 300
taaccctagc tctaacctta gctctaactc tgaccctagg cctaacccta agcctaaccc 360
taaccgtagc tctaagttta accctaaccc taaccctaac catgaccctg accctaaccc 420
tagggctgcg gccctaaccc tagccctaac cctaacccta atcctaatcc tagccctaac 480
cctagggctg cggccctaac cctagcccta accctaaccc taaccctagg gctgcggccc 540
taaccctaac cctagggctg cggcccgaac cctaacccta accctaaccc taaccctagg 600
gctgcggccc taaccctaac cctagggctg cggccctaac cctaacccta gggctgcggc 660
ccgaacccta accctaaccc taaccctagg gctgcggccc taaccctaac cctagggctg 720
cggccctaac cctaacccta actctagggc tgcggcccta accctaaccc taaccctaac 780
cctagggctg cggcccgaac cctagcccta accctaaccc tgaccctgac cctaacccta 840
accctaaccc taaccctaac cctaacccta accctaaccc taaccctaac cctaacccta 900
accctaaccc taaccctaac cctaaccccg cccccactgg cagccaatgt cttgtaatgc 960
cttcaaggca ctttttctgc gagccgcgcg cagcactcag tgaaaaacaa gtttgtgcac 1020
gagaaagacg ctgccaaacc gcagctgcag catgaaggct gagtgcacaa ttttggcttt 1080
agtcccataa aggcgcggct tcccgtagag tagaaaaccg cagcgcggcg cacagagcga 1140
aggcagcggc tttcagactg tttgccaagc gcagtctgca tcttaccaat gatgatcgca 1200
agcaagaaaa atgttctttc ttagcatatg cgtggttaat cctgttgtgg tcatcactaa 1260
gttttcaagc tt 1272
//
Output file format
Output example
File: l46634.geecee
#Sequence   GC content
L46634        0.53
The first non-blank line is the title line. Subsequent lines consist of two columns of data.
- The first column is the name of the sequence.
- The second column is the fractional G+C content of the sequence (a value between 0.0 and 1.0, not a percentage).
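The layout described above is simple to read back in a script. The following is a minimal sketch, assuming that the title line starts with `#` and that the two columns are separated by whitespace, as in the example output; the function name `parse_geecee` is hypothetical.

```python
from typing import Dict


def parse_geecee(text: str) -> Dict[str, float]:
    """Map each sequence name to its fractional G+C content."""
    results: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' title line.
        if not line or line.startswith("#"):
            continue
        # First column: sequence name; second column: GC fraction.
        name, value = line.split()[:2]
        results[name] = float(value)
    return results


example = "#Sequence   GC content\nL46634        0.53\n"
print(parse_geecee(example))  # prints {'L46634': 0.53}
```

Multiply the value by 100 if a percentage is needed.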
Data files
None.
Notes
None.
References
None.
Warnings
None.
Diagnostic Error Messages
None.
Exit status
0 on successful completion.
Known bugs
None.
See also
| cpgplot | Identify and plot CpG islands in nucleotide sequence(s) |
| cpgreport | Identify and report CpG-rich regions in nucleotide sequence(s) |
| newcpgreport | Identify CpG islands in nucleotide sequence(s) |
| newcpgseek | Identify and report CpG-rich regions in nucleotide sequence(s) |
Author(s)
Richard Bruskiewich while he was at Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Please report all bugs to the EMBOSS bug team (emboss-bug (@) emboss.open-bio.org) not to the original author.
History
Completed 18th June 1999.
Target users
This program is intended to be used by everyone and everything, from naive users to embedded scripts.
Comments
None.

