dbigcg
A GCG-format database consists of *.seq and *.ref files. The data in the *.seq files is often compressed.
The index-file format that dbigcg produces is also used by the software on the EMBL database CD-ROM distribution and by the Staden package, in addition to EMBOSS. It appears to be the most widely used publicly available index-file format for these databases.
    % dbigcg
    Index a GCG formatted database
    Database name: EMBL
          EMBL : EMBL
         SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
       GENBANK : Genbank, DDBJ
           PIR : NBRF
    Entry format [EMBL]: EMBL
    Database directory [.]: embl
    Wildcard database filename [*.seq]:
    Release number [0.0]:
    Index date [00/00/00]:
    General log output file [outfile.dbigcg]:
    Index a GCG formatted database
    Version: EMBOSS:6.4.0.0

       Standard (Mandatory) qualifiers:
      [-dbname]            string     Database name (Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/)
       -idformat           menu       [EMBL] Entry format (Values: EMBL (EMBL); SWISS (Swiss-Prot, SpTrEMBL, TrEMBLnew); GENBANK (Genbank, DDBJ); PIR (NBRF))
       -directory          directory  [.] Database directory
       -filenames          string     [*.seq] Wildcard database filename (Any string)
       -release            string     [0.0] Release number (Any string up to 9 characters)
       -date               string     [00/00/00] Index date (Date string dd/mm/yy)
       -outfile            outfile    [*.dbigcg] General log output file

       Additional (Optional) qualifiers: (none)

       Advanced (Unprompted) qualifiers:
       -fields             menu       [acc] Index fields (Values: acc (acnum accession number index); sv (seqvn sequence version and gi number index); des (des description index); key (keyword keywords index); org (taxon taxonomy and organism index))
       -exclude            string     Wildcard filename(s) to exclude (Any string)
       -maxindex           integer    [0] Maximum index length (Integer 0 or more)
       -sortoptions        string     [-T . -k 1,1] Sort options, typically '-T .' to use current directory for work files and '-k 1,1' to force GNU sort to use the first field (Any string)
       -[no]systemsort     boolean    [Y] Use system sort utility
       -[no]cleanup        boolean    [Y] Clean up temporary files
       -indexoutdir        outdir     [.] Index file output directory

       Associated qualifiers:

       "-directory" associated qualifiers
       -extension          string     Default file extension

       "-outfile" associated qualifiers
       -odirectory         string     Output directory

       "-indexoutdir" associated qualifiers
       -extension          string     Default file extension

       General qualifiers:
       -auto               boolean    Turn off prompts
       -stdout             boolean    Write first file to standard output
       -filter             boolean    Read first file from standard input, write first file to standard output
       -options            boolean    Prompt for standard and additional values
       -debug              boolean    Write debug output to program.dbg
       -verbose            boolean    Report some/full command line options
       -help               boolean    Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose
       -warning            boolean    Report warnings
       -error              boolean    Report errors
       -fatal              boolean    Report fatal errors
       -die                boolean    Report dying program messages
       -version            boolean    Report version number and exit
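The interactive session can also be run without prompts by giving the qualifiers on the command line together with -auto. The following is a minimal sketch of assembling such a command line in Python; the helper name `build_dbigcg_command` and its defaults are illustrative assumptions, and only qualifiers listed in the help text above are used.

```python
import shlex


def build_dbigcg_command(dbname, directory, idformat="EMBL",
                         filenames="*.seq", release="0.0",
                         date="00/00/00"):
    """Return an argv list for a prompt-free dbigcg run.

    Hypothetical helper: it only assembles qualifiers documented in the
    dbigcg help text and does not run the program itself.
    """
    return [
        "dbigcg",
        "-dbname", dbname,
        "-idformat", idformat,
        "-directory", directory,
        "-filenames", filenames,
        "-release", release,
        "-date", date,
        "-auto",  # turn off prompts
    ]


cmd = build_dbigcg_command("EMBL", "embl")
# shlex.join quotes the wildcard so a shell would not expand it.
print(shlex.join(cmd))
```

On a machine with EMBOSS installed, the list could be handed to `subprocess.run(cmd, check=True)`. Passing it as a list rather than a shell string leaves the `*.seq` wildcard for dbigcg to interpret.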
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | | | | |
[-dbname] (Parameter 1) | string | Database name | Any string from 2 to 19 characters, matching regular expression /[A-z][A-z0-9_]+/ | Required |
-idformat | list | Entry format | EMBL (EMBL); SWISS (Swiss-Prot, SpTrEMBL, TrEMBLnew); GENBANK (Genbank, DDBJ); PIR (NBRF) | EMBL |
-directory | directory | Database directory | Directory | . |
-filenames | string | Wildcard database filename | Any string | *.seq |
-release | string | Release number | Any string up to 9 characters | 0.0 |
-date | string | Index date | Date string dd/mm/yy | 00/00/00 |
-outfile | outfile | General log output file | Output file | <*>.dbigcg |
Additional (Optional) qualifiers | | | | |
(none) | | | | |
Advanced (Unprompted) qualifiers | | | | |
-fields | list | Index fields | acc (acnum accession number index); sv (seqvn sequence version and gi number index); des (des description index); key (keyword keywords index); org (taxon taxonomy and organism index) | acc |
-exclude | string | Wildcard filename(s) to exclude | Any string | |
-maxindex | integer | Maximum index length | Integer 0 or more | 0 |
-sortoptions | string | Sort options, typically '-T .' to use current directory for work files and '-k 1,1' to force GNU sort to use the first field | Any string | -T . -k 1,1 |
-[no]systemsort | boolean | Use system sort utility | Boolean value Yes/No | Yes |
-[no]cleanup | boolean | Clean up temporary files | Boolean value Yes/No | Yes |
-indexoutdir | outdir | Index file output directory | Output directory | . |
Associated qualifiers | | | | |
"-directory" associated directory qualifiers | | | | |
-extension | string | Default file extension | Any string | |
"-outfile" associated outfile qualifiers | | | | |
-odirectory | string | Output directory | Any string | |
"-indexoutdir" associated outdir qualifiers | | | | |
-extension | string | Default file extension | Any string | |
General qualifiers | | | | |
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
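The constraints listed for -dbname, -release and -date can be checked before dbigcg is called. The sketch below is illustrative only: the function names are hypothetical, and it assumes the regular expression must match the whole name, which the table does not state explicitly.

```python
import re

# Pattern as quoted in the qualifier table. Note that the range A-z also
# covers the ASCII characters between 'Z' and 'a' ([ \ ] ^ _ `), so it is
# broader than "letters only".
DBNAME_RE = re.compile(r"[A-z][A-z0-9_]+")
# dd/mm/yy as two-digit fields; this checks the shape, not calendar validity.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{2}")


def check_dbname(name):
    """True if name has 2 to 19 characters and fully matches the pattern
    (full-match anchoring is an assumption)."""
    return 2 <= len(name) <= 19 and DBNAME_RE.fullmatch(name) is not None


def check_release(release):
    """True if the release string has at most 9 characters."""
    return len(release) <= 9


def check_date(date):
    """True if the date looks like dd/mm/yy."""
    return DATE_RE.fullmatch(date) is not None


for value in ("EMBL", "E", "1embl"):
    print(value, check_dbname(value))
```

A stricter check could replace `[A-z]` with `[A-Za-z]`, but that would no longer mirror the pattern given in the table.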
This file contains non-printing characters and so cannot be displayed here.
    ########################################
    # Program: dbigcg
    # Rundate: Fri 15 Jul 2011 12:00:00
    # Dbname: EMBL
    # Release: 0.0
    # Date: 15/07/11
    # CurrentDirectory: /homes/user/test/qa/dbigcg-ex-keep/
    # IndexDirectory: ./
    # IndexDirectoryPath: /homes/user/test/qa/dbigcg-ex-keep/
    # Maxindex: 0
    # Fields: 2
    #   Field 1: id
    #   Field 2: acc
    # Directory: /homes/user/test/embl/
    # DirectoryPath: /homes/user/test/embl/
    # Filenames: *.seq
    # Exclude:
    # Files: 9
    #   File 1: /homes/user/test/embl/eem_ba1.seq
    #   File 2: /homes/user/test/embl/eem_est.seq
    #   File 3: /homes/user/test/embl/eem_fun.seq
    #   File 4: /homes/user/test/embl/eem_htginv1.seq
    #   File 5: /homes/user/test/embl/eem_hum1.seq
    #   File 6: /homes/user/test/embl/eem_in.seq
    #   File 7: /homes/user/test/embl/eem_ov.seq
    #   File 8: /homes/user/test/embl/eem_ro.seq
    #   File 9: /homes/user/test/embl/eem_vi.seq
    ########################################
    # Commandline: dbigcg
    #    -dbname EMBL
    #    -idformat EMBL
    #    -directory ../../embl
    ########################################

    filename: '/homes/user/test/embl/eem_ba1.seq' id: 10 acc: 14
    filename: '/homes/user/test/embl/eem_est.seq' id: 1 acc: 1
    filename: '/homes/user/test/embl/eem_fun.seq' id: 1 acc: 1
    filename: '/homes/user/test/embl/eem_htginv1.seq' id: 5 acc: 5
    filename: '/homes/user/test/embl/eem_hum1.seq' id: 15 acc: 18
    filename: '/homes/user/test/embl/eem_in.seq' id: 2 acc: 2
    filename: '/homes/user/test/embl/eem_ov.seq' id: 2 acc: 2
    filename: '/homes/user/test/embl/eem_ro.seq' id: 3 acc: 3
    filename: '/homes/user/test/embl/eem_vi.seq' id: 1 acc: 2

    Index acc: maxlen 8 items 48

    Total 9 files 40 entries (0 duplicates)
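The per-file counts in the log can be cross-checked against its summary lines. The following is a rough sketch that parses the records from the example log; the regular expressions are assumptions based on this one example rather than a documented log grammar, and they tolerate any whitespace between fields.

```python
import re

# Records copied from the example log (paths shortened to file names).
SAMPLE_LOG = """
filename: 'eem_ba1.seq' id: 10 acc: 14
filename: 'eem_est.seq' id: 1 acc: 1
filename: 'eem_fun.seq' id: 1 acc: 1
filename: 'eem_htginv1.seq' id: 5 acc: 5
filename: 'eem_hum1.seq' id: 15 acc: 18
filename: 'eem_in.seq' id: 2 acc: 2
filename: 'eem_ov.seq' id: 2 acc: 2
filename: 'eem_ro.seq' id: 3 acc: 3
filename: 'eem_vi.seq' id: 1 acc: 2
Index acc: maxlen 8 items 48
Total 9 files 40 entries (0 duplicates)
"""

RECORD_RE = re.compile(r"filename:\s+'([^']+)'\s+id:\s+(\d+)\s+acc:\s+(\d+)")
INDEX_RE = re.compile(r"Index\s+acc:\s+maxlen\s+(\d+)\s+items\s+(\d+)")
TOTAL_RE = re.compile(r"Total\s+(\d+)\s+files\s+(\d+)\s+entries")


def summarize(log_text):
    """Sum the per-file counts and collect the totals the log reports."""
    records = [(m.group(1), int(m.group(2)), int(m.group(3)))
               for m in RECORD_RE.finditer(log_text)]
    index = INDEX_RE.search(log_text)
    total = TOTAL_RE.search(log_text)
    return {
        "files": len(records),
        "ids": sum(ids for _, ids, _ in records),
        "accs": sum(accs for _, _, accs in records),
        "reported_files": int(total.group(1)),
        "reported_entries": int(total.group(2)),
        "reported_acc_items": int(index.group(2)),
    }


summary = summarize(SAMPLE_LOG)
print(summary)
```

In this example the summed id counts equal the reported number of entries and the summed acc counts equal the reported number of index items. Whether that holds for every run, for instance when duplicates are present, is not stated in the log.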
dbigcg creates four index files. All are binary but with a simple format.
Having created the EMBOSS indices for this file, a database can then be defined in the file emboss.defaults as something like:
    DB embl [
       type: N
       format: embl
       method: gcg
       directory: /data/gcg/gcgembl
    ]
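When several indexed databases need entries, a definition of this shape can be generated rather than typed by hand. This is a small sketch only; the function name `db_definition` is hypothetical, and the attribute names and values are taken from the example definition above.

```python
def db_definition(name, directory, seq_type="N",
                  entry_format="embl", method="gcg"):
    """Return a DB definition block shaped like the example for
    emboss.defaults (hypothetical helper; attributes mirror the example)."""
    attributes = [
        ("type", seq_type),
        ("format", entry_format),
        ("method", method),
        ("directory", directory),
    ]
    body = "\n".join(f"   {key}: {value}" for key, value in attributes)
    return f"DB {name} [\n{body}\n]\n"


print(db_definition("embl", "/data/gcg/gcgembl"))
```

The returned text could be appended to emboss.defaults. The directory should be the one that holds the index files written by dbigcg.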
Program name | Description |
---|---|
dbiblast | Index a BLAST database |
dbifasta | Index a fasta file database |
dbiflat | Index a flat file database |
dbxcompress | Compress an uncompressed dbx index |
dbxedam | Index the EDAM ontology using b+tree indices |
dbxfasta | Index a fasta file database using b+tree indices |
dbxflat | Index a flat file database using b+tree indices |
dbxgcg | Index a GCG formatted database using b+tree indices |
dbxobo | Index an obo ontology using b+tree indices |
dbxreport | Validate index and report internals for dbx databases |
dbxresource | Index a data resource catalogue using b+tree indices |
dbxstat | Dump statistics for dbx databases |
dbxtax | Index NCBI taxonomy using b+tree indices |
dbxuncompress | Uncompress a compressed dbx index |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org), not to the original author.