[EMBOSS] dbifasta index file format
pmr at ebi.ac.uk
Mon Apr 10 05:05:36 EDT 2006
Graziano P. wrote:
> hello EMBOSS users,
> I have some databases in fasta format (ncbi | format)
> and I want to index them using dbifasta, then I want
> to access the index files using a program that will be
> developed by a computer scientist of my group.
> I need to index the databases by accession number,
> ginumber and description. I have read in the dbifasta
> help info about the structure of the index files when
> the databases were indexed by accession number, but I
> have not found info about the structure of the index
> files when the databases are indexed by description.
> Anyone knows where I can find detailed information
> about the structure of the index files?
The dbifasta index files use the same format as the Staden package, the old
EMBL CD-ROM distribution, and Erik Sonnhammer's "efetch" utility.
They were documented in some old Staden documentation and papers.
They are also documented in the EMBOSS distribution under doc/manuals/ in the
file internals-indexing.txt (see attached). I see that this document was
written before we indexed the descriptions!
The description (title) indexing works the same way as the accession number
indexing; only the file names differ. The files are called des.hit and
des.trg. dbifasta has a -maxindex option to limit the length of the longest
word indexed (the index files store a value for the maximum record length).
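
For whoever writes the reader program, here is a minimal Python sketch of
parsing a .trg file such as des.trg. It is not taken from the EMBOSS sources.
It assumes the layout used by the old EMBL CD-ROM index files: a 300-byte
header (file size, record count, record length, database name, release, date,
then padding), followed by fixed-length records of hit count, first hit
number and the indexed word, all little-endian. The offsets, the byte order
and the function names parse_header and parse_trg_records are assumptions for
illustration, so please check them against internals-indexing.txt first.

```python
import struct

HEADER_SIZE = 300  # assumed fixed header length (EMBL CD-ROM style)


def parse_header(buf):
    """Decode the assumed 300-byte header at the start of an index file."""
    # Assumed: 4-byte file size, 4-byte record count, 2-byte record length,
    # little-endian with no padding between the fields.
    filesize, nrecords, reclen = struct.unpack_from("<IIH", buf, 0)
    # Assumed: 20-byte database name and 10-byte release string follow.
    dbname = buf[10:30].rstrip(b"\0 ").decode("ascii", "replace")
    release = buf[30:40].rstrip(b"\0 ").decode("ascii", "replace")
    return {
        "filesize": filesize,
        "nrecords": nrecords,
        "reclen": reclen,
        "dbname": dbname,
        "release": release,
    }


def parse_trg_records(buf):
    """Yield (word, nhits, first_hit) for each record of a .trg file.

    Assumed record layout: 4-byte hit count, 4-byte number of the first
    matching record in the .hit file, then the indexed word padded to
    fill the rest of the record.
    """
    hdr = parse_header(buf)
    word_len = hdr["reclen"] - 8  # record length minus the two integers
    fmt = "<II%ds" % word_len
    for i in range(hdr["nrecords"]):
        offset = HEADER_SIZE + i * hdr["reclen"]
        nhits, first_hit, word = struct.unpack_from(fmt, buf, offset)
        yield word.rstrip(b"\0 ").decode("ascii", "replace"), nhits, first_hit
```

To try it, read the file as bytes, for example
data = open("des.trg", "rb").read(), and loop over parse_trg_records(data).
The record length is taken from the header rather than hard-coded, so the
sketch should not care what value was used for -maxindex.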
We also have a script in the distribution, scripts/dbilist.pl, which can list
the contents of the description index (in the database index directory, run it
as dbilist.pl des).
The new dbxfasta index files are very different. For very large databases we
recommend dbxfasta. For smaller databases dbifasta is fine and we will
continue to support it.
Hope that helps. If you need more details, just ask.