[EMBOSS] index RefSeq for EMBOSS

David.Bauer at schering.de
Mon Apr 24 01:52:50 EDT 2006



You can also try the new indexing programs dbxflat and dbxfasta, which can
handle files larger than 2 GB.

Regards,
David.

emboss-bounces at lists.open-bio.org wrote on 21/04/2006 17:43:27:

> Hi,
>
> Yes, I also index RefSeq. I think the problem here is that dbiflat
> can only handle files smaller than 2 GB, so try splitting the
> files first.
>
> Best,
> Isabelle
>
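A rough sketch of the splitting step suggested above, in Python. It assumes the files are GenBank-format flat files in which every entry ends with a line containing only "//", and it takes the 2 GB figure from this thread rather than from the dbiflat documentation. The function name, the ".partNNN" naming scheme and the MAX_BYTES constant are made up for illustration.

```python
import os

# Limit as reported in this thread (assumption: 2**31 bytes); stay just below it.
MAX_BYTES = 2**31 - 1


def split_flatfile(path, max_bytes=MAX_BYTES, out_suffix=".gbff"):
    """Split a flat file into parts of at most max_bytes each.

    Cuts are made only after a record terminator line ("//"), so no
    entry is divided between two parts. Returns the part filenames.
    A single entry larger than max_bytes still ends up in one
    (oversized) part, because entries are never cut.
    """
    base, _ = os.path.splitext(path)
    parts = []
    out = None
    written = 0       # bytes written to the current part
    record = []       # lines of the entry being read
    record_size = 0

    def flush_record():
        nonlocal out, written, record, record_size
        if not record:
            return
        # Start a new part if none is open or this entry would overflow it.
        if out is None or (written and written + record_size > max_bytes):
            if out is not None:
                out.close()
            name = f"{base}.part{len(parts) + 1:03d}{out_suffix}"
            out = open(name, "wb")
            parts.append(name)
            written = 0
        out.writelines(record)
        written += record_size
        record = []
        record_size = 0

    with open(path, "rb") as src:
        for line in src:
            record.append(line)
            record_size += len(line)
            if line.rstrip() == b"//":
                flush_record()
    flush_record()  # trailing lines without a final "//"
    if out is not None:
        out.close()
    return parts
```

Because the parts keep the .gbff suffix, the original large file would have to be moved out of the database directory before indexing, otherwise a "*.gbff" wildcard matches both the original and its parts. Each entry is held in memory while it is read, which may be a concern for very large chromosome records.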
> -----Original Message-----
> From: emboss-bounces at lists.open-bio.org [mailto:emboss-bounces at lists.open-bio.org] On Behalf Of Olivier Friard
> Sent: Friday, April 21, 2006 17:00
> To: emboss at emboss.open-bio.org
> Subject: [EMBOSS] index RefSeq for EMBOSS
>
>
> Hi,
>
> I tried to index the RefSeq database:
>
> 1) I downloaded all the
> ftp://ftp.ncbi.nih.gov/refseq/release/complete/complete*.genomic.gbff.gz
> files (GenBank format)
>
> 2) Gunzipped them
>
> 3) Added the rs_dna entry to my .embossrc file
>
>
> DB rs_dna [
>     type: "N"
>     method: "emblcd"
>     format: "GB"
>     dir: "/home/users/friard/data/refseq_genomic/"
>     file: "*.gbff"
>     release: ""
>     comment: "RefSeq Genomic  (upd)"
>     indexdir: "/home/users/friard/data/refseq_genomic/"
> ]
>
>
> 4) Ran dbiflat with the following arguments (from the directory where the
> files are stored)
>
> dbiflat
> Index a flat file database
> Database name: rs_dna
>        EMBL : EMBL
>       SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
>          GB : Genbank, DDBJ
>      REFSEQ : Refseq
> Entry format [SWISS]: REFSEQ
> Database directory [.]:
> Wildcard database filename [*.dat]: *.gbff
> Release number [0.0]:
> Index date [00/00/00]:
>
> The indexes were created, but when I try to retrieve a sequence (e.g.
> seqret rs_rna:NC_000004) the result is not the correct sequence but
> another one with the NC_000004 ID!
>
>
>
> I also downloaded the files in FASTA format and tried to index them with
> the dbifasta command (format: ncbi), without success:
>
> seqret rs_dna:nc_000004
> Reads and writes (returns) sequences
> Error: Unable to read sequence 'rs_dna:nc_000004'
> Died: seqret terminated: Bad value for '-sequence' and no prompt
>
>
> Has anyone indexed RefSeq successfully?
> Thank you in advance
>
> --
>
> Olivier Friard
> Laboratorio di Biologia Computazionale
> Facoltà di Scienze MFN
> Università di Torino
> via Accademia Albertina 13, 10124 TORINO (Italy)
>
> tel. +39 011 6704689
>
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss