File format produced by ABI sequencing machines. It contains the 'trace data', which includes the probabilities of the four nucleotide bases along the sequencing run, together with the sequence deduced from that data. The sequence information is what is normally read in and used by EMBOSS programs, although the trace data is available and may be utilised by some specialised EMBOSS programs. The code for parsing this format is heavily based on David Mathog's Fortran library, which is bundled with a description of the ABI trace file format (abi.txt):
ftp://saf.bio.caltech.edu/pub/software/molbio/abitools.zip
ABI trace is a binary format so an example is not given here.
File format used by the AceDB database. AceDB is a genome database designed for the flexible handling of bioinformatic data. It includes tools designed to manipulate genomic data, but is increasingly also used for non-biological data. For further information see:
http://www.acedb.org/
DNA : "HSFAU"
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc
gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt
gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg
agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg
gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct
ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag
aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg
ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca
actcttaagtcttttgtaattctggctttctctaataaaaaagccactta
gttcagtcaaaaaaaaaa
ASN.1, or Abstract Syntax Notation One, is an International Organization for Standardization (ISO) format used to achieve interoperability between platforms. ASN.1 is used for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, and MEDLINE records. The EMBOSS format ASN1 is a subset of ASN.1 containing the entry name, accession number, description and sequence. It is similar to the current ASN.1 output of the readseq application. For further information see:
http://www.ncbi.nlm.nih.gov/Sitemap/Summary/asn1.html
seq { id { local id 1 }, descr { title "" }, inst { repr raw, mol aa, length 131, topology linear, { seq-data iupacaa "TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCT TSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE" } } , seq { id { local id 1 }, descr { title "" }, inst { repr raw, mol aa, length 131, topology linear, { seq-data iupacaa "TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCT TSTSTRHRGRSGW----------RASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE" } } , seq { id { local id 1 }, descr { title "" }, inst { repr raw, mol aa, length 131, topology linear, { seq-data iupacaa "TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQATGGWKTCSGTCT TSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--GSRPPRFAPPLMSSCITSTTGPPPPAGDRSHE" } } , seq { id { local id 1 }, descr { title "" }, inst { repr raw, mol aa, length 131, topology linear, { seq-data iupacaa "TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQATGGYKTCSGTCT TSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--GSRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE" } } ,
This is not a file format as such. It is a method used by EMBOSS as an alternative to specifying a filename or other sequence reference on the command line. The sequence is used directly (i.e. as is), for example:
asis::AALRNTY
The method is part of the Uniform Sequence Address (USA) specification (Section 6.6, “The Uniform Sequence Address (USA)”) used to specify sequences on the command line.
asis::MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Clustal multiple sequence alignment format. This is also supported for input and output of non-aligned sequences. ClustalW is a widely used program for multiple sequence alignment. For further information see:
http://www.clustal.org/
CLUSTAL W(1.4) multiple sequence alignment

IXI_234    TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235    TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_236    TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT
IXI_237    TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT

IXI_234    GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235    GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
IXI_236    GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G
IXI_237    GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G

IXI_234    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235    SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236    SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237    SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
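An interleaved Clustal alignment can be read back into whole sequences by concatenating the fragments that share a name. The following Python sketch is illustrative only and is not EMBOSS code. The function name `parse_clustal` is hypothetical, and the sketch assumes a first line starting with CLUSTAL, blank lines between blocks, and whitespace separating each name from its sequence fragment.

```python
def parse_clustal(text):
    """Return {name: aligned sequence} from Clustal-formatted text.

    Hypothetical helper for illustration, not part of EMBOSS.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    sequences = {}
    for line in lines[1:]:
        # Skip blank lines and any line that starts with white space
        # (assumed here to be a conservation line, not a sequence line).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, fragment = fields[0], fields[1]
        # Fragments for the same name are concatenated in file order.
        sequences[name] = sequences.get(name, "") + fragment
    return sequences


# First two sequences and first two blocks of the example alignment.
example = """CLUSTAL W(1.4) multiple sequence alignment

IXI_234    TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235    TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT

IXI_234    GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235    GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
"""

alignment = parse_clustal(example)
for name, sequence in alignment.items():
    print(name, len(sequence))
```

Gap characters are kept, so every sequence in the returned dictionary has the same aligned length.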
CODATA format used by various information systems and software tools at the Lawrence Berkeley National Laboratory (LBNL). For further information see:
http://merrill.olm.net/mdocs/seedis/codata.html
Sequence files in CODATA format may contain multiple sequences. A sequence entry begins with a line with the text ENTRY followed by the sequence ID. A line with SEQUENCE followed by a sequence numbering line marks the start of the sequence, given on the next and subsequent lines. A sequence entry ends with a line containing '///' only.
ENTRY IXI_234 SEQUENCE 5 10 15 20 25 30 1 T S P A S I R P P A G P S S R P A M V S S R R T R P S P P G 31 P R R P T G R P C C S A A P R R P Q A T G G W K T C S G T C 61 T T S T S T R H R G R S G W S A R T T T A A C L R A S R K S 91 M R A A C S R S A G S R P N R F A P T L M S S C I T S T T G 121 P P A W A G D R S H E /// ENTRY IXI_235 SEQUENCE 5 10 15 20 25 30 1 T S P A S I R P P A G P S S R - - - - - - - - - R P S P P G 31 P R R P T G R P C C S A A P R R P Q A T G G W K T C S G T C 61 T T S T S T R H R G R S G W - - - - - - - - - - R A S R K S 91 M R A A C S R S A G S R P N R F A P T L M S S C I T S T T G 121 P P A W A G D R S H E /// ENTRY IXI_236 SEQUENCE 5 10 15 20 25 30 1 T S P A S I R P P A G P S S R P A M V S S R - - R P S P P P 31 P R R P P G R P C C S A A P P R P Q A T G G W K T C S G T C 61 T T S T S T R H R G R S G W S A R T T T A A C L R A S R K S 91 M R A A C S R - - G S R P P R F A P P L M S S C I T S T T G 121 P P P P A G D R S H E /// ENTRY IXI_237 SEQUENCE 5 10 15 20 25 30 1 T S P A S L R P P A G P S S R P A M V S S R R - R P S P P G 31 P R R P T - - - - C S A A P R R P Q A T G G Y K T C S G T C 61 T T S T S T R H R G R S G Y S A R T T T A A C L R A S R K S 91 M R A A C S R - - G S R P N R F A P T L M S S C L T S T T G 121 P P A Y A G D R S H E ///
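The entry layout described for CODATA (an ENTRY line, a SEQUENCE line, a numbering line, sequence lines and a closing '///') lends itself to a small state machine. The Python sketch below is a minimal illustration under those assumptions, not the EMBOSS parser. The name `parse_codata` is hypothetical, the sample is shortened, and its column spacing is a guess.

```python
def parse_codata(text):
    """Return a list of (sequence_id, sequence) tuples from CODATA text.

    Hypothetical helper for illustration, not part of EMBOSS.
    """
    entries = []
    seq_id = None
    residues = []
    state = "header"  # one of: header, ruler, sequence
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ENTRY"):
            fields = stripped.split()
            seq_id = fields[1] if len(fields) > 1 else ""
            residues = []
            state = "header"
        elif stripped.startswith("SEQUENCE"):
            state = "ruler"  # the next line is the numbering line
        elif stripped == "///":
            if seq_id is not None:
                entries.append((seq_id, "".join(residues)))
            seq_id = None
            state = "header"
        elif state == "ruler":
            state = "sequence"  # skip the numbering line itself
        elif state == "sequence":
            # Drop the leading position number; keep residues and gaps.
            residues.extend(t for t in stripped.split() if not t.isdigit())
    return entries


# Shortened sample: 10 residues per line instead of 30.
example = """ENTRY            IXI_234
SEQUENCE
                5        10
      1 T S P A S I R P P A
     11 G P S S R - - - - -
///
"""

entries = parse_codata(example)
print(entries)
```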
DAS format is for output only. It conforms to the output of a Distributed Annotation System (DAS) version 1.53 annotation server. DAS is an XML format for sequence data. For more information see:
http://www.biodas.org/
<?xml version="1.0" standalone="no"?> <!DOCTYPE DASSEQUENCE SYSTEM "http://www.biodas.org/dtd/dassequence.dtd"> <DASSEQUENCE> <SEQUENCE id="X65923" start="1" stop="518" moltype="DNA" version="X65923.1"> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca actcttaagtcttttgtaattctggctttctctaataaaaaagccactta gttcagtcaaaaaaaaaa </SEQUENCE> </DASSEQUENCE>
DASDNA format is for output only. It conforms to the output of a Distributed Annotation System (DAS) version 1.53 sequence server. DASDNA is an XML format for sequence data. For more information see:
http://www.biodas.org/
<?xml version="1.0" standalone="no"?> <!DOCTYPE DASDNA SYSTEM "http://www.biodas.org/dtd/dasdna.dtd"> <DASDNA> <SEQUENCE id="X65923" start="1" stop="518" version="X65923.1"> <DNA length="518"> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca actcttaagtcttttgtaattctggctttctctaataaaaaagccactta gttcagtcaaaaaaaaaa </DNA> </SEQUENCE> </DASDNA>
Debug format is for debugging purposes only. All elements in the internal data structures used to hold a sequence are printed out, although not all fields in the output file will contain data. The data generated depends very much on the input format used. No example is given below. For further information see the EMBOSS Developers Guide.
EMBL entry format, or at least a minimal subset of the fields. EMBL is the public nucleotide sequence database and includes all publicly available DNA sequences with their annotation. EMBL is part of the International Nucleotide Sequence Database Collaboration, which comprises the European Molecular Biology Laboratory (EMBL), the DNA DataBank of Japan (DDBJ), and GenBank at the National Center for Biotechnology Information. Daily sharing of data is the basis of the collaboration. The Staden package and many others use EMBL or similar formats for sequence data.
An EMBL entry includes a code to identify the sequence, the sequence itself and a table of features of biological interest such as coding regions with their protein translations, repeats and functional sites. Bibliographic information such as literature references, experimental details, author contact information and cross-links to other databases is also included. For further information see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
Where EMBL format is used for input, the EMBL format will be kept in step with the latest version of the database, although data in older EMBL formats will still be parsable.
Where EMBL format is used for output, fields for which data are available will be completed and others with no information are omitted. Exactly what data will be present depends very much on the source of input sequences. The EMBOSS command line allows data, such as accession numbers, to be provided if they do not form part of the input sequence data (see Section 6.4, “Datatype-specific Command Line Qualifiers”).
ID X65923; SV 1; linear; mRNA; STD; HUM; 518 BP. XX AC X65923; XX DT 13-MAY-1992 (Rel. 31, Created) DT 18-APR-2005 (Rel. 83, Last updated, Version 11) XX DE H.sapiens fau mRNA XX KW fau gene. XX OS Homo sapiens (human) OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; OC Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; OC Homo. XX RN [1] RP 1-518 RA Michiels L.M.R.; RT ; RL Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases. RL L.M.R. Michiels, University of Antwerp, Dept of Biochemistry, RL Universiteisplein 1, 2610 Wilrijk, BELGIUM XX RN [2] RP 1-518 RX PUBMED; 8395683. RA Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.; RT "fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as RT an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus"; RL Oncogene 8(9):2537-2546(1993). XX DR H-InvDB; HIT000322806. XX FH Key Location/Qualifiers FH FT source 1..518 FT /organism="Homo sapiens" FT /chromosome="11q" FT /map="13" FT /mol_type="mRNA" FT /clone_lib="cDNA" FT /clone="pUIA 631" FT /tissue_type="placenta" FT /db_xref="taxon:9606" FT misc_feature 57..278 FT /note="ubiquitin like part" FT CDS 57..458 FT /gene="fau" FT /db_xref="GDB:135476" FT /db_xref="GOA:P62861" FT /db_xref="HGNC:3597" FT /db_xref="HSSP:1GJZ" FT /db_xref="InterPro:IPR006846" FT /db_xref="UniProtKB/Swiss-Prot:P35544" FT /protein_id="CAA46716.1" FT /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG FT APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG FT RAKRRMQYNRRFVNVVPTFGKKKGPNANS" FT misc_feature 98..102 FT /note="nucleolar localization signal" FT misc_feature 279..458 FT /note="S30 part" FT polyA_signal 484..489 FT polyA_site 509 XX SQ Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other; ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc 60 agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg 120 cccagatcaa ggctcatgta gcctcactgg 
agggcattgc cccggaagat caagtcgtgc 180 tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc 240 tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc 300 gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga 360 agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca 420 cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc 480 tctaataaaa aagccactta gttcagtcaa aaaaaaaa 518 //
The format used by the latest version of the Staden package. Staden is a package of programs for sequence handling and analysis that is particularly useful for analysis of sequence trace data and large-scale sequence assembly projects. For further information see:
http://staden.sourceforge.net/
ID xb63c7.s1 EN xb63c7.s1 LN xb63c7.s1.ztr LT ZTR QR 440 AQ 29.960000 AV 42 42 42 42 42 42 42 25 27 31 34 33 38 39 40 40 40 38 37 37 37 AV 38 38 38 38 37 37 38 38 38 38 37 38 38 39 39 39 38 37 38 AV 38 39 39 39 39 39 39 40 40 40 39 39 39 40 39 39 39 39 39 AV 39 37 37 37 37 37 36 36 37 36 36 36 34 34 34 34 37 37 37 AV 37 37 37 37 37 36 34 36 36 36 36 37 34 32 31 32 31 33 33 AV 31 31 33 30 29 29 27 27 30 32 33 31 32 31 31 33 33 32 36 AV 36 36 36 36 36 33 33 34 34 36 34 33 34 32 32 33 31 31 32 AV 29 30 31 30 31 30 31 26 26 30 25 26 27 26 25 26 26 25 27 AV 27 22 22 26 26 25 27 27 27 30 27 29 30 30 27 27 29 29 29 AV 29 29 30 29 29 29 30 30 29 29 30 24 29 30 30 30 30 30 31 AV 30 31 32 30 33 34 34 34 36 34 34 33 33 33 33 34 27 26 27 AV 26 26 25 26 27 27 27 29 30 29 29 29 34 36 30 31 30 24 25 AV 25 24 24 25 25 25 22 22 22 22 27 27 27 33 34 34 34 33 33 AV 33 33 36 36 34 34 32 33 30 29 29 30 31 31 30 29 29 29 29 AV 29 29 29 29 32 34 34 32 29 27 30 30 30 29 29 29 29 30 29 AV 29 23 23 25 26 26 25 25 24 26 27 26 25 25 26 26 24 22 22 AV 23 24 24 24 25 24 24 24 24 25 21 21 25 27 27 27 27 27 23 AV 22 22 22 22 23 23 27 22 21 22 21 22 22 21 22 23 24 24 21 AV 22 22 17 21 22 21 21 19 20 17 19 18 18 18 19 19 16 19 16 AV 16 15 16 16 13 16 14 13 12 12 13 12 11 4 11 10 10 10 10 AV 4 4 11 11 11 10 10 10 12 12 11 11 10 9 9 8 8 8 4 4 8 9 AV 9 4 9 9 9 10 12 13 12 13 4 16 15 17 17 4 17 15 18 19 19 AV 20 17 19 19 20 16 4 39 4 39 39 39 4 39 SQ CAGGTTCGAC TCTAGAGGAT CCCCTGAAAT ATTAAAACTA AAATGTGTAT AATAAAAATT GTATACCAAT TTCAGTGATA AATAATTTAT TTTATAGAAA AAAGAAGAAC AAAGCTGATG ATTAAAACTG AACTCGATTT TCTGATTGGA AGAACTTGTA CCAATCGATG ATATGAGATG TTAAAAACTG GAATTGATAT TTAACCGATT GAACCTGAAT GAAAAACAAC GGACCTGAAA ATTAAATTAT TATTTTAAAT TGACATTTTG AAAATTTCCC CCGTAATTTT ATTGCAATTT TAATTGAAAG TTTATTAATT GTGAAATGTG CTTTTTAAGA TGTTGCAAAC ACCTAATTAC TATTTTCACT TTTGAG-TAT GT--ATTTTC TAAATAACTT --GGT-TGAT TTCC-AATT- TAATTTTCAA AAG-CCA-G // SF /home6/jkb/work/course/t/m13mp18.vector CF 
/home6/jkb/work/course/t/lorist6.vector TN xb63c7 PR 1 SC 6249 SP 41 SI 1400..2000 CH 0 QR 377 QL 0 SL 24 SR 440
Standard FASTA format. FASTA is probably the most widely used of all sequence formats and may hold multiple sequences. It was originally defined as sequence query input to the FASTA homology search program. For further information see:
http://fasta.bioch.virginia.edu/
A sequence entry has a one-line header followed by one or more lines of sequence data. The header line begins with the '>' character. The next word on the line is interpreted as a sequence-specific code, such as a database identifier (ID) or accession number. The rest of the header line is (typically) a description of the sequence. Some versions use control-A to mark line breaks in the description. These are ignored by EMBOSS and most other applications. The sequence itself is given on the following lines, which are recommended to be shorter than 80 characters. Blank lines are ignored.
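The basic FASTA layout (a '>' header whose first word is the identifier, followed by sequence lines, with blank lines ignored) can be read with a few lines of code. The Python sketch below is a minimal illustration and not the EMBOSS parser. The name `parse_fasta` is hypothetical, and it makes no attempt to recognise accession numbers or the header styles listed later.

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA entry.

    Hypothetical helper for illustration, not part of EMBOSS.
    """
    identifier = None
    description = ""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            # First word is the identifier, the rest is the description.
            parts = line[1:].split(None, 1)
            identifier = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


# Shortened sample with a blank line inside the sequence data.
example = """>X65923 H.sapiens fau mRNA
ttcctctttctcgactccat

cttcgcggtagctgggaccg
"""

records = list(parse_fasta(example))
for identifier, description, sequence in records:
    print(identifier, description, len(sequence))
```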
In practice there are many different styles of identifier and description in use. EMBOSS supports the following styles, in which ID is the sequence identifier, Accession is the accession number, given (optionally) after the ID, and Description is the rest of the header line, e.g.:
>ID Description
>ID Accession Description
Sequence output specified to be in fasta format will use the 'FASTA (with accession)' variant, shown below.
GCG-style FASTA format. An optional database name (Database) may be included as part of the sequence identifier:
>Database:ID Accession Description
>embl:X65923 X65923.1 H.sapiens fau mRNA ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
This is standard FASTA format where the identifier is taken as-is without any parsing. This allows users to keep the original identifier in cases where EMBOSS would normally interpret it using one of the other FASTA styles. To use this format, you must explicitly specify "pearson" as the sequence format.
>gnl|em|HSFAU X65923 H.sapiens fau mRNA ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
This is standard FASTA format but with the accession number or sequence version included after the identifier. The first word of the description is accepted as an accession number and/or sequence version if it matches a recognized format, for example EMBL/GenBank or SwissProt accession numbers.
The format is detected with other FASTA formats on input, and the accession number is automatically added by EMBOSS where it is available.
>X65923 X65923.1 H.sapiens fau mRNA ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
This is a derivative FASTA format with the database name given first in the description line, followed (optionally) by the accession number:
>Database Description
>Database Accession Description
The format is detected with other FASTA formats on input, but cannot be specified for output.
>embl:X65923 X65923.1 H.sapiens fau mRNA ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
Same as FASTA (NCBI style) except that the sequence GI code is given instead of the entry ID in the description line. The description line contains the entry GI, database name, and optional accession number separated by pipe ('|') characters:
>gi|12345|gnl|Database|Accession Description
The format is detected with other FASTA formats on input, and can be specified as "gifasta" for output.
>gi|31302|gnl|genbank|X65923 (X65923.1) H.sapiens fau mRNA. TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC AGCTCTTTGTCCGCGCCCAGGAGCTACACACCTTCGAGGTGACCGGCCAGGAAACGGTCG CCCAGATCAAGGCTCATGTAGCCTCACTGGAGGGCATTGCCCCGGAAGATCAAGTCGTGC TCCTGGCAGGCGCGCCCCTGGAGGATGAGGCCACTCTGGGCCAGTGCGGGGTGGAGGCCC TGACTACCCTGGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC GTGCTGGAAAAGTGAGAGGTCAGACTCCTAAGGTGGCCAAACAGGAGAAGAAGAAGAAGA AGACAGGTCGGGCTAAGCGGCGGATGCAGTACAACCGGCGCTTTGTCAACGTTGTGCCCA CCTTTGGCAAGAAGAAGGGCCCCAATGCCAACTCTTAAGTCTTTTGTAATTCTGGCTTTC TCTAATAAAAAAGCCACTTAGTTCAGTCAAAAAAAAAA
FASTA format with an NCBI-style description line in which the database name, entry ID and optional accession or sequence version number are separated by pipe ('|') characters. NCBI has a very limited vocabulary of approved database names, which can be used if exactly that database name was used or specified with the -osdbname qualifier.
>gnl|Database|Accession Description
>Database|Seqversion|ID Description
The format is detected with other FASTA formats on input, and can be specified as "ncbi" for output.
There are many variants on this theme. If you find one that does not appear to be supported please let the EMBOSS developers know.
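The pipe-separated layouts shown for the NCBI and GI styles can be taken apart by splitting the first word of the header on '|'. The Python sketch below is illustrative only. The name `parse_ncbi_header` and the dictionary keys are hypothetical, only the three layouts shown in this section are recognised, and EMBOSS itself accepts more variants than this.

```python
def parse_ncbi_header(header):
    """Split an NCBI-style FASTA header into a dict of its parts.

    Hypothetical helper for illustration, not part of EMBOSS. Handles:
      >gnl|Database|Accession Description
      >Database|Seqversion|ID Description
      >gi|12345|gnl|Database|Accession Description
    """
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    parts = header[1:].split(None, 1)
    if not parts:
        raise ValueError("empty header")
    result = {"description": parts[1] if len(parts) > 1 else ""}
    fields = parts[0].split("|")
    # An optional leading "gi|number" pair comes before the other fields.
    if fields[0] == "gi" and len(fields) >= 2:
        result["gi"] = fields[1]
        fields = fields[2:]
    if len(fields) == 3 and fields[0] == "gnl":
        result["database"] = fields[1]
        result["accession"] = fields[2]
    elif len(fields) == 3:
        result["database"] = fields[0]
        result["seqversion"] = fields[1]
        result["id"] = fields[2]
    else:
        raise ValueError("unrecognised NCBI-style identifier")
    return result


ncbi = parse_ncbi_header(">emb|X65923.1|X65923 H.sapiens fau mRNA")
gi = parse_ncbi_header(">gi|31302|gnl|genbank|X65923 (X65923.1) H.sapiens fau mRNA.")
print(ncbi)
print(gi)
```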
>emb|X65923.1|X65923 H.sapiens fau mRNA ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
Fastq is a new format created for storing very large numbers of short reads from next-generation sequencing instruments. There are a number of variants of fastq format, which have to be named explicitly on the command line as there is no reliable method to detect the precise format automatically. These formats are described in detail below. The variants differ only in the formatting of the quality scores, so EMBOSS also supports fastq as a simple sequence format on input, ignoring the quality scores. When used on output, fastq is an alias for the fastq-sanger format as this is the one most commonly used.
Sequence output specified to be in fastq format will use the 'FASTQ (Sanger) style' variant, shown below.
Fastq-illumina supports the Illumina 1.3 standard for representing phred quality scores encoded as one character per base. This format must be specified explicitly on the command line as data files are also valid as fastq-sanger format which allows all possible characters as quality scores.
@FASTQ-ILL100R:1:2:3:4#0/1
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTA
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@
Fastq-sanger supports the Sanger Institute standard for representing phred quality scores encoded as one character per base. This format must be specified explicitly on the command line as data files are also valid as fastq-illumina or fastq-solexa format which accept many of the same characters as quality scores.
As an output format this is the current EMBOSS default if "fastq" is specified.
@FASTQ-SAN100R:1:2:3:4#0/1
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC
+
~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!
Fastq-solexa supports the Illumina 1.0 or Solexa standard for representing solexa quality scores encoded as one character per base. This format must be specified explicitly on the command line as data files are also valid as fastq-sanger format which allows all possible characters as quality scores.
@FASTQ-SLX100R:1:2:3:4#0/1
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTAC
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;
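The three fastq variants differ only in how a quality character maps to a score, which is why the same file can be valid in more than one of them. The Python sketch below illustrates this using the commonly documented conventions, which are background assumptions not stated in the text above: fastq-sanger stores a phred score as ASCII code minus 33, fastq-illumina (1.3) stores a phred score as ASCII code minus 64, and fastq-solexa stores a Solexa score as ASCII code minus 64. The function name `decode_qualities` is hypothetical and this is not EMBOSS code.

```python
import math


def decode_qualities(quality, variant):
    """Convert a FASTQ quality string to a list of phred-scale scores.

    Hypothetical helper for illustration, not part of EMBOSS.
    variant is "fastq-sanger", "fastq-illumina" or "fastq-solexa".
    """
    if variant == "fastq-sanger":
        return [ord(ch) - 33 for ch in quality]
    if variant == "fastq-illumina":
        return [ord(ch) - 64 for ch in quality]
    if variant == "fastq-solexa":
        # Solexa scores are odds-based; convert them to the phred scale
        # with Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
        return [
            10 * math.log10(10 ** ((ord(ch) - 64) / 10) + 1)
            for ch in quality
        ]
    raise ValueError("unknown FASTQ variant: " + variant)


# The same characters give different scores under different variants.
as_illumina = decode_qualities("hgfedcba", "fastq-illumina")
as_sanger = decode_qualities("hgfedcba", "fastq-sanger")
print(as_illumina)
print(as_sanger)
```

Because every character valid in fastq-illumina or fastq-solexa is also valid in fastq-sanger, a decoder cannot reliably choose the variant from the data alone, which is consistent with the requirement to name the format explicitly.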
Fitch is an old format, not in common use, for phylogenetic analysis by Walter Fitch's programs. It is one of the formats originally adopted from the readseq program.
X65923, 518 bases ttc ctc ttt ctc gac tcc atc ttc gcg gta gct ggg acc gcc gtt cag tcg cca ata tgc agc tct ttg tcc gcg ccc agg agc tac aca cct tcg agg tga ccg gcc agg aaa cgg tcg ccc aga tca agg ctc atg tag cct cac tgg agg gca ttg ccc cgg aag atc aag tcg tgc tcc tgg cag gcg cgc ccc tgg agg atg agg cca ctc tgg gcc agt gcg ggg tgg agg ccc tga cta ccc tgg aag tag cag gcc gca tgc ttg gag gta aag ttc atg gtt ccc tgg ccc gtg ctg gaa aag tga gag gtc aga ctc cta agg tgg cca aac agg aga aga aga aga aga aga cag gtc ggg cta agc ggc gga tgc agt aca acc ggc gct ttg tca acg ttg tgc cca cct ttg gca aga aga agg gcc cca atg cca act ctt aag tct ttt gta att ctg gct ttc tct aat aaa aaa gcc act tag ttc agt caa aaa aaa aa
GCG format was used by Accelrys GCG, formerly known as the GCG Wisconsin package. GCG was a commercial software package of programs and utilities for gene and protein analysis. For further information see:
http://www.accelrys.com/products/gcg/
A sequence file in GCG format must contain a single sequence only. Such files begin with one or more description lines with informative text about the file contents. The start of the sequence is marked by a line ending with two dot (..) characters. This line typically also gives the sequence length, the date when the file was created, the type of the sequence and the GCG checksum. The dots delimit the descriptive information from the sequence data that follows. In GCG 9.x and GCG 10.x formats, the format and sequence type are identified on the first line of the file. In GCG 8.x format, anything up to the first line containing .. is considered as heading, and the remainder is sequence data.
When GCG format is specified as the input, EMBOSS will first assume the later format (GCG 9.x and 10.x) before trying with the GCG 8 format. When GCG format is specified for output, the later format will be generated regardless of whether GCG or GCG8 was specified.
Latest format:
!!NA_SEQUENCE 1.0 H.sapiens fau mRNA HSFAU Length: 518 Type: N Check: 2981 .. 1 ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc 51 gccaatatgc agctctttgt ccgcgcccag gagctacaca ccttcgaggt 101 gaccggccag gaaacggtcg cccagatcaa ggctcatgta gcctcactgg 151 agggcattgc cccggaagat caagtcgtgc tcctggcagg cgcgcccctg 201 gaggatgagg ccactctggg ccagtgcggg gtggaggccc tgactaccct 251 ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc 301 gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag 351 aagaagaaga agacaggtcg ggctaagcgg cggatgcagt acaaccggcg 401 ctttgtcaac gttgtgccca cctttggcaa gaagaagggc cccaatgcca 451 actcttaagt cttttgtaat tctggctttc tctaataaaa aagccactta 501 gttcagtcaa aaaaaaaa
GCG 8 format:
H.sapiens fau mRNA HSFAU Length: 518 Type: N Check: 2981 .. 1 ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc 51 gccaatatgc agctctttgt ccgcgcccag gagctacaca ccttcgaggt 101 gaccggccag gaaacggtcg cccagatcaa ggctcatgta gcctcactgg 151 agggcattgc cccggaagat caagtcgtgc tcctggcagg cgcgcccctg 201 gaggatgagg ccactctggg ccagtgcggg gtggaggccc tgactaccct 251 ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc 301 gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag 351 aagaagaaga agacaggtcg ggctaagcgg cggatgcagt acaaccggcg 401 ctttgtcaac gttgtgccca cctttggcaa gaagaagggc cccaatgcca 451 actcttaagt cttttgtaat tctggctttc tctaataaaa aagccactta 501 gttcagtcaa aaaaaaaa
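Since both GCG variants separate heading from sequence at the first line ending with '..', a reader can split on that line and then discard position numbers and white space. The Python sketch below is a minimal illustration under that assumption. The name `parse_gcg` is hypothetical, it does not verify the Length or Check values, and it is not the EMBOSS parser. The sample is truncated to two sequence lines and its blank lines and spacing are a guess.

```python
def parse_gcg(text):
    """Return (heading_lines, sequence) from single-sequence GCG text.

    Hypothetical helper for illustration, not part of EMBOSS.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.rstrip().endswith(".."):
            break
    else:
        raise ValueError("no line ending with '..' found")
    heading = lines[: index + 1]
    # Sequence lines carry position numbers and space-separated blocks;
    # keep every character that is neither a digit nor white space.
    sequence = "".join(
        ch
        for line in lines[index + 1:]
        for ch in line
        if not ch.isdigit() and not ch.isspace()
    )
    return heading, sequence


# Truncated sample: the heading still states the full length of 518.
example = """!!NA_SEQUENCE 1.0

H.sapiens fau mRNA

HSFAU  Length: 518  Type: N  Check: 2981  ..

       1  ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc
      51  gccaatatgc agctctttgt ccgcgcccag gagctacaca ccttcgaggt
"""

heading, sequence = parse_gcg(example)
print(heading[-1])
print(len(sequence))
```

The same function handles the GCG 8 layout, because it relies only on the '..' line and not on the first-line type marker.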
GenBank entry format supports all the fields in the latest database format. GenBank is part of the International Nucleotide Sequence Database Collaboration and uses the same content as the EMBL and DDBJ databases. The format is described in greater detail at:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
Where GenBank format is used for output, fields for which data are available will be completed and others with no information will be omitted. Exactly what data will be present depends very much on the source of input sequences. The EMBOSS command line allows data, such as accession numbers, to be provided if they do not form part of the input sequence data (see Section 6.4, “Datatype-specific Command Line Qualifiers”).
LOCUS       X65923                   518 bp    mRNA    linear   PRI 18-APR-2005
DEFINITION  H.sapiens fau mRNA.
ACCESSION   X65923
VERSION     X65923.1  GI:31302
KEYWORDS    fau gene.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 518)
  AUTHORS   Michiels,L., Van der Rauwelaert,E., Van Hasselt,F., Kas,K. and
            Merregaert,J.
  TITLE     fau cDNA encodes a ubiquitin-like-S30 fusion protein and is
            expressed as an antisense sequence in the Finkel-Biskis-Reilly
            murine sarcoma virus
  JOURNAL   Oncogene 8 (9), 2537-2546 (1993)
   PUBMED   8395683
REFERENCE   2  (bases 1 to 518)
  AUTHORS   Michiels,L.M.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-APR-1992) L.M.R. Michiels, University of Antwerp,
            Dept of Biochemistry, Universiteisplein 1, 2610 Wilrijk, BELGIUM
FEATURES             Location/Qualifiers
     source          1..518
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="11q"
                     /map="13"
                     /clone="pUIA 631"
                     /tissue_type="placenta"
                     /clone_lib="cDNA"
     gene            1..518
                     /gene="fau"
     CDS             57..458
                     /gene="fau"
                     /codon_start=1
                     /protein_id="CAA46716.1"
                     /db_xref="GI:31303"
                     /db_xref="GDB:135476"
                     /db_xref="GOA:P35544"
                     /db_xref="GOA:P62861"
                     /db_xref="GOA:Q05472"
                     /db_xref="HGNC:3597"
                     /db_xref="UniProtKB/Swiss-Prot:P35544"
                     /db_xref="UniProtKB/Swiss-Prot:P62861"
                     /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLA
                     GAPLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKK
                     TGRAKRRMQYNRRFVNVVPTFGKKKGPNANS"
     misc_feature    57..278
                     /gene="fau"
                     /note="ubiquitin like part"
     misc_feature    98..102
                     /gene="fau"
                     /note="nucleolar localization signal"
     misc_feature    279..458
                     /gene="fau"
                     /note="S30 part"
     polyA_signal    484..489
                     /gene="fau"
     polyA_site      509
                     /gene="fau"
ORIGIN
        1 TTCCTCTTTC TCGACTCCAT CTTCGCGGTA GCTGGGACCG CCGTTCAGTC GCCAATATGC
       61 AGCTCTTTGT CCGCGCCCAG GAGCTACACA CCTTCGAGGT GACCGGCCAG GAAACGGTCG
      121 CCCAGATCAA GGCTCATGTA GCCTCACTGG AGGGCATTGC CCCGGAAGAT CAAGTCGTGC
      181 TCCTGGCAGG CGCGCCCCTG GAGGATGAGG CCACTCTGGG CCAGTGCGGG GTGGAGGCCC
      241 TGACTACCCT GGAAGTAGCA GGCCGCATGC TTGGAGGTAA AGTTCATGGT TCCCTGGCCC
      301 GTGCTGGAAA AGTGAGAGGT CAGACTCCTA AGGTGGCCAA ACAGGAGAAG AAGAAGAAGA
      361 AGACAGGTCG GGCTAAGCGG CGGATGCAGT ACAACCGGCG CTTTGTCAAC GTTGTGCCCA
      421 CCTTTGGCAA GAAGAAGGGC CCCAATGCCA ACTCTTAAGT CTTTTGTAAT TCTGGCTTTC
      481 TCTAATAAAA AAGCCACTTA GTTCAGTCAA AAAAAAAA
//
GenPept entry format supports all the fields in the latest database format. GenPept is an automatic translation of GenBank. The database is available from:
ftp://ftp.ncifcrf.gov/pub/genpept
Where GenPept format is used for output, fields for which data are available will be completed and others with no information will be omitted. Exactly what data will be present depends very much on the source of input sequences. The EMBOSS command line allows data, such as accession numbers, to be provided if they do not form part of the input sequence data (see Section 6.4, “Datatype-specific Command Line Qualifiers”).
GenPept currently uses the same parser as the closely related RefseqP format so these can be used interchangeably until the original formats diverge.
LOCUS CAA46716 133 aa linear PRI 18-APR-2005 DEFINITION fau [Homo sapiens]. ACCESSION CAA46716 VERSION CAA46716.1 GI:31303 DBSOURCE embl accession X65923.1 KEYWORDS . SOURCE Homo sapiens (human) ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. REFERENCE 1 (residues 1 to 133) AUTHORS Michiels,L., Van der Rauwelaert,E., Van Hasselt,F., Kas,K. and Merregaert,J. TITLE fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus JOURNAL Oncogene 8 (9), 2537-2546 (1993) PUBMED 8395683 REFERENCE 2 (residues 1 to 133) AUTHORS Michiels,L.M.R. TITLE Direct Submission JOURNAL Submitted (29-APR-1992) L.M.R. Michiels, University of Antwerp, Dept of Biochemistry, Universiteisplein 1, 2610 Wilrijk, BELGIUM FEATURES Location/Qualifiers source 1..133 /organism="Homo sapiens" /db_xref="taxon:9606" /chromosome="11q" /map="13" /clone="pUIA 631" /tissue_type="placenta" /clone_lib="cDNA" Protein 1..133 /name="fau" Region 1..74 /region_name="Fubi" /note="Fubi is a ubiquitin-like protein encoded by the fau gene which has an N-terminal ubiquitin-like domain (also referred to as FUBI) fused to the ribosomal protein S30. Fubi is thought to be a tumor suppressor protein and the FUBI domain may act as a...; cd01793" /db_xref="CDD:29195" Region 74..133 /region_name="Ribosomal_S30" /note="Ribosomal protein S30; cl02062" /db_xref="CDD:141357" CDS 1..133 /gene="fau" /coded_by="X65923.1:57..458" /db_xref="GDB:135476" /db_xref="GOA:P35544" /db_xref="GOA:P62861" /db_xref="GOA:Q05472" /db_xref="HGNC:3597" /db_xref="UniProtKB/Swiss-Prot:P35544" /db_xref="UniProtKB/Swiss-Prot:P62861" ORIGIN 1 mqlfvraqel htfevtgqet vaqikahvas legiapedqv vllagapled eatlgqcgve 61 alttlevagr mlggkvhgsl aragkvrgqt pkvakqekkk kktgrakrrm qynrrfvnvv 121 ptfgkkkgpn ans //
The General Feature Format version 3 (GFF3) is an extension by Lincoln Stein and the Sequence Ontology project of the GFF2 format developed at the Sanger Institute for describing genes and other features associated with DNA, RNA and protein sequences (see Section A.2, “Supported Feature Formats”). GFF format is normally used to hold pure feature information only, but can hold the sequence immediately after the feature table. A complete specification of the format is available at:
http://www.sequenceontology.org/gff3.shtml
##gff-version 3
##sequence-region HSFAU 1 518
#!Date 2009-07-29
#!Type DNA
#!Source-version EMBOSS 6.1.0
##FASTA
>HSFAU X65923 H.sapiens fau mRNA
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
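As a rough illustration of how the sequence follows the feature table, the sketch below collects the records that appear after the ##FASTA directive. It is a minimal example, not the EMBOSS parser; the function name read_gff3_fasta and the shortened sample data are hypothetical.

```python
def read_gff3_fasta(text):
    """Return {identifier: sequence} for records after the ##FASTA directive."""
    records = {}
    current = None
    in_fasta = False
    for line in text.splitlines():
        line = line.strip()
        if not in_fasta:
            # Everything before ##FASTA is header or feature data; skip it.
            in_fasta = (line == "##FASTA")
            continue
        if not line:
            continue
        if line.startswith(">"):
            # The first word after '>' is taken as the identifier.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Shortened sample (hypothetical data, cut down from the example above).
gff3_example = """##gff-version 3
##sequence-region HSFAU 1 518
##FASTA
>HSFAU X65923 H.sapiens fau mRNA
ttcctctttc
tcgactccat
"""
print(read_gff3_fasta(gff3_example))
```

Feature lines are ignored entirely here, which keeps the sketch short; a real reader would also parse them.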
The General Feature Format version 2 (GFF) is a format developed at the Sanger Institute for describing genes and other features associated with DNA, RNA and protein sequences (see Section A.2, “Supported Feature Formats”). GFF format is normally used to hold pure feature information only, but can hold the sequence as part of the structured header. A complete specification of the format is available at:
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
##gff-version 2
##source-version EMBOSS 6.1.0
##date 2009-07-29
##DNA HSFAU
##ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
##agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
##cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
##tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
##tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
##gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
##agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
##cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
##tctaataaaaaagccacttagttcagtcaaaaaaaaaa
##end-DNA
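In GFF2 the sequence sits inside the structured header, between the ##DNA and ##end-DNA lines, with every sequence line prefixed by ##. The sketch below shows one way to recover it. It is an illustration only; the function name read_gff2_dna and the shortened sample are hypothetical.

```python
def read_gff2_dna(text):
    """Return (name, sequence) from a ##DNA ... ##end-DNA header block, or None."""
    name = None
    parts = []
    inside = False
    for line in text.splitlines():
        line = line.rstrip()
        if not inside:
            fields = line.split()
            if fields and fields[0] == "##DNA":
                # The sequence name, if present, follows the ##DNA keyword.
                name = fields[1] if len(fields) > 1 else None
                inside = True
        elif line.startswith("##end-DNA"):
            break
        elif line.startswith("##"):
            # Strip the '##' prefix that marks each sequence line.
            parts.append(line[2:].strip())
    if not inside:
        return None
    return name, "".join(parts)


# Shortened sample (hypothetical data, cut down from the example above).
gff2_example = """##gff-version 2
##source-version EMBOSS 6.1.0
##date 2009-07-29
##DNA HSFAU
##ttcctctttc
##tcgactccat
##end-DNA
"""
print(read_gff2_dna(gff2_example))
```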
Hennig86 is used by the Hennig86 package for interactive phylogenetic analysis. It allows phylogenetic trees to be read, written, drawn and analysed, and the most parsimonious trees to be calculated. For further information see:
http://www.cladistics.org/education/hennig86.html
xread ' Written by EMBOSS 29/07/09 ' 518 1 X65923 11331311131320313301311323221023122203323321130213233001012302313111213323233302202310303033113202212033223302200032213233302013002231301210233130312202223011233332200201300213212313312230223232333312202201202233031312223302123222212202233312031033312200210230223323012311220221000211301221133312233321231220000212020221302031331002212233000302202002002002002002030221322231002322322012302103003322323111213003211212333033111223002002002223333001233003131100213111121001131223111313100100000023303110211302130000000000 ;
Intelligenetics format is used by Intelligenetics, an old sequence analysis package which is no longer under development. It is also used by various other packages as it is a relatively simple format. One drawback is that non-sequence files can sometimes look like Intelligenetics files. EMBOSS has a format "igstrict" which requires full compliance with this format. Less strict files, for example files lacking a 1 at the end of the sequence, can be read only if "ig" is specified as the format on the command line.
;H.sapiens fau mRNA, 518 bases
HSFAU
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc
gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt
gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg
agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg
gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct
ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag
aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg
ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca
actcttaagtcttttgtaattctggctttctctaataaaaaagccactta
gttcagtcaaaaaaaaaa1
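The difference between strict and relaxed reading can be sketched as follows. This is not the EMBOSS implementation; the function read_ig and its strict flag are hypothetical stand-ins for the "igstrict" and "ig" behaviour described above, and only the terminating 1 mentioned in the text is checked.

```python
def read_ig(text, strict=True):
    """Parse one Intelligenetics-style entry into (comments, name, sequence)."""
    comments = []
    name = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if name is None and line.startswith(";"):
            # Leading ';' lines are comments.
            comments.append(line[1:].strip())
        elif name is None:
            # The first non-comment line is the entry name.
            name = line
        else:
            parts.append(line.replace(" ", ""))
    if name is None:
        raise ValueError("no entry name found")
    sequence = "".join(parts)
    if sequence.endswith("1"):
        # Drop the terminator so only residues remain.
        sequence = sequence[:-1]
    elif strict:
        raise ValueError("no terminating '1' after the sequence")
    return comments, name, sequence


# Shortened sample (hypothetical data, cut down from the example above).
ig_example = ";H.sapiens fau mRNA, 518 bases\nHSFAU\nttcctc\ngccaat1\n"
print(read_ig(ig_example))
```

With strict=False the same function accepts an entry whose final 1 is missing, which also makes it more likely to accept files that are not sequence data at all.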
Jackknifer format is used by the Parsimony Jackknifer (JAC) or PHYSYS phylogenetics package. The format was among those adopted from the readseq package. For further information see:
http://evolution.genetics.washington.edu/phylip/software.old.html
' Written by EMBOSS 29/07/09

(IXI_234) TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
(IXI_235) TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
(IXI_236) TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT
(IXI_237) TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT

(IXI_234) GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
(IXI_235) GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
(IXI_236) GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G
(IXI_237) GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G

(IXI_234) SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
(IXI_235) SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
(IXI_236) SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
(IXI_237) SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
;
Jackknifernon format is the non-interleaved version of the format used by the Parsimony Jackknifer (JAC) or PHYSYS package for jackknifed sequences. The format was among those adopted from the readseq package. For further information see:
http://evolution.genetics.washington.edu/phylip/software.old.html
' Written by EMBOSS 29/07/09

(IXI_234) TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
(IXI_235) TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
(IXI_236) TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT
GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G
SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
(IXI_237) TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT
GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G
SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
;
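The interleaved and non-interleaved layouts hold the same alignment; only the order of the sequence chunks differs. The sketch below, with a hypothetical function name and shortened sample data, joins the chunks of an interleaved Jackknifer file where every row starts with a parenthesised name. It assumes the file has its original line breaks.

```python
import re


def deinterleave_jackknifer(text):
    """Join the chunks of each '(name) chunk' row into one sequence per name."""
    sequences = {}
    for line in text.splitlines():
        match = re.match(r"\((\S+)\)\s+(\S+)", line.strip())
        if match:
            name, chunk = match.groups()
            # Dictionaries keep insertion order, so names stay in file order.
            sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Shortened sample (hypothetical data in the layout of the example above).
jackknifer_example = """' Written by EMBOSS 29/07/09
(IXI_234) TSPASIRPPA
(IXI_235) TSPASIRPPA
(IXI_234) GGWKTCSGTC
(IXI_235) GGWKTCSGTC
;
"""
print(deinterleave_jackknifer(jackknifer_example))
```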
MASE format is used by the SeaView multiple alignment editor. For further information see:
http://pbil.univ-lyon1.fr/software/seaview.html
EMBOSS does not detect this format automatically, because the parser may also accept non-sequence data. However, most MASE format data is accepted using the "igstrict" format.
;;Written by EMBOSS on Wed 29 Jul 2009 10:48:43
;H.sapiens fau mRNA
X65923
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc
agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg
cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc
tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc
tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc
gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga
agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca
cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc
tctaataaaaaagccacttagttcagtcaaaaaaaaaa
Mega is the format used by the MEGA package. MEGA is an integrated tool for automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. For further information see:
http://www.megasoftware.net/
#mega
!Title: Written by EMBOSS 29/07/09;
!Format DataType=Protein DataFormat=Interleaved Identical=. Indel=- Missing=? ;

#IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
#IXI_235 ...............---------..........................
#IXI_236 ......................--.....P....P.........P.....
#IXI_237 .....L.................-...........----...........

#IXI_234 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
#IXI_235 ........................----------................
#IXI_236 ...............................................--.
#IXI_237 ..Y....................Y.......................--.

#IXI_234 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
#IXI_235 ...............................
#IXI_236 ...P....P.............PP.......
#IXI_237 ..............L........Y.......
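The Identical=. setting in the !Format line of this example means that a dot stands for the residue found at the same position in the first sequence. A minimal sketch of expanding the dots, assuming all rows of a block have equal length (the function name and the short sample rows are hypothetical):

```python
def expand_identical(rows, identical="."):
    """Replace the 'identical' symbol in each row with the first row's residue."""
    reference = rows[0]
    expanded = [reference]
    for row in rows[1:]:
        expanded.append(
            "".join(ref if ch == identical else ch for ref, ch in zip(reference, row))
        )
    return expanded


# Short sample rows (hypothetical data in the style of the example above).
mega_rows = ["TSPASIRPPA", ".....L....", "...--....."]
print(expand_identical(mega_rows))
```

Gap characters (Indel=-) and missing-data characters (Missing=?) are left untouched, since only the identical symbol refers back to the first row.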
Meganon is the non-interleaved version of the format used by the MEGA package. For further information see:
http://www.megasoftware.net/
#mega
!Title: Written by EMBOSS 29/07/09;
!Format DataType=Protein Identical=. Indel=- Missing=? ;

#IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
#IXI_235 ...............---------..................................................----------...............................................
#IXI_236 ......................--.....P....P.........P....................................................--....P....P.............PP.......
#IXI_237 .....L.................-...........----.............Y....................Y.......................--...............L........Y.......
MSF is the format used for multiple sequences by Accelrys GCG, formerly known as the GCG Wisconsin Package. GCG was a commercial software package of programs and utilities for gene and protein analysis. For further information see:
http://www.accelrys.com/products/gcg/
!!AA_MULTIPLE_ALIGNMENT 1.0

  msf MSF: 131 Type: P 22/01/02 CompCheck: 3003 ..

  Name: IXI_234 Len: 131 Check: 6808 Weight: 1.00
  Name: IXI_235 Len: 131 Check: 4032 Weight: 1.00
  Name: IXI_236 Len: 131 Check: 2744 Weight: 1.00
  Name: IXI_237 Len: 131 Check: 9419 Weight: 1.00

//

         1                                               50
IXI_234  TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235  TSPASIRPPAGPSSR.........RPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_236  TSPASIRPPAGPSSRPAMVSSR..RPSPPPPRRPPGRPCCSAAPPRPQAT
IXI_237  TSPASLRPPAGPSSRPAMVSSRR.RPSPPGPRRPT....CSAAPRRPQAT

         51                                             100
IXI_234  GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235  GGWKTCSGTCTTSTSTRHRGRSGW..........RASRKSMRAACSRSAG
IXI_236  GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR..G
IXI_237  GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR..G

         101                         131
IXI_234  SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235  SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236  SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237  SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
The NBRF or PIR format is that used in the PIR database, the integrated protein informatics resource for genomic and proteomic research. For further information see:
http://pir.georgetown.edu/pirwww
A file in PIR format may contain multiple sequences. The first line of each entry is a header beginning with '>', followed by a two-letter code describing the sequence type (P1 for a complete protein sequence, F1 for a protein fragment, DL for a DNA sequence, DC for a circular DNA sequence, RL, RC, or XX), then a semicolon and the sequence database ID code. The second line contains a textual description of the sequence, followed by one or more lines with the sequence itself. The end of the sequence is marked by a '*' (asterisk) character, which does not imply a stop codon.
>P1;104K_THEAN Example protein sequence in NBRF format. The final '*' is ignored. MKFLVLLFNI LCLFPILGAD ELVMSPIPTT DVQPKVTFDI NSEVSSGPLY LNPVEMAGVK YLQLQRQPGV QVHKVVEGDI VIWENEEMPL YTCAIVTQNE VPYMAYVELL EDPDLIFFLK EGDQWAPIPE DQYLARLQQL RQQIHTESFF SLNLSFQHEN YKYEMVSSFQ HSIKMVVFTP KNGHICKMVY DKNIRIFKAL YNEYVTSVIG FFRGLKLLLL NIFVIDDRGM IGNKYFQLLD DKYAPISVQG YVATIPKLKD FAEPYHPIIL DISDIDYVNF YLGDATYHDP GFKIVPKTPQ CITKVVDGNE VIYESSNPSV ECVYKVTYYD KKNESMLRLD LNHSPPSYTS YYAKREGVWV TSTYIDLEEK IEELQDHRST ELDVMFMSDK DLNVVPLTNG NLEYFMVTPK PHRDIIIVFD GSEVLWYYEG LENHLVCTWI YVTEGAPRLV HLRVKDRIPQ NTDIYMVKFG EYWVRISKTQ YTQEIKKLIK KSKKKLPSIE EEDSDKHGGP PKGPEPPTGP GHSSSESKEH EDSKESKEPK EHGSPKETKE GEVTKKPGPA KEHKPSKIPV YTKRPEFPKK SKSPKRPESP KSPKRPVSPQ RPVSPKSPKR PESLDIPKSP KRPESPKSPK RPVSPQRPVS PRRPESPKSP KSPKSPKSPK VPFDPKFKEK LYDSYLDKAA KTKETVTLPP VLPTDESFTH TPIGEPTAEQ PDDIEPIEES VFIKETGILT EEVKTEDIHS ETGEPEEPKR PDSPTKHSPK PTGTHPSMPK KRRRSDGLAL STTDLESEAG RILRDPTGKI VTMKRSKSFD DLTTVREKEH MGAEIRKIVV DDDGTEADDE DTHPSKEKHL STVRRRRPRP KKSSKSSKPR KPDSAFVPSI GIL*
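The entry layout described above (header line, description line, sequence lines ending in an asterisk) can be sketched as a small reader. This is an illustration rather than the EMBOSS parser; the function name read_pir, the dictionary keys and the shortened sample are hypothetical.

```python
def read_pir(text):
    """Return a list of entries parsed from NBRF/PIR formatted text."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue
        # Header: '>' + two-letter type code + ';' + database ID code.
        type_code, _, entry_id = line[1:].strip().partition(";")
        # The line after the header is the free-text description.
        description = next(lines, "").strip()
        parts = []
        for seq_line in lines:
            cleaned = "".join(seq_line.split())
            if "*" in cleaned:
                # The asterisk ends the entry and is not part of the sequence.
                parts.append(cleaned.split("*")[0])
                break
            parts.append(cleaned)
        entries.append({
            "type": type_code,
            "id": entry_id,
            "description": description,
            "sequence": "".join(parts),
        })
    return entries


# Shortened sample (hypothetical data, cut down from the example above).
pir_example = """>P1;104K_THEAN
Example protein sequence in NBRF format.
MKFLVLLFNI LCLFPILGAD
GIL*
"""
print(read_pir(pir_example))
```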
The Nexus or PAUP format is used by the PAUP package of tools for inferring and interpreting phylogenetic trees. For further information see:
http://paup.csit.fsu.edu/
#NEXUS
[TITLE: Written by EMBOSS 29/07/09]

begin data;
dimensions ntax=4 nchar=131;
format interleave datatype=protein missing=X gap=-;

matrix
IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_236 TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT
IXI_237 TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT

IXI_234 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
IXI_236 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G
IXI_237 GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G

IXI_234 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236 SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237 SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
;
end;

begin assumptions;
options deftype=unord;
end;
The NEXUSNON or PAUPNON format is the non-interleaved version of the format used by the PAUP package of tools for inferring and interpreting phylogenetic trees. For further information see:
http://paup.csit.fsu.edu/
#NEXUS
[TITLE: Written by EMBOSS 29/07/09]

begin data;
dimensions ntax=4 nchar=131;
format datatype=protein missing=X gap=-;

matrix
IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236 TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQATGGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--GSRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237 TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQATGGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--GSRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
;
end;

begin assumptions;
options deftype=unord;
end;
The Protein Data Bank is a collection of 3D structures. PDB files contain protein sequences for one or more chains in a structure. These are in two forms, either as sequence residues in the SEQRES records (format pdbseq) or as structural elements in the ATOM records (format pdb). EMBOSS reads both versions. If the structure is incomplete, the SEQRES records will have the full sequence but parts will be missing from the ATOM records. For further information see:
http://www.wwpdb.org/
HEADER HORMONE 10-MAY-82 2INS 2INS 3 COMPND DES-*PHE B1 INSULIN 2INS 4 SOURCE BOVINE (BOS TAURUS) 2INS 5 AUTHOR G.D.SMITH,W.L.DUAX,E.J.DODSON,G.G.DODSON,R.A.G.$DE *GRAAF, 2INSB 1 AUTHOR 1 C.D.REYNOLDS 2INS 7 REVDAT 7 31-MAY-84 2INSF 1 REMARK 2INSF 1 REVDAT 6 31-JAN-84 2INSE 1 REMARK 2INSE 1 REVDAT 5 27-OCT-83 2INSD 1 REMARK 2INSD 1 REVDAT 4 30-SEP-83 2INSC 1 REVDAT 2INSC 1 REVDAT 3 13-JUN-83 2INSB 1 AUTHOR JRNL 2INSC 2 REVDAT 2 07-MAR-83 2INSA 3 JRNL REMARK MTRIX 2INSC 3 REVDAT 1 05-AUG-82 2INS 0 2INSC 4 JRNL AUTH G.D.SMITH,W.L.DUAX,E.J.DODSON,G.G.DODSON, 2INS 8 JRNL AUTH 2 R.A.G.$DE *GRAAF,C.D.REYNOLDS 2INSB 2 JRNL TITL THE STRUCTURE OF DES-*PHE B1 BOVINE INSULIN 2INS 10 JRNL REF ACTA CRYSTALLOGR.,SECT.B V. 38 3028 1982 2INSA 1 JRNL REFN ASTM ACBCAR DK ISSN 0567-7408 107 2INSA 2 REMARK 1 2INS 13 REMARK 1 REFERENCE 1 2INSD 2 REMARK 1 AUTH J.BORDAS,G.G.DODSON,H.GREWE,M.H.J.KOCH,B.KREBS, 2INSD 3 REMARK 1 AUTH 2 J.RANDALL 2INSD 4 REMARK 1 TITL A COMPARATIVE ASSESSMENT OF THE ZINC-PROTEIN 2INSD 5 REMARK 1 TITL 2 COORDINATION IN 2*ZN-INSULIN AS DETERMINED BY X-RAY 2INSD 6 REMARK 1 TITL 3 ABSORPTION FINE STRUCTURE (/EXAFS$) AND X-RAY 2INSD 7 REMARK 1 TITL 4 CRYSTALLOGRAPHY 2INSD 8 REMARK 1 REF PROC.R.SOC.LONDON,SER.B V. 219 21 1983 2INSD 9 REMARK 1 REFN ASTM PRLBA4 UK ISSN 0080-4649 338 2INSE 2 ... < data omitted for brevity > REMARK 1 REFERENCE 14 2INSD 23 REMARK 1 EDIT M.O.DAYHOFF 2INS 89 REMARK 1 REF ATLAS OF PROTEIN SEQUENCE V. 5 187 1972 2INS 90 REMARK 1 REF 2 AND STRUCTURE (DATA SECTION) 2INS 91 REMARK 1 PUBL NATIONAL BIOMEDICAL RESEARCH FOUNDATION, 2INS 92 REMARK 1 PUBL 2 SILVER SPRING,MD. 2INS 93 REMARK 1 REFN ISBN 0-912466-02-2 435 2INS 94 REMARK 2 2INS 95 REMARK 2 RESOLUTION. 2.5 ANGSTROMS. 2INS 96 REMARK 3 2INS 97 REMARK 3 REFINEMENT. FAST FOURIER LEAST-SQUARES REFINEMENT FOLLOWED 2INS 98 REMARK 3 BY EXAMINATION OF DIFFERENCE MAPS USING THE *MMS-X* 2INS 99 REMARK 3 GRAPHICS SYSTEM. THE FINAL R IS 0.18 FOR 2128 2INS 100 REMARK 3 REFLECTIONS. 
THE RMS SHIFT AFTER REGULARIZATION IS 0.16 2INS 101 REMARK 3 ANGSTROMS AND THE DEVIATION OF A BOND FROM ITS IDEAL 2INS 102 REMARK 3 VALUE IS 0.20 ANGSTROMS. MODEL FIT PARAMETERS ARE SIGMA 2INS 103 REMARK 3 (BOND) = 0.02 ANGSTROMS AND SIGMA(ANGLE) = 3.0 DEGREES. 2INS 104 REMARK 4 2INS 105 ... < data omitted for brevity > REMARK 7 BECAUSE THE COORDINATES OF THE SYMMETRY-RELATED ATOMS ARE 2INS 140 REMARK 7 NOT INCLUDED IN THIS ENTRY THE COMPLETE CONNECTIVITY OF 2INS 141 REMARK 7 ATOMS ZN1 AND ZN2 CANNOT BE SPECIFIED. PARTIAL 2INS 142 REMARK 7 CONNECTIVITY IS GIVEN BY 2INS 143 REMARK 7 CONECT 229 227 228 791 2INS 144 REMARK 7 CONECT 624 622 623 792 2INS 145 REMARK 7 CONECT 791 247 826 ... ... ... ... 2INS 146 REMARK 7 CONECT 792 624 942 ... ... ... ... 2INS 147 REMARK 7 CONECT 826 791 2INS 148 REMARK 7 CONECT 942 792 2INS 149 REMARK 7 . 2INS 150 REMARK 7 . 2INS 151 REMARK 7 . 2INS 152 REMARK 8 2INS 153 REMARK 8 NO DENSITY WAS OBSERVED FOR TYR C 14 INDICATING THAT THIS 2INS 154 REMARK 8 SIDE CHAIN WAS MOVING FREELY. 2INS 155 REMARK 9 2INSA 20 REMARK 9 CORRECTION. CORRECT JOURNAL NAME FOR REFERENCES 2 AND 4. 2INSA 21 REMARK 9 UPDATE JRNL REFERENCE TO REFLECT PUBLICATION. CORRECT 2INSA 22 REMARK 9 MTRIX TRANSFORMATION. REVISE REMARKS 5 AND 7. 07-MAR-83. 2INSA 23 REMARK 10 2INSB 3 REMARK 10 CORRECTION. INSERT TYPESETTING CODES. 13-JUN-83. 2INSB 4 REMARK 11 2INSC 5 REMARK 11 CORRECTION. INSERT REVDAT RECORDS. 30-SEP-83. 2INSC 6 REMARK 12 2INSD 24 REMARK 12 CORRECTION. ADD NEW PUBLICATION AS REFERENCE 1 AND 2INSD 25 REMARK 12 RENUMBER THE OTHERS. 27-OCT-83. 2INSD 26 REMARK 13 2INSE 3 REMARK 13 CORRECTION. INSERT MISSING CODEN FOR REFERENCE 1. 2INSE 4 REMARK 13 31-JAN-84. 2INSE 5 REMARK 14 2INSF 3 REMARK 14 CORRECTION. CORRECT ISSN FOR REFERENCE 9. 31-MAY-84. 
2INSF 4 SEQRES 1 A 21 GLY ILE VAL GLU GLN CYS CYS ALA SER VAL CYS SER LEU 2INS 156 SEQRES 2 A 21 TYR GLN LEU GLU ASN TYR CYS ASN 2INS 157 SEQRES 1 B 29 VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU ALA 2INS 158 SEQRES 2 B 29 LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR THR 2INS 159 SEQRES 3 B 29 PRO LYS ALA 2INS 160 SEQRES 1 C 21 GLY ILE VAL GLU GLN CYS CYS ALA SER VAL CYS SER LEU 2INS 161 SEQRES 2 C 21 TYR GLN LEU GLU ASN TYR CYS ASN 2INS 162 SEQRES 1 D 29 VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU ALA 2INS 163 SEQRES 2 D 29 LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR THR 2INS 164 SEQRES 3 D 29 PRO LYS ALA 2INS 165 FTNOTE 1 2INS 166 FTNOTE 1 THE QUASI-TWO-FOLD SYMMETRY BREAKS DOWN MOST SERIOUSLY AT 2INS 167 FTNOTE 1 RESIDUES 2INS 168 FTNOTE 1 GLY A 1 TO GLN A 5 AND GLY C 1 TO GLN C 5 2INS 169 FTNOTE 1 HIS B 5 AND HIS D 5 2INS 170 FTNOTE 1 PHE B 25 AND PHE D 25 2INS 171 FTNOTE 2 2INS 172 FTNOTE 2 THE FOLLOWING RESIDUES ARE DISORDERED - ARG B 22, 2INS 173 FTNOTE 2 LYS D 29. 2INS 174 FTNOTE 3 2INS 175 FTNOTE 3 SEE REMARK 8. 
2INS 176 HET ZN1 1 1 ZINC ION ON 3-FOLD CRYSTAL AXIS 2INS 177 HET ZN2 2 1 ZINC ION ON 3-FOLD CRYSTAL AXIS 2INS 178 FORMUL 5 ZN1 ZN1 ++ 2INS 179 FORMUL 6 ZN2 ZN1 ++ 2INS 180 FORMUL 7 HOH *184(H2 01) 2INS 181 HELIX 1 A11 GLY A 1 VAL A 10 1 NOT IDEAL ALPH,SOME PI CNTCTS 2INS 182 HELIX 2 A12 SER A 12 GLU A 17 5 NOT IDEAL 3(10) 2INS 183 HELIX 3 B11 SER B 9 GLY B 20 1 NOT IDEAL ALPH,3(10) CONTCTS 2INS 184 HELIX 4 A21 GLY C 1 VAL C 10 1 NOT IDEAL ALPH,SOME PI CNTCTS 2INS 185 HELIX 5 A22 SER C 12 GLU C 17 5 NOT IDEAL 3(10) 2INS 186 HELIX 6 B21 SER D 9 GLY D 20 1 NOT IDEAL ALPH,3(10) CONTCTS 2INS 187 SHEET 1 B 2 PHE B 24 TYR B 26 0 2INS 188 SHEET 2 B 2 PHE D 24 TYR D 26 -1 O TYR D 26 N PHE B 24 2INS 189 TURN 1 1B1 CYS B 19 ARG B 22 2INS 190 TURN 2 1B2 GLY B 20 GLY B 23 2INS 191 TURN 3 2B1 CYS D 19 ARG D 22 2INS 192 TURN 4 2B2 GLY D 20 GLY D 23 2INS 193 SSBOND 1 CYS A 6 CYS A 11 2INS 194 SSBOND 2 CYS C 6 CYS C 11 2INS 195 SSBOND 3 CYS A 7 CYS B 7 2INS 196 SSBOND 4 CYS A 20 CYS B 19 2INS 197 SSBOND 5 CYS C 7 CYS D 7 2INS 198 SSBOND 6 CYS C 20 CYS D 19 2INS 199 SITE 1 D1 5 VAL B 12 TYR B 16 PHE B 24 PHE B 25 2INS 200 SITE 2 D1 5 TYR B 26 2INS 201 SITE 1 D2 5 VAL D 12 TYR D 16 PHE D 24 PHE D 25 2INS 202 SITE 2 D2 5 TYR D 26 2INS 203 SITE 1 H1 6 LEU A 13 TYR A 14 GLU B 13 ALA B 14 2INS 204 SITE 2 H1 6 LEU B 17 VAL B 18 2INS 205 SITE 1 H2 6 LEU C 13 TYR C 14 GLU D 13 ALA D 14 2INS 206 SITE 2 H2 6 LEU D 17 VAL D 18 2INS 207 SITE 1 SI1 7 GLY A 1 GLU A 4 GLN A 5 CYS A 7 2INS 208 SITE 2 SI1 7 TYR A 19 ASN A 21 CYS B 7 2INS 209 SITE 1 SI2 7 GLY C 1 GLU C 4 GLN C 5 CYS C 7 2INS 210 SITE 2 SI2 7 TYR C 19 ASN C 21 CYS D 7 2INS 211 CRYST1 81.600 81.600 34.000 90.00 90.00 120.00 R 3 18 2INS 212 ORIGX1 .012255 .007075 0.000000 0.00000 2INS 213 ORIGX2 0.000000 .014151 0.000000 0.00000 2INS 214 ORIGX3 0.000000 0.000000 .029412 0.00000 2INS 215 SCALE1 .012255 .007075 0.000000 0.00000 2INS 216 SCALE2 0.000000 .014151 0.000000 0.00000 2INS 217 SCALE3 0.000000 0.000000 .029412 0.00000 2INS 218 
MTRIX1 1 -.880000 -.480000 .020000 0.00000 1 2INSA 24 MTRIX2 1 -.480000 .880000 -.020000 0.00000 1 2INSA 25 MTRIX3 1 -.010000 -.030000 -1.000000 0.00000 1 2INSA 26 END
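To illustrate how a sequence is held in the SEQRES records, the sketch below collects the residue names per chain and converts them to one-letter codes. It is a minimal example and not the EMBOSS reader: the function name is hypothetical, the column positions follow the fixed-column PDB layout (chain identifier in column 12, residue names in columns 20 to 70), and unknown residue names are written as X by assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]              # column 12
        residues = line[19:70].split()   # columns 20-70, up to 13 names
        letters = "".join(THREE_TO_ONE.get(name, "X") for name in residues)
        chains[chain_id] = chains.get(chain_id, "") + letters
    return chains


# Chain A of the 2INS example above, without the trailing entry columns.
seqres_example = "\n".join([
    "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS ALA SER VAL CYS SER LEU",
    "SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN",
])
print(seqres_sequences(seqres_example))
```

Slicing by column rather than splitting the whole line keeps the old-style entry identifiers at the end of each record (such as 2INS 156) out of the residue list.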
The Protein Data Bank is a collection of 3D structures. PDB files may contain nucleotide sequences for one or more nucleic acid molecules. These are in two forms, either as sequence residues in the SEQRES records (format pdbnucseq) or as structural elements in the ATOM records (format pdbnuc). EMBOSS reads both versions. If the structure is incomplete the SEQRES records will have the full sequence but parts will be missing from the ATOM records. For further information see:
http://www.wwpdb.org/
HEADER TRANSCRIPTION/DNA 08-DEC-97 1A02 TITLE STRUCTURE OF THE DNA BINDING DOMAINS OF NFAT, FOS AND JUN TITLE 2 BOUND TO DNA COMPND MOL_ID: 1; COMPND 2 MOLECULE: DNA (5'- COMPND 3 D(*DTP*DTP*DGP*DGP*DAP*DAP*DAP*DAP*DTP*DTP*DTP*DGP*DTP*DTP* COMPND 4 DTP*DCP*DAP*DTP*DAP*DG)-3'); COMPND 5 CHAIN: A; COMPND 6 ENGINEERED: YES; COMPND 7 MOL_ID: 2; ... < data omitted for brevity > COMPND 26 MOLECULE: AP-1 FRAGMENT JUN; COMPND 27 CHAIN: J; COMPND 28 FRAGMENT: JUN; COMPND 29 SYNONYM: JUN; COMPND 30 ENGINEERED: YES; COMPND 31 MUTATION: YES SOURCE MOL_ID: 1; SOURCE 2 SYNTHETIC: YES; SOURCE 3 MOL_ID: 2; SOURCE 4 SYNTHETIC: YES; SOURCE 5 MOL_ID: 3; SOURCE 6 ORGANISM_SCIENTIFIC: HOMO SAPIENS; SOURCE 7 ORGANISM_COMMON: HUMAN; ... < data omitted for brevity > SOURCE 24 ORGANISM_TAXID: 9606; SOURCE 25 EXPRESSION_SYSTEM: ESCHERICHIA COLI; SOURCE 26 EXPRESSION_SYSTEM_TAXID: 562 KEYWDS TRANSCRIPTION FACTOR, NFAT, NF-AT, AP-1, FOS-JUN, KEYWDS 2 QUATERNARY PROTEIN-DNA COMPLEX, TRANSCRIPTION SYNERGY, KEYWDS 3 COMBINATORIAL GENE REGULATION, TRANSCRIPTION/DNA COMPLEX EXPDTA X-RAY DIFFRACTION AUTHOR L.CHEN,J.N.M.GLOVER,P.G.HOGAN,A.RAO,S.C.HARRISON REVDAT 2 24-FEB-09 1A02 1 VERSN REVDAT 1 27-MAY-98 1A02 0 JRNL AUTH L.CHEN,J.N.GLOVER,P.G.HOGAN,A.RAO,S.C.HARRISON JRNL TITL STRUCTURE OF THE DNA-BINDING DOMAINS FROM NFAT, JRNL TITL 2 FOS AND JUN BOUND SPECIFICALLY TO DNA. JRNL REF NATURE V. 392 42 1998 JRNL REFN ISSN 0028-0836 JRNL PMID 9510247 JRNL DOI 10.1038/32100 REMARK 1 REMARK 2 REMARK 2 RESOLUTION. 2.70 ANGSTROMS. REMARK 3 REMARK 3 REFINEMENT. REMARK 3 PROGRAM : X-PLOR 3.1 REMARK 3 AUTHORS : BRUNGER REMARK 3 REMARK 3 DATA USED IN REFINEMENT. 
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 2.70 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 10.00 REMARK 3 DATA CUTOFF (SIGMA(F)) : 2.000 REMARK 3 DATA CUTOFF HIGH (ABS(F)) : 10000000.000 REMARK 3 DATA CUTOFF LOW (ABS(F)) : 0.1000 REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 90.1 REMARK 3 NUMBER OF REFLECTIONS : 21643 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM REMARK 3 R VALUE (WORKING SET) : 0.246 REMARK 3 FREE R VALUE : 0.303 REMARK 3 FREE R VALUE TEST SET SIZE (%) : 7.500 REMARK 3 FREE R VALUE TEST SET COUNT : 1671 REMARK 3 ESTIMATED ERROR OF FREE R VALUE : 0.010 REMARK 3 REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. REMARK 3 TOTAL NUMBER OF BINS USED : 8 REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 2.70 REMARK 3 BIN RESOLUTION RANGE LOW (A) : 2.82 REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 69.40 REMARK 3 REFLECTIONS IN BIN (WORKING SET) : 1784 REMARK 3 BIN R VALUE (WORKING SET) : 0.3690 REMARK 3 BIN FREE R VALUE : 0.3690 REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : 6.60 REMARK 3 BIN FREE R VALUE TEST SET COUNT : 188 REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.020 REMARK 3 REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. REMARK 3 PROTEIN ATOMS : 3073 REMARK 3 NUCLEIC ACID ATOMS : 814 REMARK 3 HETEROGEN ATOMS : 0 REMARK 3 SOLVENT ATOMS : 88 REMARK 3 REMARK 3 B VALUES. REMARK 3 FROM WILSON PLOT (A**2) : 61.00 REMARK 3 MEAN B VALUE (OVERALL, A**2) : 51.00 REMARK 3 OVERALL ANISOTROPIC B VALUE. REMARK 3 B11 (A**2) : NULL REMARK 3 B22 (A**2) : NULL REMARK 3 B33 (A**2) : NULL REMARK 3 B12 (A**2) : NULL REMARK 3 B13 (A**2) : NULL REMARK 3 B23 (A**2) : NULL REMARK 3 REMARK 3 ESTIMATED COORDINATE ERROR. REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM SIGMAA (A) : NULL REMARK 3 LOW RESOLUTION CUTOFF (A) : 10.00 REMARK 3 REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. 
REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL REMARK 3 ESD FROM C-V SIGMAA (A) : NULL REMARK 3 REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. REMARK 3 BOND LENGTHS (A) : NULL REMARK 3 BOND ANGLES (DEGREES) : NULL REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL REMARK 3 IMPROPER ANGLES (DEGREES) : NULL REMARK 3 REMARK 3 ISOTROPIC THERMAL MODEL : RESTRAINED REMARK 3 REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA REMARK 3 MAIN-CHAIN BOND (A**2) : 1.500 ; 1.500 REMARK 3 MAIN-CHAIN ANGLE (A**2) : 2.000 ; 2.000 REMARK 3 SIDE-CHAIN BOND (A**2) : 2.000 ; 2.000 REMARK 3 SIDE-CHAIN ANGLE (A**2) : 2.500 ; 2.500 REMARK 3 REMARK 3 NCS MODEL : NULL REMARK 3 REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL REMARK 3 REMARK 3 PARAMETER FILE 1 : PARHCSDX.PRO REMARK 3 PARAMETER FILE 2 : PARAM_NDBX.DNA REMARK 3 PARAMETER FILE 3 : PARAM_NDBX.INT REMARK 3 PARAMETER FILE 4 : PARAM19.SOL REMARK 3 PARAMETER FILE 5 : NULL REMARK 3 TOPOLOGY FILE 1 : TOPHCSDX.PRO REMARK 3 TOPOLOGY FILE 2 : TOP_NDBX.DNA REMARK 3 TOPOLOGY FILE 3 : TOPH19.SOL REMARK 3 TOPOLOGY FILE 4 : NULL REMARK 3 TOPOLOGY FILE 5 : NULL REMARK 3 REMARK 3 OTHER REFINEMENT REMARKS: RESIDUES N 478 - N 485 AND N 628 - N REMARK 3 634 ARE DISORDERED REMARK 4 REMARK 4 1A02 COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 REMARK 100 REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY BNL. 
REMARK 200 REMARK 200 EXPERIMENTAL DETAILS REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION REMARK 200 DATE OF DATA COLLECTION : 16-SEP-96 REMARK 200 TEMPERATURE (KELVIN) : 100.00 REMARK 200 PH : 7.5 REMARK 200 NUMBER OF CRYSTALS USED : 1 REMARK 200 REMARK 200 SYNCHROTRON (Y/N) : Y REMARK 200 RADIATION SOURCE : NSLS REMARK 200 BEAMLINE : X25 REMARK 200 X-RAY GENERATOR MODEL : NULL REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M REMARK 200 WAVELENGTH OR RANGE (A) : NULL REMARK 200 MONOCHROMATOR : NULL REMARK 200 OPTICS : NULL REMARK 200 REMARK 200 DETECTOR TYPE : IMAGE PLATE REMARK 200 DETECTOR MANUFACTURER : MARRESEARCH REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO REMARK 200 DATA SCALING SOFTWARE : SCALEPACK REMARK 200 REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 22079 REMARK 200 RESOLUTION RANGE HIGH (A) : 2.700 REMARK 200 RESOLUTION RANGE LOW (A) : 20.000 REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 0.000 REMARK 200 REMARK 200 OVERALL. REMARK 200 COMPLETENESS FOR RANGE (%) : 98.3 REMARK 200 DATA REDUNDANCY : 3.100 REMARK 200 R MERGE (I) : NULL REMARK 200 R SYM (I) : 0.08000 REMARK 200 <I/SIGMA(I)> FOR THE DATA SET : NULL REMARK 200 REMARK 200 IN THE HIGHEST RESOLUTION SHELL. 
REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 2.70 REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : 2.80 REMARK 200 COMPLETENESS FOR SHELL (%) : 93.3 REMARK 200 DATA REDUNDANCY IN SHELL : 2.70 REMARK 200 R MERGE FOR SHELL (I) : NULL REMARK 200 R SYM FOR SHELL (I) : 0.43000 REMARK 200 <I/SIGMA(I)> FOR SHELL : NULL REMARK 200 REMARK 200 DIFFRACTION PROTOCOL: SINGLE WAVELENGTH REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MIR/MAD REMARK 200 SOFTWARE USED: CCP4, X-PLOR REMARK 200 STARTING MODEL: NULL REMARK 200 REMARK 200 REMARK: NULL REMARK 280 REMARK 280 CRYSTAL REMARK 280 SOLVENT CONTENT, VS (%): 68.00 REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 3.54 REMARK 280 REMARK 280 CRYSTALLIZATION CONDITIONS: THE COMPLEX WAS CRYSTALLIZED IN REMARK 280 300-400 MM AMMONIUM ACETATE SALT, PH 7.5 (10 MM)., VAPOR REMARK 280 DIFFUSION, HANGING DROP REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 1 21 1 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK 290 1555 X,Y,Z REMARK 290 2555 -X,Y+1/2,-Z REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 REMARK 290 SMTRY1 2 -1.000000 0.000000 0.000000 0.00000 REMARK 290 SMTRY2 2 0.000000 1.000000 0.000000 42.73000 REMARK 290 SMTRY3 2 0.000000 0.000000 -1.000000 0.00000 REMARK 290 REMARK 290 REMARK: NULL REMARK 300 REMARK 300 BIOMOLECULE: 1 REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN REMARK 300 THIS ENTRY. 
THE REMARK MAY ALSO PROVIDE INFORMATION ON REMARK 300 BURIED SURFACE AREA. REMARK 350 REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. REMARK 350 REMARK 350 BIOMOLECULE: 1 REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: PENTAMERIC REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: PENTAMERIC REMARK 350 SOFTWARE USED: PISA REMARK 350 TOTAL BURIED SURFACE AREA: 9500 ANGSTROM**2 REMARK 350 SURFACE AREA OF THE COMPLEX: 26430 ANGSTROM**2 REMARK 350 CHANGE IN SOLVENT FREE ENERGY: -43.0 KCAL/MOL REMARK 350 APPLY THE FOLLOWING TO CHAINS: N, A, B, F, J REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 REMARK 465 REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) 
REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 MET N 378 REMARK 465 ARG N 379 REMARK 465 GLY N 380 REMARK 465 SER N 381 REMARK 465 HIS N 382 REMARK 465 HIS N 383 REMARK 465 HIS N 384 REMARK 465 HIS N 385 REMARK 465 HIS N 386 REMARK 465 HIS N 387 REMARK 465 THR N 388 REMARK 465 ASP N 389 REMARK 465 PRO N 390 REMARK 465 HIS N 391 REMARK 465 ALA N 392 REMARK 465 SER N 393 REMARK 465 SER N 394 REMARK 465 VAL N 395 REMARK 465 PRO N 396 REMARK 465 LEU N 397 REMARK 465 GLU N 398 REMARK 465 MET F 138 REMARK 465 LYS F 139 REMARK 465 LEU F 193 REMARK 465 MET J 263 REMARK 465 LYS J 264 REMARK 465 ALA J 265 REMARK 465 GLU J 266 REMARK 470 REMARK 470 MISSING ATOM REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS(M=MODEL NUMBER; REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER; REMARK 470 I=INSERTION CODE): REMARK 470 M RES CSSEQI ATOMS REMARK 470 ARG N 478 CG CD NE CZ NH1 NH2 REMARK 470 ILE N 479 CG1 CG2 CD1 REMARK 470 THR N 480 OG1 CG2 REMARK 470 THR N 483 OG1 CG2 REMARK 470 VAL N 484 CG1 CG2 REMARK 470 THR N 485 OG1 CG2 REMARK 470 ASP N 629 CG OD1 OD2 REMARK 470 LYS N 630 CG CD CE NZ REMARK 470 ASP N 631 CG OD1 OD2 REMARK 470 LYS N 632 CG CD CE NZ REMARK 470 SER N 633 OG REMARK 470 GLN N 634 CG CD OE1 NE2 REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: COVALENT BOND ANGLES REMARK 500 REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT: (10X,I3,1X,A3,1X,A1,I4,A1,3(1X,A4,2X),12X,F5.1) REMARK 500 REMARK 500 EXPECTED VALUES PROTEIN: ENGH AND HUBER, 1999 REMARK 500 EXPECTED VALUES NUCLEIC ACID: CLOWNEY ET AL 1996 REMARK 500 REMARK 500 M RES CSSEQI ATM1 ATM2 ATM3 REMARK 500 DA A4005 O4' - C1' - N9 ANGL. DEV. = 2.1 DEGREES REMARK 500 DG A4020 N9 - C1' - C2' ANGL. DEV. 
= 8.6 DEGREES REMARK 500 DA B5008 O4' - C1' - N9 ANGL. DEV. = 1.9 DEGREES REMARK 500 DC B5011 C3' - C2' - C1' ANGL. DEV. = -5.1 DEGREES REMARK 500 DC B5011 N1 - C1' - C2' ANGL. DEV. = 9.7 DEGREES REMARK 500 DC B5011 O4' - C1' - N1 ANGL. DEV. = 3.0 DEGREES REMARK 500 ARG N 411 NE - CZ - NH2 ANGL. DEV. = 3.6 DEGREES REMARK 500 ARG N 466 NE - CZ - NH2 ANGL. DEV. = 3.3 DEGREES REMARK 500 ARG J 282 NE - CZ - NH2 ANGL. DEV. = 3.6 DEGREES REMARK 500 REMARK 500 REMARK: NULL REMARK 500 REMARK 500 GEOMETRY AND STEREOCHEMISTRY REMARK 500 SUBTOPIC: TORSION ANGLES REMARK 500 REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). REMARK 500 REMARK 500 STANDARD TABLE: REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) REMARK 500 REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. STRUCTURE 4, 1395 - 1400 REMARK 500 REMARK 500 M RES CSSEQI PSI PHI REMARK 500 SER N 405 100.91 -51.63 REMARK 500 GLU N 409 90.38 70.97 REMARK 500 HIS N 420 86.75 -161.88 REMARK 500 ASN N 451 53.96 -146.42 REMARK 500 THR N 462 -172.45 -63.42 REMARK 500 ALA N 463 -146.20 -158.78 REMARK 500 ASP N 464 -162.81 60.49 REMARK 500 THR N 480 -0.98 -52.59 REMARK 500 LYS N 482 -162.09 -110.34 REMARK 500 ASN N 495 -8.57 86.88 REMARK 500 ARG N 537 104.93 -41.22 REMARK 500 CYS N 588 169.94 174.80 REMARK 500 VAL N 590 -30.50 -32.58 REMARK 500 THR N 604 -141.10 -89.57 REMARK 500 ASP N 631 -53.08 -172.49 REMARK 500 SER N 633 -104.65 -168.34 REMARK 500 PRO N 635 13.91 -61.60 REMARK 500 ASN N 636 31.15 -155.42 REMARK 500 LYS N 664 -42.18 -147.02 REMARK 500 REMARK 500 REMARK: NULL DBREF 1A02 N 396 678 UNP Q13469 NFAC2_HUMAN 396 678 DBREF 1A02 F 138 193 UNP P01100 FOS_HUMAN 138 193 DBREF 1A02 J 263 318 UNP P05412 AP1_HUMAN 253 308 DBREF 1A02 A 4001 4020 PDB 1A02 1A02 4001 4020 DBREF 1A02 B 5001 5020 PDB 1A02 1A02 5001 5020 
SEQADV 1A02 MET F 138 UNP P01100 GLU 138 ENGINEERED SEQADV 1A02 SER F 154 UNP P01100 CYS 154 ENGINEERED SEQADV 1A02 MET J 263 UNP P05412 ILE 253 ENGINEERED SEQADV 1A02 SER J 279 UNP P05412 CYS 269 ENGINEERED SEQRES 1 A 20 DT DT DG DG DA DA DA DA DT DT DT DG DT SEQRES 2 A 20 DT DT DC DA DT DA DG SEQRES 1 B 20 DA DA DC DT DA DT DG DA DA DA DC DA DA SEQRES 2 B 20 DA DT DT DT DT DC DC SEQRES 1 N 301 MET ARG GLY SER HIS HIS HIS HIS HIS HIS THR ASP PRO SEQRES 2 N 301 HIS ALA SER SER VAL PRO LEU GLU TRP PRO LEU SER SER SEQRES 3 N 301 GLN SER GLY SER TYR GLU LEU ARG ILE GLU VAL GLN PRO SEQRES 4 N 301 LYS PRO HIS HIS ARG ALA HIS TYR GLU THR GLU GLY SER SEQRES 5 N 301 ARG GLY ALA VAL LYS ALA PRO THR GLY GLY HIS PRO VAL SEQRES 6 N 301 VAL GLN LEU HIS GLY TYR MET GLU ASN LYS PRO LEU GLY SEQRES 7 N 301 LEU GLN ILE PHE ILE GLY THR ALA ASP GLU ARG ILE LEU SEQRES 8 N 301 LYS PRO HIS ALA PHE TYR GLN VAL HIS ARG ILE THR GLY SEQRES 9 N 301 LYS THR VAL THR THR THR SER TYR GLU LYS ILE VAL GLY SEQRES 10 N 301 ASN THR LYS VAL LEU GLU ILE PRO LEU GLU PRO LYS ASN SEQRES 11 N 301 ASN MET ARG ALA THR ILE ASP CYS ALA GLY ILE LEU LYS SEQRES 12 N 301 LEU ARG ASN ALA ASP ILE GLU LEU ARG LYS GLY GLU THR SEQRES 13 N 301 ASP ILE GLY ARG LYS ASN THR ARG VAL ARG LEU VAL PHE SEQRES 14 N 301 ARG VAL HIS ILE PRO GLU SER SER GLY ARG ILE VAL SER SEQRES 15 N 301 LEU GLN THR ALA SER ASN PRO ILE GLU CYS SER GLN ARG SEQRES 16 N 301 SER ALA HIS GLU LEU PRO MET VAL GLU ARG GLN ASP THR SEQRES 17 N 301 ASP SER CYS LEU VAL TYR GLY GLY GLN GLN MET ILE LEU SEQRES 18 N 301 THR GLY GLN ASN PHE THR SER GLU SER LYS VAL VAL PHE SEQRES 19 N 301 THR GLU LYS THR THR ASP GLY GLN GLN ILE TRP GLU MET SEQRES 20 N 301 GLU ALA THR VAL ASP LYS ASP LYS SER GLN PRO ASN MET SEQRES 21 N 301 LEU PHE VAL GLU ILE PRO GLU TYR ARG ASN LYS HIS ILE SEQRES 22 N 301 ARG THR PRO VAL LYS VAL ASN PHE TYR VAL ILE ASN GLY SEQRES 23 N 301 LYS ARG LYS ARG SER GLN PRO GLN HIS PHE THR TYR HIS SEQRES 24 N 301 PRO VAL SEQRES 1 F 56 MET LYS ARG ARG ILE 
ARG ARG GLU ARG ASN LYS MET ALA SEQRES 2 F 56 ALA ALA LYS SER ARG ASN ARG ARG ARG GLU LEU THR ASP SEQRES 3 F 56 THR LEU GLN ALA GLU THR ASP GLN LEU GLU ASP GLU LYS SEQRES 4 F 56 SER ALA LEU GLN THR GLU ILE ALA ASN LEU LEU LYS GLU SEQRES 5 F 56 LYS GLU LYS LEU SEQRES 1 J 56 MET LYS ALA GLU ARG LYS ARG MET ARG ASN ARG ILE ALA SEQRES 2 J 56 ALA SER LYS SER ARG LYS ARG LYS LEU GLU ARG ILE ALA SEQRES 3 J 56 ARG LEU GLU GLU LYS VAL LYS THR LEU LYS ALA GLN ASN SEQRES 4 J 56 SER GLU LEU ALA SER THR ALA ASN MET LEU ARG GLU GLN SEQRES 5 J 56 VAL ALA GLN LEU FORMUL 6 HOH *88(H2 O) HELIX 1 1 ARG N 522 GLU N 527 1 6 HELIX 2 2 SER N 570 LEU N 577 1 8 HELIX 3 3 ARG F 140 LYS F 192 1 53 HELIX 4 4 ARG J 267 GLN J 317 1 51 SHEET 1 A 3 LEU N 410 VAL N 414 0 SHEET 2 A 3 VAL N 442 LEU N 445 -1 O VAL N 442 N GLU N 413 SHEET 3 A 3 ARG N 510 THR N 512 -1 O ALA N 511 N VAL N 443 SHEET 1 B 5 GLU N 490 ILE N 492 0 SHEET 2 B 5 VAL N 498 LEU N 503 -1 N VAL N 498 O ILE N 492 SHEET 3 B 5 LEU N 454 GLY N 461 -1 O LEU N 454 N LEU N 503 SHEET 4 B 5 ARG N 541 PRO N 551 -1 O ARG N 543 N GLY N 461 SHEET 5 B 5 ILE N 557 ALA N 563 -1 O VAL N 558 N ILE N 550 SHEET 1 C 5 GLU N 490 ILE N 492 0 SHEET 2 C 5 VAL N 498 LEU N 503 -1 N VAL N 498 O ILE N 492 SHEET 3 C 5 LEU N 454 GLY N 461 -1 O LEU N 454 N LEU N 503 SHEET 4 C 5 ARG N 541 PRO N 551 -1 O ARG N 543 N GLY N 461 SHEET 5 C 5 ILE N 567 GLU N 568 -1 N ILE N 567 O VAL N 542 SHEET 1 D 2 TYR N 474 ARG N 478 0 SHEET 2 D 2 ALA N 516 LYS N 520 -1 N GLY N 517 O HIS N 477 SHEET 1 E 4 MET N 579 GLN N 583 0 SHEET 2 E 4 GLN N 595 GLN N 601 -1 O THR N 599 N GLU N 581 SHEET 3 E 4 MET N 637 GLU N 641 -1 O LEU N 638 N LEU N 598 SHEET 4 E 4 VAL N 628 GLN N 634 -1 N ASP N 629 O PHE N 639 SHEET 1 F 4 GLN N 620 ALA N 626 0 SHEET 2 F 4 LYS N 608 LYS N 614 -1 O VAL N 609 N ALA N 626 SHEET 3 F 4 VAL N 654 ASN N 662 -1 O ASN N 657 N THR N 612 SHEET 4 F 4 LYS N 666 ARG N 667 -1 O LYS N 666 N ASN N 662 SHEET 1 G 5 GLN N 620 ALA N 626 0 SHEET 2 G 5 LYS N 608 LYS N 614 -1 O VAL N 
609 N ALA N 626 SHEET 3 G 5 VAL N 654 ASN N 662 -1 O ASN N 657 N THR N 612 SHEET 4 G 5 GLN N 671 HIS N 676 -1 N GLN N 671 O PHE N 658 SHEET 5 G 5 CYS N 588 LEU N 589 1 O CYS N 588 N HIS N 676 CRYST1 64.660 85.460 83.370 90.00 112.03 90.00 P 1 21 1 2 ORIGX1 1.000000 0.000000 0.000000 0.00000 ORIGX2 0.000000 1.000000 0.000000 0.00000 ORIGX3 0.000000 0.000000 1.000000 0.00000 SCALE1 0.015466 0.000000 0.006258 0.00000 SCALE2 0.000000 0.011701 0.000000 0.00000 SCALE3 0.000000 0.000000 0.012939 0.00000 ATOM 1 O5' DT A4001 4.203 37.609 50.803 1.00 52.72 O ATOM 2 C5' DT A4001 3.376 36.889 51.712 1.00 52.47 C ATOM 3 C4' DT A4001 2.606 35.733 51.110 1.00 53.36 C ATOM 4 O4' DT A4001 2.221 36.073 49.751 1.00 52.48 O ATOM 5 C3' DT A4001 3.476 34.488 50.971 1.00 52.32 C ATOM 6 O3' DT A4001 2.688 33.294 51.069 1.00 54.15 O ATOM 7 C2' DT A4001 4.091 34.642 49.589 1.00 52.45 C ATOM 8 C1' DT A4001 2.948 35.269 48.813 1.00 50.40 C ATOM 9 N1 DT A4001 3.295 36.105 47.620 1.00 46.82 N ATOM 10 C2 DT A4001 3.315 35.475 46.389 1.00 44.10 C ATOM 11 O2 DT A4001 3.079 34.285 46.233 1.00 42.08 O ATOM 12 N3 DT A4001 3.626 36.299 45.334 1.00 41.51 N ATOM 13 C4 DT A4001 3.919 37.648 45.372 1.00 43.02 C ATOM 14 O4 DT A4001 4.181 38.250 44.332 1.00 44.08 O ATOM 15 C5 DT A4001 3.891 38.244 46.680 1.00 43.79 C ATOM 16 C7 DT A4001 4.204 39.702 46.805 1.00 42.62 C ATOM 17 C6 DT A4001 3.581 37.455 47.732 1.00 45.67 C ATOM 18 P DT A4002 3.429 31.863 51.098 1.00 57.02 P ATOM 19 OP1 DT A4002 2.467 30.721 51.156 1.00 54.83 O ... 
< data omitted for brevity > ATOM 3884 C LEU J 318 27.065 26.434 104.638 1.00 69.24 C ATOM 3885 O LEU J 318 25.978 26.660 105.215 1.00 71.58 O ATOM 3886 CB LEU J 318 26.876 28.109 102.776 1.00 63.64 C ATOM 3887 CG LEU J 318 27.060 29.547 102.286 1.00 64.34 C ATOM 3888 CD1 LEU J 318 25.692 30.140 101.986 1.00 63.18 C ATOM 3889 CD2 LEU J 318 27.769 30.403 103.324 1.00 64.33 C ATOM 3890 OXT LEU J 318 27.636 25.317 104.661 1.00 68.15 O TER 3891 LEU J 318 HETATM 3892 O HOH A6001 15.403 36.729 37.482 1.00 32.21 O HETATM 3893 O HOH A6002 27.779 37.839 46.346 1.00 14.74 O HETATM 3894 O HOH A6003 24.852 40.609 44.995 1.00 19.97 O HETATM 3895 O HOH A6011 63.157 40.356 39.359 1.00 65.68 O HETATM 3896 O HOH A6013 56.916 34.826 37.981 1.00 49.34 O ... < data omitted for brevity > MASTER 322 0 0 4 28 0 0 6 3974 5 0 38 END
Pfam multiple sequence alignment format is supported for input of sequences only. Pfam is a collection of multiple sequence alignments and hidden Markov models which covers many common protein domains and families. For further information see:
http://www.sanger.ac.uk/Software/Pfam/
# STOCKHOLM 1.0 #=GF ID 14-3-3 #=GF AC PF00244 #=GF DE 14-3-3 protein #=GF AU Finn RD #=GF AL Clustalw #=GF SE Prosite #=GF GA 25 25 #=GF TC 35.40 35.40 #=GF NC 19.10 19.10 #=GF TP Domain #=GF BM hmmbuild -f HMM SEED #=GF BM hmmcalibrate --seed 0 HMM #=GF RN [1] #=GF RM 95327195 #=GF RT Structure of a 14-3-3 protein and implications for #=GF RT coordination of multiple signalling pathways. #=GF RA Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken #=GF RA A, Gamblin SJ; #=GF RL Nature 1995;376:188-191. #=GF RN [2] #=GF RM 95327196 #=GF RT Crystal structure of the zeta isoform of the 14-3-3 #=GF RT protein. #=GF RA Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington #=GF RA R; #=GF RL Nature 1995;376:191-194. #=GF RN [3] #=GF RM 96182649 #=GF RT Interaction of 14-3-3 with signaling proteins is mediated #=GF RT by the recognition of phosphoserine. #=GF RA Muslin AJ, Tanner JW, Allen PM, Shaw AS; #=GF RL Cell 1996;84:889-897. #=GF RN [4] #=GF RM 97424374 #=GF RT The 14-3-3 protein binds its target proteins with a common #=GF RT site located towards the C-terminus. #=GF RA Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, #=GF RA Omata S, Ohno S, Isobe T #=GF RL FEBS Lett 1997;413:273-276. #=GF RN [5] #=GF RM 96394689 #=GF RT Molecular evolution of the 14-3-3 protein family. #=GF RA Wang W, Shakes DC #=GF RL J Mol Evol 1996;43:384-398. #=GF RN [6] #=GF RM 96300316 #=GF RT Function of 14-3-3 proteins. #=GF RA Jin DY, Lyu MS, Kozak CA, Jeang KT #=GF RL Nature 1996;382:308-308. 
#=GF DR PROSITE; PDOC00633; #=GF DR SMART; 14_3_3; #=GF DR PRINTS; PR00305; #=GF DR SCOP; 1a4o; fa; #=GF DR INTERPRO; IPR000308; #=GF DR PDB; 1a37 A; 3; 228; #=GF DR PDB; 1a37 B; 3; 228; #=GF DR PDB; 1a38 A; 3; 228; #=GF DR PDB; 1a38 B; 3; 228; #=GF DR PDB; 1a4o A; 3; 228; #=GF DR PDB; 1a4o B; 3; 228; #=GF DR PDB; 1a4o C; 3; 228; #=GF DR PDB; 1a4o D; 3; 228; #=GF DR PDB; 1qja B; 3; 229; #=GF DR PDB; 1qja A; 3; 230; #=GF DR PDB; 1qjb A; 3; 232; #=GF DR PDB; 1qjb B; 3; 232; #=GF SQ 148 #=GS O61131/11-251 AC O61131 #=GS 143L_ARATH/7-245 AC P48349 #=GS O49082/8-245 AC O49082 ... < data omitted for brevity > #=GS RA24_SCHPO/6-241 AC P42656 #=GS 143B_HORVU/9-246 AC Q43470 #=GS 143N_ARATH/5-242 AC Q96300 #=GS 143Z_HUMAN/3-236 DR PDB; 1a37 A; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a37 B; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a38 A; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a38 B; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a4o A; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a4o B; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a4o C; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1a4o D; 3; 228; #=GS 143Z_HUMAN/3-236 DR PDB; 1qja B; 3; 229; #=GS 143Z_HUMAN/3-236 DR PDB; 1qja A; 3; 230; #=GS 143Z_HUMAN/3-236 DR PDB; 1qjb A; 3; 232; #=GS 143Z_HUMAN/3-236 DR PDB; 1qjb B; 3; 232; O61131/11-251 RSDCTYRSKLAEQAERYDEMADAMRTLVEQCVnn.......dkdELTVEERNLLSVAYKNAVGARRASWRIISSVEQKEMSKA.NVHNKNIAATYRKKVEEELNNIC.QDILN.LLTKKLIPNT..SESESKVFYYKMKGDYYRYISEFS.CDE.GKKEASNFAQEAYQKATDIAENELPSTHPIRLGLALNYSVFFY..EILNQPHQACEMAKRAF...DDAITEFDNV..SEDS..YKDSTLI.MQLLRDNLTLWTSDLQGDQ O61132/1-232 ---------LAEQAERYDEMADAMRTLVEQCVnn.......dkdELTVEERNLLSVAYKNAVGARRASWRIISSVEQKEMSKA.NVHNKNVAATYRKKVEEELNNIC.QDILN.LLTKKLIPNT..SESESKVFYYKMKGDYYRYISEFS.CDE.GKKEASNFAQEAYQKDTDIAENELPSTHPIRLGLALNYSVFFY..EILNQLHQACEMAKRAF...DDAITEFDNV..SEDS..YKDSTLI.MQLLRDNLTLWTSDLQGDQ O96436/9-256 
REEHVYRAKLAEQAERYDEMAEAMKNLVENCLdqnnsppgakgdELTVEERNLLSVAYKNAVGARRASWRIISSVEQKEANRN.HMANKALAASYRQKVENELNKIC.QEILT.LLTDKLLPRT..TDSESRVFYFKMKGDYYRYISEFS.NEE.GKKASAEQAEESYKRATDTAEAELPSTHPIRLGLALNYSVFYY..EILNQPQKACEMAKLAF...DDAITEFDSV..SEDS..YKDSTLI.MQLLRDNLTLWTSDLQTQE O60955/9-251 RDEYVYKAKLAEQAERYDEMAEAMKNLVENCLdeq.....qpkdELSVEERNLLSVAYKNAVGARRASWRIISSVEQKELSKQ.HMQNKALAAEYRQKVEEELNKIC.HDILQ.LLTDKLIPKT..SDSESKVFYYKMKGDYYRYISEFS.GEE.GKKQAADQAQESYQKATETAEAELPSTHPIRLGLALNYSVFFY..EILNLPQQACEMAKRAF...DDAITEFDNV..SEDS..YKDSTLI.MQLLRDNLTLWTSDLQADQ 1433_NEOCA/9-251 RDEYVYKAKLAEQAERYDEMAEAMKNLVENCLdeq.....qpkdELSVEERNLLSVAYKNAVGARRASWRIISSVEQKELSKQ.HMQNKALAAEYRQKVEEELNKIC.HDILQ.LLTDKLIPKT..SDSESKVFYYKMKGDYYRYISEFS.GEE.GKKQAADQAQESYQKATETAEGHSPATHPIRLGLALNYSVFFY..EILNLPQQACEMAKRAF...DDAITEFDNV..SEDS..YKDSTLI.MQLLRDNLTLWTSDLQADQ Q21539/1-97 --------------------------------............---------------------------------------.-----------------------.-----.----------..-----------MVADHFRYLVQYD.-DI.NREEHAHKSRIAYQEALGIAKDKMQPTHPIRLGLALNASALNF..DVLNLPKEANEIAQSAL...DSAHRELEKMksSLDS..YDISNL-.------------------- O65165/1-21 --------------------------------............---------------------------------------.-----------------------.-----.----------..------------------------.---.-------------------------------------------..-----------------...----------..----..-----LI.MQLLRDNLTLWTSDMQEDG Q9U491/4-239 REALVYRAKLAEQLERYDEMVDAMKEVVEMAE............ELTVEERNLLSVAYKNVIGSRRSSWRVFSAVEQTEGNRG.NAEKQACAKKFREVLESELDRVS.KDILE.LIDKYLIKSA..TKSDSKVFYLKMKGDYFRYMAEFS.VDP.QRKKAAEESNKAYQEASEIAATQLFPTHPIRLGLALNYSVYFY..EIMNDPDEACRLAQAAF...DDAIAKLDQL..SEES..YKDSTLI.MQLLRDNLTLWTSDPERDD 1433_XENLA/1-227 -------AKLSEQAERYDDMAASMKAVTELGA............ELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTEG--.NDKRQQMAREYREKVETELQDIC.KDVLD.LLDRFLVPNA..TPPESKVFYLKMKGDYYRYLSEVA.SGD.SKQETVASSQQAYQEAFEISKSEMQPTHPIRLGLALNFSVFYY..EILNSPEKACSLAKSAF...DEAIRELDTL..NEES..YKDSTLI.MQLLRDNLTLWTSENQGEE 143B_BOVIN/4-237 
KSELVQKAKLAEQAERYDDMAAAMKAVTEQGH............ELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTER--.NEKKQQMGKEYREKIEAELQDIC.NDVLQ.LLDKYLIPNA..TQPESKVFYLKMKGDYFRYLSEVA.SGD.NKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYY..EILNSPEKACSLAKTAF...DEAIAELDTL..NEES..YKDSTLI.MQLLRDNLTLWTSENQGDE ... < data omitted for brevity > Q42058/7-112 RDTFVYLAKLSEXAERYEEMVESMKSVAKLNV............DLTVEERNLLSVGYKNVIGSRRASWRIFSSIEQKEAVKG.NDXNVKRIKEYMEKVELELSNIC.IDIMS.VLDEHLI---..------------------------.---.-------------------------------------------..-----------------...----------..----..-------.------------------- 1433_ENTHI/1-236 SEDCVFLSKLAEQSERYDEMVQYMKQVAALNT............ELSVEERNLLSVAYKNVIGSRRASWRIITSLEQKEQAKG.NDKHVEIIKGYRAKIEDELAKYC.DDVLK.VIKENLLPNA..STSESKVFYKKMEGDYYRYYAEFT.VDE.KRQEVADKSLAAYTEATEISNADLAPTHPIRLGLALNFSVFYY..EIMNDADKACQLAKQAF...DDSIAKLDEV..PESS..YKDSTLI.MQLLRDNLTLWTSDTADEE 1431_ENTHI/4-239 REDCVYTAKLAEQSERYDEMVQCMKQVAEMEA............ELSIEERNLLSVAYKNVIGAKRASWRIISSLEQKEQAKG.NDKHVEIIKGYRAKIEKELSTCC.DDVLK.VIQENLLPKA..STSESKVFFKKMEGDYYRYFAEFT.VDE.KRKEVADKSLAAYTEATEISNAELAPTHPIRLGLALNFSVFYF..EIMNDADKACQLAKQAF...DDAIAKLDEV..PENM..YKDSTLI.MQLLRDNLTLWTSDACDEE 1432_ENTHI/4-238 REDLVYLSKLAEQSERYEEMVQYMKQVAEMGT............ELSVEERNLISVAYKNVVGSRRASWRIISSLEQKEQAKG.NTQRVELIKTYRAKIEQELSQKC.DDVLK.IITEFLLKNS..TSIESKVFFKKMEGDYYRYYAEFT.VDE.KRKEVADKSLAAYQEATDTA-ASLVPTHPIRLGLALNFSVFYY..QIMNDADKACQLAKEAF...DEAIQKLDEV..PEES..YKESTLI.MQLLRDNLTLWTSDMGDDE Q9UAH0/5-239 REELIYMAKIAEQTERFEDMLEYMKKVVQTGQ............ELSVEERNLLSVAYKNTVGSRRSAWRSISAIQQKEESKG.S-KHLDLLTNQKKKIETELNLYC.DDILK.LLNDFLIKNA..TNAEAQVFFLKMKGDYYRYIAEYA.QGD.EHKKAADGALDSYNKACEIANSELRPTHPIRLGLALNFSVFHY..EVLNDPSKACTLAKTAF...DEAIGDIERI..QEDQ..YKDATTI.MQLIRDNLTLWTSEFQDDA Q9XYW8/1-233 -----YLAMLAEQCSRYKEMVQFLEDMVKQRD...........kDLNSDERNLLSIAYKNSISGGRSAVRTIMAYEAKEKKKE.NSTFLPYITEYKKQVEDELTKLC.QGVLK.TTDEQLLKKA..EDDEAKVFYIKMKGDYNRYIAEYA.EGD.LKKQVSDDALKAYDEATEIA-KTLPVLNPIALGLALNFSVFYY..EVINDHKKAIEIAKAAV...EKADKELPNI..DEDAdeNRDTVSI.YNLLKENLDMWVSEEEGDQ O61173/2-194 
-EKNVYLAMLAEQCSRYKEMVQFLEDMVKQRD...........kDLNSDERNLLSIAYKNSISGGRSAVRTIMAYEAKEKKKE.NSTFLPYITEYKKQVEDELTKLC.QGVLK.TTDEQLLKKA..EDDEAKVFYIKMKGDYNRYIAEYA.EGD.LKKQVSDDALKAYDEATEIA-KTLPVLNPIALGLALNFSVFYY..EVINDHKKAIEIAKA--...----------..----..-------.------------------- Q9XYW7/4-229 -EKQVYLAMLAEQCSRYEDMMTFLEDMVKAKA...........eDLSSDERNLLSIAYKNTISLDRQAIRTLLAYESKEAKKA.ESPYLDYIKEYKAKVQKELEDLC.NKINR.TIDDNLLPKA..TTDEAKVFYHKMKGDYCRYIAENV.DGD.TKKKYSDEGLAAYNAALEAA-KNIDYKNPVKLGLALNLSVFYY..EVVGNKDEACKLAEDTLsksKEALNGADE-..EEDE..VKDAMSI.VNLLEENL----------- Q9SFK4/1-214 -----------------------MRKVCELDI............ELSEEERDLLTTGYKNVMEAKRVSLRVISSIEKMEDSKG.NDQNVKLIKGQQEMVKYEFFNVC.NDILS.LIDSHLIPST.tTNVESIVLFNRVKGDYFRYMAEFG.SDA.ERKENADNSLDAYKVAMEMAENSLAPTNMVRLGLALNFSIFNY..EIHKSIESACKLVKKAY...DEAITELDGL..DKNI..CEESMYI.IEMLKYNLSTWTSGDGNGN Q9XZV0/2-235 KEELLNRCKLNDLIENYGEMFEYLKELSHIKI............DLQPDELDLITRCTKCYIGHKRGQYRKILTLIDKDKIVD.NQKNSALLEILRKKLSEEILLLC.NSTIE.LSQNFLNNNV..FPKKTQLFFTKIIADHYRYIYEIN.GKE.DIKLKAKEYYE--KGLQTIKTCKYNSTETAYLTFYLNYSVFLH..DTMRNTEESIKVSKACL...YEALKDTEDI..VDNS..QKDIVLL.CQMLKDNISLWKTETNEDN #=GC SS_cons HHHHHHHHHHHHHTTCHHHHHHHHHHHHTTSC............CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCTTT--.CCHHHHHHHHHHHHHHHHHHHHH.HHHHH.HHHHTTTTCC..CSCHHHHHHHHHHHHHHHHHHHHC.CSC.HHHHHHHHHHHHHHHHHHHHHCHCCTTCHCHHHHHHHHHHHHC..HTSCCHHHCHHHHHHHH...HHHHTTCGGC..CTTT..HHHHHHH.HHHHHHHHHHCTCCCXXXX #=GC SA_cons 26310320300350512510050022003352............4045500400120033002310402420152179179--.38752510440144014203510.43002.0035201642..754403000010100011100201.867.7465125302500340252067635113122100001001127..31372485135106412...5415867932..3994..6651462.142043126627759XXXX //
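The structure visible in the example above can be read with very little code: lines beginning with `#` carry markup, a line containing only `//` ends the alignment, and every other line is a name followed by aligned residues. The following is a minimal sketch in Python, not the EMBOSS parser. The function name `read_stockholm` is hypothetical, and the sketch assumes one record per line, as in the original file layout. It also collects the accession from `#=GS ... AC` lines.

```python
def read_stockholm(text):
    """Return (sequences, accessions) from Stockholm-format text.

    Illustrative sketch only: #=GF, #=GC and #=GR markup is skipped.
    """
    seqs, accessions = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line == "//":              # end of the alignment
            break
        if line.startswith("#=GS"):
            # e.g. "#=GS O61131/11-251 AC O61131"
            fields = line.split(None, 3)
            if len(fields) == 4 and fields[2] == "AC":
                accessions[fields[1]] = fields[3]
            continue
        if not line or line.startswith("#"):
            continue                  # header or other markup
        parts = line.split()
        if len(parts) == 2:
            name, residues = parts
            # a name may recur if the alignment is split into blocks
            seqs[name] = seqs.get(name, "") + residues
    return seqs, accessions
```

The sequence names keep their `/start-end` suffix, so a caller that needs the bare identifier would have to split on `/` itself.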
PHYLIP multiple sequence alignment format (interleaved) is the format used in the Phylip package for phylogenetic analysis. This format imposes a restriction of 10 characters on all sequence names, which can cause problems when converting data from other formats. For further information see:
http://evolution.genetics.washington.edu/phylip.html
 4 131
IXI_234   TSPASIRPPA GPSSRPAMVS SRRTRPSPPG PRRPTGRPCC SAAPRRPQAT
IXI_235   TSPASIRPPA GPSSR----- ----RPSPPG PRRPTGRPCC SAAPRRPQAT
IXI_236   TSPASIRPPA GPSSRPAMVS SR--RPSPPP PRRPPGRPCC SAAPPRPQAT
IXI_237   TSPASLRPPA GPSSRPAMVS SRR-RPSPPG PRRPT----C SAAPRRPQAT

          GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSRSAG
          GGWKTCSGTC TTSTSTRHRG RSGW------ ----RASRKS MRAACSRSAG
          GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSR--G
          GGYKTCSGTC TTSTSTRHRG RSGYSARTTT AACLRASRKS MRAACSR--G

          SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E
          SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E
          SRPPRFAPPL MSSCITSTTG PPPPAGDRSH E
          SRPNRFAPTL MSSCLTSTTG PPAYAGDRSH E
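Because names are limited to 10 characters, two sequences whose names differ only after the tenth character become indistinguishable once converted. A minimal Python sketch of a pre-conversion check follows. The helper name `phylip_names` is hypothetical; it illustrates the restriction and is not how EMBOSS handles long names.

```python
def phylip_names(names, width=10):
    """Truncate names to `width` characters and report collisions.

    Returns (short_names, clashes), where each clash is a tuple
    (first_original, later_original, shared_truncated_name).
    """
    short = [n[:width] for n in names]
    seen = {}       # truncated name -> first original name that produced it
    clashes = []
    for original, cut in zip(names, short):
        if cut in seen and seen[cut] != original:
            clashes.append((seen[cut], original, cut))
        seen.setdefault(cut, original)
    return short, clashes
```

An empty clash list means the truncated names are still unique; otherwise the input names need renaming before the data are written in PHYLIP format.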
PHYLIP multiple sequence alignment format (non-interleaved) is the format used in older versions of the Phylip package for phylogenetic analysis. This format imposes a restriction of 10 characters on all sequence names, which can cause problems when converting data from other formats. For further information see:
http://evolution.genetics.washington.edu/phylip.html
The non-interleaved format was used in Phylip version 3.2. It is also called phylip3 for backward compatibility with earlier EMBOSS versions.
 4 131
IXI_234   TSPASIRPPA GPSSRPAMVS SRRTRPSPPG PRRPTGRPCC SAAPRRPQAT
          GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSRSAG
          SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E
IXI_235   TSPASIRPPA GPSSR----- ----RPSPPG PRRPTGRPCC SAAPRRPQAT
          GGWKTCSGTC TTSTSTRHRG RSGW------ ----RASRKS MRAACSRSAG
          SRPNRFAPTL MSSCITSTTG PPAWAGDRSH E
IXI_236   TSPASIRPPA GPSSRPAMVS SR--RPSPPP PRRPPGRPCC SAAPPRPQAT
          GGWKTCSGTC TTSTSTRHRG RSGWSARTTT AACLRASRKS MRAACSR--G
          SRPPRFAPPL MSSCITSTTG PPPPAGDRSH E
IXI_237   TSPASLRPPA GPSSRPAMVS SRR-RPSPPG PRRPT----C SAAPRRPQAT
          GGYKTCSGTC TTSTSTRHRG RSGYSARTTT AACLRASRKS MRAACSR--G
          SRPNRFAPTL MSSCLTSTTG PPAYAGDRSH E
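In the non-interleaved layout, each sequence is written out completely before the next one starts, so a reader can use the residue count from the header to decide where one sequence ends. The Python sketch below illustrates this under stated assumptions: the name occupies a fixed 10-character field, and spaces inside the residue blocks are not significant. The function name `read_phylip_sequential` is hypothetical and this is not the EMBOSS implementation.

```python
def read_phylip_sequential(text, name_width=10):
    """Read non-interleaved PHYLIP text into {name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # header: number of sequences, then number of residues per sequence
    nseq, nchar = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    i = 1
    for _ in range(nseq):
        if i >= len(lines):
            raise ValueError("fewer sequences than the header declares")
        name = lines[i][:name_width].strip()
        residues = "".join(lines[i][name_width:].split())
        i += 1
        # consume continuation lines until the declared length is reached
        while len(residues) < nchar:
            if i >= len(lines):
                raise ValueError("sequence %r is shorter than declared" % name)
            residues += "".join(lines[i].split())
            i += 1
        seqs[name] = residues
    return seqs
```

The same counting approach does not work for the interleaved layout, where the names appear only in the first block and the blocks of all sequences alternate.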
Raw format is similar to "text/plain" format, except that it accepts only alphabetic characters. Digits, spaces and TAB characters in the file are removed. Any other non-alphabetic character (for example, a punctuation mark) causes the file to be rejected as erroneous. Gap characters ('-') and translated STOP codon characters ('*') are, however, legal. It is therefore generally safer to use this format than "text/plain" format.
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca actcttaagtcttttgtaattctggctttctctaataaaaaagccactta gttcagtcaaaaaaaaaa
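The rules for raw format can be summarised in a few lines of Python. This is a sketch of the behaviour as described above, not the EMBOSS code. The function name `read_raw` is hypothetical, and treating "alphabetic" as the ASCII letters is an assumption.

```python
import string


def read_raw(text):
    """Apply the raw-format rules to `text` and return the sequence."""
    kept = []
    for ch in text:
        if ch.isspace() or ch.isdigit():
            continue            # whitespace and digits are silently removed
        if ch in string.ascii_letters or ch in "-*":
            kept.append(ch)     # letters, gap and STOP characters are legal
        else:
            # anything else (e.g. punctuation) makes the input invalid
            raise ValueError(f"illegal character {ch!r} in raw sequence")
    return "".join(kept)
```

With these rules, a sequence pasted together with its position numbers is still read correctly, while a file containing stray punctuation is rejected instead of being silently misread.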
RefseqP entry format supports all the fields in the latest NCBI protein reference sequence database format. RefseqP is a database maintained by NCBI. For more information see:
http://www.ncbi.nlm.nih.gov/RefSeq/
Where RefseqP format is used for output, fields for which data are available will be completed and others with no information will be omitted. Exactly what data will be present depends very much on the source of input sequences. The EMBOSS command line allows data, such as accession numbers, to be provided if they do not form part of the input sequence data (see Section 6.4, “Datatype-specific Command Line Qualifiers”).
GenPept currently uses the same parser as the closely related RefseqP format so these can be used interchangeably until the original formats diverge.
LOCUS NP_001988 133 aa linear PRI 29-MAR-2009 DEFINITION ubiquitin-like protein fubi and ribosomal protein S30 precursor [Homo sapiens]. ACCESSION NP_001988 VERSION NP_001988.1 GI:4503659 DBSOURCE REFSEQ: accession NM_001997.3 KEYWORDS . SOURCE Homo sapiens (human) ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. REFERENCE 1 (residues 1 to 133) AUTHORS Yu,Y., Ji,H., Doudna,J.A. and Leary,J.A. TITLE Mass spectrometric analysis of the human 40S ribosomal subunit: native and HCV IRES-bound complexes JOURNAL Protein Sci. 14 (6), 1438-1446 (2005) PUBMED 15883184 REFERENCE 2 (residues 1 to 133) AUTHORS Andersen,J.S., Lam,Y.W., Leung,A.K., Ong,S.E., Lyon,C.E., Lamond,A.I. and Mann,M. TITLE Nucleolar proteome dynamics JOURNAL Nature 433 (7021), 77-83 (2005) PUBMED 15635413 REFERENCE 3 (residues 1 to 133) AUTHORS Mourtada-Maarabouni,M., Kirkham,L., Farzaneh,F. and Williams,G.T. TITLE Regulation of apoptosis by fau revealed by functional expression cloning and antisense expression JOURNAL Oncogene 23 (58), 9419-9426 (2004) PUBMED 15543234 REMARK GeneRIF: Overexpression of fau in the sense orientation induces cell death, which is inhibited both by Bcl-2 and by inhibition of caspases, in line with its proposed role in apoptosis REFERENCE 4 (residues 1 to 133) AUTHORS Kapp,L.D. and Lorsch,J.R. TITLE The molecular mechanics of eukaryotic translation JOURNAL Annu. Rev. Biochem. 73, 657-704 (2004) PUBMED 15189156 REMARK Review article REFERENCE 5 (residues 1 to 133) AUTHORS Rossman,T.G., Visalli,M.A. and Komissarova,E.V. 
TITLE fau and its ubiquitin-like domain (FUBI) transforms human osteogenic sarcoma (HOS) cells to anchorage-independence JOURNAL Oncogene 22 (12), 1817-1821 (2003) PUBMED 12660817 REMARK GeneRIF: role in transforming osteogenic sarcoma cells to anchorage-independence REFERENCE 6 (residues 1 to 133) AUTHORS Vladimirov,S.N., Ivanov,A.V., Karpova,G.G., Musolyamov,A.K., Egorov,T.A., Thiede,B., Wittmann-Liebold,B. and Otto,A. TITLE Characterization of the human small-ribosomal-subunit proteins by N-terminal and internal sequencing, and mass spectrometry JOURNAL Eur. J. Biochem. 239 (1), 144-149 (1996) PUBMED 8706699 REFERENCE 7 (residues 1 to 133) AUTHORS Wool,I.G., Chan,Y.L. and Gluck,A. TITLE Structure and evolution of mammalian ribosomal proteins JOURNAL Biochem. Cell Biol. 73 (11-12), 933-947 (1995) PUBMED 8722009 REMARK Review article REFERENCE 8 (residues 1 to 133) AUTHORS Michiels,L., Van der Rauwelaert,E., Van Hasselt,F., Kas,K. and Merregaert,J. TITLE fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus JOURNAL Oncogene 8 (9), 2537-2546 (1993) PUBMED 8395683 REFERENCE 9 (residues 1 to 133) AUTHORS Kas,K., Schoenmakers,E., van de Ven,W., Weber,G., Nordenskjold,M., Michiels,L., Merregaert,J. and Larsson,C. TITLE Assignment of the human FAU gene to a subregion of chromosome 11q13 JOURNAL Genomics 17 (2), 387-392 (1993) PUBMED 8406491 REFERENCE 10 (residues 1 to 133) AUTHORS Kas,K., Michiels,L. and Merregaert,J. TITLE Genomic structure and expression of the human fau gene: encoding the ribosomal protein S30 fused to a ubiquitin-like protein JOURNAL Biochem. Biophys. Res. Commun. 187 (2), 927-933 (1992) PUBMED 1326960 COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff in collaboration with Francesco Amaldi. The reference sequence was derived from BP296770.1, X65923.1 and AK026639.1. 
Summary: This gene is the cellular homolog of the fox sequence in the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). It encodes a fusion protein consisting of the ubiquitin-like protein fubi at the N terminus and ribosomal protein S30 at the C terminus. It has been proposed that the fusion protein is post-translationally processed to generate free fubi and free ribosomal protein S30. Fubi is a member of the ubiquitin family, and ribosomal protein S30 belongs to the S30E family of ribosomal proteins. Whereas the function of fubi is currently unknown, ribosomal protein S30 is a component of the 40S subunit of the cytoplasmic ribosome. Pseudogenes derived from this gene are present in the genome. Similar to ribosomal protein S30, ribosomal proteins S27a and L40 are synthesized as fusion proteins with ubiquitin. [provided by RefSeq]. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Entrez Gene record to access additional publications. FEATURES Location/Qualifiers source 1..133 /organism="Homo sapiens" /db_xref="taxon:9606" /chromosome="11" /map="11q13" Protein 1..133 /product="ubiquitin-like protein fubi and ribosomal protein S30 precursor" /note="FBR-MuSV-associated ubiquitously expressed; ubiquitin-like-S30 fusion protein; 40S ribosomal protein S30; ubiquitin-like protein fubi; FAU-encoded ubiquitin-like protein; Monoclonal nonspecific suppressor factor beta; Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV) ubiquitously expressed (fox derived)" /calculated_mol_wt=14259 Region 1..74 /region_name="Fubi" /note="Fubi is a ubiquitin-like protein encoded by the fau gene which has an N-terminal ubiquitin-like domain (also referred to as FUBI) fused to the ribosomal protein S30. 
Fubi is thought to be a tumor suppressor protein and the FUBI domain may act as a...; cd01793" /db_xref="CDD:29195" mat_peptide 1..74 /product="ubiquitin-like protein fubi" /calculated_mol_wt=7760 Region 74..133 /region_name="Ribosomal_S30" /note="Ribosomal protein S30; cl02062" /db_xref="CDD:141357" mat_peptide 75..133 /product="ribosomal protein S30" /calculated_mol_wt=6648 CDS 1..133 /gene="FAU" /gene_synonym="asr1; FAU1; FLJ22986; Fub1; Fubi; MNSFbeta; RPS30" /coded_by="NM_001997.3:108..509" /db_xref="CCDS:CCDS8095.1" /db_xref="GeneID:2197" /db_xref="HGNC:3597" /db_xref="HPRD:00002" /db_xref="MIM:134690" ORIGIN 1 mqlfvraqel htfevtgqet vaqikahvas legiapedqv vllagapled eatlgqcgve 61 alttlevagr mlggkvhgsl aragkvrgqt pkvakqekkk kktgrakrrm qynrrfvnvv 121 ptfgkkkgpn ans //
SELEX is an interleaved multiple sequence alignment format used by Sean Eddy's HMMER package. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis. SELEX format can store RNA secondary structure as part of the sequence annotation. For further information see:
http://hmmer.janelia.org/
#=SQ IXI_234 1.00 - - 0..0:0 -
#=SQ IXI_235 1.00 - - 0..0:0 -
#=SQ IXI_236 1.00 - - 0..0:0 -
#=SQ IXI_237 1.00 - - 0..0:0 -

IXI_234 TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_235 TSPASIRPPAGPSSR---------RPSPPGPRRPTGRPCCSAAPRRPQAT
IXI_236 TSPASIRPPAGPSSRPAMVSSR--RPSPPPPRRPPGRPCCSAAPPRPQAT
IXI_237 TSPASLRPPAGPSSRPAMVSSRR-RPSPPGPRRPT----CSAAPRRPQAT

IXI_234 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAG
IXI_235 GGWKTCSGTCTTSTSTRHRGRSGW----------RASRKSMRAACSRSAG
IXI_236 GGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR--G
IXI_237 GGYKTCSGTCTTSTSTRHRGRSGYSARTTTAACLRASRKSMRAACSR--G

IXI_234 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_235 SRPNRFAPTLMSSCITSTTGPPAWAGDRSHE
IXI_236 SRPPRFAPPLMSSCITSTTGPPPPAGDRSHE
IXI_237 SRPNRFAPTLMSSCLTSTTGPPAYAGDRSHE
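Because SELEX is interleaved, each sequence name recurs once per block and the pieces must be joined in order. A minimal Python sketch follows, assuming that every line starting with `#` is a comment or annotation and that each data line holds exactly a name and a run of residues. The function name `read_selex` is hypothetical and the sketch ignores the `#=SQ` annotation rather than interpreting it.

```python
def read_selex(text):
    """Join the interleaved blocks of a SELEX alignment into {name: sequence}."""
    seqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # blank line, comment or annotation
        parts = line.split()
        if len(parts) != 2:
            continue                    # not a "name residues" line
        name, residues = parts
        # append this block's residues to whatever was read for the name so far
        seqs[name] = seqs.get(name, "") + residues
    return seqs
```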
The format used by older versions of the Staden package. Staden is a package of programs for sequence handling and analysis that is particularly useful for analysis of sequence trace data and large scale sequence assembly projects. For further information see:
http://staden.sourceforge.net/
Staden stores single sequencing experiment reads in a format derived from EMBL. All EMBL tags are allowed, plus many extras. Unusually, the extra tags are allowed to continue beyond the // line, which only marks the end of the sequence. The EX experiment line is used to create a sequence description. Accuracy values are stored, or at least the largest value for each sequence position. Optional comments may be inserted at any position within the sequence. When EMBOSS reads Staden format, it recognizes a comment at the top of the sequence but considers comments inside the sequence as part of the sequence. In addition, some alternative nucleotide ambiguity codes are used and must be converted.
Staden format is now obsolete: the latest version of the Staden package does not support it. EMBOSS retains it to accept old data files. Use the "Staden experiment" format (see below) with the latest Staden version.
<X65923----> ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgc agctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcg cccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgc tcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccc tgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaaga agacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgccca cctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttc tctaataaaaaagccacttagttcagtcaaaaaaaaaa
Format used by DNA Strider, which is a molecular sequence editor with integrated tools for DNA and protein sequence analysis. For further information see:
http://nar.oxfordjournals.org/cgi/reprint/16/5/1829
; ### from DNA Strider ;-) ; DNA sequence HSFAU, 518 bases ; ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca actcttaagtcttttgtaattctggctttctctaataaaaaagccactta gttcagtcaaaaaaaaaa // ; ### from DNA Strider ;-) ; DNA sequence HSFAU1, 2016 bases ; ctaccattttccctctcgattctatatgtacactcgggacaagttctcct gatcgaaaacggcaaaactaaggccccaagtaggaatgccttagttttcg gggttaacaatgattaacactgagcctcacacccacgcgatgccctcagc tcctcgctcagcgctctcaccaacagccgtagcccgcagccccgctggac accggttctccatccccgcagcgtagcccggaacatggtagctgccatct ttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccg tcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggg gcggagctaggactgccttgggcggtacaaatagcagggaaccgcgcggt cgctcagcagtgacgtgacacgcagcccacggtctgtactgacgcgccct cgcttcttcctctttctcgactccatcttcgcggtagctgggaccgccgt tcaggtaagaatggggccttggctggatccgaagggcttgtagcaggttg gctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccg tggccctgctccagtccctatccgaactccttgggaggcactggccttcc gcacgtgagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgc tttccactcccaaatctcctttatcccagagcatttcttggcttctctta caagccgtcttttctttactcagtcgccaatatgcagctctttgtccgcg cccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccag atcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctct tcactctcgcggcccgagggaacgcttacgagccttatctttccctgtag gctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgct cctggcaggcgcgcccctggaggatgaggccactctgggccagtgcgggg tggaggccctgactaccctggaagtagcaggccgcatgcttggaggtgag tgagagaggaatgttctttgaagtaccggtaagcgtctagtgagtgtggg gtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttc tcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttg aacacagacgtccatgggagactgagccagagtgtagttgtatttcagtc 
acatcacgagatcctagtctggttatcagcttccacactaaaaattaggt cagaccaggccccaaagtgctctataaattagaagctggaagatcctgaa atgaaacttaagatttcaaggtcaaatatctgcaactttgttctcattac ctattgggcgcagcttctctttaaaggcttgaattgagaaaagaggggtt ctgctgggtggcaccttcttgctcttacctgctggtgccttcctttccca ctacaggtaaagtccatggttccctggcccgtgctggaaaagtgagaggt cagactcctaaggtgagtgagagtattagtggtcatggtgttaggacttt ttttcctttcacagctaaaccaagtccctgggctcttactcggtttgcct tctccctccctggagatgagcctgagggaagggatgctaggtgtggaaga caggaaccagggcctgattaaccttcccttctccaggtggccaaacagga gaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaacc ggcgctttgtcaacgttgtgcccacctttggcaagaagaagggccccaat gccaactcttaagtcttttgtaattctggctttctctaataaaaaagcca cttagttcagtcatcgcattgtttcatctttacttgcaaggcctcaggga gaggtgtgcttctcgg //
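Judging only from the example above, a reader for this layout could be sketched as follows. The sketch assumes that, in the original file, each `;` comment and each `//` terminator sits on its own line, and that the entry name is the third word of the `; DNA sequence NAME, N bases` comment. The function `parse_strider` is hypothetical and is not part of EMBOSS; the sample sequence is truncated to its first two lines.

```python
# Illustrative sketch only; not the EMBOSS parser.
def parse_strider(text):
    """Return a list of (name, sequence) pairs from Strider-style text."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Comment line; look for "; DNA sequence NAME, N bases".
            words = line.lstrip("; ").split()
            if len(words) >= 3 and words[1] == "sequence":
                name = words[2].rstrip(",")
        elif line == "//":
            # Assumed end-of-entry marker.
            records.append((name, "".join(parts)))
            name, parts = None, []
        else:
            # Sequence line: drop the spaces between blocks.
            parts.append("".join(line.split()))
    return records


# Truncated excerpt of the first entry shown above.
sample = """\
; ### from DNA Strider ;-)
; DNA sequence HSFAU, 518 bases
;
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc
gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt
//
"""
records = parse_strider(sample)
```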
SwissProt entry format, including all fields in the latest format. UniProtKB / SwissProt is a curated protein sequence database with high quality annotation on the protein function, domain structure, post-translational modifications, variants, etc. It has a minimal level of redundancy and a high level of integration with other databases. For further information see:
http://www.expasy.org/sprot/ |
Where SwissProt format is used for output, fields for which data are available will be completed and those with no information will be omitted. Exactly what data are present depends very much on the source of the input sequences. The EMBOSS command line allows data, such as accession numbers, to be provided if they do not form part of the input sequence data (see Section 6.4, “Datatype-specific Command Line Qualifiers”).
ID UBR5_RAT Reviewed; 920 AA. AC Q62671; DT 01-NOV-1997, integrated into UniProtKB/Swiss-Prot. DT 05-MAR-2002, sequence version 2. DT 16-JUN-2009, entry version 67. DE RecName: Full=E3 ubiquitin-protein ligase UBR5; DE EC=6.3.2.-; DE AltName: Full=E3 ubiquitin-protein ligase, HECT domain-containing 1; DE AltName: Full=Hyperplastic discs protein homolog; DE AltName: Full=100 kDa protein; DE Flags: Fragment; GN Name=Ubr5; Synonyms=Dd5, Edd, Edd1, Hyd; OS Rattus norvegicus (Rat). OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; OC Muroidea; Muridae; Murinae; Rattus. OX NCBI_TaxID=10116; RN [1] RP NUCLEOTIDE SEQUENCE [MRNA]. RC STRAIN=Wistar; TISSUE=Testis; RX MEDLINE=92253337; PubMed=1533713; DOI=10.1093/nar/20.7.1471; RA Mueller D., Rehbein M., Baumeister H., Richter D.; RT "Molecular characterization of a novel rat protein structurally RT related to poly(A) binding proteins and the 70K protein of the U1 RT small nuclear ribonucleoprotein particle (snRNP)."; RL Nucleic Acids Res. 20:1471-1475(1992). RN [2] RP ERRATUM. RA Mueller D., Rehbein M., Baumeister H., Richter D.; RL Nucleic Acids Res. 20:2624-2624(1992). RN [3] RP IDENTIFICATION OF PROBABLE FRAMESHIFT. RX MEDLINE=99153743; PubMed=10030672; DOI=10.1038/sj.onc.1202249; RA Callaghan M.J., Russell A.J., Woollatt E., Sutherland G.R., RA Sutherland R.L., Watts C.K.W.; RT "Identification of a human HECT family protein with homology to the RT Drosophila tumor suppressor gene hyperplastic discs."; RL Oncogene 17:3479-3491(1998). RN [4] RP TISSUE SPECIFICITY, AND DEVELOPMENTAL STAGE. 
RX PubMed=12239083; DOI=10.1210/en.2002-220262; RA Oughtred R., Bedard N., Adegoke O.A.J., Morales C.R., Trasler J., RA Rajapurohitam V., Wing S.S.; RT "Characterization of rat100, a 300-kilodalton ubiquitin-protein ligase RT induced in germ cells of the rat testis and similar to the Drosophila RT hyperplastic discs gene."; RL Endocrinology 143:3740-3747(2002). CC -!- FUNCTION: E3 ubiquitin-protein ligase which is a component of the CC N-end rule pathway. Recognizes and binds to proteins bearing CC specific amino-terminal residues that are destabilizing according CC to the N-end rule, leading to their ubiquitination and subsequent CC degradation (By similarity). May be involved in maturation and/or CC post-transcriptional regulation of mRNA. May play a role in CC control of cell cycle progression. May have tumor suppressor CC function. Regulates DNA topoisomerase II binding protein (TopBP1) CC for the DNA damage response. Plays an essential role in CC extraembryonic development (By similarity). CC -!- PATHWAY: Protein modification; protein ubiquitination. CC -!- SUBUNIT: Binds TOPBP1 (By similarity). CC -!- SUBCELLULAR LOCATION: Nucleus (By similarity). CC -!- TISSUE SPECIFICITY: Highest levels found in testis. Also present CC in liver, kidney, lung and brain. CC -!- DEVELOPMENTAL STAGE: In early postnatal life, expression in the CC testis increases to reach a maximum around day 28. CC -!- PTM: Phosphorylated upon DNA damage, probably by ATM or ATR (By CC similarity). CC -!- MISCELLANEOUS: A cysteine residue is required for ubiquitin- CC thioester formation. CC -!- SIMILARITY: Contains 1 HECT (E6AP-type E3 ubiquitin-protein CC ligase) domain. CC -!- SIMILARITY: Contains 1 PABC domain. 
CC -!- SEQUENCE CAUTION: CC Sequence=CAA45756.1; Type=Frameshift; Positions=30; CC ----------------------------------------------------------------------- CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms CC Distributed under the Creative Commons Attribution-NoDerivs License CC ----------------------------------------------------------------------- DR EMBL; X64411; CAA45756.1; ALT_FRAME; mRNA. DR IPI; IPI00207158; -. DR PIR; S22659; S22659. DR UniGene; Rn.54812; -. DR HSSP; O95071; 1I2T. DR SMR; Q62671; 515-575. DR PhosphoSite; Q62671; -. DR Ensembl; ENSRNOG00000006816; Rattus norvegicus. DR RGD; 621236; Dd5. DR HOVERGEN; Q62671; -. DR ArrayExpress; Q62671; -. DR GermOnline; ENSRNOG00000006816; Rattus norvegicus. DR GO; GO:0005634; C:nucleus; IEA:UniProtKB-SubCell. DR GO; GO:0003723; F:RNA binding; IEA:InterPro. DR GO; GO:0004842; F:ubiquitin-protein ligase activity; TAS:RGD. DR GO; GO:0019941; P:modification-dependent protein catabolic pr...; IEA:UniProtKB-KW. DR GO; GO:0016567; P:protein ubiquitination; TAS:RGD. DR InterPro; IPR000569; HECT. DR InterPro; IPR002004; PABP_HYD. DR Gene3D; G3DSA:1.10.1900.10; PABP_HYD; 1. DR Pfam; PF00632; HECT; 1. DR Pfam; PF00658; PABP; 1. DR SMART; SM00119; HECTc; 1. DR SMART; SM00517; PolyA; 1. DR PROSITE; PS50237; HECT; 1. DR PROSITE; PS51309; PABC; 1. PE 2: Evidence at transcript level; KW Ligase; Nucleus; Phosphoprotein; Ubl conjugation pathway. FT CHAIN <1 920 E3 ubiquitin-protein ligase UBR5. FT /FTId=PRO_0000086933. FT DOMAIN 499 576 PABC. FT DOMAIN 583 920 HECT. FT COMPBIAS 108 119 Asp/Glu-rich (acidic). FT COMPBIAS 158 181 Pro-rich. FT COMPBIAS 451 470 Arg/Glu-rich (mixed charge). FT COMPBIAS 479 488 Arg/Asp-rich (mixed charge). FT COMPBIAS 610 621 Asp/Glu-rich (acidic). FT COMPBIAS 858 878 Pro-rich. FT ACT_SITE 889 889 Glycyl thioester intermediate (By FT similarity). FT MOD_RES 91 91 Phosphothreonine (By similarity). FT MOD_RES 193 193 Phosphoserine (By similarity). 
FT MOD_RES 607 607 Phosphoserine (By similarity). FT NON_TER 1 1 SQ SEQUENCE 920 AA; 103950 MW; 465771084536C3AA CRC64; ARRERMTARE EASLRTLEGR RRATLLSARQ GMMSARGDFL NYALSLMRSH NDEHSDVLPV LDVCSLKHVA YVFQALIYWI KAMNQQTTLD TPQLERKRTR ELLELGIDNE DSEHENDDDT SQSATLNDKD DESLPAETGQ NHPFFRRSDS MTFLGCIPPN PFEVPLAEAI PLADQPHLLQ PNARKEDLFG RPSQGLYSSS AGSGKCLVEV TMDRNCLEVL PTKMSYAANL KNVMNMQNRQ KKAGEDQSML AEEADSSKPG PSAHDVAAQL KSSLLAEIGL TESEGPPLTS FRPQCSFMGM VISHDMLLGR WRLSLELFGR VFMEDVGAEP GSILTELGGF EVKESKFRRE MEKLRNQQSR DLSLEVDRDR DLLIQQTMRQ LNNHFGRRCA TTPMAVHRVK VTFKDEPGEG SGVARSFYTA IAQAFLSNEK LPNLDCIQNA NKGTHTSLMQ RLRNRGERDR EREREREMRR SSGLRAGSRR DRDRDFRRQL SIDTRPFRPA SEGNPSDDPD PLPAHRQALG ERLYPRVQAM QPAFASKITG MLLELSPAQL LLLLASEDSL RARVEEAMEL IVAHGRENGA DSILDLGLLD SSEKVQENRK RHGSSRSVVD MDLDDTDDGD DNAPLFYQPG KRGFYTPRPG KNTEARLNCF RNIGRILGLC LLQNELCPIT LNRHVIKVLL GRKVNWHDFA FFDPVMYESL RQLILASQSS DADAVFSAMD LAFAVDLCKE EGGGQVELIP NGVNIPVTPQ NVYEYVRKYA EHRMLVVAEQ PLHAMRKGLL DVLPKNSLED LTAEDFRLLV NGCGEVNVQM LISFTSFNDE SGENAEKLLQ FKRWFWSIVE RMSMTERQDL VYFWTSSPSL PASEEGFQPM PSITIRPPDD QHLPTANTCI SRLYVPLYSS KQILKQKLLL AIKTKNFGFV //
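In the flat-file layout, each line of an entry begins with a two-letter line code (ID, AC, DE and so on), sequence data lines begin with blanks, and `//` ends the entry. A minimal sketch of reading that structure is given below. The function `parse_swissprot_entry` is hypothetical and is not EMBOSS code, and the sample is a truncated excerpt of the entry above containing only the first line of sequence.

```python
# Illustrative sketch only; not the EMBOSS parser.
from collections import defaultdict


def parse_swissprot_entry(lines):
    """Group lines by their two-letter code and collect the sequence."""
    fields = defaultdict(list)
    sequence_parts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            break  # end of entry
        if line.startswith("  "):
            # Sequence data lines start with blanks; drop the block spacing.
            sequence_parts.append("".join(line.split()))
        elif len(line) >= 2:
            # Line code in columns 1-2, data from column 6 onwards.
            fields[line[:2]].append(line[5:].strip())
    return dict(fields), "".join(sequence_parts)


# Truncated excerpt: only the first sequence line is included.
entry = """\
ID   UBR5_RAT                Reviewed;         920 AA.
AC   Q62671;
SQ   SEQUENCE   920 AA;  103950 MW;  465771084536C3AA CRC64;
     ARRERMTARE EASLRTLEGR RRATLLSARQ GMMSARGDFL NYALSLMRSH NDEHSDVLPV
//
"""
fields, sequence = parse_swissprot_entry(entry.splitlines())
```

Keeping each line code as a list preserves repeated fields such as DT, DE and FT in their original order.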
Plain is the "no format" format: the entire file contents are read in as a sequence, so the file must contain no annotation, comments or heading lines. Anything is accepted in this format: every character is included in the sequence, even digits and punctuation. This format is not detected automatically. Specify -sformat text only when you are sure that the input sequence file is correct and contains only what you want to be considered as your sequence. The safer "raw" format reads only sequence characters and rejects input with other data. The raw format can be detected automatically.
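The difference between the two behaviours can be illustrated with a short sketch. The helpers `read_plain` and `read_raw` are hypothetical and are not EMBOSS code. The set of accepted "sequence characters" and the discarding of whitespace are assumptions made for the illustration.

```python
# Illustrative sketch only; not the EMBOSS implementation.
import string

# Assumed set of sequence characters: letters plus common gap/stop symbols.
SEQUENCE_CHARS = set(string.ascii_letters) | set("*-.")


def read_plain(text):
    """Keep every non-whitespace character, including digits and punctuation."""
    return "".join(text.split())


def read_raw(text):
    """Keep sequence characters only; reject input containing anything else."""
    sequence = "".join(text.split())
    unexpected = sorted(set(sequence) - SEQUENCE_CHARS)
    if unexpected:
        raise ValueError(f"non-sequence characters in input: {unexpected}")
    return sequence
```

With these definitions, `read_plain("acgt 123")` keeps the digits as part of the sequence, whereas `read_raw("acgt 123")` raises an error. This is why the plain behaviour is safe only on files already known to contain nothing but sequence.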
ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtc gccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggt gaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactgg agggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctg gaggatgaggccactctgggccagtgcggggtggaggccctgactaccct ggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggccc gtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaag aagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcg ctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgcca actcttaagtcttttgtaattctggctttctctaataaaaaagccactta gttcagtcaaaaaaaaaa
Format used by the Treecon package for the construction and drawing of evolutionary distance trees. For further information see:
http://bioinformatics.psb.ugent.be/software/details/TREECON |
2016 HSFAU ttcctctttctcgactccatcttcgcggtagctgggaccgccgttcagtcgccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccagatcaaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcaggccgcatgcttggaggtaaagttcatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtggccaaacaggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttctctaataaaaaagccacttagttcagtcaaaaaaaaaa HSFAU1 ctaccattttccctctcgattctatatgtacactcgggacaagttctcctgatcgaaaacggcaaaactaaggccccaagtaggaatgccttagttttcggggttaacaatgattaacactgagcctcacacccacgcgatgccctcagctcctcgctcagcgctctcaccaacagccgtagcccgcagccccgctggacaccggttctccatccccgcagcgtagcccggaacatggtagctgccatctttacctgctacgccagccttctgtgcgcgcaactgtctggtcccgccccgtcctgcgcgagctgctgcccaggcaggttcgccggtgcgagcgtaaaggggcggagctaggactgccttgggcggtacaaatagcagggaaccgcgcggtcgctcagcagtgacgtgacacgcagcccacggtctgtactgacgcgccctcgcttcttcctctttctcgactccatcttcgcggtagctgggaccgccgttcaggtaagaatggggccttggctggatccgaagggcttgtagcaggttggctgcggggtcagaaggcgcggggggaaccgaagaacggggcctgctccgtggccctgctccagtccctatccgaactccttgggaggcactggccttccgcacgtgagccgccgcgaccaccatcccgtcgcgatcgtttctggaccgctttccactcccaaatctcctttatcccagagcatttcttggcttctcttacaagccgtcttttctttactcagtcgccaatatgcagctctttgtccgcgcccaggagctacacaccttcgaggtgaccggccaggaaacggtcgcccagatcaaggtaaggctgcttggtgcgccctgggttccattttcttgtgctcttcactctcgcggcccgagggaacgcttacgagccttatctttccctgtaggctcatgtagcctcactggagggcattgccccggaagatcaagtcgtgctcctggcaggcgcgcccctggaggatgaggccactctgggccagtgcggggtggaggccctgactaccctggaagtagcaggccgcatgcttggaggtgagtgagagaggaatgttctttgaagtaccggtaagcgtctagtgagtgtggggtgcatagtcctgacagctgagtgtcacacctatggtaatagagtacttctcactgtcttcagttcagagtgattcttcctgtttacatccctcatgttgaacacagacgtccatgggagactgagccagagtgtagttgtatttcagtcacatcacgagatcctagtctggttatcagcttccacactaaaaattaggtcagaccaggccccaaagtgctctataaattagaagctggaagatcctgaaatgaaacttaagatttcaaggtcaaatatctgcaactttgttctcattacctattgggcgcag
cttctctttaaaggcttgaattgagaaaagaggggttctgctgggtggcaccttcttgctcttacctgctggtgccttcctttcccactacaggtaaagtccatggttccctggcccgtgctggaaaagtgagaggtcagactcctaaggtgagtgagagtattagtggtcatggtgttaggactttttttcctttcacagctaaaccaagtccctgggctcttactcggtttgccttctccctccctggagatgagcctgagggaagggatgctaggtgtggaagacaggaaccagggcctgattaaccttcccttctccaggtggccaaacaggagaagaagaagaagaagacaggtcgggctaagcggcggatgcagtacaaccggcgctttgtcaacgttgtgcccacctttggcaagaagaagggccccaatgccaactcttaagtcttttgtaattctggctttctctaataaaaaagccacttagttcagtcatcgcattgtttcatctttacttgcaaggcctcagggagaggtgtgcttctcgg