Appdoc:Checktrans

From EMBOSS



Function

Reports STOP codons and ORF statistics of a protein

Description

checktrans reads a protein sequence containing stop characters and writes a statistical report of any open reading frames (ORFs) that are greater than a minimum size. An open reading frame is defined as a continuous region of protein sequence with no stop characters. The default minimum ORF size is 100 residues. In addition to the report, the sequence of each ORF is written to a sequence file, and the ORF positions are written as features to a separate feature file.
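The detection rule described above can be illustrated with a short Python sketch. It is reconstructed from this page's description and example output, not taken from the EMBOSS C source. The function name `find_orfs` and its dictionary keys are hypothetical, and treating the minimum length as inclusive (`length >= orfml`) is an assumption. The `addlast` argument imitates the `-addlast` qualifier: an asterisk is appended if the sequence does not already end in one, but it is not counted as a STOP.

```python
def find_orfs(seq, name, orfml=100, addlast=True):
    """Sketch of checktrans-style ORF detection on a translated sequence.

    Hypothetical helper: every run of residues ending in a '*' STOP is
    numbered 1, 2, 3, ...; only runs of at least `orfml` residues are kept
    (an inclusive minimum is an assumption).
    """
    total_stops = seq.count("*")  # STOPs present in the input only
    if addlast and not seq.endswith("*"):
        seq += "*"  # like -addlast: let the tail count as a potential ORF
    orfs = []
    start = 0       # 0-based index of the first residue of the current run
    orf_number = 0  # counts every STOP-terminated run, reported or not
    for i, residue in enumerate(seq):
        if residue != "*":
            continue
        orf_number += 1
        length = i - start
        if length >= orfml:
            orfs.append({
                "orf": orf_number,
                "stop_pos": i + 1,        # 1-based position of the STOP
                "length": length,
                "range": (start + 1, i),  # 1-based, inclusive
                "name": f"{name}_{orf_number}",
                "sequence": seq[start:i],
            })
        start = i + 1
    return orfs, total_stops


orfs, stops = find_orfs("AAAAA*CCC*GGGG", "X", orfml=4)
for orf in orfs:
    print(orf["orf"], orf["stop_pos"], orf["length"], orf["range"], orf["name"])
print("Total STOPS:", stops)
# prints:
# 1 6 5 (1, 5) X_1
# 3 15 4 (11, 14) X_3
# Total STOPS: 2
```

With `addlast=False` the trailing GGGG run has no terminating STOP and is dropped, so only ORF 1 is returned. Applied to the paamir.pep sequence with the default minimum of 100, this sketch reproduces the single ORF in the example report (number 7, STOP at 635, range 278-634, length 357, 7 STOPs in total).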

Usage

Here is a sample session with checktrans:


% checktrans 
Reports STOP codons and ORF statistics of a protein
Input protein sequence(s): paamir.pep
Minimum ORF Length to report [100]: 
Output file [paamir_1.checktrans]: 
output sequence(s) [paamir_1.fasta]: 
Features output [paamir_1.gff]: 


Command line arguments

Qualifier | Type | Description | Allowed values | Default
Standard (Mandatory) qualifiers
[-sequence] (Parameter 1) | seqall | Protein sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required
-orfml | integer | Minimum ORF Length to report | Integer 1 or more | 100
[-outfile] (Parameter 2) | outfile | Output file name | Output file | <*>.checktrans
[-outseq] (Parameter 3) | seqoutall | Sequence file to hold output ORF sequences | Writeable sequence(s) | <*>.format
[-outfeat] (Parameter 4) | featout | File for output features | Writeable feature table | unknown.gff
Additional (Optional) qualifiers
-[no]addlast | boolean | An asterisk in the protein sequence indicates the position of a STOP codon. Checktrans assumes that all ORFs end in a STOP codon. Forcing the sequence to end with an asterisk, if there is not one there already, makes checktrans treat the residues after the last STOP as a potential ORF. If an asterisk is added, it is not included in the reported count of STOPs. | Boolean value Yes/No | Yes
Advanced (Unprompted) qualifiers
(none)
Associated qualifiers
"-sequence" associated seqall qualifiers
-sbegin1, -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0
-send1, -send_sequence | integer | End of each sequence to be used | Any integer value | 0
-sreverse1, -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N
-sask1, -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N
-snucleotide1, -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N
-sprotein1, -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N
-slower1, -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N
-supper1, -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N
-sformat1, -sformat_sequence | string | Input sequence format | Any string |
-sdbname1, -sdbname_sequence | string | Database name | Any string |
-sid1, -sid_sequence | string | Entryname | Any string |
-ufo1, -ufo_sequence | string | UFO features | Any string |
-fformat1, -fformat_sequence | string | Features format | Any string |
-fopenfile1, -fopenfile_sequence | string | Features file name | Any string |
"-outfile" associated outfile qualifiers
-odirectory2, -odirectory_outfile | string | Output directory | Any string |
"-outseq" associated seqoutall qualifiers
-osformat3, -osformat_outseq | string | Output seq format | Any string |
-osextension3, -osextension_outseq | string | File name extension | Any string |
-osname3, -osname_outseq | string | Base file name | Any string |
-osdirectory3, -osdirectory_outseq | string | Output directory | Any string |
-osdbname3, -osdbname_outseq | string | Database name to add | Any string |
-ossingle3, -ossingle_outseq | boolean | Separate file for each entry | Boolean value Yes/No | N
-oufo3, -oufo_outseq | string | UFO features | Any string |
-offormat3, -offormat_outseq | string | Features format | Any string |
-ofname3, -ofname_outseq | string | Features file name | Any string |
-ofdirectory3, -ofdirectory_outseq | string | Output directory | Any string |
"-outfeat" associated featout qualifiers
-offormat4, -offormat_outfeat | string | Output feature format | Any string |
-ofopenfile4, -ofopenfile_outfeat | string | Features file name | Any string |
-ofextension4, -ofextension_outfeat | string | File name extension | Any string |
-ofdirectory4, -ofdirectory_outfeat | string | Output directory | Any string |
-ofname4, -ofname_outfeat | string | Base file name | Any string |
-ofsingle4, -ofsingle_outfeat | boolean | Separate file for each entry | Boolean value Yes/No | N
General qualifiers
-auto | boolean | Turn off prompts | Boolean value Yes/No | N
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y
-help | boolean | Report command line options. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N
-warning | boolean | Report warnings | Boolean value Yes/No | Y
-error | boolean | Report errors | Boolean value Yes/No | Y
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y
-die | boolean | Report dying program messages | Boolean value Yes/No | Y

Input file format

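This program reads one or more protein sequences, given as any USA (Uniform Sequence Address), that contain STOP characters ('*'), for example translations produced by transeq.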


Input example

File: paamir.pep

>PAAMIR_1 Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
GTAGRASARSPPAGRRELHDLPGEPGARAGSLRTALSDSHRRGNGWDRTRSGR*SACCSP
KPASPPISSARTRMAHCSRSSN*TARAASAVARSKRCPRTPAATRTAIGCAPRTSFATGG
YGSSWAATCRTRARR*CRWSSAPTRCSATRPPTRASSIRRTSSTAVRRRTRTVRRWRRT*
FATTASGWCSSARTTSIRGKATM*CATCIASTAARCSRKSTFRCIPPTTTCSAPSSASTR
RAPTWSSPPWWAPAPPSCIAPSPVATATAGGRRSPA*PPARRRWRRWRVTWQRGRWWSRL
TSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGRPCCSAAPRRPQATGGWKTCSGTC
TTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSRSAGSRPNRFAPTLMSSCITSTTG
PPAWAGDRSHERQLAARQPARVAGAGPQPAGGGQRRPGLAADPHRLFGAPVLAAAGSLRR
AGGRGLHQHFPEWPPRRDRCAARRRDSAHYPGGAGGVRKPRGALADHRAGVPRRDHPAAR
CPPGAACAGIGAAHQRGNGEAEAEDRAAPGPHRRPGPDQPGQGVADAAPWLGRARGAPAP
VAGSDEAARADPEDRSGVAGKRAVRLSDPGRPEQ*QEGYRHHAGTGSAVRWRGAVSQCRL
VAGQDQRSGGGGDQLPGRRAERLRRVLPDLFRSSRAGLAEGRSADPAIRFYLSVGGRQPV
PRX

Output file format

This program writes three files: the ORF report file (paamir_1.checktrans in the example), the output sequence file (paamir_1.fasta) and the feature file (paamir_1.gff), which is in GFF format by default.

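For each ORF, the report file gives the ORF number, the position of the terminating STOP, the length of the ORF, its start and end positions, and the name under which its sequence was written to the output sequence file. The report ends with the total number of STOPs found. As the example shows, the numbering counts every STOP-terminated stretch of the sequence, including those shorter than the minimum length: the single ORF reported is number 7 because it ends at the seventh STOP.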
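For scripts that consume the report, the rows can be read back with a small parser. This is a minimal Python sketch that assumes the layout of the example report below (ORF#, Pos, Len, a "start-end" range and the sequence name, separated by whitespace, plus a "Total STOPS:" line); the function name is hypothetical and other EMBOSS versions may format the file differently.

```python
import re

# Assumed row layout, taken from the example report: ORF#, Pos, Len,
# "start-end" range and sequence name, separated by whitespace.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)-(\d+)\s+(\S+)\s*$")
TOTAL = re.compile(r"^\s*Total STOPS:\s*(\d+)")


def parse_checktrans_report(text):
    """Return (rows, total_stops) parsed from checktrans report text (sketch)."""
    rows, total_stops = [], None
    for line in text.splitlines():
        if m := ROW.match(line):
            orf, pos, length, start, end = (int(g) for g in m.groups()[:5])
            rows.append({"orf": orf, "stop_pos": pos, "length": length,
                         "start": start, "end": end, "name": m.group(6)})
        elif m := TOTAL.match(line):
            total_stops = int(m.group(1))
    return rows, total_stops


example_report = """CHECKTRANS of PAAMIR_1 from 1 to 724

\tORF#\tPos\tLen\tORF Range\tSequence name

\t7\t635\t357\t278-634\tPAAMIR_1_7

\tTotal STOPS:     7
"""
rows, total = parse_checktrans_report(example_report)
print(rows)
# [{'orf': 7, 'stop_pos': 635, 'length': 357, 'start': 278, 'end': 634, 'name': 'PAAMIR_1_7'}]
print(total)  # 7
```

The header lines are skipped simply because they do not match either pattern. A report covering several input sequences repeats the header and the total for each one, so for that case you would split the text on the "CHECKTRANS of" lines before parsing.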

Each output sequence is named after the input sequence, followed by an underscore and the ORF number (e.g. 'PAAMIR_1_7').


Output example

File: paamir_1.checktrans

CHECKTRANS of PAAMIR_1 from 1 to 724

	ORF#	Pos	Len	ORF Range	Sequence name

	7	635	357	278-634	PAAMIR_1_7

	Total STOPS:     7


File: paamir_1.fasta

>PAAMIR_1_7
PPARRRWRRWRVTWQRGRWWSRLTSPASIRPPAGPSSRPAMVSSRRTRPSPPGPRRPTGR
PCCSAAPRRPQATGGWKTCSGTCTTSTSTRHRGRSGWSARTTTAACLRASRKSMRAACSR
SAGSRPNRFAPTLMSSCITSTTGPPAWAGDRSHERQLAARQPARVAGAGPQPAGGGQRRP
GLAADPHRLFGAPVLAAAGSLRRAGGRGLHQHFPEWPPRRDRCAARRRDSAHYPGGAGGV
RKPRGALADHRAGVPRRDHPAARCPPGAACAGIGAAHQRGNGEAEAEDRAAPGPHRRPGP
DQPGQGVADAAPWLGRARGAPAPVAGSDEAARADPEDRSGVAGKRAVRLSDPGRPEQ

File: paamir_1.gff

##gff-version 3
##sequence-region PAAMIR_1 1 634
#!Date 2009-07-15
#!Type Protein
#!Source-version EMBOSS 6.1.0
PAAMIR_1	checktrans	polypeptide_region	278	634	0.000	+	.	ID="PAAMIR_1.1"
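The feature line uses the nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates, so end - start + 1 gives the ORF length shown in the Len column of the report. A minimal Python sketch that reads such a line follows; the helper name is hypothetical, it strips the double quotes that appear around the ID value in this example output, and it does not undo GFF3 percent-encoding.

```python
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_feature(line):
    """Split one GFF3 feature line into a dict of its nine columns (sketch)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])  # 1-based, inclusive
    feature["end"] = int(feature["end"])
    attributes = {}
    for item in feature["attributes"].split(";"):
        if item:
            key, _, value = item.partition("=")
            attributes[key] = value.strip('"')  # the example output quotes the ID
    feature["attributes"] = attributes
    return feature


line = 'PAAMIR_1\tchecktrans\tpolypeptide_region\t278\t634\t0.000\t+\t.\tID="PAAMIR_1.1"'
feature = parse_gff3_feature(line)
print(feature["end"] - feature["start"] + 1)  # 357, the Len column of the report
print(feature["attributes"]["ID"])            # PAAMIR_1.1
```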

Data files

None.

Notes

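A reading frame is one way of dividing a DNA or RNA sequence into contiguous, non-overlapping codons, as they are read during translation. There are three possible reading frames in an mRNA strand and six in double-stranded DNA, where transcription is possible from either strand. An open reading frame (ORF) is conventionally a reading frame that begins with a start codon and continues through the translated region, stopping immediately before the first stop codon. checktrans uses a looser definition: any stretch of protein sequence between STOP characters that is long enough is reported, whether or not it begins with a start codon (the ORF in the example above begins with proline, not methionine).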
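The six reading frames can be made concrete with a short Python sketch that splits a nucleotide sequence into codons in each frame. The function name is hypothetical, and the labels +1..+3 and -1..-3 used here are simply offsets 0, 1, 2 into the given strand and into its reverse complement; tools such as transeq may number the reverse frames differently.

```python
# Complement table for uppercase DNA; RNA 'U' is converted to 'T' first.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reading_frames(seq):
    """Return the six reading frames of a nucleotide sequence as codon lists (sketch).

    Frames +1..+3 start at offsets 0, 1, 2 of the given strand; frames
    -1..-3 start at offsets 0, 1, 2 of its reverse complement. Incomplete
    trailing codons are dropped.
    """
    forward = seq.upper().replace("U", "T")
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = [forward[i:i + 3] for i in range(offset, len(forward) - 2, 3)]
        frames[f"-{offset + 1}"] = [reverse[i:i + 3] for i in range(offset, len(reverse) - 2, 3)]
    return frames


for label, codons in reading_frames("ATGAAATAG").items():
    print(label, codons)
# +1 ['ATG', 'AAA', 'TAG']
# -1 ['CTA', 'TTT', 'CAT']
# +2 ['TGA', 'AAT']
# -2 ['TAT', 'TTC']
# +3 ['GAA', 'ATA']
# -3 ['ATT', 'TCA']
```

In frame +1 the toy sequence begins with the start codon ATG and ends with the stop codon TAG, which is the textbook ORF described above. Each frame would still have to be translated codon by codon before checktrans can examine it.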

If you have a nucleotide sequence to analyse, first translate it with transeq and use the transeq output file as the input to checktrans. If the sequence is translated in only one frame, checktrans will miss ORFs in the other frames. To find all possible ORFs, give checktrans translations in all six frames (three on each strand), which transeq produces with -frame=6; if only the given strand is of interest, the three forward frames (-frame=F) are sufficient.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

This program always exits with a status of 0.

Known bugs

None.

See also

backtranambig Back-translate a protein sequence to ambiguous nucleotide sequence
backtranseq Back-translate a protein sequence to a nucleotide sequence
charge Draw a protein charge plot
compseq Calculate the composition of unique words in sequences
emowse Search protein sequences by digest fragment molecular weight
freak Generate residue/base frequency table or plot
iep Calculate the isoelectric point of proteins
mwcontam Find weights common to multiple molecular weights files
mwfilter Filter noisy data from molecular weights file
octanol Draw a White-Wimley protein hydropathy plot
pepinfo Plot amino acid properties of a protein sequence in parallel
pepstats Calculates statistics of protein properties
pepwindow Draw a hydropathy plot for a protein sequence
pepwindowall Draw Kyte-Doolittle hydropathy plot for a protein alignment
wordcount Count and extract unique words in DNA sequence(s)

Author(s)

Rodrigo Lopez, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Modified by Gary Williams, formerly at MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK, to output the sequence data to a single file in the conventional EMBOSS style.

Please report all bugs to the EMBOSS bug team (emboss-bug (@) emboss.open-bio.org) not to the original author.

History

Completed 24 Feb 2000 - Rodrigo Lopez

Modified 2 March 2000 - Gary Williams

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.
