sizeseq
% sizeseq -osformat swiss
Sort sequences by size
Input sequence set: globins.fasta
Return longest sequence first [N]:
output sequence(s) [globins.swiss]:
Sort sequences by size
Version: EMBOSS:6.4.0.0
Standard (Mandatory) qualifiers:
[-sequences] seqset Sequence set filename and optional format,
or reference (input USA)
-descending boolean [N] By default the shortest sequence is
given first.
[-outseq] seqoutall [
| Qualifier | Type | Description | Allowed values | Default |
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | | | | |
| [-sequences] (Parameter 1) | seqset | Sequence set filename and optional format, or reference (input USA) | Readable set of sequences | Required |
| -descending | boolean | By default the shortest sequence is given first. | Boolean value Yes/No | No |
| [-outseq] (Parameter 2) | seqoutall | Sequence set(s) filename and optional format (output USA) | Writeable sequence(s) | <*>.format |
| Additional (Optional) qualifiers | | | | |
| (none) | | | | |
| Advanced (Unprompted) qualifiers | | | | |
| -feature | boolean | Sequence feature information will be retained if this option is set. | Boolean value Yes/No | No |
| Associated qualifiers | | | | |
| "-sequences" associated seqset qualifiers | | | | |
| -sbegin1 -sbegin_sequences | integer | Start of each sequence to be used | Any integer value | 0 |
| -send1 -send_sequences | integer | End of each sequence to be used | Any integer value | 0 |
| -sreverse1 -sreverse_sequences | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
| -sask1 -sask_sequences | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
| -snucleotide1 -snucleotide_sequences | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
| -sprotein1 -sprotein_sequences | boolean | Sequence is protein | Boolean value Yes/No | N |
| -slower1 -slower_sequences | boolean | Make lower case | Boolean value Yes/No | N |
| -supper1 -supper_sequences | boolean | Make upper case | Boolean value Yes/No | N |
| -sformat1 -sformat_sequences | string | Input sequence format | Any string | |
| -sdbname1 -sdbname_sequences | string | Database name | Any string | |
| -sid1 -sid_sequences | string | Entryname | Any string | |
| -ufo1 -ufo_sequences | string | UFO features | Any string | |
| -fformat1 -fformat_sequences | string | Features format | Any string | |
| -fopenfile1 -fopenfile_sequences | string | Features file name | Any string | |
| "-outseq" associated seqoutall qualifiers | | | | |
| -osformat2 -osformat_outseq | string | Output seq format | Any string | |
| -osextension2 -osextension_outseq | string | File name extension | Any string | |
| -osname2 -osname_outseq | string | Base file name | Any string | |
| -osdirectory2 -osdirectory_outseq | string | Output directory | Any string | |
| -osdbname2 -osdbname_outseq | string | Database name to add | Any string | |
| -ossingle2 -ossingle_outseq | boolean | Separate file for each entry | Boolean value Yes/No | N |
| -oufo2 -oufo_outseq | string | UFO features | Any string | |
| -offormat2 -offormat_outseq | string | Features format | Any string | |
| -ofname2 -ofname_outseq | string | Features file name | Any string | |
| -ofdirectory2 -ofdirectory_outseq | string | Output directory | Any string | |
| General qualifiers | | | | |
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N |
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
| -warning | boolean | Report warnings | Boolean value Yes/No | Y |
| -error | boolean | Report errors | Boolean value Yes/No | Y |
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y |
| -version | boolean | Report version number and exit | Boolean value Yes/No | N |
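The qualifiers in the table can be combined into a single non-interactive command line. The following is a minimal Python sketch of how such a command might be assembled; the helper name `build_sizeseq_command` and its parameters are hypothetical and exist only for this illustration. It uses only qualifiers listed above (`-sequences`, `-outseq`, `-descending`, `-osformat`, `-auto`) and assumes that a boolean qualifier can be switched on by giving it as a bare flag. The sketch only builds and prints the command; it does not run sizeseq.

```python
import shlex


def build_sizeseq_command(sequences, outseq, descending=False, osformat=None):
    """Build an argument list for a prompt-free sizeseq run.

    Hypothetical helper for illustration: it only assembles qualifiers
    that appear in the qualifier table and never executes anything.
    """
    argv = ["sizeseq", "-sequences", sequences, "-outseq", outseq]
    if descending:
        # Assumption: the boolean qualifier is enabled as a bare flag.
        argv.append("-descending")
    if osformat is not None:
        # Output sequence format, e.g. "swiss" as in the example session.
        argv += ["-osformat", osformat]
    # -auto turns off prompts, so every value must be given explicitly.
    argv.append("-auto")
    return argv


if __name__ == "__main__":
    argv = build_sizeseq_command("globins.fasta", "globins.swiss", osformat="swiss")
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(argv))
```

The returned list could be passed to a process launcher on a system where EMBOSS is installed; whether that is appropriate depends on the local setup.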
The input is a standard EMBOSS sequence query (also known as a 'USA').
Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.
Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
>HBB_HUMAN Sw:Hbb_Human => HBB_HUMAN
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV
KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK
EFTPPVQAAYQKVVAGVANALAHKYH
>HBB_HORSE Sw:Hbb_Horse => HBB_HORSE
VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKV
KAHGKKVLHSFGEGVHHLDNLKGTFAALSELHCDKLHVDPENFRLLGNVLVVVLARHFGK
DFTPELQASYQKVVAGVANALAHKYH
>HBA_HUMAN Sw:Hba_Human => HBA_HUMAN
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
VHASLDKFLASVSTVLTSKYR
>HBA_HORSE Sw:Hba_Horse => HBA_HORSE
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHGK
KVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPA
VHASLDKFLSSVSTVLTSKYR
>MYG_PHYCA Sw:Myg_Phyca => MYG_PHYCA
VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASED
LKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHP
GDFGADAQGAMNKALELFRKDIAAKYKELGYQG
>GLB5_PETMA Sw:Glb5_Petma => GLB5_PETMA
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
ADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRDLSGKHAKSFQVDPQYFKVLA
AVIADTVAAGDAGFEKLMSMICILLRSAY
>LGB2_LUPLU Sw:Lgb2_Luplu => LGB2_LUPLU
GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSEVPQNNPEL
QAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVADAHFPVVKEAILKTIKE
VVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA
The output is a standard EMBOSS sequence file.
The results can be output in one of several styles by using the command-line qualifier -osformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
sizeseq rewrites the sequences in sorted order.
ID HBA_HUMAN Reviewed; 141 AA.
DE Sw:Hba_Human => HBA_HUMAN
SQ SEQUENCE 141 AA; 15126 MW; 34D13618E62A33C1 CRC64;
VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
VHASLDKFLA SVSTVLTSKY R
//
ID HBA_HORSE Reviewed; 141 AA.
DE Sw:Hba_Horse => HBA_HORSE
SQ SEQUENCE 141 AA; 15114 MW; 128B9100A4D8457A CRC64;
VLSAADKTNV KAAWSKVGGH AGEYGAEALE RMFLGFPTTK TYFPHFDLSH GSAQVKAHGK
KVGDALTLAV GHLDDLPGAL SNLSDLHAHK LRVDPVNFKL LSHCLLSTLA VHLPNDFTPA
VHASLDKFLS SVSTVLTSKY R
//
ID HBB_HUMAN Reviewed; 146 AA.
DE Sw:Hbb_Human => HBB_HUMAN
SQ SEQUENCE 146 AA; 15867 MW; EACBC707CFD466A1 CRC64;
VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
EFTPPVQAAY QKVVAGVANA LAHKYH
//
ID HBB_HORSE Reviewed; 146 AA.
DE Sw:Hbb_Horse => HBB_HORSE
SQ SEQUENCE 146 AA; 16008 MW; 734664793DA642EE CRC64;
VQLSGEEKAA VLALWDKVNE EEVGGEALGR LLVVYPWTQR FFDSFGDLSN PGAVMGNPKV
KAHGKKVLHS FGEGVHHLDN LKGTFAALSE LHCDKLHVDP ENFRLLGNVL VVVLARHFGK
DFTPELQASY QKVVAGVANA LAHKYH
//
ID GLB5_PETMA Reviewed; 149 AA.
DE Sw:Glb5_Petma => GLB5_PETMA
SQ SEQUENCE 149 AA; 16270 MW; CD2305FB92DACD59 CRC64;
PIVDTGSVAP LSAAEKTKIR SAWAPVYSTY ETSGVDILVK FFTSTPAAQE FFPKFKGLTT
ADQLKKSADV RWHAERIINA VNDAVASMDD TEKMSMKLRD LSGKHAKSFQ VDPQYFKVLA
AVIADTVAAG DAGFEKLMSM ICILLRSAY
//
ID LGB2_LUPLU Reviewed; 153 AA.
DE Sw:Lgb2_Luplu => LGB2_LUPLU
SQ SEQUENCE 153 AA; 16652 MW; FE29AB9DEF33AFC8 CRC64;
GALTESQAAL VKSSWEEFNA NIPKHTHRFF ILVLEIAPAA KDLFSFLKGT SEVPQNNPEL
QAHAGKVFKL VYEAAIQLQV TGVVVTDATL KNLGSVHVSK GVADAHFPVV KEAILKTIKE
VVGAKWSEEL NSAWTIAYDE LAIVIKKEMN DAA
//
ID MYG_PHYCA Reviewed; 153 AA.
DE Sw:Myg_Phyca => MYG_PHYCA
SQ SEQUENCE 153 AA; 17200 MW; 543D385C66FEE1E1 CRC64;
VLSEGEWQLV LHVWAKVEAD VAGHGQDILI RLFKSHPETL EKFDRFKHLK TEAEMKASED
LKKHGVTVLT ALGAILKKKG HHEAELKPLA QSHATKHKIP IKYLEFISEA IIHVLHSRHP
GDFGADAQGA MNKALELFRK DIAAKYKELG YQG
//
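The ordering shown in the output above can be illustrated with a short Python sketch that reads FASTA text and sorts the records by sequence length. This is not the sizeseq implementation; the function names `read_fasta` and `sort_by_size` are hypothetical, and the parser handles only plain FASTA. Python's sort is stable, so sequences of equal length keep their input order here. That may differ from sizeseq: in the example output above, the two 153-residue sequences (LGB2_LUPLU, MYG_PHYCA) appear in the reverse of their input order.

```python
def read_fasta(text):
    """Parse plain FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sort_by_size(records, descending=False):
    """Return records ordered by sequence length, shortest first by default.

    descending=True mirrors the idea of the -descending qualifier
    (longest first). Ties keep their input order (stable sort).
    """
    return sorted(records, key=lambda rec: len(rec[1]), reverse=descending)


if __name__ == "__main__":
    demo = ">a\nACDEFG\n>b\nAC\n>c\nACDE\n"
    for header, seq in sort_by_size(read_fasta(demo)):
        print(len(seq), header)
```

Passing `descending=True` gives the longest sequence first, which corresponds to answering yes at the "Return longest sequence first" prompt.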
| Program name | Description |
|---|---|
| aligncopy | Reads and writes alignments |
| aligncopypair | Reads and writes pairs from alignments |
| biosed | Replace or delete sequence sections |
| codcopy | Copy and reformat a codon usage table |
| cutseq | Removes a section from a sequence |
| degapseq | Removes non-alphabetic (e.g. gap) characters from sequences |
| descseq | Alter the name or description of a sequence |
| entret | Retrieves sequence entries from flatfile databases and files |
| extractalign | Extract regions from a sequence alignment |
| extractfeat | Extract features from sequence(s) |
| extractseq | Extract regions from a sequence |
| featcopy | Reads and writes a feature table |
| featreport | Reads and writes a feature table |
| feattext | Return a feature table original text |
| listor | Write a list file of the logical OR of two sets of sequences |
| makenucseq | Create random nucleotide sequences |
| makeprotseq | Create random protein sequences |
| maskambignuc | Masks all ambiguity characters in nucleotide sequences with N |
| maskambigprot | Masks all ambiguity characters in protein sequences with X |
| maskfeat | Write a sequence with masked features |
| maskseq | Write a sequence with masked regions |
| newseq | Create a sequence file from a typed-in sequence |
| nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file |
| noreturn | Remove carriage return from ASCII files |
| nospace | Remove whitespace from an ASCII text file |
| notab | Replace tabs with spaces in an ASCII text file |
| notseq | Write to file a subset of an input stream of sequences |
| nthseq | Write to file a single sequence from an input stream of sequences |
| nthseqset | Reads and writes (returns) one set of sequences from many |
| pasteseq | Insert one sequence into another |
| revseq | Reverse and complement a nucleotide sequence |
| seqcount | Reads and counts sequences |
| seqret | Reads and writes (returns) sequences |
| seqretsetall | Reads and writes (returns) many sets of sequences |
| seqretsplit | Reads sequences and writes them to individual files |
| skipredundant | Remove redundant sequences from an input set |
| skipseq | Reads and writes (returns) sequences, skipping first few |
| splitsource | Split sequence(s) into original source sequences |
| splitter | Split sequence(s) into smaller sequences |
| trimest | Remove poly-A tails from nucleotide sequences |
| trimseq | Remove unwanted characters from start and end of sequence(s) |
| trimspace | Remove extra whitespace from an ASCII text file |
| union | Concatenate multiple sequences into a single sequence |
| vectorstrip | Removes vectors from the ends of nucleotide sequence(s) |
| yank | Add a sequence reference (a full USA) to a list file |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org), not to the original author.