MATGEN3D documentation


 

CONTENTS

1.0 SUMMARY
2.0 INPUTS & OUTPUTS
3.0 INPUT FILE FORMAT
4.0 OUTPUT FILE FORMAT
5.0 DATA FILES
6.0 USAGE
7.0 KNOWN BUGS & WARNINGS
8.0 NOTES
9.0 DESCRIPTION
10.0 ALGORITHM
11.0 RELATED APPLICATIONS
12.0 DIAGNOSTIC ERROR MESSAGES
13.0 AUTHORS
14.0 REFERENCES



1.0 SUMMARY

Generate a 3D-1D scoring matrix from CCF files

Generates a 3D-1D scoring matrix from CCF files (clean coordinate files).


2.0 INPUTS & OUTPUTS

MATGEN3D reads a DCF file (domain classification file) and a directory of domain CCF files (clean coordinate files) which have been processed by using PDBPLUS so that they contain solvent accessibility and secondary structure information. The directory must contain a CCF file for the first domain from each family represented in the DCF file. A matrix of 3D:1D scores (environment:residue scoring matrix), reflecting the probability of the amino acids occuring in different tertiary environments, is calculated from the CCF files of the first domain from each family only. The path of the CCF files is specified by the user and the file extensions is specified in the ACD file. Two log files of informative messages are also written.


3.0 INPUT FILE FORMAT

The format of domain CCF files is described in the DOMAINER documentation.
The format of the DCF file (domain classification file) is described in the SCOPPARSE documentation and the CATHPARSE documentation.

Input files for usage example

File: ../scopparse-keep/all.scop

ID   D1CS4A_
XX
EN   1CS4
XX
TY   SCOP
XX
SI   53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Adenylyl and guanylyl cyclase catalytic domain
XX
FA   Adenylyl and guanylyl cyclase catalytic domain
XX
DO   Adenylyl cyclase VC1, domain C1a
XX
OS   Dog (Canis familiaris)
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//
ID   D1II7A_
XX
EN   1II7
XX
TY   SCOP
XX
SI   53931 CL; 56299 FO; 56300 SF; 64427 FA; 64428 DO; 64429 SO; 62415 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Metallo-dependent phosphatases
XX
SF   Metallo-dependent phosphatases
XX
FA   DNA double-strand break repair nuclease
XX
DO   Mre11
XX
OS   Archaeon Pyrococcus furiosus
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//




4.0 OUTPUT FILE FORMAT

The matrix of 3D:1D scores (environment:residue scoring matrix, Figure 1) follows the standard EMBOSS format. Single letter amino acid codes are column labels and environments are row labels. The environment labels are strings of 2 characters, beginning with AA, AB, AC, through to AZ, BA, BB etc. The final row and column have the label '*' and give default substitution values (the minimum from the entire matrix). In the example shown (Figure 1), only two environments AA through AX are defined but only AA to AB are given scores owing to the sparse input data for this example (typically, all environments would receive scores).

Output files for usage example

File: matgen3d.calc


SUMiNijArr
A	0
B	0
C	0
D	10
E	0
F	4
G	5
H	0
I	5
J	6
K	0
L	6
M	12
N	0
O	6
P	14
Q	0
R	8
S	4
T	0
U	9

NiArr
A	10
B	0
C	3
D	6
E	11
F	7
G	4
H	3
I	7
J	0
K	4
L	11
M	1
N	4
O	0
P	1
Q	5
R	3
S	2
T	3
U	0
V	3
W	0
X	0
Y	1


  [Part of this file has been deleted for brevity]


B	0.000

C	0.034

D	0.067

E	0.124

F	0.079

G	0.045

H	0.034

I	0.079

J	0.000

K	0.045

L	0.124

M	0.011

N	0.045

O	0.000

P	0.011

Q	0.056

R	0.034

S	0.022

T	0.034

U	0.000

V	0.034

W	0.000

X	0.000

Y	0.011

Z	0.000

File: matgen3d.log

D1CS4A_

R:D-2 S:C A:63.80
	OEnv AO:D
R:I-3 S:T A:28.30
	OEnv AF:I
R:E-4 S:T A:113.00
	OEnv AU:E
R:G-5 S:T A:104.20
	OEnv AU:G
R:F-6 S:H A:23.70
	OEnv AD:F
R:T-7 S:H A:111.60
	OEnv AS:T
R:S-8 S:H A:84.60
	OEnv AP:S
R:L-9 S:H A:57.60
	OEnv AJ:L
R:A-10 S:H A:24.00
	OEnv AD:A
R:S-11 S:T A:78.20
	OEnv AR:S
R:Q-12 S:T A:78.00
	OEnv AR:Q
R:C-13 S:T A:28.80
	OEnv AF:C
R:T-14 S:C A:72.00
	OEnv AO:T
R:A-15 S:H A:103.00
	OEnv AS:A
R:Q-16 S:H A:88.10
	OEnv AP:Q
R:E-17 S:H A:66.30
	OEnv AM:E
R:L-18 S:H A:17.70
	OEnv AD:L
R:V-19 S:H A:79.50
	OEnv AP:V
R:M-20 S:H A:78.40
	OEnv AP:M
R:T-21 S:H A:74.90
	OEnv AM:T
R:L-22 S:H A:16.50
	OEnv AD:L
R:N-23 S:H A:94.00
	OEnv AS:N
R:E-24 S:H A:74.00
	OEnv AM:E
R:L-25 S:H A:40.60
	OEnv AG:L


  [Part of this file has been deleted for brevity]

	OEnv AD:F
R:A-26 S:H A:59.60
	OEnv AJ:A
R:E-27 S:H A:62.30
	OEnv AM:E
R:A-28 S:H A:76.30
	OEnv AP:A
R:F-29 S:H A:20.30
	OEnv AD:F
R:K-30 S:H A:64.60
	OEnv AM:K
R:N-31 S:H A:68.30
	OEnv AM:N
R:A-32 S:H A:71.80
	OEnv AM:A
R:L-33 S:H A:42.20
	OEnv AG:L
R:E-34 S:H A:56.50
	OEnv AJ:E
R:I-35 S:H A:70.20
	OEnv AM:I
R:A-36 S:H A:27.70
	OEnv AD:A
R:V-37 S:H A:89.80
	OEnv AP:V
R:Q-38 S:H A:80.20
	OEnv AP:Q
R:E-39 S:H A:73.50
	OEnv AM:E
R:N-40 S:C A:109.90
	OEnv AU:N
R:V-41 S:C A:44.50
	OEnv AI:V
R:D-42 S:C A:105.60
	OEnv AU:D
R:F-43 S:C A:83.50
	OEnv AR:F
R:I-44 S:C A:43.50
	OEnv AI:I
R:L-45 S:C A:87.00
	OEnv AR:L
R:I-46 S:C A:30.40
	OEnv AI:I
R:A-47 S:C A:108.40
	OEnv AU:A
R:G-48 S:C A:102.60
	OEnv AU:G
R:D-49 S:C A:72.50
	OEnv AO:D
R:L-50 S:C A:55.40
	OEnv AL:L

File: matgen3d.out

# 3D-1D Scoring matrix created by matgen3d
# ajResidueEnv1
# Total SCOP entries: 2
# No. of files opened: 2
# No. of files not opened: 0
         A     R     N     D     C     Q     E     G     H     I     L     K     M     F     P     S     T     W     Y     V     B     Z     X     *
AA    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AB    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AC    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AD    0.98  1.09  0.80  0.39  1.09  0.58 -0.21  0.80  1.09  0.24  0.48  0.80  2.19  1.63  2.19  1.49  1.09 -2.30  2.19  1.09 -2.30 -2.30 -2.30 -2.64
AE    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AF    0.80  2.00  1.72  1.31  2.00  1.49  0.70  1.72  2.00  1.85  0.70  1.72  3.10  1.16  3.10  2.41  2.00 -1.39  3.10  2.00 -1.39 -1.39 -1.39 -2.64
AG    0.58  1.78  1.49  1.09  1.78  1.27  0.48  1.49  1.78  0.93  1.17  1.49  2.88  0.93  2.88  2.19  1.78 -1.61  2.88  1.78 -1.61 -1.61 -1.61 -2.64
AH    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AI    0.58  1.78  1.49  1.09  1.78  1.27  0.48  1.49  1.78  2.03  0.48  1.49  2.88  0.93  2.88  2.19  1.78 -1.61  2.88  1.78 -1.61 -1.61 -1.61 -2.64
AJ    1.09  1.60  1.31  0.91  1.60  1.09  0.99  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AK    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AL    0.39  1.60  1.31  0.91  1.60  1.09  0.30  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AM   -0.30  0.91  0.62  0.21  0.91  0.39  0.99  0.62  0.91  0.06 -0.39  0.62  2.00  0.06  2.00  1.31  0.91 -2.48  2.00  0.91 -2.48 -2.48 -2.48 -2.64
AN    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AO    0.39  1.60  1.31  1.60  1.60  1.09  0.30  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AP    0.24  0.75  0.46  0.06  0.75  1.34  0.14  0.46  0.75 -0.10 -0.55  0.46  1.85 -0.10  1.85  1.16  0.75 -2.64  1.85  1.44 -2.64 -2.64 -2.64 -2.64
AQ    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AR    0.11  1.31  1.02  0.62  1.31  0.80  0.01  1.02  1.31  0.46  0.01  1.02  2.41  0.46  2.41  1.72  1.31 -2.08  2.41  1.31 -2.08 -2.08 -2.08 -2.64
AS    0.80  2.00  1.72  1.31  2.00  1.49  0.70  1.72  2.00  1.16  0.70  1.72  3.10  1.16  3.10  2.41  2.00 -1.39  3.10  2.00 -1.39 -1.39 -1.39 -2.64
AT    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AU   -0.01  1.19  0.91  0.50  1.19  0.68 -0.11  2.00  1.19  0.35 -0.11  0.91  2.29  0.35  2.29  1.60  1.19 -2.20  2.29  1.19 -2.20 -2.20 -2.20 -2.64
*    -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64




5.0 DATA FILES

MATGEN3D does not use any data files.


6.0 USAGE

6.1 COMMAND LINE ARGUMENTS

Generate a 3D-1D scoring matrix from CCF files.
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers (* if not always prompted):
   -mode               menu       [1] This option specifies the amino acid
                                  residue positions to use in calculating the
                                  substitution data. (Values: 1 (All amino
                                  acid positions for domains in a DCF file); 2
                                  (Ligand-binding positions.))
*  -model              menu       [1] This option specifies whether to use all
                                  ligands or a specific set of ligands
                                  present in a CON file when calculating the
                                  substitution data. (Values: 1 (All ligands
                                  within a CON file); 2 (Select ligands within
                                  a CON file.))
*  -dcfinfile          infile     This option specifies the name of DCF file
                                  (domain classification file) (input). A
                                  'domain classification file' contains
                                  classification and other data for domains
                                  from SCOP or CATH, in DCF format
                                  (EMBL-like). The files are generated by
                                  using SCOPPARSE and CATHPARSE. Domain
                                  sequence information can be added to the
                                  file by using DOMAINSEQS.
*  -coninfile          infile     This option specifies the location of CON
                                  files (contact files) (output). A 'contact
                                  file' contains contact data for a protein or
                                  a domain from SCOP or CATH, in the CON
                                  format (EMBL-like). The contacts may be
                                  intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The files
                                  are generated by using CONTACTS, INTERFACE
                                  and SITES.
*  -liginfile          infile     This option specifies the location of the
                                  ligand list file. This file contains a list
                                  of ligand ('heterogen') 3-character
                                  identifier codes. One id code should be
                                  given per line.
  [-ccfddir]           directory  [./] This option specifies the location of
                                  CCF files (clean coordinate files) (input)
                                  which have been processed by using PDBPLUS.
                                  A 'clean cordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
*  -ccfpdir            directory  [./] This option specifies the location of
                                  CCF files (clean coordinate files) (input)
                                  which have been processed by using PDBPLUS.
                                  A 'clean cordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
   -modee              menu       [1] This option specifies the environment
                                  definition. See matgen3d documentation for
                                  description of definitions. (Values: 1
                                  (Env1); 2 (Env2); 3 (Env3); 4 (Env4); 5
                                  (Env5); 6 (Env6); 7 (Env7); 8 (Env8); 9
                                  (Env9); 10 (Env10); 11 (Env11); 12 (Env12);
                                  13 (Env13); 14 (Env14); 15 (Env15); 16
                                  (Env16))
  [-scmatrixfile]      outfile    [matgen3d.out] Domainatrix substitution
                                  matrix output file
  [-calclogfile]       outfile    [matgen3d.calc] Domainatrix calculations log
                                  output file
  [-logfile]           outfile    [matgen3d.log] Domainatrix log output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-ccfddir" associated qualifiers
   -extension1         string     Default file extension

   "-ccfpdir" associated qualifiers
   -extension          string     Default file extension

   "-scmatrixfile" associated qualifiers
   -odirectory2        string     Output directory

   "-calclogfile" associated qualifiers
   -odirectory3        string     Output directory

   "-logfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Qualifier Type Description Allowed values Default
Standard (Mandatory) qualifiers
-mode list This option specifies the amino acid residue positions to use in calculating the substitution data.
1 (All amino acid positions for domains in a DCF file)
2 (Ligand-binding positions.)
1
-model list This option specifies whether to use all ligands or a specific set of ligands present in a CON file when calculating the substitution data.
1 (All ligands within a CON file)
2 (Select ligands within a CON file.)
1
-dcfinfile infile This option specifies the name of DCF file (domain classification file) (input). A 'domain classification file' contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL-like). The files are generated by using SCOPPARSE and CATHPARSE. Domain sequence information can be added to the file by using DOMAINSEQS. Input file Required
-coninfile infile This option specifies the location of CON files (contact files) (output). A 'contact file' contains contact data for a protein or a domain from SCOP or CATH, in the CON format (EMBL-like). The contacts may be intra-chain residue-residue, inter-chain residue-residue or residue-ligand. The files are generated by using CONTACTS, INTERFACE and SITES. Input file Required
-liginfile infile This option specifies the location of the ligand list file. This file contains a list of ligand ('heterogen') 3-character identifier codes. One id code should be given per line. Input file Required
[-ccfddir]
(Parameter 1)
directory This option specifies the location of CCF files (clean coordinate files) (input) which have been processed by using PDBPLUS. A 'clean cordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. Directory ./
-ccfpdir directory This option specifies the location of CCF files (clean coordinate files) (input) which have been processed by using PDBPLUS. A 'clean cordinate file' contains protein coordinate and derived data for a single PDB file ('protein clean coordinate file') or a single domain from SCOP or CATH ('domain clean coordinate file'), in CCF format (EMBL-like). The files, generated by using PDBPARSE (PDB files) or DOMAINER (domains), contain 'cleaned-up' data that is self-consistent and error-corrected. Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS. Directory ./
-modee list This option specifies the environment definition. See matgen3d documentation for description of definitions.
1 (Env1)
2 (Env2)
3 (Env3)
4 (Env4)
5 (Env5)
6 (Env6)
7 (Env7)
8 (Env8)
9 (Env9)
10 (Env10)
11 (Env11)
12 (Env12)
13 (Env13)
14 (Env14)
15 (Env15)
16 (Env16)
1
[-scmatrixfile]
(Parameter 2)
outfile Domainatrix substitution matrix output file Output file matgen3d.out
[-calclogfile]
(Parameter 3)
outfile Domainatrix calculations log output file Output file matgen3d.calc
[-logfile]
(Parameter 4)
outfile Domainatrix log output file Output file matgen3d.log
Additional (Optional) qualifiers
(none)
Advanced (Unprompted) qualifiers
(none)
Associated qualifiers
"-ccfddir" associated directory qualifiers
-extension1
-extension_ccfddir
string Default file extension Any string ccf
"-ccfpdir" associated directory qualifiers
-extension string Default file extension Any string ccf
"-scmatrixfile" associated outfile qualifiers
-odirectory2
-odirectory_scmatrixfile
string Output directory Any string  
"-calclogfile" associated outfile qualifiers
-odirectory3
-odirectory_calclogfile
string Output directory Any string  
"-logfile" associated outfile qualifiers
-odirectory4
-odirectory_logfile
string Output directory Any string  
General qualifiers
-auto boolean Turn off prompts Boolean value Yes/No N
-stdout boolean Write first file to standard output Boolean value Yes/No N
-filter boolean Read first file from standard input, write first file to standard output Boolean value Yes/No N
-options boolean Prompt for standard and additional values Boolean value Yes/No N
-debug boolean Write debug output to program.dbg Boolean value Yes/No N
-verbose boolean Report some/full command line options Boolean value Yes/No Y
-help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Boolean value Yes/No N
-warning boolean Report warnings Boolean value Yes/No Y
-error boolean Report errors Boolean value Yes/No Y
-fatal boolean Report fatal errors Boolean value Yes/No Y
-die boolean Report dying program messages Boolean value Yes/No Y
-version boolean Report version number and exit Boolean value Yes/No N

6.2 EXAMPLE SESSION

An example of interactive use of MATGEN3D is shown below. Here is a sample session with matgen3d


% matgen3d 
Generate a 3D-1D scoring matrix from CCF files.
DCF file amino acid positions
         1 : All amino acid positions for domains in a DCF file
         2 : Ligand-binding positions.
This option specifies the amino acid residue positions to use in calculating the substitution data. [1]: 1
Domain classification file (optional): ../scopparse-keep/all.scop
Clean domain coordinates directory [./]: ../domainer-keep
Choose environment definition.
         1 : Env1
         2 : Env2
         3 : Env3
         4 : Env4
         5 : Env5
         6 : Env6
         7 : Env7
         8 : Env8
         9 : Env9
        10 : Env10
        11 : Env11
        12 : Env12
        13 : Env13
        14 : Env14
        15 : Env15
        16 : Env16
This option specifies the environment definition. See matgen3d documentation for description of definitions. [1]: 1
Domainatrix substitution matrix output file [matgen3d.out]: 
Domainatrix calculations log output file [matgen3d.calc]: 
Domainatrix log output file [matgen3d.log]: 

Go to the input files for this example
Go to the output files for this example




7.0 KNOWN BUGS & WARNINGS

The CCF files read must contain secondary structure and solvent accessibility data. These can be added to the file by using PDBPLUS


8.0 NOTES

8.1 GLOSSARY OF FILE TYPES

FILE TYPE FORMAT DESCRIPTION CREATED BY SEE ALSO
Domain classification file (for SCOP) DCF format (EMBL-like). Classification and other data for domains from SCOP. SCOPPARSE Domain sequence information can be added to the file by using DOMAINSEQS.
Domain classification file (for CATH) DCF format (EMBL-like). Classification and other data for domains from CATH. CATHPARSE Domain sequence information can be added to the file by using DOMAINSEQS.
Clean coordinate file (for domain) CCF format (EMBL-like). Protein coordinate and derived data for a single domain from SCOP or CATH. The data are 'cleaned-up': self-consistent and error-corrected. DOMAINER Records for residue solvent accessibility and secondary structure are added to the file by using PDBPLUS.



9.0 DESCRIPTION

MATGEN3D generates a matrix of 3D:1D scores which give the probability of finding a certain amino acid residue (1D) in a certain environment in space (3D). The environments are defined on the basis of secondary structure and solvent accessibility. The 3D:1D scores are calculated from the first domain only from each family represented in the DCF file (input). This ensures the scores are not biased to any particular family of proteins.


10.0 ALGORITHM

Environment definitions will be described here.


11.0 RELATED APPLICATIONS

See also

Program name Description
cathparse Generates DCF file from raw CATH files
domainalign Generate alignments (DAF file) for nodes in a DCF file
domainnr Removes redundant domains from a DCF file
domainrep Reorder DCF file to identify representative structures
domainseqs Adds sequence records to a DCF file
domainsse Add secondary structure records to a DCF file
helixturnhelix Identify nucleic acid-binding motifs in protein sequences
libgen Generate discriminating elements from alignments
pepcoil Predicts coiled coil regions in protein sequences
rocon Generates a hits file from comparing two DHF files
rocplot Performs ROC analysis on hits files
scopparse Generate DCF file from raw SCOP files
seqalign Extend alignments (DAF file) with sequences (DHF file)
seqfraggle Removes fragment sequences from DHF files
seqsort Remove ambiguous classified sequences from DHF files
seqwords Generates DHF files from keyword search of UniProt
ssematch Search a DCF file for secondary structure matches



12.0 DIAGNOSTIC ERROR MESSAGES

None.


13.0 AUTHORS

Waqas Awan
Jon Ison (jison@ebi.ac.uk)
The European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge CB10 1SD UK


14.0 REFERENCES

Please cite the authors and EMBOSS.

Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European Molecular Biology Open Software Suite" Trends in Genetics, 15:276-278.

See also http://emboss.sourceforge.net/

14.1 Other useful references