4.5. Secondary Structure Prediction

The question of how DNA sequence determines a specific protein structure remains difficult. Generally referred to as the 'folding problem', it is one of the major outstanding questions in molecular biology. Many attempts have been made to predict the tertiary structure of a protein from its sequence. These fall into two broad approaches: those based on mechanical models, and those based on inference from known structures.

The approach to structure prediction based on mechanical models has the attraction that, in theory, it requires no prior knowledge of protein tertiary structure. If successful, it could be applied uniformly to all sequences. By contrast, methods based on inference from known structures are often limited, or biased, in their applicability: they are likely to be more appropriate for predicting structures similar to those used in the inference process. Fortunately there are often biophysical or biochemical clues that help decide whether such a method is applicable, and these are often integrated into the methods for structure prediction.

Currently the best way to achieve reasonable secondary structure predictions is to run a variety of prediction algorithms over your sequence and determine a consensus among the results. There are various web servers that will do these multiple analyses for you, for example Jpred at the University of Dundee.
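
To make the idea of a consensus concrete, here is a minimal sketch in Python of a per-residue majority vote. It is only an illustration: it is not the method used by Jpred or by any EMBOSS program, and the three prediction strings (H for helix, E for strand, C for coil) are hypothetical values invented for the example.

```python
from collections import Counter


def consensus(predictions, tie="-"):
    """Return the per-residue majority vote over equal-length prediction strings.

    Positions where the two most frequent states are tied are marked with `tie`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        # A tie between the top two states is reported as undecided.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append(tie)
        else:
            result.append(counts[0][0])
    return "".join(result)


# Hypothetical outputs of three predictors for the same 10-residue sequence.
preds = [
    "CHHHHHCCEE",
    "CCHHHHCEEE",
    "CHHHHCCCEC",
]
print(consensus(preds))  # CHHHHHCCEE
```

Real consensus servers usually weight the individual methods rather than counting each vote equally, so treat this as a starting point only.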


As yet, coverage of secondary structure prediction within EMBOSS is limited. More algorithms will be added to enable the consensus approach described above. You'll take a look now at some of the predictions you can currently perform using EMBOSS.

4.5.1. pepinfo

pepinfo produces information on amino acid properties (size, polarity, aromaticity, charge etc.). Hydrophobicity profiles are also available and are useful for locating turns, potential antigenic peptides and transmembrane helices. Various algorithms are employed, including the Kyte and Doolittle hydropathy measure: this curve is the average of a residue-specific hydrophobicity index over a window of nine residues. When the curve is in the upper half of the frame, it indicates a hydrophobic region, and when it is in the lower half, a hydrophilic region.
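
If you want to see how such a curve is built, the following Python sketch averages the Kyte and Doolittle index over a sliding window of nine residues. It is an illustration of the calculation described above, not the pepinfo source code; the function name and the test sequence are made up for the example, and pepinfo may treat the ends of the sequence differently.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    Non-standard residue codes (for example X) raise a KeyError.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch followed by charged residues.
toy = "MKTLLVAAILLFLGASDDKKEE"
for position, value in enumerate(hydropathy_profile(toy), start=1):
    label = "hydrophobic" if value > 0 else "hydrophilic"
    print(f"window starting at residue {position}: {value:6.2f}  {label}")
```

Positive averages correspond to the upper half of the pepinfo plot and negative averages to the lower half.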

4.5.2. Exercise: pepinfo

% pepinfo L07770.pep
Plots simple amino acid properties in parallel
Graph type [x11]:
Output file [pepinfo.out]:

You will see two screens (press <RETURN> to move from the first to the second screen) that look like this:

pepinfo (1)

pepinfo (2)

4.5.3. Predicting Transmembrane Regions

The results from the pepinfo hydropathy plot showed seven highly hydrophobic regions within L07770.pep. Could these be transmembrane domains? You can use the EMBOSS program tmap to investigate this possibility.
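
Before turning to tmap, it can help to see how hydrophobic stretches might be picked out of a hydropathy profile automatically. The Python sketch below reports runs of consecutive windows whose average exceeds a cut-off. This is not the algorithm tmap uses; the function name, the profile values and the cut-off of 1.6 are assumptions chosen for illustration (a value of about 1.6 is often quoted as a rule of thumb for Kyte and Doolittle averages, usually with a longer window than nine).

```python
def hydrophobic_runs(profile, threshold=1.6, min_len=1):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive profile values above `threshold` that is at least
    `min_len` values long."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_len:
        runs.append((start, len(profile)))
    return runs


# Hypothetical windowed hydropathy values, not taken from L07770.pep.
profile = [0.2, 1.9, 2.4, 2.1, 0.5, -1.0, 1.7, 1.8, 1.0]
print(hydrophobic_runs(profile))             # [(1, 4), (6, 8)]
print(hydrophobic_runs(profile, min_len=3))  # [(1, 4)]
```

A run of hydrophobic windows is only a hint that a membrane-spanning segment may be present; a dedicated program such as tmap is still needed to make a proper prediction.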

4.5.4. Exercise: tmap

% tmap
Displays membrane spanning regions
Sequences file to be read in: L07770.pep
Graph type [x11]:

A graph window showing the tmap plot will appear.

The bars across the top represent areas where transmembrane segments are predicted. Taken in combination with the results from pepinfo, you can see that there may be seven transmembrane helices in this protein. This corresponds well with both the SwissProt entry for this sequence (opsd_xenla) and with some information you will gather about patterns and profiles in the next section.

There are various other programs you can use to analyse your peptide sequence. To find out what is available, try rerunning wossname as you did earlier.