You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

395 lines
13 KiB
Plaintext

STRIDE: Protein secondary structure assignment
from atomic coordinates
Dmitrij Frishman & Patrick Argos
European Molecular Biology Laboratory
Postfach 102209, Meyerhofstr. 1
69012 Heidelberg
Germany
FRISHMAN@EMBL-HEIDELBERG.DE
ARGOS@EMBL-HEIDELBERG.DE
CONTENTS
1. About the method
2. Copyright notice
3. Availability
4. Installation
5. Using STRIDE
6. Output format
7. Bug reports and user feedback
8. References
----------------------------------------------------------------------
1. About the method
STRIDE [1] is a program to recognize secondary structural elements in
proteins from their atomic coordinates. It performs the same task as
DSSP by Kabsch and Sander [2] but utilizes both hydrogen bond energy
and mainchain dihedral angles rather than hydrogen bonds alone. It
relies on database-derived recognition parameters with the
crystallographers' secondary structure definitions as a standard-of-
truth. Please see Frishman and Argos [1] for detailed description of
the algorithm.
2. Copyright notice
All rights reserved, whether the whole or part of the program is
concerned. Permission to use, copy, and modify this software and its
documentation is granted for academic use, provided that:
i. this copyright notice appears in all copies of the software and
related documentation;
ii. the reference given below (Frishman and Argos, 1995) must be
cited in any publication of scientific results based in part or
completely on the use of the program;
iii. bugs will be reported to the authors.
The use of the software in commercial activities is not allowed
without a prior written commercial license agreement.
WARNING: STRIDE is provided "as-is" and without warranty of any kind,
express, implied or otherwise, including without limitation any
warranty of merchantability or fitness for a particular purpose. In no
event will the authors be liable for any special, incidental, indirect
or consequential damages of any kind, or any damages whatsoever
resulting from loss of data or profits, whether or not advised of the
possibility of damage, and on any theory of liability, arising out of
or in connection with the use or performance of this software.
For calculation of the residue solvent accessible area the program NSC
[3,4] is used and was kindly provided by Dr. F.Eisenhaber
(EISENHABER@EMBL-HEIDELBERG.DE). Please direct to him all questions
concerning specifically accessibility calculations.
3. Availability
Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS,
Dos and Mac together with documentation and source code are available
by anonymous FTP from ftp.ebi.ac.uk (directories
/pub/software/unix/stride, /pub/software/dos/stride,
/pub/software/vms/stride, /pub/software/mac/stride). We are willing to
compile the program for other architectures if temporary access to
them will be granted by an interested user.
Data files with STRIDE secondary structure assignments for the current
release of the PDB [5] databank are in the directory
/pub/databases/stride of the same site. Atomic coordinate sets can be
submitted for secondary structure assignment through electronic mail
to stride@embl-heildelberg.de. A mail message containing HELP in the
first line will be answered with appropriate instructions. See also
WWW page http://www.embl-heidelberg.de/stride/stride_info.html.
4. Installation
For UNIX, DOS and Mac no installation is needed. Just download the
executable corresponding to your platform, and you are all set. For
VAX and OpenVMS you need only to link the executable with a logical
name; for example:
yourlogicalname:= $ $yourdiskname:[your.directory.name]stride.exe
and then use yourlogicalname as the program name.
5. Using STRIDE
The only required parameter for STRIDE is the name of the file
containing a set of atomic coordinates in PDB [5] format. By default
STRIDE writes to standard output, i.e. your screen. On systems that
allow to redirect output you can do so to create a disk file. Help is
available if you just type STRIDE without parameters. The following
options are accepted:
-fFilename Write output to the file "Filename" rather than to
stdout.
-h Report hydrogen bonds. By default no hydrogen bond
information is included in the output.
-o Report secondary structure summary only.
-rId1Id2.. Read only chains Id1, Id2 etc. of the PDB file *). All
other chains will be ignored. By default all valid
protein chains are read.
-cId1Id2.. Process only chains Id1, Id2 ...etc *). Secondary
structure assignment will be produced only for these
chains, but other chains that are present will be taken
into account while calculating residue accessible
surface and detecting inter-chain hydrogen bonds and,
possibly, interchain beta-sheets. By default all
protein chains read are processed.
-mFilename Generate a Molscript [6] file. Using the program
Molscript by Per Craulis you can create a postscript
picture of your structure. You can manually edit the
Molscript file produced by STRIDE to achieve the
desired orientation and to include additional details.
-q[Filename] Generate sequence file in FASTA [7] format and die.
Filename is optional. If no file name is specified,
stdandard output is used.
All options are case- and position-insensitive.
Examples:
1. Calculate secondary structure assignment for 1ACP including
hydrogen bond information:
stride 1acp.brk -h
2. Calculate secondary structure assignment for 4RUB and write the
output to the file 4rub.str
stride 4rub.brk -f4rub.str
3. Calculate secondary structure assignment for chain B of 4RUB.
Ignore all other chains. Generate a Molscript file 4rub.mol.
stride 4rub.brk -rb -m4rub.mol
4. Calculate secondary structure assignment for chain C of 2GLS in
the presence of chains A and B. Report secondary structure
summary only.
stride 2gls.brk -rabc -cc -o
6. Output format
STRIDE produces output that is easily readable both visually and with
computer programs. The side effect of this conveniency is larger file
size of individual STRIDE entries. Every record is 79 symbols long and
has the following general format:
Position Description
1-3 Record code
4-5 Not used
6-73 Data
74-75 Not used
75-79 Four letter PDB code (if available)
Below follows the description of each record type.
Code Description and format of data
REM Remarks and blank lines
Format: free
HDR Header. Protein name, date of file creation and PDB code
Format: free
CMP Compound.Full name of the molecule and identifying
information
Format: free
SRC Species, organ, tissue, and mutant from which the molecule
has been obtained
Format: free
AUT Names of the structure authors
Format: free
CHN File name and PDB chain identifier*).
Format: File name beginning from position 6 followed
by one space and one-letter chain identifier
SEQ Amino acid sequence
Format: 6-9 First residue PDB number
11-60 Sequence
62-65 Last residue PDB number
STR Secondary structure summary
Format: 11-60 Secondary structure assignment **)
LOC Location of secondary structure elements
Format: 6-17 Element name
19-21 First residue name
32-26 First residue PDB number
28-28 First residue chain identifier
36-38 Last residue name
42-45 Last residue PDB number
47-47 Last residue chain identifier
ASG Detailed secondary structure assignment
Format: 6-8 Residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
25-25 One letter secondary structure code **)
27-39 Full secondary structure name
43-49 Phi angle
53-59 Psi angle
65-69 Residue solvent accessible area
DNR Donor residue
Format: 6-8 Donor residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
26-28 Acceptor residue name
30-30 Protein chain identifier
32-35 PDB residue number
37-40 Ordinal residue number
42-45 N..0 distance
47-52 N..O=C angle
54-59 O..N-C angle
61-66 Angle between the planes of donor
complex and O..N-C
68-73 angle between the planes of acceptor
complex and N..O=C
ACC Acceptor residue
Format: 6-8 Acceptor residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
26-28 Donor residue name
30-30 Protein chain identifier
32-35 PDB residue number
37-40 Ordinal residue number
42-45 N..0 distance
47-52 N..O=C angle
54-59 O..N-C angle
61-66 Angle between the planes of donor
complex and O..N-C
68-73 angle between the planes of acceptor
complex and N..O=C
HDR, CMP, SCR and AUT records are directly copied from the PDB file,
if supplied by the authors. If only the secondary structure summary is
requested, only CHN, SEQ, STR and LOC records will be output.
Hydrogen bond information (records DNR and ACC) was made very
redundant to facilitate human reading and will not be reported by
default.
*) IMPORTANT NOTE: if the protein chain identifier is ' ' (space), it
will be substituted by '-' (dash) everywhere in the STRIDE output.
The same is true for command line parameters involving chain
identifiers where you have to specify '-' instead of ' '.
**) One-letter secondary structure code is nearly the same as used in
DSSP [2] (see Frishman and Argos [1] for details):
H Alpha helix
G 3-10 helix
I PI-helix
E Extended conformation
B or b Isolated bridge
T Turn
C Coil (none of the above)
For each record (data line) except those with codes REM and STR the
number of fields is consistent and is readily suitable for processing
with external tools, such as awk, perl, etc.
7. Bug reports and user feedback
Please send your suggestions, questions and bug reports to
FRISHMAN@EMBL-HEIDELBERG.DE. Send your contact address to get
information on updates and new features.
8. References
1. Frishman,D & Argos,P. (1995) Knowledge-based secondary structure
assignment. Proteins: structure, function and genetics, 23,
566-579.
2. Kabsch,W. & Sander,C. (1983) Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and
geometrical features. Biopolymers, 22: 2577-2637.
3. Eisenhaber, F. and Argos, P. (1993) Improved strategy in
analytic surface calculation for molecular systems: handling of
singularities and computational efficiency. J. comput. Chem. 14,
1272-1280.
4. Eisenhaber, F., Lijnzaad, P., Argos, P., Sander, C., and Scharf,
M. (1995) The double cubic lattice method: efficient approaches
to numerical integration of surface area and volume and to dot
surface contouring of molecular assemblies. J. comput. Chem. 16,
273-284.
5. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.F.,
Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and
Tasumi, M. (1977) The protein data bank: a computer-based
archival file for macromolecular structures. J. Mol. Biol. 112,
535-542.
6. Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both
detailed and schematic plots of protein structures. J. Appl.
Cryst. 24, 946-950.
7. Pearson, W.R. (1990) Rapid and sensitive sequence comparison
with FASTP and FASTA. Methods. Enzymol. 183, 63-98.